HSK东西

Chinese Vocabulary and Text Analysis

Paste some text or your Skritter word lists into the edit box below, and click to find out how many words and characters you know.
  • Outputs a whole load of statistics about the words and characters, and can suggest high frequency words and characters that should be learned.
  • Analyses the HSK level of the words and characters. All operations are performed using Python lists, sets, and dictionaries, so lookups are very fast.
  • Only supports simplified characters (you could use a converter to convert to and from traditional before and after using this tool). Traditional may be supported at some point in the future; let me know if you know of a similar word/character frequency database for traditional characters.
  • Uses the SUBTLEX-CH word frequency data to order words, and to determine whether a word/character exists or not. This frequency list has the advantage that it is very up to date, but its definition of what is and isn't a word is sometimes a bit strange. Let me know if you know of any better lists to use!
  • Let me know at alan@hskhsk.com if you manage to break it, or if it gives any strange results.
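
As a rough illustration of the set and dictionary approach mentioned above, here is a minimal sketch of an HSK breakdown. The `HSK_LEVEL` table and the `hsk_breakdown` function are hypothetical names with made-up sample data; they are not taken from this tool's actual code.

```python
# Hypothetical sample data: word -> HSK level. The real tool would load
# the full HSK lists; these five entries are only for illustration.
HSK_LEVEL = {"我": 1, "你": 1, "朋友": 1, "因为": 2, "环境": 3}


def hsk_breakdown(known_words, hsk_level=HSK_LEVEL):
    """Return {level: (words known, words in level)} for each HSK level."""
    known = set(known_words)  # set membership tests are O(1) on average
    totals = {}
    found = {}
    for word, level in hsk_level.items():
        totals[level] = totals.get(level, 0) + 1
        if word in known:
            found[level] = found.get(level, 0) + 1
    return {level: (found.get(level, 0), totals[level]) for level in sorted(totals)}


if __name__ == "__main__":
    # "苹果" is not in the sample table, so it is simply not counted.
    print(hsk_breakdown(["我", "朋友", "环境", "苹果"]))
```

Because the input is converted to a set once, each membership check stays cheap even for a long vocabulary list.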

Analyse Your 汉字 Vocabulary/Text

Actions

Analyse words/characters in input
Analyse HSK words in input
Analyse HSK characters in input
Suggest HSK words not in input
Suggest HSK characters not in input
Suggest words not in input
Suggest words using characters in input
Suggest characters not in input
Annotate words with HSK levels and frequency
Annotate characters with HSK levels and frequency
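
To give an idea of what "Suggest words using characters in input" could involve, here is a small sketch. It assumes a word list already sorted from most to least frequent; the function name, the sample list, and the `limit` parameter are all hypothetical and not part of this tool.

```python
def suggest_words_from_known_chars(known_words, ranked_words, limit=10):
    """Suggest unknown words built only from characters the user already knows.

    ranked_words is assumed to be ordered from most to least frequent.
    """
    known = set(known_words)
    known_chars = set("".join(known_words))
    suggestions = []
    for word in ranked_words:
        # Keep the word if it is new and every character in it is known.
        if word not in known and set(word) <= known_chars:
            suggestions.append(word)
            if len(suggestions) == limit:
                break
    return suggestions


if __name__ == "__main__":
    ranked = ["的", "好", "人", "好友", "朋友", "友人", "大人"]  # made-up order
    print(suggest_words_from_known_chars(["朋友", "好人"], ranked))
```

Walking the list in frequency order means the most useful suggestions come first, and the loop can stop as soon as `limit` is reached.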

Vocabulary/Text Input Options

One word/character per line (anything after first whitespace ignored, use this for Skritter word lists)
Big block of text (use this if pasting from a web page etc.)
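
The two input modes could be handled roughly as follows. This is only a sketch: the function names are hypothetical, and the character test covers just the basic CJK Unified Ideographs block, which is an assumption made for brevity.

```python
def parse_one_per_line(text):
    """Return the first whitespace-delimited token of each non-empty line."""
    words = []
    for line in text.splitlines():
        parts = line.split()  # splits on tabs, spaces, and other whitespace
        if parts:
            words.append(parts[0])
    return words


def is_hanzi(ch):
    # Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


def characters_in_block(text):
    """Return the set of distinct Chinese characters in a block of free text."""
    return {ch for ch in text if is_hanzi(ch)}


if __name__ == "__main__":
    skritter_style = "朋友\tpéngyou\tfriend\n\n你好 nǐhǎo hello"
    print(parse_one_per_line(skritter_style))
    print(sorted(characters_in_block("我 like 苹果!")))
```

In the first mode, pinyin and definitions after the word are dropped; in the second, punctuation and Latin text are skipped when collecting characters.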

Hanzi List Output Options

Add SUBTLEX-CH frequency index (1 for highest frequency word/character, higher values are less frequent)
Add SUBTLEX-CH raw word/character frequency (higher values are more frequent)
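
The difference between the two options can be shown with a short sketch that derives a frequency index from raw counts. The counts below are invented for illustration and are not real SUBTLEX-CH values; `frequency_index` and `annotate` are hypothetical helper names.

```python
# Made-up raw counts, NOT real SUBTLEX-CH values.
RAW_FREQ = {"的": 300, "我": 200, "朋友": 50}


def frequency_index(raw_freq):
    """Map each word to its rank: 1 is the most frequent word."""
    # Sort by descending count; ties are broken by the word itself.
    ranked = sorted(raw_freq, key=lambda w: (-raw_freq[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def annotate(word, raw_freq, index):
    """Return 'word<TAB>index<TAB>raw count', or just the word if unknown."""
    if word not in raw_freq:
        return word
    return f"{word}\t{index[word]}\t{raw_freq[word]}"


if __name__ == "__main__":
    index = frequency_index(RAW_FREQ)
    for w in ["朋友", "的", "苹果"]:
        print(annotate(w, RAW_FREQ, index))
```

A low index and a high raw frequency both indicate a common word; the index is easier to compare at a glance, while the raw count shows how large the gap between two words really is.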

Input your simplified Chinese vocabulary or text here

To resolve ambiguous words, place a | character (vertical bar) between words.
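
To see why a vertical bar helps, consider a greedy longest-match segmenter. The sketch below is an assumption about how such segmentation might work, not a description of this tool's actual algorithm; `segment`, the tiny `lexicon`, and `max_len` are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward longest-match segmentation.

    A '|' in the text forces a word boundary; matching never crosses it.
    Anything not found in the lexicon falls back to a single character.
    """
    words = []
    for chunk in text.split("|"):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, shrinking down to one character.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + size]
                if size == 1 or candidate in lexicon:
                    words.append(candidate)
                    i += size
                    break
    return words


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源"}  # tiny illustrative word set
    print(segment("研究生命起源", lexicon))   # greedy match takes 研究生 first
    print(segment("研究|生命起源", lexicon))  # the bar forces 研究 / 生命 / 起源
```

Without the bar, the greedy match picks 研究生 and leaves 命 stranded; with the bar after 研究, the intended reading 研究 / 生命 / 起源 is recovered.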


You may copy, modify, and distribute any works on this site for non-commercial purposes if you credit me as the original author. Please don't use anything on this site for commercial purposes without obtaining my permission.
Alan Davies, alan@hskhsk.com 2013-.