Chinese Vocabulary and Text Analysis
Paste some text or your Skritter word lists into the edit box below, and click to find out how many words and characters you know.
- Outputs a wide range of statistics about the words and characters, and can suggest high-frequency words and characters that you should learn next.
- Analyses the HSK level of the words and characters. All operations are performed using Python arrays, sets, and dictionaries, which are very fast.
- Only supports simplified characters (you could use a converter to convert from traditional before using this tool, and back again afterwards). Traditional may be supported at some point in the future; let me know if you know of a similar word/character frequency database for traditional characters.
- Uses the SUBTLEX-CH word frequency data to order words, and to determine whether a word/character exists or not. This frequency list has the advantage that it is very up to date, but its definition of what is and isn't a word is sometimes a bit strange. Let me know if you know of any better lists to use!
- Let me know at alan@hskhsk.com if you manage to break it, or if it gives any strange results.
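
To give a rough idea of how this kind of analysis can be done with plain Python sets and dictionaries, here is a minimal sketch. It is not the code behind this tool: the tiny `FREQ` table stands in for a real frequency list such as SUBTLEX-CH, the counts in it are made up, and the names `segment` and `analyse` and the greedy longest-match segmentation are all assumptions chosen for illustration.

```python
# Hypothetical stand-in for a word frequency list (word -> count).
# The counts are invented for this example, not taken from SUBTLEX-CH.
FREQ = {"的": 1000, "我": 900, "你": 800, "中国": 300, "学习": 200, "喜欢": 150}


def segment(text, vocab, max_len=4):
    """Greedy longest-match segmentation against the keys of vocab."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in vocab:
                words.append(piece)
                i += n
                break
        else:
            i += 1  # character not in the frequency list: skip it
    return words


def analyse(text, known_words, freq=FREQ, top=3):
    """Split the words found in text into known and unknown sets, and
    suggest the most frequent unknown words to learn next."""
    found = set(segment(text, freq))
    known = found & known_words      # set intersection
    unknown = found - known_words    # set difference
    suggestions = sorted(unknown, key=lambda w: freq[w], reverse=True)[:top]
    return {"known": known, "unknown": unknown, "suggestions": suggestions}


result = analyse("我喜欢学习中国的", known_words={"我", "的"})
print(len(result["known"]), "known,", len(result["unknown"]), "unknown")
print("Learn next:", result["suggestions"])
```

Membership tests on sets and dictionary lookups take roughly constant time on average, which is why this style of analysis stays fast even with large word lists.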