Word Count is not correct for Chinese Characters


#1

Hi,

Word count numbers are incorrect when FoldingText deals with Chinese characters. It seems the count is determined by spaces.

In English, words are separated by spaces, but in Chinese they are not.
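A minimal sketch of the space-based counting described above. It assumes the count simply splits on whitespace. This is an illustration, not FoldingText's actual code, and `whitespace_word_count` is a hypothetical name:

```python
def whitespace_word_count(text: str) -> int:
    # str.split() with no arguments splits on runs of whitespace
    # and ignores leading/trailing whitespace
    return len(text.split())

print(whitespace_word_count("one two three"))  # 3
print(whitespace_word_count("中文数字"))  # 1: no spaces, so the whole run is one "word"
```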


#2

Can you post some example Chinese text and the correct word count? I think I read somewhere that for Chinese, a character count is more meaningful?


#3

3 English words, separated by 2 spaces.

Actually, it’s 4 Chinese words, not 1.


#4

Thanks. I’ve been reading more about word counting… it gets complex quickly once you add other languages. It’s on my list now, but I don’t have a fix yet.


#5

> 中文数字
> Actually, it’s 4 Chinese words, not 1

Or, of course, two … (中文 ‘Chinese’ + 数字 ‘numerals’)

The difficulty here is the distinction between ‘word counts’ (‘word’ boundaries are not marked in Chinese script) and ‘character’ counts.
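Because Chinese script does not mark word boundaries, a word count needs a segmentation step first. One common technique is forward maximum matching against a dictionary. The sketch below uses a tiny toy dictionary; `DICTIONARY` and `fmm_segment` are hypothetical names, not part of FoldingText, and real segmenters rely on large dictionaries or statistical models.

```python
def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word that matches, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit
        # or a single character remains
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

DICTIONARY = {"中文", "数字", "葡萄"}  # hypothetical toy dictionary
print(fmm_segment("中文数字", DICTIONARY))  # ['中文', '数字']
```

If 中文数字 (‘Chinese numerals’) were itself added to the dictionary, the same text would segment as one word, which is exactly why the “correct” word count depends on the segmenter.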

A character count (汉字 ‘characters’ rather than 单词 ‘words’) for Chinese could be written as a plugin, or the existing code could report character counts (better for CJK) alongside word counts (better for Roman scripts with space-delimited ‘words’).

(As long as you don’t mind 葡萄 ‘grape’ being counted as 2 rather than 1 :-)
(It’s two glyphs, but one morpheme, and one word …)
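A sketch of reporting both counts at once: Chinese characters counted individually, and everything else counted as space-delimited words. It assumes that “Chinese character” means the CJK Unified Ideographs block U+4E00–U+9FFF only; real CJK coverage also includes extension blocks, kana, hangul, and so on. `char_and_word_counts` is a hypothetical name.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as CJK
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")

def char_and_word_counts(text: str) -> tuple[int, int]:
    """Return (CJK character count, space-delimited word count)."""
    cjk_chars = len(CJK_CHAR.findall(text))
    # Replace each CJK character with a space so it also acts as a word
    # boundary, then count the remaining whitespace-delimited tokens
    words = len(CJK_CHAR.sub(" ", text).split())
    return cjk_chars, words

print(char_and_word_counts("中文数字"))  # (4, 0)
print(char_and_word_counts("FoldingText 中文数字 test"))  # (4, 2)
print(char_and_word_counts("葡萄"))  # (2, 0): two glyphs, even though one word
```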


#6

The latest dev release makes some changes to word count that I “think” should fix this issue.