Contemplate how to handle words with non-Chinese characters #42
Comments
Could try a rule: |
@james-clark-5 A valid idea. But we will have to consider where in the code this is done, as I have a suspicion we may extract non-Chinese characters earlier in the process. Also, there exist words that start with non-Chinese characters |
What examples do we have? Should help get the ball rolling. |
That's the example that came to mind too, though if you look at the
beginning of the file, you will find more examples. I think the first step
would be to find out if there are any words that *end* with non-Hanzi
characters, as this might make the implementation a little (though not
much) more complicated.
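
A minimal sketch of that first step, assuming each word is available as a plain String (the class and method names here are hypothetical, not taken from the project). It checks only the first and last code point of a word against the Han script. The sample inputs are written as Unicode escapes so the file compiles regardless of source encoding: the first is AA制 from this thread, the other two (卡拉OK and 中文) are illustrative inputs, not confirmed entries.

```java
// Hypothetical helper, not part of the project's code base.
class NonHanEdges {
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // True if the first code point of the word is not a Han character.
    static boolean startsWithNonHan(String word) {
        return !word.isEmpty() && !isHan(word.codePointAt(0));
    }

    // True if the last code point of the word is not a Han character.
    static boolean endsWithNonHan(String word) {
        return !word.isEmpty() && !isHan(word.codePointBefore(word.length()));
    }

    public static void main(String[] args) {
        // AA + U+5236 is the thread's example; the other two are illustrative only.
        String[] samples = {"AA\u5236", "\u5361\u62C9OK", "\u4E2D\u6587"};
        for (String w : samples) {
            System.out.println(w + " start=" + startsWithNonHan(w) + " end=" + endsWithNonHan(w));
        }
    }
}
```

Working on code points rather than char values keeps the check correct for Han characters outside the Basic Multilingual Plane.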
…On 22 September 2017 at 04:16, IdiosApps ***@***.***> wrote:
What examples do we have? Should help get the ball rolling.
AA制 (to go 50/50)
|
I guess we'll be able to find out about potential words ending with non-hanzi characters by reusing the recursive "try-all-character-combinations" code. Would also bring up words starting with non-hanzi characters. |
Finding them would be very easy, something like:
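// `words` stands for the set of words (any Iterable<Word>).
for (Word word : words) {
    String simplified = word.getSimplifiedChinese();
    // Print each word once if any of its code points is outside the Han script.
    boolean hasNonHan = simplified.codePoints()
            .anyMatch(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.HAN);
    if (hasNonHan) {
        System.out.println(simplified);
    }
}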
…On 22 September 2017 at 10:18, IdiosApps ***@***.***> wrote:
I guess we'll be able to find out about potential words ending with non-hanzi
characters by reusing the recursive "try-all-character-combinations"
code. Would also bring up words starting with non-hanzi characters.
|
cut -d ' ' -f 2 cedict_ts.u8 | grep '[A-Za-z]' |
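
For reference, a rough Java equivalent of that pipeline, as a sketch only. It assumes the usual CC-CEDICT line layout (traditional form, simplified form, then pinyin and definitions, separated by spaces) and that cedict_ts.u8 sits in the working directory; the class and method names are hypothetical. Unlike the shell version, it skips the `#` header lines at the top of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project's code base.
class CedictLatinFilter {
    private static final Pattern LATIN = Pattern.compile("[A-Za-z]");

    // Returns the second space-separated field (assumed to be the simplified
    // form) of every entry line where that field contains an ASCII letter.
    static List<String> simplifiedWithLatin(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // header comment, not an entry
            }
            String[] fields = line.split(" ", 3);
            if (fields.length >= 2 && LATIN.matcher(fields[1]).find()) {
                hits.add(fields[1]);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // The file location is an assumption; adjust as needed.
        List<String> lines = Files.readAllLines(Paths.get("cedict_ts.u8"), StandardCharsets.UTF_8);
        simplifiedWithLatin(lines).forEach(System.out::println);
    }
}
```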
I'm curious what this would pull up: |
and bear in mind there are entries with other characters too, such as this: |
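
Since Latin letters are not the only case, one way to see which other characters actually occur is to tally every non-Han character across the word list. The sketch below is an assumption about how that could be done, with hypothetical names; it takes the words as plain strings and returns a count per character.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the project's code base.
class NonHanTally {
    // Counts how often each non-Han code point occurs across the given words.
    static Map<String, Integer> tally(Iterable<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            word.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.HAN)
                .forEach(cp -> counts.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        }
        return counts;
    }
}
```

The keys of the returned map then give the full set of characters that any extraction rule would need to handle, including digits and punctuation.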