[Support] Is there an English version of the docs #274
Comments
There is no English documentation available.
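Could you explain to me how to use the different segmentation modes?

```go
package main

import (
	"fmt"
	"strings"

	"github.com/ikawaha/kagome-dict/ipa"
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	// Build a tokenizer with the IPA dictionary, omitting the BOS/EOS tokens.
	t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}
	// wakati: split the text into surface strings only
	fmt.Println("---wakati---")
	seg := t.Wakati("すもももももももものうち")
	fmt.Println(seg)
	// tokenize: full tokens with their dictionary features
	fmt.Println("---tokenize---")
	tokens := t.Tokenize("すもももももももものうち")
	for _, token := range tokens {
		features := strings.Join(token.Features(), ",")
		fmt.Printf("%s\t%v\n", token.Surface, features)
	}
}
```

Could you also tell me what the pros and cons of using the different dictionaries are? Thank you very much!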
Ok, figured the segmentation modes out myself.
As you may know, most Asian texts are not word-separated.

The difference between dictionaries is simply the number of words they contain. The default built-in dictionary supports most of the important proper names, nouns, verbs, etc. The "pros" of using different dictionaries are, therefore, that they can separate words more accurately.

The "cons" would be higher memory usage and slower processing. I hope this helps. 🤞
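To make the dictionary point concrete, here is a minimal toy sketch of how vocabulary size changes the result. It is not how kagome works internally: the `segment` function, the greedy longest-match strategy, and both word lists are assumptions invented for illustration, and it uses only the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// segment is a hypothetical toy segmenter (not part of kagome).
// It splits text by greedy longest match against dict; characters that
// no dictionary entry covers fall back to single-character tokens.
func segment(text string, dict map[string]bool) []string {
	runes := []rune(text)
	var out []string
	for i := 0; i < len(runes); {
		n := 1 // fallback: emit one character
		// Try the longest candidate first, then shorter ones.
		for j := len(runes); j > i+1; j-- {
			if dict[string(runes[i:j])] {
				n = j - i
				break
			}
		}
		out = append(out, string(runes[i:i+n]))
		i += n
	}
	return out
}

func main() {
	// Two made-up dictionaries: the larger one also knows a proper name.
	small := map[string]bool{"東京": true}
	large := map[string]bool{"東京": true, "スカイツリー": true}

	text := "東京スカイツリー"
	fmt.Println(strings.Join(segment(text, small), " | "))
	fmt.Println(strings.Join(segment(text, large), " | "))
}
```

With the small word list, the unknown proper name breaks apart into single characters, while the larger list keeps it as one word. The larger map also takes more memory and offers more candidates to check, which mirrors the trade-off described above.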
@KEINOS Thank you very much!
@CaptainDario Indeed. There is nothing better than better documentation! @ikawaha, if the above explanation is OK, I would like to open a PR for it. Where should I write it? In the Wiki, maybe?
Thank you for this great project!
I really like this project and would like to understand its capabilities better. Therefore, I am wondering whether an English version of the docs is available.