Add support for additional Japanese tokenizers. #1786
Merged
Related: #1296
Hello all! I'm the author of #1267.
Recently, I updated konoha to add support for new tokenizers.
In this PR, I add support for two Japanese tokenizers, Janome and SudachiPy, to flair.tokenization.JapaneseTokenizer.
These tokenizers can be installed with `pip install` alone, without building any external software. So I'm wondering whether flair could support built-in Japanese tokenization.
(Of course, this is not a strong opinion. I'd like feedback from the flair team!)
Here are examples of using JapaneseTokenizer in a few cases:
Case 1. Newly available Japanese tokenizers: Janome and SudachiPy.
Case 2. konoha, the Japanese tokenizer library, is not installed (the message is almost the same as the current one).
Case 3. The user specifies a tokenizer that is not supported.
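For readers who want to see the shape of the checks behind Case 2 and Case 3, here is a minimal standalone sketch. It is not the code in this PR: the function name `check_japanese_tokenizer`, the `SUPPORTED_TOKENIZERS` tuple, the `backend_module` parameter, and the error messages are all hypothetical, and the tuple lists only the two tokenizers named above.

```python
import importlib.util

# Hypothetical: only the two tokenizers named in this PR description.
SUPPORTED_TOKENIZERS = ("janome", "sudachi")


def check_japanese_tokenizer(name: str, backend_module: str = "konoha") -> str:
    """Validate a tokenizer name and confirm the backend library is importable.

    Returns the lower-cased tokenizer name, or raises if a check fails.
    """
    normalized = name.lower()

    # Case 3: the requested tokenizer is not one of the supported names.
    if normalized not in SUPPORTED_TOKENIZERS:
        raise NotImplementedError(
            f"Tokenizer '{name}' is not supported. "
            f"Supported tokenizers: {', '.join(SUPPORTED_TOKENIZERS)}."
        )

    # Case 2: the backend library (konoha by default) is not installed.
    if importlib.util.find_spec(backend_module) is None:
        raise ModuleNotFoundError(
            f"'{backend_module}' is not installed. "
            f"Install it with pip to use the '{normalized}' tokenizer."
        )

    return normalized
```

Checking the name first keeps the "unsupported tokenizer" message independent of whether konoha is installed. The real checks could just as well run in the other order.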
Thanks as always for maintaining flair,