Describe the bug
I have CSV training and test files that I load with CSVClassificationCorpus and then use for training. The evaluation that runs after training works fine. I then load the CSV file manually, construct a Sentence(...) for each line, and pass it to the predict function. This time the results are arbitrary and poor.
I looked into it, and it turns out that Sentence uses SpaceTokenizer by default (when no use_tokenizer parameter is passed).
OTOH, CSVClassificationCorpus uses SegtokTokenizer by default ...
This leads to completely different results in the default case, where neither parameter is specified.
So I fixed it by passing use_tokenizer=SegtokTokenizer() to my Sentence call before invoking predict.
Quite counter-intuitive. Not necessarily a bug, but posting in case someone else runs into the same issue.
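For anyone wondering why the tokenizer mismatch matters so much, here is a rough sketch in plain Python. The two functions are simplified, hypothetical stand-ins and not Flair's actual SpaceTokenizer or SegtokTokenizer; they only illustrate the general difference between splitting on whitespace and splitting punctuation into separate tokens.

```python
import re


def space_tokenize(text):
    """Hypothetical stand-in: split on whitespace only, so punctuation stays glued to words."""
    return text.split()


def punct_aware_tokenize(text):
    """Hypothetical stand-in: emit words and punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "Great movie, loved it!"

space_tokens = space_tokenize(text)
punct_tokens = punct_aware_tokenize(text)

print(space_tokens)  # ['Great', 'movie,', 'loved', 'it!']
print(punct_tokens)  # ['Great', 'movie', ',', 'loved', 'it', '!']
```

A model trained on the second kind of token sequence has likely never seen tokens such as `movie,` or `it!`, which would plausibly explain the poor predictions when the input is tokenized the first way at inference time.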