-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions, but if you think things are ok, then go for it.
@@ -258,7 +269,7 @@ def test_token_idx_sentence_pairs(self): | |||
".", | |||
"</s>", | |||
"</s>", | |||
"It", | |||
"ĠIt", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why it's adding spaces to the first token? Is that expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a change in defaults.
At first, people thought that Ġ
means that this is the beginning of the word (opposite of ##
in BERT). So then people were complaining about the "bug" where the Ġ
wasn't there at the beginning of the first token, and they made a fix. But now it means "space", so not having it makes sense ... I want to not get involved and just stick with the huggingface default.
setup.py
Outdated
@@ -119,7 +119,7 @@ | |||
"flaky", | |||
"responses>=0.7", | |||
"conllu==2.3.2", | |||
"transformers>=2.4.0,<2.5.0", | |||
"transformers>=2.6.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you don't want to keep an upper bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good idea. Added in 3264dcc.
No description provided.