New version of transformers #4018

dirkgr · 2020-04-04T00:24:11Z

No description provided.

matt-gardner

A couple of questions, but if you think things are ok, then go for it.

matt-gardner · 2020-04-04T00:31:37Z

allennlp/tests/data/tokenizers/pretrained_transformer_tokenizer_test.py

@@ -258,7 +269,7 @@ def test_token_idx_sentence_pairs(self):
            ".",
            "</s>",
            "</s>",
-            "It",
+            "ĠIt",


Do you know why it's adding spaces to the first token? Is that expected?

I think it's a change in defaults.

At first, people thought that Ġ marked the beginning of a word (the opposite of ## in BERT). So people complained about the "bug" where the Ġ was missing from the first token, and they made a fix. But it actually means "space", so not having it on the first token makes sense ... I don't want to get involved, so I'll just stick with the huggingface default.
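For readers unfamiliar with the marker, here is a minimal sketch of where Ġ comes from, assuming the byte-to-character table used by GPT-2-style byte-level BPE (which RoBERTa tokenizers reuse). It does not call transformers; `bytes_to_unicode` is reimplemented here for illustration, and `detokenize` is a hypothetical helper that only handles plain ASCII tokens.

```python
def bytes_to_unicode():
    """Map every byte value to a printable character, GPT-2 style."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # All remaining bytes (control characters, space, ...) are shifted
    # to code points starting at 256, in increasing byte order.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# The space byte (0x20) is the 33rd non-printable byte, so it lands on U+0120.
SPACE_MARKER = bytes_to_unicode()[ord(" ")]


def detokenize(tokens):
    """Hypothetical helper: join tokens and turn the marker back into spaces."""
    return "".join(tokens).replace(SPACE_MARKER, " ")


if __name__ == "__main__":
    print(repr(SPACE_MARKER))
    print(repr(detokenize(["It", SPACE_MARKER + "is", SPACE_MARKER + "fine"])))
    print(repr(detokenize([SPACE_MARKER + "It"])))
```

Under this reading, "ĠIt" at the start of the second sentence simply encodes a leading space before "It", which is consistent with the changed expectation in the test above.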

matt-gardner · 2020-04-04T00:31:48Z

setup.py

@@ -119,7 +119,7 @@
        "flaky",
        "responses>=0.7",
        "conllu==2.3.2",
-        "transformers>=2.4.0,<2.5.0",
+        "transformers>=2.6.0",


Are you sure you don't want to keep an upper bound?

Probably a good idea. Added in 3264dcc.
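To illustrate why the upper bound matters, here is a rough sketch of how a bounded requirement excludes future releases. The bound `2.7.0` below is purely illustrative (the value chosen in 3264dcc is not shown in this thread), and `satisfies` is a hypothetical helper that only understands plain `X.Y.Z` versions, not the full version specifier syntax pip supports.

```python
def parse(version):
    """Turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper=None):
    """Check lower <= version < upper; no upper bound if upper is None."""
    v = parse(version)
    if v < parse(lower):
        return False
    return upper is None or v < parse(upper)


if __name__ == "__main__":
    # Without an upper bound, any future release is accepted.
    print(satisfies("3.0.0", "2.6.0"))
    # With an (illustrative) upper bound, it is rejected.
    print(satisfies("3.0.0", "2.6.0", "2.7.0"))
```

With only `>=2.6.0`, a later release with breaking tokenizer changes would be installed automatically; an upper bound forces an explicit upgrade like this PR instead.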

New version of transformers

09ec635

dirkgr requested a review from matt-gardner April 4, 2020 00:24

dirkgr mentioned this pull request Apr 4, 2020

Update transformers requirement from <2.5.0,>=2.4.0 to >=2.4.0,<2.8.0 #4003

Closed

matt-gardner approved these changes Apr 4, 2020

dirkgr added 2 commits April 3, 2020 17:40

Specify an upper bound for the transformers

3264dcc

Merge remote-tracking branch 'origin/master' into TransformersUpdate2

7b2af5d

dirkgr merged commit 49c17a3 into master Apr 4, 2020

dirkgr deleted the TransformersUpdate2 branch April 4, 2020 01:23

MaksymDel mentioned this pull request Apr 4, 2020

Pretrained vocabulary from transformers is sometimes not saved in our Vocabulary object #3456

Closed

matt-gardner mentioned this pull request Apr 5, 2020

Can PretrainedTransformerTokenizer track character offset like WordTokenizer？ #3458

Closed
