GH-873: PyTorch-Transformers update #941
Conversation
The following Transformer-based architectures are now supported via pytorch-transformers:

- BertEmbeddings (updated API)
- OpenAIGPTEmbeddings (updated API, various fixes)
- OpenAIGPT2Embeddings (new)
- TransformerXLEmbeddings (updated API, tokenization fixes)
- XLNetEmbeddings (new)
- XLMEmbeddings (new)
- RoBERTaEmbeddings (new, via the torch.hub module)

It is also possible to use a scalar mix of specified layers from the Transformer-based models, as proposed by Liu et al. (2019). The scalar mix implementation is copied and slightly modified from the allennlp repo (Apache 2.0 license).
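For illustration, here is a minimal usage sketch for one of the new embedding classes. The `layers` and `use_scalar_mix` parameters are assumptions about the constructor signature; the rest follows the standard Flair embedding API:

```python
from flair.data import Sentence
from flair.embeddings import XLNetEmbeddings

# Instantiate the new XLNet-based embeddings. layers and use_scalar_mix
# are assumed parameter names for selecting layers and enabling the
# Liu et al. (2019) scalar mix described above.
embeddings = XLNetEmbeddings(layers="1,2,3,4", use_scalar_mix=True)

# Embedding happens in place: every token of the sentence gets its
# contextual XLNet representation attached.
sentence = Sentence("Berlin is the capital of Germany .")
embeddings.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)
```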
Force-pushed from 7202075 to 3bb7f00
try:
    self.model = torch.hub.load("pytorch/fairseq", model)
except:
    log_line(log)
log_line needs to be imported, otherwise this fails.
from flair.training_utils import log_line
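For reference, a sketch of the block with the missing import in place; the logger setup and the surrounding class are assumed context (in the PR this code lives in the RoBERTaEmbeddings constructor):

```python
import logging

import torch
from torch import nn

from flair.training_utils import log_line

log = logging.getLogger("flair")


class RoBERTaEmbeddings(nn.Module):
    # Simplified sketch of the constructor under review.
    def __init__(self, model: str = "roberta.base"):
        super().__init__()
        try:
            # Load the RoBERTa model via the fairseq torch.hub wrapper.
            self.model = torch.hub.load("pytorch/fairseq", model)
        except:
            # log_line writes a horizontal separator to the given logger,
            # marking the failure in the log output.
            log_line(log)
```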
Fixed :) I think I have to change my PyCharm default theme...
# Always use the dict.copy() method to avoid modifying the original state.
state = self.__dict__.copy()
# Remove the unpicklable entries.
state["model"] = None
I think this line does nothing, since "model" is part of "_modules". However, the saved model is still huge, which is strange because in __setstate__ the RoBERTa model is re-loaded from torch.hub.
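To make the `_modules` point concrete, here is a sketch of the pickling pattern under discussion; the `_modules` handling is an assumption about why `state["model"] = None` alone leaves the saved file large:

```python
import torch
from torch import nn


class RoBERTaEmbeddings(nn.Module):
    # Simplified stand-in for the class in this PR.

    def __getstate__(self):
        # Copy the instance dict so the live object is not modified.
        state = self.__dict__.copy()
        # For nn.Module subclasses the submodules live in self._modules,
        # not as a plain "model" entry in __dict__, so this line alone
        # does not remove the weights from the pickle:
        state["model"] = None
        # Assumed fix: drop the submodule from _modules instead.
        state["_modules"] = dict(state["_modules"])
        state["_modules"]["model"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Re-load the RoBERTa weights from torch.hub after unpickling,
        # as the PR's __setstate__ does.
        self.model = torch.hub.load("pytorch/fairseq", "roberta.base")
```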
@alanakbik I compared the model sizes for BERT and RoBERTa:

BERT (base): 429 MB
RoBERTa (base): 487 MB

I will check the state["model"] now.
However, there's an upcoming PR in the PyTorch-Transformers repo that adds RoBERTa 🔥 So in the near future it won't be necessary to use the torch.hub wrapper here :)
Ah ok, in this case don't spend too much time on this. It works already, so no need to fix something that will be fixed upstream :)
Great, I'll just leave it as it is now and will update the RoBERTaEmbeddings implementation whenever it is available in pytorch-transformers!
…eeded when RoBERTa embeddings are used)
👍

👍
Awesome - thank you @stefan-it!!

Yes I love this!!!!!

You are great! @stefan-it Thank you for your generosity!!!
Hi,

this PR updates the old pytorch-pretrained-BERT library to the latest version of pytorch-transformers to support various new Transformer-based architectures for embeddings. A total of 7 (new/updated) embeddings can be used in Flair now (see the list above).
Detailed benchmarks on the downsampled CoNLL-2003 NER dataset for English can be found in #873. This PR is the first working attempt to include various new Transformer-based embeddings.
Unit tests can be executed with pytest --runslow tests. The unit tests for the Transformer embeddings take ~4 minutes on a GPU.