Add Ernie-M Model to huggingface #21349
Conversation
Great work @susnato ! Looking forward to reviewing your PR :)
Hi @younesbelkada, the official paddlenlp implementation of ErnieM does not have any LM head class, since it was trained on neither Causal nor Masked LM. It was pretrained on both Cross-Attention Masked LM and Back-Translation Masked LM (both implementations are missing in paddlenlp). Do I need to add a MaskedLM head in this huggingface implementation since it's an encoder-based model, or should I bypass it and not include any LM head, like the paddlenlp implementation did?
Hi @susnato !
@younesbelkada Ok, then I will not add any LM head for now. The rest of the model is also ready (with all tests passing); I am currently looking into why the CircleCI tests are failing.
Thanks!
Hi @younesbelkada, I added that and made a bunch of other changes with make repo-consistency and make style, but when I run make fixup it still shows this error. The values I set are -
The documentation is not available anymore as the PR was closed or merged. |
Hi @younesbelkada, all checks are successful! The PR is ready for review, please take a look.
Thanks a lot for the great addition!
I left a couple of comments, mostly nits. My main points are to avoid hard-coded keyword arguments (`return_dict` is hardcoded in some places) and to add type hints wherever you can!
It is very nice that the model supports various training strategies; it would be good to add simple tests covering these!
Also, does `ErnieMSelfOutput` copy the structure from another module? If so, better to use a `# Copied from` statement.
Great efforts on the integration side, we should be really close to merging this!
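For reference, a minimal sketch of the `# Copied from` convention; the source class below (`BertSelfOutput`) is only a hypothetical example of what the statement could point to, and `make fix-copies` then keeps the body in sync with the referenced class:

```python
import torch.nn as nn


# Copied from transformers.models.bert.modeling_bert.BertSelfOutput with Bert->ErnieM
class ErnieMSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        # same structure as the referenced Bert module, only the class name changes
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states
```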
README_ja.md
Outdated
@@ -349,6 +349,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University から) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning から公開された研究論文: [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research から) Sascha Rothe, Shashi Narayan, Aliaksei Severyn から公開された研究論文: [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu から) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu から公開された研究論文: [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) | |||
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. |
It seems that the translation was not successful; can you maybe try to rebase with the `main` branch and run `make fix-copies`?
README_ko.md
Outdated
@@ -264,6 +264,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |||
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. |
same as above here
docs/source/en/model_doc/ernie_m.mdx
Outdated
@@ -0,0 +1,106 @@ | |||
<!--Copyright 2022 The HuggingFace Team. All rights reserved. |
<!--Copyright 2022 The HuggingFace Team. All rights reserved. | |
<!--Copyright 2023 The HuggingFace and Baidu Team. All rights reserved. |
Same comment for all headers that are applicable
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
# Copyright 2020 The HuggingFace Team. All rights reserved. | |
# Copyright 2023 The HuggingFace and Baidu Team. All rights reserved. |
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE | ||
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the |
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE | |
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
This is the configuration class to store the configuration of a [*ErnieMModel*]. It is used to instantiate a ERNIE-M | |
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the |
self.vocab = self.load_vocab(filepath=vocab_file)
self.reverse_vocab = dict((v, k) for k, v in self.vocab.items())

assert len(self.vocab) == len(self.reverse_vocab)
Please avoid using `assert`; test the condition and raise a relevant error instead ;)
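For illustration, a minimal sketch of the suggested change, reusing the attribute names from the quoted snippet above:

```python
self.vocab = self.load_vocab(filepath=vocab_file)
self.reverse_vocab = {v: k for k, v in self.vocab.items()}

# duplicate ids in the vocab would collapse in the reverse mapping, so the sizes must match
if len(self.vocab) != len(self.reverse_vocab):
    raise ValueError(
        f"Vocabulary and reverse vocabulary sizes differ ({len(self.vocab)} vs "
        f"{len(self.reverse_vocab)}); the vocabulary appears to contain duplicate ids."
    )
```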
input_ids (Tensor): | ||
See [`ErnieMModel`]. | ||
attention_mask (Tensor, optional): | ||
See [`ErnieMModel`]. | ||
position_ids (Tensor, optional): | ||
See [`ErnieMModel`]. |
Maybe add more description here and follow the `transformers` convention? You can check some examples in the other modeling files.
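As a hedged sketch only, the argument documentation could follow the usual transformers layout; the shapes and wording below are assumptions modeled on other modeling files:

```python
ERNIE_M_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`ErnieMTokenizer`].
        attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values are selected in `[0, 1]`:
            1 for tokens that are **not masked**, 0 for tokens that are **masked**.
        position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Indices of positions of each input sequence token in the position embeddings.
"""
```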
the pooled output and a softmax) e.g. for RocStories/SWAG tasks.""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice with Bert->ErnieM |
layers on top of the hidden-states output to compute `span start logits` and `span end logits`).""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering with Bert->ErnieM |
the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForTokenClassification with Bert->ErnieM
Hi @younesbelkada, I made all the changes that you requested.
Thanks a lot for addressing most of my comments! I left a few final comments, mostly nits for better readability and to keep the implementation close to other HF models (especially regarding how you deal with `return_dict` etc. --> let's remove them from the config).
Looking forward to merging this PR!
README_ko.md
Outdated
@@ -263,7 +263,8 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. |
This modification should not be here
README_zh-hans.md
Outdated
@@ -287,7 +287,8 @@ conda install -c huggingface transformers | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (来自 Snap Research) 伴随论文 [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) 由 Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren 发布。 | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。 | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。 | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 |
nit: same as above
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 |
README_zh-hant.md
Outdated
@@ -299,7 +299,8 @@ conda install -c huggingface transformers | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. |
docs/source/en/model_doc/ernie_m.mdx
Outdated
*Recent studies have demonstrated that pre- | ||
trained cross-lingual models achieve impres- | ||
sive performance in downstream cross-lingual | ||
tasks. This improvement benefits from learn- | ||
ing a large amount of monolingual and par- | ||
allel corpora. Although it is generally ac- | ||
knowledged that parallel corpora are critical | ||
for improving the model performance, ex- | ||
isting methods are often constrained by the | ||
size of parallel corpora, especially for low- | ||
resource languages. In this paper, we pro- | ||
pose ERNIE-M, a new training method that | ||
encourages the model to align the representa- | ||
tion of multiple languages with monolingual | ||
corpora, to overcome the constraint that the | ||
parallel corpus size places on the model per- | ||
formance. Our key insight is to integrate | ||
back-translation into the pre-training process. | ||
We generate pseudo-parallel sentence pairs on | ||
a monolingual corpus to enable the learning | ||
of semantic alignments between different lan- | ||
guages, thereby enhancing the semantic mod- | ||
eling of cross-lingual models. Experimental | ||
results show that ERNIE-M outperforms ex- | ||
isting cross-lingual models and delivers new | ||
state-of-the-art results in various cross-lingual | ||
downstream tasks.* |
*Recent studies have demonstrated that pre- | |
trained cross-lingual models achieve impres- | |
sive performance in downstream cross-lingual | |
tasks. This improvement benefits from learn- | |
ing a large amount of monolingual and par- | |
allel corpora. Although it is generally ac- | |
knowledged that parallel corpora are critical | |
for improving the model performance, ex- | |
isting methods are often constrained by the | |
size of parallel corpora, especially for low- | |
resource languages. In this paper, we pro- | |
pose ERNIE-M, a new training method that | |
encourages the model to align the representa- | |
tion of multiple languages with monolingual | |
corpora, to overcome the constraint that the | |
parallel corpus size places on the model per- | |
formance. Our key insight is to integrate | |
back-translation into the pre-training process. | |
We generate pseudo-parallel sentence pairs on | |
a monolingual corpus to enable the learning | |
of semantic alignments between different lan- | |
guages, thereby enhancing the semantic mod- | |
eling of cross-lingual models. Experimental | |
results show that ERNIE-M outperforms ex- | |
isting cross-lingual models and delivers new | |
state-of-the-art results in various cross-lingual | |
downstream tasks.* | |
*Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. | |
This improvement benefits from learning a large amount of monolingual and par- | |
allel corpora. Although it is generally acknowledged that parallel corpora are critical | |
for improving the model performance, ex- | |
isting methods are often constrained by the | |
size of parallel corpora, especially for low- | |
resource languages. In this paper, we pro- | |
pose ERNIE-M, a new training method that | |
encourages the model to align the representa- | |
tion of multiple languages with monolingual | |
corpora, to overcome the constraint that the | |
parallel corpus size places on the model per- | |
formance. Our key insight is to integrate | |
back-translation into the pre-training process. | |
We generate pseudo-parallel sentence pairs on | |
a monolingual corpus to enable the learning | |
of semantic alignments between different lan- | |
guages, thereby enhancing the semantic mod- | |
eling of cross-lingual models. Experimental | |
results show that ERNIE-M outperforms ex- | |
isting cross-lingual models and delivers new | |
state-of-the-art results in various cross-lingual | |
downstream tasks.* |
And so on for the rest of the paragraph; the additional dashes should not be there ;)
defaults will yield a similar configuration to that of the ERNIE ernie-3.0-medium-zh architecture. Configuration | ||
objects inherit from [*PretrainedConfig*] and can be used to control the model outputs. Read the documentation from |
To adapt to the correct checkpoint name! (i.e. replace `ERNIE ernie-3.0-medium-zh` with the expected one)
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions | ||
output_hidden_states = ( | ||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states | ||
) | ||
return_dict = return_dict if return_dict is not None else self.config.return_dict |
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions | |
output_hidden_states = ( | |
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states | |
) | |
return_dict = return_dict if return_dict is not None else self.config.return_dict |
I don't think these lines are needed; usually we just retrieve these values from the arguments of the forward pass.
@younesbelkada I am sorry, but I think we still need those lines, since there are some tests where the code sets `return_dict`, `output_hidden_states` and `output_attentions` in the config and then checks the results.
I checked, and that's true: without those lines I get an error (the test expects a dict as output, but since `return_dict` is only set in the config and not passed as an argument, it fails).
I see, thank you for explaining. I was confused because this was not implemented in our Ernie implementation, but I can confirm that it is implemented for other architectures such as BART:
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
I think this is fine, we can leave it as it is.
# Copied from transformers.models.bert.modeling_bert.BertSelfAttention with Bert->ErnieM,self.value->self.v_proj,self.key->self.k_proj,self.query->self.q_proj | ||
class ErnieMSelfAttention(nn.Module): | ||
def __init__(self, config, position_embedding_type=None): | ||
super().__init__() |
nit: maybe just move the entire class above, for example after the definition of ErnieMPooler
utils/check_repo.py
Outdated
"DetaEncoder", # Building part of bigger (tested) model. | ||
"DetaDecoder", # Building part of bigger (tested) model. |
"DetaEncoder", # Building part of bigger (tested) model. | |
"DetaDecoder", # Building part of bigger (tested) model. |
These are probably duplicates.
Hi @younesbelkada, I made all the changes as you requested. The tests are now all successful! Please check it.
Thanks a lot for iterating quickly! LGTM with only a few nits!
Leaving it now to @sgugger and/or @ArthurZucker for final approvals ;)
__all__ = ["ERNIE_M_PRETRAINED_INIT_CONFIGURATION", "ErnieMConfig", "ERNIE_M_PRETRAINED_RESOURCE_FILES_MAP"]

ERNIE_M_PRETRAINED_INIT_CONFIGURATION = {
Is this still needed?
Oh, we don't need them (we only need `ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP`); I will also remove `__all__`.
hidden_states = residual + self.dropout1(hidden_states) | ||
hidden_states = self.norm1(hidden_states) | ||
residual = hidden_states | ||
hidden_states = self.linear2(self.dropout(self.activation(self.linear1(hidden_states)))) |
nit: you can break this down into several lines
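For illustration, a sketch of how the nested call could be broken up, reusing the attribute names from the quoted snippet:

```python
# feed-forward block written step by step instead of one nested call
hidden_states = self.linear1(hidden_states)
hidden_states = self.activation(hidden_states)
hidden_states = self.dropout(hidden_states)
hidden_states = self.linear2(hidden_states)
```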
elif return_dict:
    sequence_output = encoder_outputs["last_hidden_state"]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
    attentions = None if not output_attentions else encoder_outputs["attentions"]

    return BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=sequence_output,
        pooler_output=pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )
nit: the `elif` condition is not needed
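A sketch of the suggested simplification, assuming the preceding `if not return_dict:` branch already returns; it simply restates the quoted code without the `elif`:

```python
# the tuple branch above has already returned, so no `elif` is needed here
sequence_output = encoder_outputs["last_hidden_state"]
pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
hidden_states = encoder_outputs["hidden_states"] if output_hidden_states else None
attentions = encoder_outputs["attentions"] if output_attentions else None

return BaseModelOutputWithPoolingAndCrossAttentions(
    last_hidden_state=sequence_output,
    pooler_output=pooler_output,
    hidden_states=hidden_states,
    attentions=attentions,
)
```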
Hi @younesbelkada, I made all those changes that you requested.
Thanks a lot @susnato ! Again, great work on the integration so far!
Thanks for your PR and for adding this model! Make sure to follow our documentation style guide for the docstrings, and there are a few things to fix with the tokenizer.
docs/source/en/model_doc/ernie_m.mdx
Outdated
Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
1. Ernie-M is BERT-like model so it is stacked Transformer Encoder. | |
1. Ernie-M is a BERT-like model so it is a stacked Transformer Encoder. |
docs/source/en/model_doc/ernie_m.mdx
Outdated
Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling`
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling` | |
2. Instead of using MaskedLM for pretraining (like BERT) the authors used two novel techniques: `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling` |
src/transformers/__init__.py
Outdated
@@ -947,6 +948,7 @@
_import_structure["modeling_utils"] = ["PreTrainedModel"]

# PyTorch models structure
|
src/transformers/__init__.py
Outdated
"ErnieMForTokenClassification", | ||
"ErnieMModel", | ||
"ErnieMPreTrainedModel", | ||
"ErnieMUIEM", |
If this ends up being public, it needs a better name. What does UIEM stand for?
UIEM stands for Universal Information Extraction Model (it was implemented in the original paddlenlp implementation of Ernie-M) here. Should I change it to the full name?
Maybe `ErnieMForInformationExtraction` then. It will be more understandable to a user than UIEM.
src/transformers/__init__.py
Outdated
# PyTorch model imports
Needs to stay here, please revert.
This was not addressed.
token_ids_0 (List[int]): | ||
List of IDs to which the special tokens will be added. | ||
token_ids_1 (List[int], optional): | ||
Optional second list of IDs for sequence pairs. Defaults to *None*. |
Same comments as above.
offset_mapping_ids_0 (List[tuple]): | ||
List of char offsets to which the special tokens will be added. | ||
offset_mapping_ids_1 (List[tuple], optional): | ||
Optional second list of wordpiece offsets for offset mapping pairs. Defaults to *None*. | ||
Returns: | ||
List[tuple]: List of wordpiece offsets with the appropriate offsets of special tokens. |
Same comments as above.
@@ -803,6 +807,7 @@ | |||
("distilbert", "DistilBertForMultipleChoice"), | |||
("electra", "ElectraForMultipleChoice"), | |||
("ernie", "ErnieForMultipleChoice"), | |||
("ernie_m", "ErnieMForMultipleChoice"), |
The tokenization auto file should also be updated with the new tokenizer.
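A hedged sketch of the kind of entry this would need in `src/transformers/models/auto/tokenization_auto.py`; the exact tuple layout should be checked against the neighbouring entries, and no fast tokenizer is assumed to exist:

```python
# inside TOKENIZER_MAPPING_NAMES: (slow tokenizer, fast tokenizer), guarded on sentencepiece
("ernie_m", ("ErnieMTokenizer" if is_sentencepiece_available() else None, None)),
```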
src/transformers/__init__.py
Outdated
@@ -260,6 +260,7 @@ | |||
"ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP", | |||
"ErnieConfig", | |||
], | |||
"models.ernie_m": ["ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP", "ErnieMConfig", "ErnieMTokenizer"], |
The tokenizer requires sentencepiece, so its import should be protected.
I am sorry, I understood your previous comment ("The tokenization auto file should also be updated with the new tokenizer.") but didn't get this one ("The tokenizer requires sentencepiece, so its import should be protected."); could you please elaborate?
The tokenizer should only be in the `if is_sentencepiece_available` part of the init.
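As a simplified sketch of that pattern (the real `src/transformers/__init__.py` also mirrors it in the `TYPE_CHECKING` section):

```python
try:
    if not is_sentencepiece_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    # fall back to dummy objects so importing the tokenizer raises a helpful error
    from .utils import dummy_sentencepiece_objects

    _import_structure["utils.dummy_sentencepiece_objects"] = [
        name for name in dir(dummy_sentencepiece_objects) if not name.startswith("_")
    ]
else:
    _import_structure["models.ernie_m"].append("ErnieMTokenizer")
```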
src/transformers/__init__.py
Outdated
@@ -3731,6 +3745,7 @@ | |||
from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer | |||
from .models.encoder_decoder import EncoderDecoderConfig | |||
from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig | |||
from .models.ernie_m import ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieMConfig, ErnieMTokenizer |
Same here.
Hi @sgugger, I made those changes as you requested and the tests pass too, please review them.
Hi @ArthurZucker, I pushed the changes, please check!
Okay! LGTM
Thanks again for all your work on this!
This reverts commit 0c9c847.
What does this PR do?
Ports Ernie-M from paddle to huggingface (PyTorch) and also Fixes #21123.
I have uploaded the PyTorch-converted weights here and here. The paddle2pytorch weight conversion script has been provided there too.
Work done till now:
- Ported the weights.
- Added configuration_ernie_m.py:
  from transformers import AutoConfig
  config = AutoConfig.from_pretrained("susnato/ernie-m-base_pytorch")
- Added tokenization_ernie_m.py (only the slow tokenizer is implemented):
  from transformers import ErnieMTokenizer
  tokenizer = ErnieMTokenizer.from_pretrained("susnato/ernie-m-base_pytorch")
- ErnieMModel is now working:
  from transformers import AutoModel
  model = AutoModel.from_pretrained("susnato/ernie-m-base_pytorch")  # or susnato/ernie-m-large_pytorch
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker and @younesbelkada