Add Ernie-M Model to huggingface #21349
Conversation
Great work @susnato ! Looking forward to reviewing your PR :)
Hi @younesbelkada, the official paddlenlp implementation of ErnieM does not have any LM head class, since it was trained on neither Causal nor Masked LM. It was pretrained on both Cross-Attention Masked LM and Back-Translation Masked LM (both implementations are missing in paddlenlp). Do I need to add a MaskedLM head in this huggingface implementation since it's an encoder-based model, or should I bypass it and not include any LM head, like the paddlenlp implementation did?
Hi @susnato !
@younesbelkada Ok, then I will not add any LM head for now. The rest of the model is also ready (with all tests passing); I am currently looking into why the CircleCI tests are failing.
Thanks!
Hi @younesbelkada, I added that and made a bunch of other changes with make repo-consistency and make style, but when I run make fixup it still shows this error. The values I set are -
The documentation is not available anymore as the PR was closed or merged. |
Hi @younesbelkada, all checks are successful! The PR is ready for review, please take a look.
Thanks a lot for the great addition!
I left a couple of comments, mostly nits. My main points are to avoid hard-coded keyword arguments (`return_dict` is hardcoded in some places) and to add type hints wherever you can!
It is very nice that the model supports various training strategies; it would be good to add simple tests covering these!
Also, does `ErnieMSelfOutput` copy the structure from another module? If so, better to use a `# Copied from` statement.
Great efforts on the integration side, we should be really close to merging this!
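For reference, a minimal sketch of the `# Copied from` convention; the source class below (`BertSelfOutput`) is only a hypothetical example of what the statement could point to, and `make fix-copies` then keeps the body in sync with the referenced class:

```python
import torch.nn as nn


# Copied from transformers.models.bert.modeling_bert.BertSelfOutput with Bert->ErnieM
class ErnieMSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        # same structure as the referenced Bert module, only the class name changes
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states
```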
README_ja.md
Outdated
@@ -349,6 +349,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University から) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning から公開された研究論文: [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research から) Sascha Rothe, Shashi Narayan, Aliaksei Severyn から公開された研究論文: [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu から) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu から公開された研究論文: [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) | |||
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. |
It seems that the translation was not successful; can you maybe try to rebase with the `main` branch and run `make fix-copies`?
README_ko.md
Outdated
@@ -264,6 +264,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |||
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. |
same as above here
docs/source/en/model_doc/ernie_m.mdx
Outdated
@@ -0,0 +1,106 @@ | |||
<!--Copyright 2022 The HuggingFace Team. All rights reserved. |
<!--Copyright 2022 The HuggingFace Team. All rights reserved. | |
<!--Copyright 2023 The HuggingFace and Baidu Team. All rights reserved. |
Same comment for all headers that are applicable
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
# Copyright 2020 The HuggingFace Team. All rights reserved. | |
# Copyright 2023 The HuggingFace and Baidu Team. All rights reserved. |
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE | ||
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the |
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE | |
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
This is the configuration class to store the configuration of a [*ErnieMModel*]. It is used to instantiate a ERNIE-M | |
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the |
self.vocab = self.load_vocab(filepath=vocab_file)
self.reverse_vocab = dict((v, k) for k, v in self.vocab.items())

assert len(self.vocab) == len(self.reverse_vocab)
Please avoid using `assert`; test the condition and raise a relevant error instead ;)
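For illustration, a minimal sketch of the suggested change, reusing the attribute names from the quoted snippet above:

```python
self.vocab = self.load_vocab(filepath=vocab_file)
self.reverse_vocab = {v: k for k, v in self.vocab.items()}

# duplicate ids in the vocab would collapse in the reverse mapping, so the sizes must match
if len(self.vocab) != len(self.reverse_vocab):
    raise ValueError(
        f"Vocabulary and reverse vocabulary sizes differ ({len(self.vocab)} vs "
        f"{len(self.reverse_vocab)}); the vocabulary appears to contain duplicate ids."
    )
```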
input_ids (Tensor): | ||
See [`ErnieMModel`]. | ||
attention_mask (Tensor, optional): | ||
See [`ErnieMModel`]. | ||
position_ids (Tensor, optional): | ||
See [`ErnieMModel`]. |
Maybe add more description here and follow the `transformers` convention? You can check some examples in the other modeling files.
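As a hedged sketch only, the argument documentation could follow the usual transformers layout; the shapes and wording below are assumptions modeled on other modeling files:

```python
ERNIE_M_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`ErnieMTokenizer`].
        attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values are selected in `[0, 1]`:
            1 for tokens that are **not masked**, 0 for tokens that are **masked**.
        position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Indices of positions of each input sequence token in the position embeddings.
"""
```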
the pooled output and a softmax) e.g. for RocStories/SWAG tasks.""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice with Bert->ErnieM |
layers on top of the hidden-states output to compute `span start logits` and `span end logits`).""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering with Bert->ErnieM |
the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.""", | ||
ERNIE_M_START_DOCSTRING, | ||
) | ||
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py |
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py | |
# Copied from transformers.models.bert.modeling_bert.BertForTokenClassification with Bert->ErnieM
Hi @younesbelkada, I made all the changes that you requested.
Thanks a lot for addressing most of my comments! I left a few final comments, mostly nits for better readability and to keep the implementation close to other HF models (especially regarding how you deal with `return_dict` etc. --> let's remove them from the config).
Looking forward to merging this PR!
README_ko.md
Outdated
@@ -263,7 +263,8 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다. |
This modification should not be here
README_zh-hans.md
Outdated
@@ -287,7 +287,8 @@ conda install -c huggingface transformers | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (来自 Snap Research) 伴随论文 [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) 由 Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren 发布。 | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。 | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。 | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 |
nit: same as above
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。 |
README_zh-hant.md
Outdated
@@ -299,7 +299,8 @@ conda install -c huggingface transformers | |||
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. | |||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. | |||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. | |||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. | |
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. |
docs/source/en/model_doc/ernie_m.mdx
Outdated
*Recent studies have demonstrated that pre- | ||
trained cross-lingual models achieve impres- | ||
sive performance in downstream cross-lingual | ||
tasks. This improvement benefits from learn- | ||
ing a large amount of monolingual and par- | ||
allel corpora. Although it is generally ac- | ||
knowledged that parallel corpora are critical | ||
for improving the model performance, ex- | ||
isting methods are often constrained by the | ||
size of parallel corpora, especially for low- | ||
resource languages. In this paper, we pro- | ||
pose ERNIE-M, a new training method that | ||
encourages the model to align the representa- | ||
tion of multiple languages with monolingual | ||
corpora, to overcome the constraint that the | ||
parallel corpus size places on the model per- | ||
formance. Our key insight is to integrate | ||
back-translation into the pre-training process. | ||
We generate pseudo-parallel sentence pairs on | ||
a monolingual corpus to enable the learning | ||
of semantic alignments between different lan- | ||
guages, thereby enhancing the semantic mod- | ||
eling of cross-lingual models. Experimental | ||
results show that ERNIE-M outperforms ex- | ||
isting cross-lingual models and delivers new | ||
state-of-the-art results in various cross-lingual | ||
downstream tasks.* |
*Recent studies have demonstrated that pre- | |
trained cross-lingual models achieve impres- | |
sive performance in downstream cross-lingual | |
tasks. This improvement benefits from learn- | |
ing a large amount of monolingual and par- | |
allel corpora. Although it is generally ac- | |
knowledged that parallel corpora are critical | |
for improving the model performance, ex- | |
isting methods are often constrained by the | |
size of parallel corpora, especially for low- | |
resource languages. In this paper, we pro- | |
pose ERNIE-M, a new training method that | |
encourages the model to align the representa- | |
tion of multiple languages with monolingual | |
corpora, to overcome the constraint that the | |
parallel corpus size places on the model per- | |
formance. Our key insight is to integrate | |
back-translation into the pre-training process. | |
We generate pseudo-parallel sentence pairs on | |
a monolingual corpus to enable the learning | |
of semantic alignments between different lan- | |
guages, thereby enhancing the semantic mod- | |
eling of cross-lingual models. Experimental | |
results show that ERNIE-M outperforms ex- | |
isting cross-lingual models and delivers new | |
state-of-the-art results in various cross-lingual | |
downstream tasks.* | |
*Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. | |
This improvement benefits from learning a large amount of monolingual and par- | |
allel corpora. Although it is generally acknowledged that parallel corpora are critical | |
for improving the model performance, ex- | |
isting methods are often constrained by the | |
size of parallel corpora, especially for low- | |
resource languages. In this paper, we pro- | |
pose ERNIE-M, a new training method that | |
encourages the model to align the representa- | |
tion of multiple languages with monolingual | |
corpora, to overcome the constraint that the | |
parallel corpus size places on the model per- | |
formance. Our key insight is to integrate | |
back-translation into the pre-training process. | |
We generate pseudo-parallel sentence pairs on | |
a monolingual corpus to enable the learning | |
of semantic alignments between different lan- | |
guages, thereby enhancing the semantic mod- | |
eling of cross-lingual models. Experimental | |
results show that ERNIE-M outperforms ex- | |
isting cross-lingual models and delivers new | |
state-of-the-art results in various cross-lingual | |
downstream tasks.* |
And so on for the rest of the paragraph; the additional dashes should not be there ;)
defaults will yield a similar configuration to that of the ERNIE ernie-3.0-medium-zh architecture. Configuration | ||
objects inherit from [*PretrainedConfig*] and can be used to control the model outputs. Read the documentation from |
To adapt to the correct checkpoint name! (i.e. replace `ERNIE ernie-3.0-medium-zh` with the expected one)
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions | ||
output_hidden_states = ( | ||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states | ||
) | ||
return_dict = return_dict if return_dict is not None else self.config.return_dict |
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions | |
output_hidden_states = ( | |
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states | |
) | |
return_dict = return_dict if return_dict is not None else self.config.return_dict |
I don't think these lines are needed; usually we just retrieve these values from the arguments of the forward pass.
@younesbelkada I am sorry, but I think we still need those lines, since there are some tests where the code sets `return_dict`, `output_hidden_states` and `output_attentions` in the config and then checks the results.
I checked, and that's true: without those lines I get an error (the test expects a dict as output, but since `return_dict` is only set in the config and not passed as an argument, it fails).
I see, thank you for explaining. I was confused because this was not implemented in our Ernie implementation, but I can confirm that it is implemented for other architectures such as BART:
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
I think this is fine, we can leave it as it is.
# Copied from transformers.models.bert.modeling_bert.BertSelfAttention with Bert->ErnieM,self.value->self.v_proj,self.key->self.k_proj,self.query->self.q_proj | ||
class ErnieMSelfAttention(nn.Module): | ||
def __init__(self, config, position_embedding_type=None): | ||
super().__init__() |
nit: maybe just move the entire class above, for example after the definition of ErnieMPooler
utils/check_repo.py
Outdated
"DetaEncoder", # Building part of bigger (tested) model. | ||
"DetaDecoder", # Building part of bigger (tested) model. |
"DetaEncoder", # Building part of bigger (tested) model. | |
"DetaDecoder", # Building part of bigger (tested) model. |
These are probably duplicates.
Hi @younesbelkada, I made all the changes as you requested. The tests are now all successful! Please check it.
Thanks a lot for iterating quickly! LGTM with only a few nits!
Leaving it now to @sgugger and/or @ArthurZucker for final approvals ;)
__all__ = ["ERNIE_M_PRETRAINED_INIT_CONFIGURATION", "ErnieMConfig", "ERNIE_M_PRETRAINED_RESOURCE_FILES_MAP"]

ERNIE_M_PRETRAINED_INIT_CONFIGURATION = {
Is this still needed?
Oh, we don't need them (we only need `ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP`); I will also remove `__all__`.
hidden_states = residual + self.dropout1(hidden_states) | ||
hidden_states = self.norm1(hidden_states) | ||
residual = hidden_states | ||
hidden_states = self.linear2(self.dropout(self.activation(self.linear1(hidden_states)))) |
nit: you can break this down into several lines
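For illustration, a sketch of how the nested call could be broken up, reusing the attribute names from the quoted snippet:

```python
# feed-forward block written step by step instead of one nested call
hidden_states = self.linear1(hidden_states)
hidden_states = self.activation(hidden_states)
hidden_states = self.dropout(hidden_states)
hidden_states = self.linear2(hidden_states)
```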
elif return_dict:
    sequence_output = encoder_outputs["last_hidden_state"]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
    attentions = None if not output_attentions else encoder_outputs["attentions"]

    return BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=sequence_output,
        pooler_output=pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )
nit: the `elif` condition is not needed
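A sketch of the suggested simplification, assuming the preceding `if not return_dict:` branch already returns; it simply restates the quoted code without the `elif`:

```python
# the tuple branch above has already returned, so no `elif` is needed here
sequence_output = encoder_outputs["last_hidden_state"]
pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
hidden_states = encoder_outputs["hidden_states"] if output_hidden_states else None
attentions = encoder_outputs["attentions"] if output_attentions else None

return BaseModelOutputWithPoolingAndCrossAttentions(
    last_hidden_state=sequence_output,
    pooler_output=pooler_output,
    hidden_states=hidden_states,
    attentions=attentions,
)
```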
Hi @younesbelkada, I made all those changes that you requested.
Thanks a lot @susnato ! Again, great work on the integration so far!
Thanks for your PR and for adding this model! Make sure to follow our documentation style guide for the docstrings, and there are a few things to fix with the tokenizer.
docs/source/en/model_doc/ernie_m.mdx
Outdated
Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
1. Ernie-M is BERT-like model so it is stacked Transformer Encoder. | |
1. Ernie-M is a BERT-like model so it is a stacked Transformer Encoder. |
docs/source/en/model_doc/ernie_m.mdx
Outdated
Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling`
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling` | |
2. Instead of using MaskedLM for pretraining (like BERT) the authors used two novel techniques: `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling` |
src/transformers/__init__.py
Outdated
@@ -947,6 +948,7 @@
_import_structure["modeling_utils"] = ["PreTrainedModel"]

# PyTorch models structure
|
src/transformers/__init__.py
Outdated
"ErnieMForTokenClassification", | ||
"ErnieMModel", | ||
"ErnieMPreTrainedModel", | ||
"ErnieMUIEM", |
If this ends up being public, it needs a better name. What does UIEM stand for?
UIEM stands for Universal Information Extraction Model (it was implemented in the original paddlenlp implementation of Ernie-M) here. Should I change it to the full name?
Maybe `ErnieMForInformationExtraction` then. It will be more understandable to a user than UIEM.
src/transformers/__init__.py
Outdated
# PyTorch model imports
Needs to stay here, please revert.
This was not addressed.
token_ids_0 (List[int]): | ||
List of IDs to which the special tokens will be added. | ||
token_ids_1 (List[int], optional): | ||
Optional second list of IDs for sequence pairs. Defaults to *None*. |
Same comments as above.
offset_mapping_ids_0 (List[tuple]): | ||
List of char offsets to which the special tokens will be added. | ||
offset_mapping_ids_1 (List[tuple], optional): | ||
Optional second list of wordpiece offsets for offset mapping pairs. Defaults to *None*. | ||
Returns: | ||
List[tuple]: List of wordpiece offsets with the appropriate offsets of special tokens. |
Same comments as above.
@@ -803,6 +807,7 @@ | |||
("distilbert", "DistilBertForMultipleChoice"), | |||
("electra", "ElectraForMultipleChoice"), | |||
("ernie", "ErnieForMultipleChoice"), | |||
("ernie_m", "ErnieMForMultipleChoice"), |
The tokenization auto file should also be updated with the new tokenizer.
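A hedged sketch of the kind of entry this would need in `src/transformers/models/auto/tokenization_auto.py`; the exact tuple layout should be checked against the neighbouring entries, and no fast tokenizer is assumed to exist:

```python
# inside TOKENIZER_MAPPING_NAMES: (slow tokenizer, fast tokenizer), guarded on sentencepiece
("ernie_m", ("ErnieMTokenizer" if is_sentencepiece_available() else None, None)),
```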
src/transformers/__init__.py
Outdated
@@ -260,6 +260,7 @@ | |||
"ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP", | |||
"ErnieConfig", | |||
], | |||
"models.ernie_m": ["ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP", "ErnieMConfig", "ErnieMTokenizer"], |
The tokenizer requires sentencepiece, so its import should be protected.
I am sorry, I understood your previous comment ("The tokenization auto file should also be updated with the new tokenizer.") but didn't get this one ("The tokenizer requires sentencepiece, so its import should be protected."); could you please elaborate?
The tokenizer should only be in the `if is_sentencepiece_available` part of the init.
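As a simplified sketch of that pattern (the real `src/transformers/__init__.py` also mirrors it in the `TYPE_CHECKING` section):

```python
try:
    if not is_sentencepiece_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    # fall back to dummy objects so importing the tokenizer raises a helpful error
    from .utils import dummy_sentencepiece_objects

    _import_structure["utils.dummy_sentencepiece_objects"] = [
        name for name in dir(dummy_sentencepiece_objects) if not name.startswith("_")
    ]
else:
    _import_structure["models.ernie_m"].append("ErnieMTokenizer")
```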
src/transformers/__init__.py
Outdated
@@ -3731,6 +3745,7 @@ | |||
from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer | |||
from .models.encoder_decoder import EncoderDecoderConfig | |||
from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig | |||
from .models.ernie_m import ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieMConfig, ErnieMTokenizer |
Same here.
Hi @sgugger, I made those changes as you requested and the tests pass too, please review them.
Hi @ArthurZucker, I pushed the changes, please check!
Okay! LGTM
Thanks again for all your work on this!
This reverts commit 0c9c847.
What does this PR do?
Ports Ernie-M from paddle to huggingface (PyTorch) and also Fixes #21123.
I have uploaded the PyTorch-converted weights here and here. The paddle2pytorch weight conversion script has been provided there too.
Work done till now:
- Ported the weights.
- Added configuration_ernie_m.py:
  from transformers import AutoConfig
  config = AutoConfig.from_pretrained("susnato/ernie-m-base_pytorch")
- Added tokenization_ernie_m.py (only the slow tokenizer is implemented):
  from transformers import ErnieMTokenizer
  tokenizer = ErnieMTokenizer.from_pretrained("susnato/ernie-m-base_pytorch")
- ErnieMModel is now working:
  from transformers import AutoModel
  model = AutoModel.from_pretrained("susnato/ernie-m-base_pytorch")  # or susnato/ernie-m-large_pytorch
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker and @younesbelkada