Add Ernie-M Model to huggingface #21349

Merged: 30 commits, Feb 15, 2023

Conversation

@susnato (Contributor) commented Jan 28, 2023

What does this PR do?

Ports Ernie-M from paddle to huggingface (pytorch) and also fixes #21123.
I have uploaded the converted pytorch weights here and here. The paddle2pytorch weight-conversion script is provided there as well.

Work done so far:

  1. Ported the weights.

  2. Added configuration_ernie_m.py
    from transformers import AutoConfig
    config = AutoConfig.from_pretrained("susnato/ernie-m-base_pytorch")

  3. Added tokenization_ernie_m.py (Only Slow Tokenizer implemented)
    from transformers import ErnieMTokenizer
    tokenizer = ErnieMTokenizer.from_pretrained("susnato/ernie-m-base_pytorch")

  4. ErnieMModel is now working.
    from transformers import AutoModel
    model = AutoModel.from_pretrained("susnato/ernie-m-base_pytorch")  # or "susnato/ernie-m-large_pytorch"
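Putting the pieces above together, a minimal end-to-end sanity check could look like the sketch below (the output attribute name assumes the standard BaseModelOutput-style return used by other Hugging Face encoders; this is illustrative, not a snippet verified against these exact checkpoints):

    import torch
    from transformers import AutoModel, ErnieMTokenizer

    tokenizer = ErnieMTokenizer.from_pretrained("susnato/ernie-m-base_pytorch")
    model = AutoModel.from_pretrained("susnato/ernie-m-base_pytorch")

    inputs = tokenizer("ERNIE-M is a multilingual encoder.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

    # Expected shape: (batch_size, sequence_length, hidden_size)
    print(outputs.last_hidden_state.shape)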

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker and @younesbelkada

@younesbelkada (Contributor)

Great work @susnato ! Looking forward to reviewing your PR :)
Let us know when you think the PR is ready

@susnato (Contributor, Author) commented Feb 2, 2023

Hi @younesbelkada, the official paddlenlp implementation of ErnieM does not have any LM head class, since it was trained neither on Causal LM nor on Masked LM. It was pretrained with Cross-Attention Masked LM and Back-Translation Masked LM (both implementations are missing from paddlenlp). Since it's an encoder-based model, do I need to add a MaskedLM head in this huggingface implementation, or should I skip it and not include any LM head, as the paddlenlp implementation did?

@younesbelkada (Contributor)

Hi @susnato!
Thanks for your message.
I think this really depends on the use case of your model. I'd expect most users to rely on ErnieMModel, since that is the model present in paddlepaddle. If there is interest in adding such heads in the future, we can always open follow-up PRs.

@susnato (Contributor, Author) commented Feb 3, 2023

@younesbelkada OK, then I will not add any LM head for now. The rest of the model is also ready (with all tests passing); I am currently looking into why the CircleCI tests are failing.

@younesbelkada (Contributor)

Thanks!
Currently some tests are not passing because you need to define an ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP inside configuration_ernie_m.py; check here how it is done for BERT: https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/configuration_bert.py

@susnato (Contributor, Author) commented Feb 3, 2023

Hi @younesbelkada, I added that and made a bunch of other changes with make repo-consistency and make style, but when I run make fixup it still gives this error:
    python utils/check_config_docstrings.py
    Traceback (most recent call last):
      File "/home/susnato/temp_files/transformers/utils/check_config_docstrings.py", line 89, in <module>
        check_config_docstrings_have_checkpoints()
      File "/home/susnato/temp_files/transformers/utils/check_config_docstrings.py", line 85, in check_config_docstrings_have_checkpoints
        raise ValueError(f"The following configurations don't contain any valid checkpoint:\n{message}")
    ValueError: The following configurations don't contain any valid checkpoint: ErnieMConfig

The values I set are:

    ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP = {
        "ernie-m-base_pytorch": "https://huggingface.co/susnato/ernie-m-base_pytorch/blob/main/config.json",
        "ernie-m-large_pytorch": "https://huggingface.co/susnato/ernie-m-large_pytorch/blob/main/config.json",
    }
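For comparison, the BERT configuration file linked above lays the map out with the full hub repo ids as keys and resolve/main URLs for the config files. A sketch adapted to the checkpoints mentioned in this thread (whether the blob/main URLs or the key names are what the check_config_docstrings script actually objects to is an assumption here, not something confirmed in the thread):

    ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP = {
        "susnato/ernie-m-base_pytorch": "https://huggingface.co/susnato/ernie-m-base_pytorch/resolve/main/config.json",
        "susnato/ernie-m-large_pytorch": "https://huggingface.co/susnato/ernie-m-large_pytorch/resolve/main/config.json",
    }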

@HuggingFaceDocBuilderDev commented Feb 3, 2023

The documentation is not available anymore as the PR was closed or merged.

@susnato (Contributor, Author) commented Feb 3, 2023

Hi @younesbelkada, all checks are passing! The PR is ready for review; please take a look.

@susnato susnato changed the title [WIP] Add Ernie-M Model to huggingface Add Ernie-M Model to huggingface Feb 3, 2023
@susnato susnato marked this pull request as ready for review February 3, 2023 15:00
@younesbelkada (Contributor) left a comment

Thanks a lot for the great addition!
I left a couple of comments, mostly nits. My main points are to avoid hard-coded keyword arguments (return_dict is hardcoded in some places) and to add type hints wherever you can.
It is very nice that the model supports various training strategies; better to add simple tests covering these!
Also, does ErnieMSelfOutput copy its structure from another module? If so, better to use a # Copied from statement.
Great effort on the integration side, we should be really close to merging this!
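On the type-hint point, here is a sketch of the signature style used in other Hugging Face encoder models (the exact ErnieM argument list may differ; this is illustrative only):

    from typing import Optional, Tuple, Union

    import torch

    from transformers.modeling_outputs import BaseModelOutputWithPoolingAndCrossAttentions


    def forward(  # sketch of a method on ErnieMModel, shown standalone here
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
        ...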

README_ja.md Outdated
@@ -349,6 +349,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University から) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning から公開された研究論文: [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555)
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research から) Sascha Rothe, Shashi Narayan, Aliaksei Severyn から公開された研究論文: [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461)
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu から) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu から公開された研究論文: [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223)
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
Contributor:

It seems that the translation was not applied successfully; can you maybe try to rebase onto the main branch and run make fix-copies?

README_ko.md Outdated
@@ -264,6 +264,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다.
1. **[ErnieM](https://huggingface.co/docs/transformers/main/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
Contributor:

same as above here

@@ -0,0 +1,106 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Contributor:

Suggested change
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace and Baidu Team. All rights reserved.

Same comment for all headers that are applicable

# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
Contributor:

Suggested change
# Copyright 2020 The HuggingFace Team. All rights reserved.
# Copyright 2023 The HuggingFace and Baidu Team. All rights reserved.

Comment on lines 71 to 34
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
Contributor:

Suggested change
This is the configuration class to store the configuration of a [*ErnieModel*]. It is used to instantiate a ERNIE
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
This is the configuration class to store the configuration of a [*ErnieMModel*]. It is used to instantiate a ERNIE-M
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the

self.vocab = self.load_vocab(filepath=vocab_file)
self.reverse_vocab = dict((v, k) for k, v in self.vocab.items())

assert len(self.vocab) == len(self.reverse_vocab)
Contributor:

Please avoid using assert; test the condition and raise a relevant error instead ;)
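For example, a minimal sketch of the suggested pattern applied to the condition above (the error message wording is illustrative):

    if len(self.vocab) != len(self.reverse_vocab):
        raise ValueError(
            f"Vocab and reverse vocab must have the same length, but got {len(self.vocab)} and {len(self.reverse_vocab)}."
        )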

Comment on lines 827 to 832
input_ids (Tensor):
See [`ErnieMModel`].
attention_mask (Tensor, optional):
See [`ErnieMModel`].
position_ids (Tensor, optional):
See [`ErnieMModel`].
Contributor:

Maybe add more description here and follow the transformers convention? You can check some examples in the other modeling files.
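As a rough sketch of the docstring convention used in other modeling files (the shapes and wording below are the generic boilerplate found elsewhere in the library, not ErnieM-specific text):

    input_ids (`torch.Tensor` of shape `(batch_size, sequence_length)`):
        Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`ErnieMTokenizer`].
    attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Mask to avoid performing attention on padding token indices. Mask values are selected in `[0, 1]`:
        1 for tokens that are not masked, 0 for tokens that are masked.
    position_ids (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Indices of positions of each input sequence token in the position embeddings.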

the pooled output and a softmax) e.g. for RocStories/SWAG tasks.""",
ERNIE_M_START_DOCSTRING,
)
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
Contributor:

Suggested change
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
# Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice with Bert->ErnieM

layers on top of the hidden-states output to compute `span start logits` and `span end logits`).""",
ERNIE_M_START_DOCSTRING,
)
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
Contributor:

Suggested change
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
# Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering with Bert->ErnieM

the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.""",
ERNIE_M_START_DOCSTRING,
)
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
Contributor:

Suggested change
# Copied from https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py
# Copied from transformers.models.bert.modeling_bert.BertForTokenClassification with Bert->ErnieM

@susnato (Contributor, Author) commented Feb 4, 2023

Hi @younesbelkada, I made all the changes you requested.
Let me know if any other changes are needed or if I have missed anything!

@younesbelkada (Contributor) left a comment

Thanks a lot for addressing most of my comments! I left a few final comments, mostly nits for better readability and to keep the implementation close enough to other HF models (especially regarding how you deal with return_dict etc.; let's remove them from the config).
Looking forward to merging this PR!

README_ko.md Outdated
@@ -263,7 +263,8 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University 에서) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 의 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 논문과 함께 발표했습니다.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research 에서) Sascha Rothe, Shashi Narayan, Aliaksei Severyn 의 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 논문과 함께 발표했습니다.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다.
Contributor:

Suggested change
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu 에서) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 의 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 논문과 함께 발표했습니다.

This modification should not be here

@@ -287,7 +287,8 @@ conda install -c huggingface transformers
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (来自 Snap Research) 伴随论文 [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) 由 Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren 发布。
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
Contributor:

nit: same as above

Suggested change
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。

@@ -299,7 +299,8 @@ conda install -c huggingface transformers
1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
Contributor:

Suggested change
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.

Comment on lines 23 to 49
*Recent studies have demonstrated that pre-
trained cross-lingual models achieve impres-
sive performance in downstream cross-lingual
tasks. This improvement benefits from learn-
ing a large amount of monolingual and par-
allel corpora. Although it is generally ac-
knowledged that parallel corpora are critical
for improving the model performance, ex-
isting methods are often constrained by the
size of parallel corpora, especially for low-
resource languages. In this paper, we pro-
pose ERNIE-M, a new training method that
encourages the model to align the representa-
tion of multiple languages with monolingual
corpora, to overcome the constraint that the
parallel corpus size places on the model per-
formance. Our key insight is to integrate
back-translation into the pre-training process.
We generate pseudo-parallel sentence pairs on
a monolingual corpus to enable the learning
of semantic alignments between different lan-
guages, thereby enhancing the semantic mod-
eling of cross-lingual models. Experimental
results show that ERNIE-M outperforms ex-
isting cross-lingual models and delivers new
state-of-the-art results in various cross-lingual
downstream tasks.*
Contributor:

Suggested change
*Recent studies have demonstrated that pre-
trained cross-lingual models achieve impres-
sive performance in downstream cross-lingual
tasks. This improvement benefits from learn-
ing a large amount of monolingual and par-
allel corpora. Although it is generally ac-
knowledged that parallel corpora are critical
for improving the model performance, ex-
isting methods are often constrained by the
size of parallel corpora, especially for low-
resource languages. In this paper, we pro-
pose ERNIE-M, a new training method that
encourages the model to align the representa-
tion of multiple languages with monolingual
corpora, to overcome the constraint that the
parallel corpus size places on the model per-
formance. Our key insight is to integrate
back-translation into the pre-training process.
We generate pseudo-parallel sentence pairs on
a monolingual corpus to enable the learning
of semantic alignments between different lan-
guages, thereby enhancing the semantic mod-
eling of cross-lingual models. Experimental
results show that ERNIE-M outperforms ex-
isting cross-lingual models and delivers new
state-of-the-art results in various cross-lingual
downstream tasks.*
*Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks.
This improvement benefits from learning a large amount of monolingual and par-
allel corpora. Although it is generally acknowledged that parallel corpora are critical
for improving the model performance, ex-
isting methods are often constrained by the
size of parallel corpora, especially for low-
resource languages. In this paper, we pro-
pose ERNIE-M, a new training method that
encourages the model to align the representa-
tion of multiple languages with monolingual
corpora, to overcome the constraint that the
parallel corpus size places on the model per-
formance. Our key insight is to integrate
back-translation into the pre-training process.
We generate pseudo-parallel sentence pairs on
a monolingual corpus to enable the learning
of semantic alignments between different lan-
guages, thereby enhancing the semantic mod-
eling of cross-lingual models. Experimental
results show that ERNIE-M outperforms ex-
isting cross-lingual models and delivers new
state-of-the-art results in various cross-lingual
downstream tasks.*

And so on for the rest; the additional dashes should not be there ;)

Comment on lines 73 to 74
defaults will yield a similar configuration to that of the ERNIE ernie-3.0-medium-zh architecture. Configuration
objects inherit from [*PretrainedConfig*] and can be used to control the model outputs. Read the documentation from
Contributor:

To adapt to the correct checkpoint name (i.e. replace ERNIE ernie-3.0-medium-zh with the expected one).

Comment on lines +406 to +552
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.return_dict
Contributor:

Suggested change
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.return_dict

I don't think these lines are needed; usually we just retrieve these values from the arguments of the forward pass.

Contributor Author:

@younesbelkada I am sorry, but I think we still need those lines, since some tests set return_dict, output_hidden_states and output_attentions in the config and then check the results.

I checked, and that is indeed the case: without those lines I get an error (the test expects a dict as output, but since return_dict is only set in the config rather than passed as an argument, it fails).

Contributor:

I see, thank you for explaining. I was confused because this is not implemented in our Ernie implementation, but I can confirm that it is implemented for other architectures such as BART:

output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions

I think this is fine; we can leave it as it is.

Comment on lines +927 to +930
# Copied from transformers.models.bert.modeling_bert.BertSelfAttention with Bert->ErnieM,self.value->self.v_proj,self.key->self.k_proj,self.query->self.q_proj
class ErnieMSelfAttention(nn.Module):
def __init__(self, config, position_embedding_type=None):
super().__init__()
Contributor:

nit: maybe just move the entire class above, for example after the definition of ErnieMPooler

Comment on lines 59 to 61
"DetaEncoder", # Building part of bigger (tested) model.
"DetaDecoder", # Building part of bigger (tested) model.
Contributor:

Suggested change
"DetaEncoder", # Building part of bigger (tested) model.
"DetaDecoder", # Building part of bigger (tested) model.

These are probably duplicates.

@susnato susnato force-pushed the erniem branch 3 times, most recently from d318860 to c3f9391 Compare February 6, 2023 16:06
@susnato (Contributor, Author) commented Feb 6, 2023

Hi @younesbelkada, I made all the changes you requested. The tests are now all passing! Please take a look.

@younesbelkada (Contributor) left a comment

Thanks a lot for iterating quickly! LGTM with only a few nits!
Leaving it now to @sgugger and/or @ArthurZucker for final approvals ;)

__all__ = ["ERNIE_M_PRETRAINED_INIT_CONFIGURATION", "ErnieMConfig", "ERNIE_M_PRETRAINED_RESOURCE_FILES_MAP"]


ERNIE_M_PRETRAINED_INIT_CONFIGURATION = {
Contributor:

Is this still needed?

Contributor Author:

Oh, we don't need them (we only need ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP); I will also remove __all__.

hidden_states = residual + self.dropout1(hidden_states)
hidden_states = self.norm1(hidden_states)
residual = hidden_states
hidden_states = self.linear2(self.dropout(self.activation(self.linear1(hidden_states))))
Contributor:

nit: you can break this down into several lines
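For example, the nested call quoted above could be split into individual steps (a purely mechanical rewrite, no behavioral change intended):

    hidden_states = self.linear1(hidden_states)
    hidden_states = self.activation(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.linear2(hidden_states)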


Comment on lines 586 to 597
elif return_dict:
sequence_output = encoder_outputs["last_hidden_state"]
pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
attentions = None if not output_attentions else encoder_outputs["attentions"]

return BaseModelOutputWithPoolingAndCrossAttentions(
last_hidden_state=sequence_output,
pooler_output=pooler_output,
hidden_states=hidden_states,
attentions=attentions,
)
Contributor:

nit: the elif condition is not needed
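A sketch of what the nit suggests, assuming the earlier `if not return_dict:` branch (not shown in this excerpt) already returns a tuple, so the remaining code can simply run unconditionally:

    # the `if not return_dict:` branch above is assumed to have returned already
    sequence_output = encoder_outputs["last_hidden_state"]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
    attentions = None if not output_attentions else encoder_outputs["attentions"]

    return BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=sequence_output,
        pooler_output=pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )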

@susnato (Contributor, Author) commented Feb 6, 2023

Hi @younesbelkada, I made all the changes you requested.

@younesbelkada younesbelkada requested a review from sgugger February 6, 2023 20:36
@younesbelkada (Contributor)

Thanks a lot @susnato! Again, great work on the integration so far!
Please wait for the next reviewers to add their reviews, and we should be good to merge the PR ;)

@sgugger (Collaborator) left a comment

Thanks for your PR and for adding this model! Make sure to follow our documentation style guide for the docstrings, and there are a few things to fix with the tokenizer.


Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
Collaborator:

Suggested change
1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
1. Ernie-M is a BERT-like model so it is a stacked Transformer Encoder.

Tips:

1. Ernie-M is BERT-like model so it is stacked Transformer Encoder.
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling`
Collaborator:

Suggested change
2. Instead of using MaskedLM for pretraining(like BERT) the authors used two novel techniques such as `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling`
2. Instead of using MaskedLM for pretraining (like BERT) the authors used two novel techniques: `Cross-attention Masked Language Modeling` and `Back-translation Masked Language Modeling`

@@ -947,6 +948,7 @@
_import_structure["modeling_utils"] = ["PreTrainedModel"]

# PyTorch models structure

Collaborator:

Suggested change

"ErnieMForTokenClassification",
"ErnieMModel",
"ErnieMPreTrainedModel",
"ErnieMUIEM",
Collaborator:

If this ends up being public, it needs a better name. What does UIEM stand for?

@susnato (Contributor, Author), Feb 7, 2023:

UIEM stands for Universal Information Extraction Model (it was implemented in the original paddlenlp implementation of Ernie-M) here

Should I change it to the full name?

Collaborator:

Maybe ErnieMForInformationExtraction then. It will be more understandable to a user than UIEM.

Comment on lines 4329 to 4350

# PyTorch model imports
Collaborator:

Needs to stay here, please revert.

Collaborator:

This was not addressed.

Comment on lines 226 to 229
token_ids_0 (List[int]):
List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional):
Optional second list of IDs for sequence pairs. Defaults to *None*.
Collaborator:

Same comments as above.

Comment on lines 247 to 252
offset_mapping_ids_0 (List[tuple]):
List of char offsets to which the special tokens will be added.
offset_mapping_ids_1 (List[tuple], optional):
Optional second list of wordpiece offsets for offset mapping pairs. Defaults to *None*.
Returns:
List[tuple]: List of wordpiece offsets with the appropriate offsets of special tokens.
Collaborator:

Same comments as above.

@@ -803,6 +807,7 @@
("distilbert", "DistilBertForMultipleChoice"),
("electra", "ElectraForMultipleChoice"),
("ernie", "ErnieForMultipleChoice"),
("ernie_m", "ErnieMForMultipleChoice"),
Collaborator:

The tokenization auto file should also be updated with the new tokenizer.
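A sketch of the kind of entry meant here, modeled on how other sentencepiece-only slow tokenizers are registered in models/auto/tokenization_auto.py (the exact tuple layout in TOKENIZER_MAPPING_NAMES should be double-checked against that file; the variable name below is only for illustration):

    from transformers.utils import is_sentencepiece_available

    # Entry to add to TOKENIZER_MAPPING_NAMES (slow tokenizer only, no fast tokenizer in this PR):
    ernie_m_entry = ("ernie_m", ("ErnieMTokenizer" if is_sentencepiece_available() else None, None))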

@@ -260,6 +260,7 @@
"ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP",
"ErnieConfig",
],
"models.ernie_m": ["ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP", "ErnieMConfig", "ErnieMTokenizer"],
Collaborator:

The tokenizer requires sentencepiece, so its import should be protected.

Contributor Author:

I am sorry, I understood your previous comment ("The tokenization auto file should also be updated with the new tokenizer.") but didn't get this one ("The tokenizer requires sentencepiece, so its import should be protected."). Could you please elaborate?

Collaborator:

The tokenizer should only be in the if is_sentencepiece_available part of the init.
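Concretely, this refers to the guarded-import pattern used in transformers' top-level __init__.py; a sketch under the assumption that the helper names below are the ones already imported in that file:

    # inside src/transformers/__init__.py, in the import-structure section
    try:
        if not is_sentencepiece_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        _import_structure["models.ernie_m"].append("ErnieMTokenizer")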

@@ -3731,6 +3745,7 @@
from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer
from .models.encoder_decoder import EncoderDecoderConfig
from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig
from .models.ernie_m import ERNIE_M_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieMConfig, ErnieMTokenizer
Collaborator:

Same here.

@susnato susnato force-pushed the erniem branch 4 times, most recently from 1121d45 to 5b8e8b6 Compare February 7, 2023 18:00
@susnato (Contributor, Author) commented Feb 7, 2023

Hi @sgugger, I made the changes you requested and the tests are passing too; please review them.

@susnato susnato requested a review from sgugger February 8, 2023 05:51
@susnato (Contributor, Author) commented Feb 14, 2023

Hi @ArthurZucker, I pushed the changes, please check!

@ArthurZucker (Collaborator)

Okay! LGTM
@sgugger feel free to merge if you think this is ok! 😉

@sgugger (Collaborator) left a comment

Thanks again for all your work on this!

@sgugger sgugger merged commit 0c9c847 into huggingface:main Feb 15, 2023
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Feb 17, 2023
Successfully merging this pull request may close these issues: Ernie-M

5 participants