Honor contributors to models (#11329)
* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
sgugger authored and Rocketknight1 committed Apr 21, 2021
1 parent 5c96445 commit 41788ba
Showing 57 changed files with 121 additions and 55 deletions.
3 changes: 2 additions & 1 deletion docs/source/model_doc/albert.rst
@@ -43,7 +43,8 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.

The original code can be found `here <https://github.com/google-research/ALBERT>`__.
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/ALBERT>`__.

AlbertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/bart.rst
@@ -35,7 +35,8 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.

The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
<https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.


Examples
5 changes: 3 additions & 2 deletions docs/source/model_doc/barthez.rst
@@ -16,7 +16,7 @@ BARThez
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model`
The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
<https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.

@@ -35,7 +35,8 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*

The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
This model was contributed by `moussakam <https://huggingface.co/moussakam>`__. The Authors' code can be found `here
<https://github.com/moussaKam/BARThez>`__.


Examples
3 changes: 2 additions & 1 deletion docs/source/model_doc/bert.rst
@@ -42,7 +42,8 @@ Tips:
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.

The original code can be found `here <https://github.com/google-research/bert>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/google-research/bert>`__.

BertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 2 additions & 0 deletions docs/source/model_doc/bert_japanese.rst
@@ -71,6 +71,8 @@ Tips:
- This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT
<bert>` for more usage examples.

This model was contributed by `cl-tohoku <https://huggingface.co/cl-tohoku>`__.

BertJapaneseTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

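An illustrative aside, not part of the diff above: loading the Japanese BERT checkpoint and its tokenizer might look like the following minimal sketch. The checkpoint name `cl-tohoku/bert-base-japanese` is an assumption here, and MeCab-based word segmentation needs the optional `fugashi`/`ipadic` dependencies installed.

>>> from transformers import AutoModel, BertJapaneseTokenizer
>>> # assumed checkpoint name; tokenization is MeCab word segmentation followed by WordPiece
>>> tokenizer = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
>>> model = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
>>> inputs = tokenizer("吾輩は猫である。", return_tensors="pt")
>>> outputs = model(**inputs)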
3 changes: 2 additions & 1 deletion docs/source/model_doc/bertgeneration.rst
@@ -79,7 +79,8 @@ Tips:
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.

The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.

BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 changes: 2 additions & 2 deletions docs/source/model_doc/bertweet.rst
@@ -54,8 +54,8 @@ Example of use:
>>> # from transformers import TFAutoModel
>>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
<https://github.com/VinAIResearch/BERTweet>`__.

BertweetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
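For context, and not part of the diff: the PyTorch counterpart of the commented-out TensorFlow lines shown in the hunk above would be a sketch along these lines, using the `vinai/bertweet-base` checkpoint referenced there.

>>> import torch
>>> from transformers import AutoModel, AutoTokenizer
>>> bertweet = AutoModel.from_pretrained("vinai/bertweet-base")
>>> tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
>>> line = "SC has first two presumptive cases of coronavirus, DHEC confirms"  # any example tweet
>>> input_ids = torch.tensor([tokenizer.encode(line)])
>>> with torch.no_grad():
...     features = bertweet(input_ids)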
3 changes: 2 additions & 1 deletion docs/source/model_doc/bigbird.rst
@@ -50,7 +50,8 @@ Tips:
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**

The original code can be found `here <https://github.com/google-research/bigbird>`__.
This model was contributed by `vasudevgupta <https://huggingface.co/vasudevgupta>`__. The original code can be found
`here <https://github.com/google-research/bigbird>`__.

BigBirdConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
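A hedged aside, not part of the diff: the ITC-only and `num_random_blocks` restrictions mentioned in the tips above surface through the model configuration. A minimal sketch, assuming `BigBirdConfig` exposes `attention_type`, `num_random_blocks` and `block_size` under these names:

>>> from transformers import BigBirdConfig, BigBirdModel
>>> # block-sparse (ITC-style) attention; num_random_blocks must stay > 0 per the tip above
>>> config = BigBirdConfig(attention_type="block_sparse", num_random_blocks=3, block_size=64)
>>> model = BigBirdModel(config)
>>> # full attention can be requested instead for short sequences
>>> full_config = BigBirdConfig(attention_type="original_full")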
3 changes: 2 additions & 1 deletion docs/source/model_doc/blenderbot.rst
@@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
<https://github.com/facebookresearch/ParlAI>`__ .


Implementation Notes
3 changes: 2 additions & 1 deletion docs/source/model_doc/blenderbot_small.rst
@@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
found `here <https://github.com/facebookresearch/ParlAI>`__ .

BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/bort.rst
@@ -43,4 +43,5 @@ Tips:
that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
algorithm to make BORT fine-tuning work.

The original code can be found `here <https://github.com/alexa/bort/>`__.
This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
<https://github.com/alexa/bort/>`__.
3 changes: 2 additions & 1 deletion docs/source/model_doc/camembert.rst
@@ -37,7 +37,8 @@ Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
as well as the information relative to the inputs and outputs.

The original code can be found `here <https://camembert-model.fr/>`__.
This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
<https://camembert-model.fr/>`__.

CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6 changes: 4 additions & 2 deletions docs/source/model_doc/convbert.rst
@@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
using less than 1/4 training cost. Code and pre-trained models will be released.*

ConvBERT training tips are similar to those of BERT. The original implementation can be found here:
https://github.com/yitu-opensource/ConvBert
ConvBERT training tips are similar to those of BERT.

This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
here: https://github.com/yitu-opensource/ConvBert

ConvBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/cpm.rst
@@ -33,7 +33,8 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc
cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
NLP tasks in the settings of few-shot (even zero-shot) learning.*

The original implementation can be found here: https://github.com/TsinghuaAI/CPM-Generate
This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
here: https://github.com/TsinghuaAI/CPM-Generate

Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.

3 changes: 2 additions & 1 deletion docs/source/model_doc/ctrl.rst
@@ -46,7 +46,8 @@ Tips:
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.

The original code can be found `here <https://github.com/salesforce/ctrl>`__.
This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
`here <https://github.com/salesforce/ctrl>`__.


CTRLConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/deberta.rst
@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*


The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.


DebertaConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/deberta_v2.rst
@@ -58,7 +58,8 @@ New in v2:
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
performance of downstream tasks.

The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.


DebertaV2Config
2 changes: 2 additions & 0 deletions docs/source/model_doc/deit.rst
@@ -73,6 +73,8 @@ Tips:
`facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
prepare images for the model.

This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.


DeiTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
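An illustrative aside, not part of the diff: preparing an image with DeiTFeatureExtractor, as the tip above recommends, might look like this sketch. The classification-head class and the local image path are assumptions.

>>> from PIL import Image
>>> from transformers import DeiTFeatureExtractor, DeiTForImageClassification
>>> feature_extractor = DeiTFeatureExtractor.from_pretrained("facebook/deit-base-patch16-384")
>>> model = DeiTForImageClassification.from_pretrained("facebook/deit-base-patch16-384")
>>> image = Image.open("cat.png")  # hypothetical local image
>>> inputs = feature_extractor(images=image, return_tensors="pt")
>>> logits = model(**inputs).logits
>>> predicted_class = logits.argmax(-1).item()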
2 changes: 1 addition & 1 deletion docs/source/model_doc/distilbert.rst
@@ -44,7 +44,7 @@ Tips:
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.

The original code can be found `here
This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.


3 changes: 2 additions & 1 deletion docs/source/model_doc/dpr.rst
@@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*

The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
<https://github.com/facebookresearch/DPR>`__.


DPRConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/electra.rst
@@ -54,7 +54,8 @@ Tips:
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).

The original code can be found `here <https://github.com/google-research/electra>`__.
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/electra>`__.


ElectraConfig
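A hedged aside, not part of the diff: in practice the tip above maps generator checkpoints to the masked-LM head and discriminator checkpoints to the pretraining head. A minimal sketch, assuming the `google/electra-small-*` checkpoint names:

>>> from transformers import ElectraForMaskedLM, ElectraForPreTraining
>>> # generator weights carry a masked-LM head
>>> generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
>>> # discriminator weights carry the replaced-token-detection head
>>> discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")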
3 changes: 2 additions & 1 deletion docs/source/model_doc/flaubert.rst
@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
community for further reproducible experiments in French NLP.*

The original code can be found `here <https://github.com/getalp/Flaubert>`__.
This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
<https://github.com/getalp/Flaubert>`__.


FlaubertConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/fsmt.rst
@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*

The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.

Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/funnel.rst
@@ -49,7 +49,8 @@ Tips:
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.

The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
<https://github.com/laiguokun/Funnel-Transformer>`__.


FunnelConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/gpt.rst
@@ -45,7 +45,8 @@ Tips:
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
showcasing the generative capabilities of several models. GPT is one of them.

The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/openai/finetune-transformer-lm>`__.

Note:

3 changes: 2 additions & 1 deletion docs/source/model_doc/gpt2.rst
@@ -45,7 +45,8 @@ Tips:
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.

The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://openai.com/blog/better-language-models/>`__.


GPT2Config
2 changes: 2 additions & 0 deletions docs/source/model_doc/gpt_neo.rst
@@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c
The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
256 tokens.

This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.

Generation
_______________________________________________________________________________________________________________________

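An illustrative aside, not part of the diff: generation with GPT Neo follows the usual causal-LM API, with the local-attention layers only looking back over a 256-token window as noted above. The `EleutherAI/gpt-neo-1.3B` checkpoint name and the sampling settings here are assumptions.

>>> from transformers import AutoTokenizer, GPTNeoForCausalLM
>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")  # assumed checkpoint name
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
>>> inputs = tokenizer("GPT Neo alternates global and local attention,", return_tensors="pt")
>>> output_ids = model.generate(**inputs, do_sample=True, max_length=50)
>>> print(tokenizer.decode(output_ids[0], skip_special_tokens=True))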
4 changes: 3 additions & 1 deletion docs/source/model_doc/herbert.rst
@@ -56,7 +56,9 @@ Examples of use:
>>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
The original code can be found `here <https://github.com/allegro/HerBERT>`__.
This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
`here <https://github.com/allegro/HerBERT>`__.


HerbertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/ibert.rst
@@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
<https://github.com/kssteven418/I-BERT>`__.

The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.

IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/layoutlm.rst
@@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
<https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
It includes an inference part, which shows how to use Google's Tesseract on a new document.

The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.


LayoutLMConfig
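A hedged aside, not part of the diff: the LayoutLM documentation quoted above obtains the image size with PIL in order to normalize word bounding boxes to the 0-1000 range the model expects. A sketch of that normalization; the helper name, file path and coordinates are hypothetical.

>>> from PIL import Image
>>> image = Image.open("document.png")  # hypothetical scanned page
>>> width, height = image.size
>>> def normalize_bbox(bbox, width, height):
...     # scale pixel coordinates (x0, y0, x1, y1) to the 0-1000 range
...     return [
...         int(1000 * bbox[0] / width),
...         int(1000 * bbox[1] / height),
...         int(1000 * bbox[2] / width),
...         int(1000 * bbox[3] / height),
...     ]
>>> normalize_bbox((82, 41, 170, 60), width, height)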
2 changes: 2 additions & 0 deletions docs/source/model_doc/led.rst
@@ -53,6 +53,8 @@ Tips:
- A notebook showing how to fine-tune LED, can be accessed `here
<https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.

This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.


LEDConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/source/model_doc/longformer.rst
@@ -40,7 +40,8 @@ Tips:
token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
:obj:`</s>`).

The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
<https://github.com/allenai/longformer>`__.

Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
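An illustrative aside, not part of the diff: because Longformer has no token type ids, two segments are joined with the separator token exactly as the tip above describes. A minimal sketch, with the `allenai/longformer-base-4096` checkpoint name assumed:

>>> from transformers import LongformerModel, LongformerTokenizer
>>> tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
>>> model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
>>> question = "Who contributed this model?"
>>> context = "The documentation credits beltagy as the contributor."
>>> # no token_type_ids: concatenate the segments around tokenizer.sep_token instead
>>> inputs = tokenizer(question + tokenizer.sep_token + context, return_tensors="pt")
>>> outputs = model(**inputs)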
3 changes: 2 additions & 1 deletion docs/source/model_doc/lxmert.rst
@@ -52,7 +52,8 @@ Tips:
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.

The original code can be found `here <https://github.com/airsplay/lxmert>`__.
This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
<https://github.com/airsplay/lxmert>`__.


LxmertConfig
2 changes: 2 additions & 0 deletions docs/source/model_doc/m2m_100.rst
@@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga
translating between non-English directions while performing competitively to the best single systems of WMT. We
open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*

This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.


Training and Generation
_______________________________________________________________________________________________________________________
1 change: 1 addition & 0 deletions docs/source/model_doc/marian.rst
@@ -37,6 +37,7 @@ Implementation Notes
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
:obj:`<s/>`),
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.

Naming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
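An illustrative aside, not part of the diff: a round of translation with a Marian checkpoint, where generation starts from pad_token_id as the implementation note above explains. The `Helsinki-NLP/opus-mt-en-de` checkpoint name is an assumption following the Helsinki-NLP naming scheme.

>>> from transformers import MarianMTModel, MarianTokenizer
>>> model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed checkpoint name
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> model = MarianMTModel.from_pretrained(model_name)
>>> batch = tokenizer(["The documentation now credits each model's contributor."], return_tensors="pt")
>>> generated_ids = model.generate(**batch)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)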
3 changes: 2 additions & 1 deletion docs/source/model_doc/mbart.rst
@@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
on the encoder, decoder, or reconstructing parts of the text.

The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__

Training of MBart
_______________________________________________________________________________________________________________________