Switch return_dict to True by default. #8530

Merged: 18 commits, Nov 16, 2020
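In practice, the flip means model forward passes return `ModelOutput` objects by default, so the documentation and example changes below simply drop `return_dict=True`; code that still unpacks outputs positionally has to opt out explicitly. A minimal before/after sketch of the behaviour (the `bert-base-uncased` checkpoint is illustrative, not one of the files touched here):

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")

# Old default: a plain tuple, unpacked by position.
# last_hidden_state, pooler_output = model(**inputs)

# New default: a ModelOutput, addressed by attribute name.
outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
pooler_output = outputs.pooler_output
```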
2 changes: 1 addition & 1 deletion docs/source/model_doc/bertgeneration.rst
@@ -40,7 +40,7 @@ Usage:
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids

# train...
-loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels, return_dict=True).loss
+loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
loss.backward()


4 changes: 2 additions & 2 deletions docs/source/model_doc/t5.rst
@@ -64,7 +64,7 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
-loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
+loss = model(input_ids=input_ids, labels=labels).loss

- Supervised training

@@ -77,7 +77,7 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
-loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
+loss = model(input_ids=input_ids, labels=labels).loss
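The "automatically creates the correct decoder_input_ids" comment in both snippets refers to the label shift-right trick. A rough sketch of that logic under the usual teacher-forcing convention (a hypothetical helper, not code from this diff or from the library):

```python
import torch

def shift_right(labels: torch.Tensor, decoder_start_token_id: int, pad_token_id: int) -> torch.Tensor:
    # Prepend the decoder start token and drop the last label, so the
    # decoder predicts token t from tokens < t (teacher forcing).
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # Masked label positions (-100) are not valid input ids; use padding instead.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted
```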


T5Config
32 changes: 16 additions & 16 deletions docs/source/task_summary.rst
@@ -89,7 +89,7 @@ each other. The process is the following:
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=True)
+>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

>>> classes = ["not paraphrase", "is paraphrase"]

@@ -122,7 +122,7 @@ each other. The process is the following:
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
->>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=True)
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

>>> classes = ["not paraphrase", "is paraphrase"]

@@ -211,7 +211,7 @@ Here is an example of question answering using a model and a tokenizer. The process is the following:
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
->>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad", return_dict=True)
+>>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
@@ -253,7 +253,7 @@ Here is an example of question answering using a model and a tokenizer. The process is the following:
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
->>> model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad", return_dict=True)
+>>> model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
@@ -373,7 +373,7 @@ Here is an example of doing masked language modeling using a model and a tokenizer.
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
->>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased", return_dict=True)
+>>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased")

>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."

@@ -389,7 +389,7 @@ Here is an example of doing masked language modeling using a model and a tokenizer.
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
->>> model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased", return_dict=True)
+>>> model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased")

>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."

@@ -437,7 +437,7 @@ of tokens.
>>> from torch.nn import functional as F

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
->>> model = AutoModelWithLMHead.from_pretrained("gpt2", return_dict=True)
+>>> model = AutoModelWithLMHead.from_pretrained("gpt2")

>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "

@@ -461,7 +461,7 @@ of tokens.
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
->>> model = TFAutoModelWithLMHead.from_pretrained("gpt2", return_dict=True)
+>>> model = TFAutoModelWithLMHead.from_pretrained("gpt2")

>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "

@@ -520,7 +520,7 @@ Here is an example of text generation using ``XLNet`` and its tokenizer.
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer

->>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased", return_dict=True)
+>>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
@@ -545,7 +545,7 @@ Here is an example of text generation using ``XLNet`` and its tokenizer.
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer

->>> model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased", return_dict=True)
+>>> model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
@@ -664,7 +664,7 @@ Here is an example of doing named entity recognition, using a model and a tokenizer.
>>> from transformers import AutoModelForTokenClassification, AutoTokenizer
>>> import torch

->>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english", return_dict=True)
+>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

>>> label_list = [
@@ -692,7 +692,7 @@ Here is an example of doing named entity recognition, using a model and a tokenizer.
>>> from transformers import TFAutoModelForTokenClassification, AutoTokenizer
>>> import tensorflow as tf

->>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english", return_dict=True)
+>>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

>>> label_list = [
@@ -790,7 +790,7 @@ CNN / Daily Mail), it yields very good results.
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer

->>> model = AutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+>>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")

>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
@@ -799,7 +799,7 @@
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer

->>> model = TFAutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+>>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")

>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
@@ -843,15 +843,15 @@ Here is an example of doing translation using a model and a tokenizer. The process is the following:
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer

->>> model = AutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+>>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")

>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer

->>> model = TFAutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+>>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")

>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="tf")
2 changes: 1 addition & 1 deletion docs/source/training.rst
@@ -39,7 +39,7 @@ head on top of the encoder with an output size of 2. Models are initialized in `
.. code-block:: python

from transformers import BertForSequenceClassification
-model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
+model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.train()

This is useful because it allows us to make use of the pre-trained BERT encoder and easily train it on whatever sequence classification dataset we choose.
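With the new default, a fine-tuning step can read the loss straight off the returned output object. A minimal sketch (the toy batch and the AdamW optimizer are our additions, not part of this diff):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)  # a SequenceClassifierOutput under the new default
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```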
4 changes: 1 addition & 3 deletions examples/lxmert/demo.ipynb
@@ -210,7 +210,6 @@
" visual_feats=features,\n",
" visual_pos=normalized_boxes,\n",
" token_type_ids=inputs.token_type_ids,\n",
" return_dict=True,\n",
" output_attentions=False,\n",
" )\n",
" output_vqa = lxmert_vqa(\n",
@@ -219,7 +218,6 @@
" visual_feats=features,\n",
" visual_pos=normalized_boxes,\n",
" token_type_ids=inputs.token_type_ids,\n",
" return_dict=True,\n",
" output_attentions=False,\n",
" )\n",
" # get prediction\n",
@@ -266,4 +264,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
-}
+}
2 changes: 1 addition & 1 deletion examples/question-answering/run_squad.py
@@ -321,7 +321,7 @@ def evaluate(args, model, tokenizer, prefix=""):
eval_feature = features[feature_index.item()]
unique_id = int(eval_feature.unique_id)

-output = [to_list(output[i]) for output in outputs]
+output = [to_list(output[i]) for output in outputs.to_tuple()]

# Some models (XLNet, XLM) use 5 arguments for their predictions, while the other "simpler"
# models only use two.
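`ModelOutput.to_tuple()` is what keeps this positional indexing working now that forward returns an output object by default. A small sketch of the same pattern outside the script (the SQuAD checkpoint below is just an example):

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

inputs = tokenizer("Who wrote it?", "It was written by Jane.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Named access on the ModelOutput...
start_logits, end_logits = outputs.start_logits, outputs.end_logits
# ...or positional access after converting back to a plain tuple.
start_logits, end_logits = outputs.to_tuple()
```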
2 changes: 1 addition & 1 deletion examples/rag/eval_rag.py
@@ -95,7 +95,7 @@ def strip_title(title):
truncation=True,
)["input_ids"].to(args.device)

-question_enc_outputs = rag_model.rag.question_encoder(retriever_input_ids, return_dict=True)
+question_enc_outputs = rag_model.rag.question_encoder(retriever_input_ids)
question_enc_pool_output = question_enc_outputs.pooler_output

result = rag_model.retriever(
1 change: 0 additions & 1 deletion examples/rag/finetune.py
@@ -204,7 +204,6 @@ def _step(self, batch: dict) -> Tuple:
decoder_input_ids=decoder_input_ids,
use_cache=False,
labels=lm_labels,
-return_dict=True,
**rag_kwargs,
)

2 changes: 1 addition & 1 deletion examples/rag/use_own_knowledge_dataset.py
@@ -47,7 +47,7 @@ def embed(documents: dict, ctx_encoder: DPRContextEncoder, ctx_tokenizer: DPRCon
input_ids = ctx_tokenizer(
documents["title"], documents["text"], truncation=True, padding="longest", return_tensors="pt"
)["input_ids"]
-embeddings = ctx_encoder(input_ids.to(device=device), return_dict=True).pooler_output
+embeddings = ctx_encoder(input_ids.to(device=device)).pooler_output
return {"embeddings": embeddings.detach().cpu().numpy()}


3 changes: 0 additions & 3 deletions examples/seq2seq/distillation.py
@@ -153,7 +153,6 @@ def _step(self, batch: dict) -> tuple:
output_hidden_states=self.do_calc_hidden_loss,
output_attentions=False,
use_cache=False,
-return_dict=True,
)
lm_logits = student_outputs.logits

@@ -179,7 +178,6 @@ def zero_tensor():
input_ids,
attention_mask=src_mask,
output_hidden_states=self.do_calc_hidden_loss,
-return_dict=True,
)
if self.different_base_models:
teacher_enc_outputs = all_teacher_encoder_outputs.last_hidden_state
@@ -199,7 +197,6 @@
decoder_input_ids=decoder_input_ids,
output_hidden_states=self.do_calc_hidden_loss,
use_cache=False, # since we are not passing labels, never let this default to True
-return_dict=True,
)
dec_mask = decoder_input_ids.ne(pad_token_id)
loss_ce = self.calc_ce_loss(dec_mask, lm_logits, teacher_outputs.logits)
2 changes: 1 addition & 1 deletion examples/seq2seq/test_seq2seq_examples.py
@@ -185,7 +185,7 @@ def test_distill_checkpointing_with_teacher(self):

@require_torch_non_multi_gpu_but_fix_me
def test_loss_fn(self):
-model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY, return_dict=True)
+model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY)
input_ids, mask = model.dummy_inputs["input_ids"], model.dummy_inputs["attention_mask"]
target_ids = torch.tensor([[0, 4, 8, 2], [0, 8, 2, 1]], dtype=torch.long, device=model.device)
decoder_input_ids = target_ids[:, :-1].contiguous() # Why this line?
2 changes: 1 addition & 1 deletion model_cards/microsoft/prophetnet-large-uncased/README.md
@@ -23,7 +23,7 @@ target_str = "us rejects charges against its ambassador in bolivia"
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
labels = tokenizer(target_str, return_tensors="pt").input_ids

-loss = model(input_ids, labels=labels, return_dict=True).loss
+loss = model(input_ids, labels=labels).loss
```

### Citation
@@ -26,7 +26,7 @@ target_str = "us rejects charges against its ambassador in bolivia"
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
labels = tokenizer(target_str, return_tensors="pt").input_ids

-loss = model(input_ids, labels=labels, return_dict=True).loss
+loss = model(input_ids, labels=labels).loss
```

Note that since this model is a multi-lingual model it can be fine-tuned on all kinds of other languages.
@@ -45,7 +45,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np
tokenizer = AutoTokenizer.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code')
-model = AutoModelForSequenceClassification.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code', return_dict=True)
+model = AutoModelForSequenceClassification.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code')

inputs = tokenizer("your code here", return_tensors="pt", truncation=True, padding='max_length')
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
2 changes: 1 addition & 1 deletion model_cards/sentence-transformers/LaBSE/README.md
@@ -13,7 +13,7 @@ sentences = ["Hello World", "Hallo Welt"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')

with torch.no_grad():
-    model_output = model(**encoded_input, return_dict=True)
+    model_output = model(**encoded_input)

embeddings = model_output.pooler_output
embeddings = torch.nn.functional.normalize(embeddings)
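A small usage note (our addition, not from the card): after L2 normalization, the dot product of two embeddings is their cosine similarity, so the snippet above can be extended directly:

```python
# Cosine similarity between "Hello World" and "Hallo Welt",
# using the normalized embeddings computed above.
similarity = (embeddings[0] * embeddings[1]).sum()
print(similarity.item())
```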
2 changes: 1 addition & 1 deletion scripts/fsmt/fsmt-make-super-tiny-model.py
@@ -59,7 +59,7 @@

# Test
batch = tokenizer.prepare_seq2seq_batch(["Making tiny model"])
-outputs = tiny_model(**batch, return_dict=True)
+outputs = tiny_model(**batch)

print("test output:", len(outputs.logits[0]))

2 changes: 1 addition & 1 deletion scripts/fsmt/fsmt-make-tiny-model.py
@@ -30,7 +30,7 @@

# Test
batch = tokenizer.prepare_seq2seq_batch(["Making tiny model"])
-outputs = tiny_model(**batch, return_dict=True)
+outputs = tiny_model(**batch)

print("test output:", len(outputs.logits[0]))

4 changes: 2 additions & 2 deletions src/transformers/configuration_utils.py
@@ -55,7 +55,7 @@ class PretrainedConfig(object):
Whether or not the model should return all attentions.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return the last key/values attentions (not used by all models).
-return_dict (:obj:`bool`, `optional`, defaults to :obj:`False`):
+return_dict (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return a :class:`~transformers.file_utils.ModelOutput` instead of a plain
tuple.
is_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
@@ -163,7 +163,7 @@ class PretrainedConfig(object):

def __init__(self, **kwargs):
# Attributes with defaults
-self.return_dict = kwargs.pop("return_dict", False)
+self.return_dict = kwargs.pop("return_dict", True)
self.output_hidden_states = kwargs.pop("output_hidden_states", False)
self.output_attentions = kwargs.pop("output_attentions", False)
self.use_cache = kwargs.pop("use_cache", True) # Not used by all models
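Since this default flip is the PR's one breaking change, here is a sketch of how downstream code can opt back into tuples once it lands (both spellings use the existing public API; the checkpoint is illustrative):

```python
from transformers import BertConfig, BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("hello", return_tensors="pt")

# Per-call opt-out: override the default in forward.
model = BertModel.from_pretrained("bert-base-uncased")
tuple_outputs = model(**inputs, return_dict=False)  # plain tuple again

# Per-model opt-out: bake the old behaviour into the config.
config = BertConfig.from_pretrained("bert-base-uncased", return_dict=False)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
tuple_outputs = model(**inputs)
```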