
"RuntimeError: Input, output and indices must be on the current device" when trying to finetune MBart #9336

Closed
mespla opened this issue Dec 29, 2020 · 3 comments · Fixed by #9347

Comments

@mespla

mespla commented Dec 29, 2020

Environment info

  • Platform: Linux-4.15.0-123-generic-x86_64-with-glibc2.10
  • Transformers versions tried: 4.1.1 (installed with pip) and 4.2.2 (installed from the master branch of the repository)
  • Python version: 3.7
  • PyTorch version: 1.7
  • Tensorflow version: 2.4
  • Number of available GPUs: 2 (GeForce RTX 2080 Ti, with ~11GB of memory each)

Information

Model I am using (Bert, XLNet ...): MBart -> facebook/mbart-large-cc25

The problem arises when using: the official example scripts (details below)

The task I am working on is: my own task or dataset (details below)

I am fine-tuning MBart on my own dataset using the examples/seq2seq/finetune.sh script. When I run it on a single GPU, I get an out-of-memory error, as a single GPU does not have enough memory to hold the MBart model. When I try to distribute the model across two GPUs, I get a RuntimeError:
RuntimeError: Input, output and indices must be on the current device

To reproduce

I am running the script in the following way:
CUDA_VISIBLE_DEVICES=0,1 transformers/examples/seq2seq/finetune.sh --model_name_or_path "facebook/mbart-large-cc25" --output_dir output --data_dir data --overwrite_output_dir --model_parallel --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --freeze_encoder --freeze_embeds --tgt_lang "en"

I have also tried:
CUDA_VISIBLE_DEVICES=0,1 transformers/examples/seq2seq/finetune.sh --model_name_or_path "facebook/mbart-large-cc25" --output_dir output --data_dir data --overwrite_output_dir --model_parallel --tgt_lang "en"

I also tried limiting the length of source and target sentences with several values for --max_target_length and --max_source_length. In addition, I tried using more GPUs (up to 4).

If I run wc -l on my data directory, I get:

3004 data/test.source
3004 data/test.target
686623 data/train.source
686623 data/train.target
2999 data/val.source
2999 data/val.target
@patrickvonplaten
Contributor

Hey @mespla,

Thanks for your issue! I'm afraid that, at the moment, we're really unsure whether we want to keep supporting all the bash scripts in examples/seq2seq. In a couple of weeks we plan to have a single, concise training script for seq2seq models.

cc @sgugger

Also tagging @stas00, @patil-suraj in case you know a quick fix to this problem or have encountered this before as well.

@stas00
Contributor

stas00 commented Dec 29, 2020

When I run it on a single GPU, I get a memory error, as one GPU has not enough memory to load the MBart model. When I try to distribute the model on two GPUs, I get a RuntimeError:
RuntimeError: Input, output and indices must be on the current device

Are you implying you've changed modeling_bart.py to support Model Parallelism? That would certainly explain the error: you probably moved the layers to different devices but not the inputs/indices.
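(For anyone hitting this, here is a toy PyTorch sketch, not the actual finetune.sh code, of the kind of mismatch that raises this exact error, and the usual fix of moving inputs and activations to the device of the layer that consumes them:)

```python
import torch
import torch.nn as nn

# Toy sketch: a hand-rolled "model parallel" split where the embedding sits
# on cuda:0 and a later layer on cuda:1.
embed = nn.Embedding(1000, 16).to("cuda:0")
proj = nn.Linear(16, 2).to("cuda:1")

input_ids = torch.tensor([[1, 2, 3]])  # still on CPU

# embed(input_ids) raises
# "RuntimeError: Input, output and indices must be on the current device",
# because the indices were never moved to the embedding's device.

# Fix: move the inputs to the device of the first layer, and move the
# intermediate activations whenever they cross a device boundary.
hidden = embed(input_ids.to("cuda:0"))
logits = proj(hidden.to("cuda:1"))
```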

I'm currently studying the T5 MP support we already have and am about to do the same for Bart, i.e. add MP to Bart and its subclasses (so MBart is included).

If you mean something else by "I try to distribute the model on two GPUs", please clarify.

If you're just trying to use 2 GPUs because you can't fit even one batch onto a single GPU, then simply making 2 GPUs visible won't help. In fact, your command line takes even more memory, since it activates DataParallel, which is less memory-efficient than DistributedDataParallel. See the README.md in that folder for how to run DDP.
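For context, a minimal PyTorch sketch of the difference between the two (nothing in it is specific to finetune.sh); note that neither approach shrinks the per-GPU footprint of the model itself:

```python
import torch
import torch.nn as nn
import torch.distributed as dist

model = nn.Linear(1024, 1024)

# DataParallel: a single process drives all visible GPUs; the model is
# replicated at every forward pass and outputs/gradients are gathered on
# GPU 0, which costs extra memory there. This is what the command line above
# activates when two GPUs are visible.
dp_model = nn.DataParallel(model.cuda())

# DistributedDataParallel: one process per GPU (launched e.g. with
# `python -m torch.distributed.launch --nproc_per_node=2 your_script.py`);
# each process keeps a single replica and only gradients are all-reduced.
def wrap_ddp(local_rank: int) -> nn.Module:
    dist.init_process_group(backend="nccl")   # once per process
    torch.cuda.set_device(local_rank)
    return nn.parallel.DistributedDataParallel(
        model.cuda(local_rank), device_ids=[local_rank]
    )

# Either way, every GPU still has to hold a full copy of the model.
```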

But fear not, have a look at these two possible solutions for not being able to fit the model onto a single GPU:
#9311 (comment)
and another one will join soon once DeepSpeed has been integrated.

@stas00
Contributor

stas00 commented Dec 29, 2020

Oh, wait a sec, I only now noticed that you used --model_parallel. This flag currently works only for t5 and gpt2, the only two models that have been ported to support MP.

So the Trainer should assert if this flag is used and the architecture doesn't support MP.

PR #9347 adds this assert.

And hopefully Bart will support MP soon as well. Until then try my suggestions in the comment above.
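(As I understand it, the MP support in those two architectures is exposed through the experimental parallelize() API; a rough sketch with an illustrative device map, using T5 as the example:)

```python
from transformers import T5ForConditionalGeneration

# Rough sketch of the experimental MP API for the already-ported models.
# t5-small has 6 blocks; the split below is illustrative, not a recommendation.
model = T5ForConditionalGeneration.from_pretrained("t5-small")
device_map = {0: [0, 1, 2], 1: [3, 4, 5]}
model.parallelize(device_map)   # spread the blocks over cuda:0 and cuda:1

# ... train / generate with inputs placed on the first device (cuda:0) ...

model.deparallelize()           # move everything back to the CPU when done
```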
