"RuntimeError: Input, output and indices must be on the current device" when trying to finetune MBart #9336
Comments
Hey @mespla, thanks for your issue! I'm afraid that at the moment we're really unsure whether we want to keep supporting all the bash scripts in examples/seq2seq. cc @sgugger. Also tagging @stas00 and @patil-suraj in case you know a quick fix for this problem or have encountered it before.
Are you implying you've changed modeling_bart.py to support model parallelism? That would surely explain the error: you probably moved the layers to different devices but not the inputs/indices (a minimal sketch of this failure mode follows below). I'm currently studying the T5 MP implementation we already have and am about to do the same for Bart, i.e. add MP to Bart and its sub-classes (so MBart is included).

If you mean something else by "I try to distribute the model on two GPUs", please clarify what you mean.

If you're just trying to use 2 GPUs to work around not being able to fit even one batch onto a single GPU, then simply using 2 GPUs won't do any good. In fact, what you did (your command line) takes even more memory, since it activates DataParallel, which is less memory-efficient than DistributedDataParallel. See the README.md in that folder for how to run DDP.

But fear not, have a look at these 2 possible solutions for not being able to fit the model onto a single GPU:
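For context, here is a minimal, generic sketch (not from this thread, and not MBart's actual modules) of how this kind of device mismatch typically arises and how it is fixed in hand-rolled model parallelism; the exact error message depends on the PyTorch version:

```python
import torch
import torch.nn as nn

# Hypothetical split across devices: the embedding lives on cuda:1,
# but the input indices were never moved off cuda:0.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16).to("cuda:1")
input_ids = torch.tensor([[1, 2, 3]], device="cuda:0")

# embedding(input_ids)  # raises a device-mismatch RuntimeError similar to
#                       # "Input, output and indices must be on the current device"

# Fix: move the inputs to the device of the layer that consumes them.
out = embedding(input_ids.to("cuda:1"))
print(out.shape)  # torch.Size([1, 3, 16])
```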
Oh, wait a sec, I have only now noticed you used --model_parallel. The trainer should assert if this flag is used and the architecture doesn't support MP. PR #9347 adds this assert. And hopefully Bart will support MP soon as well. Until then, try my suggestions in the comment above.
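As a rough sketch of the kind of guard described above (the names check_model_parallel and is_parallelizable follow the convention used by the MP-capable transformers models, but the actual check added by the PR may differ in naming and placement):

```python
def check_model_parallel(model, model_parallel: bool) -> None:
    # Reject --model_parallel for architectures that do not implement MP.
    if model_parallel and not getattr(model, "is_parallelizable", False):
        raise ValueError(
            f"{model.__class__.__name__} does not implement model parallelism, "
            "so the --model_parallel flag cannot be used with it."
        )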
Environment info
transformers versions: 4.1.1 (installed with pip) and 4.2.2 (installed from the master branch of the repository)

Information
Model I am using (Bert, XLNet ...): MBart -> facebook/mbart-large-cc25
The problem arises when using: the official example scripts (details below)
The task I am working on is: my own task or dataset (details below)
I am fine-tuning MBart on my own dataset, using the examples/seq2seq/finetune.sh script. When I run it on a single GPU, I get a memory error, as a single GPU does not have enough memory to load the MBart model. When I try to distribute the model across two GPUs, I get:

RuntimeError: Input, output and indices must be on the current device
To reproduce
I am running the script in the following way:
CUDA_VISIBLE_DEVICES=0,1 transformers/examples/seq2seq/finetune.sh --model_name_or_path "facebook/mbart-large-cc25" --output_dir output --data_dir data --overwrite_output_dir --model_parallel --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --freeze_encoder --freeze_embeds --tgt_lang "en"
I have also tried:
CUDA_VISIBLE_DEVICES=0,1 transformers/examples/seq2seq/finetune.sh --model_name_or_path "facebook/mbart-large-cc25" --output_dir output --data_dir data --overwrite_output_dir --model_parallel --tgt_lang "en"
I also tried limiting the length of source and target sentences with several values for --max_target_length and --max_source_length. In addition, I tried using more GPUs (up to 4).

If I run wc -l on my data directory, I get: