auto_scale_batch_size fails with datamodule in pl==1.2.*, succeeds in pl==1.1.8 #6335

Closed
colllin opened this issue Mar 3, 2021 · 2 comments
Labels
bug · help wanted · priority: 1 (medium) · tuner

Comments

colllin commented Mar 3, 2021

🐛 Bug

Running `trainer = pl.Trainer(auto_scale_batch_size=True); trainer.tune(model, datamodule=dm)` succeeds in pl==1.1.8, but fails in pl==1.2.* (tested both 1.2.0 and 1.2.1) with the following error:

```
Traceback (most recent call last):
  File "train.py", line 42, in <module>
    trainer.tune(model, datamodule=dm)
  File "/home/ubuntu/.local/share/virtualenvs/__project__/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in tune
    self.tuner.tune(model, train_dataloader, val_dataloaders, datamodule)
  File "/home/ubuntu/.local/share/virtualenvs/__project__/lib/python3.8/site-packages/pytorch_lightning/tuner/tuning.py", line 46, in tune
    self.scale_batch_size(
  File "/home/ubuntu/.local/share/virtualenvs/__project__/lib/python3.8/site-packages/pytorch_lightning/tuner/tuning.py", line 104, in scale_batch_size
    return scale_batch_size(
  File "/home/ubuntu/.local/share/virtualenvs/__project__/lib/python3.8/site-packages/pytorch_lightning/tuner/batch_size_scaling.py", line 79, in scale_batch_size
    raise MisconfigurationException(f'Field {batch_arg_name} not found in both `model` and `model.hparams`')
pytorch_lightning.utilities.exceptions.MisconfigurationException: Field batch_size not found in both `model` and `model.hparams`
```

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1vgPLCwLg7uACtb3fxVp-t-__NtZ3onsD?usp=sharing
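
For reference (in case the Colab link goes stale), here is a minimal sketch of the setup, assuming a BoringModel-style module and a DataModule that owns `batch_size`; the names are illustrative, not the exact notebook code:

```python
# Minimal sketch (illustrative, not the exact Colab code): batch_size lives only on
# the DataModule, not on the model or in model.hparams.
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    def __init__(self, size=32, length=64):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class BoringDataModule(pl.LightningDataModule):
    def __init__(self, batch_size=2):
        super().__init__()
        self.batch_size = batch_size  # the field the tuner should detect and scale

    def train_dataloader(self):
        return DataLoader(RandomDataset(), batch_size=self.batch_size)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = BoringModel()
dm = BoringDataModule()
trainer = pl.Trainer(auto_scale_batch_size=True)
# Works on pl==1.1.8; raises the MisconfigurationException above on 1.2.0/1.2.1
trainer.tune(model, datamodule=dm)
```

The key point is that `batch_size` is defined only on the DataModule, which is what the 1.2.* tuner no longer finds.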

Expected behavior

Prior to pl==1.2.0, the tuner successfully detected and tuned the `dm.batch_size` property.
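
For comparison, with the sketch above on pl==1.1.8 the tune call completes and updates the DataModule's attribute in place (a rough illustration of the expected outcome, not verified output):

```python
trainer = pl.Trainer(auto_scale_batch_size=True)
trainer.tune(model, datamodule=dm)
print(dm.batch_size)  # overwritten by the batch size scaler instead of raising
```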

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.7.1+cu101
    • pytorch-lightning: 1.2.1
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
      • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
colllin added the bug and help wanted labels on Mar 3, 2021
akihironitta (Contributor) commented:

@colllin Thank you for reporting the issue. I confirmed the bug in 1.2.*, which did not occur in <1.2. I think this issue should be addressed by #5968 from @awaelchli.

rohitgr7 (Contributor) commented Mar 5, 2021

Closing this then. Duplicate of #5967.

rohitgr7 closed this as completed on Mar 5, 2021