[BUG] RuntimeError in minimalistic example #87

Closed
yuxiaooye opened this issue Jul 27, 2023 · 3 comments

@yuxiaooye

Thanks for the amazing repo! I encountered a bug when running the minimalistic example given in the README:

Describe the bug

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.

(screenshot of the traceback)

To Reproduce

minimalistic example in README.md

Questions

I found that the error can be fixed by adding DDPStrategy(find_unused_parameters=True), as the error message suggests:

import lightning as L
from lightning.pytorch.strategies import DDPStrategy

trainer = L.Trainer(
    max_epochs=1,  # only a few epochs
    accelerator="gpu",  # use GPU if available, or e.g. "cpu" otherwise
    logger=None,  # can be replaced with WandbLogger, TensorBoardLogger, etc.
    precision="16-mixed",  # Lightning will handle faster training with mixed precision
    gradient_clip_val=1.0,  # clip gradients to avoid exploding gradients
    reload_dataloaders_every_n_epochs=1,  # necessary for sampling new data
    strategy=DDPStrategy(find_unused_parameters=True),  # suggested fix; can we add this to the example?
)

But I'm wondering whether training still works correctly with this change. Does DDPStrategy(find_unused_parameters=True) introduce any negative side effects?
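
For reference, the error message also mentions an equivalent string form of the same setting. A minimal sketch of that alternative (only the relevant arguments are shown, and I have not tested this variant):

import lightning as L

# Alternative from the error message: pass the strategy as a registered string
# instead of constructing a DDPStrategy object; the two should be equivalent.
trainer = L.Trainer(
    max_epochs=1,
    accelerator="gpu",
    strategy="ddp_find_unused_parameters_true",
)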

@yuxiaooye yuxiaooye added the bug Something isn't working label Jul 27, 2023
fedebotu added a commit that referenced this issue Aug 1, 2023
@fedebotu
Member

fedebotu commented Aug 1, 2023

Hi there!
To answer your question: there is no effect in terms of accuracy. There may be some minor overhead, as described here, and this may be solved in future TorchRL versions.
The minimalistic example was not tested with multiple GPUs (which I believe is your case), since we normally train with Hydra, which sets that up automatically. We have now included better automatic DDP handling (which sets that parameter automatically, as RL environments require it), and you may use the newer RL4COTrainer; see the new minimalistic example here. We will release it soon with v0.1.1; in the meantime, you can already use the nightly version with pip install git+https://github.com/kaist-silab/rl4co.
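
Roughly, the new usage would look like the sketch below; the import path and constructor arguments are assumptions rather than a copy of the README, since RL4COTrainer should accept the usual Lightning Trainer keyword arguments:

# Sketch only: the import path and arguments are assumptions, not copied from the README.
from rl4co.utils.trainer import RL4COTrainer

trainer = RL4COTrainer(
    max_epochs=1,
    accelerator="gpu",
    precision="16-mixed",
    gradient_clip_val=1.0,
)
trainer.fit(model)  # `model` is the RL4CO module from the minimalistic example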

Thanks for spotting the bug ;)

@yuxiaooye
Author

I see! Thanks for your reply!

@fedebotu
Member

fedebotu commented Aug 2, 2023

Marking as closed. Note that we released v0.1.1!

@fedebotu fedebotu closed this as completed Aug 2, 2023