Thanks for the amazing repo! I encountered a bug when running the minimalistic example given in the README:
Describe the bug
RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
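For context, this error fires whenever the module registers a parameter that receives no gradient during a training step. A minimal sketch of the condition (hypothetical names, not from the repo):

import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 1)  # registered, but never touched by the loss

    def forward(self, x):
        # self.unused contributes nothing to the output, so under DDP the
        # reducer waits for a gradient that never arrives and raises the error
        return self.used(x)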
To Reproduce
minimalistic example in README.md
Questions
I found that the bug can be fixed by adding DDPStrategy(find_unused_parameters=True), as the error message suggests:

import lightning as L
from lightning.pytorch.strategies import DDPStrategy

trainer = L.Trainer(
    max_epochs=1,  # only a few epochs
    accelerator="gpu",  # use GPU if available, else others such as "cpu"
    logger=None,  # can be replaced with WandbLogger, TensorBoardLogger, etc.
    precision="16-mixed",  # Lightning handles faster training with mixed precision
    gradient_clip_val=1.0,  # clip gradients to avoid exploding gradients
    reload_dataloaders_every_n_epochs=1,  # necessary for sampling new data
    strategy=DDPStrategy(find_unused_parameters=True),  # the suggested fix
)
But I'm wondering whether training will still work well this way. Does DDPStrategy(find_unused_parameters=True) introduce any negative side effects?
Hi there!
To answer your question: there is no effect on accuracy. There may be some minor overhead, as described here, and this may be solved in future TorchRL versions.
The minimalistic example was not tested with multiple GPUs (which I believe is your case), since we normally train with Hydra, which sets that up automatically. We have now included better automatic DDP handling (it sets that parameter automatically, which is needed because of the RL environments), and you may use the newer RL4COTrainer; see the new minimalistic example here. We will release it soon with v0.1.1; in the meantime, you may already use it via the nightly version with pip install git+https://github.com/kaist-silab/rl4co.
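For reference, a minimal sketch of the drop-in replacement, assuming RL4COTrainer is importable from rl4co.utils (the exact path may differ between versions); it forwards the usual Lightning Trainer arguments:

from rl4co.utils import RL4COTrainer

trainer = RL4COTrainer(
    max_epochs=1,
    accelerator="gpu",
    precision="16-mixed",
    gradient_clip_val=1.0,
)
trainer.fit(model)  # model: an rl4co RL module, e.g. AttentionModel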