Code freezes before validation sanity check when using DDP #7336
Comments
Hi, you can't use the 'ddp' accelerator in notebooks. Use 'ddp_spawn' or 'dp' for multi-GPU training. Hope this helps.
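A minimal sketch of the suggested change, assuming the PyTorch Lightning 1.x-era Trainer API (`gpus`/`accelerator` arguments) that this issue uses; `MyAutoencoder` is a hypothetical stand-in for any LightningModule:

```python
import pytorch_lightning as pl

# Hypothetical LightningModule standing in for the reporter's autoencoder.
model = MyAutoencoder()

# 'ddp' re-launches the training script as subprocesses, which does not work
# inside notebooks; 'ddp_spawn' starts workers with multiprocessing instead.
trainer = pl.Trainer(
    gpus=-1,                  # use all visible GPUs
    accelerator="ddp_spawn",  # notebook-safe alternative to 'ddp'
)
trainer.fit(model)
```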
@awaelchli I tried to run the code again on PyCharm, using
Same thing happens if I use
EDIT: if I use
EDIT 2: I also tried to launch the script from the terminal; the code still doesn't work.
Is it maybe because of Comet? Have you tried turning off the logger? Not sure what's going on.
🐛 Bug
Greetings from Italy!
I recently moved to PyTorch and a friend of mine introduced me to PL.
I'm coding an autoencoder (whose architecture is still pretty simple) with a custom loss function
that operates on the hidden-layer output. The link below leads to the GitHub repo:
https://github.com/notprime/custom_autoencoder/blob/main/autoenc_torch.ipynb
I read the documentation about multi-GPU training, so I used 'ddp' as the accelerator
and set
gpus = -1
to select all the GPUs. However, when I launch the script, the code freezes here:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
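The setup described above can be sketched roughly as follows. This is a reconstruction from the report, assuming the PyTorch Lightning 1.x-era Trainer API; the `Autoencoder` class name is hypothetical:

```python
import pytorch_lightning as pl

# Hypothetical LightningModule standing in for the autoencoder in the linked repo.
model = Autoencoder()

trainer = pl.Trainer(
    gpus=-1,            # select all available GPUs
    accelerator="ddp",  # the setting that hangs for the reporter
    precision=16,       # matches "Using native 16bit precision." in the log
)
trainer.fit(model)      # freezes before the validation sanity check
```

With 'ddp', Lightning launches one process per GPU by re-running the script, which is why it is expected to run as a plain Python script rather than inside a notebook.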
I tried to wait 10-15 minutes, but nothing happened.
Instead, if I use 'dp' as the accelerator, everything works fine and the script doesn't freeze.
The documentation says that ddp is preferred over dp because it's faster:
is there something I did wrong? I really don't know why the code gets stuck when I use ddp!
Thanks in advance!