This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Hi, I am trying to train a ResNet50 for the initial representation learning stage on Python 3.7 and PyTorch 0.4.1.post2, and every time I run it I get an out-of-memory error, even when running on multiple GPUs with 24 GB of memory. This most likely has to do with the backward pass, since the script runs to completion if I comment out loss.backward(). I also tried running under no_grad(), but that produced errors. The command I used was:
python ./main.py --model ResNet50 \
    --traincfg base_classes_train_template.yaml \
    --valcfg base_classes_val_template.yaml \
    --print_freq 10 --save_freq 10 \
    --aux_loss_wt 0.02 --aux_loss_type sgm \
    --checkpoint_dir checkpoints/ResNet50_sgm
The code worked perfectly with ResNet10, so I was wondering whether there is a solution to this issue. Do I have to run it on a different version of PyTorch, or is it possible to fix with my current setup? Thank you for your help!