DDP GPU utilization problem #10670
Unanswered
dragondx asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment · 4 replies
-
Dear @dragondx, any chance you could provide a reproducible code snippet with mocked data? Best,
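For reference, the kind of snippet being asked for is a minimal, self-contained script trained on mocked data, roughly along these lines (an illustrative sketch only, not the author's actual code; shapes, sizes, and hyperparameters are placeholders):

```python
# Illustrative sketch of a minimal DDP repro with mocked data.
# Launch with: torchrun --nproc_per_node=2 repro.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Mocked data standing in for the real memmap-backed dataset.
    dataset = TensorDataset(torch.randn(100_000, 128), torch.randn(100_000, 1))
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=1024, sampler=sampler,
                        num_workers=4, pin_memory=True, persistent_workers=True)

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1)).cuda()
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x = x.cuda(non_blocking=True)
            y = y.cuda(non_blocking=True)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```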
-
Training an MLP using DDP with 2 GPUs. Pretty standard code.
We observe that the training speed slows down over time. In the initial iterations we get 1 s/it; both GPUs sit at 100% utilization, with occasional drops, presumably during gather/backprop ops. After a few thousand iterations we get 2 s/it: one GPU stays at 100% utilization consistently (the master?), while the other waits a long time at low utilization (1-2%) before getting a short burst of 100%. We also noticed that CPU utilization tends to be much higher in the earlier iterations (around 80% across all cores); after a few thousand iterations it drops to about 20%, with occasional spikes to 90% on some cores. Is this some sort of timing problem with data loading? There is no CPU preprocessing other than reading the data from disk.
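One way to check whether this is a data-loading stall (a hedged sketch; `loader`, `model`, `loss_fn`, and `optimizer` stand in for the actual training objects) is to time how long each iteration waits on the next batch versus how long the forward/backward takes:

```python
import time
import torch

# Illustrative instrumentation around the existing loop: if the "data wait"
# number grows over time while "compute" stays flat, the DataLoader / disk
# reads are the bottleneck rather than DDP communication.
data_s, compute_s = 0.0, 0.0
t_end = time.perf_counter()
for step, (x, y) in enumerate(loader):
    data_s += time.perf_counter() - t_end  # time spent waiting on the loader

    t0 = time.perf_counter()
    x = x.cuda(non_blocking=True)
    y = y.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    torch.cuda.synchronize()  # make the GPU time attributable to this step
    compute_s += time.perf_counter() - t0

    if step % 100 == 0:
        print(f"step {step:6d}  data wait {data_s:6.1f}s  compute {compute_s:6.1f}s")
        data_s, compute_s = 0.0, 0.0
    t_end = time.perf_counter()
```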
We use a custom dataloader for our use case, since we have a lot of data; it uses a NumPy memmap to fetch each datapoint.
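The question doesn't show the dataset code, but a memmap-backed dataset of this kind presumably looks roughly like the following sketch (class name, file layout, shapes, and dtype are all assumptions):

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class MemmapDataset(Dataset):
    """Illustrative memmap-backed dataset; layout and dtype are placeholders."""

    def __init__(self, path, num_samples, num_features):
        self.path = path
        self.num_samples = num_samples
        self.num_features = num_features
        self._mm = None  # opened lazily so each DataLoader worker gets its own handle

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        if self._mm is None:
            self._mm = np.memmap(self.path, dtype=np.float32, mode="r",
                                 shape=(self.num_samples, self.num_features))
        # Each access reads one row from the page cache / disk.
        row = np.array(self._mm[idx])  # copy out of the memmap
        return torch.from_numpy(row)
```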
DataLoader params:

```python
torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    persistent_workers=True,
    pin_memory=True,
    num_workers=16,
    prefetch_factor=128,
)
```
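For comparison, the usual way to build such a loader under DDP is one instance per process with a `DistributedSampler` handling the shuffling (whether that matches the setup here isn't shown in the question):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Typical DDP pattern (illustrative): each rank reads a disjoint shard, and
# shuffling is done by the sampler instead of shuffle=True on the DataLoader.
sampler = DistributedSampler(train_dataset, shuffle=True)
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    sampler=sampler,
    persistent_workers=True,
    pin_memory=True,
    num_workers=16,
    prefetch_factor=128,
)
```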
We wonder what is causing this. Any help will be greatly appreciated.