CombinedLoader changes sampling in DDP #7013
Labels: bug, data handling, distributed, help wanted, priority: 2
🐛 Bug
The behavior of validation dataloader sampling changes when you use CombinedLoader with ddp compared to using a single dataloader. CombinedLoader does not split and distribute the validation dataset across the GPUs; instead, every GPU receives the full validation set. The problem is resolved when you explicitly pass a DistributedSampler to the dataloaders (see the workaround sketch further down).
Please reproduce using the BoringModel
https://gist.github.com/lukashermann/b19964ba32c9bde241be3e54deea01ad
To Reproduce
To reproduce, run the file and check the command-line output. The gist follows roughly the pattern sketched below.
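A minimal sketch of the failing pattern, assuming plain DataLoaders wrapped in a CombinedLoader. Dataset sizes, dict keys, and the mode argument are illustrative, not the exact gist code; the point is only that val_dataloader returns a CombinedLoader built from DataLoaders without an explicit sampler:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule
from pytorch_lightning.trainer.supporters import CombinedLoader  # location in 1.3.x


class BoringModel(LightningModule):
    # ... training_step, configure_optimizers, etc. omitted ...

    def val_dataloader(self):
        loaders = {
            "a": DataLoader(TensorDataset(torch.arange(64)), batch_size=4),
            "b": DataLoader(TensorDataset(torch.arange(64)), batch_size=4),
        }
        # Under ddp, each GPU iterates over all 64 indices of every loader
        # instead of only its shard.
        return CombinedLoader(loaders, mode="max_size_cycle")
```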
Expected behavior
Single dataloader (each GPU receives its own shard):
device cuda:1 [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63]
device cuda:0 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62]
Combined dataloader (every GPU receives the full validation set):
device cuda:0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
device cuda:1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
Is this behavior intended?
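In the meantime, explicitly passing a DistributedSampler restores the single-dataloader behavior. A minimal workaround sketch under the same illustrative setup as above; make_sharded_loader is a hypothetical helper, not a Lightning API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
from pytorch_lightning import LightningModule
from pytorch_lightning.trainer.supporters import CombinedLoader


def make_sharded_loader(dataset, batch_size=4):
    # num_replicas and rank are inferred from the process group, which ddp has
    # already initialized by the time val_dataloader is requested.
    sampler = DistributedSampler(dataset, shuffle=False)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


class BoringModel(LightningModule):
    # ... training_step, configure_optimizers, etc. omitted ...

    def val_dataloader(self):
        loaders = {
            "a": make_sharded_loader(TensorDataset(torch.arange(64))),
            "b": make_sharded_loader(TensorDataset(torch.arange(64))),
        }
        return CombinedLoader(loaders, mode="max_size_cycle")
```

With this, each GPU again sees only its own half of the indices, matching the single-dataloader output above.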
Environment
- CUDA:
    - GPU:
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
    - available: True
    - version: 11.1
- Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.8.0
    - pytorch-lightning: 1.3.0rc1
    - tqdm: 4.53.0
- System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.8.5
    - version: #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021