Hi, thanks for your excellent work. I found that training is quite slow, and data loading seems to be the bottleneck. Could you tell me how long it takes to train a model? Also, what values of CPUS_PER_TASK and workers_per_gpu do you use when training with SLURM? And are there any other measures you used to speed up the training procedure?
@linjing7 Each experiment took roughly 2 days on 8 GPUs. I have also run into the data-loading bottleneck, which seems to be caused by recent upgrades to the mmhuman3d pipeline. I worked around it by training with cached data.
I have added an example config file in 34641c6. Before training, create an empty data/cache folder; the cache files will be generated automatically during training.
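As a rough sketch of what a cached dataset entry can look like (the cache-related keyword and the other field values below are assumptions; the example config in commit 34641c6 is the authoritative reference):

```python
# Hypothetical excerpt of a training config. The keyword used here
# (cache_data_path) and the dataset fields are assumptions -- follow the
# example config in commit 34641c6 for the exact names used by mmhuman3d.
train=dict(
    type='HumanImageDataset',
    dataset_name='h36m',
    data_prefix='data',
    ann_file='h36m_train.npz',
    # Pre-processed samples are written here during the first pass and
    # re-used afterwards, which avoids re-running the slow per-sample
    # loading pipeline every epoch.
    cache_data_path='data/cache/h36m_train.npz',
    pipeline=train_pipeline,
)
```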
workers_per_gpu in the config is the number of worker processes that pre-fetch data for each GPU. Set it according to the number of CPU cores available per GPU.
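For context, this is where it sits in a typical mmcv-style config (the numbers are only illustrative placeholders, not recommended values):

```python
# Data-loading settings in the config; values are placeholders.
data = dict(
    samples_per_gpu=64,   # batch size per GPU
    workers_per_gpu=2,    # DataLoader worker processes per GPU
    train=train_dataset,  # dataset dicts defined elsewhere in the config
    val=val_dataset,
    test=test_dataset,
)
```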
CPUS_PER_TASK in the SLURM script is the number of CPUs allocated to each task (one task is launched per GPU).
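For reference, a minimal launch along the lines of the OpenMMLab-style slurm_train.sh would look like this (the partition name, job name, config path, and numbers are placeholders; adjust to your cluster):

```bash
#!/usr/bin/env bash
# Illustrative sketch of a slurm_train.sh-style launch, not the exact script.
PARTITION=my_partition           # placeholder
JOB_NAME=train_job               # placeholder
CONFIG=configs/my_config.py      # placeholder
GPUS=${GPUS:-8}
GPUS_PER_NODE=${GPUS_PER_NODE:-8}
CPUS_PER_TASK=${CPUS_PER_TASK:-5}   # CPUs per task (one task per GPU)

srun -p ${PARTITION} \
    --job-name=${JOB_NAME} \
    --gres=gpu:${GPUS_PER_NODE} \
    --ntasks=${GPUS} \
    --ntasks-per-node=${GPUS_PER_NODE} \
    --cpus-per-task=${CPUS_PER_TASK} \
    --kill-on-bad-exit=1 \
    python -u tools/train.py ${CONFIG} --launcher="slurm"
```

A common rule of thumb is to keep CPUS_PER_TASK at least as large as workers_per_gpu so the pre-fetch workers are not starved of CPU cores.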