This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Error when training PSPNet on Cityscapes dataset using GluonCV #17439
Problem Description
When I train a PSPNet with the GluonCV semantic segmentation library on the Cityscapes dataset, training hangs right after it starts.
Debugging
After bisecting by date of failure, I found that the first bad commit is PR #13896, which introduced this problem.
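The date bisection described above can be sketched as a binary search over build dates. This is only an illustration, not the procedure actually used in this issue: `first_bad_date` and `hangs_on` are hypothetical names, the dates are made up, and the sketch assumes that every build after the first bad one also hangs.

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_date(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Return the earliest date in (good, bad] for which is_bad is True.

    Assumes `good` is known to work, `bad` is known to hang, and that
    every build after the first bad one is also bad.
    """
    lo, hi = good, bad
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if is_bad(mid):
            hi = mid  # the regression is at or before mid
        else:
            lo = mid  # the regression is after mid
    return hi


# Hypothetical stand-in for "install the build from this date and check
# whether training hangs". The cutoff date is made up for the example.
def hangs_on(build_date: date) -> bool:
    return build_date >= date(2000, 1, 15)


print(first_bad_date(date(2000, 1, 1), date(2000, 2, 1), hangs_on))
```

In practice `is_bad` would install the build for the given date and run the training script, which is slow, so the logarithmic number of checks matters.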
In addition, my guess is that the root cause is related to the combination of multiprocessing and cuDNN dropout. To confirm this, we first need a minimal reproducible code snippet.
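Because the failure is a hang rather than a crash, a reproduction script needs a timeout to report it. Below is a minimal sketch of such a harness using only the Python standard library. `run_with_timeout` is a hypothetical helper, and the snippet passed to it is a placeholder, not the actual PSPNet reproduction.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh Python interpreter and classify the outcome.

    Returns "ok" on a zero exit code, "failed" on a non-zero exit code,
    and "hang" if the process does not finish within `timeout_s` seconds.
    """
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


# Placeholder snippet; a real reproduction would import the training code here.
print(run_with_timeout("print('one training step')", timeout_s=30))
```

Running each candidate snippet in a separate interpreter keeps a hung process from blocking the harness itself.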
barry-jin changed the title from "[RFC] Turn Off CuDNN in Dropout When Training PSPNet" to "[RFC] To Fix the Hang Problem in Training PSPNet" on Sep 1, 2020
Proposed solutions
More investigation is needed.
References
Issue #17439, PR #13896