nan during training. #11
Comments
@Caius-Lu @songdejia Has this occurred to you too? I am trying to debug it. Suggestions welcome.
Getting the same issue.
I guess that data augmentation occasionally produces invalid samples, and such a sample makes the loss of its batch become nan. Finding which specific training images are responsible can be tedious, so my solution is to check whether the loss is nan before back propagation and, if so, skip the batch without any update. Specifically, I modified the code like this:
loss_check = loss1.cpu().detach().numpy()
if np.any(np.isnan(loss_check)):
    print('loss = nan, skip this batch')
    optimizer.zero_grad()
    continue
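The skip-on-nan idea above can be sketched independently of the framework. This is a minimal illustration, not the repository's training loop: `train_epoch`, `compute_loss`, and `apply_update` are hypothetical names standing in for the forward pass and the backward pass plus optimizer step. It uses `math.isfinite`, which rejects infinite losses as well as nan, and it records the indices of skipped batches so the offending samples can be inspected later.

```python
import math


def train_epoch(batches, compute_loss, apply_update):
    """Run one epoch, skipping any batch whose loss is NaN or infinite.

    compute_loss and apply_update are hypothetical callbacks standing in
    for the forward pass and the backward pass plus optimizer step.
    """
    applied, skipped = [], []
    for index, batch in enumerate(batches):
        loss = float(compute_loss(batch))
        if not math.isfinite(loss):
            # Do not update the weights from a corrupted batch; remember
            # its index so the samples can be checked afterwards.
            skipped.append(index)
            continue
        apply_update(batch, loss)
        applied.append(loss)
    return applied, skipped


# Toy usage: the "loss" is the batch value itself, and the third batch is NaN.
updates = []
applied, skipped = train_epoch(
    [0.0231, 0.0282, float("nan"), 0.0271],
    compute_loss=lambda batch: batch,
    apply_update=lambda batch, loss: updates.append(loss),
)
```

Skipping only hides the symptom. If many batches end up in `skipped`, the data pipeline is probably worth fixing directly.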
@BYJRK What were your results on the ICDAR dataset?
@saharudra I can achieve at most 0.7 hmean after modifying the thresholds in
Hi @songdejia, thanks for trying to port EAST from TensorFlow. While trying to train this model on COCO 2014 or Oxford syn text, I get nan during training. Any ideas?
Please see the training log below:
Cross point does not exist
point dist to line raise Exception
point dist to line raise Exception
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
point dist to line raise Exception
point dist to line raise Exception
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
point dist to line raise Exception
point dist to line raise Exception
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
point dist to line raise Exception
point dist to line raise Exception
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Cross point does not exist
Exception continue
Exception in getitem, and choose another index:4393
EAST <==> TRAIN <==> Epoch: [0][1/227] Loss 0.0231 Avg Loss 0.0250)
EAST <==> TRAIN <==> Epoch: [0][2/227] Loss 0.0282 Avg Loss 0.0260)
EAST <==> TRAIN <==> Epoch: [0][3/227] Loss 0.0313 Avg Loss 0.0273)
EAST <==> TRAIN <==> Epoch: [0][4/227] Loss 0.0271 Avg Loss 0.0273)
EAST <==> TRAIN <==> Epoch: [0][5/227] Loss 0.0206 Avg Loss 0.0262)
EAST <==> TRAIN <==> Epoch: [0][6/227] Loss 0.0300 Avg Loss 0.0267)
EAST <==> TRAIN <==> Epoch: [0][7/227] Loss 0.0239 Avg Loss 0.0264)
EAST <==> TRAIN <==> Epoch: [0][8/227] Loss 0.0271 Avg Loss 0.0265)
EAST <==> TRAIN <==> Epoch: [0][9/227] Loss 0.0284 Avg Loss 0.0266)
EAST <==> TRAIN <==> Epoch: [0][10/227] Loss 0.0197 Avg Loss 0.0260)
EAST <==> TRAIN <==> Epoch: [0][11/227] Loss nan Avg Loss nan)
EAST <==> TRAIN <==> Epoch: [0][12/227] Loss nan Avg Loss nan)
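In the log above, `Avg Loss` turns nan at the same step as `Loss`. A running mean can never recover once a single nan has been added to its sum. The sketch below is an assumption about how such a meter could be guarded, not the code that produced this log; `FiniteAverage` is a hypothetical name. It only keeps the reported average readable. If the per-step loss stays nan on later batches, as it does at step 12 here, the model weights may already contain nan and the update itself has to be prevented.

```python
import math


class FiniteAverage:
    """Running mean that ignores NaN and infinite values."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.rejected = 0  # number of non-finite values seen

    def update(self, value):
        value = float(value)
        if math.isfinite(value):
            self.total += value
            self.count += 1
        else:
            self.rejected += 1

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


losses = [0.0284, 0.0197, float("nan"), 0.0250]

# Naive mean: one NaN in the sum makes the result NaN.
naive_avg = sum(losses) / len(losses)

# Guarded mean: the NaN is counted in `rejected` but left out of the sum.
meter = FiniteAverage()
for loss in losses:
    meter.update(loss)
```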