NaNs when training COCO dataset #16
Comments
It's a label issue. Following this repo also gave me all-zero outputs on the COCO dataset. After what felt like several centuries of debugging, I finally found that the labels are completely wrong:
Take the 26x26 grid size as an example: as you can see, it all becomes 0s. However, I did not get NaN when training, but I cannot do prediction because the output is also all 0s.
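A quick way to confirm this kind of label problem is to count how many grid cells actually carry an object. The sketch below is only an illustration in plain Python: the layout it assumes (grid x grid x anchors, each entry `[x1, y1, x2, y2, obj, class]`) and the helper names `make_empty_grid` and `count_object_cells` are assumptions, not taken from the repo.

```python
def make_empty_grid(size=26, anchors=3, width=6):
    """Build an all-zero label grid (hypothetical layout: size x size x anchors x width)."""
    return [[[[0.0] * width for _ in range(anchors)]
             for _ in range(size)]
            for _ in range(size)]


def count_object_cells(grid, obj_index=4):
    """Count anchor slots whose objectness flag is non-zero.

    obj_index=4 assumes each slot is [x1, y1, x2, y2, obj, class].
    """
    count = 0
    for row in grid:
        for cell in row:
            for anchor in cell:
                if anchor[obj_index] != 0:
                    count += 1
    return count


# An untouched grid has no objects, which is the symptom described above.
grid = make_empty_grid()
empty_count = count_object_cells(grid)

# After assigning one box to the centre cell, exactly one slot is non-zero.
grid[13][13][0] = [0.4, 0.4, 0.6, 0.6, 1.0, 0.0]
one_count = count_object_cells(grid)

print(empty_count, one_count)
```

If the count stays at zero for every image that has annotations, the label transformation is the first thing to inspect.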
Did you solve it?
Same here.
I think it's a learning rate issue. I have not had time to test it, but I suggest lowering the learning rate a little.
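For intuition only, here is a toy example of how a too-large learning rate can end in NaN. It runs plain gradient descent on f(x) = x^2, which is not the YOLOv3 loss; the function name `descend` and the specific learning rates are assumptions chosen for illustration.

```python
import math


def descend(lr, steps=2000, x=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    for _ in range(steps):
        x = x - lr * (2 * x)
    return x


# With lr = 1.5 each step maps x to -2x, so the value overflows to
# infinity and the next update computes inf - inf, which is NaN.
diverged = descend(lr=1.5)

# With lr = 0.1 each step maps x to 0.8x, so the value shrinks toward 0.
converged = descend(lr=0.1)

print(math.isnan(diverged), converged)
```

Whether this is what happens in the training run discussed here is untested; it only shows why lowering the learning rate is a reasonable first experiment.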
@smohan10 Any insights? I'm having the same issue.
@nivsmall You can check out my new YOLOv3 branch here. I have tested it on a detection-minist dataset and it works. Since it needs TensorFlow 2.0, you can train on COCO to test it. I also provide some scripts that generate the annotation format the model training needs. https://github.com/jinfagang/yolov3_tf2 If you succeed in training a detector, you can go further and export the model as TFLite and try the int8 or fp16 options provided in TensorFlow 2.0, which can accelerate inference.
The labels are not to blame; the output object score is just very low. Lower the score_threshold in tf.image.combined_non_max_suppression (say, from 0.1 to 0.0001) to check whether there are any potential detections.
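The diagnostic above can be sketched without TensorFlow: take the raw scores and count how many would survive each threshold. The scores below are made up for illustration, `survivors` is a hypothetical helper, and the strict `>` comparison is a simplification rather than a statement about how `tf.image.combined_non_max_suppression` treats boundary values.

```python
def survivors(scores, threshold):
    """Return the scores that would pass a given score threshold."""
    return [s for s in scores if s > threshold]


# Hypothetical objectness scores from a model whose confidence is very low.
scores = [0.0004, 0.002, 0.03, 0.00005]

# At the usual threshold nothing is kept, so the output looks empty.
kept_default = survivors(scores, 0.1)

# At a much lower threshold the weak detections become visible.
kept_low = survivors(scores, 0.0001)

print(len(kept_default), len(kept_low))
```

If detections appear only at the very low threshold, the model is producing boxes but with weak confidence, which points at training rather than at the labels.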
@makercob Hi, |
@AnaRhisT94 In my case, NaN was encountered when training with the SGD optimizer. Currently, I'm training the model with RectifiedAdam, but the confidence score is much lower than expected.
If you are still having problems, this person's experience might help: #128
Getting NaN when training on the COCO dataset. Generated TFRecords using object detection's create_coco_tf_record script.
From your repo, followed the instructions to download weights and convert them. Ran training with the following command line:
python train.py --batch_size 8 --dataset $(DATA_PATH)/coco_train.record* --val_dataset $(DATA_PATH)/coco_val.record* --epochs 100 --mode eager_tf --transfer fine_tune
This is Python 3.0, TensorFlow 2.0 GPU version.