Why is the recognition accuracy different from the paper? #4
Comments
Hi @zobeirraisi, I am also interested in this work. It'd be greatly appreciated if you could post the results on the datasets that you have tried.
Thanks @zobeirraisi
There is label noise in the IC15 test set, and I have relabeled it.
I checked my prediction results, and I don't know why our results differ. For example: word_26_00.png##Kappa##Kappa##
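For reference, a line like the one above seems to encode the image name, the label, and the prediction separated by `##`. Here is a minimal sketch of how such a file could be parsed and scored; the field order and the file name `predictions.txt` are my assumptions, not something confirmed by the repository:

```python
# Minimal sketch: parse a prediction log whose lines look like
#   word_26_00.png##Kappa##Kappa##
# Assumption: the fields are <image name>##<ground truth>##<prediction>##.
def parse_prediction_line(line: str):
    parts = [p for p in line.strip().split("##") if p]
    image_name, ground_truth, prediction = parts[0], parts[1], parts[2]
    return image_name, ground_truth, prediction

with open("predictions.txt", "r", encoding="utf-8") as f:  # hypothetical file name
    records = [parse_prediction_line(line) for line in f if line.strip()]

# Exact-match (case-sensitive) word accuracy over the parsed records.
correct = sum(gt == pred for _, gt, pred in records)
print(f"exact-match accuracy: {correct / len(records):.4f}")
```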
I think you should crop the test images using coords.txt first, then predict.
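In case it helps, below is a rough sketch of that cropping step. The file name `coords.txt` comes from the comment above; its exact format (assumed here to be one `image_name,x1,y1,x2,y2` entry per line with axis-aligned boxes) and the directory layout are my assumptions:

```python
# Rough sketch: crop full test images into word patches before running prediction.
# Assumed coords.txt format: "image_name,x1,y1,x2,y2" per line (axis-aligned box).
import os
from PIL import Image

def crop_by_coords(coords_path: str, image_dir: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    with open(coords_path, "r", encoding="utf-8") as f:
        for idx, line in enumerate(f):
            if not line.strip():
                continue
            name, x1, y1, x2, y2 = line.strip().split(",")[:5]
            box = tuple(int(v) for v in (x1, y1, x2, y2))  # (left, upper, right, lower)
            with Image.open(os.path.join(image_dir, name)) as img:
                img.crop(box).save(os.path.join(out_dir, f"word_{idx}_{name}"))

# Hypothetical usage:
# crop_by_coords("coords.txt", "ic15_test_images/", "ic15_test_crops/")
```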
@Jyouhou @zobeirraisi Hi, can you tell us more about your pretrained model?
My guess is that the performance of this implementation should be around 85% on IIIT-5K.
It is not because of the data preprocessing; the evaluation in this code is wrong.
@delveintodetail Is there something wrong in the predict.py file?
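For comparison: most scene text recognition papers report case-insensitive word accuracy over alphanumeric characters only, so a stricter or looser protocol in predict.py would shift the numbers. Here is a sketch of that common protocol; the function and variable names are mine, not taken from this repository:

```python
import re

# Common benchmark protocol for scene text recognition: lowercase everything
# and keep only alphanumeric characters before comparing strings.
_NON_ALNUM = re.compile(r"[^0-9a-z]")

def normalize(text: str) -> str:
    return _NON_ALNUM.sub("", text.lower())

def word_accuracy(ground_truths, predictions) -> float:
    assert len(ground_truths) == len(predictions)
    hits = sum(normalize(g) == normalize(p) for g, p in zip(ground_truths, predictions))
    return hits / len(ground_truths)

# Example: "Kappa" vs "KAPPA" counts as correct under this protocol.
print(word_accuracy(["Kappa"], ["KAPPA"]))  # 1.0
```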
I have been training this model on the ICDAR 2015 Word Recognition dataset (IC15), with no relabeling of the mislabeled data, using the code provided. In order to recognize all the characters in the dataset, the vocab used was:

If one keeps training and relies only on the loss on the test dataset, the model will overfit; I have obtained several different models with 100% accuracy on the test dataset. Typically, such models may have relatively poor performance on the training data itself:

Starting from scratch, and saving only the models that improve the inference performance on both the test data and the training data, one can get results like this after 1533 epochs using batch_size = 64:

Inspection shows that some of these models give the same answer as a human on some of the mislabeled data, at least on the test dataset. As training progresses and new models are saved, the inference performance improves mostly on the training dataset while improving more slowly on the test dataset. Thus this model seems like overkill on the ICDAR 2015 dataset, and the mislabeling makes comparison difficult.

Update: the model continued training and these are the results (inference on test data):

Other training runs and tests with synthetic images suggest that it does not generalize so well.
The results above were obtained with the code provided as is. |
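The checkpoint-selection rule described above (saving a model only when accuracy improves on both the training and the test split, rather than tracking test loss alone) could look roughly like the sketch below; the class and file names are hypothetical, not the actual training script:

```python
# Hypothetical sketch of the checkpoint-selection rule described above:
# save a checkpoint only when accuracy improves on BOTH splits, to avoid
# selecting models that merely overfit a small, partly mislabeled test set.
class DualSplitCheckpointer:
    def __init__(self):
        self.best_train = 0.0
        self.best_test = 0.0

    def step(self, epoch: int, train_acc: float, test_acc: float, save_fn) -> bool:
        """save_fn(path) is assumed to persist the model, e.g. a torch.save wrapper."""
        if train_acc > self.best_train and test_acc > self.best_test:
            self.best_train, self.best_test = train_acc, test_acc
            save_fn(f"checkpoint_epoch{epoch}.pth")
            return True
        return False

# Hypothetical usage inside a training loop:
# ckpt = DualSplitCheckpointer()
# ckpt.step(epoch, train_acc, test_acc, lambda path: torch.save(model.state_dict(), path))
```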
I applied the pre-trained model to the ICDAR15 dataset, but the results are different from the ones reported in the paper.