
Why is the recognition accuracy different from the paper? #4

Open
zobeirraisi opened this issue Mar 21, 2020 · 12 comments

zobeirraisi commented Mar 21, 2020

I applied the pre-trained model to the ICDAR15 dataset, but the results are different from the ones reported in the paper.


Jyouhou commented Mar 22, 2020

Hi @zobeirraisi

I am also interested in this work. It'd be greatly appreciated if you could post the results on the datasets that you have tried.

Author

zobeirraisi commented Mar 22, 2020

> Hi @zobeirraisi
> I am also interested in this work. It'd be greatly appreciated if you could post the results on the datasets that you have tried.


Hi @Jyouhou
These are my results for the ICDAR15 dataset:
Link


Jyouhou commented Mar 22, 2020

Thanks @zobeirraisi
So the actual accuracy is ~71%.
We can wait for a response from the authors.

Owner

fengxinjie commented Mar 22, 2020

There is label noise in the IC15 test set, and I have relabeled it.

@fengxinjie
Owner

> Hi @zobeirraisi
> I am also interested in this work. It'd be greatly appreciated if you could post the results on the datasets that you have tried.
>
> Hi @Jyouhou
> These are my results for the ICDAR15 dataset:
> Link

I checked my prediction results, and I don't know why our results differ. For example,
word_26_00.png##Kappa##Kappa##
word_27_00.png##CAUTION##CAUTION##
word_50_00.png##l:HOU##:HOU##
... are all correct in my predictions.
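
(For reference, a minimal sketch of how one could score such a ##-delimited result file. The assumed line format filename##ground_truth##prediction## is inferred from the examples above and is not specified by this repository; the path is a placeholder.)

```python
# Hypothetical scorer for a ##-delimited prediction file.
# Assumed line format (based on the examples above): filename##ground_truth##prediction##
def score(path, case_sensitive=True):
    total, wrong = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("##")
            if len(parts) < 3:
                continue  # skip malformed or empty lines
            gt, pred = parts[1], parts[2]
            if not case_sensitive:
                gt, pred = gt.lower(), pred.lower()
            total += 1
            wrong += int(gt != pred)
    print(f"Summary: # wrong: {wrong} # total: {total} "
          f"wrong {100.0 * wrong / max(total, 1):.2f}%")

score("predictions.txt")  # placeholder path
```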

@fengxinjie
Owner

> Hi @zobeirraisi
> I am also interested in this work. It'd be greatly appreciated if you could post the results on the datasets that you have tried.
>
> Hi @Jyouhou
> These are my results for the ICDAR15 dataset:
> Link

I think you should crop the test images using coords.txt first, and then run prediction.
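
(A minimal sketch of that cropping step. It assumes coords.txt lists one word box per line as image_name,x1,y1,x2,y2; the actual format of coords.txt in this repository may differ, and the paths are placeholders.)

```python
# Hypothetical pre-processing: crop word images from the full test images
# before running predict.py. Assumed coords.txt line format:
#   image_name,x1,y1,x2,y2   (the repository's actual format may differ)
import os
from PIL import Image

def crop_words(image_dir, coords_file, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with open(coords_file, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            name, x1, y1, x2, y2 = line.strip().split(",")
            img = Image.open(os.path.join(image_dir, name))
            word = img.crop((int(x1), int(y1), int(x2), int(y2)))
            word.save(os.path.join(out_dir, f"word_{idx}_{name}"))

crop_words("test_images", "coords.txt", "cropped_words")  # placeholder paths
```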

@li10141110

@Jyouhou @zobeirraisi Hi, can you tell us more about your pretrained model?

@delveintodetail

My guess is that the performance of this implementation should be 85% on IIIT-5K.

@delveintodetail

> @delveintodetail have you trained it? The developer did not reply clearly on the matter of training, whether he crops the ICDAR words, or what...

It is not because of the data preprocessing; the evaluation in this code is wrong.

@li10141110

@delveintodetail Is there something wrong in the predict.py file?


gussmith commented Apr 21, 2020

I have been training this model on the ICDAR 2015 Word Recognition dataset (IC15), using the code provided, with no relabeling of the mislabeled data.

In order to recognize all the characters in the datasets, the vocab used was:
vocab = "<=,.+:;-!?$%#&*' ()@éÉ/\[]0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ>"+'"'+"´"+"΅"
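
(For context, a minimal sketch of turning such a vocab string into character/index lookup tables; the helper names are mine, and how this repository actually handles padding and the start/end symbols is an assumption.)

```python
# Hypothetical helpers around a vocab string like the one above.
# Note: the backslash is escaped here; whether '<' and '>' serve as
# start/end tokens in this repository is an assumption.
vocab = "<=,.+:;-!?$%#&*' ()@éÉ/\\[]0123456789abcdefghijklmnopqrstuvwxyz" \
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ>" + '"' + "´" + "΅"

char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = {i: ch for i, ch in enumerate(vocab)}

def encode(text):
    """Map a label string to integer indices for the model."""
    return [char2idx[ch] for ch in text]

def decode(indices):
    """Map indices back to a string, e.g. for inspecting predictions."""
    return "".join(idx2char[i] for i in indices)

assert decode(encode("CAUTION")) == "CAUTION"
```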

If one keeps training and relies only on the loss on the test dataset, the model will overfit, and I have obtained several different models with 100% accuracy on the test dataset.
This means that even the mislabeled data is reproduced exactly as the human annotator labeled it, errors included.
(Note: the model is only trained on the training dataset, never on the test dataset! Yet the models that performed best at inference on the test dataset were saved as training progressed.)

Typically, such models may have relatively poor performance on the training data itself:
On testing data:
Summary: # wrong: 0 # total: 2077 wrong 0.0%
On training data:
Summary: # wrong: 1959 # total: 4468 wrong 43.85%

Starting from scratch, and saving only the models that improve the inference performance on both the test data and the training data, one can get results like this after 1533 epochs with batch_size = 64 (a sketch of this selection rule follows the numbers below):
on test data:
Summary: #wrong: 11 #total: 2077 wrong 0.5%
on training data:
Summary: #wrong: 620 #total: 4468 wrong 13.9%
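
(A minimal sketch of the "improve on both splits" selection rule described above; train_one_epoch and evaluate are placeholders for the repository's training and inference routines, not its actual API.)

```python
# Hypothetical checkpoint selection: save only when the wrong-word rate
# improves on BOTH the training and the test split. `train_one_epoch` and
# `evaluate` are placeholders, not functions from this repository.
import torch

def train_with_dual_criterion(model, optimizer, train_loader, test_loader,
                              train_one_epoch, evaluate, epochs, path="best.pt"):
    best_train_err = best_test_err = float("inf")
    for epoch in range(epochs):
        train_one_epoch(model, optimizer, train_loader)
        train_err = evaluate(model, train_loader)  # wrong-word rate on training split
        test_err = evaluate(model, test_loader)    # wrong-word rate on test split
        if train_err < best_train_err and test_err < best_test_err:
            best_train_err, best_test_err = train_err, test_err
            torch.save(model.state_dict(), path)
            print(f"epoch {epoch}: saved (train wrong {train_err:.1%}, "
                  f"test wrong {test_err:.1%})")
```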

Inspection shows that some of these models give the same answer as the human annotator on some of the mislabeled data, at least on the test dataset.

As training progresses and new models are saved, the inference performance improves mainly on the training dataset, while improving more slowly on the test dataset.

Thus this model seems like overkill for the ICDAR 2015 dataset, and the mislabeling makes comparisons difficult.


Update: The model continued training and these are the results:
loss for test during training: 0.006546
loss for training data during training: 0.027809

inference on test data:
Summary: #wrong: 0 #total: 2077 wrong 0.0%
inference on training data:
Summary: #wrong: 129 #total: 4468 wrong 2.887%

Other training and tests with synthetic images suggest that it does not generalize so well.

@gussmith

The results above were obtained with the code provided as-is.
Since then, I have realized, from my own results and from reading others', that there is apparently an error in the code, which essentially trains the network when the validation is run. It is part of the initial code provided in the Annotated Transformer that the authors refer to.
See issue #7: "testloss would lead to model update on eval mode".
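
(For readers hitting the same problem: in Annotated Transformer-style code, the backward/step happens inside the loss-compute object, so reusing a loss compute that was built with the optimizer during validation can update the weights. The sketch below is simplified and only guards that pattern; the exact code in this repository and the fix discussed in issue #7 may differ.)

```python
# Simplified sketch of an Annotated-Transformer-style loss compute with the
# backward/step guarded, so that an evaluation pass cannot update the model.
# `opt` is assumed to be a NoamOpt-like wrapper and `run_epoch` a function
# like the one in the Annotated Transformer; both are assumptions here.
import torch

class SimpleLossCompute:
    def __init__(self, generator, criterion, opt=None):
        self.generator, self.criterion, self.opt = generator, criterion, opt

    def __call__(self, x, y, norm):
        x = self.generator(x)
        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),
                              y.contiguous().view(-1)) / norm
        # In the problematic pattern, loss.backward() runs unconditionally, so a
        # loss compute constructed with an optimizer also trains during "evaluation".
        if self.opt is not None:
            loss.backward()
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return loss.item() * norm

def run_validation(model, data_iter, generator, criterion, run_epoch):
    """Evaluate with no optimizer and no gradients, so weights cannot change."""
    model.eval()
    with torch.no_grad():
        return run_epoch(data_iter, model,
                         SimpleLossCompute(generator, criterion, opt=None))
```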
