Question on post-processing table structure with text bounding boxes #61

Open
RobAcc22 opened this issue Jul 28, 2022 · 3 comments
Labels
question Further information is requested

Comments

@RobAcc22

Hello,
I am running the table structure model on table images. I extract both the structure and the text, using CRAFT to detect the text bounding boxes and the table-transformer model to predict the table structure. To post-process the structure prediction, I pass the text bounding boxes to the postprocess functions.

I run into the following problem with this approach. In some table images where each cell contains a single character, CRAFT often groups those individual characters together, producing one large text bounding box like the one in the image below (second column).
[Image: 22_07_28_18_17_20_high]

The issue is that when I use these bounding boxes, some of the predicted rows are enlarged so that they contain these large OCR bounding boxes. The image below shows the raw predicted rows, without any post-processing.

[Image: Empty table-07_in_table row]

As you can see, the predicted rows are accurate. But when I combine the predicted table structure with the OCR bounding boxes, using the objects_to_cells function of the postprocess module, the rows become this:
[Image: Empty table-07_out_rows]

I hope it is visible that there is a green dotted row spanning the characters B to H, matching the text bounding box exactly. I have been looking into this, and it seems to originate in the table_structure_to_cells function, in lines 810-844 of the postprocess module.

I was wondering if you could suggest a way to improve the post-processing so that this does not occur, perhaps by adding a further post-processing step or by modifying those lines of code. If you know of an algorithm that detects text better than CRAFT, I am also interested.

Many thanks in advance

@bsmock
Collaborator

bsmock commented Jul 28, 2022

First of all, congrats on integrating OCR with the model code. This looks very well done and we hope it inspires others to do the same!

As far as your problem with the OCR is concerned, I don't see any easy way to overcome it using post-processing. If OCR does not give you a bounding box for B and C separately, you have no way to split that large text bounding box and know where B is and where C is within the box. So then you have no way to slot B and C into their correct cells using the model output.

One thing you could do is tell the post-processing code to ignore the word bounding boxes and keep its cell bounding boxes as-is. Then you could crop your input image at each cell bounding box and pass each individually to an OCR function to get the text of each cell. It sounds like a painful solution to me but could get the job done.
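The per-cell approach described above could look roughly like the following. This is a minimal sketch, not code from the repository: `ocr_cells`, `recognize`, and `padding` are hypothetical names, cell boxes are assumed to be `(xmin, ymin, xmax, ymax)` in pixel coordinates, and `image` is assumed to expose a PIL-style `crop` method and `size` attribute. The OCR engine is passed in as a callable so any recognizer can be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def ocr_cells(image, cell_bboxes: Sequence[BBox],
              recognize: Callable[[object], str],
              padding: int = 2) -> List[Dict]:
    """Crop each cell from `image` and run `recognize` on the crop.

    `image` only needs a PIL-style `crop((left, upper, right, lower))`
    method and a `size` attribute giving (width, height).
    """
    width, height = image.size
    results = []
    for bbox in cell_bboxes:
        xmin, ymin, xmax, ymax = bbox
        # Pad slightly so glyphs touching the cell border are not clipped,
        # then clamp the crop window to the image bounds.
        box = (max(0, int(xmin) - padding),
               max(0, int(ymin) - padding),
               min(width, int(xmax) + padding),
               min(height, int(ymax) + padding))
        if box[2] <= box[0] or box[3] <= box[1]:
            # Degenerate (empty) cell: nothing to recognize.
            results.append({"bbox": bbox, "text": ""})
            continue
        text = recognize(image.crop(box)).strip()
        results.append({"bbox": bbox, "text": text})
    return results
```

With Pillow and pytesseract installed, `recognize` could be something like `lambda crop: pytesseract.image_to_string(crop, config="--psm 10")`, where page segmentation mode 10 tells Tesseract to treat the crop as a single character. Whether that mode suits a given table is something to check on real data.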

In my view, the best solution would be to get better OCR. Your case is a tricky one: it's easy to understand why the OCR naively thinks characters stacked vertically go together as a word.

You could try PyTesseract as an open source solution. I've also been very impressed with OCR from Azure Cognitive Services. I suggest giving these a try.

Cheers,
Brandon

bsmock added the question label on Jul 28, 2022
@RobAcc22
Author

RobAcc22 commented Aug 1, 2022

Thanks for the answer.

The thing is, the post-processing is quite useful in other cases, so I'd prefer to keep this step. I will try to find a way to improve the OCR bounding boxes.
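One heuristic for improving the boxes, which is not proposed anywhere in this thread and is only a sketch under assumptions, is to split any detected text box that covers several of the raw predicted rows into one box per row before post-processing. It relies on the raw row predictions being accurate, as they are in the example above. `split_boxes_by_rows` and `min_fraction` are hypothetical names, and all boxes are assumed to be `[xmin, ymin, xmax, ymax]` lists.

```python
from typing import List, Sequence


def vertical_overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Length of the vertical overlap of two [xmin, ymin, xmax, ymax] boxes."""
    return max(0.0, min(a[3], b[3]) - max(a[1], b[1]))


def split_boxes_by_rows(text_bboxes: Sequence[Sequence[float]],
                        row_bboxes: Sequence[Sequence[float]],
                        min_fraction: float = 0.5) -> List[List[float]]:
    """Split text boxes that cover several predicted rows into one box per row.

    A row counts as covered when the text box overlaps at least
    `min_fraction` of the row's height. Boxes covering zero or one
    row are returned unchanged.
    """
    result = []
    for box in text_bboxes:
        covered = []
        for row in row_bboxes:
            row_height = row[3] - row[1]
            if row_height > 0 and vertical_overlap(box, row) / row_height >= min_fraction:
                covered.append(row)
        if len(covered) < 2:
            result.append(list(box))
            continue
        for row in sorted(covered, key=lambda r: r[1]):
            # Keep the horizontal extent; clip the vertical extent to the row.
            result.append([box[0], max(box[1], row[1]), box[2], min(box[3], row[3])])
    return result
```

Because CRAFT only detects text and does not recognize it, each split box would still need to go through a recognizer afterwards. The heuristic would also wrongly split text that legitimately belongs to a cell spanning several rows, so it is probably best limited to tables without such cells.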

Cheers,
Roberto

@zackwylde-cmd

Hello,
@RobAcc22 could you share the inference code for TSR?
Thanks in advance.
