refine quality of OCR for tables #866

Open
geoHeil opened this issue Feb 2, 2025 · 1 comment
Labels
question (Further information is requested), table structure

Comments


geoHeil commented Feb 2, 2025

This uses the files attached in #806 (comment).

How can the table detection be improved so that the following error does not occur? Two values appear to be merged into a single cell:

396 |                 |
397 | 8173.3 >16666.7 |

geoHeil added the question label on Feb 2, 2025

geoHeil commented Feb 2, 2025

Here is a second example. The columns are detected in flipped order with easyocr, with rapidocr, and with rapidocr using the EN model.

Table 5

| MRTX849 (nM)   |         | Example 5 (nm)   |
|----------------|---------|------------------|
| >3000          |   32    | G12A             |
| 16.62          |   28.1  | G12C             |
| >3000          |   20.25 |                  |
| >3000          | 1742    | G12R             |
| >3000          |   94    |                  |
| >3000          |   50    | G12w             |
| >3000          |  610    | G13D             |
| >3000          |   58    | Q61H             |

This output was produced from the input file:

WO2022132200-t5.pdf

Observations:

  • The columns are flipped.
  • The header is no longer aligned with the columns.
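For anyone who wants to flag this symptom automatically across many converted files, here is a minimal sketch in plain Python. It parses the Markdown table output shown above and reports columns that contain data but have an empty header, which is one visible sign of the header misalignment. The helper names `parse_markdown_table` and `unlabeled_columns` are hypothetical and not part of any OCR library. The check is only a heuristic and does not detect the flipped column order itself.

```python
def parse_markdown_table(text):
    """Split a pipe-delimited Markdown table into (header, body_rows)."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        # Drop the outer pipes, then split into trimmed cell strings.
        cells = [c.strip() for c in line[1:-1].split("|")]
        # Skip the separator row (cells made only of dashes and colons).
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows[0], rows[1:]


def unlabeled_columns(header, body):
    """Return indices of columns with an empty header but at least one value."""
    return [
        i
        for i, name in enumerate(header)
        if not name and any(i < len(row) and row[i] for row in body)
    ]


# The Table 5 output exactly as reported in this comment.
TABLE_5 = """
| MRTX849 (nM)   |         | Example 5 (nm)   |
|----------------|---------|------------------|
| >3000          |   32    | G12A             |
| 16.62          |   28.1  | G12C             |
| >3000          |   20.25 |                  |
| >3000          | 1742    | G12R             |
| >3000          |   94    |                  |
| >3000          |   50    | G12w             |
| >3000          |  610    | G13D             |
| >3000          |   58    | Q61H             |
"""

header, body = parse_markdown_table(TABLE_5)
print(unlabeled_columns(header, body))  # prints [1]
```

On the table above, the check reports column index 1, the numeric column that ended up without a header.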
