Some images are not extracted as viewable images #434

Closed
pietermarsman opened this issue May 24, 2020 · 0 comments · Fixed by #737

pietermarsman commented May 24, 2020

Bug report

If an image's encoding is not recognized, the image is written as a .img file. However, a PDF viewer can display these images, so the extracted images should be viewable as well.
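
Here is a minimal sketch of the fallback described above. It assumes the extractor picks a file extension from the image stream's /Filter entries. The name choose_extension and the KNOWN_FILTERS table are hypothetical and are not pdfminer's actual code. Streams whose last filter is DCTDecode, JPXDecode or JBIG2Decode already carry a standard encoding (JPEG, JPEG 2000, JBIG2), so they could be written as-is. Anything else falls back to .img.

```python
from typing import Sequence

# Hypothetical mapping (not pdfminer's real table) from the last PDF stream
# filter to a file extension. These filters leave a standard encoded format
# in the stream, so the bytes could be written out without re-encoding.
KNOWN_FILTERS = {
    "DCTDecode": ".jpg",    # JPEG
    "JPXDecode": ".jp2",    # JPEG 2000
    "JBIG2Decode": ".jb2",  # JBIG2
}


def choose_extension(filters: Sequence[str]) -> str:
    """Return a file extension for an image stream, or '.img' if unrecognized.

    `filters` lists the stream's /Filter names in the order they are applied
    when decoding, so the last one is the innermost (final) encoding.
    """
    if filters:
        last = filters[-1].lstrip("/")  # accept both "DCTDecode" and "/DCTDecode"
        if last in KNOWN_FILTERS:
            return KNOWN_FILTERS[last]
    # Fallback matching the behavior described in the issue.
    return ".img"
```

For example, an image compressed only with FlateDecode holds raw pixel samples once decoded, so this sketch still returns .img. Such images need to be wrapped in a pixel format before a viewer can open them.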

Example pdf: https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf

Command:

pdf2txt.py example.pdf --output-dir cats-and-dogs

It extracts Im1.8.16x237.img, which an image viewer cannot open. In addition, all of the extracted BMP images seem to be broken.
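
The issue does not say why the BMP files are broken. As an illustration only, and not a diagnosis of the actual bug, here is a minimal sketch that writes a valid 24-bit BMP from raw RGB samples. Hand-written BMP encoders often get two details wrong: rows are stored bottom-up, and each row is padded to a multiple of 4 bytes. The function name encode_bmp_24 is hypothetical, and the sketch assumes the input is top-down, row-major, 8-bit RGB.

```python
import struct


def encode_bmp_24(width: int, height: int, rgb: bytes) -> bytes:
    """Encode top-down, row-major 8-bit RGB samples as a 24-bit BMP file."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    row_size = (width * 3 + 3) & ~3  # every row is padded to a multiple of 4 bytes
    padding = b"\x00" * (row_size - width * 3)
    pixel_data = bytearray()
    # BMP stores rows bottom-up (positive height), with pixels in B, G, R order.
    for y in range(height - 1, -1, -1):
        row = rgb[y * width * 3:(y + 1) * width * 3]
        for x in range(width):
            r, g, b = row[3 * x:3 * x + 3]
            pixel_data += bytes((b, g, r))
        pixel_data += padding
    header_size = 14 + 40  # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", header_size + len(pixel_data), 0, 0, header_size
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,     # header size, width, height (bottom-up)
        1, 24,                 # color planes, bits per pixel
        0, len(pixel_data),    # BI_RGB (uncompressed), image size in bytes
        2835, 2835,            # ~72 DPI expressed in pixels per meter
        0, 0,                  # no palette entries
    )
    return file_header + info_header + bytes(pixel_data)
```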

Solution: detect more image encodings and use them to write the images in a format that image viewers can open.
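
Here is one possible way to make the .img fallback viewable. It assumes the decoded stream holds raw 8-bit samples in a DeviceGray (1 component) or DeviceRGB (3 components) color space. Under that assumption, prefixing a binary PNM header produces a PGM or PPM file that many image viewers can open. The function encode_pnm is hypothetical and not part of pdfminer. Other bit depths, CMYK, indexed color and Decode arrays are not handled.

```python
def encode_pnm(width: int, height: int, components: int, samples: bytes) -> bytes:
    """Wrap raw 8-bit samples in a binary PNM header so viewers can open them.

    components == 1 -> PGM (grayscale, magic number 'P5')
    components == 3 -> PPM (RGB, magic number 'P6')
    """
    magic = {1: b"P5", 3: b"P6"}.get(components)
    if magic is None:
        raise ValueError(f"unsupported number of components: {components}")
    if len(samples) != width * height * components:
        raise ValueError("sample data does not match width * height * components")
    # Header: magic number, dimensions, maximum sample value (255 for 8 bits).
    header = b"%s\n%d %d\n255\n" % (magic, width, height)
    return header + samples
```

Because PNM samples are stored top-down and uncompressed, the raw bytes can be appended unchanged. This keeps the sketch much simpler than re-encoding to PNG or JPEG.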
