jbig2 extractor does not seem to be working correctly #652
Comments
Unfortunately, pdfimages (part of poppler) is licensed under GPL 3, so we cannot read and translate its code and still maintain this project's MIT license.
Comparing against the output of pdfimages, it looks like for jbig2 images we would produce two files, a jb2g and a jb2e. A user could then use jbig2dec to convert these two files, together, into a pbm or png file. Obviously, it would be better if pdfminer could output a single jb2 file, but I don't know how to construct a valid one. It seems better to output two files that can be used than one file that is invalid, as is the status quo. Thoughts, @pietermarsman?
I ended up figuring out how to put everything together in a single jb2 file, and have submitted a pull request: #653
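For readers wondering what "putting everything together" could involve, here is a minimal sketch, not taken from PR #653. It assumes the usual reading of the JBIG2 and PDF specifications: a JBIG2 stream embedded in a PDF omits the file header and the end-of-page and end-of-file segments, so a standalone file needs them added around the global and page segment data. The function names and the `next_segment_number` parameter are hypothetical; a real implementation would have to parse the existing segments to find the next free segment number.

```python
import struct

# JBIG2 file header ID string, followed by a flags byte and a page count.
FILE_HEADER_ID = b"\x97\x4a\x42\x32\x0d\x0a\x1a\x0a"
FLAG_SEQUENTIAL = 0x01  # bit 0: sequential file organisation

SEG_TYPE_END_OF_PAGE = 49
SEG_TYPE_END_OF_FILE = 51


def empty_segment(number: int, seg_type: int, page: int) -> bytes:
    """Build a segment header with no referred-to segments and no data.

    Layout: segment number (4 bytes), flags/type (1), referred-to count (1),
    page association (1), data length (4), all big-endian.
    """
    return struct.pack(">LBBBL", number, seg_type, 0, page, 0)


def wrap_embedded_stream(
    globals_data: bytes,
    page_data: bytes,
    next_segment_number: int,  # hypothetical: supplied by the caller
    page: int = 1,
) -> bytes:
    """Wrap embedded JBIG2 segment data into a single-page standalone file."""
    header = FILE_HEADER_ID + struct.pack(">BL", FLAG_SEQUENTIAL, 1)
    return (
        header
        + globals_data
        + page_data
        + empty_segment(next_segment_number, SEG_TYPE_END_OF_PAGE, page)
        + empty_segment(next_segment_number + 1, SEG_TYPE_END_OF_FILE, 0)
    )
```

If the result still decodes to a blank image, the segment numbering or page association in the appended segments is the first thing to check.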
@fgregg Thanks for figuring this out! Will review / merge the PR.
The jb2 file that is extracted from a pdf with jbig2 encoded images does not seem correct. This includes the sample file
This produces a blank PNG file: XIPLAYER0.png
The original pdf file is okay, and we can extract the image using pdfimages.
This problem is not due to a mistranslation of @side2k's original PR.
I checked out their original Python 2.7 branch, and the jb2 file that it produces is exactly the same as the one produced at the HEAD of pdfminer.six's develop branch.