Unfortunately I cannot include the PDF as it is a bank statement, but hopefully the details below are enough.
The error is as follows:
```
DEBUG:pdfminer.pdfdocument:trailer={'Size': 70, 'Root': <PDFObjRef:69>, 'Info': <PDFObjRef:3>, 'Encrypt': <PDFObjRef:2>}
INFO:pdfminer.pdfdocument:trailer: {'Size': 70, 'Root': <PDFObjRef:69>, 'Info': <PDFObjRef:3>, 'Encrypt': <PDFObjRef:2>}
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/tools/pdf2txt.py", line 195, in <module>
    sys.exit(main())
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/tools/pdf2txt.py", line 189, in main
    outfp = extract_text(**vars(A))
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/tools/pdf2txt.py", line 57, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/pdfminer/high_level.py", line 79, in extract_text_to_fp
    for page in PDFPage.get_pages(inf,
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/home/alexk/Development/playground/pdfminer_six/pdfminer.six/pdfminer/pdfdocument.py", line 589, in __init__
    self.encryption = (list_value(trailer['ID']),
KeyError: 'ID'
```
The error occurs when the code accesses the 'ID' entry of the file trailer, but, as the DEBUG line in the output above shows, the trailer has no 'ID' entry. Note that 'ID' is listed as optional in the PDF spec: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf#page=88.
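A minimal sketch of the failure, assuming a plain dict stands in for the parsed trailer (the `PDFObjRef` values from the DEBUG line are replaced with placeholder strings, and `read_id_strict` is a hypothetical name). Indexing the missing key raises `KeyError`, which matches the last line of the traceback, while a membership test does not.

```python
# Hypothetical stand-in for the parsed trailer from the DEBUG line;
# the PDFObjRef objects are replaced with placeholder strings.
trailer = {"Size": 70, "Root": "ref:69", "Info": "ref:3", "Encrypt": "ref:2"}


def read_id_strict(trailer):
    """Mimics trailer['ID']: fails when the optional entry is absent."""
    return trailer["ID"]


has_encrypt = "Encrypt" in trailer  # True: the file is encrypted
has_id = "ID" in trailer            # False: the optional ID entry is missing

try:
    read_id_strict(trailer)
    failed = False
except KeyError as exc:
    failed = True
    print("KeyError:", exc)  # prints: KeyError: 'ID'
```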
I managed to work around the issue by making the following change in pdfminer/pdfdocument.py (lines 588-590):
```python
if 'Encrypt' in trailer:
    # self.encryption = (list_value(trailer['ID']),
    self.encryption = (list_value(trailer['ID']) if 'ID' in trailer else [''.encode('utf-8'), ''.encode('utf-8')],
                       dict_value(trailer['Encrypt']))
```
This simply substitutes two empty UTF-8-encoded byte strings for the missing ID. I'm not sure whether this is the right "fix", but it appeared to work in my case.
Since it is "(Optional, but strongly recommended; PDF 1.1)", we should indeed make this more robust by handling the case where the value is missing.
There does not seem to be a sensible default, so using a pair of empty byte strings is OK.
I suggest using `trailer.get('ID', [b'', b''])`.
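A sketch of the suggested `dict.get` approach, written as a standalone helper rather than as a patch to pdfminer (the name `encryption_params` is hypothetical, and a plain dict again stands in for the trailer). It also checks that the original workaround's `''.encode('utf-8')` values equal `b''`, so both versions produce the same fallback.

```python
from typing import Any, Optional, Tuple

# Fallback used when the optional ID entry is absent: two empty byte strings.
DEFAULT_ID = [b"", b""]


def encryption_params(trailer: dict) -> Optional[Tuple[list, Any]]:
    """Hypothetical helper: return (ID array, Encrypt entry) for encrypted files.

    Returns None when the trailer has no Encrypt entry.
    """
    if "Encrypt" not in trailer:
        return None
    # dict.get avoids the KeyError when 'ID' is missing.
    doc_id = trailer.get("ID", DEFAULT_ID)
    # Copy so callers cannot mutate the shared module-level fallback.
    return list(doc_id), trailer["Encrypt"]


# The workaround's value and the suggested default are identical.
assert [''.encode('utf-8'), ''.encode('utf-8')] == DEFAULT_ID

missing = {"Size": 70, "Encrypt": "ref:2"}
present = {"Size": 70, "Encrypt": "ref:2", "ID": [b"\x01\x02", b"\x03\x04"]}
print(encryption_params(missing))       # ([b'', b''], 'ref:2')
print(encryption_params(present))       # ([b'\x01\x02', b'\x03\x04'], 'ref:2')
print(encryption_params({"Size": 70}))  # None
```

Returning a copy of the default keeps the fallback safe to reuse across documents, at the cost of one small list allocation per call.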