Problem
PyMuPDF logs all messages to stdout by default, as a deliberate design decision (see pymupdf/PyMuPDF#3135). This is a problem for us for two reasons:
First, in the document-to-pixels phase, we assume that the stdout of the conversion process contains only the pixels of the rasterized document (plus some page/size info). If a library that we use also logs to stdout, its messages end up inside our pixel stream and corrupt it, and our conversion fails.
This is actually biting us in practice. We have a malformed file (no-pages-types.pdf) in our large test set that causes PyMuPDF to log this error to stdout:
MuPDF error: format error: non-page object in page tree
The error above is mixed with the pixel stream, and Dangerzone fails.
Second, in the pixels-to-PDF phase, we had a case (#700) where PyMuPDF was logging to stdout, the same stream from which we expect to read conversion progress reports in JSON format. We worked around this with Python's contextlib.redirect_stdout(), but we lose error messages this way.
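As an illustration of the trade-off in that workaround, here is a minimal sketch. `noisy_library_call` and `call_with_stdout_redirected` are hypothetical stand-ins, not Dangerzone or PyMuPDF functions. Redirecting to a throwaway buffer keeps stdout clean but hides the message unless someone inspects the buffer, while redirecting to `sys.stderr` keeps it visible.

```python
import contextlib
import io
import sys


def noisy_library_call() -> str:
    # Hypothetical stand-in for a library call that prints a diagnostic
    # to stdout as a side effect.
    print("MuPDF error: format error: non-page object in page tree")
    return "result"


def call_with_stdout_redirected(func, target):
    # Temporarily point sys.stdout at `target` while `func` runs.
    with contextlib.redirect_stdout(target):
        return func()


# Redirect into a buffer: stdout stays clean, but the message is only
# recoverable if the buffer is read afterwards.
discarded = io.StringIO()
value = call_with_stdout_redirected(noisy_library_call, discarded)

# Redirect to stderr instead: stdout stays clean and the message is still shown.
value_again = call_with_stdout_redirected(noisy_library_call, sys.stderr)
```

Note that `contextlib.redirect_stdout()` only replaces Python's `sys.stdout` object, so it does not affect output from subprocesses.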
Solution
A solution to the above problem that works for all configurations is to take advantage of the environment variables that PyMuPDF devs have added to control logging. An important detail here is that we have to set these environment variables in our dangerzone.conversion.* modules, before we import the fitz module, which is ugly, but doable.
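A minimal sketch of that approach follows. It assumes the variables are PYMUPDF_MESSAGE and PYMUPDF_LOG (the names mentioned in the fix for this issue) and that the value `fd:2` selects stderr, which is my reading of the PyMuPDF documentation and should be checked against the installed version. The helper name `route_pymupdf_output_to_stderr` is hypothetical.

```python
import os


def route_pymupdf_output_to_stderr() -> None:
    # Assumption: "fd:2" tells PyMuPDF to write to file descriptor 2 (stderr).
    os.environ["PYMUPDF_MESSAGE"] = "fd:2"
    os.environ["PYMUPDF_LOG"] = "fd:2"


# This must run before the first import of fitz, because the variables
# need to be in place when the module is loaded.
route_pymupdf_output_to_stderr()

try:
    import fitz  # noqa: E402  (deliberately imported after the env setup)
except ImportError:
    # PyMuPDF is not installed here; the sketch still shows the ordering.
    fitz = None
```

Keeping the assignment at the very top of each module that imports fitz makes the ordering requirement explicit, at the cost of an import that linters will flag as out of place.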
PyMuPDF logs to stdout by default, which is problematic because we use
the stdout of the conversion process to read the pixel stream of a
document.
Make PyMuPDF always log to stderr, by setting the following environment
variables: PYMUPDF_MESSAGE and PYMUPDF_LOG.
Fixes #877
Add a test document with an embedded MP4 video that has both an audio
and a video stream. The latest Dangerzone releases could not convert
this type of document, because PyMuPDF wrote this error to the
container's stdout:
MuPDF error: unsupported error: cannot create appearance stream for
Screen annotations
Our client code interpreted this error message as pixel data, parsing
its first few bytes as the page height/width. This resulted in a
misleading Dangerzone error, e.g.:
A page exceeded the maximum height
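To see how a log line can turn into a bogus size error, consider this simplified sketch. The framing (two big-endian 16-bit integers for width and height) and the limit `MAX_PAGE_HEIGHT` are assumptions made for illustration, not Dangerzone's actual protocol or constants.

```python
import io
import struct

MAX_PAGE_HEIGHT = 10000  # hypothetical limit, for illustration only


def read_int(stream) -> int:
    # Read one unsigned 16-bit big-endian integer from the stream.
    data = stream.read(2)
    if len(data) != 2:
        raise ValueError("Unexpected end of stream")
    return int.from_bytes(data, "big")


def read_page_header(stream) -> tuple:
    # Assumed header layout: width, then height.
    width = read_int(stream)
    height = read_int(stream)
    if height > MAX_PAGE_HEIGHT:
        raise ValueError("A page exceeded the maximum height")
    return width, height


# A well-formed header parses as expected.
good = io.BytesIO(struct.pack(">HH", 800, 600))
good_header = read_page_header(good)

# A log line at the start of the stream is parsed as if it were a header:
# b"Mu" becomes the width and b"PD" becomes the height.
bad = io.BytesIO(b"MuPDF error: unsupported error: ...")
try:
    read_page_header(bad)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)
```

Under these assumptions, the bytes `PD` decode to 20548, so the reader reports an oversized page instead of the real problem, which is that the stream never contained a header at all.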
This issue first appeared in 0.6.0, which added streaming support, and
was fixed by commit 3f86e7b. That fix did not include a test document
to guard against this regression, so we add one in this commit.
Refs #877
Closes #917