
Removing Timeouts #687

Closed
deeplow opened this issue Jan 25, 2024 · 3 comments
Labels
timeout Dangerzone Times Out
Milestone

Comments

@deeplow
Contributor

deeplow commented Jan 25, 2024

Background on Timeouts and server async code

According to Micah (the Dangerzone creator), timeouts were originally introduced because some commands in the conversion were hanging. The idea of removing timeouts had already been explored by @yveszoundi in issue #149.

In versions up to 0.5.1, Dangerzone did in fact use multiple binaries to perform the conversion (LibreOffice, pdftoppm/PDFtk, convert, ps2pdf, etc.). However, in 0.6.0 we will replace almost all of these (except LibreOffice) with a native Python module called PyMuPDF.

Because we had always had timeouts, we kept them. In particular, in January 2023, when we replaced PDFtk with pdftoppm, we had to turn our timeout-wrapped subprocess calls into async code with timeouts. We had to do this in order to parse pdftoppm's output (which indicated progress) without blocking it. This particular combination ended up being quite complex for a relatively simple job.
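To illustrate the pattern described above (not Dangerzone's actual code), here is a minimal asyncio sketch of reading a converter's progress output line by line with a per-read timeout, so a hung process doesn't block the reader forever. The command, the helper name, and the 30-second budget are all illustrative assumptions:

```python
import asyncio
import sys


async def convert_with_timeout(cmd: list, per_read_timeout: float = 30.0) -> int:
    """Run `cmd`, echoing its progress lines; kill it if output stalls."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    assert proc.stdout is not None
    while True:
        try:
            # If no output arrives within the timeout, assume the
            # conversion hung and kill it.
            line = await asyncio.wait_for(proc.stdout.readline(), per_read_timeout)
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            raise RuntimeError("conversion timed out")
        if not line:
            break  # EOF: the child closed its output
        print("progress:", line.decode().rstrip())  # parse progress here
    return await proc.wait()


if __name__ == "__main__":
    rc = asyncio.run(
        convert_with_timeout([sys.executable, "-c", "print('page 1/1')"])
    )
    print("exit code:", rc)
```

Even in this reduced form, the interleaving of process management, per-read timeouts, and cleanup hints at the complexity the issue describes.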

The last chapter of the timeout story was when we moved the server-side (i.e., attacker-controlled) timeouts to the client side, so that the attacker wouldn't control them. This, however, proved challenging, particularly in making non-blocking reads work on Windows, as further elaborated in issue #632.

Why remove timeouts?

With the background out of the way, I summarize here the main reasons in favor of timeout removal.

  1. Lost purpose - after implementing page streaming from the container, the only subprocess we have left is LibreOffice, so we no longer run such a big risk of commands hanging (the original reason for timeouts).
  2. Little benefit - predicting execution time is a generally unsolvable problem. Ultimately we were guessing an arbitrary time based on the number of pages and the document size, and as a guess we made it pretty lax (30s per page or MB). A document hanging for this long will probably lead to user frustration in any case, and the user may feel compelled to abort the conversion.
  3. Technical challenges with non-blocking timeouts - there have been several technical challenges in keeping timeouts that we've made efforts to accommodate. A significant one was having to perform non-blocking reads to ensure we could time out while reading the conversion stream.

Reconsider them in the Future

We could consider reintroducing timeouts in the future, through the GUI: detecting when a conversion is taking too long and allowing the user to stop it.
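The GUI-side alternative suggested above could be sketched as a watchdog that merely flags a long-running conversion, leaving the decision to the user rather than killing the process. This is a hypothetical sketch, not Dangerzone's design; the class name and threshold are assumptions:

```python
import threading


class ConversionWatchdog:
    """Fire a callback if a conversion runs longer than a threshold.

    Unlike a hard timeout, this never kills anything: the callback can,
    for example, surface a "This is taking a while... [Stop]" prompt.
    """

    def __init__(self, threshold_s: float, on_slow) -> None:
        self.on_slow = on_slow
        self._done = threading.Event()
        self._timer = threading.Timer(threshold_s, self._fire)

    def _fire(self) -> None:
        if not self._done.is_set():
            self.on_slow()  # e.g. show a cancel button in the UI

    def start(self) -> None:
        self._timer.start()

    def finish(self) -> None:
        """Call when the conversion completes; suppresses the callback."""
        self._done.set()
        self._timer.cancel()


if __name__ == "__main__":
    import time

    w = ConversionWatchdog(0.05, on_slow=lambda: print("conversion is slow"))
    w.start()
    time.sleep(0.15)  # simulate a slow conversion
    w.finish()
```

The key design point is that the policy decision (abort or keep waiting) moves from a hard-coded constant to the person watching the progress bar.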

@deeplow deeplow added the timeout Dangerzone Times Out label Jan 25, 2024
@deeplow deeplow added this to the 0.6.0 milestone Jan 25, 2024
deeplow added a commit that referenced this issue Jan 25, 2024
This reverts commit fea193e.

This is part of the purge of timeout-related code since we no longer
need it [1]. Non-blocking reads were introduced in the reverted commit
in order to be able to cut a stream mid-way due to a timeout. This is
no longer needed now that we're getting rid of timeouts.

[1]: #687
deeplow added a commit that referenced this issue Jan 25, 2024
Remove timeouts due to several reasons:

1. Lost purpose: after implementing the containers page streaming the
   only subprocess we have left is LibreOffice. So don't have such a
   big risk of commands hanging (the original reason for timeouts).

2. Little benefit: predicting execution time is generically unsolvable
   computer science problem. Ultimately we were guessing an arbitrary
   time based on the number of pages and the document size. As a guess
   we made it pretty lax (30s per page or MB). A document hanging for
   this long will probably lead to user frustration in any case and the
   user may be compelled to abort the conversion.

3. Technical Challenges with non-blocking timeout: there have been
several technical challenges in keeping timeouts that we've made effort
to accommodate. A significant one was having to do non-blocking read to
ensure we could timeout when reading conversion stream (and then used
here)

Fixes #687
@yveszoundi

That's great news!

I used a similar approach in my Rust port of Dangerzone, where I replaced "early" shell commands with a combination of Rust libraries to address hanging problems.

  • There were too many parameters involved in reliably predicting abnormal or unusually long executions (pages, graphics, current system resources, etc.).
  • For LibreOffice, I ended up leveraging libreoffice-rs.

@yveszoundi

As a side note, additional hanging issues can occur depending on how IO is handled (i.e., stdout and stderr pipes).

In the early months of my own Dangerzone implementation, I noticed that occasionally the program would randomly "hang".

  • I didn't know if I was misusing the programming language (Rust) or a specific library, or if other variables were at play
    • The problem was happening more on Windows than other operating systems
    • I then realized that document size was not necessarily a factor
  • The fix was ensuring that streams were always consumed, even if I didn't care about contents (while not eof, always_consume_blindly_stdout_and_stderr_piped_output_streams)
  • Examples: run libreoffice process, run container image existence checks, etc.
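The fix described above (always consuming piped output, even when the contents are irrelevant) can be sketched in Python, although the commenter's port is in Rust. A full OS pipe buffer blocks the writing child, so a dedicated drain per stream guarantees the child can always make progress. The helper name is an assumption for illustration:

```python
import subprocess
import threading


def run_draining(cmd: list) -> int:
    """Run `cmd`, discarding its stdout/stderr, and return its exit code.

    Draining both pipes in background threads ensures the child never
    blocks on a full pipe buffer, even for very chatty commands.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    def drain(stream) -> None:
        for _ in stream:  # read until EOF, discarding every line
            pass

    threads = [
        threading.Thread(target=drain, args=(s,))
        for s in (proc.stdout, proc.stderr)
    ]
    for t in threads:
        t.start()
    rc = proc.wait()
    for t in threads:
        t.join()
    return rc
```

When the output is truly never needed, passing `stdout=subprocess.DEVNULL` and `stderr=subprocess.DEVNULL` to `Popen` avoids creating the pipes at all; the draining pattern matters when you sometimes inspect the streams.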

@apyrgio
Contributor

apyrgio commented Jan 29, 2024

Yeap, consuming data in pipes is always something we have in mind, thanks for pointing that out. And as for natively calling LibreOffice, thanks for the pylokit reference, that will prove useful 🙂

@deeplow deeplow closed this as completed in 69c2a02 Feb 7, 2024