Orca integration for static image export #1105
Thanks for the writeup! I'll think about things a bit more deeply, but one idea that came up while reading was: what about baking something into the Jupyter ecosystem? Like a special kernel or something whose lifecycle could be managed by Jupyter?
A potential variant of 2 (use CLI with figure as tmp file), as first written down in plotly/orca#110 (comment), would be a "batch" mode where multiple temp JSON files are saved and then exported at once in a single orca invocation, which I think could be useful for folks writing scripts (not so much for Jupyter Notebook users, I guess).
@etpinard Yeah, if we go with 2, I was picturing some kind of batch context manager API:

```python
from plotly.io import save_image, batch_image

fig = ...
save_image(fig, 'out.png')  # Write temp file and run orca

with batch_image(parallel_limit=4):
    for i in range(100):
        fig = ...
        save_image(fig, 'out%d.png' % i)  # <- write temp file here
# <- Run orca in batch on all temp files when context manager exits
```
Great write-up! I'd vote for Method 3. I have scientific code that wants to save a static figure every iteration of an algorithm, something like once every several seconds. A 2-second overhead to start orca on each call would be a non-starter in that context. But having to manually use a batching context manager, and the associated complications, also feels bad. I think you could even consider just keeping the Orca server running indefinitely after the first API call that requires it, at least as the default. I'm worried about the negative user experience if someone is using Plotly in an interactive REPL and experiences seemingly random 2-second delays as Orca is restarted after being shut down in the background. In what I think would be quite the rare case that the Orca server's memory usage is really problematic, the API could offer a function to manually shut it down.
Thanks for chiming in @malmaud! Yeah, I do want to get as close as possible to that kind of experience.

I'm torn regarding whether or not to auto-shutdown the orca server. In my latest testing on OS X, an orca server process that hasn't done any work yet consumes around 120MB of RAM across three processes (and there's nothing we can do to shrink this any further, given that it's built on Electron). After saving large images the memory usage increases, and it doesn't always decrease back down to baseline right away (garbage collection is up to Electron).

My first cut at this is going to autostart the server on first use, and then shut it down after a configurable period of inactivity.

So, if all goes well, users will be able to configure the auto-shutdown behavior. The remaining question is: what should the default behavior be? If it shuts down automatically, there may be some confusion as to why some image save calls slow down. If it doesn't shut down, there may be some concern that a long-running process is using a bunch of RAM and not doing anything. Sounds like @malmaud would prefer not shutting the server down at all by default (maybe with the option of turning on the timeout behavior if a user wants it). Anyone else care to share a preference one way or the other?
+1 from me. My guess is that we'll get more community/support questions about the image generation time lag than RAM usage, just judging by the savvy of an average beginning Python user / data scientist. I think we should also aim to replicate the
Merged in #1120. Thanks for the discussion, everyone!
Overview
This is a design proposal for the integration of orca into plotly.py, in order to support the programmatic export of high-quality static images.
Related issues:
Background
The programmatic export of static raster and vector images from JavaScript-based data visualization libraries is a notoriously complicated problem. One common solution is to combine selenium with a driver for a headless web browser like phantomjs or headless firefox/chrome. This approach is used by Bokeh and Altair, for example. One challenge with this approach is that it requires the installation of dependencies that are not managed by a Python-environment-friendly package manager like conda (although phantomjs is available through conda, its development has been suspended and it does not support WebGL). This presents challenges in terms of portability and reproducibility.

The plotly.js team has taken a different approach with the Orca project. Orca is a standalone Electron application that can run as a command-line image export tool, or it can run in a server mode and respond to image export requests interactively. Orca is the backbone of the plot.ly image export service, and it was open-sourced earlier this year.
Because Orca can be built into a standalone executable that does not depend on a system web browser, it is possible to package Orca as a conda package, and we've had recent success towards this goal.
This issue is for the discussion of how to build the best plotly.py image export experience on top of Orca.
Goals
Potential Approaches
1. Use command-line interface with figure as arg
The current Python instructions in the Orca README suggest the following usage:
Here the figure is serialized to a JSON string and passed as a command line argument to orca. This is nice because it avoids the need to create a temporary file. Unfortunately, there's a limit to how large the command line arguments can be, and large figures cross that boundary, resulting in an exception.
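For illustration, a minimal sketch of what this approach looks like from Python, assuming the `orca graph <json> -o <file>` command form; the actual subprocess call is left commented out since it requires orca on the PATH:

```python
import json

def export_via_cli_arg(fig_dict, out_path):
    # Serialize the figure and pass it directly as a CLI argument.
    # OS argument-length limits make this fail for large figures.
    fig_json = json.dumps(fig_dict)
    cmd = ["orca", "graph", fig_json, "-o", out_path]
    # subprocess.run(cmd, check=True)  # uncomment if orca is on the PATH
    return cmd

cmd = export_via_cli_arg({"data": [{"y": [1, 2, 1]}]}, "fig.png")
```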
2. Use command-line interface with figure as tmp file
An alternative that doesn't run into this scaling problem is to first write the figure to a temporary file and then call orca with the path to the file. Furthermore, if a collection of figures needs to be converted at once, the paths can all be passed to orca together and orca will convert them in a batch mode. This is much faster on average because the orca executable only has to start up and shut down once per batch, rather than once per figure.
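A rough sketch of the temp-file variant, again assuming an `orca graph` invocation that accepts multiple file paths (the batch call itself is commented out):

```python
import json
import os
import tempfile

def export_via_tmp_files(fig_dicts):
    # Write each figure to its own temp JSON file, then hand all the
    # paths to a single orca invocation so startup cost is paid once.
    paths = []
    for fig in fig_dicts:
        fd, path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(fig, f)
        paths.append(path)
    cmd = ["orca", "graph"] + paths
    # subprocess.run(cmd, check=True)  # uncomment if orca is on the PATH
    return cmd, paths

cmd, paths = export_via_tmp_files(
    [{"data": [{"y": [i, i + 1]}]} for i in range(3)]
)
```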
3. Use orca in server mode
Another approach would be to launch orca as a subprocess in server mode. The Python library would send individual image export requests to the server on an agreed upon port. The server would respond with the byte string of the converted image. This approach has several advantages, but also some increased complexity.
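A sketch of what the Python side of a server-mode request could look like. The request body shape (`{"figure": ..., "format": ...}`) and the port number are assumptions for illustration, not a confirmed protocol; the actual round trip is commented out since it requires a running server:

```python
import json
from urllib import request

def build_export_request(fig_dict, fmt="png", port=9091):
    # POST the serialized figure to a locally running orca server and
    # (in real use) read the converted image bytes from the response.
    body = json.dumps({"figure": fig_dict, "format": fmt}).encode("utf-8")
    return request.Request(
        "http://localhost:%d" % port,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_export_request({"data": [{"y": [1, 2, 1]}]})
# with request.urlopen(req) as resp:  # requires a running orca server
#     image_bytes = resp.read()
```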
3.1 Advantages
Response time: Launching orca as a command-line program or as a server process takes roughly 2 seconds to complete. However, requests to an already-running server process are much faster; I've seen round-trip request-to-response times of under 50ms. 2 seconds is acceptable in the context of exporting figures to images on the filesystem, but it is not acceptable for interactive use as a static backend. 50ms feels as fast as matplotlib.
No temp files: This approach doesn't involve the use of any temporary files, and it makes it much simpler to support the non-file image use cases, like returning a bytes string or PIL.Image.Image object to the user.

3.2 Complications
There are some additional complications to this approach. First, the long-running server process would need to be managed by the Python library. It's too resource-intensive to run all the time by default, so the user would need to start it explicitly, or we would need to start it the first time an export is requested.
Then there's the question of whether we leave the server process running indefinitely, or implement some kind of timeout that shuts the process down after a (configurable) period of inactivity.
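One way the inactivity timeout could work, sketched with a `threading.Timer`; the class and method names here are hypothetical, and the actual subprocess management is reduced to comments:

```python
import threading

class OrcaServerManager:
    # Sketch of a configurable inactivity shutdown: every export call
    # resets a timer, and when the timer fires the server is stopped.
    def __init__(self, timeout_seconds=60):
        self.timeout_seconds = timeout_seconds  # None = run indefinitely
        self.running = False
        self._timer = None

    def ensure_started(self):
        if not self.running:
            self.running = True  # launch the orca subprocess here (~2 s)
        self._reset_timer()

    def _reset_timer(self):
        if self._timer is not None:
            self._timer.cancel()
        if self.timeout_seconds is not None:
            self._timer = threading.Timer(self.timeout_seconds, self.shutdown)
            self._timer.daemon = True
            self._timer.start()

    def shutdown(self):
        if self._timer is not None:
            self._timer.cancel()
        self.running = False  # terminate the orca subprocess here
```

Setting `timeout_seconds=None` would give the "never shut down" default that @malmaud suggested, while still letting users opt in to the timeout.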
Finally, the communication between the Python process and the server requires an open local port, so there's the potential for restrictive firewalls to be a problem. (But, on the other hand, this is also true of the Jupyter Notebook and most applications that interact with an ipython kernel.)
What's next
Next we're going to work on testing and releasing conda packages for orca version 1.1.0.
Method 2 above (temp files) is probably the least risky approach, but I really want the advantages that come with Method 3 (server process), so I'd like to give this a shot first. I've already developed a prototype of the server mode approach, with automatic startup and timeout shutdown, and I have it working on OS X, Linux, and Windows. So far I've found it to be very reliable, and the responsiveness is really exciting.
So, I'm quite hopeful that we'll be able to build a solid user experience on top of the server mode. But I would like to hear some other perspectives here.
@chriddyp @jackparmer @cldougl @nicolaskruchten @etpinard @Kully