Free Python GIL on blocking operations to allow multi-threading runtime usage without deadlocks #387

Merged (1 commit) on Sep 6, 2023

jonmmease (Collaborator) commented on Sep 6, 2023

Previously we forced the use of a single-threaded tokio runtime whenever a Python Datasource was in use. This was done to work around deadlocks, but it has the unfortunate side effect of disabling multi-threaded parallelization of queries in these situations. This PR applies the technique discussed in PyO3/pyo3#2182: it uses the PyO3 allow_threads construct to release the Python GIL before performing blocking operations that may themselves need to acquire the GIL in separate threads.
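To illustrate why releasing the lock before blocking matters, here is a minimal std-only sketch. It is not the code from this PR and it does not use PyO3: a plain `Mutex` stands in for the GIL, and the helper `allow_threads` below only mimics the shape of the PyO3 construct (release, run the blocking work, re-acquire). The names `Gil`, `fetch_with_gil`, and `run_query` are hypothetical.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Stand-in for the Python GIL (an assumption for illustration only).
type Gil = Mutex<()>;

/// Mimics the shape of PyO3's `allow_threads`: release the lock, run the
/// blocking closure, then re-acquire the lock before returning.
fn allow_threads<'a, T>(
    gil: &'a Gil,
    guard: MutexGuard<'a, ()>,
    blocking: impl FnOnce() -> T,
) -> (MutexGuard<'a, ()>, T) {
    drop(guard); // release the "GIL" so other threads can take it
    let out = blocking();
    (gil.lock().unwrap(), out) // re-acquire before handing control back
}

/// Hypothetical worker: like a Python datasource callback, it must hold
/// the "GIL" while it runs.
fn fetch_with_gil(gil: &Gil, value: i32) -> i32 {
    let _guard = gil.lock().unwrap();
    value * 2
}

/// Spawns one worker thread per input and blocks until all of them finish.
/// Joining while still holding the lock would deadlock, because every
/// worker waits for that same lock.
fn run_query(gil: &Arc<Gil>, inputs: &[i32]) -> Vec<i32> {
    let guard = gil.lock().unwrap(); // the caller starts out holding the "GIL"
    let (_guard, results) = allow_threads(gil, guard, || {
        let handles: Vec<_> = inputs
            .iter()
            .map(|&v| {
                let gil = Arc::clone(gil);
                thread::spawn(move || fetch_with_gil(&gil, v))
            })
            .collect();
        // Blocking join: safe only because the lock was released above.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<i32>>()
    });
    results
}
```

If the `drop(guard)` step were skipped, the `join` calls would wait on workers that are themselves waiting on the lock held by the joining thread, which is the kind of deadlock the single-threaded runtime was working around.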

It turns out that the DuckDbDatasource still requires running on the main thread in order to access the kernel's top-level DataFrames, so I made the main thread behavior configurable per datasource, where the default is to maintain the prior behavior of running on the main thread.
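The per-datasource setting could be pictured roughly as follows. This is a sketch under assumptions, not the actual VegaFusion API: the trait, the method name `main_thread`, the two example types, and the mapping from the flag to a runtime kind are all hypothetical.

```rust
/// Hypothetical datasource trait; the method name is an assumption.
trait Datasource {
    /// Whether this datasource must run on the main thread.
    /// The default keeps the prior behavior.
    fn main_thread(&self) -> bool {
        true
    }
}

/// Stand-in for a datasource that needs the main thread (as DuckDB does).
struct DuckDbLike;
impl Datasource for DuckDbLike {}

/// Stand-in for a datasource that opts out of the main-thread requirement.
struct ThreadSafeSource;
impl Datasource for ThreadSafeSource {
    fn main_thread(&self) -> bool {
        false
    }
}

/// Hypothetical runtime choice derived from the flag.
#[derive(Debug, PartialEq)]
enum RuntimeKind {
    SingleThreaded,
    MultiThreaded,
}

/// Picks the runtime kind for a datasource based on its flag.
fn choose_runtime(ds: &dyn Datasource) -> RuntimeKind {
    if ds.main_thread() {
        RuntimeKind::SingleThreaded
    } else {
        RuntimeKind::MultiThreaded
    }
}
```

Making the restrictive option the default means existing datasources keep working unchanged, and only implementations that are known to be safe off the main thread need to opt in.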

I started thinking about this again as a result of the discussion in #386. With these changes, it should be possible to write a __dataframe__ protocol-based VegaFusion Datasource that implements a custom DataFusion datasource without requiring everything to run on the main thread.

This avoids deadlocks when using the multithreaded runtime with Python data sources. The Python datasource implementation now has control over whether it must be run on the main thread (which duckdb requires).
@jonmmease jonmmease changed the title free Python GIL on blocking operations to allow multi-threading runtime usage without deadlocks Free Python GIL on blocking operations to allow multi-threading runtime usage without deadlocks Sep 6, 2023
@jonmmease jonmmease merged commit 7127d04 into main Sep 6, 2023