Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(async): set default concurrency to 10 tasks #2256

Merged
merged 3 commits into from
Sep 26, 2024

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Sep 25, 2024

Unbounded concurrency is not a good idea. This sets the default to 10.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@dcherian
Copy link
Contributor

I don't have an opinion but this is what the threading module does

Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.

@jhamman
Copy link
Member Author

jhamman commented Sep 25, 2024

There are really two issues here.

  1. how many threads we try to use at once, and
  2. how many concurrent IO operations we attempt

Right now, we are using a single config for both of these but in the future, we may consider allowing users to configure the size of the thread pool and/or multi-threading behavior separately.

@jhamman jhamman force-pushed the fix/default-concurrency branch from 4ed330c to 5e2075b Compare September 25, 2024 22:13
@jhamman jhamman force-pushed the fix/default-concurrency branch from 5e2075b to 818fc73 Compare September 25, 2024 22:15
@dcherian
Copy link
Contributor

But perhaps the default should scale with cpu_count like what threading does: min(32, os.cpu_count() + 4). If we are making arbitrary choices, perhaps we just go with what the standard library does.

@jhamman
Copy link
Member Author

jhamman commented Sep 25, 2024

But perhaps the default should scale with cpu_count like what threading does: min(32, os.cpu_count() + 4). If we are making arbitrary choices, perhaps we just go with what the standard library does.

I hear what you are saying. But this applies to both IO and tasks that use to_thread. I would like us to revisit the way we distribute work to IO and threaded tasks another day.

@jhamman jhamman added this to the 3.0.0 milestone Sep 26, 2024
@jhamman jhamman merged commit 4515671 into zarr-developers:v3 Sep 26, 2024
26 checks passed
dcherian added a commit to dcherian/zarr-python that referenced this pull request Sep 27, 2024
* v3: (21 commits)
  Default zarr.open to open_group if shape is not provided (zarr-developers#2158)
  feat: metadata-only support for storage transformers metadata (zarr-developers#2180)
  fix(async): set default concurrency to 10 tasks (zarr-developers#2256)
  chore(deps): drop support for python 3.10 and numpy 1.24 (zarr-developers#2217)
  feature(store): add LoggingStore wrapper (zarr-developers#2231)
  Apply assorted ruff/flake8-simplify rules (SIM) (zarr-developers#2259)
  Add array storage helpers (zarr-developers#2065)
  Apply ruff/flake8-annotations rule ANN204 (zarr-developers#2258)
  No need to run DeepSource any more - we use ruff (zarr-developers#2261)
  Remove unnecessary lambda expression (zarr-developers#2260)
  Enforce ruff/flake8-comprehensions rules (C4) (zarr-developers#2239)
  Use `map(str, *)` in `test_accessed_chunks` (zarr-developers#2229)
  Replace Gitter with Zulip (zarr-developers#2254)
  Enforce ruff/flake8-pytest-style rules (PT) (zarr-developers#2236)
  Fix multiple identical imports (zarr-developers#2241)
  Enforce ruff/flake8-return rules (RET) (zarr-developers#2237)
  Enforce ruff/flynt rules (FLY) (zarr-developers#2240)
  Fix fill_value handling for complex dtypes (zarr-developers#2200)
  Update V2 codec pipeline to use concrete classes (zarr-developers#2244)
  Apply and enforce more ruff rules (zarr-developers#2053)
  ...
@jhamman jhamman added the V3 label Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants