-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix usage of map_blocks in AreaWeighted and elsewhere #5767
Fix usage of map_blocks in AreaWeighted and elsewhere #5767
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5767 +/- ##
=======================================
Coverage 89.74% 89.74%
=======================================
Files 92 92
Lines 22940 22942 +2
Branches 5462 5464 +2
=======================================
+ Hits 20588 20590 +2
Misses 1620 1620
Partials 732 732 ☔ View full report in Codecov by Sentry. |
Performance Benchmark Report: eb08de2Performance shifts
Full benchmark results
Generated by GHA run |
Performance Benchmark Report: d37cde8Performance shifts
Full benchmark results
Generated by GHA run |
for more information, see https://pre-commit.ci
While I doubt it actually helps here, it's interesting to note that this problem may have similarities to #5753 What's similar is the way the operations are presented to Dask as a combination of tasks : However, in that case, it seems more like dask is unnecessarily assembling a complete array in memory when both operations are entirely chunk-able with the same chunks. What makes me think that the save problem is not quite the same problem as here, is that in that case the memory use is recovered at the end of the operation. |
@pp-mo Yes, that sounds like a slightly different issue. In the case of regridding, it doesn't seem like there is any problem with the data that is being realised by dask, the problem is with the data that lives in the partial function already realised and which isn't being thrown away when we would expect it to. |
Yes, that's a telling way of looking at it. Those really are quite different things. |
@bouweandela suggests a possible future implementation that could be more efficient: making the weights array always a Dask array, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion, I'm now happy with this.
I re-ran a performance test example, and it does seem to be working (process memory usage no longer grows with a repeating large regrid operation).
…ciTools#5767) * fix usage of map_blocks * fix map_blocks for non-lazy data * add benchmark * unskip benchmark * add benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove benchmarks * remove unnecessary import * What's New entry. * map_complete_blocks docstring. * map_complete_blocks returns. * Typo. * Typo. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Martin Yeo <[email protected]>
…ciTools#5767) * fix usage of map_blocks * fix map_blocks for non-lazy data * add benchmark * unskip benchmark * add benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove benchmarks * remove unnecessary import * What's New entry. * map_complete_blocks docstring. * map_complete_blocks returns. * Typo. * Typo. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Martin Yeo <[email protected]>
* fix usage of map_blocks * fix map_blocks for non-lazy data * add benchmark * unskip benchmark * add benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove benchmarks * remove unnecessary import * What's New entry. * map_complete_blocks docstring. * map_complete_blocks returns. * Typo. * Typo. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Martin Yeo <[email protected]>
* fix usage of map_blocks * fix map_blocks for non-lazy data * add benchmark * unskip benchmark * add benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove benchmarks * remove unnecessary import * What's New entry. * map_complete_blocks docstring. * map_complete_blocks returns. * Typo. * Typo. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Martin Yeo <[email protected]>
* fix usage of map_blocks * fix map_blocks for non-lazy data * add benchmark * unskip benchmark * add benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove benchmarks * remove unnecessary import * What's New entry. * map_complete_blocks docstring. * map_complete_blocks returns. * Typo. * Typo. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Martin Yeo <[email protected]>
…th_numpydoc * upstream/main: Bump scitools/workflows from 2024.03.0 to 2024.03.1 Bump scitools/workflows from 2024.02.2 to 2024.03.0 Fix usage of map_blocks in AreaWeighted and elsewhere (SciTools#5767) Lazy rolling_window (SciTools#5775) whatsnew update for 3.8.0 (SciTools#5793) Bump scitools/workflows from 2024.02.1 to 2024.02.2 (SciTools#5792) DOCS: Add whatsnew for recent PRs (SciTools#5789)
* 'geo-bridge' of github.com:HGWright/iris: (41 commits) Updated lock files. [pre-commit.ci] auto fixes from pre-commit.com hooks Advertise structured_um_loading performance regression. [pre-commit.ci] pre-commit autoupdate Updated environment lockfiles [pre-commit.ci] auto fixes from pre-commit.com hooks [pre-commit.ci] pre-commit autoupdate Make the bugfix panel OPEN. What's New patch. What's New patch. Bump scitools/workflows from 2024.03.1 to 2024.03.3 Bump scitools/workflows from 2024.03.0 to 2024.03.1 Bump scitools/workflows from 2024.02.2 to 2024.03.0 Fix usage of map_blocks in AreaWeighted and elsewhere (SciTools#5767) Lazy rolling_window (SciTools#5775) whatsnew update for 3.8.0 (SciTools#5793) Bump scitools/workflows from 2024.02.1 to 2024.02.2 (SciTools#5792) DOCS: Add whatsnew for recent PRs (SciTools#5789) ASV custom build command and file-based benchmark triggers (SciTools#5776) DOCS: Enable numpydoc validation pre-commit hook (SciTools#5762) ...
🚀 Pull Request
Addresses a bug raised by @hdyson in
AreaWeighted
which caused weights to be retained in memory even after all relevent python objects had been removed. This appears to be due to passing weights todask.array.map_blocks
as part of a partial function. It seems that while dask will remove arrays from memory after use it will not do the same for python functions. Passing weights as arguments tomap_blocks
ought to fix this.