Rearrangement of raster.delayed to prepare for tiled reprojection implementation with multiprocessing #655
base: main
Conversation
No comments on this, super! 😁 🚀 Looking ahead at the next step where we'll have to mirror the main functions of The best is probably to start implementing those main functions in
You will need an `__init__.py` file in tests/test_raster/test_distributing_computing ---> okay, the init file is missing in every folder, so don't look at my comment (I can't delete it)
Moving these functions could affect the memory usage by altering cache usage or failing to properly clean up memory, but I'm not exactly sure. Do you have any idea?
01d8c52 to efc51b5
@adebardo Not sure I understand, you mean the changes of this PR could negatively affect cache/memory this way? We have memory testing implemented in Sometimes there is an error raised because the cluster does not close properly when using
Haaaaa and that's why the tests are failing in this CI, we need to update the name of the new test module Line 23 in 83d9980
Then the tests should never fail (but there are still some cluster teardown messages printed that can pollute the
efc51b5 to 6091983
@vschaffn @adebardo I went through the code changes again... Hard to grasp why the memory usage would increase. Maybe splitting the code into subfunctions simply creates more links to save in the Dask graph, and this builds up a few more tens of MB over all chunks and loops. If we check that memory usage before the PR changes was very close to the ~100 MB test limit (it is now slightly above, at 120 MB), then we could simply increase the test threshold by multiplying it by 1.5 or something.
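For illustration only, a minimal sketch of what relaxing the threshold could look like. The ~100 MB limit and the 1.5 factor come from the comment above; everything else is an assumption: `peak_memory_mb` and `allocate_about_10_mb` are hypothetical names, and the standard-library `tracemalloc` is a stand-in for whatever memory measurement the test suite actually uses (it only tracks Python-level allocations in the current process, not memory held by Dask workers).

```python
import tracemalloc

BASE_LIMIT_MB = 100   # test limit mentioned in the discussion
SAFETY_FACTOR = 1.5   # proposed multiplier


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return the peak traced Python memory in MB (hypothetical helper)."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6


def allocate_about_10_mb():
    """Stand-in workload: allocate roughly 10 MB."""
    return bytearray(10_000_000)


peak = peak_memory_mb(allocate_about_10_mb)
limit_mb = BASE_LIMIT_MB * SAFETY_FACTOR  # 150 MB instead of 100 MB

# The check a memory test would make against the relaxed threshold.
assert peak < limit_mb
```

Scaling a named base limit keeps the original 100 MB figure visible in the test, so the margin can be tightened again if the overhead is later explained.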
Resolves #647.
Context
The aim of this ticket is to separate the functions in `raster.delayed` that use `dask` from those that do not. The functions that do not use `dask` will form a common basis for tiled reprojection with both `dask` and `multiprocessing`, which will be introduced in a future ticket (#648).

Changes
- Added `raster.distributed_computing` to organize the new delayed structure of `raster`.
- Added `raster.distributed_computing.delayed_multiproc` to prepare the next implementations.
- Functions of `raster.delayed` that do not use dask have been moved to `raster.distributed_computing.delayed_utils`.
- Functions of `raster.delayed` that use dask have been moved to `raster.distributed_computing.delayed_dask`.
- In `_get_block_ids_per_chunk`, `cached_cumsum` has been replaced by `np.cumsum` to avoid using dask and to keep the function common to both `dask` and `multiprocessing`.

Tests
- `test_raster.delayed` has been updated in compliance with the new `raster.distributed_computing` directory.
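
To illustrate the `np.cumsum` change listed under Changes, here is a minimal sketch of deriving per-block extents from chunk sizes without importing dask. It is not the actual `_get_block_ids_per_chunk` implementation: the function name `block_extents`, the dictionary keys, and the input layout (a tuple of row chunk sizes and column chunk sizes, as in a 2D `chunks` tuple) are assumptions made for the example.

```python
import numpy as np


def block_extents(chunks):
    """Return one dict per 2D block with its id and pixel extent (hypothetical helper).

    chunks: (row_sizes, col_sizes), e.g. ((3, 3, 2), (4, 4)).
    """
    row_sizes, col_sizes = chunks
    # Cumulative sums with a leading 0 give the start/end index of every chunk.
    row_edges = np.cumsum((0,) + tuple(row_sizes))
    col_edges = np.cumsum((0,) + tuple(col_sizes))

    blocks = []
    for i in range(len(row_sizes)):
        for j in range(len(col_sizes)):
            blocks.append(
                {
                    "block_id": i * len(col_sizes) + j,
                    "ys": int(row_edges[i]),
                    "ye": int(row_edges[i + 1]),
                    "xs": int(col_edges[j]),
                    "xe": int(col_edges[j + 1]),
                }
            )
    return blocks


blocks = block_extents(((3, 3, 2), (4, 4)))
```

Because the helper depends only on NumPy, the same block list could be consumed by a dask graph or handed to a `multiprocessing` pool.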