[POC] first try of multiprocessing for reprojection 1/2 #647

Open
4 tasks
adebardo opened this issue Feb 6, 2025 · 1 comment · May be fixed by #655

Comments

adebardo commented Feb 6, 2025

Ticket No. 1

Context

The goal of this ticket is to implement tiled reprojection so that "heavy" datasets, such as 40,000 x 40,000 pixel CARS DSMs, can be processed.
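
To give a sense of scale, the short sketch below estimates the in-memory size of such a raster and the number of tiles a tiled approach would have to process. The float32 data type and the 2,048-pixel tile size are assumptions chosen for illustration; neither is specified in this ticket.

```python
import math


def raster_size_bytes(height: int, width: int, bytes_per_pixel: int) -> int:
    """Size of a single-band raster held fully in memory."""
    return height * width * bytes_per_pixel


def tile_count(height: int, width: int, tile_size: int) -> int:
    """Number of square tiles needed to cover the raster (edge tiles may be smaller)."""
    return math.ceil(height / tile_size) * math.ceil(width / tile_size)


# Assumed for illustration: float32 pixels (4 bytes) and 2048 x 2048 tiles.
full = raster_size_bytes(40_000, 40_000, 4)
tile = raster_size_bytes(2048, 2048, 4)
n_tiles = tile_count(40_000, 40_000, 2048)

print(f"full raster: {full / 1e9:.1f} GB")  # 6.4 GB
print(f"one tile:    {tile / 1e6:.1f} MB")  # 16.8 MB
print(f"tiles:       {n_tiles}")            # 400
```

Under these assumptions a single band is about 6.4 GB when loaded whole, while each tile stays under 17 MB.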

Proposal

This need was anticipated with the creation of [delayed.py](https://github.com/GlacioHack/geoutils/blob/main/geoutils/raster/delayed.py). Unfortunately, that file relies on Dask for distributed computing. Based on past experience, the CS 3D development team does not feel comfortable maintaining such a component for ICC needs, for the following reasons:

  • Memory management issues
  • Significant time lost on debugging and maintenance
  • Dask is effective only when the underlying objects are designed with its philosophy in mind; getting there would require several months of development

For these reasons, we propose implementing a delayed_multiproc module as an alternative to the current delayed implementation.
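
As a rough illustration of what a multiprocessing-based alternative could look like, the sketch below splits a raster grid into tiles and maps a per-tile function over a pool of worker processes from Python's standard `multiprocessing` module. All names (`Tile`, `generate_tiles`, `process_tile`, `map_over_tiles`) are hypothetical and not part of geoutils; `process_tile` is a placeholder standing in for reading, reprojecting, and writing a single tile.

```python
from multiprocessing import Pool
from typing import Callable, Iterator, List, NamedTuple, Sequence, TypeVar

T = TypeVar("T")


class Tile(NamedTuple):
    """Half-open pixel window [row_start, row_stop) x [col_start, col_stop)."""

    row_start: int
    row_stop: int
    col_start: int
    col_stop: int


def generate_tiles(height: int, width: int, tile_size: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering a height x width grid, row by row."""
    for r in range(0, height, tile_size):
        for c in range(0, width, tile_size):
            yield Tile(r, min(r + tile_size, height), c, min(c + tile_size, width))


def process_tile(tile: Tile) -> int:
    """Placeholder for the per-tile work (hypothetical); it only counts pixels."""
    return (tile.row_stop - tile.row_start) * (tile.col_stop - tile.col_start)


def map_over_tiles(
    func: Callable[[Tile], T], tiles: Sequence[Tile], n_workers: int
) -> List[T]:
    """Apply func to every tile in worker processes, preserving tile order.

    func must be picklable, i.e. defined at module level.
    """
    with Pool(processes=n_workers) as pool:
        return pool.map(func, tiles)
```

Called as `map_over_tiles(process_tile, list(generate_tiles(40_000, 40_000, 2048)), n_workers=4)` under an `if __name__ == "__main__":` guard, this returns one result per tile. A real reprojection would likely also need some overlap between neighbouring tiles so that resampling near tile edges can see the surrounding pixels, which this sketch leaves out.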

Implementation

To achieve this, we propose the following solution:

  • Research and documentation on Dask / multiprocessing
  • Update project structure:
    • In the raster directory: create a new distributed_computing folder
    • Inside raster/distributed_computing:
      • Move and rename delayed.py to delayed_utils.py
      • Add delayed_dask.py
      • Add delayed_multiproc.py
  • Move all functions from delayed.py that do not depend on Dask to delayed_utils.py, as they will be used in both new files
  • Move all functions dependent on dask.delayed to delayed_dask.py

Tests

Update test cases involving Dask.

Documentation

Update the documentation accordingly.

/estimate 5d

adebardo changed the title from "[POC]" to "[POC] first try of multiprocessing for reprojection" on Feb 6, 2025
adebardo changed the title from "[POC] first try of multiprocessing for reprojection" to "[POC] first try of multiprocessing for reprojection 1/2" on Feb 6, 2025

adehecq (Member) commented Feb 11, 2025

Sounds good to me! I would just rename the calcul_distribue folder to an English equivalent. Maybe "distributed_computing"?
