Ticket No. 1
Context
The goal of this ticket is to implement tiled reprojection to facilitate the processing of "heavy" datasets, such as a 40,000 x 40,000 pixel DSM produced by CARS.
Proposal
We observed that this need was anticipated when the file [delayed.py](https://github.com/GlacioHack/geoutils/blob/main/geoutils/raster/delayed.py) was created. However, that file relies on Dask, a library for parallel and distributed computing. Based on past experience, the CS 3D development team is not comfortable maintaining such a component for ICC needs, for the following reasons:
- Memory management issues
- Significant time lost on debugging and maintenance
- Dask is effective only when the underlying objects are designed with its philosophy in mind, which would require several months of development
For these reasons, we propose implementing a delayed_multiproc module as an alternative to the current delayed implementation.
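To make the idea concrete, here is a minimal sketch of tile-based processing with the standard-library `multiprocessing` module. It is not the proposed `delayed_multiproc` implementation: the names `tile_windows`, `process_tile` and `map_tiles` are hypothetical, and the worker is a placeholder that only counts pixels where a real one would read, reproject and write one tile.

```python
from multiprocessing import Pool
from typing import Callable, Iterator, List, Tuple

# (row_start, row_stop, col_start, col_stop), stop values are exclusive
Window = Tuple[int, int, int, int]


def tile_windows(n_rows: int, n_cols: int, tile_size: int) -> Iterator[Window]:
    """Yield non-overlapping windows that together cover an n_rows x n_cols grid."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for r in range(0, n_rows, tile_size):
        for c in range(0, n_cols, tile_size):
            # Edge tiles are clipped to the grid bounds.
            yield (r, min(r + tile_size, n_rows), c, min(c + tile_size, n_cols))


def process_tile(window: Window) -> Tuple[Window, int]:
    """Hypothetical placeholder worker: returns the pixel count of the tile.

    A real worker would load only this window, reproject it, and write the result.
    """
    r0, r1, c0, c1 = window
    return window, (r1 - r0) * (c1 - c0)


def map_tiles(
    worker: Callable[[Window], Tuple[Window, int]],
    n_rows: int,
    n_cols: int,
    tile_size: int,
    n_workers: int = 1,
) -> List[Tuple[Window, int]]:
    """Apply `worker` to every tile, serially or in a process pool."""
    windows = list(tile_windows(n_rows, n_cols, tile_size))
    if n_workers == 1:
        # Serial path: easier to debug, no process start-up cost.
        return [worker(w) for w in windows]
    with Pool(processes=n_workers) as pool:
        # Pool.map preserves the order of `windows` in the returned list.
        return pool.map(worker, windows)
```

For a 40,000 x 40,000 raster, a call such as `map_tiles(process_tile, 40_000, 40_000, tile_size=4_096, n_workers=4)` would dispatch the tiles to four worker processes; it should be made under an `if __name__ == "__main__":` guard so that it also works with the spawn start method. A real reprojection would probably also need each tile to be read with a margin around it, since resampling uses neighbouring pixels; that is left out here.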
Implementation
To achieve this, we propose the following solution:
- Research and documentation on Dask / multiprocessing
- Update project structure:
  - In the `raster` directory: create a new `distributed_computing` folder
  - Inside `raster/distributed_computing`:
    - Move and rename `delayed.py` to `delayed_utils.py`
    - Add `delayed_dask.py`
    - Add `delayed_multiproc.py`
  - Move all functions from `delayed.py` that do not depend on Dask to `delayed_utils.py`, as they will be used in both new files
  - Move all functions dependent on `dask.delayed` to `delayed_dask.py`
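For illustration, the steps above would give roughly the following layout. This assumes the new folder is named `distributed_computing` and sits under `geoutils/raster/` (the location of the current `delayed.py`); other files in `raster` are omitted.

```text
geoutils/raster/
└── distributed_computing/
    ├── delayed_utils.py      # Dask-independent helpers shared by both backends
    ├── delayed_dask.py       # functions that depend on dask.delayed
    └── delayed_multiproc.py  # new multiprocessing-based alternative
```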
Tests
Update test cases involving Dask.
Documentation
Update the documentation accordingly.
/estimate 5d
adebardo changed the title from "[POC]" to "[POC] first try of multiprocessing for reprojection" on Feb 6, 2025
adebardo changed the title from "[POC] first try of multiprocessing for reprojection" to "[POC] first try of multiprocessing for reprojection 1/2" on Feb 6, 2025