[POC] Tiling for multiprocessing #649

Closed
1 task
adebardo opened this issue Feb 6, 2025 · 2 comments · Fixed by #652

Comments


adebardo commented Feb 6, 2025

Context

To enable multiprocessing in xDEM for handling large datasets, it is necessary to implement data tiling at the raster level. This is why we are proposing this ticket in GeoUtils.

Implementation

The idea is to create the same API as the reproject function at the end of this raster file:

tiling_grid, new_shape = raster_sec.compute_tiling(tile_size, raster_ref)

and integrate the code developed in the internally proposed proof of concept (PoC):

import math

import numpy as np


def generate_tiling_grid(
    row_min: float,
    col_min: float,
    row_max: float,
    col_max: float,
    row_split: int,
    col_split: int,
    overlap: int = 0,
) -> np.ndarray:
    """
    Generate a grid of positions by splitting [row_min, row_max] x
    [col_min, col_max] into segments of size row_split x col_split.

    :param row_min: Minimum row index of the bounding box to split
    :param col_min: Minimum column index of the bounding box to split
    :param row_max: Maximum row index of the bounding box to split
    :param col_max: Maximum column index of the bounding box to split
    :param row_split: Height of each split
    :param col_split: Width of each split
    :param overlap: Size of the overlap between adjacent tiles
    :return: A numpy array grid with splits in two dimensions (0: row, 1: column),
             where each cell contains [row_min, row_max, col_min, col_max].
    """
    # Calculate the number of splits considering overlap
    nb_col_split = math.ceil((col_max - col_min) / (col_split - overlap))
    nb_row_split = math.ceil((row_max - row_min) / (row_split - overlap))

    # Initialize the output grid
    out_grid = np.empty(shape=(nb_row_split, nb_col_split, 4), dtype=float)

    for row in range(nb_row_split):
        for col in range(nb_col_split):
            row_start = row_min + row * (row_split - overlap)
            col_start = col_min + col * (col_split - overlap)
            out_grid[row, col, 0] = row_start
            out_grid[row, col, 1] = min(row_max, row_start + row_split)
            out_grid[row, col, 2] = col_start
            out_grid[row, col, 3] = min(col_max, col_start + col_split)

    return out_grid


def compute_tiling(self, tile_size, dem):
    # Get the shapes of both rasters
    shape_ref = self.shape
    shape_sec = dem.shape
    # The PoC does not compute a new shape yet, so None is returned for it
    shape = None
    if shape_ref != shape_sec:
        raise ValueError("Reference and secondary rasters do not have the same shape")

    # Generate tiling
    tiling_grid = generate_tiling_grid(0, 0, shape_sec[0], shape_sec[1], tile_size, tile_size)

    return tiling_grid, shape
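
To make the expected output concrete, here is a small worked example of the grid for a 5 x 7 raster with 3 x 3 tiles, with and without overlap. It is only a sketch: the function body is a condensed copy of the PoC logic, and it uses an integer dtype (an assumption, the PoC uses float) so the bounds can be used directly as slice indices.

```python
import math

import numpy as np


def generate_tiling_grid(row_min, col_min, row_max, col_max, row_split, col_split, overlap=0):
    # Condensed copy of the PoC logic; integer dtype is an assumption made for this example
    nb_row = math.ceil((row_max - row_min) / (row_split - overlap))
    nb_col = math.ceil((col_max - col_min) / (col_split - overlap))
    grid = np.empty((nb_row, nb_col, 4), dtype=int)
    for r in range(nb_row):
        for c in range(nb_col):
            r0 = row_min + r * (row_split - overlap)
            c0 = col_min + c * (col_split - overlap)
            grid[r, c] = [r0, min(row_max, r0 + row_split), c0, min(col_max, c0 + col_split)]
    return grid


# 5 x 7 raster, 3 x 3 tiles, no overlap: 2 x 3 tiles
grid = generate_tiling_grid(0, 0, 5, 7, 3, 3)
print(grid.shape)           # (2, 3, 4)
print(grid[1, 2].tolist())  # [3, 5, 6, 7] -> last tile is clipped to the raster extent

# Same raster with a 1-pixel overlap: the stride becomes 2, giving 3 x 4 tiles
grid_ov = generate_tiling_grid(0, 0, 5, 7, 3, 3, overlap=1)
print(grid_ov.shape)                    # (3, 4, 4)
print(grid_ov[:, 0, :2].tolist())       # row bounds: [[0, 3], [2, 5], [4, 5]]
```

Note that in the overlap case the last row of tiles (rows 4 to 5) is already fully covered by the previous one (rows 2 to 5). It may be worth deciding whether such redundant edge tiles should be kept. Also, `overlap` must stay smaller than the tile size, otherwise the division fails or yields a negative count.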

Overlapping Management

Overlap between tiles is not handled yet, although it is essential for some use cases. Since the optimal overlap size has not been determined, we will simply expose it as an option for future use.

Tests

  • Implement unit tests
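
One property the unit tests could check is that the tiles cover every pixel of the raster and never exceed its bounds. Below is a minimal sketch of such a check. `check_full_coverage` is a hypothetical helper, not an existing GeoUtils function, and it assumes the `[row_min, row_max, col_min, col_max]` cell layout described in the docstring, with exclusive upper bounds.

```python
import numpy as np


def check_full_coverage(grid, n_rows, n_cols):
    """Return True if the tiles stay within bounds and cover every pixel at least once."""
    counts = np.zeros((n_rows, n_cols), dtype=int)
    # Flatten the (rows, cols, 4) grid into a list of tiles
    for r0, r1, c0, c1 in grid.reshape(-1, 4).astype(int):
        if r0 < 0 or c0 < 0 or r1 > n_rows or c1 > n_cols:
            return False
        counts[r0:r1, c0:c1] += 1
    return bool((counts >= 1).all())


# Hand-written 2 x 2 grid for a 4 x 6 raster
grid = np.array([
    [[0, 2, 0, 3], [0, 2, 3, 6]],
    [[2, 4, 0, 3], [2, 4, 3, 6]],
])
assert check_full_coverage(grid, 4, 6)
# Dropping the second row of tiles leaves rows 2-3 uncovered
assert not check_full_coverage(grid[:1], 4, 6)
```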
Member

adehecq commented Feb 11, 2025

This makes sense. Just a few remarks:

  • note that all calls to rasterio functions can be replaced by using the Raster class, which also loads the metadata needed (shape, transform) without loading data. But it's not mandatory.
  • at this stage, I do not understand why there are 2 optional rasters as input to compute_tiling
  • regarding overlapping, yes it will be necessary. We were discussing it with @rhugonnet yesterday and it will be quite important, especially for reprojection but also for coregistration, when DEM resampling takes place. Of course, the optimal size of the overlap is ideally adjusted for each situation. But I would suggest already including an overlap argument to define the size of the overlap and take it into account when defining the tiles.
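
To illustrate why the overlap matters for neighbourhood operations such as resampling, here is a small NumPy-only sketch. It uses a 3 x 3 mean filter as a stand-in for resampling, and a "pad each tile, process, crop back" approach, which is an assumption for illustration and not the stride-based overlap of the PoC. All function names are hypothetical.

```python
import numpy as np


def box_mean_3x3(arr):
    """3x3 mean with edge replication; a stand-in for any neighbourhood operation."""
    padded = np.pad(arr, 1, mode="edge")
    out = np.zeros_like(arr, dtype=float)
    for dr in range(3):
        for dc in range(3):
            out += padded[dr:dr + arr.shape[0], dc:dc + arr.shape[1]]
    return out / 9.0


def process_by_tiles(arr, tile_size, overlap):
    """Apply box_mean_3x3 tile by tile, reading `overlap` extra pixels around each tile."""
    n_rows, n_cols = arr.shape
    out = np.empty(arr.shape, dtype=float)
    for r0 in range(0, n_rows, tile_size):
        for c0 in range(0, n_cols, tile_size):
            r1, c1 = min(r0 + tile_size, n_rows), min(c0 + tile_size, n_cols)
            # Extend the core tile by `overlap` pixels, clipped to the raster extent
            pr0, pc0 = max(r0 - overlap, 0), max(c0 - overlap, 0)
            pr1, pc1 = min(r1 + overlap, n_rows), min(c1 + overlap, n_cols)
            result = box_mean_3x3(arr[pr0:pr1, pc0:pc1])
            # Keep only the core part of the processed tile
            out[r0:r1, c0:c1] = result[r0 - pr0:r1 - pr0, c0 - pc0:c1 - pc0]
    return out


rng = np.random.default_rng(0)
dem = rng.random((10, 12))  # synthetic data, not a real DEM

reference = box_mean_3x3(dem)  # result when processing the full array at once
no_overlap = process_by_tiles(dem, tile_size=4, overlap=0)
with_overlap = process_by_tiles(dem, tile_size=4, overlap=1)

print(np.allclose(no_overlap, reference))    # tile edges differ from the full-array result
print(np.allclose(with_overlap, reference))  # a 1-pixel overlap is enough for a 3x3 kernel
```

The overlap needed here equals the kernel half-width, which fits the point that the optimal size depends on the operation.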

@adebardo
Author

Thank you, @adehecq, for your feedback.

  • I will review the ticket to make it more "xdem friendly." Indeed, it's not yet a reflex for me, but it's something I need to integrate.
  • A copy-paste mistake, I'll fix it.
  • Got it, I'll add it right away.
