Performance median method regridder worse than other methods #240

Closed
JoerivanEngelen opened this issue May 1, 2024 · 5 comments · Fixed by #270

@JoerivanEngelen
Contributor

JoerivanEngelen commented May 1, 2024

In the iMOD Python test bench, I noticed that changing the method from "mode" to "median" for one specific variable (idomain) caused the tests to take more than an hour instead of 20 minutes. This could be due to a faulty agent on TeamCity, but I thought it would be interesting to conduct some simple performance tests here. I noticed a TODO in the median method noting that this specific method might not be very performant, because np.nanpercentile is used instead of a custom implementation.

I modified the OverlapRegridder example to conduct some simple performance tests:

import time

import numpy as np
from matplotlib.tri import Triangulation

import xugrid as xu


def create_grid(bounds, nx, ny):
    """Create a simple grid of triangles covering a rectangle."""
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    x = np.arange(xmin, xmax + dx, dx)
    y = np.arange(ymin, ymax + dy, dy)
    y, x = [a.ravel() for a in np.meshgrid(y, x, indexing="ij")]
    faces = Triangulation(x, y).triangles
    return xu.Ugrid2d(x, y, -1, faces)

uda = xu.data.elevation_nl().ugrid.sel(
    x=slice(125_000, 225_000), y=slice(440_000, 500_000)
)
grid = create_grid(uda.ugrid.total_bounds, nx=1000, ny=1000)

functions = [
    "mean",
    "mean",  # Repeat first element to hack around JIT performance hits
    "harmonic_mean",
    "geometric_mean",
    "sum",
    "minimum",
    "maximum",
    "mode",
    "median",
    "max_overlap",
]

for f in functions:
    t0 = time.time()
    regridder = xu.OverlapRegridder(source=uda, target=grid, method=f)
    result = regridder.regrid(uda)
    t1 = time.time()
    print(f"{f}: {t1 - t0} seconds")

This printed:

mean: 4.471312522888184 seconds
mean: 0.17104887962341309 seconds
harmonic_mean: 0.717357873916626 seconds
geometric_mean: 0.8859498500823975 seconds
sum: 0.6982800960540771 seconds
minimum: 0.6448187828063965 seconds
maximum: 0.6634867191314697 seconds
mode: 1.0880956649780273 seconds
median: 9.672606706619263 seconds
max_overlap: 0.6971712112426758 seconds

Indeed, the median method is considerably slower than the rest. The next slowest is mode, but I think that is within the margin.

@JoerivanEngelen changed the title from "Performance tests regridder" to "Performance median method regridder slow" on May 1, 2024
@Huite
Collaborator

Huite commented May 1, 2024

I would've expected as much, essentially: median just wraps numpy's nanpercentile.

I even left a TODO there about better performance:

# TODO: more efficient implementation?

We're numba jitting, so it'll call the numba implementation. If you follow that link, you can see in the numba source code that it's allocating several numpy arrays (on the heap).
During regridding, this results in a huge number of tiny heap allocations (one allocation per target index), which obviously has bad performance characteristics.
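
For reference, the reduction boils down to something like this (paraphrasing, not the exact xugrid source), with np.nanpercentile allocating fresh temporaries for every target cell:

import numpy as np

def median(values, indices, weights):
    # TODO: more efficient implementation?
    # np.nanpercentile allocates temporary arrays internally, and this
    # function is called once per target cell during regridding.
    return np.nanpercentile(values[indices], 50)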

I've thought about pre-allocating a buffer and passing it to each function to use as a workspace instead. (In most cases, such a buffer is probably small enough to fit in a CPU cache.)
I think the challenge is that ideally we do the reduction in parallel (see the numba prange in make_regrid), which requires one buffer per parallel process and makes the parallelization less straightforward.

@JoerivanEngelen changed the title from "Performance median method regridder slow" to "Performance median method regridder worse than other methods" on May 1, 2024
@Huite
Collaborator

Huite commented May 1, 2024

Actually, it looks like I'm mistaken.

make_regrid uses a closure to insert the reduction function f here:

    def _regrid(source: FloatArray, A: MatrixCSR, size: int):
        n_extra = source.shape[0]
        out = np.full((n_extra, size), np.nan)
        for extra_index in numba.prange(n_extra):
            source_flat = source[extra_index]
            for target_index in range(A.n):
                slice = row_slice(A, target_index)
                indices = A.indices[slice]
                weights = A.data[slice]
                if len(indices) > 0:
                    out[extra_index, target_index] = f(source_flat, indices, weights)
        return out

Numba parallelization occurs over the n_extra dimension (the flattened extra dimensions, e.g. time and layer).

The following would work fine, since the buffer is allocated once per parallel iteration:

    def _regrid(source: FloatArray, A: MatrixCSR, size: int):
        n_extra = source.shape[0]
        out = np.full((n_extra, size), np.nan)
        for extra_index in numba.prange(n_extra):
            source_flat = source[extra_index]
            buffer = np.empty(buffersize)  # one workspace per parallel iteration
            for target_index in range(A.n):
                slice = row_slice(A, target_index)
                indices = A.indices[slice]
                weights = A.data[slice]
                if len(indices) > 0:
                    out[extra_index, target_index] = f(source_flat, indices, weights, buffer)
        return out

Then we can provide some additional memory to each reduction function.
This would also allow us to write the mode implementation in a better way, since it currently also allocates a copy.

The trickiest part is deciding the size of the buffer.
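
To sketch what a buffer-aware reduction could look like (hypothetical code, not necessarily what the eventual fix will do): median could gather the source values into the workspace and sort in place, avoiding the allocations inside np.nanpercentile.

import numpy as np
import numba

@numba.njit
def median(values, indices, weights, workspace):
    # Gather the non-NaN source values for this target cell into the
    # pre-allocated workspace instead of allocating a temporary array.
    n = 0
    for i in indices:
        v = values[i]
        if not np.isnan(v):
            workspace[n] = v
            n += 1
    if n == 0:
        return np.nan
    # Sort only the filled part of the workspace in place and take the middle.
    filled = workspace[:n]
    filled.sort()
    if n % 2 == 1:
        return filled[n // 2]
    return 0.5 * (filled[n // 2 - 1] + filled[n // 2])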

@Huite
Collaborator

Huite commented May 8, 2024

Actually, deciding on the size of the buffer is probably also quite easy. In the sparse matrix, the largest number of non-zeros in any row suffices.
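
For a CSR matrix that is simply the largest difference between consecutive row pointers. A toy illustration, with scipy standing in for the regridder's own CSR structure:

import numpy as np
from scipy import sparse

# Toy weight matrix standing in for the regridder's sparse matrix.
A = sparse.random(1_000, 1_000, density=0.01, format="csr")

# The largest number of non-zeros in any row bounds the workspace size.
buffersize = int(np.diff(A.indptr).max())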

@Huite
Collaborator

Huite commented Aug 5, 2024

Much better now:

mean: 4.8980393409729 seconds
mean: 0.19949793815612793 seconds
harmonic_mean: 0.20173168182373047 seconds
geometric_mean: 0.2833566665649414 seconds
sum: 0.19977402687072754 seconds
minimum: 0.1972644329071045 seconds
maximum: 0.18817400932312012 seconds
mode: 0.24338030815124512 seconds
median: 0.2827489376068115 seconds
max_overlap: 0.20089316368103027 seconds

@Huite
Collaborator

Huite commented Aug 5, 2024

Making a local copy now; the better memory access patterns speed up some methods a little:

mean: 0.1990649700164795 seconds
mean: 0.1802206039428711 seconds
harmonic_mean: 0.1920933723449707 seconds
geometric_mean: 0.2039186954498291 seconds
sum: 0.17996907234191895 seconds
minimum: 0.17102718353271484 seconds
maximum: 0.17496109008789062 seconds
mode: 0.20006632804870605 seconds
median: 0.19997835159301758 seconds
max_overlap: 0.17799925804138184 seconds
