Performance: median method regridder worse than other methods #240
I would've expected as much, essentially. Median just wraps numpy's nanpercentile. I even left a TODO there about better performance: xugrid/xugrid/regrid/reduce.py, line 133 in fd1938a
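For reference, a reduction that simply wraps the percentile call might look roughly like the sketch below. The function name and body are assumptions based on the comment above and on the `f(source_flat, indices, weights)` call further down, not the actual reduce.py code:

```python
import numpy as np

def median(values, indices, weights):
    # Hypothetical sketch: select the overlapping source values and take the
    # 50th percentile, ignoring NaNs. Weights are unused for a median.
    # np.nanpercentile allocates temporary arrays internally, which is what
    # makes this reduction comparatively slow.
    return np.nanpercentile(values[indices], 50)
```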
We're numba jitting, so it'll call the numba implementation. If you follow that link, you can see in the numba source code that it allocates several numpy arrays (on the heap). I've thought about pre-allocating a buffer and passing that to each function to use as a workspace instead. (In most cases, such a buffer is probably small enough to fit in a CPU cache.)
Actually, it looks like I'm mistaken.
```python
def _regrid(source: FloatArray, A: MatrixCSR, size: int):
    n_extra = source.shape[0]
    out = np.full((n_extra, size), np.nan)
    for extra_index in numba.prange(n_extra):
        source_flat = source[extra_index]
        for target_index in range(A.n):
            slice = row_slice(A, target_index)
            indices = A.indices[slice]
            weights = A.data[slice]
            if len(indices) > 0:
                out[extra_index, target_index] = f(source_flat, indices, weights)
    return out
```

Numba parallelisation is occurring over the outer `extra_index` loop, so anything allocated inside that loop is local to a parallel thread. The following would work fine, since the buffer is allocated per parallel thread:

```python
    for extra_index in numba.prange(n_extra):
        source_flat = source[extra_index]
        buffer = np.empty(buffersize)
        for target_index in range(A.n):
            slice = row_slice(A, target_index)
            indices = A.indices[slice]
            weights = A.data[slice]
            if len(indices) > 0:
                out[extra_index, target_index] = f(source_flat, indices, weights, buffer)
    return out
```

Then we can provide some additional memory for each reduction function. The trickiest part is deciding the size of the buffer.
Actually, deciding on the size of the buffer is probably also quite easy: in the sparse matrix, the largest number of non-zeros in any row suffices.
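For a standard CSR layout, that maximum is just the largest gap between consecutive entries of the index pointer array. A minimal sketch (the `indptr` attribute name on `MatrixCSR` is an assumption):

```python
import numpy as np

def max_row_nnz(indptr: np.ndarray) -> int:
    # Largest number of non-zeros in any row of a CSR matrix:
    # the biggest difference between consecutive row pointers.
    return int(np.diff(indptr).max())

# Hypothetical usage: one workspace of this size per parallel thread is
# enough for any single reduction over a row of the weight matrix.
# buffer = np.empty(max_row_nnz(A.indptr))
```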
Much better now:
Making a local copy now; better memory access patterns might speed things up a little more:
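For illustration, a buffered median reduction along these lines might look like the sketch below. The signature mirrors the `f(source_flat, indices, weights, buffer)` call above, but this is a guess at the approach, not the actual xugrid implementation:

```python
import numpy as np

def median(values, indices, weights, buffer):
    # Gather the overlapping source values into the pre-allocated workspace,
    # skipping NaNs; the local copy is contiguous and cache-friendly.
    # Weights are unused for a median.
    n = 0
    for i in indices:
        v = values[i]
        if not np.isnan(v):
            buffer[n] = v
            n += 1
    if n == 0:
        return np.nan
    # Sort the filled part of the workspace in place and pick the middle,
    # avoiding the temporary allocations np.nanpercentile would make.
    valid = buffer[:n]
    valid.sort()
    if n % 2 == 1:
        return valid[n // 2]
    return 0.5 * (valid[n // 2 - 1] + valid[n // 2])
```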
In the iMOD Python test bench, I noticed that changing the method from "mode" to "median" for one specific variable (idomain) caused the tests to take more than an hour instead of 20 minutes. This could be due to a faulty agent on TeamCity, but I thought it would be interesting to conduct some simple performance tests here. I noticed a TODO in the median method noting that it might not be very performant, because np.nanpercentile is used there instead of a custom implementation.
I modified the OverlapRegridder example to conduct some simple performance tests:
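The snippet itself isn't reproduced here, but a rough sketch of such a timing comparison, loosely following the OverlapRegridder example from the xugrid documentation, could look like the following (the `elevation_nl` sample data, the structured target grid, and the list of methods are assumptions):

```python
import time

import numpy as np
import xarray as xr
import xugrid as xu

# Sample data set used in the xugrid regridder examples (assumption).
uda = xu.data.elevation_nl()

# Coarse structured target grid covering the same extent (assumption: the
# regridder accepts a plain xarray.DataArray with x/y coordinates as target).
xmin, ymin, xmax, ymax = uda.ugrid.total_bounds
d = 10_000.0
x = np.arange(xmin, xmax, d) + 0.5 * d
y = np.arange(ymin, ymax, d) + 0.5 * d
target = xr.DataArray(
    data=np.zeros((y.size, x.size)),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)

# Time each reduction method; the first regrid call per method also pays
# the numba JIT compilation cost.
for method in ["mean", "minimum", "maximum", "mode", "median"]:
    regridder = xu.OverlapRegridder(source=uda, target=target, method=method)
    start = time.perf_counter()
    regridder.regrid(uda)
    print(f"{method}: {time.perf_counter() - start:.3f} s")
```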
This printed:
Indeed, the `median` method is slower than the rest. Next comes `mode`, but I think that is somewhat within the margin.