Interpolate_na: max_map argument not working at array boundaries #7597

Closed
3 of 4 tasks
Ockenfuss opened this issue Mar 8, 2023 · 6 comments · Fixed by #7598

Comments

@Ockenfuss
Contributor

What happened?

In the case of multidimensional arrays, the max_gap argument of interpolate_na is currently not working correctly at the array boundaries. This is likely due to a missing "dim" argument in the max() aggregation in xarray.core.missing._get_nan_block_lengths, I think.
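A minimal NumPy sketch of the suspected mechanism, using the data from the example below. This is an illustration only, not xarray's actual code, and the variable names are assumptions. The size of a gap at the end of a row is the distance from the last valid coordinate to the last coordinate. If the last valid coordinate is found with a max() over the whole 2-D array instead of along y only, the rows get mixed together:

```python
import numpy as np

# Example data from the report: rows are x, columns are y
data = np.array([[1, 2, 3, 4, np.nan],
                 [1, 2, np.nan, np.nan, np.nan]])
index = np.arange(data.shape[1])  # y coordinate values 0..4

# Coordinate of every valid (non-NaN) value, NaN where data is missing
valid_positions = np.where(np.isnan(data), np.nan, index)

# Reducing over the whole array mixes the rows: a single value, 3.0
global_last = np.nanmax(valid_positions)
# Reducing along y only (axis=1) keeps one value per row: [3.0, 1.0]
per_row_last = np.nanmax(valid_positions, axis=1)

# Size of the gap at the end = last coordinate - last valid coordinate
print(index[-1] - global_last)   # 1.0 for both rows, so max_gap=2 allows filling
print(index[-1] - per_row_last)  # [1. 3.]: row 2's gap exceeds max_gap=2
```

With the global reduction, the trailing gap of the second row appears to have size 1, which would explain why it gets filled despite max_gap=2.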

What did you expect to happen?

In the following code example, with max_gap=2, no extrapolation should be performed for the second row, because its trailing gap (three NaNs) exceeds max_gap. Currently, however, the second row is extrapolated as well; the output is:

<xarray.DataArray (x: 2, y: 5)>
array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1 2 3 4
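For comparison, the output expected with max_gap=2 can be worked out with a small NumPy sketch. The helper fill_trailing_gap is hypothetical and is not how interpolate_na is implemented: it only handles a gap at the end of a row, measures its size as the distance from the last valid coordinate to the last NaN coordinate (which is how the max_gap documentation appears to define gaps at the end of an array), and extrapolates linearly from the last two valid points, which is enough for the linear data in the example.

```python
import numpy as np


def fill_trailing_gap(row, index, max_gap):
    """Hypothetical helper: linearly extrapolate a trailing run of NaNs,
    but only if the gap size does not exceed max_gap."""
    row = row.copy()
    valid = np.flatnonzero(~np.isnan(row))
    if valid.size < 2:
        return row  # too few valid points to extrapolate from
    last, prev = valid[-1], valid[-2]
    gap_size = index[-1] - index[last]
    if gap_size == 0 or gap_size > max_gap:
        return row  # no trailing gap, or gap too large: leave NaNs
    # Slope from the last two valid points
    slope = (row[last] - row[prev]) / (index[last] - index[prev])
    tail = slice(last + 1, None)
    row[tail] = row[last] + slope * (index[tail] - index[last])
    return row


data = np.array([[1, 2, 3, 4, np.nan],
                 [1, 2, np.nan, np.nan, np.nan]])
index = np.arange(data.shape[1])
expected = np.array([fill_trailing_gap(r, index, max_gap=2) for r in data])
print(expected)
```

The first row is extrapolated to 5, while the second row keeps its three NaNs because its trailing gap has size 3.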

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
da = xr.DataArray(
    [[1, 2, 3, 4, np.nan], [1, 2, np.nan, np.nan, np.nan]],
    coords=[('x', [0, 1]), ('y', [0, 1, 2, 3, 4])],
)
da_interp = da.interpolate_na(dim='y', max_gap=2, fill_value='extrapolate')
print(da_interp)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I have added the missing dim argument and adapted the test cases (until now, there was no test case for fully multidimensional arrays with a gap at the end).

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.2
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 58.1.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.6.0
sphinx: None

Ockenfuss added the bug and needs triage labels Mar 8, 2023
dcherian removed the needs triage label Mar 8, 2023
@dcherian
Contributor

dcherian commented Mar 8, 2023

This is likely due to a missing "dim" argument in the max() aggregation in xarray.core.missing._get_nan_block_lengths, I think.

Thanks. A PR is welcome! It looks like you have a nice simple test.

@Karimat22

This comment was marked as off-topic.

@Karimat22

This comment was marked as off-topic.

@Ockenfuss
Contributor Author

@dcherian The answers above read as if they were generated by a bot (they even repeat the typo I made in the issue title: max_map instead of max_gap). ChatGPT produces very similar answers if I give it the issue title.
@Karimat22: If this is not the case, please clarify what you mean; at the moment I do not understand your comment.

@Karimat22

@Ockenfuss I suggested that you try the three points listed below and see whether they resolve the problem you raised.

  1. Try adjusting the max_gap argument to a smaller value to see if that resolves the issue. For example, if max_gap is currently set to 10, try reducing it to 5 or even 1.

  2. Consider using a different interpolation method that is better suited for the specific dataset and boundaries. For example, if linear interpolation is not working well at the array boundaries, try a cubic or spline interpolation method.

  3. Check the data at the array boundaries to ensure that it is valid and not causing issues with the interpolation. For example, if there are NaN values or outliers at the boundaries, this could be affecting the interpolation.

@dcherian
Contributor

Thanks @Karimat22.

In general, it's most helpful to comment if you have specific knowledge about the topic. In this case, the original post was a pretty nice and clear bug report, not a usage question.


4 participants