
Aggregation breaks with small enough shapefiles #14

Closed
bradyrx opened this issue Aug 9, 2021 · 2 comments

Comments

@bradyrx

bradyrx commented Aug 9, 2021

The .aggregate(...) step breaks with some shapefiles whose polygons are seemingly too small. This uses the Admin 2 level Brazilian municipalities. I've tested with some ERA5 data as well as the xarray tutorial data below and the issue persists, so it's due to the shapefiles rather than the gridded data.

If the pooch retrieval below doesn't work, the Admin 2 shapefiles are here: https://data.humdata.org/dataset/brazil-administrative-level-0-boundaries.

import xarray as xr
import pooch
import geopandas as gpd
import xagg as xa

# Load in the Brazilian municipalities (Admin 2)
file = pooch.retrieve(
    "https://data.humdata.org/dataset/f5f0648e-f085-4c85-8242-26bf6c942f40/resource/b4bf8e52-2de8-443f-a72d-287c1ef6b462/download/bra_adm_ibge_2020.gdb.zip",
    None,
)
municipalities = gpd.read_file("zip://" + file)

# Set CRS since the shapefile does not come with a CRS
municipalities = municipalities.set_crs("EPSG:4326")

# Load in some global tutorial data from xarray
ds = xr.tutorial.open_dataset("eraint_uvz")
ds = ds.isel(level=0, month=0)["u"].to_dataset()

# Working case. Subset to 10 polygons that do work.
# Need to reset index since it breaks if index isn't continuous
# from zero.
_df = municipalities.iloc[400:410]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

# Breaking case.
_df = municipalities.iloc[415:420]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

Here's the traceback on the broken subset (there are likely many other polygons in the full set small enough to trigger this). I believe it suggests that the weight array for these polygons is empty; a quick check for that is sketched after the traceback.

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-63-c29dad4ffcd0> in <module>
      1 wm = xa.pixel_overlaps(ds, _df)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
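
To check the empty-weight-array hypothesis, a rough diagnostic is to scan the weight map for polygons that end up with zero (or only one) overlapping pixel index. This treats wm.agg as a pandas DataFrame with a pix_idxs column, matching the wm.agg.iloc[poly_idx,:].pix_idxs usage in the traceback; the exact layout is an assumption on my part.

import numpy as np

# Assumption: wm.agg has one row per polygon and a pix_idxs column holding the
# indices of the overlapping grid cells, as the traceback above suggests.
for poly_idx in range(len(wm.agg)):
    pix_idxs = np.atleast_1d(wm.agg.iloc[poly_idx, :].pix_idxs)
    if pix_idxs.size <= 1:
        print(f"polygon {poly_idx}: {pix_idxs.size} overlapping pixel(s)")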

You can also isolate the single polygon causing this:

# Select row 415 which seems to be too small.
polygon = gpd.GeoDataFrame(municipalities.iloc[415]).T
polygon = polygon.set_crs("EPSG:4326")
polygon = polygon.reset_index()
wm = xa.pixel_overlaps(ds, polygon)
aggregated = xa.aggregate(ds, wm)
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-72-e6fb927d8b50> in <module>
      1 wm = xa.pixel_overlaps(ds, polygon)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
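
The error pattern itself can be reproduced in plain numpy: indexing a 0-dimensional array with a 1-d boolean mask raises exactly this IndexError. My guess (an assumption about xagg's internals, not something I've confirmed) is that tmp_areas collapses to 0-d when a polygon overlaps only a single grid cell.

import numpy as np

# Assumed mechanism: a 0-d array (e.g. a single squeezed overlap area) indexed
# with a 1-d boolean mask reproduces the error in the traceback.
tmp_areas = np.array(0.5)    # 0-dimensional
mask = np.array([True])      # 1-dimensional boolean mask
tmp_areas[mask] = np.nan     # IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed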

A hacky fix for now would be to loop through each polygon individually in a try/except block, catch the IndexError, and fall back to the grid cell closest to the polygon's centroid, but that's of course not ideal. A rough sketch of that workaround follows.
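
Something along these lines (untested sketch; the nearest-cell fallback via .sel(..., method="nearest") and the latitude/longitude coordinate names are assumptions based on the tutorial dataset, not anything xagg provides):

# Untested per-polygon fallback: aggregate each polygon on its own and, when the
# IndexError is hit, sample the grid cell nearest the polygon centroid instead.
results = []
for i in range(len(municipalities)):
    poly = municipalities.iloc[[i]].reset_index(drop=True)
    try:
        wm_i = xa.pixel_overlaps(ds, poly)
        results.append(xa.aggregate(ds, wm_i))
    except IndexError:
        centroid = poly.geometry.centroid.iloc[0]
        nearest = ds.sel(latitude=centroid.y, longitude=centroid.x, method="nearest")
        results.append(nearest)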

@RichardScottOZ

I have a case with a lot of small pieces, and yes, that would defeat the purpose of an aggregation function.

@ks905383
Owner

ks905383 commented Jul 3, 2023

Should've closed this a while ago - I believe this was fixed with #10. At the very least, the error is no longer reproducible.

ks905383 closed this as completed Jul 3, 2023