
Aggregation breaks with small enough shapefiles #14

Closed
bradyrx opened this issue Aug 9, 2021 · 2 comments

Comments

@bradyrx

bradyrx commented Aug 9, 2021

The .aggregate(...) step breaks with some shapefiles whose polygons are seemingly too small. This uses the Admin 2 level Brazilian municipalities. I've tested with some ERA5 data as well as the xarray tutorial data below and the issue persists, so it's due to the shapefiles rather than the gridded data.

If the pooch retrieval below doesn't work, the Admin 2 shapefiles are here: https://data.humdata.org/dataset/brazil-administrative-level-0-boundaries.

import xarray as xr
import pooch
import geopandas as gpd
import xagg as xa

# Load in the Brazilian municipalities (Admin 2)
file = pooch.retrieve(
    "https://data.humdata.org/dataset/f5f0648e-f085-4c85-8242-26bf6c942f40/resource/b4bf8e52-2de8-443f-a72d-287c1ef6b462/download/bra_adm_ibge_2020.gdb.zip",
    None,
)
municipalities = gpd.read_file("zip://" + file)

# Set CRS since the shapefile does not come with a CRS
municipalities = municipalities.set_crs("EPSG:4326")

# Load in some global tutorial data from xarray
ds = xr.tutorial.open_dataset("eraint_uvz")
ds = ds.isel(level=0, month=0)["u"].to_dataset()

# Working case. Subset to 10 polygons that do work.
# Need to reset index since it breaks if index isn't continuous
# from zero.
_df = municipalities.iloc[400:410]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

# Breaking case.
_df = municipalities.iloc[415:420]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

Here's the traceback on the broken subset (there are likely many other polygons in the full set small enough to trigger this). I believe it suggests that the weight array for these polygons is empty; a quick check for that is sketched after the traceback.

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-63-c29dad4ffcd0> in <module>
      1 wm = xa.pixel_overlaps(ds, _df)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
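
To check the empty-weight-array hypothesis, a rough diagnostic is to scan the weight map for polygons that end up with zero (or only one) overlapping pixel index. This treats wm.agg as a pandas DataFrame with a pix_idxs column, matching the wm.agg.iloc[poly_idx,:].pix_idxs usage in the traceback; the exact layout is an assumption on my part.

import numpy as np

# Assumption: wm.agg has one row per polygon and a pix_idxs column holding the
# indices of the overlapping grid cells, as the traceback above suggests.
for poly_idx in range(len(wm.agg)):
    pix_idxs = np.atleast_1d(wm.agg.iloc[poly_idx, :].pix_idxs)
    if pix_idxs.size <= 1:
        print(f"polygon {poly_idx}: {pix_idxs.size} overlapping pixel(s)")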

You can also isolate the single polygon causing this:

# Select row 415 which seems to be too small.
polygon = gpd.GeoDataFrame(municipalities.iloc[415]).T
polygon = polygon.set_crs("EPSG:4326")
polygon = polygon.reset_index()
wm = xa.pixel_overlaps(ds, polygon)
aggregated = xa.aggregate(ds, wm)
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-72-e6fb927d8b50> in <module>
      1 wm = xa.pixel_overlaps(ds, polygon)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
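
The error pattern itself can be reproduced in plain numpy: indexing a 0-dimensional array with a 1-d boolean mask raises exactly this IndexError. My guess (an assumption about xagg's internals, not something I've confirmed) is that tmp_areas collapses to 0-d when a polygon overlaps only a single grid cell.

import numpy as np

# Assumed mechanism: a 0-d array (e.g. a single squeezed overlap area) indexed
# with a 1-d boolean mask reproduces the error in the traceback.
tmp_areas = np.array(0.5)    # 0-dimensional
mask = np.array([True])      # 1-dimensional boolean mask
tmp_areas[mask] = np.nan     # IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed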

A hacky fix for now would be to loop through each polygon individually in a try/except block, catch the IndexError, and fall back to the grid cell closest to the polygon's centroid, but that's of course not ideal. A rough sketch of that workaround follows.
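
Something along these lines (untested sketch; the nearest-cell fallback via .sel(..., method="nearest") and the latitude/longitude coordinate names are assumptions based on the tutorial dataset, not anything xagg provides):

# Untested per-polygon fallback: aggregate each polygon on its own and, when the
# IndexError is hit, sample the grid cell nearest the polygon centroid instead.
results = []
for i in range(len(municipalities)):
    poly = municipalities.iloc[[i]].reset_index(drop=True)
    try:
        wm_i = xa.pixel_overlaps(ds, poly)
        results.append(xa.aggregate(ds, wm_i))
    except IndexError:
        centroid = poly.geometry.centroid.iloc[0]
        nearest = ds.sel(latitude=centroid.y, longitude=centroid.x, method="nearest")
        results.append(nearest)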

@RichardScottOZ

I have a case with a lot of small pieces, and yes, that would defeat the purpose of an aggregation function.

@ks905383
Owner

ks905383 commented Jul 3, 2023

Should've closed this a while ago - I believe this was fixed with #10. At the very least, the error is no longer reproducible.

ks905383 closed this as completed Jul 3, 2023