Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix shape masking #6129

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from
Draft

Conversation

hsteptoe
Copy link
Contributor

🚀 Pull Request

Description

Fixes #6126 by exposing the shape geometry to the user. Adds to docs and docstrings to make it clearer how to use this.


Consult Iris pull request check list


Add any of the below labels to trigger actions on this PR:

  • benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts

@pp-mo
Copy link
Member

pp-mo commented Aug 28, 2024

From @SciTools/peloton : discussed what we are seeing here and in #6126.
We think this is looking useful, but the two of you seem to have it covered @hsteptoe @acchamber ?
Please shout if you need help.

@hsteptoe
Copy link
Contributor Author

Discussion with @acchamber notes:

  • Don't move imports out of functions as this creates circular import errors
  • Check target_crs assignment for occasion when cube doesn't have a CRS
  • Re-write conversion of 0-360 -> -180-180 coords in case of degree based CRSs

@hsteptoe
Copy link
Contributor Author

What are @SciTools/peloton (@pp-mo) thoughts about adding rasterio as a Iris dependency?

This would facilitate better handling of shapes to rasters (which is essentially the problem we're solving) with the added bonus that we could also mask to other shape types, like lines, to extract a trajectory from a Cube, for example.

@acchamber
Copy link
Contributor

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Dec 2, 2024

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

I was planning on going to the AVD surgery this week to discuss some changes and/or make the case for rasterio

@trexfeathers
Copy link
Contributor

Thanks for your patience everyone. It's sometimes a struggle to balance everything and we just haven't had an opportunity to discuss this. Coming to the Surgery (UK Met Office) is an ideal next step.

@stephenworsley
Copy link
Contributor

@pp-mo, @trexfeathers and @stephenworsley have discussed this at the Surgery and agree in principle that we could consider adding rasterio as an optional dependency.

@trexfeathers
Copy link
Contributor

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 5, 2025

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

I just need to find more time, which is in short supply as we get near the end of FY...

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 14, 2025

This is now ready for some initial (alpha?) testing 🎉. @acchamber are you available to try out the new features? (Updates to docs etc. will follow if this is successful) Tests are incomplete are should not be review yet.

This is a fairly substantial re-write. New/improved features include:

  • Automatic CRS transforms. Will handle situation where the cube and geometry have different CRSs by transforming the geometry.
  • Handles all shape types, including Points, Lines and MultiPolygons.

There are some breaking changes, principally moving away from having a minimum_weight keyword, replacing it with an all_touched keyword. The rational here is for consistency across shape types. Minimum weight doesn't make sense for Points and Lines. Instead, Line pixilation is based on Bresenham's line algorithm. Polygon pixilation is based on either: (a) pixel centres that intersect the polygon, or (b) all pixels touched by the polygon irrespective of the area of intersection.

My informal tests so far show working behaviour for iris test data from rotated_pole.nc, toa_brightness_stereographic.nc, E1_north_america.nc, air_temp.pp. Know limitation (not-working) for semi-structured model grids, such as orca2_votemper.nc. Some test shapefiles 👉test_shapefiles.zip

@bjlittle bjlittle added this to the v3.13 milestone Feb 17, 2025
@acchamber
Copy link
Contributor

acchamber commented Feb 17, 2025

I'll see if I can find the time to play around with this alpha in the next two weeks, to give some feedback, however I am concerned by the breaking changes here.

Being unable to mask the anti-meridian or the poles feels like a major flaw - As you've pointed out fixing it requires some guess of intention by the program of the user's desires, but a user being unable to mask shapes such as - Russia , The Pacific Ocean, The Arctic circle or Antarctica is a deteriment to many science workflows, especially when some of these do work in the current function.

I am also concerned about losing minimum_weight - while you are right it doesn't make sense for Points and Lines on the face of it, reducing functionality down from a user-adjustable spectrum to a binary choice does not fit the variety of scales shapefiles can have compared to model resolution - there's enough use cases that want an option between the two extremes of "touch the centre only or touch anywhere in the box". Especially if centred-touching is the default - it makes it quite possible for a user to mask a cube and receive a fully masked cube back for smaller shapefiles and not have a obvious answer as to why. I can see mimimum_weight becoming a optional keyword but could not support removing it completely.

@hsteptoe
Copy link
Contributor Author

I'll see if I can find the time to play around with this alpha in the next two weeks, to give some feedback, however I am concerned by the breaking changes here.

Being unable to mask the anti-meridian or the poles feels like a major flaw - As you've pointed out fixing it requires some guess of intention by the program of the user's desires, but a user being unable to mask shapes such as - Russia , The Pacific Ocean, The Arctic circle or Antarctica is a deteriment to many science workflows, especially when some of these do work in the current function.

I am also concerned about losing minimum_weight - while you are right it doesn't make sense for Points and Lines on the face of it, reducing functionality down from a user-adjustable spectrum to a binary choice does not fit the variety of scales shapefiles can have compared to model resolution - there's enough use cases that want an option between the two extremes of "touch the centre only or touch anywhere in the box". Especially if centred-touching is the default - it makes it quite possible for a user to mask a cube and receive a fully masked cube back for smaller shapefiles and not have a obvious answer as to why. I can see mimimum_weight becoming a optional keyword but could not support removing it completely.

Re. meridian crossing, I'll add some more shapefile test cases for the Arctic, Antarctic and Pacific Ocean. I think that so long as these are MultiPolygons divided across the meridian it will work fine. You say 'some' of these work in the current function. Can you be more specific about which use cases do and don't work? This isn't really a limitation with the masking function, but more a limitation of the way shapefiles are represented and the connection a shape has with its CRS. There is nothing by default to specify that a shapefile that straddles the 180th meridian should connect across the meridian because it exists in a cartesian coordinate frame which by default doesn't have a cyclic boundary condition. So really it's up to the user (not Iris) to make sure that the shapefile they use (or construct) correctly specifies how the meridian crossing should be handled.

Re. minimum_weight, in principal we could look at adding this back in, but I still question how much this is actually used. The all_touched implementation is consistent with rasterising approaches in other GIS applications (and consistent with the underlying rasterio API), but I agree that making all_touched default to True is a good call and would replicate the current default behaviour. My guess is that users either go with the default (minimum_weight=0) or minimum_weight=0.5... is anyone really ever choosing minimum_weight=0.3 or minimum_weight=0.8? It would be good to understand what these use cases might be if they are.

@acchamber
Copy link
Contributor

, I'll add some more shapefile test cases for the Arctic, Antarctic and Pacific Ocean. I think that so long as these are MultiPolygons divided across the meridian it will work fine. You say 'some' of these work in the current function. Can you be more specific about which use cases do and don't work? This isn't really a limitation with the masking function, but more a limitation of the way shapefiles are represented and the connection a shape has with its CRS. There is nothing by default to specify that a shapefile that straddles the 180th meridian should connect across the meridian because it exists in a cartesian coordinate frame which by default doesn't have a cyclic boundary condition. So really it's up to the user (not Iris) to make sure that the shapefile they use (or construct) correctly specifies how the meridian crossing should be handled.

I know that Russia currently works in the current impletmentation with the "standard" Natural Earth shapefile. I have not actually checked for the Arctic/Antarctic, but I know there's active science that would need them to work, although I accept your point it's as much as about the shapefile construction as our impletmentation.

Re. minimum_weight, in principal we could look at adding this back in, but I still question how much this is actually used. The all_touched implementation is consistent with rasterising approaches in other GIS applications (and consistent with the underlying rasterio API), but I agree that making all_touched default to True is a good call and would replicate the current default behaviour. My guess is that users either go with the default (minimum_weight=0) or minimum_weight=0.5... is anyone really ever choosing minimum_weight=0.3 or minimum_weight=0.8? It would be good to understand what these use cases might be if they are.

I know I certainly am - it matters mostly when your grid resolution and your shapefile are similar magnitiudes - a good example being country-level masking on 1 degree or courser grids. You often get situations where a country is centred between grid boxes - being present in 20% of one, 30% of about, and 2% of a third as an example. Counting all the boxes equally is bad, but the shape doesn't nessacerily reach the centre of the boxes it is significantly in. Manual experimentation is required to find a suitable weight, doesn't mean it's not useful. You can also think of land-sea maskes and river watersheds for long-but-thin shapes. There's lots of different shapes you may use and data to apply them to, and assuming that all cases are covered by "all-touched" and centred-touching is reductive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Status: 🆕 Candidate
Development

Successfully merging this pull request may close these issues.

_transform_coord_system geometry is always assume to be None
6 participants