Raster data footprint determination support #307
Conversation
Unsolicited opinions on the
|
Note that the issue for this is #85, where the initial draft code came from, and there are some examples of the code in action. Some responses to @pjhartzell:
|
I'll also add here that while it's use-case-driven whether overestimating or underestimating the data geometry is preferable, I think the default method should slightly underestimate the data geometry. As a user, I don't want to get matches against scenes that end up having only nodata under my polygon, whereas if I miss a scene because it happened to have a handful of pixels on the edge that intersected with my polygon... I just don't see that as much of a big deal. |
@matthewhanson Agreed that shapely's Imagine the raster is all valid data: the blue line is bad, the green line is better. We can get green without heuristic densification by projecting the raster to WGS84 before extracting the valid-data shape. |
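For illustration, here is a minimal sketch (not the PR's actual code) of that "green line" approach: warp the raster to WGS84 with rasterio's WarpedVRT and then polygonize the valid-data mask, so the footprint follows the curved scene edges without manual densification. The file name is a placeholder.

```python
import rasterio
from rasterio.features import shapes
from rasterio.vrt import WarpedVRT
from shapely.geometry import shape
from shapely.ops import unary_union

# Placeholder path; any rasterio-readable scene works.
with rasterio.open("scene.tif") as src:
    # Warp to WGS84 first, then extract the valid-data shape.
    with WarpedVRT(src, crs="EPSG:4326") as vrt:
        mask = vrt.dataset_mask()  # 255 where data is valid, 0 where nodata
        footprint = unary_union(
            [
                shape(geom)
                for geom, value in shapes(mask, transform=vrt.transform)
                if value == 255
            ]
        )
```

The trade-off raised below is cost: this reprojects every pixel instead of just the footprint vertices.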
Ah, gotcha @pjhartzell, that makes perfect sense. Knowing whether the scaling factor is usable is even more important then, since if we can get away with using even a 1/2 overview we greatly reduce computation. Not a big deal when doing a handful of images, but over a million files this could amount to some sizable $. |
Yeah, the computational aspect of this is a definite drawback. |
One meta-comment: this is assuming that you're cataloging raster data that can be read with rasterio, and wouldn't work for tabular or Zarr / NetCDF assets (#218). I think the names |
To address some of these comments:
That code isn't used, so not relevant.
Yes -- this has been fixed. The code now either pulls nodata from the src metadata or requires the user to explicitly define the nodata value. I chose not to default it to 0: while 0 is probably the most common nodata value, it's more apparent behavior to always get the entire image extent when nodata is undefined than to silently assume 0 when the actual nodata value is something else.
sieve wasn't the right option here -- sieve will connect the small nodata polygons, but then you just get larger complex polygons within the footprint. Using the convex hull gives what I would consider a footprint (see the sketch after this comment). Future work might be to calculate the alpha shape (a sort of "concave hull"), but I don't see a clear benefit to the more complex alpha shape over the simpler convex hull. I think it's only beneficial when there are large, highly concave nodata areas that intersect just one side of an image, which aren't common. I could see this being useful for another custom property of an Item, but not for the Item geometry, where you generally want something simpler and less accurate rather than larger and more accurate.
I left it in, but I'm still not sure what this is supposed to be used for.
Already had some out-of-band conversation about this. |
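As a rough sketch of the nodata handling and convex-hull behavior described above (function name and signature are illustrative, not the PR's API):

```python
import numpy as np
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape
from shapely.ops import unary_union


def data_footprint(href, nodata=None):
    """Sketch: convex hull of the valid-data area of a raster.

    If ``nodata`` is not given and the file defines none, the whole image
    extent is treated as valid data (no silent default of 0).
    """
    with rasterio.open(href) as src:
        band = src.read(1)
        nodata = nodata if nodata is not None else src.nodata
        if nodata is None:
            data_mask = np.ones(band.shape, dtype="uint8")
        elif np.isnan(nodata):
            data_mask = (~np.isnan(band)).astype("uint8")
        else:
            data_mask = (band != nodata).astype("uint8")
        polygons = [
            shape(geom)
            for geom, value in shapes(data_mask, transform=src.transform)
            if value == 1
        ]
    # Convex hull swallows interior nodata holes and edge concavities.
    return unary_union(polygons).convex_hull
```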
I think I've balanced the densification/simplification concerns with the method parameters and the documentation around them. I explicitly put in the docs that users should empirically validate what the footprint looks like, that they need to adjust the knobs for their particular data, and what those knobs mean. I think this is actually useful compared with a lot of the docs for the libraries I've used for this, which state what they do but don't really explain how to configure things for different types of data. |
I think we get a bit of each here -- using the convex hull for the data area will overestimate it if there are lots of nodata concavities. I'm not really sure what the general effect of densification/simplification is along the edges -- I think in many cases the over/under balances out in terms of total area, with some edges a little over and some a little under. |
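For readers tuning those knobs, this is a minimal sketch of a densify / reproject / simplify pipeline under the assumptions discussed in this thread; parameter names are illustrative and may not match the module exactly.

```python
from pyproj import Transformer
from shapely.geometry import Polygon
from shapely.ops import transform


def densify_reproject_simplify(polygon, src_crs,
                               densification_factor=None,
                               simplify_tolerance=None):
    # Densify: insert extra vertices along each edge so the reprojected
    # edge can follow the projection distortion.
    if densification_factor is not None:
        coords = list(polygon.exterior.coords)
        densified = []
        for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
            for i in range(densification_factor):
                t = i / densification_factor
                densified.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        densified.append(coords[-1])
        polygon = Polygon(densified)

    # Reproject to WGS84 (lon/lat order).
    transformer = Transformer.from_crs(src_crs, "EPSG:4326", always_xy=True)
    polygon = transform(transformer.transform, polygon)

    # Simplify: drop the now-redundant vertices; tolerance is in degrees.
    if simplify_tolerance is not None:
        polygon = polygon.simplify(simplify_tolerance)
    return polygon
```

With a UTM polygon, something like `densify_reproject_simplify(poly, "EPSG:32633", densification_factor=10, simplify_tolerance=0.0006)` would yield a WGS84 footprint.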
Renamed the module to "raster_footprint".
Thinking about this a little more, since I don't even understand how scale is supposed to be used, it should probably be documented better. Does someone want to explain it to me? |
I'm +1 on keeping things simple, though I would push back a little. Agreed that sieve will connect the small nodata polygons into larger complex polygons within the footprint, but those larger complex polygons can then be simplified, no? Holes (inner polygons) will be an issue, but there might be some ways to remove them. I'm open to leaving this to future work.
I guess my concern is that projection distortion is not the same everywhere, so a savvy user will pick an area with the worst distortion and tune things there, or just use a very high densification factor (easier). But if a very high densification factor does the trick, is there any reason we can't do this (with a bit more intelligence than just a very high value) for the user, and avoid users having to empirically tune the densification? Again, perhaps a future improvement. Speaking of densification, does densification by distance rather than by factor make sense? When densifying by factor, you can end up with a lot of very closely spaced points where the original polygon has close vertices, and too few where the original vertices are far apart. |
The convexity and inclusion of nodata should be sufficiently documented. Let me know if you think it needs more.
This is a good point -- we may want to just default the densification factor to 10. That makes reprojection 10x slower, but the additional points get cleaned up by the simplification, so the resulting geometry isn't necessarily 10x larger.
I think it would - we'd have to write our own code for this, as shapely only does equidistant densification. |
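A sketch of what densification by distance (rather than by factor) could look like: long edges get many extra vertices, short edges few or none. Newer shapely releases (2.0+) also provide shapely.segmentize, which adds vertices so that no segment exceeds a maximum length.

```python
import numpy as np
from shapely.geometry import Polygon


def densify_by_distance(polygon, max_segment_length):
    """Insert vertices so no edge exceeds max_segment_length (in the
    polygon's own units), keeping all original vertices."""
    coords = list(polygon.exterior.coords)
    densified = []
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = np.hypot(x1 - x0, y1 - y0)
        n = max(int(np.ceil(length / max_segment_length)), 1)
        for i in range(n):
            t = i / n
            densified.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    densified.append(coords[-1])
    return Polygon(densified)
```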
Point reprojection should be pretty cheap, though. At least compared to reprojecting an image. So I think it'd be OK to add more points.
I think a custom function ( |
Other than a suggestion to remove the explicit preserve_topology setting in densify_reproject_simplify, I think this is good to go. We can look at handling concavity and perhaps a refined or automatic "densification" method going forward.
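For context on that suggestion: preserve_topology already defaults to True in shapely's simplify, so the explicit keyword is redundant. A tiny check:

```python
from shapely.geometry import Polygon

# A square with one redundant, nearly collinear vertex on the top edge.
poly = Polygon([(0, 0), (0, 10), (5, 10.001), (10, 10), (10, 0)])

# Both calls use preserve_topology=True, so they are equivalent.
assert poly.simplify(0.01).equals(poly.simplify(0.01, preserve_topology=True))
```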
@pjhartzell @gadomski @matthewhanson that should resolve all the outstanding issues. |
Force-pushed from 39ddc1c to 2f21370.
@philvarner @pjhartzell I've been playing with this branch a bit this morning, and when I checked it against the landsat test data, I was surprised to find that the raster footprint did not include all pixels. Is this expected behavior? I think this has to do with the densification parameters (e.g. the MODIS example Preston posted in this thread), but I just wanted to make sure. |
@philvarner @pjhartzell I've made a couple of additions/changes to this PR:
I re-requested your review @pjhartzell; if you wouldn't mind, look at the command line interface in particular, since no one else has put eyes on that. NB: I've done a rebase of this branch (which is how I found the large file), so you'll have to force-reset any local branches. I plan on a rebase-with-partial-squash after approval to remove the large landsat file from the final history. |
@gadomski Still working through the review, but in answer to your question, the reason all the pixels are not included is because the
and so on. Here's a snapshot of the footprint with a precision of 3 (red) and 8 (green). I did not apply any simplification, so a precision of 8 wraps around each pixel. Of course this leads to a very large geometry coordinate list. If we keep a precision of 8 and set a simplification tolerance equal to roughly twice the pixel size (30 m x 2 = 60 m --> 0.0006 degrees for this landsat band), we get something closer to what we expect (blue line): |
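A small worked example of that tolerance arithmetic, assuming WGS84 coordinates and the rough "1 degree of latitude ≈ 111,320 m" conversion (values illustrative):

```python
# Convert a ground-distance simplification tolerance to degrees (rough,
# latitude-only; longitude degrees shrink toward the poles).
pixel_size_m = 30                      # Landsat band pixel size
tolerance_m = 2 * pixel_size_m         # ~two pixels, as above
tolerance_deg = tolerance_m / 111_320  # metres per degree of latitude
print(f"{tolerance_deg:.4f}")          # ~0.0005, i.e. roughly the 0.0006 used above
```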
tests/data-files/raster_footprint/LC08_L1TP_198029_20220331_20220406_02_T1_B2.json
Ran the CLI on some files. Works well. Looks like the updated precision broke the tests, though. 🙁
Looks good.
This includes a cropped version of the landsat image; the original was > 90MB.
It was confusing.
Force-pushed from defdb2a to 42cff3e.
Related Issue(s): #85
Description:
These changes add a new module, raster_footprint, that provides parameterized methods for generating data footprints from raster data. This includes a method to update the geometry of an Item based on the footprint determined from the data assets in the Item.
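A hypothetical usage sketch of that Item-geometry update; the import path, function name, and keyword arguments below are assumptions based on this description, not confirmed API.

```python
import pystac

# Assumed import path and function name for the new module described above.
from stactools.core.utils.raster_footprint import update_geometry_from_asset_footprint

item = pystac.Item.from_file("item.json")  # placeholder STAC Item
update_geometry_from_asset_footprint(
    item,
    asset_names=["B2"],          # which raster assets to read
    densification_factor=10,     # extra vertices per edge before reprojection
    simplify_tolerance=0.0006,   # degrees, roughly two 30 m pixels
    precision=3,                 # decimal places kept in the output coordinates
)
item.save_object()
```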
PR checklist:
- Code is formatted (run scripts/format).
- Code lints properly (run scripts/lint).
- Tests pass (run scripts/test).