-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sentinel-1 Subpackage #84
Conversation
elaborating on @kylebarron 's comment on footprint accuracy, below is a QGIS screenshot that illustrates the partial extent of an RTC product (bright foreground image), (covering a MGRS grid square (pink), GRD frame footprints (gray), GRD product with nodata (transparent+black). The lateral nodata of the frame footprints will vary with geographic location, in this case ~5km. also more information on GRD nodata here (scroll to bottom) https://asf.alaska.edu/how-to/data-recipes/geocode-sentinel-1-with-qgis-3-x/ |
I spent some time looking into this for Sentinel-1 a while back and used rasterio to get the footprints. I also tried using overviews to generate the footprints so it would be faster and wouldn't require reading as much of the file, but it really worked quite a bit better using the full resolution file. The resulting geometry tends to be very large and way too big, so you have to simplify it. It's a balancing act to simplify it enough to you still have the detail needed while keeping the geometry size small. You can see in the gifs below the difference between the provided geometry (blue) and the one calculated with the above code (red) This is a typical image and the provided boundary overestimates the area cross-track, and underestimates it along-track. This is a more unusual collect that is along the water where some of the water was masked out in the data, and we see the situation is even worse. The generated footprint in red is far better. There might be edge cases where this would fail, thus before going operational I think it should be used to generate a few hundred boundaries to be visually inspected. If the image were already downloaded or locally available, it's not that bad to run this processing and certainly worth it for the better footprint. The code is here: This approach could also be used for Landsat: #83 |
just make sure to Edits: 🤦 just seeing that you have a default simplify options set to 0.005 in the script you shared. Another note, is that you need to assume the |
@scottyhq Looking at your implementation, I see options for scale and precision but not simplify. Did you find that using the right combo of scale and precision precluded the need to simplify the geometry? |
Thanks for the examples everyone! it does seem like there is some potential for a common extract_footprint function in stactools.core ...
My last commit has a slightly modified footprint version that i've been using (main difference being the use of I have some other general questions, so also pinging @m-mohr to get feedback on stac-extensions. I think the RTC dataset would benefit from using a few other extensions that aren't yet implemented in pystac. In particular:
|
Pushed forward a little more today but haven't had time to look closely... test catalog is here https://github.com/scottyhq/sentinel1-rtc-stac I'm able to run (abbreviated) tiles_1 | 172.19.0.1:59252 - "GET /cog/tiles/8/38/89?url=https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/U/CU/2017/S1A_20170101_10UCU_ASC/Gamma0_VV.tif HTTP/1.1" 500
tiles_1 | [2021-04-02 06:55:03 +0000] [58] [ERROR] Exception in ASGI application
tiles_1 | Traceback (most recent call last):
tiles_1 | File "/usr/local/lib/python3.8/site-packages/titiler/endpoints/factory.py", line 334, in tile
tiles_1 | content = image.render(
tiles_1 | File "/usr/local/lib/python3.8/site-packages/rio_tiler/models.py", line 262, in render
tiles_1 | return render(self.data, self.mask, img_format=img_format, **kwargs)
tiles_1 | File "/usr/local/lib/python3.8/site-packages/rio_tiler/utils.py", line 381, in render
tiles_1 | dst.write(mask.astype(data.dtype), indexes=count + 1)
tiles_1 | File "rasterio/_base.pyx", line 394, in rasterio._base.DatasetBase.__exit__
tiles_1 | File "rasterio/_base.pyx", line 385, in rasterio._base.DatasetBase.close
tiles_1 | File "rasterio/_io.pyx", line 2111, in rasterio._io.BufferedDatasetWriterBase.stop
tiles_1 | File "rasterio/_err.pyx", line 215, in rasterio._err.exc_wrap_pointer
tiles_1 | rasterio._err.CPLE_NotSupportedError: PNG driver doesn't support data type Float32. Only eight bit (Byte) and sixteen bit (UInt16) bands supported.
tiles_1 | PNG driver doesn't support data type Float32. Only eight bit (Byte) and sixteen bit (UInt16) bands supported. @vincentsarago any suggestions there? I thought titiler automatically scaled Float32s to 0:255 ? |
@scottyhq For your questions 2 and 3, both will soon be in the raster extension (per band). Currently the file extension has some of the fields, but those will be removed soon. |
@scottyhq no titiler doesn't do that, it's up to the client to tell the tiler what to do! |
@scottyhq I've tried convex hull and for our use case overestimating was not what we wanted because it could lead to extracting AOIs that were completely empty. Convex hull could overestimate a lot, such as in my second gif above. If this were a general function would be good to have the option of taking the convex hull though, for some uses overestimating might be better. For the grid info, yes it should use the new MGRS extension. |
So far I've found convex_hull to be the best way to guarantee 1) valid data encompassed and 2) less than 100 points in footprint. Coastal areas with strange no-data regions over water lead to over-estimated footprints with convex hull (screenshot below), but over land the footprints end up being tight and simple. The jagged edges due to terrain correction, especially in high-relief mountainous areas create quite complex footprints even when using simplify(). So agreed, if there is a generic footprint-extraction function it would need to have several keyword options. I'm pretty happy with how this is looking now with the test catalog in https://github.com/scottyhq/sentinel1-rtc-stac, as noted in #86 if you want to use stac browser locally with the amplitude images it's best to add an expression and rescaling: |
@matthewhanson if you have a chance to review, could you look this over? I suspect a follow-up PR adding support for S1 GRD or SLC could refactor things a bit, but for now this seems OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked any of the bounds stuff yet, but if you can't get to it @matthewhanson lmk and I'll dig in and make my best guess on what looks good.
@scottyhq one question, do the VRTs in the test data directory need to have a .tif
extension, or can they have .vrt
? Just a little initially confusing to see the extension/filetype mismatch.
finally figured out how to deploy a stac browser on github pages (https://scottyhq.github.io/sentinel1-rtc-stac) for the test catalog mentioned above. I think having a CI-deployed browser like this is mighty convenient for review of new packages! Would appreciate any other pointers on getting this merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a glance at the bounds and they seemed reasonable to me. We can always tweak them later if it is necessary, but for now I'm approving to get this in.
CI tests timed out with |
@matthewhanson or @rbavery would love to get a second pair of eyes on the metadata fields from someone who has worked with S1 data, you can browse the test catalog here: https://scottyhq.github.io/sentinel1-rtc-stac. |
I think @matthewhanson is PTO today, so I'll wait for @rbavery's review before merging. Thanks for the fixup @scottyhq! |
@scottyhq @gadomski I might be missing something, but for this example scene why does the metadata field |
Also, as a newcomer to working with SAR, I'm uninformed when it comes to the differences between different RTC products. For example, the ASF Vertex publishes two RTC products, a version relying on GAMMA+SNAP and a version relying only on SNAP: https://hyp3.asf.alaska.edu/about#list-rtcs1tbx I assume not all RTC is equal, from an email from @scottyhq it sounds like GAMMA RTC is preferred for time series analysis. So can we capture information about the quality/type of RTC in the metadata, as opposed to only listing These are my only two questions, everything else looks great. |
Good catch, I'll fix that. This also made me realize the STAC Browser URLs are unfortunately not 'human-friendly' right now (radiantearth/stac-browser#46) so linking to that issue.
There will likely be even more software options for SAR in the coming years. It's not that there is preferred software, but that the processing approach (DEM, parameters, algorithms) may vary. Ideally the entire processing pipeline could be open-source (e.g. hyp3 + SNAP or ISCE), and at the very least the product documentation should capture the full workflow. happy to wait on comments from @matthewhanson before merging. |
granule_href: str, | ||
asset: str = 'local_incident_angle.tif', | ||
additional_providers: Optional[List[pystac.Provider]] = None, | ||
include_grd_metadata: Optional[bool] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to link back to the original metadata files as assets, so I think the default here should be True
Thanks for all this work and patience @scottyhq , it's looking really awesome. Looking forward to giving it a spin and indexing the dataset. I left a comment, but now i'm not sure about it. The productInfo.json files aren't really for these Items, they are for the SLC data from which these were derived. So the proper behavior would be to include a link with The other link type to consider including is Regarding the projection extension info - I find |
Thanks for looking this over @matthewhanson ! My plan is to do one more pass late this week or next, re-generate the test catalog with more scenes, then hopefully get this merged.
I'll have to brush up on STAC link types which I don't know much about. Unfortunately for this case a lot of metadata is coming from the processing workflow (GRDs rather than SLCs for this data). I think the key is just to provide the list of GRD frame ids so that people can get at the original files and metadata if needed from ESA.
What's your opinion on using NASA CMR STAC here? The GRD and SLC frames have a mirrored long term archive at NASA and as a result minimal STAC metadata for the original data does exist at https://cmr.earthdata.nasa.gov/stac/ASF/collections/SENTINEL-1A_DP_GRD_HIGH.v1 . A relevant issue though is needing a straightforward mapping of the granule ids (nasa/cmr-stac#21 (comment)). |
Closing in favor of https://github.com/stactools-packages/sentinel1. |
As described in #82 this adds an initial subpackage for sentinel-1 SAR data. I'm currently only working with a derived public dataset (https://registry.opendata.aws/sentinel-1-rtc-indigo/), so this PR mainly adds a
create_item
function for that dataset. But I think the framework will help to add GRD, SLC, and other datasets in the future.I've tried to roughly follow the framework in the sentinel2 subpackage, and match the STAC 'sentinel:' properties from https://registry.opendata.aws/sentinel-2-l2a-cogs/ since both datasets use the MGRS grid, and hopefully this will help with interoperability.
Happy to refine this with input @matthewhanson @lossyrob !