Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sentinel-1 Subpackage #84

Closed
wants to merge 15 commits into from
Closed

Conversation

scottyhq
Copy link

As described in #82 this adds an initial subpackage for sentinel-1 SAR data. I'm currently only working with a derived public dataset (https://registry.opendata.aws/sentinel-1-rtc-indigo/), so this PR mainly adds a create_item function for that dataset. But I think the framework will help to add GRD, SLC, and other datasets in the future.

I've tried to roughly follow the framework in the sentinel2 subpackage, and match the STAC 'sentinel:' properties from https://registry.opendata.aws/sentinel-2-l2a-cogs/ since both datasets use the MGRS grid, and hopefully this will help with interoperability.

Happy to refine this with input @matthewhanson @lossyrob !

@scottyhq
Copy link
Author

elaborating on @kylebarron 's comment on footprint accuracy, below is a QGIS screenshot that illustrates the partial extent of an RTC product (bright foreground image), (covering a MGRS grid square (pink), GRD frame footprints (gray), GRD product with nodata (transparent+black). The lateral nodata of the frame footprints will vary with geographic location, in this case ~5km.

image

(https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/12/S/YJ/2021/S1A_20210105_12SYJ_DSC/Gamma0_VV.tif)

also more information on GRD nodata here (scroll to bottom) https://asf.alaska.edu/how-to/data-recipes/geocode-sentinel-1-with-qgis-3-x/

@matthewhanson
Copy link
Member

I spent some time looking into this for Sentinel-1 a while back and used rasterio to get the footprints. I also tried using overviews to generate the footprints so it would be faster and wouldn't require reading as much of the file, but it really worked quite a bit better using the full resolution file.

The resulting geometry tends to be very large and way too big, so you have to simplify it. It's a balancing act to simplify it enough to you still have the detail needed while keeping the geometry size small.

You can see in the gifs below the difference between the provided geometry (blue) and the one calculated with the above code (red)

This is a typical image and the provided boundary overestimates the area cross-track, and underestimates it along-track.

sentinel1-boundaries-small

This is a more unusual collect that is along the water where some of the water was masked out in the data, and we see the situation is even worse. The generated footprint in red is far better.

sentinel1-boundaries-odd-small

There might be edge cases where this would fail, thus before going operational I think it should be used to generate a few hundred boundaries to be visually inspected.

If the image were already downloaded or locally available, it's not that bad to run this processing and certainly worth it for the better footprint.

The code is here:
https://gist.github.com/matthewhanson/6be66c97c828acd1d39d8cbb97a0981e

This approach could also be used for Landsat: #83

@vincentsarago
Copy link
Member

vincentsarago commented Mar 31, 2021

If the image were already downloaded or locally available, it's not that bad to run this processing and certainly worth it for the better footprint.

just make sure to simplify the geometry or you could have hundreds/thousands points ... which will result in a really big stac items. This is especially true for GRD data where you can see that pixels don't follow a straight line on the border.

Ref: https://github.com/vincentsarago/landsat-footprint/blob/master/landsat_footprint/scripts/cli.py#L113-L120

Edits: 🤦 just seeing that you have a default simplify options set to 0.005 in the script you shared.

Another note, is that you need to assume the nodata/mask value is correct in the data.

@matthewhanson
Copy link
Member

@scottyhq Looking at your implementation, I see options for scale and precision but not simplify. Did you find that using the right combo of scale and precision precluded the need to simplify the geometry?

@scottyhq
Copy link
Author

scottyhq commented Apr 1, 2021

Thanks for the examples everyone! it does seem like there is some potential for a common extract_footprint function in stactools.core ...

I see options for scale and precision but not simplify. Did you find that using the right combo of scale and precision precluded the need to simplify the geometry?

My last commit has a slightly modified footprint version that i've been using (main difference being the use of convex_hull which i like because it guarantees the footprint geometry encompasses the pixels). I suppose you might still want to run simplify()... I'll plan to create a test static catalog with 10-100 items to test. I tend to use lat/lon decimal precisions of 4 or 5 (corresponding to ~10 to 1 m geocoding precision) since RTC has 20m posting. At the end of the day, it's STAC-api clients using these footprints to return data intersecting some query, right, so slight overestimates are OK? I have a hunch if you want to be as precise as possible with the GRD footprints because they span ~250km you might want to 1. warpedVRT to a mercator projection centered on the approx footprint center, 2. apply shapely functions, 3. transform points to lat/lon?

I have some other general questions, so also pinging @m-mohr to get feedback on stac-extensions. I think the RTC dataset would benefit from using a few other extensions that aren't yet implemented in pystac. In particular:

  1. @matthewhanson is the idea with new sentinel2 catalogs to use the MGRS extension?
  2. I'd like to specify NoData=0 for each asset, not sure where that goes... seems like https://github.com/stac-extensions/raster
  3. Another thing that would nice to capture in this metadata is dtype and scaling (for local incident_angle.tif "Angle (degrees) is scaled and converted to UInt16 to save storage space. For example, 4500 is 45 degrees."

@scottyhq
Copy link
Author

scottyhq commented Apr 2, 2021

Pushed forward a little more today but haven't had time to look closely... test catalog is here https://github.com/scottyhq/sentinel1-rtc-stac

I'm able to run stac browse (which is so awesome by the way!) and navigate. I expect the COGs to render, but I'm seeing tracebacks in the logs such as :

(abbreviated)

tiles_1    | 172.19.0.1:59252 - "GET /cog/tiles/8/38/89?url=https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/U/CU/2017/S1A_20170101_10UCU_ASC/Gamma0_VV.tif HTTP/1.1" 500
tiles_1    | [2021-04-02 06:55:03 +0000] [58] [ERROR] Exception in ASGI application
tiles_1    | Traceback (most recent call last):

tiles_1    |   File "/usr/local/lib/python3.8/site-packages/titiler/endpoints/factory.py", line 334, in tile
tiles_1    |     content = image.render(
tiles_1    |   File "/usr/local/lib/python3.8/site-packages/rio_tiler/models.py", line 262, in render
tiles_1    |     return render(self.data, self.mask, img_format=img_format, **kwargs)
tiles_1    |   File "/usr/local/lib/python3.8/site-packages/rio_tiler/utils.py", line 381, in render
tiles_1    |     dst.write(mask.astype(data.dtype), indexes=count + 1)
tiles_1    |   File "rasterio/_base.pyx", line 394, in rasterio._base.DatasetBase.__exit__
tiles_1    |   File "rasterio/_base.pyx", line 385, in rasterio._base.DatasetBase.close
tiles_1    |   File "rasterio/_io.pyx", line 2111, in rasterio._io.BufferedDatasetWriterBase.stop
tiles_1    |   File "rasterio/_err.pyx", line 215, in rasterio._err.exc_wrap_pointer
tiles_1    | rasterio._err.CPLE_NotSupportedError: PNG driver doesn't support data type Float32. Only eight bit (Byte) and sixteen bit (UInt16) bands supported.  
tiles_1    | PNG driver doesn't support data type Float32. Only eight bit (Byte) and sixteen bit (UInt16) bands supported.  

@vincentsarago any suggestions there? I thought titiler automatically scaled Float32s to 0:255 ?

@m-mohr
Copy link

m-mohr commented Apr 2, 2021

@scottyhq For your questions 2 and 3, both will soon be in the raster extension (per band). Currently the file extension has some of the fields, but those will be removed soon.
Not sure whether there are other questions I can help with?

@vincentsarago
Copy link
Member

@vincentsarago any suggestions there? I thought titiler automatically scaled Float32s to 0:255 ?

@scottyhq no titiler doesn't do that, it's up to the client to tell the tiler what to do!

@matthewhanson
Copy link
Member

@scottyhq I've tried convex hull and for our use case overestimating was not what we wanted because it could lead to extracting AOIs that were completely empty. Convex hull could overestimate a lot, such as in my second gif above.

If this were a general function would be good to have the option of taking the convex hull though, for some uses overestimating might be better.

For the grid info, yes it should use the new MGRS extension.

@scottyhq
Copy link
Author

scottyhq commented Apr 9, 2021

@scottyhq I've tried convex hull and for our use case overestimating was not what we wanted because it could lead to extracting AOIs that were completely empty. Convex hull could overestimate a lot, such as in my second gif above.

So far I've found convex_hull to be the best way to guarantee 1) valid data encompassed and 2) less than 100 points in footprint. Coastal areas with strange no-data regions over water lead to over-estimated footprints with convex hull (screenshot below), but over land the footprints end up being tight and simple. The jagged edges due to terrain correction, especially in high-relief mountainous areas create quite complex footprints even when using simplify(). So agreed, if there is a generic footprint-extraction function it would need to have several keyword options.

I'm pretty happy with how this is looking now with the test catalog in https://github.com/scottyhq/sentinel1-rtc-stac, as noted in #86 if you want to use stac browser locally with the amplitude images it's best to add an expression and rescaling: &expression=sqrt(b1)&rescale=0,1:

Screen Shot 2021-04-09 at 9 43 39 AM

@scottyhq
Copy link
Author

@matthewhanson if you have a chance to review, could you look this over? I suspect a follow-up PR adding support for S1 GRD or SLC could refactor things a bit, but for now this seems OK

Copy link
Member

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked any of the bounds stuff yet, but if you can't get to it @matthewhanson lmk and I'll dig in and make my best guess on what looks good.

@scottyhq one question, do the VRTs in the test data directory need to have a .tif extension, or can they have .vrt? Just a little initially confusing to see the extension/filetype mismatch.

@gadomski gadomski mentioned this pull request May 11, 2021
@gadomski gadomski added this to the v0.1.6 milestone May 11, 2021
@gadomski gadomski linked an issue May 11, 2021 that may be closed by this pull request
@scottyhq
Copy link
Author

finally figured out how to deploy a stac browser on github pages (https://scottyhq.github.io/sentinel1-rtc-stac) for the test catalog mentioned above. I think having a CI-deployed browser like this is mighty convenient for review of new packages! Would appreciate any other pointers on getting this merged.

@scottyhq scottyhq requested a review from gadomski May 26, 2021 11:09
Copy link
Member

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a glance at the bounds and they seemed reasonable to me. We can always tweak them later if it is necessary, but for now I'm approving to get this in.

@scottyhq
Copy link
Author

CI tests timed out with ERROR: Cannot install stactools, stactools-sentinel1==0.1.4, stactools==0.1.5 and stactools[all]==0.1.5 because these package versions have conflicting dependencies. So I bumped the version to 0.1.5, but looks like it needs to be re-approved to run tests

@scottyhq
Copy link
Author

@matthewhanson or @rbavery would love to get a second pair of eyes on the metadata fields from someone who has worked with S1 data, you can browse the test catalog here: https://scottyhq.github.io/sentinel1-rtc-stac.

@gadomski gadomski requested review from matthewhanson and removed request for matthewhanson May 27, 2021 13:03
@gadomski
Copy link
Member

I think @matthewhanson is PTO today, so I'll wait for @rbavery's review before merging. Thanks for the fixup @scottyhq!

@rbavery
Copy link

rbavery commented May 27, 2021

@scottyhq @gadomski I might be missing something, but for this example scene why does the metadata field Polarizations list VV,HH but the Assets tab shows Gamma0 VV backscatter and Gamma0 VH backscatter?

@rbavery
Copy link

rbavery commented May 27, 2021

Also, as a newcomer to working with SAR, I'm uninformed when it comes to the differences between different RTC products. For example, the ASF Vertex publishes two RTC products, a version relying on GAMMA+SNAP and a version relying only on SNAP: https://hyp3.asf.alaska.edu/about#list-rtcs1tbx

I assume not all RTC is equal, from an email from @scottyhq it sounds like GAMMA RTC is preferred for time series analysis. So can we capture information about the quality/type of RTC in the metadata, as opposed to only listing RTC for the Product Type field?

These are my only two questions, everything else looks great.

@scottyhq
Copy link
Author

I might be missing something, but for this example scene why does the metadata field Polarizations list VV,HH but the Assets tab shows Gamma0 VV backscatter and Gamma0 VH backscatter?

Good catch, I'll fix that. This also made me realize the STAC Browser URLs are unfortunately not 'human-friendly' right now (radiantearth/stac-browser#46) so linking to that issue.

I assume not all RTC is equal, from an email from @scottyhq it sounds like GAMMA RTC is preferred for time series analysis. So can we capture information about the quality/type of RTC in the metadata, as opposed to only listing RTC for the Product Type field?

sar:product_type in the STAC extension could be any string I believe, but is generally an abbreviated identifier. But your question identifies a critical need to link to original product documentation is somewhere in the STAC catalog. Right now under 'providers' there is a link to https://registry.opendata.aws/sentinel-1-rtc-indigo, and then there is a link on that page to methodology docs https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/methodology.html

There will likely be even more software options for SAR in the coming years. It's not that there is preferred software, but that the processing approach (DEM, parameters, algorithms) may vary. Ideally the entire processing pipeline could be open-source (e.g. hyp3 + SNAP or ISCE), and at the very least the product documentation should capture the full workflow.

happy to wait on comments from @matthewhanson before merging.

@gadomski gadomski requested a review from matthewhanson May 27, 2021 23:36
granule_href: str,
asset: str = 'local_incident_angle.tif',
additional_providers: Optional[List[pystac.Provider]] = None,
include_grd_metadata: Optional[bool] = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to link back to the original metadata files as assets, so I think the default here should be True

@matthewhanson
Copy link
Member

Thanks for all this work and patience @scottyhq , it's looking really awesome. Looking forward to giving it a spin and indexing the dataset.

I left a comment, but now i'm not sure about it. The productInfo.json files aren't really for these Items, they are for the SLC data from which these were derived. So the proper behavior would be to include a link with rel=derived_from to point to the STAC Item from which it came from. But the SLC data have no STAC catalog.

The other link type to consider including is via, which should point to the metadata used to create the STAC metadata. In this case though it's the the tifs themselves that are the source of metadata. But is a lot of that metadata simply copied over from the productInfo.json file? If so, then I think it would be better to link to the productInfo.json file(s) with a via link and leave them off the assets.

Regarding the projection extension info - I find proj:bbox in that extension redundant. Everything we need to know can be determined from the geotransform, so it would be my preference to not include it.

@scottyhq
Copy link
Author

scottyhq commented Jun 7, 2021

Thanks for looking this over @matthewhanson ! My plan is to do one more pass late this week or next, re-generate the test catalog with more scenes, then hopefully get this merged.

The productInfo.json files aren't really for these Items, they are for the SLC data from which these were derived. So the proper behavior would be to include a link with rel=derived_from to point to the STAC Item from which it came from. But the SLC data have no STAC catalog.

I'll have to brush up on STAC link types which I don't know much about. Unfortunately for this case a lot of metadata is coming from the processing workflow (GRDs rather than SLCs for this data). I think the key is just to provide the list of GRD frame ids so that people can get at the original files and metadata if needed from ESA.

But the SLC data have no STAC catalog.

What's your opinion on using NASA CMR STAC here? The GRD and SLC frames have a mirrored long term archive at NASA and as a result minimal STAC metadata for the original data does exist at https://cmr.earthdata.nasa.gov/stac/ASF/collections/SENTINEL-1A_DP_GRD_HIGH.v1 . A relevant issue though is needing a straightforward mapping of the granule ids (nasa/cmr-stac#21 (comment)).

@gadomski gadomski removed this from the v0.1.6 milestone Jun 9, 2021
@gadomski
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Sentinel 1 subpackage
9 participants