-
Notifications
You must be signed in to change notification settings - Fork 391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sentinel-1 dataset #821
Conversation
|
|
|
I'm not sure if I'm the right person for this PR, I don't even know the difference between:
|
According to https://docs.terradue.com/geohazards-tep/tutorials/s1-grd-rgb-composite.html, we want GRD, not SLC, and we want VV (red), HH (green), and VV / VH (blue) |
According to https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar/product-overview/polarimetry, we want VV (red), VH (green), and |VV| / |VH| (blue) |
I'd recommend making two different classes: one for Sentinel-1 SLCs, and one for Sentinel-1 GRDs. Punt on the first one for now since that involves dealing with non-georeferenced (kind of), complex-valued data. GRD should be easy to integrate. Plot the VV/VH bands separately as 2x1 subplot. |
Yes. S1B is broken and will not be fixed. You'll only see S1A after Dec 2021. SLC = single look, complex We only care about GRD and SLC, in that order.
Just like how EO can be captured at different wavelengths, SAR is captured at various polarizations. The first letter tells us which polarization was transmitted, the second one tells us which polarization was received. HH = H transmit, H receive. HH+HV is just a literal sum. These are different acquisition modes for the Sentinel-1 satellites. They trade-off capture area and resolution based on different desired characteristics. Regardless of mode, you'll still have SLC/GRD products (besides WV. That only has SLC and OCN). |
I downloaded |
Yep, these files use ground control points, not CRS. This is a fun hiccup. |
If anyone wants to follow along, the exact files I'm trying to use are https://scihub.copernicus.eu/dhus/odata/v1/Products('d3b5c561-ac0a-4617-a040-75708c0d28b8')/$value Not sure if that URL will work for others though. |
7130c30
to
e4a42e9
Compare
@RitwikGupta it would be good to get your review of this since you're the SAR expert (at least among us). We can always expand this in future PRs, especially if you're focusing on better ML for SAR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limitations on what kinds of files can be in one Dataset instance are currently wrong.
I don't think there's any reason why we need to limit the kinds of bands available within the dataset at a time. Also, we need to be mindful that cross polarization is reciprocal.
We need to limit that a dataset can either be
["HH", "HV"], | ||
["HV", "HH"], | ||
["VV", "VH"], | ||
["VH", "VV"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these not just set(["HH", "HV"])
and set(["VV", "VH"])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our MSI datasets, we want to be able to optionally select a subset of bands (RGB-only, RGB+NIR, etc.). I was following the same approach here, but I'm not actually sure if that makes sense for SAR. Is there any reason why someone might want to only use a single band and not both? If not, I can simplify the code quite a bit.
# Sentinel-1 uses dual polarization, it can only transmit either | ||
# horizontal or vertical at a single time | ||
msg = """ | ||
'bands' cannot contain both horizontal and vertical transmit at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? Is this limitation solely for plotting purposes? It is totally valid for multiple passes over the same area to be of different polarizations. It's also valid for consecutive captures from the satellite to be of different transmit/receive pairings. Additionally, the intensities for VH and HV are equal due to Helmholtz reciprocity!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This limitation is inherent in our current RasterDataset
implementation. Basically, every GeoDataset
has an R-tree index containing a single filename for each spatiotemporal location. We populate this R-tree during __init__
using a filename glob, which has to contain a single band name (can't be both HH and VV). Then, during __getitem__
, we loop over all bands and swap the band in the filename to load the data. If all bands aren't present for every spatiotemporal location, then loading will fail. This was designed for MSI and HSI imagery without any thought towards (or knowledge of) SAR data.
There are a few options to deal with this:
1. Leave it up to the user
This is our current approach. The user can create two datasets and compute the union themselves.
Pro: simple
Con: users may have to do extra work
2. Compute the union for them
We could subclass UnionDataset
instead of RasterDataset
and create two RasterDatasets
for the user. This is actually a very elegant solution and shouldn't be that difficult.
Pro: users don't have to do any extra work
Con: it would not be possible to change bands. You could create a dataset containing all 4 bands, but it would not be possible to create a dataset with only 2 bands
I assume horizontal transmit and vertical transmit are fundamentally different and some users may want to work with only one or the other but not both? This would not be possible with this idea. A workaround to support this would be to create 3 datasets:
- Sentinel1Horizontal: [HH, HV]: RasterDataset
- Sentinel1Vertical: [VV, VH]: RasterData
- Sentinel1: [HH, HV, VV, VH]: UnionDataset
3. Override __init__
and __getitem__
in Sentinel1
We could replace the default logic for populating the R-tree and loading the data to search for both HH and VV and to load [HH, HV] and [VV, VH].
Pro: would support 4 bands or 2 bands
Con: mostly duplicate code, more difficult to maintain, need to do this for every bi-pole SAR instrument
4. Modify RasterDataset to support optional bands
Instead of populating the R-tree with a single band for each spatiotemporal location, we could load all bands we can find. Then, at loading time, we only load the bands that were found.
Pro: supports both SAR as well as MSI/HSI where the user might not have downloaded all bands but forgot to change bands
Con: significantly slower since R-tree lookup will go from
Let me know how you feel about 1–4. I'm leaning towards 1 or 2. I'm hesitant to go with 3 because duplicate code == evil, and 4 would require a separate PR and longer discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer 1 -- until we have someone who is actively using SAR imagery with TorchGeo let's not spend too much time making this class perfect
|
||
title = f"({bands[0]})" | ||
else: | ||
# Both horizontal and vertical receive, plot as RGB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to plot these images as RGB? The case to plot RGB images is rare as the third band ratio choice is extremely dependent on application. Why not just present an Nx1 subplot with all bands plotted in grayscale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because RGB looks cooler 😄
Seriously though, I only went with RGB cuz that's how we plot MSI datasets. When I view thumbnails of these images on Copernicus, they also choose to display an RGB image. What other third band ratio choices are common? We could add an option to select one or the other when plotting. Of course, users can always implement their own plotting code, this is just supposed to be a reasonable default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not tied to displaying an image one way or another, but if it makes the code easier, then displaying N grayscale bands would be infinitely easier than dealing with all these combinations/ratios for RGB imagery.
* Add Sentinel-1 dataset * SLC -> GRD * Color scaling * Improve image title * Newline before lists * New in version 0.4 * Divide before scaling * Documentation clarifications * Relax constraints on bands * One more combo * Fix syntax * Fix syntax * Fix flip * Fix test name * pyupgrade * Clarify co/cross-pol, backscatter coeff, and scale
I've never worked with Sentinel-1 or SAR data before, so I have a ton of questions. Maybe @isaaccorley can help?