Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sentinel-1 dataset #821

Merged
merged 16 commits into from
Jan 23, 2023
Merged

Add Sentinel-1 dataset #821

merged 16 commits into from
Jan 23, 2023

Conversation

adamjstewart
Copy link
Collaborator

I've never worked with Sentinel-1 or SAR data before, so I have a ton of questions. Maybe @isaaccorley can help?

  1. The files have no CRS/transform, what's up with that?
  2. How do I plot SAR data?

@adamjstewart adamjstewart mentioned this pull request Oct 6, 2022
@isaaccorley
Copy link
Collaborator

  1. That doesn't seem right. Where are you getting the data from? PC?

  2. Normally the bands are plotted as a false color image. So you would make some combination of 3 bands like (VV, VH, ratio of VV/VH) and then plot it as if it's RGB. Of course this will likely be noisy since the data is typically heavily preprocessed.

@adamjstewart adamjstewart added this to the 0.4.0 milestone Oct 6, 2022
@adamjstewart
Copy link
Collaborator Author

  1. Data is from Copernicus, haven't looked at PC data yet

@ashnair1 ashnair1 added the datasets Geospatial or benchmark datasets label Nov 18, 2022
@adamjstewart
Copy link
Collaborator Author

  1. There are 3 VV and 3 VH. What's the difference? Should I plot all 3 combos?

@adamjstewart
Copy link
Collaborator Author

adamjstewart commented Dec 4, 2022

I'm not sure if I'm the right person for this PR, I don't even know the difference between:

@adamjstewart
Copy link
Collaborator Author

According to https://docs.terradue.com/geohazards-tep/tutorials/s1-grd-rgb-composite.html, we want GRD, not SLC, and we want VV (red), HH (green), and VV / VH (blue)

@adamjstewart
Copy link
Collaborator Author

According to https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar/product-overview/polarimetry, we want VV (red), VH (green), and |VV| / |VH| (blue)

@RitwikGupta
Copy link
Contributor

I'd recommend making two different classes: one for Sentinel-1 SLCs, and one for Sentinel-1 GRDs. Punt on the first one for now since that involves dealing with non-georeferenced (kind of), complex-valued data. GRD should be easy to integrate. Plot the VV/VH bands separately as 2x1 subplot.

@RitwikGupta
Copy link
Contributor

RitwikGupta commented Dec 5, 2022

@adamjstewart

S1A/S1B: I guess there's just 2 satellites

Yes. S1B is broken and will not be fixed. You'll only see S1A after Dec 2021.

SLC/GRD/OCN/RAW

SLC = single look, complex
GRD = ground range detected
OCN = ocean
RAW = literally raw radar data

We only care about GRD and SLC, in that order.

HH/VV/VH/HV/HH+HV/VV+VH

Just like how EO can be captured at different wavelengths, SAR is captured at various polarizations. The first letter tells us which polarization was transmitted, the second one tells us which polarization was received. HH = H transmit, H receive. HH+HV is just a literal sum.

SM/IW/EW/WV

These are different acquisition modes for the Sentinel-1 satellites. They trade-off capture area and resolution based on different desired characteristics. Regardless of mode, you'll still have SLC/GRD products (besides WV. That only has SLC and OCN).

@adamjstewart
Copy link
Collaborator Author

I downloaded S1A_IW_GRDH_1SDV_20221204T210847_20221204T210912_046187_0587AD_F19F.SAFE and S1A_IW_SLC__1SDV_20221006T133300_20221006T133327_045322_056B2B_84A7.SAFE but neither have a crs value when opened with rasterio. Is there something I'm missing? I'm just downloading random images from Copernicus in hopes of finding one that I can visualize.

@adamjstewart
Copy link
Collaborator Author

@adamjstewart
Copy link
Collaborator Author

Yep, these files use ground control points, not CRS. This is a fun hiccup.

@adamjstewart
Copy link
Collaborator Author

If anyone wants to follow along, the exact files I'm trying to use are https://scihub.copernicus.eu/dhus/odata/v1/Products('d3b5c561-ac0a-4617-a040-75708c0d28b8')/$value

Not sure if that URL will work for others though.

@github-actions github-actions bot added documentation Improvements or additions to documentation testing Continuous integration testing labels Dec 6, 2022
@adamjstewart
Copy link
Collaborator Author

adamjstewart commented Dec 6, 2022

Alright, found some decent image parameters! These are scenes over the big island of Hawai'i:

HH_HV
VV_VH

Happy to tweak these parameters if anyone has better suggestions.

@adamjstewart adamjstewart marked this pull request as ready for review December 6, 2022 05:35
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Collaborator Author

@RitwikGupta it would be good to get your review of this since you're the SAR expert (at least among us). We can always expand this in future PRs, especially if you're focusing on better ML for SAR!

torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
calebrob6
calebrob6 previously approved these changes Dec 20, 2022
Copy link
Contributor

@RitwikGupta RitwikGupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limitations on what kinds of files can be in one Dataset instance are currently wrong.

I don't think there's any reason why we need to limit the kinds of bands available within the dataset at a time. Also, we need to be mindful that cross polarization is reciprocal.

We need to limit that a dataset can either be $\gamma_0$ or $\beta_0$, and image scales need to be normalized to one choice, power by default.

tests/data/sentinel1/data.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
torchgeo/datasets/sentinel.py Outdated Show resolved Hide resolved
["HH", "HV"],
["HV", "HH"],
["VV", "VH"],
["VH", "VV"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not just set(["HH", "HV"]) and set(["VV", "VH"])?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our MSI datasets, we want to be able to optionally select a subset of bands (RGB-only, RGB+NIR, etc.). I was following the same approach here, but I'm not actually sure if that makes sense for SAR. Is there any reason why someone might want to only use a single band and not both? If not, I can simplify the code quite a bit.

# Sentinel-1 uses dual polarization, it can only transmit either
# horizontal or vertical at a single time
msg = """
'bands' cannot contain both horizontal and vertical transmit at the same time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? Is this limitation solely for plotting purposes? It is totally valid for multiple passes over the same area to be of different polarizations. It's also valid for consecutive captures from the satellite to be of different transmit/receive pairings. Additionally, the intensities for VH and HV are equal due to Helmholtz reciprocity!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limitation is inherent in our current RasterDataset implementation. Basically, every GeoDataset has an R-tree index containing a single filename for each spatiotemporal location. We populate this R-tree during __init__ using a filename glob, which has to contain a single band name (can't be both HH and VV). Then, during __getitem__, we loop over all bands and swap the band in the filename to load the data. If all bands aren't present for every spatiotemporal location, then loading will fail. This was designed for MSI and HSI imagery without any thought towards (or knowledge of) SAR data.

There are a few options to deal with this:

1. Leave it up to the user

This is our current approach. The user can create two datasets and compute the union themselves.

Pro: simple
Con: users may have to do extra work

2. Compute the union for them

We could subclass UnionDataset instead of RasterDataset and create two RasterDatasets for the user. This is actually a very elegant solution and shouldn't be that difficult.

Pro: users don't have to do any extra work
Con: it would not be possible to change bands. You could create a dataset containing all 4 bands, but it would not be possible to create a dataset with only 2 bands

I assume horizontal transmit and vertical transmit are fundamentally different and some users may want to work with only one or the other but not both? This would not be possible with this idea. A workaround to support this would be to create 3 datasets:

  • Sentinel1Horizontal: [HH, HV]: RasterDataset
  • Sentinel1Vertical: [VV, VH]: RasterData
  • Sentinel1: [HH, HV, VV, VH]: UnionDataset

3. Override __init__ and __getitem__ in Sentinel1

We could replace the default logic for populating the R-tree and loading the data to search for both HH and VV and to load [HH, HV] and [VV, VH].

Pro: would support 4 bands or 2 bands
Con: mostly duplicate code, more difficult to maintain, need to do this for every bi-pole SAR instrument

4. Modify RasterDataset to support optional bands

Instead of populating the R-tree with a single band for each spatiotemporal location, we could load all bands we can find. Then, at loading time, we only load the bands that were found.

Pro: supports both SAR as well as MSI/HSI where the user might not have downloaded all bands but forgot to change bands
Con: significantly slower since R-tree lookup will go from $\mathcal{O}(\log N)$ to $\mathcal{O}(\log CN)$ and memory will go from $\mathcal{O}(N)$ to $\mathcal{O}(CN)$ for all datasets. Even slower would be loading and grabbing the CRS and bounding box of every file from disk. Also, if some downloads have different numbers of bands, this will break when loading data. Easy fix for this would be to specify a subset of bands when instantiating the dataset, but the error message would not be clear and this may be difficult to figure out for an inexperienced user

Let me know how you feel about 1–4. I'm leaning towards 1 or 2. I'm hesitant to go with 3 because duplicate code == evil, and 4 would require a separate PR and longer discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer 1 -- until we have someone who is actively using SAR imagery with TorchGeo let's not spend too much time making this class perfect


title = f"({bands[0]})"
else:
# Both horizontal and vertical receive, plot as RGB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to plot these images as RGB? The case to plot RGB images is rare as the third band ratio choice is extremely dependent on application. Why not just present an Nx1 subplot with all bands plotted in grayscale?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because RGB looks cooler 😄

Seriously though, I only went with RGB cuz that's how we plot MSI datasets. When I view thumbnails of these images on Copernicus, they also choose to display an RGB image. What other third band ratio choices are common? We could add an option to select one or the other when plotting. Of course, users can always implement their own plotting code, this is just supposed to be a reasonable default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not tied to displaying an image one way or another, but if it makes the code easier, then displaying N grayscale bands would be infinitely easier than dealing with all these combinations/ratios for RGB imagery.

torchgeo/datasets/sentinel.py Show resolved Hide resolved
torchgeo/datasets/sentinel.py Show resolved Hide resolved
@adamjstewart adamjstewart merged commit 72ea9f8 into main Jan 23, 2023
@adamjstewart adamjstewart deleted the datasets/sentinel1 branch January 23, 2023 05:19
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* Add Sentinel-1 dataset

* SLC -> GRD

* Color scaling

* Improve image title

* Newline before lists

* New in version 0.4

* Divide before scaling

* Documentation clarifications

* Relax constraints on bands

* One more combo

* Fix syntax

* Fix syntax

* Fix flip

* Fix test name

* pyupgrade

* Clarify co/cross-pol, backscatter coeff, and scale
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants