adding seasonet dataset + tests + doc #1466
Thanks for the PR, excited to add your dataset to TorchGeo!
Thanks for the fast reply! We will look into the necessary changes ASAP!
Thanks @dkosm! Please let us know if you have questions about the changes :)
@microsoft-github-policy-service agree company="TU Dortmund"
After some delay, we were able to submit the CLA. In the meantime, we tried to address all of your remarks. Is anything left?
This looks fantastic!
Left a couple of optional style suggestions.
SeasoNet.metadata[6],
"url",
os.path.join("tests", "data", "seasonet", "meta.csv"),
)
Could use a list and a for-loop so there isn't so much code duplication.
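A minimal sketch of what such a loop could look like, assuming every metadata entry is a dict with "name", "ext", and "url" keys as in the snippets under review. The `metadata` list is a stand-in for `SeasoNet.metadata`, the "autumn" entry and the example.com URLs are hypothetical, and `setitem` stands for any callable shaped like pytest's `monkeypatch.setitem(mapping, key, value)`.

```python
import os

# Stand-in for SeasoNet.metadata. Only "spring" and "meta" appear in the PR
# snippets; the "autumn" entry and all URLs here are hypothetical placeholders.
metadata = [
    {"name": "spring", "ext": ".zip", "url": "https://example.com/spring.zip"},
    {"name": "autumn", "ext": ".zip", "url": "https://example.com/autumn.zip"},
    {"name": "meta", "ext": ".csv", "url": "https://example.com/meta.csv"},
]


def patch_urls(setitem, metadata, data_dir):
    """Point every metadata entry at a local test file instead of its remote URL."""
    for entry in metadata:
        local_path = os.path.join(data_dir, entry["name"] + entry["ext"])
        setitem(entry, "url", local_path)


def plain_setitem(mapping, key, value):
    """Minimal substitute for monkeypatch.setitem (no automatic undo)."""
    mapping[key] = value


data_dir = os.path.join("tests", "data", "seasonet")
patch_urls(plain_setitem, metadata, data_dir)
```

Inside a pytest test, passing `monkeypatch.setitem` instead of `plain_setitem` should give the same result, with the patch undone automatically after the test.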
{
"name": "spring",
"ext": ".zip",
"url": "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/spring.zip", # noqa: E501
Personally, I would save a base URL like:
url = "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/"
and then append to it only when you need the full URL. That way there isn't so much code duplication. I'm not sure you really need ext either; you could just infer it from the filename.
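A rough sketch of this suggestion, assuming the metadata is reduced to a list of filenames. The base URL is the one quoted above; `BASE_URL`, `FILENAMES`, `full_url`, and `name_and_ext` are illustrative names, not part of the PR.

```python
import os

# Base URL taken from the snippet under review.
BASE_URL = "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/"

# Hypothetical filename list; only spring.zip and meta.csv appear in the PR snippets.
FILENAMES = ["spring.zip", "meta.csv"]


def full_url(filename):
    """Build the download URL only at the point where it is needed."""
    return BASE_URL + filename


def name_and_ext(filename):
    """Infer the name and extension from the filename instead of storing them."""
    return os.path.splitext(filename)


urls = [full_url(filename) for filename in FILENAMES]
```

With this layout, the long URL appears once, and the separate "name" and "ext" fields become derivable rather than stored.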
self,
root: str = "data",
split: str = "train",
seasons: Collection[str] = all_seasons,
How is Collection different from Iterable? How did you decide which to use?
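For context on this question, a small sketch of how the two abstract base classes behave in the standard library: an `Iterable` only promises iteration, while a `Collection` additionally supports `len()` and `in`. The season names below are placeholders, and this is not a statement of why the authors picked `Collection`.

```python
from collections.abc import Collection, Iterable

# Placeholder season names, for illustration only.
seasons_list = ["spring", "summer"]
seasons_gen = (season for season in seasons_list)

# A list is sized, iterable, and supports membership tests.
list_is_iterable = isinstance(seasons_list, Iterable)
list_is_collection = isinstance(seasons_list, Collection)

# A generator can be iterated (once), but has no len() and no __contains__.
gen_is_iterable = isinstance(seasons_gen, Iterable)
gen_is_collection = isinstance(seasons_gen, Collection)

# Operations that a Collection annotation guarantees:
n_seasons = len(seasons_list)
has_spring = "spring" in seasons_list
```

So a parameter annotated as `Collection[str]` can safely be measured, searched, and iterated more than once, which a bare `Iterable[str]` does not guarantee.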
"""
path = self.files.iloc[index][0]
with rasterio.open(f"{path}_labels.tif") as f:
array = f.read() - 1
Maybe add a comment explaining why - 1?
Thanks for the contribution, hopefully the first of many! Apologies it took so long to get the PR merged. In the meantime I moved to Germany! Starting a postdoc at TUM, maybe we'll see each other around someday.
"nodata": None,
"crs": CRS.from_epsg(32632),
"transform": Affine(10.0, 0.0, 664800.0, 0.0, -10.0, 5342400.0),
"compress": "zstd",
Just want to confirm that the original SeasoNet files are also compressed with ZSTD
Yes, they are
This PR adds the SeasoNet Seasonal Land Cover Segmentation Dataset.
Huge thanks to @briktor for his work on the dataset and the TorchGeo version!
The SeasoNet dataset consists of 1,759,830 multi-spectral Sentinel-2 image patches, taken from 519,547 unique locations, covering the whole surface area of Germany. Annotations are provided as pixel-level land cover and land use segmentation masks from the German land cover model LBM-DE2018, with land cover classes based on the CORINE Land Cover database (CLC) 2018. The set is split into two overlapping grids of roughly 880,000 samples each, which are shifted relative to each other by half the patch size in both dimensions. Within each grid, the images do not overlap.
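The two-grid layout can be illustrated with a small sketch. The patch size of 120 pixels is an assumption (it is not stated in this description), and `grid_origins` and `overlap_area` are hypothetical helpers, not part of the dataset code.

```python
PATCH = 120         # assumed patch size in pixels; not stated in this description
SHIFT = PATCH // 2  # the second grid is shifted by half the patch size


def grid_origins(offset, n):
    """Top-left corners (row, col) of an n x n grid of adjacent patches."""
    return [(offset + r * PATCH, offset + c * PATCH) for r in range(n) for c in range(n)]


def overlap_area(a, b):
    """Overlap in pixels between two PATCH x PATCH squares given by their corners."""
    dy = max(0, PATCH - abs(a[0] - b[0]))
    dx = max(0, PATCH - abs(a[1] - b[1]))
    return dy * dx


grid1 = grid_origins(0, 2)
grid2 = grid_origins(SHIFT, 2)

# Patches within one grid never overlap ...
within = [overlap_area(a, b) for a in grid1 for b in grid1 if a != b]

# ... but a patch from grid 2 overlaps its grid 1 neighbours by a quarter patch each.
across = overlap_area(grid1[0], grid2[0])
```

Under these assumptions, every within-grid overlap is zero, while a shifted patch shares a quarter of its area with each of the four patches it straddles in the other grid.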
Example Plot:
![example_plot](https://private-user-images.githubusercontent.com/29751954/264653078-1fa8dc19-461e-4796-8858-e4a07533417d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NjgyOTEsIm5iZiI6MTczODk2Nzk5MSwicGF0aCI6Ii8yOTc1MTk1NC8yNjQ2NTMwNzgtMWZhOGRjMTktNDYxZS00Nzk2LTg4NTgtZTRhMDc1MzM0MTdkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDIyMzk1MVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU0MDdmYWViZmQ4ZjgyY2M3MzhlMTU0NDY0ZGNmZDM3NmMyN2VkNzQxZTg4ZDExODcyZTQ2OTkwNDMxYWJmMjQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.P_RGlllYUAZJ7JKGBYz6gFqEHmZArAY7A_ZxBrWwrYo)
Dataset format:
Paper: here