Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CaBuAr dataset #2235

Merged
merged 7 commits into from
Aug 28, 2024
Merged

Add CaBuAr dataset #2235

merged 7 commits into from
Aug 28, 2024

Conversation

DarthReca
Copy link
Contributor

@DarthReca DarthReca commented Aug 19, 2024

This PR extends the ChaBuD dataset introduced in #1259.

It is based on data presented in both CaBuAr: California Burned Areas dataset for delineation and ChaBuD challenge. Train and Validation are taken from CaBuAr, while the Test is from ChaBuD.

The files are hosted on HuggingFace.

These are some samples taken from the dataset:

sample_0
sample_1

@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing datamodules PyTorch Lightning datamodules labels Aug 19, 2024
@DarthReca
Copy link
Contributor Author

@microsoft-github-policy-service agree

@adamjstewart
Copy link
Collaborator

Missing tests for the new datamodule. Usually we put these in the trainer tests, but we don't yet have a trainer for change detection (feel free to work on this if you want). For now, you'll have to create a new tests/datamodules test.

@DarthReca
Copy link
Contributor Author

DarthReca commented Aug 19, 2024

I have added the datamodule test, but the project coverage is decreasing for some reason, while the patch one is ok.

Ruff mentions some issues with the cabuar dataset file, but no changes were made, and the last run was ok.

For the datasets test, should I add another importorskip to the datamodule test?

@adamjstewart
Copy link
Collaborator

  • ruff: I just merged Ruff: enable ruff-specific rules #2218 which adds extra checks for certain coding patterns to avoid. You can rebase/merge main to get the tests to fail locally. I can comment directly on the lines with the suggested changes if that helps.
  • datasets: Copy the pytest.importorskip to your datamodules tests as well
  • coverage: No idea what's going on here, probably a random bug. If you still see the issue after you push a couple more commits I'll investigate.

@DarthReca DarthReca marked this pull request as ready for review August 19, 2024 15:46
@DarthReca DarthReca marked this pull request as draft August 19, 2024 15:47
@DarthReca
Copy link
Contributor Author

Everything fine. Do you suggest keeping both implementations (ChaBuD and CaBuAr)?

@adamjstewart
Copy link
Collaborator

I see no reason not to keep both. Even if the newer dataset is more useful, someone might want to reproduce the results on the previous dataset for comparison against older papers.

@DarthReca DarthReca marked this pull request as ready for review August 19, 2024 15:59
docs/api/datasets.rst Outdated Show resolved Hide resolved
docs/api/non_geo_datasets.csv Outdated Show resolved Hide resolved
tests/data/cabuar/data.py Outdated Show resolved Hide resolved
tests/data/cabuar/data.py Outdated Show resolved Hide resolved
tests/datamodules/test_cabuar.py Outdated Show resolved Hide resolved
tests/datasets/test_cabuar.py Outdated Show resolved Hide resolved
tests/datasets/test_cabuar.py Outdated Show resolved Hide resolved
torchgeo/datamodules/cabuar.py Outdated Show resolved Hide resolved
torchgeo/datasets/cabuar.py Outdated Show resolved Hide resolved

`CaBuAr <https://huggingface.co/datasets/DarthReca/california_burned_areas>`__
is a dataset for Change detection for Burned area Delineation and part of
the splits are used for the ChaBuD ECML-PKDD 2023 Discovery Challenge.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how much code is the same, it may be possible to either subclass ChaBuD:

from .chabud import ChaBuD

class CaBuAr(ChaBuD):
    ...

or create a shared abstract base class. We could either have both datasets in a single file or a different dataset in each file. It's borderline since it's more like a ChaBuD v2, but it has a unique name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally the data was proposed in CaBuAr, then we extended the dataset for the challenge. However, the challenge page is not working anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably do the reverse if that seems appropriate: I can make ChaBuD an extension of CaBuAr since the latter deals already with more files and subsets.

@adamjstewart adamjstewart added this to the 0.6.0 milestone Aug 21, 2024
@adamjstewart
Copy link
Collaborator

@DarthReca do you have time to address these change requests? Trying to finalize the 0.6.0 release this week.

@DarthReca
Copy link
Contributor Author

I will start working today to resolve any issues.

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks fine to me now. Can you add an example figure created by the plot method to the PR description? This will help validate that the plot method works correctly. Feel free to make one dataset a subclass of the other if you want, but I'm also fine keeping things the way they are. Would like to merge this by the end of the day so we can prepare the release.

@DarthReca
Copy link
Contributor Author

DarthReca commented Aug 28, 2024

I have added some plots. If this is fine for now, everything should be ready.

@adamjstewart adamjstewart merged commit ccc314c into microsoft:main Aug 28, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datamodules PyTorch Lightning datamodules datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants