Add CaBuAr dataset #2235
@microsoft-github-policy-service agree |
Missing tests for the new datamodule. Usually we put these in the trainer tests, but we don't yet have a trainer for change detection (feel free to work on this if you want). For now, you'll have to create a new |
I have added the datamodule test, but project coverage is decreasing for some reason, while patch coverage is fine. Ruff reports some issues with the cabuar dataset file, but no changes were made to it, and the last run passed. As in the dataset tests, should I add another importorskip to the datamodule test?
Everything fine. Do you suggest keeping both implementations (ChaBuD and CaBuAr)?
I see no reason not to keep both. Even if the newer dataset is more useful, someone might want to reproduce the results on the previous dataset for comparison against older papers.
`CaBuAr <https://huggingface.co/datasets/DarthReca/california_burned_areas>`__
is a dataset for Change detection for Burned area Delineation and part of
the splits are used for the ChaBuD ECML-PKDD 2023 Discovery Challenge.
Depending on how much code is the same, it may be possible to either subclass ChaBuD:

    from .chabud import ChaBuD

    class CaBuAr(ChaBuD):
        ...

or create a shared abstract base class. We could either have both datasets in a single file or a different dataset in each file. It's borderline since it's more like a ChaBuD v2, but it has a unique name.
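
To make the second option concrete, here is a minimal sketch of a shared abstract base class. It is an illustration only, not TorchGeo's implementation: the base class name `BurnedAreaDataset`, the `_list_files` hook, the split tuples, and the file names are all hypothetical placeholders.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class BurnedAreaDataset(ABC):
    """Hypothetical shared base class (not part of TorchGeo)."""

    # Each subclass declares the splits it provides (assumed values below).
    splits: Tuple[str, ...] = ()

    def __init__(self, split: str = "train") -> None:
        # Split validation is shared by both datasets.
        if split not in self.splits:
            raise ValueError(f"split must be one of {self.splits}, got {split!r}")
        self.split = split
        self.files = self._list_files()

    @abstractmethod
    def _list_files(self) -> List[str]:
        """Return the files backing the selected split."""

    def __len__(self) -> int:
        return len(self.files)


class ChaBuD(BurnedAreaDataset):
    splits = ("train", "val")  # assumed for illustration

    def _list_files(self) -> List[str]:
        return [f"chabud_{self.split}.hdf5"]  # placeholder file name


class CaBuAr(BurnedAreaDataset):
    splits = ("train", "val", "test")  # assumed for illustration

    def _list_files(self) -> List[str]:
        # CaBuAr is spread over several files; two placeholders here.
        return [f"cabuar_{self.split}_{i}.hdf5" for i in range(2)]
```

With this layout, only the file listing and the available splits differ between the two classes, so either dataset could later become a thin subclass of the other without changing callers.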
Originally the data was proposed in CaBuAr, then we extended the dataset for the challenge. However, the challenge page is not working anymore.
I can probably do the reverse if that seems appropriate: I can make ChaBuD an extension of CaBuAr, since the latter already handles more files and subsets.
@DarthReca do you have time to address these change requests? Trying to finalize the 0.6.0 release this week.
I will start working today to resolve any issues.
Everything looks fine to me now. Can you add an example figure created by the plot method to the PR description? This will help validate that the plot method works correctly. Feel free to make one dataset a subclass of the other if you want, but I'm also fine keeping things the way they are. Would like to merge this by the end of the day so we can prepare the release.
I have added some plots. If this is fine for now, everything should be ready.
This PR extends the ChaBuD dataset introduced in #1259.
It is based on data presented in both "CaBuAr: California Burned Areas dataset for delineation" and the ChaBuD challenge. The train and validation splits are taken from CaBuAr, while the test split is from ChaBuD.
The files are hosted on HuggingFace.
These are some samples taken from the dataset: