Fix folder name for labels_root in the FAIR1M dataset #1099

SpontaneousDuck · 2023-02-08T23:13:58Z

Change name of expected extracted folder for FAIR1M labels_root. Fixes #1098

adamjstewart · 2023-02-09T05:18:20Z

Thanks for the fix!

Where did you even download the dataset? The dataset homepage seems to be down.

In order to get the tests to pass, you'll also need to change the fake data in tests/data/fair1m to have the same directory structure (both the directory and the zip file of that directory). Let me know if you need help.

SpontaneousDuck · 2023-02-09T14:51:48Z

I was able to access the dataset's website here: https://www.gaofen-challenge.com/benchmark. Since it requires an account to get the links to the dataset, my colleague who is from China was able to get that for me. The Google Drive hosted dataset is a public folder here: https://drive.google.com/drive/folders/1lCZibAl3k9sI5d7ahRm_5GA3g7OCLXmY.

SpontaneousDuck · 2023-02-09T14:52:25Z

Tests fixed! I changed what you requested in the tests folder. We'll see if that works!

SpontaneousDuck · 2023-02-09T14:59:27Z

@microsoft-github-policy-service agree company="Kostas Research Institute at Northeastern University"

adamjstewart · 2023-02-09T15:33:56Z

I think you need to update the zip file too.

SpontaneousDuck · 2023-02-09T15:41:11Z

The zip file name is correct! The problem is that they zipped a folder whose name differs from the name of the archive: you download labelXmls.zip from the dataset repository, but it extracts to a folder named labelXml. So the files inside labelXmls.zip have paths like labelXml/518.xml.
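A quick way to confirm this kind of mismatch is to list the top-level entries of the archive without extracting it. The following is a minimal sketch using only the standard library; the helper name `top_level_names` is hypothetical and not part of torchgeo.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_names(archive_path):
    """Return the sorted top-level entry names stored in a zip archive.

    Hypothetical helper for illustration; zip member names always use
    forward slashes, so PurePosixPath is used to split them.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return sorted({PurePosixPath(name).parts[0] for name in zf.namelist() if name})
```

For an archive laid out as described above, calling `top_level_names("labelXmls.zip")` would be expected to report `labelXml` rather than `labelXmls`.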

adamjstewart · 2023-02-09T16:34:19Z

The name of the zip file is correct, but the contents of the zip file in our test data are not correct.
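One way to regenerate fake test data with this layout is sketched below. It assumes the desired result is an archive named labelXmls.zip whose members sit under labelXml/; the function name `make_fake_labels`, the file name `0.xml`, and the placeholder XML body are all hypothetical and do not reflect the real FAIR1M annotation schema or the actual script used in tests/data/fair1m.

```python
import shutil
from pathlib import Path


def make_fake_labels(root):
    """Create root/labelXml/0.xml and zip that folder as root/labelXmls.zip.

    Hypothetical sketch: the XML content is a placeholder only.
    """
    root = Path(root).resolve()
    label_dir = root / "labelXml"
    label_dir.mkdir(parents=True, exist_ok=True)
    (label_dir / "0.xml").write_text("<annotation></annotation>\n")

    # make_archive appends ".zip" to base_name; base_dir keeps the
    # "labelXml/" prefix on every member stored in the archive.
    archive = shutil.make_archive(
        str(root / "labelXmls"), "zip", root_dir=str(root), base_dir="labelXml"
    )
    return Path(archive)
```

The archive name and the folder name inside it are deliberately different here, mirroring what the upstream download does.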

SpontaneousDuck · 2023-02-09T16:43:11Z

Ah, got it! I just changed the contents and made a new commit.

torchgeo/datasets/fair1m.py

adamjstewart · 2023-02-09T17:11:22Z

> I was able to access the dataset's website here: https://www.gaofen-challenge.com/benchmark.

Can you update the URL in the FAIR1M docstring? The old one gives a 404.

adamjstewart · 2023-02-09T17:13:40Z

Is there a difference in directory structure between the 1.0 and 2.0 versions of the dataset? I'm wondering if this PR only supports 2.0 and the old version only supports 1.0. It would be easy to support both if you have the ability to download both and test them.

SpontaneousDuck · 2023-02-09T17:20:08Z

So the current version of the dataset has two parts to the training set (train/part1 and train/part2) as well as a validation set. It looks like the md5 for images.zip matches the one in the part1 folder, so I think they left part1 as the original dataset and added the part2 folder and the validation folder as version 2.0. The md5 for part1/labelXmls.zip does not match, so there must have been some change to the original labels when they updated. Maybe the folder name inside the zip was changed in the update, and that also affected the original dataset?

From what I can tell, they left the original zips from 1.0 in the 2.0 dataset and added more data zips to extend it. Should I extend this PR to support version 2.0 or make that a separate PR? Either way I am getting the 1.0 md5s and tests fixed now 😊
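For reference, the MD5 comparison described here can be reproduced with a few lines of standard-library Python. This is only a sketch; `file_md5` is a hypothetical helper, not a torchgeo function, and it reads the file in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `file_md5("images.zip")` and `file_md5("labelXmls.zip")` against the values stored in the dataset class shows which archives changed upstream.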

SpontaneousDuck · 2023-02-09T17:29:36Z

The directory structure for FAIR1M 2.0 is shown below, with the exception of the labelXmls folders, which I renamed to work with the current release version of torchgeo.datasets.FAIR1M:

.
├── test
│   ├── images0.zip
│   ├── images1.zip
│   └── images2.zip
├── train
│   ├── part1
│   │   ├── images
│   │   ├── images.zip
│   │   ├── labelXmls
│   │   └── labelXmls.zip
│   └── part2
│       ├── images
│       ├── images.zip
│       ├── labelXmls
│       └── labelXmls.zip
└── validation
    ├── images.zip
    └── labelXmls.zip

adamjstewart · 2023-02-09T17:44:03Z

So what you're saying is that our dataset is for version 1.0, but the authors seem to have changed the directory structure for 1.0 (and therefore the dataset broke and the MD5s changed), and your PR fixes support for version 1.0? That sounds fine to me; we can add support for version 2.0 in a separate PR.

SpontaneousDuck · 2023-02-09T17:57:19Z

Great! Yes, they actually changed the 1.0 version download link to point to the v2.0/train/part1 folder. So if anyone downloads the 1.0 version, this dataset code will work. It will also work if the version 2.0 dataset is downloaded and root="train/part1" is set for torchgeo.datasets.FAIR1M.
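Before pointing the dataset at a folder, it can help to check that the extracted layout matches what this thread describes. The sketch below assumes the expected extracted folders are `images` and `labelXml` (taken from the discussion above, not read from the torchgeo source); `missing_fair1m_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Assumed folder names, based on the discussion in this thread.
EXPECTED_DIRS = ("images", "labelXml")


def missing_fair1m_dirs(root):
    """Return the expected extracted folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Usage with a version 2.0 download (requires torchgeo, so left commented out):
# from torchgeo.datasets import FAIR1M
# if not missing_fair1m_dirs("train/part1"):
#     dataset = FAIR1M(root="train/part1")
```

An empty list means both folders are present; a result of `["labelXml"]` would suggest the labels were extracted under a different name or not extracted at all.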

adamjstewart · 2023-02-09T19:54:56Z

Closing and reopening to try to fix the coverage checks...

* Update fair1m.py

  Change name of expected extracted folder for FAIR1M labels_root

* moved test data for FAIR1M to match new folder name
* fix contents of fair1m test label zipfile
* update md5 hash for fair1m labelXmls.zip
* update md5s for fair1m dataset as well as tests

Update fair1m.py

3233cbe

Change name of expected extracted folder for FAIR1M labels_root

github-actions bot added the datasets label Feb 8, 2023

adamjstewart added this to the 0.4.1 milestone Feb 9, 2023

moved test data for FAIR1M to match new folder name

8211b84

github-actions bot added the testing label Feb 9, 2023

fix contents of fair1m test label zipfile

08bd4af

adamjstewart reviewed Feb 9, 2023

update md5 hash for fair1m labelXmls.zip

78a73e0

update md5s for fair1m dataset as well as tests

944d6b5

adamjstewart closed this Feb 9, 2023

adamjstewart reopened this Feb 9, 2023

adamjstewart approved these changes Feb 9, 2023

adamjstewart merged commit f82af4a into microsoft:main Feb 9, 2023
