-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix folder name for labels_root in the FAIR1M dataset #1099
Fix folder name for labels_root in the FAIR1M dataset #1099
Conversation
Change name of expected extracted folder for FAIR1M labels_root
Thanks for the fix! Where did you even download the dataset? The dataset homepage seems to be down. In order to get the tests to pass, you'll also need to change the fake data in |
I was able to access the dataset's website here: https://www.gaofen-challenge.com/benchmark. Since it requires an account to get the links to the dataset, my colleague who is from China was able to get that for me. The Google Drive hosted dataset is a public folder here: https://drive.google.com/drive/folders/1lCZibAl3k9sI5d7ahRm_5GA3g7OCLXmY. |
Tests fixed! I changed what you requested in the tests folder. We'll see if that works! |
@microsoft-github-policy-service agree company="Kostas Research Institute at Northeastern University" |
I think you need to update the zip file too. |
The zip file name is correct! The problem is they zipped a folder with a different name than the archive containing it so you download |
The name of the zip file is correct, but the contents of the zip file in our test data are not correct. |
Ah, got it! I just changed the contents and made a new commit |
Can you update the URL in the |
Is there a difference in directory structure between the 1.0 and 2.0 versions of the dataset? I'm wondering if this PR only supports 2.0 and the old version only supports 1.0. It would be easy to support both if you have the ability to download both and test them. |
So the current version of the dataset has two different parts to the training dataset as well as a validation set. (train/part1 and train/part2). It looks like the md5 for images.zip matches the part1 folder so I think what they did is leave part1 as the original dataset and added the part2 folder as well as the validation folder as version 2.0. The md5 for From what I can tell, they left the original zips from 1.0 in the 2.0 dataset and added more data zips to extend it. Should I extend this PR to support version 2.0 or make that a separate PR? Either way I am getting the 1.0 md5s and tests fixed now 😊 |
Directory structure for FAIR1M2.0 is as below with the exception of the
|
So what you're saying is that our dataset is for version 1.0, but the authors seem to have changed the directory structure for 1.0 (and therefore the dataset broke and the MD5s changed), and your PR fixes support for version 1.0? That sounds fine to me, we can add support for version 2.0 in a separate PR. |
Great! Yes, they actually changed the 1.0 version download link to just link to v2.0/train/part1 folder. So, if anyone downloads the 1.0 version, this dataset code will work. This dataset code will also work if the version 2.0 dataset is downloaded and |
Closing and reopening to try to fix the coverage checks... |
* Update fair1m.py Change name of expected extracted folder for FAIR1M labels_root * moved test data for FAIR1M to match new folder name * fix contents of fair1m test label zipfile * update md5 hash for fair1m labelXmls.zip * update md5s for fair1m dataset as well as tests
* Update fair1m.py Change name of expected extracted folder for FAIR1M labels_root * moved test data for FAIR1M to match new folder name * fix contents of fair1m test label zipfile * update md5 hash for fair1m labelXmls.zip * update md5s for fair1m dataset as well as tests
Change name of expected extracted folder for FAIR1M labels_root. Fixes #1098