Auto download fails for FireRisk #1996

Closed
bourcierj opened this issue Apr 14, 2024 · 11 comments · Fixed by #2000

@bourcierj

Description

Auto download fails for the FireRisk dataset hosted on Google Drive.

Warning and error:

/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchvision/datasets/utils.py:260: UserWarning: We detected some HTML elements in the downloaded file. This most likely means that the download triggered an unhandled API response by GDrive. Please report this to torchvision at https://github.com/pytorch/vision/issues including the response:

<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="Udd3l48zF0spb_ikIDzQdw">.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}sentinel{}</style><link rel="icon" href="//ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png"/></head><body><div class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-subcaption"><span class="uc-name-size"><a href="/open?id=1J5GrJJPLWkpuptfY_kgqkiDtcSNP88OP">FireRisk.zip</a> (14G)</span> is too large for Google to scan for viruses.
Would you still like to download this file?</p><form id="download-form" action="https://drive.usercontent.google.com/download" method="get"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/><input type="hidden" name="id" value="1J5GrJJPLWkpuptfY_kgqkiDtcSNP88OP"><input type="hidden" name="export" value="download"><input type="hidden" name="confirm" value="t"><input type="hidden" name="uuid" value="c4203717-b28d-4640-8d59-e9f5d88a2120"></form></div></div><div class="uc-footer"><hr class="uc-footer-divider"></div></body></html>
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jb/code/torchgeo/slip/datasets/firerisk.py", line 25, in __init__
    super().__init__(root=root, split=split, download=download, checksum=checksum)
  File "/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchgeo/datasets/fire_risk.py", line 94, in __init__
    self._verify()
  File "/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchgeo/datasets/fire_risk.py", line 126, in _verify
    self._download()
  File "/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchgeo/datasets/fire_risk.py", line 131, in _download
    download_url(
  File "/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 139, in download_url
    return download_file_from_google_drive(file_id, root, filename, md5)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jb/miniconda3/envs/torchgeo/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 268, in download_file_from_google_drive
    raise RuntimeError(
RuntimeError: The MD5 checksum of the download file /data/labeleff/datasets/firerisk/FireRisk.zip does not match the one on record.Please delete the file and try again. If the issue persists, please report this to torchvision at https://github.com/pytorch/vision/issues.

Steps to reproduce

from torchgeo.datasets import FireRisk
dataset = FireRisk(download=True, checksum=True)

Version

0.5.1
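
Editor's note: the warning and the checksum error above are consistent with the saved FireRisk.zip being Google Drive's HTML "virus scan warning" page rather than the archive itself. A minimal sketch for checking that locally, assuming only the standard library; the helper names `sniff_payload` and `md5_of` are hypothetical and are not part of torchgeo or torchvision:

```python
import hashlib

# Leading bytes of ZIP files (regular, empty, and spanned archives).
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def sniff_payload(path):
    """Classify a downloaded file as 'zip', 'html', or 'unknown' from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file in chunks, so a 14G archive never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `sniff_payload` reports `"html"` for the downloaded file, the MD5 mismatch is expected: the checksum was computed over a small web page, not over the dataset.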

@adamjstewart
Collaborator

It looks like this dataset was added by @isaaccorley. Can you investigate?

@adamjstewart added the "datasets" (Geospatial or benchmark datasets) label on Apr 14, 2024
@adamjstewart
Collaborator

Which version of torchvision are you using? Torchvision 0.17.1 added a dependency on gdown (must be installed separately) and refactored a lot of their Google Drive logic. Either updating or downgrading may resolve this issue, depending on which version you are currently using.
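
Editor's note: a quick way to answer the version question, sketched with the standard library only. The helper names (`installed_version`, `release_tuple`, `at_least_0_17_1`) are hypothetical, and the 0.17.1 cutoff is taken from the comment above rather than verified independently:

```python
import re
from importlib import metadata
from importlib.util import find_spec


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn a version string such as '0.16.1+cu118' into (0, 16, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least_0_17_1(torchvision_version):
    """True if the version is 0.17.1 or newer (the cutoff mentioned in this thread)."""
    return release_tuple(torchvision_version) >= (0, 17, 1)


if __name__ == "__main__":
    tv = installed_version("torchvision")
    print("torchvision:", tv or "not installed")
    if tv is not None:
        print("0.17.1 or newer:", at_least_0_17_1(tv))
    # gdown has to be installed separately, so check for it on its own.
    print("gdown importable:", find_spec("gdown") is not None)
```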

@isaaccorley
Collaborator

Looks like the authors may have changed the sharing permissions on Google Drive, so the file can now only be downloaded by opening the link in a browser.

@CharmonyShen any objections to us rehosting the dataset on HuggingFace?

@adamjstewart
Collaborator

Looks like the dataset is CC-BY-NC-4.0, so there should be no legal issues with rehosting.

@bourcierj
Author

bourcierj commented Apr 15, 2024

This bug happens with torchvision 0.16.1. I am having trouble installing 0.17.1 with the latest version of torchgeo (it falls back to 0.16.1).
However, downloading with gdown (version 4.4.0) directly does not work either: access is denied because the public link of the file cannot be retrieved. So torchvision 0.17.1, which relies on gdown, would presumably fail as well.

@adamjstewart
Collaborator

Happy to help debug whatever issue is preventing you from upgrading to torchvision 0.17.1, but it sounds like we should redistribute the dataset regardless.

@isaaccorley
Collaborator

I'm uploading to HuggingFace here. Will make a quick PR to change the url later today.

@adamjstewart
Collaborator

Can we make the name lowercase? Can use an underscore if you want.

@isaaccorley
Collaborator

> Can we make the name lowercase? Can use an underscore if you want.

Done

@isaaccorley
Collaborator

This should be fixed in #2000

@bourcierj
Author

Thank you both!
