Torchvision - can't extract Caltech256 #4127

Closed
ikamensh opened this issue Jun 26, 2021 · 2 comments

🐛 Bug

When I try to download the Caltech256 dataset via torchvision, an empty file is downloaded:

-rw-r--r-- 1 ikkamens staff 0B Jun 26 12:05 256_ObjectCategories.tar

I get the following error:

0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/ikkamens/PycharmProjects/pythonProject1/main.py", line 4, in <module>
    torchvision.datasets.Caltech256("data", download=True)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 166, in __init__
    self.download()
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 215, in download
    download_and_extract_archive(
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 417, in download_and_extract_archive
    extract_archive(archive, extract_root, remove_finished)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 394, in extract_archive
    extractor(from_path, to_path, compression)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 260, in _extract_tar
    with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1604, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
Extracting data/caltech256/256_ObjectCategories.tar to data/caltech256

To Reproduce

Run:

import torchvision

torchvision.datasets.Caltech256("data", download=True)

Expected behavior

No exception; the archive is downloaded and extracted, and the dataset images are in a usable format.

Environment

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 10.14.6 (x86_64)
GCC version: Could not collect
Clang version: 11.0.0 (clang-1100.0.33.17)
CMake version: version 3.16.5
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: macOS-10.14.6-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0
[conda] Could not collect

Additional context

A .tar archive is actually (partially?) downloaded. However, I can't extract it manually either:
(venv) ~/P/p/c/caltech256 $ tar -xf caltech256/caltech256/256_ObjectCategories.tar imgs
tar: Error opening archive: Failed to open 'caltech256/caltech256/256_ObjectCategories.tar'
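
To tell a failed download apart from a failed extraction, it can help to inspect the file before opening it. The following is a minimal sketch using only the Python standard library; the helper name `describe_archive` is hypothetical (it is not part of torchvision), and the path is the one that appears in the log above.

```python
import os
import tarfile


def describe_archive(path):
    """Return a short diagnosis for a downloaded archive (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A 0-byte file means the download produced no data at all.
        return "empty"
    if not tarfile.is_tarfile(path):
        # Non-empty but unreadable, e.g. an HTML error page saved as .tar.
        return "not a tar archive"
    return "ok"


if __name__ == "__main__":
    print(describe_archive("data/caltech256/256_ObjectCategories.tar"))
```

A result of "empty" or "not a tar archive" would point at the download step rather than at `tarfile` or the `tar` command.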

cc @fmassa @vfdev-5 @pmeier

@ikamensh
Author

P.S. A similar thing happens for Caltech101:

/Users/ikkamens/PycharmProjects/pythonProject1/venv/bin/python /Users/ikkamens/PycharmProjects/pythonProject1/main.py
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2311, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1103, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1039, in frombuf
    raise EmptyHeaderError("empty header")
tarfile.EmptyHeaderError: empty header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ikkamens/PycharmProjects/pythonProject1/main.py", line 4, in <module>
    torchvision.datasets.Caltech101("data", download=True)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 51, in __init__
    self.download()
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 123, in download
    download_and_extract_archive(
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 417, in download_and_extract_archive
    extract_archive(archive, extract_root, remove_finished)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 394, in extract_archive
    extractor(from_path, to_path, compression)
  File "/Users/ikkamens/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 260, in _extract_tar
    with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1617, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1670, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1647, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 1510, in __init__
    self.firstmember = self.next()
  File "/Users/ikkamens/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2326, in next
    raise ReadError("empty file")
tarfile.ReadError: empty file
Extracting data/caltech101/101_ObjectCategories.tar.gz to data/caltech101
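
Both tracebacks are consistent with `tarfile` being handed a zero-byte file: the Caltech256 archive is opened with mode "r" and the Caltech101 archive with mode "r:gz". The sketch below reproduces that situation without any network access, assuming only the standard library. The function name `open_empty_archive` is hypothetical, and the exact error messages may differ between Python versions.

```python
import os
import tarfile
import tempfile


def open_empty_archive(mode):
    """Create a zero-byte file, try to open it as a tar archive, return the error."""
    fd, path = tempfile.mkstemp(suffix=".tar")
    os.close(fd)  # the file now exists and is 0 bytes long
    try:
        with tarfile.open(path, mode):
            return None  # not expected for an empty file
    except tarfile.ReadError as exc:
        return exc
    finally:
        os.remove(path)


if __name__ == "__main__":
    for mode in ("r", "r:gz"):
        print(mode, "->", repr(open_empty_archive(mode)))
```

If both modes report a `tarfile.ReadError`, the extraction code is behaving as expected for its input, and the problem lies in whatever produced the empty file.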

@pmeier transferred this issue from pytorch/pytorch on Jun 28, 2021
@pmeier
Collaborator

pmeier commented Jun 28, 2021

Duplicate of #4108, which was fixed in #4109.

@pmeier closed this as completed on Jun 28, 2021