This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Fix reading from fsspec s3 #130

Merged: 4 commits into xarray-contrib:main on Dec 6, 2022

Conversation

wroberts4
Contributor

@wroberts4 wroberts4 commented Jul 19, 2022

Xarray does not like reading the same file twice from fsspec. See pydata/xarray#6813

The following currently fails:

import fsspec
import datatree as dt
fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp)
dt.open_datatree(data, engine='h5netcdf', chunks={})

with

Traceback (most recent call last):
  File "//example.py", line 24, in <module>
    run_test(dt.open_datatree, fs, fp)
  File "//example.py", line 15, in run_test
    func(data, engine='h5netcdf', chunks={})
  File "/opt/conda/lib/python3.10/site-packages/datatree/io.py", line 58, in open_datatree
    return _open_datatree_netcdf(filename_or_obj, engine=engine, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/datatree/io.py", line 67, in _open_datatree_netcdf
    ds = open_dataset(filename, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 389, in open_dataset
    store = H5NetCDFStore.open(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 157, in open
    magic_number = read_magic_number_from_file(filename)
  File "/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py", line 645, in read_magic_number_from_file
    raise ValueError(
ValueError: cannot guess the engine, file-like object read/write pointer not at the start of the file, please close and reopen, or use a context manager

My change fixes that.
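For readers without S3 access, the failure mode in the traceback can be illustrated with an in-memory buffer. This is only a minimal sketch: `read_magic_number` is a hypothetical stand-in that mimics the pointer check named in the error message, not xarray's actual `read_magic_number_from_file`.

```python
import io

# The 8-byte HDF5 file signature, used here only as sample data.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_magic_number(fileobj, count=8):
    """Hypothetical stand-in for the check seen in the traceback.

    Refuses to read unless the pointer is at the start of the file.
    """
    if fileobj.tell() != 0:
        raise ValueError(
            "file-like object read/write pointer not at the start of the file"
        )
    magic = fileobj.read(count)
    fileobj.seek(0)  # leave the pointer where we found it
    return magic


buf = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 16)

first = read_magic_number(buf)  # pointer at 0, so this succeeds

buf.read(4)  # some other reader advances the pointer
try:
    read_magic_number(buf)
    failed = False
except ValueError:
    failed = True  # same class of error as in the traceback

buf.seek(0)  # rewinding by hand makes the check pass again
second = read_magic_number(buf)
```

The point of the sketch is that the check depends on the state of the shared file object, so anything that reads from it first can trigger the error.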

@weiji14
Member

weiji14 commented Dec 6, 2022

FYI, the underlying file pointer issue has been fixed upstream in pydata/xarray#7304, and the fix is available in xarray=2022.12.0. But I still think this Pull Request is good, because it does look nicer to initialize the `tree_root` outside of the `with ncDataset(filename, mode="r") as ncds:` block. Plus, the change here will help in some cases for older xarray<2022.12.0 versions.
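The ordering point about initializing the tree root outside the `with` block can be sketched with stand-ins. Everything here is hypothetical: `open_root` and `scan_groups` are illustrative helpers, and the sketch assumes that the group-scanning handle moves the shared file pointer while it is open. It is not the datatree implementation.

```python
import io
from contextlib import contextmanager


def open_root(fileobj):
    """Hypothetical root opener that requires the pointer at position 0."""
    if fileobj.tell() != 0:
        raise ValueError("pointer not at the start of the file")
    return "root"


@contextmanager
def scan_groups(fileobj):
    """Hypothetical group scanner; assumed to move the shared pointer."""
    fileobj.read(8)
    try:
        yield ["/group_a", "/group_b"]
    finally:
        fileobj.seek(0)


def build_tree_inside(fileobj):
    # Root opened while the scanner holds the file: the pointer has moved.
    with scan_groups(fileobj) as groups:
        root = open_root(fileobj)
        return root, list(groups)


def build_tree_outside(fileobj):
    # Root opened first, before anything else has touched the pointer.
    root = open_root(fileobj)
    with scan_groups(fileobj) as groups:
        return root, list(groups)
```

Under these assumptions, `build_tree_outside(io.BytesIO(b"\x00" * 32))` returns the root and the group list, while `build_tree_inside` raises `ValueError` on the same input.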

@TomNicholas
Member

Thanks both, and sorry for forgetting about this!

> I still think this Pull Request is good, because it does look nicer to initialize the `tree_root` outside of the `with ncDataset(filename, mode="r") as ncds:` block

I agree with that.

> Plus the change here will help in some cases for older xarray<2022.12.0 versions.

This isn't really relevant, because datatree is unlikely to work with older xarray versions anyway. Datatree imports so many internals that any change inside xarray needs to be reflected inside datatree, which is why datatree needs to be moved upstream.

@TomNicholas TomNicholas enabled auto-merge (squash) December 6, 2022 20:34
@TomNicholas TomNicholas merged commit a25a3ec into xarray-contrib:main Dec 6, 2022
flamingbear pushed a commit to flamingbear/rewritten-datatree that referenced this pull request Jan 19, 2024
* Fix pointer not at the start of the file error

* whatsnew

Co-authored-by: William Roberts <[email protected]>
Co-authored-by: Tom Nicholas <[email protected]>