Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow engine='zarr' and passing args for new xarray API #89

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion intake_xarray/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ def zarr_source():
tdir = tempfile.mkdtemp()
data = xr.open_dataset(TEST_URLPATH)
data.to_zarr(tdir)
yield ZarrSource(tdir)
yield ZarrSource(tdir, chunks={"level": 4})
finally:
shutil.rmtree(tdir)

Expand Down
2 changes: 2 additions & 0 deletions intake_xarray/tests/test_remote.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,15 @@
import time
import xarray as xr
import fsspec
import pytest
import dask
import numpy
import s3fs

here = os.path.abspath(os.path.dirname(__file__))
cat_file = os.path.join(here, 'data', 'catalog.yaml')
DIRECTORY = os.path.join(here, 'data')
pytest.importorskip("RangeHTTPServer")


@pytest.fixture(scope='module')
Expand Down
8 changes: 3 additions & 5 deletions intake_xarray/xzarr.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,10 @@ def __init__(self, urlpath, storage_options=None, metadata=None, **kwargs):

def _open_dataset(self):
import xarray as xr
from fsspec import get_mapper

self._mapper = get_mapper(self.urlpath, **self.storage_options)
self._ds = xr.open_zarr(self._mapper, **self.kwargs)
self._ds = xr.open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)
Comment on lines +28 to +30
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add in xr.open_mfdataset in here, or do it in a separate PR?

Suggested change
self._ds = xr.open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)
if "*" in url or isinstance(url, list):
_open_dataset = xr.open_mfdataset
else:
_open_dataset = xr.open_dataset
self._ds = _open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)

Probably need to include more code following the NetCDF driver at

if "*" in url or isinstance(url, list):
_open_dataset = xr.open_mfdataset
if self.pattern:
kwargs.update(preprocess=self._add_path_to_ds)
if self.combine is not None:
if 'combine' in kwargs:
raise Exception("Setting 'combine' argument twice in the catalog is invalid")
kwargs.update(combine=self.combine)
if self.concat_dim is not None:
if 'concat_dim' in kwargs:
raise Exception("Setting 'concat_dim' argument twice in the catalog is invalid")
kwargs.update(concat_dim=self.concat_dim)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but let's make it a separate follow-on PR (since the zarr backend didn't support multiple datasets before anyway). That code looks like it doesn't have anything specific to zarr, so it should be reusable.

Copy link
Collaborator

@weiji14 weiji14 Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍 Edit: Actually, I just remembered that @Mikejmnez had a PR for multiple zarr at #80 already!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do have something kind of working. @martindurant was working on doing some changes at the xarray level that would have changed the way multiple zarr files are passed through intake. I see that that pull request hasn't passed... I guess I am just waiting on the new implemented changes and to see how tests are going to be changing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be close, that PR at pydata/xarray#4461 is approved, just needs to get merged


def close(self):
super(ZarrSource, self).close()
self._fs = None
self._mapper = None