Support for JSON metadata workflow #124
Comments
This does already work, but the invocation via intake-xarray (or xarray `open_dataset` directly) is complex. Actually, intake-xarray is great exactly because it hides this complexity from the user once you've figured it out. Your call should look something like:

```python
source = intake.open_zarr(
    "reference://",
    storage_options={
        "fo": "/home/jovyan/work/output/s3/combine.json",
        "remote_protocol": "...",   # e.g., "s3", "http", ...
        "remote_options": {...},    # anything needed to configure that remote filesystem
    },
    consolidated=False,
)
```

And yes, `open_netcdf` essentially does the same thing, except that you specify the engine, and all those arguments get nested inside a `backend_kwargs`.
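Concretely, that `open_netcdf` form might look roughly like this; the exact nesting of the arguments under `xarray_kwargs`/`backend_kwargs` is an assumption based on the description above, not a verified recipe:

```python
import intake

# Sketch of the equivalent open_netcdf call: the zarr engine reads the reference
# filesystem, and the storage options move into backend_kwargs.
source = intake.open_netcdf(
    "reference://",
    xarray_kwargs={
        "engine": "zarr",                 # hand the path to xarray's zarr backend
        "backend_kwargs": {
            "consolidated": False,
            "storage_options": {
                "fo": "/home/jovyan/work/output/s3/combine.json",
                "remote_protocol": "...",   # e.g., "s3", "http", ...
                "remote_options": {...},    # credentials etc. for the remote filesystem
            },
        },
    },
)
```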
If you succeed in generating an interesting dataset and would like to share it in public, the kerchunk project would like to know about it!
Thank you for your help! Your approach works perfectly fine. I was able to generate a YAML file from the source above and load it back in. Actually, I am not working on a dataset but on a web-based tool for migrating NetCDF4 data to Zarr. It supports both an actual conversion and the JSON metadata workflow mentioned above. Right now I am working on the Intake integration for the JSON metadata. Here is a link to the repository: https://github.com/climate-v/nc2zarr-webapp
Are you aware of https://pangeo-forge.org/?
Use Case
I am trying to access NetCDF4 data via JSON metadata with intake-xarray. This approach is based on this blog post by lsterzinger. I am trying to make the data access as convenient as possible. The ideal solution for me with the existing API would look like this:
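Something along these lines (illustrative only; the exact call is an assumption, since intake-xarray does not currently support it):

```python
import intake

# Hypothetical "ideal" invocation: point open_zarr straight at the kerchunk-style
# JSON metadata file and let intake-xarray set up the reference filesystem itself.
source = intake.open_zarr("combine.json")
ds = source.to_dask()
```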
When testing this approach I get the following error:
The approach from the blog post uses an `FSMap`, so I tried that route instead (see the sketch after this paragraph). This one works, but it kind of misses the point of Intake, as the user has to know about the `fsspec` API to create a working `FSMap`.
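Roughly like this (a sketch based on the blog post's general workflow rather than my original snippet; the remote protocol and options shown are assumptions, and I am assuming `open_zarr` accepts the mapper directly, as it did in my test):

```python
import fsspec
import intake

# Build a reference filesystem from the kerchunk-style JSON metadata and hand
# its mapper to intake; protocol/options below are placeholders.
fs = fsspec.filesystem(
    "reference",
    fo="combine.json",
    remote_protocol="s3",              # assumption: the chunks live on S3
    remote_options={"anon": True},     # assumption: anonymous access
)
mapper = fs.get_mapper("")

source = intake.open_zarr(mapper, consolidated=False)
ds = source.to_dask()
```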
Suggestion
Version 1
I would like to implement an extra case for the `open_zarr` method to support the JSON workflow introduced in the blog post mentioned above.
Version 2
I could also imagine an extra method for the JSON workflow, something like `intake.open_zarr_metadata('combine.json')`.
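Purely as an illustration of the idea (this helper does not exist in intake-xarray; everything here is hypothetical), such a method could be a thin wrapper around the `reference://` invocation shown earlier:

```python
import intake

def open_zarr_metadata(json_path, remote_protocol=None, remote_options=None):
    """Hypothetical convenience wrapper: open kerchunk-style JSON metadata
    as a zarr source via fsspec's reference filesystem."""
    return intake.open_zarr(
        "reference://",
        storage_options={
            "fo": json_path,
            "remote_protocol": remote_protocol,
            "remote_options": remote_options or {},
        },
        consolidated=False,
    )

# usage (hypothetical)
source = open_zarr_metadata("combine.json", remote_protocol="s3")
```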
Questions
Which approach would you prefer?
While looking through existing issues I found xarray.open_zarr to be deprecated #70. If I get it correctly, you removed the `fsspec` mapper in 2020 as it wasn't needed anymore. Is there another solution to bring the JSON workflow to intake-xarray that I overlooked?
Unfortunately, my Python knowledge is limited, so I have no idea how to test a modified version of intake-xarray. I found https://intake-xarray.readthedocs.io/en/latest/contributing.html#id9 for running the tests, but how can I test a modified version of intake-xarray with Intake locally? It would be great to have this in the docs!