Draft for INALT60 recipe #26
base: master

Conversation
```python
nitems_per_input=None,
target_chunks={'time_counter': 15}
)
recipe:surf_ocean_4h = NetCDFtoZarrSequentialRecipe(
```
@rabernat I'm not sure if the syntax is correct here... Could you advise?
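For context, `recipe:surf_ocean_4h = ...` is not valid Python assignment syntax. One common pattern is to collect one recipe per output dataset in a plain dictionary keyed by dataset name. The sketch below uses a placeholder factory function, since the actual `NetCDFtoZarrSequentialRecipe` arguments are not reproduced here:

```python
# Hypothetical sketch: make_recipe is a stand-in for
# NetCDFtoZarrSequentialRecipe(...), not the real pangeo-forge class.
def make_recipe(dataset_name, target_chunks):
    return {"dataset": dataset_name, "target_chunks": target_chunks}

# One recipe per output dataset, keyed by dataset name.
recipes = {
    name: make_recipe(name, {"time_counter": 15})
    for name in ("surf_ocean_4h", "surf_ocean_5d", "surf_flux_1d")
}
print(sorted(recipes))
```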
@roxyboy, my understanding is that each of these frequencies should be its own dataset. Did I organize these correctly?

Expand for dataset structure 👇

```python
for r in list(recipes):
    print(r)
    d = dict(recipes[r].file_pattern.items())
    for k, v in zip(d.keys(), d.values()):
        print(k, v[87:])
    print()
```
Yes, it looks great! :)
@roxyboy, the surface datasets for INALT60 are now on OSN. The following will return a dictionary containing all three:

```python
import s3fs
import xarray as xr

endpoint_url = 'https://ncsa.osn.xsede.org'
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url})
url = "s3://Pangeo/pangeo-forge/swot_adac/INALT60/"
inalt60_datasets = {
    ds: xr.open_zarr(fs_osn.get_mapper(f"{url}{ds}.zarr"), consolidated=True)
    for ds in ["surf_ocean_4h", "surf_ocean_5d", "surf_flux_1d"]
}
inalt60_datasets
```

As we discussed above, these were the only …
@cisaacstern Thanks again for working on this, but it seems that …
In addition to the missing variables, these also seem to have the same problem as noted in #29 (comment): no actual data! For example:

```python
fs_osn.ls("s3://Pangeo/pangeo-forge/swot_adac/INALT60/surf_ocean_4h.zarr/vozocrtx")
```

Another way to see this is via the Zarr "Chunks initialized" statistics:

```python
import zarr

group = zarr.open_consolidated(fs_osn.get_mapper("s3://Pangeo/pangeo-forge/swot_adac/INALT60/surf_ocean_4h.zarr"))
group['vozocrtx'].info
```

We can see that none of the chunks have been initialized. @cisaacstern let's sync up at some point today to dig into what might be going wrong.
Noting that based on my conversation with Ryan, this issue appears to be a variant of the one described in #29 (comment). IIUC, both issues arise when an attempt is made to open an empty zarr store. In this recipe, that happens on this call to …

Logs + Traceback

I'm now filling in the missing data (and variables) and will ping the thread again when that's complete.
@roxyboy, the previously missing INALT60 surface data (and variables) should now be accessible via the same xarray dictionary provided in my earlier comment on this thread.

Here's a preview of the variables in each zarr store and their sizes:

```python
import s3fs
import zarr

endpoint_url = 'https://ncsa.osn.xsede.org'
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url})
url_base = "s3://Pangeo/pangeo-forge/swot_adac/INALT60/"
root_paths = [f"{url_base}{ds}.zarr" for ds in ("surf_ocean_4h", "surf_ocean_5d", "surf_flux_1d")]
vars_a = ["sossheig", "vomecrty", "vosaline", "votemper", "vozocrtx"]
vars_b = ["sohefldo", "sometauy", "sowaflup", "sozotaux"]
for r in root_paths:
    print(r)
    group = zarr.open_consolidated(fs_osn.get_mapper(r))
    variables = vars_a if "ocean" in r else vars_b
    for v in variables:
        group_info = group[v].info_items()
        print(f"""{group_info[0][1]}
{group_info[-2]}
{group_info[-1]}
""")
```

As in your helpful comment #29 (comment) (which I will now take a look at in detail), please do let me know if anything seems opaque or out of place with these stores. It's a learning process, but I think we're getting a lot closer already!
@cisaacstern Yes, thank you :) Also, could you push the grid files for HYCOM50 (#29) and INALT60 (#26)?
Yep! The INALT60 grid is now on OSN, as shown below. Question: I note that the …

```python
import s3fs
import xarray as xr

endpoint_url = 'https://ncsa.osn.xsede.org'
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url})
url = "s3://Pangeo/pangeo-forge/swot_adac/INALT60/grid.zarr"
inalt60_grid = xr.open_zarr(fs_osn.get_mapper(url), consolidated=True)
print(inalt60_grid)
```
Thanks a bunch!
The same issue also exists with eNATL60, where both are NEMO outputs. Maybe @lesommer or @auraoupa will have more to say, but it seems that the staggered C-grid metadata isn't carried over to the netCDF files.
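For what it's worth, the staggered-grid metadata that tools like xgcm look for is typically just coordinate attributes (following the COMODO conventions), so it can be re-attached after the fact. A minimal, hypothetical sketch with made-up coordinate names (`x_c`/`x_f`; real NEMO outputs use different names):

```python
import numpy as np
import xarray as xr

# Hypothetical coordinate names; real NEMO grids differ.
nx = 4
ds = xr.Dataset(
    coords={
        # Cell centers.
        "x_c": ("x_c", np.arange(nx) + 0.5, {"axis": "X"}),
        # Cell faces, shifted half a cell relative to the centers.
        "x_f": ("x_f", np.arange(nx) + 1.0,
                {"axis": "X", "c_grid_axis_shift": 0.5}),
    }
)
print(ds["x_f"].attrs)
```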
The data provider for INALT60 asked for password protection for their data on OSN. Sorry for the additional work, but could we do this?
@roxyboy, it is not possible to password-protect data on our OSN bucket, so we will have to find another bucket to write it to. I do not currently have credentials for another bucket, but I will ask others on our team if there's one we can use. Given that Ryan is on vacation, it may take up to a few weeks to identify a password-protectable bucket (though it could be faster). In the interim, would you like me to delete all of the INALT60 data from OSN? Or would you prefer that I leave it there until we've identified an alternative location for it? (It's up to you; from a technical perspective, either is fine.)
Ok, thanks for the speedy updates. I'll ask the INALT60 crew and see what they'd prefer.
@sharkinsspatial, per #26 (comment) above, @roxyboy and I are in search of a password-protected cloud bucket (ideally on S3, but any should be fine) that we can write these INALT60 recipes to. Collectively, they represent about 550 GB (around 50 GB for the surface data and 500 GB for the interiors). Are you aware of any existing password-protected endpoints in the greater Pangeo Forge universe that fit these requirements?
The INALT60 crew are ok with keeping the data on OSN until we find a private bucket.
Cool. In the interim, would you like me to write the INALT60 interior data to OSN? I have it queued and ready to go. I can also wait until we have the private bucket, but that may be a few weeks from now. It's looking like we will probably need Ryan to set that up.
I think we can wait for the interior data of INALT60 :)
I will check with the OSN folks about this.
Per Ryan's suggestion in #56 (comment), before this recipe is merged, I will rework it to avoid manipulating …
pre-commit.ci autofix
for more information, see https://pre-commit.ci
@andersy005, did you mean to do these changes here: #189?
A recipe PR for INALT60, maintained by GEOMAR.