
Draft for FESOM recipe #52

Open · wants to merge 15 commits into base: master
Conversation

@roxyboy commented Jun 24, 2021

I tried to build upon the INALT60 PR #26 but I couldn't quite figure out the correct syntax...

@roxyboy roxyboy changed the title FESOM recipe Draft for FESOM recipe Jun 24, 2021
@roxyboy (Author) commented Jun 24, 2021

Also, currently only the surface data is available to flux to the cloud. The interior should be ready soonish.

@cisaacstern (Member)
Thanks @roxyboy. I'll take a look today.

@cisaacstern (Member)

@roxyboy, your draft was super helpful. I'm now caching these inputs to GCS.

@cisaacstern (Member) commented Jun 29, 2021

@roxyboy, FESOM inputs are cached. Before building the Zarr, I wanted to confirm your target_chunks preference.

Currently, target_chunks is set to {"time": 15} in the recipe here. By my calculation (see code, below), that puts each chunk at about 120 MBs in size. I don't know much about chunking, but I've noted that pangeo-forge-recipes/docs/tutorials/cmip6-recipe.ipynb, e.g., makes a general recommendation that chunk sizes be between 50-100 MBs.

Thoughts? If you don't have another preference, I'll build the Zarr with target_chunks={"time": 6}, which is the integer chunk size which gets us closest to 50 MBs per chunk.

```python
import xarray as xr

cache_path = rec.input_cache.root_path  # `rec` is the `"FESOM/surf/fma"` recipe
first_input_url = fs_gcs.ls(cache_path)[0]  # `fs_gcs` is the filesystem where inputs are cached
first_input_file = fs_gcs.open(first_input_url, mode="rb")
ds = xr.open_dataset(first_input_file)

ntime = len(ds.time)  # the number of time slices
ncfile_size = ds.nbytes  # the netCDF file size

target_chunk_options = (6, 15)
for option in target_chunk_options:
    chunksize = int((ncfile_size / ntime) * option)
    print(f"With {option} time slices per chunk, chunksize is {chunksize / 1e6} MBs")
```
```
With 6 time slices per chunk, chunksize is 48.000177 MBs
With 15 time slices per chunk, chunksize is 120.000442 MBs
```
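The calculation above generalizes to a small helper that picks the integer time-chunk length whose chunk size lands closest to a byte target. A minimal sketch in plain Python (`best_time_chunk` is a hypothetical name, and the ~8 MB-per-slice numbers below are illustrative, chosen to match the ~48 MB / ~120 MB figures printed above):

```python
def best_time_chunk(file_bytes, ntime, target_bytes=50e6):
    """Return the number of time slices per chunk whose resulting
    chunk size is closest to ``target_bytes`` (hypothetical helper)."""
    bytes_per_slice = file_bytes / ntime
    # try every possible integer chunk length and keep the closest fit
    return min(
        range(1, ntime + 1),
        key=lambda n: abs(n * bytes_per_slice - target_bytes),
    )

# Illustrative numbers: a 720 MB file with 90 time slices is ~8 MB per
# slice, so 6 slices per chunk (~48 MB) is the closest fit to 50 MB.
print(best_time_chunk(720e6, 90))  # → 6
```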

@roxyboy (Author) commented Jun 29, 2021

> Thoughts? If you don't have another preference, I'll build the Zarr with target_chunks={"time": 6}, which is the integer chunk size which gets us closest to 50 MBs per chunk.

target_chunks={"time": 6} is fine with me :)

@cisaacstern (Member)

Noting that, along with #24 (comment), this build is also blocked by pangeo-forge/pangeo-forge-recipes#164. The surface data for this recipe contain single-variable arrays of shape (744, 1000, 1000) (for dims time, lat, lon) and size ~6 GBs.
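As a sanity check on the ~6 GB figure, the array size follows directly from the stated shape and the element width. The sketch below assumes 8-byte float64 values (an assumption; 4-byte float32 would halve it):

```python
# shape (time, lat, lon) = (744, 1000, 1000), from the comment above
nelems = 744 * 1000 * 1000

size_f64 = nelems * 8  # assuming 8-byte float64 elements
size_f32 = nelems * 4  # 4-byte float32 would give half that

print(f"float64: {size_f64 / 1e9:.2f} GB, float32: {size_f32 / 1e9:.2f} GB")
# → float64: 5.95 GB, float32: 2.98 GB
```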

@cisaacstern (Member)

I was able to build the FESOM surface data using an un-merged draft of pangeo-forge/pangeo-forge-recipes#166.

It can be accessed via the project catalog, as demonstrated here: https://github.com/pangeo-data/swot_adac_ogcms/blob/main/intake_demo.ipynb

@roxyboy (Author) commented Sep 3, 2021

@cisaacstern Are the interior data for FESOM on OSN...?

@cisaacstern (Member)

> @cisaacstern Are the interior data for FESOM on OSN...?

Not yet. Are they on the ftp server? Last I was aware, they weren't available over ftp yet.

@roxyboy (Author) commented Sep 9, 2021

@cisaacstern The FESOM group notified me that their data were corrupted, so they re-ran their simulation. They've finished remaking the winter data. Could we swap in the new data on OSN?
The surface and 3D winter data are here: https://swiftbrowser.dkrz.de/public/dkrz_035d8f6ff058403bb42f8302e6badfbc/SWOT_intercomparison2/

@cisaacstern (Member)

Sure thing, @roxyboy. Do you have a deadline (either specific or general) for this?

@roxyboy (Author) commented Sep 9, 2021

It'd be nice if I could include it in the talk I'll give at the high-res ocean modelling workshop starting Sep. 29, so the sooner the better. But if you're occupied with other projects, it's ok if the FESOM data doesn't make it in.

@cisaacstern (Member)

That's very doable. I probably won't get to it this week, but I'll get it done by the end of next week.

@cisaacstern (Member)

@roxyboy, apologies that I won't be able to complete this by the end of this week as I'd initially planned. I will revisit this on Monday and should still be able to get it to you in advance of your deadline. I'll follow up with a better time estimate next week once I've taken a closer look.

@cisaacstern (Member)

@roxyboy, I'm removing the existing FESOM surface winter Zarr store from OSN now, to replace it with the corrected data you've linked in #52 (comment).

I'll be adding the winter 3D data at the same time and will ping you again here when both new Zarr stores are complete.

@roxyboy (Author) commented Sep 20, 2021

> @roxyboy, I'm removing the existing FESOM surface winter Zarr store from OSN now, to replace it with the corrected data you've linked in #52 (comment).
>
> I'll be adding the winter 3D data at the same time and will ping you again here when both new Zarr stores are complete.

Ok, thanks a lot! :)

@cisaacstern (Member)

@roxyboy, the updated FESOM data (surface and interior) are now on OSN, and I've added the FESOM interior data as an option in the project catalog: pangeo-data/swot_adac_ogcms#6

Here's a quick summary of how you'd use the workflow described in pangeo-data/swot_adac_ogcms/intake_demo.ipynb to open the FESOM interior data:

Note: this would need to be run from within the swot_adac_ogcms project repo to have access to the local imports.

```python
# load the catalog and determine the allowable params for the selected item
from validate_catalog import all_params

params_dict, cat = all_params()
item = "FESOM"
params_dict[item]
```
```
[{'datatype': 'surf', 'season': 'fma'},
 {'datatype': 'surf', 'season': 'aso'},
 {'datatype': 'int', 'season': 'fma'},
 {'datatype': 'int', 'season': 'aso'}]
```
```python
# select a param set from the list and confirm it's the one you want
params = params_dict[item][2]
print(item, params)
```
```
FESOM {'datatype': 'int', 'season': 'fma'}
```
```python
# load the item with the selected params
ds = cat[item](**params).to_dask()
print(ds)
```
```
<xarray.Dataset>
Dimensions:  (depth: 48, lat: 1000, lon: 1000, time: 90)
Coordinates:
  * depth    (depth) float64 0.0 -5.0 -10.0 -15.0 ... -760.0 -860.0 -1.04e+03
  * lat      (lat) float64 30.0 30.01 30.02 30.03 ... 39.97 39.98 39.99 40.0
  * lon      (lon) float64 -78.0 -77.99 -77.98 -77.97 ... -68.02 -68.01 -68.0
  * time     (time) datetime64[ns] 2012-02-01 2012-02-02 ... 2012-04-30
Data variables:
    salt     (time, depth, lat, lon) float32 dask.array<chunksize=(10, 48, 1000, 1000), meta=np.ndarray>
    temp     (time, depth, lat, lon) float32 dask.array<chunksize=(10, 48, 1000, 1000), meta=np.ndarray>
    u        (time, depth, lat, lon) float32 dask.array<chunksize=(10, 48, 1000, 1000), meta=np.ndarray>
    v        (time, depth, lat, lon) float32 dask.array<chunksize=(10, 48, 1000, 1000), meta=np.ndarray>
    w        (time, depth, lat, lon) float32 dask.array<chunksize=(10, 48, 1000, 1000), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 1.9.6 (http://mpimet.mpg.de/...
    CDO:          Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/...
    Conventions:  CF-1.6
    history:      Thu Sep 09 15:44:25 2021: cdo mergetime 2012-02_01_w_cubic....
```
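Once opened this way, the dataset can be subset lazily before any data is pulled from OSN. A minimal sketch of the pattern, using a tiny synthetic stand-in with the same dimension names (the sizes and coordinate values below are made up for illustration, not the real FESOM grid):

```python
import numpy as np
import xarray as xr

# tiny synthetic stand-in for the FESOM interior dataset (same dim names)
ds = xr.Dataset(
    {"temp": (("time", "depth", "lat", "lon"),
              np.zeros((3, 4, 5, 5), dtype="float32"))},
    coords={
        "time": np.array(["2012-02-01", "2012-02-02", "2012-02-03"],
                         dtype="datetime64[ns]"),
        "depth": [0.0, -5.0, -10.0, -15.0],
        "lat": np.linspace(30.0, 30.04, 5),
        "lon": np.linspace(-78.0, -77.96, 5),
    },
)

# select the surface level and the first day; for dask-backed data this
# stays lazy until .compute() or plotting triggers a read
surf = ds["temp"].sel(depth=0.0).isel(time=0)
print(surf.dims, surf.shape)  # → ('lat', 'lon') (5, 5)
```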

@roxyboy (Author) commented Sep 28, 2021

@cisaacstern The FESOM data appears to be working fine :) Thanks again!

@cisaacstern cisaacstern added the swot-adac SWOT Adopt-a-Crossover Dataset label Sep 9, 2022