
decide on attributes and metadata for downscaled data #179

Closed
dgergel opened this issue Sep 10, 2021 · 4 comments · Fixed by #320

@dgergel
Member

dgergel commented Sep 10, 2021

We need to iterate on and decide how we want to set up attributes and metadata for our downscaled zarr stores - this issue is just a placeholder to start the discussion.

@brews
Member

brews commented Sep 13, 2021

I think there is a more essential, low-hanging-fruit version of this. As it stands, we're just dumping blob data into Azure storage. We need at least enough basic metadata to know what we're looking at, at the simplest level (source_id, variable_id, etc.), even if that information is not encoded in the data path.
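For a concrete picture, here is a minimal sketch of stamping those basic identifiers onto a Dataset before it gets written out. `stamp_identifiers` and the example values are illustrative only, not an existing dodola function; in practice the identifiers would come from the CMIP6-in-the-cloud catalog entry:

```python
import xarray as xr

# Hypothetical identifiers for one run; in practice these come straight
# from the CMIP6-in-the-cloud catalog entry.
IDENTIFIERS = {
    "source_id": "GFDL-ESM4",
    "variable_id": "tasmax",
    "experiment_id": "ssp370",
    "table_id": "day",
}

def stamp_identifiers(ds: xr.Dataset, identifiers: dict) -> xr.Dataset:
    """Attach basic identifying attrs so a zarr store is readable on its own."""
    out = ds.copy()
    out.attrs.update(identifiers)
    return out
```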

@dgergel
Member Author

dgergel commented Sep 13, 2021

That makes sense. So a first pass can be all of the basic identifiers - e.g. SSP, model, variable, frequency, etc. - along with the methods used (e.g. QDM/AIQPD, wet day frequency for precip). We can iron out the CF-conventions-compliance part of this as a next step. Sound good?

In terms of what data actually needs metadata, I'm thinking the bias-corrected and downscaled outputs - should any intermediate outputs get it as well? Perhaps the cleaned and rechunked CMIP6 input data to bias correction?

I'm thinking we add a function in dodola for now that adds attributes based on the workflow step and on the tuple of identifiers unique to each GCM/variable/scenario. We would already have those as parameters in the workflow. How does that sound to you @brews?
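A rough sketch of what that function might look like - the name `add_step_attrs`, the signature, and the attribute keys are all hypothetical, not dodola's actual API:

```python
from typing import Optional

import xarray as xr

def add_step_attrs(
    ds: xr.Dataset,
    step: str,
    gcm: str,
    variable: str,
    scenario: str,
    method: Optional[str] = None,
) -> xr.Dataset:
    """Record the workflow step and per-run identifiers on the Dataset."""
    out = ds.copy()
    out.attrs.update(
        {
            "workflow_step": step,  # e.g. "biascorrected" or "downscaled"
            "source_id": gcm,
            "variable_id": variable,
            "experiment_id": scenario,
        }
    )
    if method is not None:
        out.attrs["method"] = method  # e.g. "QDM" or "AIQPD"
    return out
```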

@brews
Member

brews commented Sep 14, 2021

@dgergel

I'm thinking we add a function in dodola for now that adds attributes based on the workflow step and on the tuple of identifiers unique to each GCM/variable/scenario. We would already have those as parameters in the workflow. How does that sound to you @brews?

The immediate issue is that metadata needs to be preserved between workflow steps, because it's needed to validate and write the final-ish output. All of this "immediately essential" metadata (the stuff we use for I/O) is present as soon as we download it from CMIP6-in-the-cloud. We just need to hold on to it. Right now, ClimateImpactLab/dodola#116 is likely the most obvious offender.
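For illustration, one general xarray pattern for holding on to attrs through a step - not necessarily how dodola#116 will be fixed:

```python
import xarray as xr

def some_transform(ds: xr.Dataset) -> xr.Dataset:
    """Stand-in for a real workflow step (bias correction, rechunking, etc.)."""
    with xr.set_options(keep_attrs=True):
        # Many xarray operations drop `attrs` by default; opting in here
        # carries the identifying metadata through the arithmetic.
        out = ds - ds.mean("time")
    out.attrs.update(ds.attrs)  # re-apply the originals, just to be safe
    return out
```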

@brews
Member

brews commented Oct 27, 2021

For an example of what we currently have in place, here is a copy of the Dataset metadata for output:

Our additions have a dc6_ prefix. This separates original metadata from our contributions.
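For example, a made-up illustration of the prefix convention - these keys and values are invented, not copied from the actual output:

```python
# Hypothetical illustration of the dc6_ prefix convention; the keys and
# values here are invented, not taken from real Dataset metadata.
attrs = {
    # Original CMIP6 metadata, passed through untouched:
    "source_id": "GFDL-ESM4",
    "variable_id": "tasmax",
    # Our additions, namespaced under the dc6_ prefix:
    "dc6_bias_correction_method": "QDM",
    "dc6_workflow_version": "0.1.0",
}
```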

I need additional feedback to continue. Any other input?
