
decide on attributes and metadata for downscaled data #179

Closed
dgergel opened this issue Sep 10, 2021 · 4 comments · Fixed by #320

@dgergel
Member

dgergel commented Sep 10, 2021

We need to iterate on and decide how we want to set up attributes and metadata for our downscaled zarr stores - this issue is just a placeholder to start the discussion.

@brews
Member

brews commented Sep 13, 2021

I think there is a more essential, low-hanging-fruit version of this. As it stands, we're just dumping blob data into Azure storage. We need at least enough basic metadata to know what we're looking at, at the simplest level (source_id, variable_id, etc.), even if that information is not encoded in the data path.
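For a concrete picture, here is a minimal sketch of stamping those basic identifiers onto a Dataset before it gets written out. `stamp_identifiers` and the example values are illustrative only, not an existing dodola function; in practice the identifiers would come from the CMIP6-in-the-cloud catalog entry:

```python
import xarray as xr

# Hypothetical identifiers for one run; in practice these come straight
# from the CMIP6-in-the-cloud catalog entry.
IDENTIFIERS = {
    "source_id": "GFDL-ESM4",
    "variable_id": "tasmax",
    "experiment_id": "ssp370",
    "table_id": "day",
}

def stamp_identifiers(ds: xr.Dataset, identifiers: dict) -> xr.Dataset:
    """Attach basic identifying attrs so a zarr store is readable on its own."""
    out = ds.copy()
    out.attrs.update(identifiers)
    return out
```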

@dgergel
Member Author

dgergel commented Sep 13, 2021

That makes sense. So a first pass can be all of the basic identifiers - e.g. SSP, model, variable, frequency, etc. - along with the methods used (e.g. QDM/AIQPD, wet day frequency for precip). We can iron out the CF-conventions-compliance part of this as a next step. Sound good?

In terms of what data actually needs metadata, I'm thinking the bias-corrected and downscaled outputs - should any intermediate outputs get it as well? Perhaps the cleaned and rechunked CMIP6 input data to bias correction?

I'm thinking we add a function in dodola for now that adds attributes based on the workflow step and on the tuple of identifiers unique to each GCM/variable/scenario. We would already have those as parameters in the workflow. How does that sound to you @brews?
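A rough sketch of what that function might look like - the name `add_step_attrs`, the signature, and the attribute keys are all hypothetical, not dodola's actual API:

```python
from typing import Optional

import xarray as xr

def add_step_attrs(
    ds: xr.Dataset,
    step: str,
    gcm: str,
    variable: str,
    scenario: str,
    method: Optional[str] = None,
) -> xr.Dataset:
    """Record the workflow step and per-run identifiers on the Dataset."""
    out = ds.copy()
    out.attrs.update(
        {
            "workflow_step": step,  # e.g. "biascorrected" or "downscaled"
            "source_id": gcm,
            "variable_id": variable,
            "experiment_id": scenario,
        }
    )
    if method is not None:
        out.attrs["method"] = method  # e.g. "QDM" or "AIQPD"
    return out
```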

@brews
Member

brews commented Sep 14, 2021

@dgergel

I'm thinking we add a function in dodola for now that adds attributes based on the workflow step and on the tuple of identifiers unique to each GCM/variable/scenario. We would already have those as parameters in the workflow. How does that sound to you @brews?

The immediate issue is that metadata needs to be preserved between workflow steps, because it's needed to validate and write the final-ish output. All of this "immediately essential" metadata (the stuff we use for I/O) is present as soon as we download it from CMIP6-in-the-cloud. We just need to hold on to it. Right now, ClimateImpactLab/dodola#116 is likely the most obvious offender.
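For illustration, one general xarray pattern for holding on to attrs through a step - not necessarily how dodola#116 will be fixed:

```python
import xarray as xr

def some_transform(ds: xr.Dataset) -> xr.Dataset:
    """Stand-in for a real workflow step (bias correction, rechunking, etc.)."""
    with xr.set_options(keep_attrs=True):
        # Many xarray operations drop `attrs` by default; opting in here
        # carries the identifying metadata through the arithmetic.
        out = ds - ds.mean("time")
    out.attrs.update(ds.attrs)  # re-apply the originals, just to be safe
    return out
```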

@brews
Member

brews commented Oct 27, 2021

For an example of what we currently have in place, here is a copy of the Dataset metadata for output:

Our additions have a dc6_ prefix. This separates original metadata from our contributions.
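For example, a made-up illustration of the prefix convention - these keys and values are invented, not copied from the actual output:

```python
# Hypothetical illustration of the dc6_ prefix convention; the keys and
# values here are invented, not taken from real Dataset metadata.
attrs = {
    # Original CMIP6 metadata, passed through untouched:
    "source_id": "GFDL-ESM4",
    "variable_id": "tasmax",
    # Our additions, namespaced under the dc6_ prefix:
    "dc6_bias_correction_method": "QDM",
    "dc6_workflow_version": "0.1.0",
}
```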

I need additional feedback to continue. Any other input?
