Add tutorial on intake catalog #298

Merged: 11 commits from intake-example into COSIMA:main on Nov 30, 2023

Conversation

anton-seaice
Collaborator

Closes #295

Add a tutorial on moving from cosima_cookbook to the intake catalog.


@navidcy
Collaborator

navidcy commented Oct 14, 2023

@anton-seaice can you resolve the conflicts?

@anton-seaice
Collaborator Author

@anton-seaice can you resolve the conflicts?

Ok done

@dougiesquire
Collaborator

dougiesquire commented Oct 17, 2023

Thanks @anton-seaice! I haven't looked at this in detail yet (sorry), but my first thought is that this could possibly be more focussed on how to do the same things with the cookbook/catalogue and less focussed on plotting ice data. For example, sections on:

  • Loading the cookbook/catalogue
  • Finding model runs
  • Loading variables from a COSIMA run
    • include tips on passing kwargs and chunking
  • Loading data directly as a datatree (possibly?)

These sections could emphasise the differences (+pros/cons) between the two approaches. Some of this info is already in your notebook, but adapting the notebook to make this info the focus might make it easier to find/ingest/remember.

@anton-seaice
Collaborator Author

Thanks @anton-seaice! I haven't looked at this in detail yet (sorry), but my first thought is that this could possibly be more focussed on how to do the same things with the cookbook/catalogue and less focussed on plotting ice data. For example, sections on:

* Loading the cookbook/catalogue

* Finding model runs

* Loading variables from a COSIMA run
  
  * include tips on passing kwargs and chunking

* Loading data directly as a datatree (possibly?)

These sections could emphasise the differences (+pros/cons) between the two approaches. Some of this info is already in your notebook, but adapting the notebook to make this info the focus might make it easier to find/ingest/remember.

I was thinking we needed the sections on loading some real data to show why some of the steps are needed. For example, you only discover the need for extra filtering of search results when you try to load a field without the time dimension; similarly, you can't subset by a date range correctly before you have subtracted the 'timedelta'.

I could end the notebook at the line 'There are no more changes to the notebook', and use your 4 headings to break up the notebook into clearer sections?

@dougiesquire
Collaborator

'There are no more changes to the notebook'

So is this an adaptation of another notebook? Is that made clear somewhere? We possibly want to avoid having two notebooks that are related, as they then have to be maintained together to remain useful.

similarly, you can't subset by a date range correctly before you have subtracted the 'timedelta'

This is a quirk of the ice data, right? It's not related to the use of the catalog.

Yes, I agree that we want to demonstrate loading data etc in the notebook, but we don't have to do anything with those data. Instead, you could demonstrate loading some data using both methods and show that they give the same thing. I guess the key message is that:

da = cosima_cookbook.querying.getvar(expt="expt0", variable="var0", session=session, frequency="1 monthly")

translates to

ds = catalog["expt0"].search(variable="var0", frequency="1mon").to_dask()

and then differences (e.g. fixed frequency data, time filtering, queries that return files spanning multiple datasets) could be demonstrated from there.
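
For the time-filtering difference, a sketch along these lines might be enough (same placeholder experiment/variable names as above; the getvar start_time/end_time arguments and the date range are just for illustration):

# cookbook: the time range is part of the query itself
da = cosima_cookbook.querying.getvar(
    expt="expt0", variable="var0", session=session,
    frequency="1 monthly", start_time="1950-01-01", end_time="1959-12-31",
)

# catalog: open lazily, then subset the time axis with xarray
ds = catalog["expt0"].search(variable="var0", frequency="1mon").to_dask()
da = ds["var0"].sel(time=slice("1950-01-01", "1959-12-31"))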

@anton-seaice
Collaborator Author

I'll update to focus on those points :)

@anton-seaice
Collaborator Author

anton-seaice commented Oct 19, 2023

@dougiesquire Back to you. I'm not very good with dask, so I'm not sure the chunking section is very good; your input would be useful :)

I realised the dask graph size and chunk memory size don't show in the GitHub view, so maybe we need a different way of demonstrating the chunk size?
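
One option (just a sketch, with a placeholder variable name, and assuming the variable is dask-backed) would be to print the chunk information explicitly rather than relying on the HTML repr:

var = ds["var0"]                                   # placeholder variable name
print("chunk shape:", var.data.chunksize)          # shape of a single chunk
print("number of chunks:", var.data.npartitions)
print("MB per chunk (average):", var.data.nbytes / var.data.npartitions / 1e6)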

@dougiesquire
Collaborator

@anton-seaice, do you mind if I make some changes directly to your notebook (if I have permission)? There are a few things that could be changed/added, and doing this via ReviewNB is always a bit painful.

@anton-seaice
Collaborator Author

@anton-seaice, do you mind if I make some changes directly to your notebook (if I have permission)? There are a few things that could be changed/added, and doing this via ReviewNB is always a bit painful.

No problem. Please do that. You should be able to pull straight from the PR (gh pr checkout 298).

@dougiesquire
Collaborator

@anton-seaice sorry it took me so long to look at this properly. I've changed/added a bunch of things. Let me know what you think.

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:12Z
----------------------------------------------------------------

These warnings take up the whole page, so maybe run this command with

%%capture --no-stderr

or similar, and reference COSIMA/cosima-cookbook#333
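
i.e. something like this at the top of whichever cell produces the warnings (assuming here it's the session-creation cell; which flags are right depends on whether the messages go to stdout or stderr):

%%capture --no-stderr
# %%capture must be the first line of the cell; with --no-stderr it swallows
# stdout/display output from this cell while still letting stderr through
session = cosima_cookbook.database.create_session()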


@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:13Z
----------------------------------------------------------------

I think the catalog should go first and the cookbook second under the heading "opening data"


dougiesquire commented on 2023-10-24T22:49:53Z
----------------------------------------------------------------

I thought it could be best to first show people what they're familiar with, then demonstrate that you do exactly the same thing with the catalog. But I'll leave it up to you. Happy with whatever, so long as it's clear

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:14Z
----------------------------------------------------------------

"The catalog loads multiple frequencies of the same variable"


dougiesquire commented on 2023-10-24T22:51:58Z
----------------------------------------------------------------

But it's not just different frequencies. You might have data on different grids, for example, that the catalog could load into distinct datasets with a single call, but the cookbook couldn't.

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:15Z
----------------------------------------------------------------

I think this bit adds confusion:

"Additionally, there are cases where the cookbook will silently load and concatenate data on different grids, because it does not know which files in the database should/shouldn't be concatenated together."

Maybe put it at the end of this sub-section?

Then the rest could be:

"The catalog knows which files are at different frequencies and provides methods to open multiple datasets from one query. For example, trying to open a variable available at multiple frequencies without specifying a frequency gives an error, but we can open them using to_dataset_dict() rather than to_dask()."


dougiesquire commented on 2023-10-24T23:03:36Z
----------------------------------------------------------------

The point is that the cookbook will effectively just find all the files in the database that match the query and try to concatenate them. The catalog, however, knows which files should be concatenated together and which shouldn't. As a hypothetical example, imagine your query returns 6 files: 2 monthly on grid1, 2 daily on grid1 and 2 monthly on grid2. The cookbook will try to concatenate these into a single dataset (actually it will probably throw an error first). The catalog will return 3 datasets.

So the difference is more general than just frequencies. Feel free to remove/edit the paragraph you found confusing, but I don't think we should make this section just about frequencies.
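
Something like this could illustrate it (placeholder names again; the exact grouping keys depend on how the catalog is built):

# a query matching more than one dataset raises an error with to_dask(), but
# to_dataset_dict() returns one xarray Dataset per compatible group of files
dset_dict = catalog["expt0"].search(variable="var0").to_dataset_dict()
for key, ds in dset_dict.items():
    print(key, dict(ds.sizes))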

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:16Z
----------------------------------------------------------------

We'll need to add a cell tag to ignore this cell in Jenkins.


dougiesquire commented on 2023-10-24T23:03:52Z
----------------------------------------------------------------

I think I've done that already

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:17Z
----------------------------------------------------------------

"Alternatively, multiple datasets can be opened directly into an xarray-datatree by ..."

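e.g. something along these lines (a sketch; whether to_datatree() is available depends on the intake-esm version installed):

# open every dataset matching the query as a single xarray-datatree
dt = catalog["expt0"].search(variable="var0").to_datatree()

# or, if to_datatree() isn't available, build one from the dataset dict
from datatree import DataTree
dt = DataTree.from_dict(catalog["expt0"].search(variable="var0").to_dataset_dict())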

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:18Z
----------------------------------------------------------------

mask the warnings or make this a raw cell


@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:19Z
----------------------------------------------------------------

Move this part into a new comment below the code cell?

This is mostly because the opening of datasets is a trivially parallelized task that is done lazily anyway so opening all files and reducing the times using xarray's sel methods shouldn't add too much overhead. In most cases where the overhead of opening the files seems large, this can be reduced through sensible choices of keyword arguments provided to open_dataset and combine_by_coords - see the xarray documentation on Reading multi-file datasets for details.

This note can be made clearer - I'll work on it.
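
For reference, the kind of keyword arguments meant here, assuming a recent intake-esm where to_dask() accepts xarray_open_kwargs / xarray_combine_by_coords_kwargs (the values below are purely illustrative):

ds = catalog["expt0"].search(variable="var0", frequency="1mon").to_dask(
    xarray_open_kwargs={"use_cftime": True},                # passed to xr.open_dataset
    xarray_combine_by_coords_kwargs={"coords": "minimal",   # passed to xr.combine_by_coords
                                     "compat": "override"},
)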


@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:20Z
----------------------------------------------------------------

Possibly this section should go before "Some important differences", because these are things everyone needs to know? The differences section is more intuitive?


dougiesquire commented on 2023-10-24T23:08:46Z
----------------------------------------------------------------

I'm not sure I agree. Maybe everything could be merged into a single section? Or they each become their own sections?

@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:21Z
----------------------------------------------------------------

I think this cell should be expanded, similar to the "Passing keyword arguments is different" section.


@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:22Z
----------------------------------------------------------------

Correctly choosing chunk sizes when you open datasets will greatly improve the speed of your analysis. Check out the ...
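
Maybe with a short example like this (dimension names and chunk sizes are placeholders, not a recommendation):

# choose chunks when opening, rather than rechunking afterwards
ds = catalog["expt0"].search(variable="var0", frequency="1mon").to_dask(
    xarray_open_kwargs={"chunks": {"time": 12, "yt_ocean": 300, "xt_ocean": 360}},
)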


@review-notebook-app

review-notebook-app bot commented Oct 24, 2023

anton-seaice commented on 2023-10-24T22:40:23Z
----------------------------------------------------------------

I'm not sure this heading is clear?

"Loading time-invariant variables"

"Loading model information"

Are there variables other than grid information?

"Loading grid information"


dougiesquire commented on 2023-10-24T23:09:40Z
----------------------------------------------------------------

"Loading time-invariant variables" is good

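A small example could go under that heading too, e.g. (the variable name and the "fx" frequency label are assumptions about how the catalog is populated):

# time-invariant (fixed-frequency) fields such as grid information
area_t = catalog["expt0"].search(variable="area_t", frequency="fx").to_dask()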

@anton-seaice
Collaborator Author

Dougie and I have finished looking at this; if someone else is able to review, that would be great. @navidcy?

@anton-seaice
Collaborator Author

@micaeljtoliveira Do you mind giving this a quick review? Dougie and I are happy with it.

@max-anu self-requested a review November 30, 2023 01:07
Contributor

@max-anu left a comment


I've read through the notebook - it's good and I don't think anything else needs to be added. I've also checked that it runs correctly.

@max-anu merged commit f7a4ee5 into COSIMA:main Nov 30, 2023
@anton-seaice deleted the intake-example branch January 8, 2024 03:35