Update README.rst
rhysrevans3 authored Apr 23, 2024
1 parent 981b054 commit 9dfb97f
Showing 1 changed file with 91 additions and 55 deletions: ``example/README.rst``

@@ -25,88 +25,124 @@ To just run it and see the example output, open ``example_notebook.ipynb``.

Local deployment
-----------------

1. Install the generator

   .. code-block::

      pip install .

2. Generate items by running the `stac-generator`_

   .. code-block::

      stac_generator -c conf/item-generator.yaml

3. Generate collections by running the `stac-generator`_

   .. code-block::

      stac_generator -c conf/collection-generator.yaml

Inputs Explained
================

The YAML files in ``conf`` set up the inputs and outputs for the script. In this case, the input is an intake-esm catalog and the outputs are the terminal, a JSON file, and a text file. The text file is then used for collection generation.

.. code-block:: yaml

   # The type of generator to be run
   generator: item
   # The root directory of the recipes
   recipes_root: recipes/
   # The input plugins to be run for the generator
   inputs:
     - name: text_file
       filepath: input/assets.txt
   # The output plugins to be run for the generator
   outputs:
     - name: standard_out
       # Output plugins can use mappings to reshape the output
       mappings:
         - name: stac
           stac_version: '1.0.0'
           stac_extensions: []
     - name: json_file
       dirpath: output/items
       filename_term: id
       mappings:
         - name: stac
           stac_version: '1.0.0'
           stac_extensions: []
     - name: text_file
       filepath: input/collections.txt

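The ``json_file`` output in the configuration above writes records into ``dirpath``. It is an assumption that ``filename_term: id`` means "name each file after the record's ``id`` value". The minimal Python sketch below only illustrates that reading. ``write_json_output`` is a hypothetical helper, not part of the stac-generator API.

```python
import json
from pathlib import Path


def write_json_output(record: dict, dirpath: str, filename_term: str = "id") -> Path:
    """Hypothetical json_file-style output: one JSON file per record.

    Assumes the file is named after the value of ``record[filename_term]``.
    """
    out_dir = Path(dirpath)
    # Create the configured directory (e.g. output/items) if it does not exist yet
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{record[filename_term]}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A made-up, already-mapped record
        item = {"type": "Feature", "stac_version": "1.0.0", "stac_extensions": [], "id": "example-item"}
        path = write_json_output(item, f"{tmp}/output/items", filename_term="id")
        print(path.name)  # example-item.json
```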
The recipes in ``example/recipes`` describe the steps needed to extract and manipulate the metadata.

.. code-block:: yaml

   # The paths that the recipe will be run on
   paths:
     - https://cmip6-zarr-o.s3-ext.jc.rl.ac.uk/CMIP6.CMIP.MOHC.UKESM1-0-LL
     - https://cmip6-zarr-o.s3-ext.jc.rl.ac.uk/CMIP6.C4MIP.MOHC.UKESM1-0-LL

   # The type of STAC record that will be generated
   type: item

   # These extraction methods will be run after `extraction_methods` and should generate the id of the record
   id:
     - method: default
       inputs:
         defaults:
           item_id: $instance_id

   # The extraction methods are run in series, with the output dictionary passed from one to the next.
   # Extraction methods add, update or remove data in the output dictionary.
   extraction_methods:
     - method: regex
       inputs:
         regex: '\/(?P<mip_era>\w*)\.(?P<activity_id>\w*)\.(?P<institution_id>[\w-]*)\.(?P<source_id>[\w-]*)\/(?P<experiment_id>[\w-]*)\.(?P<member_id>\w*)\.(?P<table_id>\w*)\.(?P<var_id>\w*)\.(?P<grid_label>\w*)\.(?P<version>\w*)'
     - method: string_template
       inputs:
         template: '{mip_era}.{activity_id}.{institution_id}.{source_id}.{table_id}.{var_id}.{version}'
         output_key: instance_id
     # Some extraction methods generate assets, which can include their own list of extraction methods to be run on the assets
     - method: intake_assets
       inputs:
         uri: https://raw.githubusercontent.com/cedadev/cmip6-object-store/master/catalogs/ceda-zarr-cmip6.json
         object_path_attr: zarr_path
         search_kwargs:
           mip_era: $mip_era
           activity_id: $activity_id
           institution_id: $institution_id
           source_id: $source_id
           table_id: $table_id
           variable_id: $var_id
           version: $version
         extraction_methods:
           - method: default
             inputs:
               defaults:
                 roles: ["data"]
     - method: lambda
       inputs:
         function: 'lambda assets: {f"data{str(en+1).zfill(4)}": assets[key] for en, key in enumerate(sorted(assets))}'
         input_args:
           - $assets
         output_key: assets
     - method: remove
       inputs:
         keys:
           - uri

   # member_of defines the other recipes that define a parent of this record
   member_of:
     - recipes/collection/CMIP6.CMIP.MOHC.UKESM1-0-LL.yaml

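The recipe's comments describe the extraction methods as running in series, each one receiving and returning the output dictionary. The plain-Python sketch below mimics the recipe's ``regex``, ``string_template``, ``lambda``, ``remove`` and ``default`` (id) steps on a single path. It shows how the facets flow into ``instance_id`` and then into the record id.

It is an illustration under assumptions, not stac-generator's implementation:

- the step functions are hypothetical;
- the Zarr path is made up to fit the regex;
- the ``intake_assets`` step, which queries the remote intake-esm catalog, is replaced by two hard-coded assets.

```python
import re

# Regex and template copied from the recipe above
REGEX = (
    r"\/(?P<mip_era>\w*)\.(?P<activity_id>\w*)\.(?P<institution_id>[\w-]*)\.(?P<source_id>[\w-]*)"
    r"\/(?P<experiment_id>[\w-]*)\.(?P<member_id>\w*)\.(?P<table_id>\w*)\.(?P<var_id>\w*)"
    r"\.(?P<grid_label>\w*)\.(?P<version>\w*)"
)
TEMPLATE = "{mip_era}.{activity_id}.{institution_id}.{source_id}.{table_id}.{var_id}.{version}"


def regex_step(body: dict) -> dict:
    """Add the named groups of the first regex match on the URI."""
    match = re.search(REGEX, body["uri"])
    if match:
        body.update(match.groupdict())
    return body


def string_template_step(body: dict) -> dict:
    """Fill the template from the extracted facets (output_key: instance_id)."""
    body["instance_id"] = TEMPLATE.format(**body)
    return body


def fake_intake_assets_step(body: dict) -> dict:
    """Hypothetical stand-in for intake_assets: two hard-coded assets with the default role."""
    body["assets"] = {
        "b": {"href": f"{body['uri']}/b.zarr", "roles": ["data"]},
        "a": {"href": f"{body['uri']}/a.zarr", "roles": ["data"]},
    }
    return body


def lambda_step(body: dict) -> dict:
    """Same expression as the recipe's lambda: rename assets to data0001, data0002, ..."""
    assets = body["assets"]
    body["assets"] = {f"data{str(en + 1).zfill(4)}": assets[key] for en, key in enumerate(sorted(assets))}
    return body


def remove_step(body: dict) -> dict:
    """Drop the keys listed under remove (here only uri)."""
    body.pop("uri", None)
    return body


def default_id_step(body: dict) -> dict:
    """id section: item_id takes the value of $instance_id."""
    body["item_id"] = body["instance_id"]
    return body


def run_recipe(uri: str) -> dict:
    body = {"uri": uri}
    # extraction_methods run in series, each taking and returning the same dictionary
    for step in (regex_step, string_template_step, fake_intake_assets_step, lambda_step, remove_step):
        body = step(body)
    # The id methods run after extraction_methods
    return default_id_step(body)


if __name__ == "__main__":
    # Hypothetical Zarr path that follows the layout the regex expects
    record = run_recipe(
        "https://cmip6-zarr-o.s3-ext.jc.rl.ac.uk/CMIP6.CMIP.MOHC.UKESM1-0-LL/historical.r1i1p1f2.Amon.tas.gn.v20190406"
    )
    print(record["item_id"])  # CMIP6.CMIP.MOHC.UKESM1-0-LL.Amon.tas.v20190406
    print(sorted(record["assets"]))  # ['data0001', 'data0002']
```

Passing one dictionary through a list of small functions keeps each step independent, so a recipe can add, reorder or drop steps without touching the others.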
Outputs Explained
=================

STAC Generation
---------------

The output of the extraction methods is a dictionary of the metadata:

.. code-block:: python

@@ -157,7 +193,7 @@ The stac-genetator outputs:

Mappings
--------

The mappings can be used to rearrange the output into a desired framework. For example, using the STAC mapping:

.. code-block:: python

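The configuration above passes ``stac_version: '1.0.0'`` and ``stac_extensions: []`` to the ``stac`` mapping. As a rough illustration of what such a mapping does, the hypothetical function below reshapes a flat metadata dictionary into the basic layout of a STAC Item.

This is a sketch under assumptions, not the stac-generator's STAC mapping:

- ``to_stac_item`` and ``TOP_LEVEL_KEYS`` are made up;
- which keys become top-level fields and which go into ``properties`` is a guess;
- a complete STAC Item would also need temporal information such as ``datetime``.

```python
from typing import Optional

# Keys that this sketch lifts out of the flat record; everything else goes into "properties"
TOP_LEVEL_KEYS = {"item_id", "assets"}


def to_stac_item(record: dict, stac_version: str = "1.0.0", stac_extensions: Optional[list] = None) -> dict:
    """Hypothetical STAC-style mapping of a flat metadata dictionary to an Item-shaped dict."""
    properties = {key: value for key, value in record.items() if key not in TOP_LEVEL_KEYS}
    return {
        "type": "Feature",
        "stac_version": stac_version,
        # Copy so the caller's list is never shared with the output
        "stac_extensions": list(stac_extensions or []),
        "id": record["item_id"],
        # No spatial metadata is extracted here; STAC allows a null geometry (bbox is then omitted)
        "geometry": None,
        "properties": properties,
        "links": [],
        "assets": record.get("assets", {}),
    }


if __name__ == "__main__":
    record = {"item_id": "CMIP6.CMIP.MOHC.UKESM1-0-LL.Amon.tas.v20190406", "var_id": "tas", "assets": {}}
    item = to_stac_item(record, stac_version="1.0.0", stac_extensions=[])
    print(item["id"], item["properties"])  # CMIP6.CMIP.MOHC.UKESM1-0-LL.Amon.tas.v20190406 {'var_id': 'tas'}
```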