Minimal CLI #1
Changes from 197 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[run] | ||
omit = | ||
*/tests/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/.github/workflows/main.yaml | ||
|
||
name: Tests | ||
|
||
on: | ||
push: | ||
branches: "*" | ||
paths-ignore: | ||
- 'docs/**' | ||
pull_request: | ||
branches: master | ||
paths-ignore: | ||
- 'docs/**' | ||
|
||
env: | ||
PYTEST_ADDOPTS: "--color=yes" | ||
|
||
jobs: | ||
test: | ||
name: ${{ matrix.python-version }}-build | ||
runs-on: ubuntu-latest | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
python-version: [3.8, 3.9] | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Python | ||
uses: actions/setup-python@v1 | ||
with: | ||
python-version: ${{ matrix.python-version }} | ||
architecture: x64 | ||
# - name: Cache conda | ||
# uses: actions/cache@v1 | ||
# env: | ||
# Increase this value to reset cache if ci/py${{ matrix.python-version }}.yml has not changed | ||
# CACHE_NUMBER: 0 | ||
# with: | ||
# path: ~/conda_pkgs_dir | ||
# key: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('ci/py${{ matrix.python-version }}.yml') }} | ||
- name: setup miniconda | ||
uses: conda-incubator/setup-miniconda@v2 | ||
with: | ||
activate-environment: pfo-poetry | ||
# environment-file: ci/py${{ matrix.python-version }}.yml | ||
python-version: ${{ matrix.python-version }} | ||
auto-activate-base: false | ||
# use-only-tar-bz2: true | ||
- name: install pangeo-forge-orchestrator plus deps | ||
shell: bash -l {0} | ||
run: | | ||
conda install -c conda-forge poetry | ||
poetry install | ||
|
||
- name: install ipykernel for papermill notebook execution | ||
shell: bash -l {0} | ||
run: | | ||
python -m ipykernel install --user --name pfo-poetry | ||
- name: print conda env | ||
shell: bash -l {0} | ||
run: | | ||
conda info | ||
conda list | ||
- name: Run Tests | ||
shell: bash -l {0} | ||
run: | | ||
pytest tests -v --cov=pangeo_forge_orchestrator \ | ||
--cov-config .coveragerc \ | ||
--cov-report term-missing \ | ||
--cov-report xml \ | ||
--durations=10 --durations-min=1.0 | ||
- name: Codecov | ||
uses: codecov/[email protected] | ||
with: | ||
file: ./coverage.xml | ||
env_vars: OS,PYTHON | ||
name: codecov-umbrella | ||
fail_ci_if_error: false |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
name: pre-commit | ||
|
||
on: | ||
pull_request: | ||
push: | ||
branches: [main] | ||
|
||
jobs: | ||
pre-commit: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions/setup-python@v2 | ||
- uses: pre-commit/[email protected] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# See: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/.pre-commit-config.yaml | ||
|
||
repos: | ||
|
||
- repo: https://github.com/pre-commit/pre-commit-hooks | ||
rev: v3.1.0 | ||
hooks: | ||
- id: trailing-whitespace | ||
- id: end-of-file-fixer | ||
- id: check-docstring-first | ||
- id: check-json | ||
- id: check-yaml | ||
- id: pretty-format-json | ||
args: ["--autofix", "--indent=2", "--no-sort-keys"] | ||
|
||
- repo: https://github.com/ambv/black | ||
rev: 19.10b0 | ||
hooks: | ||
- id: black | ||
args: ["--line-length", "100"] | ||
|
||
- repo: https://gitlab.com/pycqa/flake8 | ||
rev: 3.8.3 | ||
hooks: | ||
- id: flake8 | ||
args: | ||
- "--max-line-length=100" | ||
|
||
- repo: https://github.com/asottile/seed-isort-config | ||
rev: v2.2.0 | ||
hooks: | ||
- id: seed-isort-config | ||
|
||
- repo: https://github.com/pre-commit/mirrors-mypy | ||
rev: 'v0.910' | ||
hooks: | ||
- id: mypy | ||
exclude: tests | ||
|
||
- repo: https://github.com/pycqa/isort | ||
rev: 5.5.4 | ||
hooks: | ||
- id: isort | ||
args: ["--profile", "black"] | ||
|
||
- repo: https://github.com/myint/rstcheck | ||
rev: 3f92957478422df87bd730abde66f089cc1ee19b | ||
hooks: | ||
- id: rstcheck |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
version: 2 | ||
|
||
sphinx: | ||
configuration: docs/conf.py | ||
|
||
# Optionally set the version of Python and requirements required to build your docs | ||
python: | ||
version: 3.8 | ||
install: | ||
- requirements: docs/requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
codecov: | ||
require_ci_to_pass: no | ||
max_report_age: off | ||
|
||
comment: false | ||
|
||
coverage: | ||
precision: 2 | ||
round: down | ||
status: | ||
project: | ||
default: | ||
target: 95 | ||
informational: true | ||
patch: off | ||
changes: off | ||
|
||
ignore: | ||
- "tests/*" | ||
- "**/__init__.py" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/docs/Makefile | ||
|
||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/docs/conf.py | ||
|
||
# -- Project information ----------------------------------------------------- | ||
|
||
project = "Pangeo Forge Orchestrator" | ||
copyright = "2021, Pangeo Community" | ||
author = "Pangeo Community" | ||
|
||
# -- General configuration --------------------------------------------------- | ||
|
||
extensions = [ | ||
"myst_nb", | ||
"sphinx.ext.autodoc", | ||
"sphinx.ext.extlinks", | ||
# "numpydoc", | ||
"sphinx_autodoc_typehints", | ||
"sphinx_copybutton", | ||
] | ||
|
||
extlinks = { | ||
"issue": ("https://github.com/pangeo-forge/pangeo-forge-orchestrator/issues/%s", "GH issue "), | ||
"pull": ("https://github.com/pangeo-forge/pangeo-forge-orchestrator/pull/%s", "GH PR "), | ||
} | ||
|
||
exclude_patterns = ["_build", "**.ipynb_checkpoints"] | ||
master_doc = "index" | ||
|
||
# we always have to manually run the notebooks because they are slow / expensive | ||
# jupyter_execute_notebooks = "auto" | ||
# execution_excludepatterns = ["tutorials/xarray_zarr/*", "tutorials/hdf_reference/*"] | ||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
|
||
# https://sphinx-book-theme.readthedocs.io/en/latest/configure.html | ||
html_theme = "pangeo_sphinx_book_theme" | ||
html_theme_options = { | ||
"repository_url": "https://github.com/pangeo-forge/pangeo-forge-orchestrator", | ||
"repository_branch": "main", | ||
"path_to_docs": "docs", | ||
"use_repository_button": True, | ||
"use_issues_button": True, | ||
"use_edit_page_button": True, | ||
} | ||
html_logo = "_static/pangeo-forge-logo-blue.png" | ||
html_static_path = ["_static"] | ||
|
||
myst_heading_anchors = 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Pangeo Forge Orchestrator | ||
|
||
```{warning} | ||
The official Pangeo Forge docs can be found at [https://pangeo-forge.readthedocs.io/](https://pangeo-forge.readthedocs.io/en/latest/). | ||
|
||
You have found the documentation for `pangeo-forge-orchestrator`, an unreleased package. | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: 1 | ||
|
||
motivation | ||
quick_start | ||
structural_view | ||
new_pydantic_types | ||
use_guide | ||
testing_strategy | ||
next_steps | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
@ECHO OFF | ||
|
||
pushd %~dp0 | ||
|
||
REM Command file for Sphinx documentation | ||
|
||
if "%SPHINXBUILD%" == "" ( | ||
set SPHINXBUILD=sphinx-build | ||
) | ||
set SOURCEDIR=. | ||
set BUILDDIR=_build | ||
|
||
if "%1" == "" goto help | ||
|
||
%SPHINXBUILD% >NUL 2>NUL | ||
if errorlevel 9009 ( | ||
echo. | ||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | ||
echo.installed, then set the SPHINXBUILD environment variable to point | ||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | ||
echo.may add the Sphinx directory to PATH. | ||
echo. | ||
echo.If you don't have Sphinx installed, grab it from | ||
echo.http://sphinx-doc.org/ | ||
exit /b 1 | ||
) | ||
|
||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
goto end | ||
|
||
:help | ||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
|
||
:end | ||
popd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Motivation | ||
|
||
The first practical application of this package (even before we refactor other automation repos into it) is to orchestrate the process of generating STAC Items for datasets which have already been built with Pangeo Forge. All of the metadata required for cataloging already exists _somewhere_ in Pangeo Forge: | ||
Review comment: Maybe add a link for the words |
||
|
||
- the datasets themselves are in a Bakery target somewhere | ||
- other metadata is available in the Feedstock's `meta.yaml` | ||
|
||
...so a generalizable cataloging approach needs to know: | ||
|
||
- how to access a given Bakery target and open a dataset therein | ||
- which datasets reside at which paths at the given Bakery target | ||
- which Feedstocks (including Feedstock [version](https://github.com/pangeo-forge/roadmap/pull/34)) those paths were built from | ||
- how to read/parse a `meta.yaml` from a given versioned Feedstock |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# New pydantic types | ||
|
||
In updating `meta_types/bakery.py` (as mentioned in {doc}`structural_view`), `pangeo-forge-orchestrator` aims to implement Pydantic-based input validation with the lightest touch possible for each input type. The types were initially implemented as Python dataclasses, so for many of them, the edit was simply to use `pydantic.dataclasses` as a drop-in replacement (perhaps in combination with stricter type hints). In addition, three new dataclasses, two new Models, and some regex-constrained type functions are defined; these are described below. | ||
|
||
## `BakeryName` | ||
Review comment: Why do we have |
||
|
||
This dataclass validates a string against the [Bakery naming scheme defined in the Bakery database ADR](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0004-use-yaml-file-for-bakery-database.md#bakery-name). As I understand it, the stipulation to "follow [java package syntax](https://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html) to ensure unique bakery names" means that these names should all begin with a reversed organization URL. Therefore, I'm making the assumption (reflected in this dataclass's type hinting) that un-reversing this portion of the Bakery name should yield a valid `pydantic.HttpUrl`. (More on this in {doc}`use_guide`.) In addition, following the ADR, an acceptable Bakery name input string must conclude with `".bakery.{region}"`, where `{region}` conforms to a `"{provider}.{region}"` format. | ||
|
||
```{eval-rst} | ||
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BakeryName | ||
:members: | ||
``` | ||
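The naming rule described above can be illustrated with a minimal, standard-library-only sketch. This is not the package's `BakeryName` implementation (which relies on pydantic); the function name `parse_bakery_name`, the regular expression, the `https` scheme, and the example name are all assumptions made for illustration, and `urllib.parse.urlparse` is only a rough stand-in for `pydantic.HttpUrl` validation.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: "<reversed org domain>.bakery.<provider>.<region>".
# The allowed character set is an assumption, not taken from the ADR.
BAKERY_NAME = re.compile(
    r"(?P<org>[a-z0-9-]+(?:\.[a-z0-9-]+)+)"
    r"\.bakery\.(?P<provider>[a-z0-9-]+)\.(?P<region>[a-z0-9-]+)"
)


def parse_bakery_name(name: str) -> dict:
    """Split a Bakery name into organization URL, provider, and region."""
    match = BAKERY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the assumed naming scheme")
    # Un-reverse the organization portion, e.g. "org.example" -> "example.org".
    domain = ".".join(reversed(match["org"].split(".")))
    url = f"https://{domain}"
    # Rough sanity check that the un-reversed portion parses as a URL.
    if not urlparse(url).netloc:
        raise ValueError(f"{url!r} is not a usable URL")
    return {
        "organization_url": url,
        "provider": match["provider"],
        "region": match["region"],
    }


# Hypothetical Bakery name, used only to exercise the sketch.
parsed = parse_bakery_name("org.example.bakery.aws.us-west-2")
```

A name that lacks the reversed-domain prefix or the `.bakery.` segment raises `ValueError` in this sketch, which mirrors the intent (though not the mechanics) of rejecting invalid input at construction time.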
|
||
## `RunRecord` | ||
|
||
> This type (along with the next one, `BuildLogs`) provides the necessary link, mentioned in {doc}`motivation`, between datasets in a Bakery target and the Feedstocks from which they were built. They are new concepts for Pangeo Forge and certainly merit their own ADR. After we move through this PR (assuming we agree that some version of these concepts is useful), we can work on an ADR for them. | ||
|
||
This is a container for metadata describing the execution of a Pangeo Forge Recipe, including: | ||
- **timestamp**: The datetime at which the execution took place (not sure if this should be start, end, or maybe tuple of both). | ||
- **feedstock**: The name of the feedstock (including version). For this PR, I'm provisionally using the format `"{feedstock_name}@{major_version}.{minor_version}"`. | ||
- **recipe**: The name of the Python object within the Feedstock's `recipe.py` module which was used to build the dataset. In the case of a single-recipe module, this is the name of a `pangeo_forge_recipes.recipes` class instance (and therefore needs to be a valid Python identifier). In the case of a [`dict_object`](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0002-use-meta-yaml-to-track-feedstock-metadata.md#recipes-section), this would follow the established convention for the `meta.yaml`: i.e., `"{dict_name}:{dict_key}"`. | ||
- **path**: The relative path to the dataset within the Bakery storage target. | ||
|
||
```{eval-rst} | ||
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.RunRecord | ||
:members: | ||
``` | ||
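To make the four fields described above concrete, here is a minimal sketch using only the standard library. It is not the package's `RunRecord` (which is a pydantic dataclass); the class name `RunRecordSketch`, the feedstock regular expression, and the example values are assumptions for illustration only.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed reading of "{feedstock_name}@{major_version}.{minor_version}".
FEEDSTOCK = re.compile(r"[A-Za-z0-9_-]+@\d+\.\d+")


@dataclass(frozen=True)
class RunRecordSketch:
    timestamp: datetime
    feedstock: str
    recipe: str
    path: str

    def __post_init__(self) -> None:
        if FEEDSTOCK.fullmatch(self.feedstock) is None:
            raise ValueError(f"bad feedstock: {self.feedstock!r}")
        # Either a plain Python identifier, or "{dict_name}:{dict_key}".
        name, sep, key = self.recipe.partition(":")
        if not name.isidentifier() or (sep and not key):
            raise ValueError(f"bad recipe: {self.recipe!r}")


# Hypothetical values, chosen only to exercise the validation.
record = RunRecordSketch(
    timestamp=datetime(2021, 1, 1),
    feedstock="example-feedstock@1.0",
    recipe="recipes:daily",
    path="example-feedstock/daily.zarr",
)
```

Validating in `__post_init__` keeps the checks next to the field definitions, which is roughly the role pydantic's validators play in the real type.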
|
||
|
||
## `BuildLogs` | ||
|
||
This is a mapping between an execution run identifier and a `RunRecord`. Provisionally, I'm specifying a run identifier as a five-digit integer string, e.g. `"00000"`, `"00001"`, etc. These values would be sequentially assigned to datasets as they are built to a given Bakery storage target. (And each Bakery target would keep its own tally.) There are other identifiers (e.g. the dataset path) which will be unique within a given Bakery storage target, but the idea here is to provide a short string for passing as a command line argument (an example of this is provided in the {doc}`use_guide`). | ||
Review comment: Let's use either an integer or a uuid. Forget about the 5-digit string business.
Reply: 👍 bc82f04 (Docs updates to follow) |
||
|
||
> Currently, the `pangeo_forge_orchestrator.components.Bakery` object assumes that a `build_logs.json` (with entries parsable into `BuildLogs` objects) will exist at the Bakery target's root path. By opening and parsing this JSON, the `Bakery` instance knows exactly what dataset paths exist at the target and what Feedstocks they are tied to. In the future, these records could be ingested into a database instead of, or in addition to, keeping a copy in the Bakery target. | ||
Review comment: Appending build log records to a .json file is not going to be sustainable for long. json is just not an appendable format! Given the simplicity of the build log, a simple CSV would suffice better. In the long run, we really need to think about the right architecture here... Should the bakeries have their own REST endpoint? Sqlite database? Log to a central service? |
||
|
||
```{eval-rst} | ||
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BuildLogs | ||
:members: | ||
``` | ||
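As a rough illustration of the mapping described above, the following standard-library sketch parses a `build_logs.json`-style payload keyed by five-digit run identifiers and computes the next sequential identifier. The function names, the required-key set, and the sample record are assumptions; the actual `BuildLogs` type is pydantic-based, and the identifier format is still under discussion in this PR.

```python
import json
import re

RUN_ID = re.compile(r"\d{5}")
# Field names taken from the RunRecord description; treated as required here.
REQUIRED_KEYS = {"timestamp", "feedstock", "recipe", "path"}


def parse_build_logs(text: str) -> dict:
    """Parse JSON text into a {run_id: record} mapping, checking each entry."""
    logs = {}
    for run_id, record in json.loads(text).items():
        if RUN_ID.fullmatch(run_id) is None:
            raise ValueError(f"bad run identifier: {run_id!r}")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"run {run_id} is missing {sorted(missing)}")
        logs[run_id] = record
    return logs


def next_run_id(logs: dict) -> str:
    """Return the next sequential identifier, zero-padded to five digits."""
    return f"{max(map(int, logs), default=-1) + 1:05d}"


# Hypothetical payload standing in for a Bakery target's build_logs.json.
payload = json.dumps(
    {
        "00000": {
            "timestamp": "2021-01-01T00:00:00",
            "feedstock": "example-feedstock@1.0",
            "recipe": "recipe",
            "path": "example-feedstock/data.zarr",
        }
    }
)
logs = parse_build_logs(payload)
```

Because each Bakery target keeps its own tally, `next_run_id` only needs the mapping for that one target.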
|
||
## Why is `StorageOptions` a Model? | ||
|
||
The Bakery database ADR [defines a `storage_options` field](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0004-use-yaml-file-for-bakery-database.md#storage-options) for Bakery target access parameters. The reason we're defining the Python container for these options as a pydantic Model, rather than a dataclass, is for the [`.dict(exclude_none=True)` method](https://pydantic-docs.helpmanual.io/usage/exporting_models/#modeldict), which allows us to define arbitrary numbers of optional type-checked fields for this object, but also succinctly export kwargs dictionaries representing only those fields which have been set on a given instance. | ||
|
||
```{eval-rst} | ||
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.StorageOptions | ||
:members: | ||
``` | ||
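The export behavior that motivates using a Model can be approximated with a plain dataclass, as in the sketch below. This is a standard-library analogue of `.dict(exclude_none=True)`, not pydantic itself, and it does no type checking; the class name `StorageOptionsSketch`, the method `set_fields`, and the field names `anon`, `key`, and `secret` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StorageOptionsSketch:
    # Hypothetical optional fields; every one defaults to "not set".
    anon: Optional[bool] = None
    key: Optional[str] = None
    secret: Optional[str] = None

    def set_fields(self) -> dict:
        """Return only the fields that were explicitly set (non-None)."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Only `anon` is set, so only `anon` appears in the exported kwargs.
kwargs = StorageOptionsSketch(anon=True).set_fields()
```

The resulting dictionary can be splatted into a call as keyword arguments without passing explicit `None` values for unset options.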
|
||
|
||
## ... and what about `BakeryDatabase`? | ||
|
||
The new `BakeryDatabase` object is a Model (instead of a dataclass) because we want to take advantage of Model features for `pangeo_forge_orchestrator.components.Bakery`, and it seemed to make sense to have `Bakery` inherit from `BakeryDatabase`. | ||
|
||
```{eval-rst} | ||
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BakeryDatabase | ||
:members: | ||
``` | ||
|
||
## What's with those regexes? | ||
|
||
There are certain string values (such as the Feedstock name, etc.) which we definitely want to ensure conform to a specified format, but for which it seemed excessive to define an entire class. For [these cases](https://github.com/pangeo-forge/pangeo-forge-orchestrator/blob/620989215c8d191d55c3080d403d6454a895230b/pangeo_forge_orchestrator/meta_types/bakery.py#L110-L121), I opted to use pydantic's [Constrained Type function, `constr`](https://pydantic-docs.helpmanual.io/usage/types/#constrained-types). I (and I think most people) don't find regular expressions especially readable, so I wrote explanatory comments for each of these cases. | ||
|
||
## Full diff | ||
|
||
And finally, here's [the full diff](https://github.com/pangeo-forge/pangeo-forge-orchestrator/compare/de36c30070f249136a5eb3c0f54144f3eaafb428..620989215c8d191d55c3080d403d6454a895230b#diff-374b3112607d6019e80fa96dff3aec0f9159e803faf62a96ef35330f308bff9b) between Sean's existing types and those proposed in this PR. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Next steps | ||
|
||
- Circle back to Orchestrator and BuildLogs ADR proposals | ||
- Bring the official Bakery database within spec, as described in the {doc}`use_guide` | ||
- Update the `catalog` CLI (and related pydantic types) to make it possible to build STAC Items for datasets built from un-merged Feedstocks | ||
- Begin to merge automation repositories (i.e. unmerged portions of `pangeo-forge-prefect`) and refactor related GitHub Actions accordingly |
Review comment: rm dead config -- is this CL still WIP? Should I come back later?