State machines in Kubernetes 🐦🔥
The operator (Kubernetes) part will be added soon! Right now we have the configs that create the state machine and orchestrate the work.
- go version v1.22.0+
- docker version 17.03+
- kubectl version v1.11.3+
- Access to a Kubernetes v1.11.3+ cluster
You can create a cluster locally (if your computer is chonky enough to handle it) or use AWS. To create one locally with kind:
kind create cluster --config ./examples/kind-config.yaml
We provide two examples: one that uses the operator, and a manual one for those who want to create the various objects themselves and understand how the state machine operator (and the corresponding Python library) works. For the manual example, see the README in examples. Here we continue with the operator example.
The operator is built via its manifest in dist. For development:
# Install and load into general cluster
make test-deploy-recreate
# Install and load into kind
make test-deploy-kind
For non-development use:
kubectl apply -f examples/dist/state-machine-operator.yaml
Then apply the custom resource (an instance of the CRD) to create the state machine. For interactive work, remember to set `spec.workflow.interactive` (or `interactive` on any individual job under `jobs`) to true.
kubectl apply -f examples/state-machine.yaml
These are some design decisions I've made (of course open to discussion):
- The workflow model is a state machine, and its state is always derived from Kubernetes
- The state machine manager manages units of job sequences (each one a state machine), and each state machine orchestrates the logic of the jobs within it.
- No application code (the jobs) is tangled with the state machine or manager
- We assume jobs don't need to be paused / resumed / reclaimed like on HPC
- Jobs are modular units: each has a config that the manager knows how to parse, and everything else is provided to the job.
- We likely want to test with a real registry OR allow a volume bind (existing data) to the registry.
- Otherwise, artifacts are deleted on cleanup. We could also add an option to keep the ephemeral registry.
- Under what conditions do we cancel / clean up jobs?
- I haven't tested a failure yet (or a case that requires cleanup / deletion).
- We might want to clean up other resources too (e.g., config maps).
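To make the first two design decisions more concrete, here is a minimal Python sketch of "state is derived from Kubernetes": each state machine is just an ordered list of Job names, and its phase is recomputed on every pass from the Jobs' observed terminal conditions (`Complete` / `Failed`, the condition types a Kubernetes Job reports). `Phase`, `derive_state`, and `Manager` are hypothetical names for illustration only, not the operator's or the Python library's actual API, and the cluster view is passed in as a plain dict instead of being read through a Kubernetes client.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Phase(Enum):
    """Hypothetical phases of one state machine (one sequence of jobs)."""
    SUBMIT = "submit"      # the next job has not been created in the cluster yet
    RUNNING = "running"    # the current job exists but has no terminal condition
    FAILED = "failed"      # the current job reports a Failed condition
    COMPLETE = "complete"  # every job in the sequence reports Complete


# Assumed cluster view: Job name -> terminal condition type ("Complete" or "Failed"),
# or None if the Job exists but has not finished. A missing key means "not created yet".
Observed = Dict[str, Optional[str]]


def derive_state(sequence: List[str], observed: Observed) -> Tuple[Phase, Optional[str]]:
    """Derive a state machine's phase purely from observed cluster state."""
    for name in sequence:
        if name not in observed:
            return Phase.SUBMIT, name
        condition = observed[name]
        if condition == "Failed":
            return Phase.FAILED, name
        if condition != "Complete":
            return Phase.RUNNING, name
    return Phase.COMPLETE, None


class Manager:
    """Hypothetical manager that owns several state machines (job sequences)."""

    def __init__(self, workflows: Dict[str, List[str]]):
        # Workflow name -> ordered Job names (assumed unique across workflows).
        self.workflows = workflows

    def reconcile(self, observed: Observed) -> Dict[str, Tuple[Phase, Optional[str]]]:
        # Nothing is cached between passes: every state is recomputed from the cluster view.
        return {wf: derive_state(seq, observed) for wf, seq in self.workflows.items()}


if __name__ == "__main__":
    manager = Manager({"wf-a": ["a-build", "a-test"], "wf-b": ["b-run"]})
    observed = {"a-build": "Complete", "a-test": None, "b-run": "Failed"}
    # wf-a is RUNNING on a-test; wf-b is FAILED on b-run.
    print(manager.reconcile(observed))
```

In this sketch nothing is stored between passes, so a restarted manager recovers its position simply by observing the cluster again, and the jobs themselves never need to know they are part of a state machine.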
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614