vdk-meta-jobs: Undelete the package for potential future fixes (#2106)
This PR brings back the vdk-meta-jobs plugin to the repo after it was deleted once the plugin was renamed to vdk-dag. Additionally, this PR includes a necessary fix stemming from a change in the vdk-control-service-api. Testing done: pipelines Signed-off-by: Gabriel Georgiev <[email protected]>
1 parent
3bca09a
commit 6a4c358
Showing 45 changed files with 2,446 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
image: "python:3.7" | ||
|
||
.build-vdk-meta-jobs: | ||
  variables: | ||
    PLUGIN_NAME: vdk-meta-jobs | ||
  extends: .build-plugin | ||
|
||
build-py37-vdk-meta-jobs: | ||
  extends: .build-vdk-meta-jobs | ||
  image: "python:3.7" | ||
|
||
build-py311-vdk-meta-jobs: | ||
  extends: .build-vdk-meta-jobs | ||
  image: "python:3.11" | ||
|
||
release-vdk-meta-jobs: | ||
  variables: | ||
    PLUGIN_NAME: vdk-meta-jobs | ||
  extends: .release-plugin |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
# Meta Jobs | ||
|
||
Express dependencies between data jobs. | ||
|
||
This plugin for Versatile Data Kit extends the Job API with a feature that lets users trigger so-called Meta Jobs. | ||
|
||
A meta job is a regular Data Job that invokes other Data Jobs using the Control Service Execution API. | ||
Apart from its purpose, it is no different from any other data job. See [Data Job types](https://github.com/vmware/versatile-data-kit/wiki/User-Guide#data-job-types) for more info. | ||
|
||
It's meant to be a much more lightweight alternative to complex and comprehensive workflow solutions (like Airflow), | ||
as it doesn't require provisioning any new infrastructure or learning a new tool. | ||
You install a new Python library (this plugin itself) and you are ready to go. | ||
|
||
Using this plugin, you can specify dependencies between data jobs as a directed acyclic graph (DAG). | ||
See [Usage](#usage) for more information. | ||
|
||
## Usage | ||
|
||
``` | ||
pip install vdk-meta-jobs | ||
``` | ||
|
||
Then create a single [step](https://github.com/vmware/versatile-data-kit/wiki/dictionary#data-job-step) and | ||
define the jobs you want to orchestrate: | ||
|
||
```python | ||
from vdk.plugin.meta_jobs.meta_job_runner import MetaJobInput | ||
|
||
def run(job_input): | ||
    jobs = [ | ||
        { | ||
            "job_name": "name-of-job", | ||
            "team_name": "team-of-job", | ||
            "fail_meta_job_on_error": True,  # or False | ||
            "arguments": {"key1": "value1", "key2": "value2"}, | ||
            "depends_on": ["name-of-job1", "name-of-job2"], | ||
        }, | ||
        # ... more job definitions | ||
    ] | ||
    MetaJobInput().run_meta_job(jobs) | ||
``` | ||
|
||
When defining a job to be run, the following attributes are supported: | ||
* **job_name**: required, the name of the data job. | ||
* **team_name**: optional, the team of the data job. If omitted, the meta job's team is used. | ||
* **fail_meta_job_on_error**: optional, default is true. If true, the meta job aborts and fails when the orchestrated job fails. If false, the meta job does not fail and continues. | ||
* **arguments**: optional, the arguments that are passed to the underlying orchestrated data job. | ||
* **depends_on**: required (can be empty), list of other jobs that the orchestrated job depends on. The job will not be started until all jobs listed in depends_on have finished. | ||
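
The attribute rules above can be sketched as a small helper that fills in the documented defaults before a job list is handed to the runner. This is an illustrative sketch only, not the plugin's code: `normalize_job`, `check_dependencies` and the `meta_job_team` parameter are hypothetical names, and rejecting a `depends_on` entry that names no job in the list is an assumption about sensible behaviour rather than something the README states.

```python
from typing import Any, Dict, List


def normalize_job(job: Dict[str, Any], meta_job_team: str) -> Dict[str, Any]:
    """Apply the documented defaults to one job definition (hypothetical helper)."""
    if not job.get("job_name"):
        raise ValueError("job_name is required")
    if "depends_on" not in job:
        raise ValueError("depends_on is required (it may be an empty list)")
    return {
        "job_name": job["job_name"],
        # team_name falls back to the meta job's own team when omitted
        "team_name": job.get("team_name") or meta_job_team,
        # fail_meta_job_on_error defaults to True
        "fail_meta_job_on_error": job.get("fail_meta_job_on_error", True),
        "arguments": job.get("arguments") or {},
        "depends_on": list(job["depends_on"]),
    }


def check_dependencies(jobs: List[Dict[str, Any]]) -> None:
    """Assumption: every depends_on entry must name a job in the same list."""
    names = {job["job_name"] for job in jobs}
    for job in jobs:
        unknown = [dep for dep in job["depends_on"] if dep not in names]
        if unknown:
            raise ValueError(f"{job['job_name']} depends on unknown jobs: {unknown}")
```

A definition such as `{"job_name": "job2", "depends_on": ["job1"]}` would come back with the meta job's team, `fail_meta_job_on_error` set to `True` and an empty `arguments` dictionary.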
|
||
|
||
### Example | ||
|
||
The following example dependency graph can be implemented with the code below. | ||
|
||
|
||
 | ||
|
||
In this example what happens is: | ||
* Job 1 will execute. | ||
* After Job 1 is completed, jobs 2, 3 and 4 will start executing in parallel. | ||
* Jobs 5 and 6 will start executing after job 3 completes, but will not wait for the completion of jobs 2 and 4. | ||
|
||
|
||
```python | ||
|
||
from vdk.api.job_input import IJobInput | ||
from vdk.plugin.meta_jobs.meta_job_runner import MetaJobInput | ||
|
||
JOBS_RUN_ORDER = [ | ||
    { | ||
        "job_name": "job1", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": [] | ||
    }, | ||
|
||
    { | ||
        "job_name": "job2", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": ["job1"] | ||
    }, | ||
    { | ||
        "job_name": "job3", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": ["job1"] | ||
    }, | ||
    { | ||
        "job_name": "job4", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": ["job1"] | ||
    }, | ||
|
||
    { | ||
        "job_name": "job5", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": ["job3"] | ||
    }, | ||
    { | ||
        "job_name": "job6", | ||
        "team_name": "team-awesome", | ||
        "fail_meta_job_on_error": True, | ||
        "arguments": {}, | ||
        "depends_on": ["job3"] | ||
    }, | ||
] | ||
|
||
|
||
def run(job_input: IJobInput) -> None: | ||
    MetaJobInput().run_meta_job(JOBS_RUN_ORDER) | ||
``` | ||
|
||
|
||
### Runtime sequencing | ||
|
||
The depends_on key stores the dependencies of each job, that is, the jobs that have to finish before it starts. | ||
The DAG execution starts from the jobs with empty dependency lists, which start together in parallel. | ||
Starting too many jobs at once could overload the server. To avoid this, | ||
a limit on the number of concurrently running jobs is set. This limit is | ||
a [configuration variable](https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_configuration.py#L87) | ||
that you can set according to your needs. When the limit is reached, the remaining jobs | ||
are not cancelled but delayed until one of the running jobs finishes and frees a slot. Importantly, | ||
even though some jobs are delayed by the limit, the overall sequence is preserved. | ||
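
The sequencing described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the plugin's scheduler: it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+; the plugin's `setup.py` lists `graphlib-backport` for older versions), `simulate_run` and `max_concurrent` are hypothetical names, and it simplifies execution into rounds in which every running job finishes before the next round starts.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def simulate_run(depends_on: Dict[str, List[str]], max_concurrent: int) -> List[List[str]]:
    """Return the jobs started in each round, never more than max_concurrent at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    sorter = TopologicalSorter(depends_on)  # maps each job to the jobs it waits for
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    delayed: List[str] = []
    rounds: List[List[str]] = []
    while sorter.is_active():
        # Jobs whose dependencies have all finished join the waiting queue.
        delayed.extend(sorter.get_ready())
        # Start as many as the limit allows; the rest stay delayed, not cancelled.
        running, delayed = delayed[:max_concurrent], delayed[max_concurrent:]
        rounds.append(running)
        # Simplification: the whole round finishes before the next one starts.
        sorter.done(*running)
    return rounds
```

Calling `simulate_run` with the six-job graph from the example and `max_concurrent=2` starts `job1` alone in the first round, then releases the remaining jobs two at a time, and no job starts before the jobs it depends on.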
|
||
|
||
### Data Job start comparison | ||
|
||
There are 3 types of jobs right now in terms of how they are started. | ||
|
||
* Started by Schedule | ||
  * When the time comes for a scheduled execution of a Data Job and the job is already running, the execution waits | ||
    for it to finish by retrying a few times. If the job is still running after that, this scheduled execution is skipped. | ||
* Started by the user using UI or CLI | ||
  * If a user tries to start a job that is already running, they get an appropriate error immediately and a | ||
    recommendation to try again later. | ||
* **Started by a DAG Job** | ||
  * If a DAG job tries to start a job that is already running, the DAG job behaves | ||
    similarly to the schedule: it retries later, but more times. | ||
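
The "retry later" behaviour in the list above can be sketched as a plain retry loop. Everything here is hypothetical and for illustration only: `JobAlreadyRunning`, `start_with_retries`, `max_attempts` and `wait_seconds` are invented names, and the real plugin and Control Service may detect a running job and space their retries differently.

```python
import time
from typing import Callable


class JobAlreadyRunning(Exception):
    """Hypothetical error raised when the target job is already executing."""


def start_with_retries(
    start_job: Callable[[], str],
    max_attempts: int,
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call start_job until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_job()
        except JobAlreadyRunning:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(wait_seconds)  # retry later instead of failing immediately
    raise ValueError("max_attempts must be at least 1")
```

In this picture, a user-triggered start corresponds to `max_attempts=1` (fail immediately), while a schedule or a DAG job would use a larger value.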
|
||
### FAQ | ||
|
||
|
||
**Q: Will the meta job retry on Platform Error?**<br> | ||
A: Yes, as with any other job, up to N attempts (configurable by the Control Service) for each job it is orchestrating. | ||
See the Control Service documentation for more information. | ||
|
||
**Q: If an orchestrated job fails, will the meta job fail?**<br> | ||
A: Only if the fail_meta_job_on_error flag is set to True (which is the default setting if omitted). | ||
|
||
The meta job will then fail with a USER error (regardless of how the orchestrated job failed). | ||
|
||
|
||
**Q: Am I able to run the metajob locally?**<br> | ||
A: Yes, but the jobs orchestrated must be deployed to the cloud (by the Control Service). | ||
|
||
**Q: Is there a memory limit for the meta job?**<br> | ||
A: The normal per-job limits apply to any jobs orchestrated or started by the meta job. | ||
|
||
**Q: Is there an execution time limit for the meta job?**<br> | ||
A: Yes, the meta job must finish within the same limit as any normal data job. | ||
The total time of all data jobs started by the meta job must be less than the limit specified. | ||
The overall limit is controlled by the Control Service administrators. | ||
|
||
**Q: Is the meta job going to fail and not trigger the remaining jobs if any of the jobs it is orchestrating fails?**<br> | ||
A: This is configurable by the end user through the fail_meta_job_on_error parameter. | ||
|
||
**Q: Can I schedule one job to run every hour and use it in the meta job at the same time?**<br> | ||
A: Yes. If the job is already running, the meta job will wait for the concurrent run to finish and then trigger the job again. | ||
If the job is already running as part of the meta job, the concurrent scheduled run will be skipped. | ||
|
||
|
||
### Build and testing | ||
|
||
``` | ||
pip install -r requirements.txt | ||
pip install -e . | ||
pytest | ||
``` | ||
|
||
In the VDK repo, the [../build-plugin.sh](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/build-plugin.sh) script can also be used. | ||
|
||
|
||
#### Note about the CI/CD: | ||
|
||
.plugin-ci.yaml is needed only for plugins that are part of the [Versatile Data Kit Plugin repo](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins). | ||
|
||
The CI/CD is separated into two stages, a build stage and a release stage. | ||
The build stage is made up of a few jobs, all of which inherit from the same | ||
job configuration and differ only in the Python version they use. | ||
They run according to rules, which are ordered so that changes to a | ||
plugin's directory trigger that plugin's CI, but changes to a different plugin do not. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# this file is used to provide testing requirements | ||
# for requirements (dependencies) needed during installation update setup.py install_requires section | ||
|
||
|
||
pytest | ||
pytest-httpserver | ||
urllib3 | ||
vdk-control-api-auth | ||
vdk-control-service-api | ||
vdk-core | ||
vdk-plugin-control-cli | ||
vdk-test-utils |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
import pathlib | ||
|
||
import setuptools | ||
|
||
""" | ||
Builds a package with the help of setuptools in order for this package to be imported in other projects | ||
""" | ||
|
||
__version__ = "0.1.0" | ||
|
||
setuptools.setup( | ||
    name="vdk-meta-jobs", | ||
    version=__version__, | ||
    url="https://github.com/vmware/versatile-data-kit", | ||
    description="Express dependencies between data jobs.", | ||
    long_description=pathlib.Path("README.md").read_text(), | ||
    long_description_content_type="text/markdown", | ||
    install_requires=[ | ||
        "vdk-core", | ||
        "graphlib-backport", | ||
        "vdk-control-api-auth", | ||
        "vdk-plugin-control-cli", | ||
        "vdk-control-service-api", | ||
        "urllib3", | ||
    ], | ||
    package_dir={"": "src"}, | ||
    packages=setuptools.find_namespace_packages(where="src"), | ||
    # This is the only vdk plugin specific part | ||
    # Define entry point called "vdk.plugin.run" with name of plugin and module to act as entry point. | ||
    entry_points={"vdk.plugin.run": ["meta-jobs = vdk.plugin.meta_jobs.dags_plugin"]}, | ||
    classifiers=[ | ||
        "Development Status :: 2 - Pre-Alpha", | ||
        "License :: OSI Approved :: Apache Software License", | ||
        "Programming Language :: Python :: 3.7", | ||
        "Programming Language :: Python :: 3.8", | ||
        "Programming Language :: Python :: 3.9", | ||
        "Programming Language :: Python :: 3.10", | ||
        "Programming Language :: Python :: 3.11", | ||
    ], | ||
) |
53 changes: 53 additions & 0 deletions
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/api/meta_job.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Copyright 2021-2023 VMware, Inc. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
import abc | ||
from abc import abstractmethod | ||
from dataclasses import dataclass | ||
from dataclasses import field | ||
from typing import List | ||
|
||
|
||
@dataclass | ||
class SingleJob: | ||
    """ | ||
    This class represents a single job to be executed. | ||
    :param job_name: the name of the job. | ||
    :param team_name: the name of the team that owns the job. | ||
    :param fail_meta_job_on_error: boolean flag indicating whether the meta job should fail if this job fails. | ||
    :param arguments: JSON-serializable dictionary of arguments to be passed to the job. | ||
    :param depends_on: list of names of jobs that this job depends on. | ||
    """ | ||
|
||
    job_name: str | ||
    team_name: str = None | ||
    fail_meta_job_on_error: bool = True | ||
    arguments: dict = None | ||
    depends_on: List[str] = field(default_factory=list) | ||
|
||
|
||
@dataclass | ||
class MetaJob(SingleJob): | ||
    """ | ||
    This class represents a DAG Job, which is a single job itself and consists of single jobs - the orchestrated ones. | ||
    :param jobs: list of the orchestrated jobs | ||
    """ | ||
|
||
    jobs: List[SingleJob] = field(default_factory=list) | ||
|
||
|
||
class IMetaJobInput(abc.ABC): | ||
    """ | ||
    This class is responsible for the DAG job run. | ||
    """ | ||
|
||
    @abstractmethod | ||
    def run_meta_job(self, meta_job: MetaJob): | ||
        """ | ||
        Runs the given DAG job. | ||
        :param meta_job: the DAG job to be run | ||
        :return: | ||
        """ | ||
        pass |