

vdk-meta-jobs: Initial implementation #1249

Merged (6 commits), Nov 3, 2022


@antoniivanov (Collaborator) commented Oct 19, 2022

This is a pre-Alpha implementation of Meta Jobs functionality.

One crucial feature which has been missing from VDK is the ability to
express workload (data job) interdependencies, meaning that a certain
Data Job will only trigger upon the successful completion (or other
criteria) of one or more other Data Jobs.

Integration with Apache Airflow was developed to address this.
However, it comes with downsides: teams would have to either manage
their own Airflow instance or pay for an externally managed service.

The aim of this submission is to build a plugin for Versatile Data Kit
which extends its API with an additional feature that allows users to
trigger so-called Meta Jobs - jobs that trigger one or more other jobs,
wait for their completion, and then trigger another set of jobs until
either the entire job pipeline has succeeded, or at least one of the
jobs has failed.
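The staged flow described above (start a set of jobs, wait for them, start the next set, stop at the first failure) can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a minimal illustration under stated assumptions, not the vdk-meta-jobs API: jobs are plain name strings, and `run_meta_job` and the `run_job` callback are hypothetical names standing in for "start the job and wait until it finishes".

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_meta_job(
    dependencies: Dict[str, Set[str]],
    run_job: Callable[[str], bool],
) -> List[List[str]]:
    """Hypothetical sketch: run jobs batch by batch in dependency order.

    `dependencies` maps each job name to the set of jobs it depends on.
    `run_job` (assumed) starts a job, waits for it, returns True on success.
    Returns the executed batches; raises RuntimeError if any job fails.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError for cyclic dependencies
    batches: List[List[str]] = []
    while sorter.is_active():
        # Every job whose dependencies have all completed is ready to start.
        batch = sorted(sorter.get_ready())
        batches.append(batch)
        # A real runner would start the batch concurrently and then wait;
        # here each job in the batch is run sequentially for simplicity.
        failed = [job for job in batch if not run_job(job)]
        if failed:
            # Stop the whole pipeline; later batches are never started.
            raise RuntimeError(f"meta job stopped, failed jobs: {failed}")
        sorter.done(*batch)
    return batches


deps = {
    "ingest": set(),
    "clean": {"ingest"},
    "enrich": {"ingest"},
    "report": {"clean", "enrich"},
}
print(run_meta_job(deps, lambda job: True))
# [['ingest'], ['clean', 'enrich'], ['report']]
```

Waiting for a whole batch before starting the next one mirrors the description; a finer-grained scheduler could instead call `done()` per job and start each dependent as soon as its own dependencies finish.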

#1243

Testing Done: functional tests

Signed-off-by: Antoni Ivanov <[email protected]>

@antoniivanov force-pushed the person/aivanov/vdk-meta-jobs branch 4 times, most recently from e7211e4 to 8084fe6 on October 24, 2022 09:41
@antoniivanov force-pushed the person/aivanov/vdk-meta-jobs branch from 23de03d to 9851cb7 on November 1, 2022 21:10
@antoniivanov force-pushed the person/aivanov/vdk-meta-jobs branch 2 times, most recently from ae9a862 to 99cbae4 on November 2, 2022 10:18
This is a pre-Alpha implementation of Meta Jobs functionality.

One crucial feature which has been missing from VDK is the ability to
express workload (data job) interdependencies, meaning that a certain
Data Job will only trigger upon the successful completion (or other
criteria) of one or more other Data Jobs.

An integration with Apache Airflow was developed to address this.
However, it comes with downsides: teams would have to either manage
their own Airflow instance or pay for an externally managed service.

The aim of this submission is to build a plugin for Versatile Data Kit
which extends its API with an additional feature that allows users to
trigger so-called Meta Jobs - jobs that trigger one or more other jobs,
wait for their completion, and then trigger another set of jobs until
either the entire job pipeline has succeeded, or at least one of the
jobs has failed.

#1243

Signed-off-by: Antoni Ivanov <[email protected]>
The status calls were effectively made in a busy loop while jobs were
running, which would hammer the database. Not a good idea :)

Signed-off-by: Antoni Ivanov <[email protected]>
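The commit above replaces tight-loop status polling with waits between status calls. A minimal sketch of that idea follows; the `wait_for_job` function, the `get_status` callable, and the status names `succeeded`/`failed` are hypothetical, and the exponential backoff is one illustrative choice rather than necessarily what the plugin does. `sleep` and `clock` are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable

# Hypothetical terminal status names; real job status values may differ.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    initial_interval: float = 5.0,
    max_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a job's status until it reaches a terminal state.

    Sleeps between status calls instead of querying in a tight loop, and
    doubles the interval (capped at max_interval) so long-running jobs are
    polled less often. Raises TimeoutError rather than sleeping past the
    deadline.
    """
    deadline = clock() + timeout
    interval = initial_interval
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Usage with a fake status source; sleeps are recorded instead of performed.
statuses = iter(["submitted", "running", "running", "succeeded"])
sleeps = []
print(wait_for_job(lambda: next(statuses), sleep=sleeps.append))  # succeeded
print(sleeps)  # [5.0, 10.0, 20.0]
```

Each status call now costs at most one query per interval, so the load on the database no longer depends on how fast the polling loop can spin.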
@antoniivanov force-pushed the person/aivanov/vdk-meta-jobs branch from 99cbae4 to 05dee0a on November 3, 2022 07:23
@antoniivanov enabled auto-merge (squash) on November 3, 2022 10:20
@antoniivanov merged commit 1ccde40 into main on Nov 3, 2022
@antoniivanov deleted the person/aivanov/vdk-meta-jobs branch on November 3, 2022 10:24