diff --git a/specs/vep-1739-multiple-python-versions/README.md b/specs/vep-1739-multiple-python-versions/README.md new file mode 100644 index 0000000000..bba207f4e1 --- /dev/null +++ b/specs/vep-1739-multiple-python-versions/README.md @@ -0,0 +1,185 @@ + +# VEP-1739: Multiple Python Versions + +* **Author(s):** Andon Andonov (andonova@vmware.com) +* **Status:** draft + + + +- [Summary](#summary) +- [Glossary](#glossary) +- [Motivation](#motivation) +- [Requirements and goals](#requirements-and-goals) +- [High-level design](#high-level-design) +- [API Design](#api-design) +- [Detailed design](#detailed-design) +- [Implementation stories](#implementation-stories) +- [Alternatives](#alternatives) + +## Summary + +--- +Currently, when a data job is deployed, the Control Service uses vdk and data job base images set once and applied to +all job deployments. If a data engineer decides they want to use a different python version for their job, they need to +ask their infrastructure administrator, or whoever is responsible for the Control Service deployment, to change the +configuration of the service and re-deploy it to allow for data jobs with a different python version to be deployed. +This, however, would break existing deployed data jobs, as the moment they are re-deployed, the new python version would +be applied with unforeseeable consequences. + +We want to allow users to deploy data jobs with different python versions without needing to re-deploy the Control +Service. To do this, we will extend the Control Service logic and API to support multiple python versions for data job +deployments. 
+ +## Glossary + +--- +* VDK: https://github.com/vmware/versatile-data-kit/wiki/dictionary#vdk +* Control Service: https://github.com/vmware/versatile-data-kit/wiki/dictionary#control-service +* Data Job: https://github.com/vmware/versatile-data-kit/wiki/dictionary#data-job +* Data Job Deployment: https://github.com/vmware/versatile-data-kit/wiki/dictionary#data-job-deployment +* Kubernetes: https://kubernetes.io/ + +## Motivation + +--- +As mentioned in the [Summary](#summary) section above, the vdk and data job base images are set per Control Service +deployment. This is not an issue in general, as it is assumed that the Versatile Data Kit administrators, responsible +for the Control Service deployment, have taken into account the data engineers' tech stack. + +There are, however, situations when this might not be the case. For example, if the administrators of a Versatile Data +Kit deployment decide to keep an older python version (say 3.8) for all data job deployments, but a data engineer working +on a special use case needs to use a dependency that does not support anything below python 3.10, they would not be able +to deploy a data job, because the job would be built with python 3.8. To accommodate the special job, the administrators +would need to re-configure and re-deploy the Control Service. Although this may not be a big issue (not taking into +account the hassle of redeploying the whole Control Service for just one special job), it would break all jobs whose +dependencies rely on python versions older than 3.10, as once set to 3.10, the Control Service will deploy all data jobs +with it. + +In such cases, there are two main approaches that could be taken: +1) Look for a different package -- this works in most cases, as there are often multiple packages that solve the same +problems and are built for different python versions. 
However, depending on how specialized the problem at hand is, there +might be no alternatives to a package, or it might be necessary to use an older version of the package which could +expose the data job to vulnerabilities patched in the newer package releases. +2) Deploy a separate Control Service instance -- with this solution, a new instance of the Control Service would need to +be deployed and configured to use the newer python version. In addition, the vdk SDK would also need to be reconfigured to +point to the new Control Service instance, which may cause more confusion among other engineers who may not be aware that +there are multiple Control Services and SDKs with different configurations. This may be acceptable if the specialized +data job has high priority, but having a completely separate Control Service instance for a single job deployment is +unreasonable. + +To avoid situations where old or unsafe dependencies are used, or to avoid the necessity to deploy separate Control +Service deployments, changes will be made to the Control Service API and deployment logic to allow for different python +versions to be used per data job deployment. Additionally, minor changes will be made to the vdk-control-cli plugin to +facilitate the selection of python version at job deployment. + +## Requirements and goals + +--- +### Goals +* **Change API to accommodate passing the python version to be used in data job deployments** + * A data engineer developing a data job wants to use a specific python version for their job deployment. They need to be + able to specify what python version they want to use in the config.ini file of the job, or as part of the body of the + job deployment request in case they use the Control Service API directly, and not through the vdk SDK. 
+* **Introduce mechanism to configure what python versions are supported by the Control Service** + * An administrator needs to be able to configure what python versions are supported by a Control Service deployment, + and what vdk and job base images correspond to a certain python version. +* **Save python version configuration in the Control Service's database.** + * The python version configuration related to a specific data job deployment needs to be stored in the database + alongside the rest of the data job's deployment configuration. +* **Add python version used for a job's deployment to the job's cronjob spec.** + * The python version used for a data job's deployment needs to be added as an annotation to the job's cronjob spec. +* **Update vdk-control-cli plugin to allow it to read the python version from config.ini** + * When a data engineer creates a data job and updates the job's config.ini file to set a specific python version to + be used when the job is deployed, the python version needs to be read from the config file and passed to the Control + Service. +### Non-Goals +* **Extensive error handling.** + * Some basic error handling will be added to avoid common issues with mismatched python versions, etc. However, at + this stage it is not possible to foresee all corner cases that may arise, so extensive error handling would not be added + as part of this initiative. +* **Python version validation at SDK level.** + * As there will be python version validation at the Control Service level, such validation will not be added at the + vdk SDK level. + +## High-level design + +--- +![high_level_design.png](diagrams/high_level_design.png) + +The proposed design will introduce changes to the Control Service API and deployment logic, as well as to the database configuration and vdk SDK. Additionally, it will allow data engineers to specify what python version their job needs to be deployed with as part of the job's config.ini file. 
+ +Once set, the python version will be passed from the config.ini to the Control Service through the vdk SDK, or it could be passed directly as part of the deployment request body in case the Control Service API is called directly. If no python version is passed to the Control Service, a predefined default version will be used. + +## API design + + + + +## Detailed design + + + +## Implementation stories + + +## Alternatives + diff --git a/specs/vep-1739-multiple-python-versions/diagrams/high_level_design.png b/specs/vep-1739-multiple-python-versions/diagrams/high_level_design.png new file mode 100644 index 0000000000..189ce67f03 Binary files /dev/null and b/specs/vep-1739-multiple-python-versions/diagrams/high_level_design.png differ diff --git a/specs/vep-1739-multiple-python-versions/diagrams/high_level_design_plantUML_source.txt b/specs/vep-1739-multiple-python-versions/diagrams/high_level_design_plantUML_source.txt new file mode 100644 index 0000000000..fe258e46b9 --- /dev/null +++ b/specs/vep-1739-multiple-python-versions/diagrams/high_level_design_plantUML_source.txt @@ -0,0 +1,27 @@ +@startuml +!include +!include +!include +!include + +caption Figure 1: High-level Design + +User(engineer, "Data\nEngineer", " ") +ElasticContainerRegistry(ecr, "Image\nRegistry", " ") + + +rectangle "K8s Cluster" { + component "Control Service" as cs + rectangle " Data Jobs\nBuilders/Deployments\n Namespace" as djn + cs - djn +} + +rectangle "<$file>\nData Job" as data_job + + +engineer -- data_job : Set python_version in config.ini +data_job -- cs : Deploy data job + +cs -- ecr : Pull data job base and \nvdk images based on \nprovided python_version + +@enduml diff --git a/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence.png b/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence.png new file mode 100644 index 0000000000..ed15c86724 Binary files /dev/null and b/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence.png differ 
diff --git a/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence_plantUML_source.txt b/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence_plantUML_source.txt new file mode 100644 index 0000000000..472eb1cb2a --- /dev/null +++ b/specs/vep-1739-multiple-python-versions/diagrams/high_level_sequence_plantUML_source.txt @@ -0,0 +1,18 @@ +@startuml +actor DataEngineer as engineer +participant ControlService as cs +participant K8sNamespace_DataJobBuilders as namespace +database Database as db +database Registry as ecr + +engineer -> cs : vdk create +cs -> db : Register new data job +cs -> engineer : Download sample data job +engineer -> engineer : Set python_version in the data job's config.ini +engineer -> cs : vdk deploy +cs -> cs : Read Deployment data +cs -> db : Update data job configuration data +cs -> ecr : Pull data job base image +cs -> ecr : Pull vdk image +cs -> namespace : Start data job builder +@enduml
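The flow described in the spec (read `python_version` from config.ini, fall back to a predefined default, and pick the matching vdk and data job base images) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the Control Service implementation: the `[job]` section name, the image names, the `SUPPORTED_PYTHON_VERSIONS` mapping, `DEFAULT_PYTHON_VERSION`, and both helper functions are hypothetical.

```python
import configparser
from typing import Dict, Optional

# Hypothetical administrator configuration: each supported python version
# maps to the data job base image and vdk image to use. The registry and
# image names are made up for illustration.
SUPPORTED_PYTHON_VERSIONS: Dict[str, Dict[str, str]] = {
    "3.8": {
        "base_image": "registry.example.com/data-job-base:python-3.8",
        "vdk_image": "registry.example.com/vdk:python-3.8",
    },
    "3.10": {
        "base_image": "registry.example.com/data-job-base:python-3.10",
        "vdk_image": "registry.example.com/vdk:python-3.10",
    },
}

# Hypothetical predefined default, used when the job sets no version.
DEFAULT_PYTHON_VERSION = "3.8"


def read_python_version(config_text: str) -> Optional[str]:
    """Return python_version from config.ini content, or None if unset.

    Assumes the option lives in a [job] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("job", "python_version", fallback=None)


def resolve_images(python_version: Optional[str]) -> Dict[str, str]:
    """Pick the images for a deployment, applying the default if needed."""
    version = python_version or DEFAULT_PYTHON_VERSION
    if version not in SUPPORTED_PYTHON_VERSIONS:
        # Basic validation at the service level, as the spec proposes.
        supported = ", ".join(sorted(SUPPORTED_PYTHON_VERSIONS))
        raise ValueError(
            f"Unsupported python version {version!r}; supported: {supported}"
        )
    return SUPPORTED_PYTHON_VERSIONS[version]
```

With this sketch, a job whose config.ini contains `python_version = 3.10` would resolve to the 3.10 images, a job with no such option would get the default, and an unsupported value would be rejected before any builder is started.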