
Commit

chaim fixes 1
dimberman committed Feb 8, 2018
1 parent 25bff2e commit 07a025e
Showing 2 changed files with 49 additions and 78 deletions.
75 changes: 29 additions & 46 deletions .ipynb_checkpoints/Airflow Kubernetes Operator-checkpoint.ipynb
@@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Kubernetes \"Whatever-your-heart-desires\" Operator"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -12,18 +19,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Today, we are excited to announce a Kubernetes Operator to increase airflow's viability as a job orchestration engine using the power of the Kubernetes cloud deployment framework. \n",
"Today, we are excited to announce a Kubernetes Operator to increase Apache Airflow's viability as a job orchestration engine using the power of the Kubernetes cloud deployment framework. \n",
"\n",
"Since its inception, airflow's power as a job orchestrator has been its flexibility. Airflow offers a wide range of native operators (for services such as spark, hbase, etc.) while also offering easy extensibility through its plugin framework. However, one limitation of the Apache Airflow project is that...\n",
"Since its inception, Airflow's power as a job orchestrator has been its flexibility. It offers a wide range of native operators (for services such as Spark, HBase, etc.), while also offering easy extensibility through its plugin framework. However, one limitation of the project is that...\n",
"\n",
"Over the next few months, we will be offering a series of kubernetes-based offerings that will vastly expand airflows' native capabilities. With the addition of a Kubernetes Operator, users will be able to launch arbitrary docker containers with customizable resources, secrets, and..."
"Over the next few months, we will be offering a series of Kubernetes-based offerings that will vastly expand Airflow's native capabilities. With the addition of a Kubernetes Operator, users will be able to launch arbitrary Docker containers with customizable resources, secrets, and..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is kubernetes?"
"## What is Kubernetes?"
]
},
{
@@ -38,50 +45,46 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Kubernetes Operator: The \"Whatever-your-heart-desires\" Operator\n",
"# The Kubernetes Operator\n",
"\n",
"As DevOps pioneers, we are always looking for ways to make our deployments and ETL pipelines simpler to manage. Any opportunity to reduce the number of moving parts in our codebase will always lead to future opportunities to break in the future. The following are a list of benefits the Kubernetes Operator in reducing the Airflow Engineer's footprint\n",
"\n",
"* **Increased flexibility for deployments** Airflow's plugin API has always offered a significant boon to engineers wishing to test new functionalities within their DAGS, however it has always had the downside that to create a new operator, one must develop an entirely new plugin. Now any task that can be run within a docker container is accessible through the same same operator with no extra airflow code to maintain.\n",
"* **Flexibility of configurations and dependencies** For operators that are run within static airflow workers, dependency management can become quite difficult. If I want to run one task that requires scipy, and another that requires numpy, I have to either maintain both dependencies within my airflow worker, or somehow configure\n",
"* **Usage of kubernetes secrets for added security** Handling sensitive data is a core responsibility of any devops engineer. At every opportunity, we want to minimize any API keys, database passwords, or ... to a strict need-to-know basis. With the kubernetes operator, we can use the kubernetes Vault technology to store all sensitive data. This means that the airflow workers will never have access to this information, and can simply request that pods be built with only the secrets they need"
"As DevOps pioneers, Airflow is always looking for ways to make deployments and ETL pipelines simpler to manage. Any opportunity to reduce the number of moving parts in our codebase will always lead to future opportunities to break in the future. The following is a list of benefits the Kubernetes Operator has in reducing the Airflow Engineer's footprint\n",
"* **Increased flexibility for deployments:** Airflow's plugin API has always offered a significant boon to engineers wishing to test new functionalities within their DAGS. On the downside, whenever a developer wanted to create a new operator, they had to develop an entirely new plugin. Now, any task that can be run within a Docker container is accessible through the exact same operator, with no extra Airflow code to maintain.\n",
"* **Flexibility of configurations and dependencies:** For operators that are run within static Airflow workers, dependency management can become quite difficult. If I want to run one task that requires [SciPy[(https://www.scipy.org) and another that requires [NumPy](http://www.numpy.org), the developer would have to either maintain both dependencies within an Airflow worker or somehow configure ??\n",
"* **Usage of kubernetes secrets for added security:** Handling sensitive data is a core responsibility of any devops engineer. At every opportunity, airflow users want to minimize any API keys, database passwords, and login credentials to a strict need-to-know basis. With the kubernetes operator, users can utilize the kubernetes Vault technology to store all sensitive data. This means that the airflow workers will never have access to this information, and can simply request that pods be built with only the secrets they need"
]
},
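To make the secrets point concrete, here is a minimal sketch of how a task could have a Kubernetes Secret injected into its pod. It assumes the operator is exposed as `KubernetesPodOperator` with a `Secret` helper under `airflow.contrib`; the module paths, image name, and script name are illustrative assumptions, not code from this commit.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.kubernetes.secret import Secret
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG('kubernetes_secret_example',
          start_date=datetime(2018, 2, 1),
          schedule_interval=None)

# Expose the key `sql_alchemy_conn` of the Kubernetes Secret `airflow-secrets`
# as the environment variable SQL_CONN inside the launched pod only.
sql_conn_secret = Secret(deploy_type='env',
                         deploy_target='SQL_CONN',
                         secret='airflow-secrets',
                         key='sql_alchemy_conn')

run_query = KubernetesPodOperator(
    namespace='default',
    image='my-etl-image:latest',      # hypothetical image containing run_query.py
    cmds=['python', 'run_query.py'],
    name='run-query',
    task_id='run_query',
    secrets=[sql_conn_secret],        # the Airflow worker never sees the secret value
    get_logs=True,
    dag=dag)
```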
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"collapsed": true
},
"source": [
"## Example 1: Running a basic container"
"# Architecture\n",
"\n",
"<img src=\"architecture.png\">\n",
"\n",
"The Kubernetes Operator uses the [Kubernetes Python Client](https://github.com/kubernetes-client/python) to generate a request that is processed by the APIServer (1). Kubernetes will then launch your pod with whatever specs you've defined (2). Images will be loaded with all the necessary environment variables, secrets and dependencies, enacting a single command. Once the job is launched, the operator only needs to monitor the health of track logs (3). Users will have the choice of gathering logs locally to the scheduler or to any distributed logging service currently in their Kubernetes cluster"
]
},
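As a rough illustration of that flow, the sketch below shows what a single-task DAG might look like. The `KubernetesPodOperator` name, module path, and image are assumptions made for illustration rather than code from this commit.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG('kubernetes_hello_world',
          start_date=datetime(2018, 2, 1),
          schedule_interval=None)

# (1) The operator builds a pod spec with the Kubernetes Python Client and
#     submits it to the APIServer.
# (2) Kubernetes launches a pod from the requested image and runs the command.
# (3) The operator then only polls pod status and, with get_logs=True, streams
#     the pod's logs back to the scheduler.
hello = KubernetesPodOperator(
    namespace='default',
    image='python:3.6',
    cmds=['python', '-c'],
    arguments=['print("hello from inside a Kubernetes pod")'],
    name='hello-world-pod',
    task_id='hello_world',
    in_cluster=False,   # set to True when the scheduler itself runs in the cluster
    get_logs=True,
    dag=dag)
```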
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this first example, let's create a basic docker image that runs simple python commands. This example will only have two end-results: Succeed or fail."
"# Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run this code, we'll create a DockerFile and an `entrypoint.sh` file that will run our basic python script (while this entrypoint script is pretty simple at the moment, we will later see how it can expand to more complex use-cases). With the following [Dockerfile](https://github.com/dimberman/airflow-kubernetes-operator-example/blob/master/Dockerfile) and [entrypoint](https://github.com/dimberman/airflow-kubernetes-operator-example/blob/master/entrypoint.sh) We have a working image to test!"
"## Example 1: Running a basic container"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integrating into Airflow\n",
"\n",
"To test our new image, we create a simple airflow DAG that will run two steps: passing and failing"
"In this first example, let's create a basic Docker image that runs simple Python commands. This example will only have two end-results: succeed or fail"
]
},
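A minimal sketch of the pass/fail DAG described here might look like the following; the operator name and images are assumptions, since the actual DAG cell is collapsed in this diff.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG('kubernetes_pass_fail_example',
          start_date=datetime(2018, 2, 1),
          schedule_interval=None)

# Runs in an image that ships with Python, so the command succeeds.
passing = KubernetesPodOperator(
    namespace='default',
    image='python:3.6',
    cmds=['python', '-c'],
    arguments=['print("this task passes")'],
    name='passing-test',
    task_id='passing-task',
    get_logs=True,
    dag=dag)

# Runs in an image without Python, so the same command fails and the task is
# reported as failed to the user.
failing = KubernetesPodOperator(
    namespace='default',
    image='ubuntu:16.04',
    cmds=['python', '-c'],
    arguments=['print("this task fails")'],
    name='failing-test',
    task_id='failing-task',
    get_logs=True,
    dag=dag)
```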
{
@@ -145,7 +148,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This will create two pods on kubernetes: one that has python and one that doesn't. The python pod will run the python request correctly, while the one without python will report a failure to the user.\n",
"This will create two pods on Kubernetes: one that has Python and one that doesn't. The Python pod will run the Python request correctly, while the one without Python will report a failure to the user.\n",
"\n",
"<img src=\"files/image.png\">"
]
@@ -161,27 +164,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2: Running a model using scipy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Architecture"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<img src=\"architecture.png\">\n",
"\n",
"1. The kubernetes operator uses the [Kubernetes Python Client](https://github.com/kubernetes-client/python) to generate a request that is processed by the APIServer\n",
"2. Kubernetes will then launch your pod with whatever specs you wish. Images will be loaded with all necessary environment varialbes, secrets, dependencies, and enact a single command.\n",
"3. Once the job is launched, the only things that the operator needs to do are monitor the health of track logs. Users will have the choice of gathering logs locally to the scheduler or any distributed logging service currently in their kubernetes cluster"
"## Example 2: Running a model using SciPy"
]
},
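The content of this second example is collapsed in the diff; purely as a hypothetical sketch, a SciPy-based model run could be delegated to a pod whose image bundles the scientific dependencies, so the Airflow worker never needs SciPy installed. The image and script names below are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG('kubernetes_scipy_example',
          start_date=datetime(2018, 2, 1),
          schedule_interval=None)

# The model's dependencies live in the image, not on the Airflow worker.
fit_model = KubernetesPodOperator(
    namespace='default',
    image='my-scipy-model:latest',     # hypothetical image with SciPy pre-installed
    cmds=['python', 'fit_model.py'],   # hypothetical training script baked into the image
    name='fit-scipy-model',
    task_id='fit_scipy_model',
    get_logs=True,
    dag=dag)
```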
{
52 changes: 20 additions & 32 deletions Airflow Kubernetes Operator.ipynb
@@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Kubernetes \"Whatever-your-heart-desires\" Operator"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -12,18 +19,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Today, we are excited to announce a Kubernetes Operator to increase airflow's viability as a job orchestration engine using the power of the Kubernetes cloud deployment framework. \n",
"Today, we are excited to announce a Kubernetes Operator to increase Apache Airflow's viability as a job orchestration engine using the power of the Kubernetes cloud deployment framework. \n",
"\n",
"Since its inception, airflow's power as a job orchestrator has been its flexibility. Airflow offers a wide range of native operators (for services such as spark, hbase, etc.) while also offering easy extensibility through its plugin framework. However, one limitation of the Apache Airflow project is that...\n",
"Since its inception, Airflow's power as a job orchestrator has been its flexibility. It offers a wide range of native operators (for services such as Spark, HBase, etc.), while also offering easy extensibility through its plugin framework. However, one limitation of the project is that...\n",
"\n",
"Over the next few months, we will be offering a series of kubernetes-based offerings that will vastly expand airflows' native capabilities. With the addition of a Kubernetes Operator, users will be able to launch arbitrary docker containers with customizable resources, secrets, and..."
"Over the next few months, we will be offering a series of Kubernetes-based offerings that will vastly expand Airflow's native capabilities. With the addition of a Kubernetes Operator, users will be able to launch arbitrary Docker containers with customizable resources, secrets, and..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is kubernetes?"
"## What is Kubernetes?"
]
},
{
@@ -38,13 +45,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Kubernetes Operator: The \"Whatever-your-heart-desires\" Operator\n",
"\n",
"As DevOps pioneers, we are always looking for ways to make our deployments and ETL pipelines simpler to manage. Any opportunity to reduce the number of moving parts in our codebase will always lead to future opportunities to break in the future. The following are a list of benefits the Kubernetes Operator in reducing the Airflow Engineer's footprint\n",
"# The Kubernetes Operator\n",
"\n",
"* **Increased flexibility for deployments** Airflow's plugin API has always offered a significant boon to engineers wishing to test new functionalities within their DAGS, however it has always had the downside that to create a new operator, one must develop an entirely new plugin. Now any task that can be run within a docker container is accessible through the same same operator with no extra airflow code to maintain.\n",
"* **Flexibility of configurations and dependencies** For operators that are run within static airflow workers, dependency management can become quite difficult. If I want to run one task that requires scipy, and another that requires numpy, I have to either maintain both dependencies within my airflow worker, or somehow configure\n",
"* **Usage of kubernetes secrets for added security** Handling sensitive data is a core responsibility of any devops engineer. At every opportunity, airflow users want to minimize any API keys, database passwords, and login credentials to a strict need-to-know basis. With the kubernetes operator, users can utilize the kubernetes Vault technology to store all sensitive data. This means that the airflow workers will never have access to this information, and can simply request that pods be built with only the secrets they need"
"As DevOps pioneers, Airflow is always looking for ways to make deployments and ETL pipelines simpler to manage. Any opportunity to reduce the number of moving parts in our codebase will always lead to future opportunities to break in the future. The following is a list of benefits the Kubernetes Operator has in reducing the Airflow Engineer's footprint\n",
"* **Increased flexibility for deployments:** Airflow's plugin API has always offered a significant boon to engineers wishing to test new functionalities within their DAGS. On the downside, whenever a developer wanted to create a new operator, they had to develop an entirely new plugin. Now, any task that can be run within a Docker container is accessible through the exact same operator, with no extra Airflow code to maintain.\n",
"* **Flexibility of configurations and dependencies:** For operators that are run within static Airflow workers, dependency management can become quite difficult. If I want to run one task that requires [SciPy[(https://www.scipy.org) and another that requires [NumPy](http://www.numpy.org), the developer would have to either maintain both dependencies within an Airflow worker or somehow configure ??\n",
"* **Usage of kubernetes secrets for added security:** Handling sensitive data is a core responsibility of any devops engineer. At every opportunity, airflow users want to minimize any API keys, database passwords, and login credentials to a strict need-to-know basis. With the kubernetes operator, users can utilize the kubernetes Vault technology to store all sensitive data. This means that the airflow workers will never have access to this information, and can simply request that pods be built with only the secrets they need"
]
},
{
@@ -57,9 +63,7 @@
"\n",
"<img src=\"architecture.png\">\n",
"\n",
"1. The kubernetes operator uses the [Kubernetes Python Client](https://github.com/kubernetes-client/python) to generate a request that is processed by the APIServer\n",
"2. Kubernetes will then launch your pod with whatever specs you wish. Images will be loaded with all necessary environment varialbes, secrets, dependencies, and enact a single command.\n",
"3. Once the job is launched, the only things that the operator needs to do are monitor the health of track logs. Users will have the choice of gathering logs locally to the scheduler or any distributed logging service currently in their kubernetes cluster"
"The Kubernetes Operator uses the [Kubernetes Python Client](https://github.com/kubernetes-client/python) to generate a request that is processed by the APIServer (1). Kubernetes will then launch your pod with whatever specs you've defined (2). Images will be loaded with all the necessary environment variables, secrets and dependencies, enacting a single command. Once the job is launched, the operator only needs to monitor the health of track logs (3). Users will have the choice of gathering logs locally to the scheduler or to any distributed logging service currently in their Kubernetes cluster"
]
},
{
@@ -80,23 +84,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this first example, let's create a basic docker image that runs simple python commands. This example will only have two end-results: Succeed or fail."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run this code, we'll create a DockerFile and an `entrypoint.sh` file that will run our basic python script (while this entrypoint script is pretty simple at the moment, we will later see how it can expand to more complex use-cases). With the following [Dockerfile](https://github.com/dimberman/airflow-kubernetes-operator-example/blob/master/Dockerfile) and [entrypoint](https://github.com/dimberman/airflow-kubernetes-operator-example/blob/master/entrypoint.sh) We have a working image to test!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integrating into Airflow\n",
"\n",
"To test our new image, we create a simple airflow DAG that will run two steps: passing and failing"
"In this first example, let's create a basic Docker image that runs simple Python commands. This example will only have two end-results: succeed or fail"
]
},
{
@@ -160,7 +148,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This will create two pods on kubernetes: one that has python and one that doesn't. The python pod will run the python request correctly, while the one without python will report a failure to the user.\n",
"This will create two pods on Kubernetes: one that has Python and one that doesn't. The Python pod will run the Python request correctly, while the one without Python will report a failure to the user.\n",
"\n",
"<img src=\"files/image.png\">"
]
@@ -176,7 +164,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2: Running a model using scipy"
"## Example 2: Running a model using SciPy"
]
},
{
