Commit 86881c1: added more
Daniel Imberman authored and GitHub Enterprise committed Mar 16, 2018
1 parent 951d500
Showing 1 changed file (airflow-kube-operator.md) with 4 additions and 5 deletions.
@@ -4,18 +4,17 @@

Today, we are excited to announce a Kubernetes Operator to increase Apache Airflow's viability as a job orchestration engine using the power of the Kubernetes cloud deployment framework.

- Since its inception, Airflow's power as a job orchestrator has been its flexibility. It offers a wide range of native operators (for services such as Spark, HBase, etc.), while also offering easy extensibility through its plugin framework. However, one limitation of the project is that...
+ Since its inception, Airflow's greatest strength has been its flexibility. Airflow offers a wide range of native operators for services ranging from Spark and HBase to GCP and S3. Airflow also offers easy extensibility through its plugin framework. However, one limitation of the project is that Airflow users are confined to the frameworks and clients that exist on the Airflow worker at the moment of execution. If a user wishes to use a different version of SciPy or test a new deep learning framework, they would need to either launch a new Airflow cluster or risk conflicting with the dependencies of other users' workflows.

- Over the next few months, we will be offering a series of Kubernetes-based offerings that will vastly expand Airflow's native capabilities. With the addition of a Kubernetes Operator, users will be able to launch arbitrary Docker containers with customizable resources, secrets, and...
+ To address this issue, we've used Kubernetes to allow users to launch arbitrary Docker containers and configurations. Airflow users can now have full power over their run-time environments, resources, and secrets, essentially turning Airflow into an "any job you want" scheduler.
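To make the idea of per-task container environments concrete, here is a minimal plain-Python sketch (not Airflow's actual API; the helper name, image names, and script names are hypothetical) of the kind of Kubernetes pod manifest such an operator builds for each task:

```python
import json


def make_pod_manifest(name, image, command):
    """Build a minimal Kubernetes pod manifest for one task.

    Illustrative helper only: the Kubernetes Operator assembles an
    equivalent spec on the user's behalf.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "command": command}
            ],
            # Batch semantics: run once, do not restart on completion.
            "restartPolicy": "Never",
        },
    }


# Two tasks in the same workflow can pin entirely different environments,
# so a pinned-SciPy job and an experimental deep-learning job never
# share (or conflict over) a worker's installed dependencies.
stable = make_pod_manifest(
    "etl-stable", "my-repo/scipy-job:0.19", ["python", "run_etl.py"]
)
experiment = make_pod_manifest(
    "dl-experiment", "my-repo/tf-job:latest", ["python", "train.py"]
)

print(json.dumps(stable, indent=2))
```

Because each task names its own image, upgrading one workflow's dependencies no longer risks breaking another's.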

## What is Kubernetes?

- [Kubernetes](https://kubernetes.io/) is an open-source container deployment engine released by Google. Based on google's own [Borg](http://blog.kubernetes.io/2015/04/borg-predecessor-to-kubernetes.html), kubernetes allows for easy deployment of images using a highly flexible API. Using kubernetes you can [deploy spark jobs](https://github.com/apache-spark-on-k8s/spark), launch end-to-end applications, or ... using yaml files, python, golang, or java bindings. The kubernetes API's programatic launching of containers seemed a perfect marriage with Airflow's "code as configuration" philosophy.
+ Before we go any further, let's take a moment for a quick overview of Kubernetes. [Kubernetes](https://kubernetes.io/) is an open-source container deployment engine released by Google. Based on Google's [Borg](http://blog.kubernetes.io/2015/04/borg-predecessor-to-kubernetes.html), Kubernetes allows for easy deployment of images using a highly flexible API. Using Kubernetes you can [deploy Spark jobs](https://github.com/apache-spark-on-k8s/spark), launch end-to-end applications, or create multi-framework ETL pipelines using YAML, JSON, Python, Go, or Java bindings. The Kubernetes API's programmatic launching of containers seemed a perfect marriage with Airflow's "code as configuration" philosophy.
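The "code as configuration" point can be sketched in a few lines: a pod is just structured data, so it can be generated programmatically and handed to `kubectl`, which accepts JSON as well as YAML. The pod name and image below are hypothetical examples:

```python
import json
import os
import tempfile

# A pod expressed as plain Python data: "code as configuration".
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hello-web", "labels": {"app": "hello"}},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.13",
                "ports": [{"containerPort": 80}],
            }
        ]
    },
}

# Write the manifest to disk; it could then be launched with:
#   kubectl apply -f hello-pod.json
path = os.path.join(tempfile.gettempdir(), "hello-pod.json")
with open(path, "w") as f:
    json.dump(pod, f, indent=2)
```

Anything that can template or compute this dictionary (a Python script, a CI job, an Airflow DAG) can drive deployments the same way.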

## The Kubernetes Operator

- As DevOps pioneers, Airflow is always looking for ways to make deployments and ETL pipelines simpler to manage. Any opportunity to reduce the number of moving parts in our codebase will always lead to future opportunities to break in the future. The following is a list of benefits the Kubernetes Operator has in reducing the Airflow Engineer's footprint
+ As DevOps pioneers, Airflow users are always looking for ways to make deployments and ETL pipelines simpler to manage. Any opportunity to decouple our pipeline steps while increasing monitoring can reduce future outages and fire-fights. The following is a list of benefits the Kubernetes Operator offers in reducing the Airflow engineer's footprint:
* **Increased flexibility for deployments:**
Airflow's plugin API has always offered a significant boon to engineers wishing to test new functionality within their DAGs. On the downside, whenever a developer wanted to create a new operator, they had to develop an entirely new plugin. Now, any task that can be run within a Docker container is accessible through the exact same operator, with no extra Airflow code to maintain.
* **Flexibility of configurations and dependencies:**
