# docker-spark

This is my Docker setup for deploying Spark (tailored toward PySpark) on Kubernetes, based on others' existing work.

Note that details such as port mappings are delegated to the orchestration tool. As a result, the template for the master pod should look something like:

```json
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "{{USER_NAME}}-spark",
    "labels": {
      "name": "{{USER_NAME}}-spark-master",
      "owner": "{{USER_NAME}}"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "{{USER_NAME}}-spark",
        "image": "guangyang/docker-spark:latest",
        "command": ["/bin/bash", "-c", "/start-master.sh {{USER_NAME}}-spark"],
        "env": [
          {
            "name": "SPARK_MASTER_PORT",
            "value": "7077"
          },
          {
            "name": "SPARK_MASTER_WEBUI_PORT",
            "value": "8080"
          }
        ],
        "ports": [
          {
            "containerPort": 7077,
            "protocol": "TCP"
          },
          {
            "containerPort": 8080,
            "protocol": "TCP"
          }
        ]
      }
    ]
  }
}
```
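Since the master's ports are only declared on its container, other pods typically reach it through a Kubernetes Service that selects the master pod's `name` label. A minimal sketch (this manifest is an assumption, not part of this repo; it reuses the label scheme from the pod template above):

```json
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "{{USER_NAME}}-spark",
    "labels": {
      "name": "{{USER_NAME}}-spark-master",
      "owner": "{{USER_NAME}}"
    }
  },
  "spec": {
    "selector": {
      "name": "{{USER_NAME}}-spark-master"
    },
    "ports": [
      { "name": "spark", "port": 7077, "targetPort": 7077 },
      { "name": "webui", "port": 8080, "targetPort": 8080 }
    ]
  }
}
```

With a Service like this in place, workers can address the master as `spark://{{USER_NAME}}-spark:7077` via cluster DNS.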

while the worker replication controller should look something like:

```json
{
  "kind": "ReplicationController",
  "apiVersion": "v1",
  "metadata": {
    "name": "{{USER_NAME}}-spark-worker-controller",
    "labels": {
      "name": "{{USER_NAME}}-spark-worker",
      "owner": "{{USER_NAME}}"
    }
  },
  "spec": {
    "replicas": 2,
    "selector": {
      "name": "{{USER_NAME}}-spark-worker"
    },
    "template": {
      "metadata": {
        "labels": {
          "name": "{{USER_NAME}}-spark-worker",
          "uses": "{{USER_NAME}}-spark",
          "owner": "{{USER_NAME}}"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "{{USER_NAME}}-spark-worker",
            "image": "guangyang/docker-spark:latest",
            "command": ["/bin/bash", "-c", "/start-worker.sh spark://{{USER_NAME}}-spark:7077"],
            "ports": [
              {
                "hostPort": 8888,
                "containerPort": 8888
              }
            ]
          }
        ]
      }
    }
  }
}
```

where `{{USER_NAME}}` should be replaced with a DNS-friendly name of your choosing.
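One way to fill in the placeholder and submit a manifest is a simple `sed` substitution; the file names and the `sed` step below are assumptions for illustration, not part of this repo:

```shell
# Substitute {{USER_NAME}} in a template, then hand the result to kubectl.
# File names are hypothetical; adapt them to your own layout.
USER_NAME=alice   # must be DNS-friendly: lowercase letters, digits, '-'

# Stand-in template; in practice this would be the full master pod
# manifest shown above, saved as spark-master-pod.json.tmpl.
cat > spark-master-pod.json.tmpl <<'EOF'
{"kind": "Pod", "metadata": {"name": "{{USER_NAME}}-spark"}}
EOF

# Replace every occurrence of the placeholder with the chosen name.
sed "s/{{USER_NAME}}/${USER_NAME}/g" spark-master-pod.json.tmpl > spark-master-pod.json

# Then create the pod (requires a running cluster):
# kubectl create -f spark-master-pod.json
```

The same substitution applies to the worker replication controller manifest.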

I'd love to hear feedback! To get started with Spark on Kubernetes, check out the Kubernetes documentation's Spark example.