Skip to content
This repository has been archived by the owner on Jan 28, 2022. It is now read-only.

Commit

Permalink
Getting Starting Documentation Improvements (#28)
Browse files Browse the repository at this point in the history
* add project experimental note

simplify namespace creation docs

* add the 'a' to yaml for consistency

* add comment on what the event hubs line is doing

* fix the sample notebook

* consistent install experience in doco

* DOCS: download latest release
  • Loading branch information
xtellurian authored and Azadehkhojandi committed Jun 3, 2019
1 parent 329bce9 commit 2897409
Show file tree
Hide file tree
Showing 2 changed files with 30 additions and 27 deletions.
55 changes: 29 additions & 26 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@

# Azure Databricks operator (for Kubernetes)

> This project is experimental. Expect the API to change. It is not recommended for production environments.
## Introduction

Kubernetes offers the facility of extending it's API through the concept of 'Operators' ([Introducing Operators: Putting Operational Knowledge into Software](https://coreos.com/blog/introducing-operators.html)). This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.
Expand Down Expand Up @@ -55,16 +57,15 @@ Basic commands to check your cluster

1. Download [latest release.zip](https://github.com/microsoft/azure-databricks-operator/releases)

```sh
wget https://github.com/microsoft/azure-databricks-operator/releases/latest/download/release.zip
unzip release.zip
```

2. Create the `databricks-operator-system` namespace

```yaml
apiVersion: v1
kind: Namespace
metadata:
labels:
control-plane: controller-manager
controller-tools.k8s.io: "1.0"
name: databricks-operator-system
```sh
kubectl create namespace databricks-operator-system
```

3. [Generate a databricks token](https://docs.databricks.com/api/latest/authentication.html#generate-a-token), and create Kubernetes secrets with values for `DATABRICKS_HOST` and `DATABRICKS_TOKEN`
Expand All @@ -76,7 +77,7 @@ metadata:
4. Apply the manifests for the CRD and Operator in `release/config`:

```sh
kubectl apply -f release/config
kubectl apply -f release/config
```

5. Create a test secret, you can pass the value of Kubernetes secrets into your notebook as Databricks secrets
Expand All @@ -85,15 +86,14 @@ kubectl apply -f release/config
6. In Databricks, [create a new Python Notebook](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#create-a-notebook) called `testnotebook` in the root of your [Workspace](https://docs.databricks.com/user-guide/workspace.html#folders). Put the following in the first cell of the notebook:

```py
dbutils.widgets.text("secret_scope", "", "value from CRD")
secret_scope = dbutils.widgets.get("secret_scope")
run_name = dbutils.widgets.get("run_name")
secret_scope = run_name + "_scope"

secret_value = dbutils.secrets.get(scope=secret_scope, key="dbricks_secret_key") # this will come from a kubernetes secret
print(secret_value) # will be redacted

dbutils.widgets.text("flag", "", "value from CRD")
value = dbutils.widgets.get("flag")
print(value) # 'true'
```

7. Define your Notebook job and apply it
Expand All @@ -112,32 +112,35 @@ spec:
notebookSpec:
"flag": "true"
notebookSpecSecrets:
- secretName: "test"
mapping:
- "secretKey": "my_secret_key"
"outputKey": "dbricks_secret_key"
- secretName: "test"
mapping:
- "secretKey": "my_secret_key"
"outputKey": "dbricks_secret_key"
notebookAdditionalLibraries:
- type: "maven"
properties:
coordinates: "com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9"
coordinates: "com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9" # installs the azure event hubs library
clusterSpec:
sparkVersion: "5.2.x-scala2.11"
nodeTypeId: "Standard_DS12_v2"
numWorkers: 1
```
8. Basic commands to check the new Notebookjob
8. Check the NotebookJob and Operator pod
```shell
kubectl get crd
kubectl -n databricks-operator-system get svc
kubectl -n databricks-operator-system get pod
kubectl -n databricks-operator-system describe pod databricks-operator-controller-manager-0
kubectl -n databricks-operator-system logs databricks-operator-controller-manager-0 -c dbricks -f
```sh
# list all notebook jobs
kubectl get notebookjob
kubectl describe notebookjob kubectl samplejob1
# describe a notebook job
kubectl describe notebookjob samplejob1
# describe the operator pod
kubectl -n databricks-operator-system describe pod databricks-operator-controller-manager-0
# get logs from the manager container
kubectl -n databricks-operator-system logs databricks-operator-controller-manager-0 -c dbricks
```

9. Check the job ran with expected output in the Databricks UI.

### Run Souce Code

1. Clone the repo
Expand Down
2 changes: 1 addition & 1 deletion databricks-operator/azure-pipelines-ci-artifacts.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ steps:

- script: |
go get sigs.k8s.io/kustomize
kustomize build config > $(Build.ArtifactStagingDirectory)/setup.yml
kustomize build config > $(Build.ArtifactStagingDirectory)/setup.yaml
continueOnError: 'false'
workingDirectory: '$(modulePath)/databricks-operator'
displayName: 'install kustomize'
Expand Down

0 comments on commit 2897409

Please sign in to comment.