Add default_python example (#3)
This adds a default_python project based on the template of databricks/cli#686:
```
The 'default_python' project was generated by using the default-python template.
```

It also adds a LICENSE and removes the original stub example.
lennartkats-db authored Sep 5, 2023
1 parent 412e180 commit d91afdb
Showing 22 changed files with 505 additions and 35 deletions.
51 changes: 51 additions & 0 deletions LICENSE
@@ -0,0 +1,51 @@
DB license

Copyright (2022) Databricks, Inc.

Definitions.

Agreement: The agreement between Databricks, Inc., and you governing the use of the Databricks Services, which shall
be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with
respect to Databricks Community Edition, the Community Edition Terms of Service located at
www.databricks.com/ce-termsofuse, in each case unless you have entered into a separate written agreement with
Databricks governing the use of the applicable Databricks Services.

Software: The source code and object code to which this license applies.

Scope of Use. You may not use this Software except in connection with your use of the Databricks Services pursuant to
the Agreement. Your use of the Software must comply at all times with any restrictions applicable to the Databricks
Services, generally, and must be used in accordance with any applicable documentation. You may view, use, copy,
modify, publish, and/or distribute the Software solely for the purposes of using the code within or connecting to the
Databricks Services. If you do not agree to these terms, you may not view, use, copy, modify, publish, and/or
distribute the Software.

Redistribution. You may redistribute and sublicense the Software so long as all use is in compliance with these terms.
In addition:

You must give any other recipients a copy of this License;
You must cause any modified files to carry prominent notices stating that you changed the files;
You must retain, in the source code form of any derivative works that you distribute, all copyright, patent,
trademark, and attribution notices from the source code form, excluding those notices that do not pertain to any part
of the derivative works; and
If the source code form includes a "NOTICE" text file as part of its distribution, then any derivative works that you
distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those
notices that do not pertain to any part of the derivative works.
You may add your own copyright statement to your modifications and may provide additional license terms and conditions
for use, reproduction, or distribution of your modifications, or for any such derivative works as a whole, provided
your use, reproduction, and distribution of the Software otherwise complies with the conditions stated in this
License.

Termination. This license terminates automatically upon your breach of these terms or upon the termination of your
Agreement. Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must
permanently delete the Software and all copies thereof.

DISCLAIMER; LIMITATION OF LIABILITY.

THE SOFTWARE IS PROVIDED “AS-IS” AND WITH ALL FAULTS. DATABRICKS, ON BEHALF OF ITSELF AND ITS LICENSORS, SPECIFICALLY
DISCLAIMS ALL WARRANTIES RELATING TO THE SOURCE CODE, EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, IMPLIED
WARRANTIES, CONDITIONS AND OTHER TERMS OF MERCHANTABILITY, SATISFACTORY QUALITY OR FITNESS FOR A PARTICULAR PURPOSE,
AND NON-INFRINGEMENT. DATABRICKS AND ITS LICENSORS' TOTAL AGGREGATE LIABILITY RELATING TO OR ARISING OUT OF YOUR USE OF
OR DATABRICKS’ PROVISIONING OF THE SOURCE CODE SHALL BE LIMITED TO ONE THOUSAND ($1,000) DOLLARS. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
9 changes: 9 additions & 0 deletions default_python/.gitignore
@@ -0,0 +1,9 @@

.databricks/
build/
dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md
3 changes: 3 additions & 0 deletions default_python/.vscode/__builtins__.pyi
@@ -0,0 +1,3 @@
# Typings for Pylance in Visual Studio Code
# see https://github.com/microsoft/pyright/blob/main/docs/builtins.md
from databricks.sdk.runtime import *
7 changes: 7 additions & 0 deletions default_python/.vscode/extensions.json
@@ -0,0 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"ms-python.vscode-pylance",
"redhat.vscode-yaml"
]
}
14 changes: 14 additions & 0 deletions default_python/.vscode/settings.json
@@ -0,0 +1,14 @@
{
"python.analysis.stubPath": ".vscode",
"databricks.python.envFile": "${workspaceFolder}/.env",
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
"python.testing.pytestArgs": [
"."
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"files.exclude": {
"**/*.egg-info": true
},
}
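The `cellMarker.codeRegex` value above decides which comment lines the Jupyter extension treats as cell boundaries in Databricks-style `.py` notebooks. A small sanity check of the same pattern (assumption: Python's `re` and the editor's JavaScript regex engine agree for this particular pattern):

```python
import re

# The cellMarker regex from settings.json, transcribed as a Python raw string
cell_marker = re.compile(
    r"^# COMMAND ----------|^# Databricks notebook source"
    r"|^(#\s*%%|#\s*\<codecell\>|#\s*In\[\d*?\]|#\s*In\[ \])"
)

for line in ("# COMMAND ----------", "# Databricks notebook source", "# In[1]"):
    print(bool(cell_marker.match(line)))  # True for each
```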
37 changes: 37 additions & 0 deletions default_python/README.md
@@ -0,0 +1,37 @@
# default_python

The 'default_python' project was generated by using the default-python template.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace:
```
$ databricks configure
```
3. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
(Note that "dev" is the default target, so the `--target` parameter
is optional here.)
This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] default_python-job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.
4. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```
5. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html. Or read the "getting started" documentation for
**Databricks Connect** for instructions on running the included Python code from a different IDE.
6. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.
43 changes: 43 additions & 0 deletions default_python/databricks.yml
@@ -0,0 +1,43 @@
# This is a Databricks asset bundle definition for default_python.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
name: default_python

include:
- resources/*.yml

targets:
# The 'dev' target, used for development purposes.
# Whenever a developer deploys using 'dev', they get their own copy.
dev:
# We use 'mode: development' to make sure everything deployed to this target gets a prefix
# like '[dev my_user_name]'. Setting this mode also disables any schedules and
# automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
mode: development
default: true
workspace:
host: https://myworkspace.databricks.com

# Optionally, there could be a 'staging' target here.
# (See Databricks docs on CI/CD at https://docs.databricks.com/dev-tools/bundles/index.html.)
#
# staging:
# workspace:
# host: https://myworkspace.databricks.com

# The 'prod' target, used for production deployment.
prod:
# For production deployments, we only have a single copy, so we override the
# workspace.root_path default of
# /Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
# to a path that is not specific to the current user.
mode: production
workspace:
host: https://myworkspace.databricks.com
root_path: /Shared/.bundle/prod/${bundle.name}
run_as:
# This runs as [email protected] in production. Alternatively,
# a service principal could be used here using service_principal_name
# (see Databricks documentation).
user_name: [email protected]

22 changes: 22 additions & 0 deletions default_python/fixtures/.gitkeep
@@ -0,0 +1,22 @@
# Fixtures

This folder is reserved for fixtures, such as CSV files.

Below is an example of how to load fixtures as a data frame:

```
import pandas as pd
import os

def get_absolute_path(*relative_parts):
if 'dbutils' in globals():
base_dir = os.path.dirname(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()) # type: ignore
path = os.path.normpath(os.path.join(base_dir, *relative_parts))
return path if path.startswith("/Workspace") else os.path.join("/Workspace", path)
else:
return os.path.join(*relative_parts)

csv_file = get_absolute_path("..", "fixtures", "mycsv.csv")
df = pd.read_csv(csv_file)
display(df)
```
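The snippet above assumes pandas and a Databricks notebook context. A dependency-free variant of the same idea (hedged: stdlib `csv` instead of pandas, and a throwaway temp file standing in for `fixtures/mycsv.csv`) runs anywhere:

```python
import csv
import os
import tempfile

# Create a throwaway fixture file (stand-in for fixtures/mycsv.csv)
d = tempfile.mkdtemp()
path = os.path.join(d, "mycsv.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["id", "fare"], ["1", "12.5"]])

# Load it back as a list of dicts, one per row
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["fare"])  # 12.5
```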
3 changes: 3 additions & 0 deletions default_python/pytest.ini
@@ -0,0 +1,3 @@
[pytest]
testpaths = tests
pythonpath = src
48 changes: 48 additions & 0 deletions default_python/resources/default_python_job.yml
@@ -0,0 +1,48 @@
# The main job for default_python
resources:
jobs:
default_python_job:
name: default_python_job

schedule:
quartz_cron_expression: '44 37 8 * * ?'
timezone_id: Europe/Amsterdam

email_notifications:
on_failure:
- [email protected]

tasks:
- task_key: notebook_task
job_cluster_key: job_cluster
notebook_task:
notebook_path: ../src/notebook.ipynb

- task_key: refresh_pipeline
depends_on:
- task_key: notebook_task
pipeline_task:
pipeline_id: ${resources.pipelines.default_python_pipeline.id}

- task_key: main_task
depends_on:
- task_key: refresh_pipeline
job_cluster_key: job_cluster
python_wheel_task:
package_name: default_python
entry_point: main
libraries:
- whl: ../dist/*.whl

job_clusters:
- job_cluster_key: job_cluster
new_cluster:
spark_version: 13.3.x-scala2.12
# node_type_id is the cluster node type to use.
# Typical node types on AWS include i3.xlarge;
# Standard_D3_v2 on Azure;
# n1-standard-4 on Google Cloud.
node_type_id: i3.xlarge
autoscale:
min_workers: 1
max_workers: 4
12 changes: 12 additions & 0 deletions default_python/resources/default_python_pipeline.yml
@@ -0,0 +1,12 @@
# The main pipeline for default_python
resources:
pipelines:
default_python_pipeline:
name: "default_python_pipeline"
target: "default_python_${bundle.environment}"
libraries:
- notebook:
path: ../src/dlt_pipeline.ipynb

configuration:
"bundle.sourcePath": "/Workspace/${workspace.file_path}/src"
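The `bundle.sourcePath` entry is surfaced to the pipeline as a Spark conf value. A sketch of how a pipeline notebook could consume it (assumed pattern; the stub `spark` object and the `/Workspace/files/src` path are placeholders, since a real SparkSession only exists inside Databricks):

```python
import sys

class _FakeConf:
    """Stand-in for spark.conf; returns the configured pipeline values."""
    def get(self, key):
        return {"bundle.sourcePath": "/Workspace/files/src"}[key]

class _FakeSpark:
    conf = _FakeConf()

spark = _FakeSpark()  # placeholder for the real SparkSession

# Put the bundle's source path on sys.path so the pipeline can import
# code from src/
sys.path.append(spark.conf.get("bundle.sourcePath"))
print(sys.path[-1])  # /Workspace/files/src
```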
4 changes: 4 additions & 0 deletions default_python/scratch/README.md
@@ -0,0 +1,4 @@
# scratch

This folder is reserved for personal, exploratory notebooks.
By default these are not committed to Git, as 'scratch' is listed in .gitignore.
50 changes: 50 additions & 0 deletions default_python/scratch/exploration.ipynb
@@ -0,0 +1,50 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "6bca260b-13d1-448f-8082-30b60a85c9ae",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('../src')\n",
"from default_python import main\n",
"\n",
"main.get_taxis().show(10)"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 2
},
"notebookName": "ipynb-notebook",
"widgets": {}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
24 changes: 24 additions & 0 deletions default_python/setup.py
@@ -0,0 +1,24 @@
"""
Setup script for default_python.
This script packages and distributes the associated wheel file(s).
Source code is in ./src/. Run 'python setup.py sdist bdist_wheel' to build.
"""
from setuptools import setup, find_packages

import sys
sys.path.append('./src')

import default_python

setup(
name="default_python",
version=default_python.__version__,
url="https://databricks.com",
author="<no value>",
description="my test wheel",
packages=find_packages(where='./src'),
package_dir={'': 'src'},
entry_points={"console_scripts": ["main=default_python.main:main"]},
install_requires=["setuptools"],
)
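For illustration, the `main=default_python.main:main` spec reads as `<command>=<module>:<function>`: pip generates a `main` executable that imports `default_python.main` and calls `main()`. A toy parse (not setuptools' own logic) makes the three parts explicit:

```python
# Illustrative breakdown of a console-script spec string
spec = "main=default_python.main:main"
command, target = spec.split("=", 1)
module_name, func_name = target.split(":", 1)
print(command)      # main
print(module_name)  # default_python.main
print(func_name)    # main
```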
1 change: 1 addition & 0 deletions default_python/src/default_python/__init__.py
@@ -0,0 +1 @@
__version__ = "0.0.1"
11 changes: 11 additions & 0 deletions default_python/src/default_python/main.py
@@ -0,0 +1,11 @@
from pyspark.sql import SparkSession

def get_taxis():
spark = SparkSession.builder.getOrCreate()
return spark.read.table("samples.nyctaxi.trips")

def main():
get_taxis().show(5)

if __name__ == '__main__':
main()
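A hedged unit-test sketch for this module: the dependency-injected `get_taxis(spark)` signature below is a hypothetical refactor (the real function builds its own session via `getOrCreate()`), but it shows how the table read could be verified with a mock, without pyspark or a cluster:

```python
from unittest.mock import MagicMock

# Hypothetical refactor for testability: accept the session as a parameter
# instead of calling SparkSession.builder.getOrCreate() inside the function
def get_taxis(spark):
    return spark.read.table("samples.nyctaxi.trips")

fake_spark = MagicMock()
result = get_taxis(fake_spark)

# Verify the function read the expected table
fake_spark.read.table.assert_called_once_with("samples.nyctaxi.trips")
```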