Merge pull request #1 from goodwillpunning/updateReadme
Update README file
goodwillpunning authored Mar 4, 2023
2 parents 37a6ceb + de33991 commit d857903
Showing 6 changed files with 133 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.idea
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,4 @@
# Contributing
Yes! Please halp!

Drop me a line at: [email protected]
43 changes: 43 additions & 0 deletions LICENSE
@@ -0,0 +1,43 @@
Beaker

Copyright (2023) Databricks, Inc.

This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant
to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the
Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services,
Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform
Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at
all times with any restrictions applicable to the Downloadable Services and Subscription Services, generally, and must be used in
accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information
under the Agreement.

Additionally, and notwithstanding anything in the Agreement to the contrary:
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
* you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the
Software. For the avoidance of doubt, you may not make derivative works of Software (or make any changes to the Source Code
version of the Software unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license
agreement)).

If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile
the Source Code of the Software.

This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally,
Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all
copies thereof (including the Source Code).

Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with
respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks
Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee
has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.

Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.

Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.

Object Code: the version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and
executable machine code.

Source Code: the human readable portion of the Software.
87 changes: 85 additions & 2 deletions README.md
@@ -1,2 +1,85 @@
# beaker
Execute query benchmarks against Databricks SQL warehouses and clusters.
# Beaker 🧪
Execute query benchmark tests against Databricks SQL warehouses and clusters.

<img src="./assets/images/beaker.png" width="200">

## Getting Started
You can create a new Benchmark test by passing parameters to the constructor, or by setting the parameters later.
```python
# First, create a new Benchmark object
from beaker import *

benchmark = Benchmark()
```

The Benchmark class uses a builder pattern to specify the test parameters.
```python
benchmark.setHostname(hostname=hostname)
# HTTP path to an existing warehouse/cluster
benchmark.setWarehouse(http_path=http_path)
benchmark.setConcurrency(concurrency=10)
benchmark.setWarehouseToken(token=pat)
benchmark.setQuery(query=query)
benchmark.setCatalog(catalog="hive_metastore")
benchmark.preWarmTables(tables=["table_1", "table_2", "table_3"])
```

You may even choose to provision a new SQL warehouse.
```python
new_warehouse_config = {
"type": "warehouse",
"runtime": "latest",
"size": "Large",
"min_num_clusters": 1,
"max_num_clusters": 3,
"enable_photon": True
}
benchmark.setWarehouseConfig(new_warehouse_config)
```

Finally, calling the `.execute()` function runs the benchmark test.
```python
# Run the benchmark!
metrics = benchmark.execute()
metrics.display()
```

## Setting the benchmark queries to execute
Beaker can execute benchmark queries in several formats:
1. Execute a single query
```benchmark.setQuery(query=query)```
2. Execute several queries from a file
```benchmark.setQueryFile(query_file=query_file)```
3. Execute several query files given a local directory
```benchmark.setQueryFileDir(query_file_dir=query_file_dir)```

However, if multiple query formats are provided, the following query format precedence will be followed:
1. **Query File Dir** - if a local directory is provided then Beaker will parse all query files under the directory
2. **Query File** - if no query directory is provided, but a query file is, then Beaker will parse the query file
3. **Single Query** - if no query directory or query file is provided, then Beaker will execute a single query
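The precedence order above could be sketched as follows (an illustrative Python sketch, not Beaker's actual internals; the function name is hypothetical):

```python
def resolve_query_source(query_file_dir=None, query_file=None, query=None):
    """Pick the query source using the documented precedence:
    query file directory > query file > single query.
    Hypothetical helper for illustration only."""
    if query_file_dir is not None:
        return ("query_file_dir", query_file_dir)
    if query_file is not None:
        return ("query_file", query_file)
    if query is not None:
        return ("query", query)
    raise ValueError("No query, query file, or query file directory was set")
```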

### Query file format
The query file must contain queries that are separated using the following format:

```sql
-- a unique query identifier (header) followed by a newline
Q1

-- the query body followed by a new line
SELECT * FROM us_population_2016 WHERE state in ('DE', 'MD', 'VA');

```
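As an illustration, a file in the format above can be split into individual queries with a few lines of Python. This is a hypothetical sketch, not Beaker's actual parser; it assumes identifiers look like `Q1`, `Q2`, and so on:

```python
import re

def parse_query_file(text):
    """Split a query file into {identifier: query} pairs.

    Hypothetical sketch only: assumes each query is introduced by an
    identifier line such as 'Q1', and that '--' comment lines and
    blank lines are ignored.
    """
    queries = {}
    current_id, body = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            continue  # skip blank lines and SQL line comments
        if re.fullmatch(r"Q\d+", stripped):  # header line, e.g. 'Q1'
            if current_id is not None:
                queries[current_id] = "\n".join(body)
            current_id, body = stripped, []
        elif current_id is not None:
            body.append(line)
    if current_id is not None:
        queries[current_id] = "\n".join(body)
    return queries
```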

## Viewing the metrics report
Once all benchmark queries have been executed, Beaker consolidates the metrics from each query execution into a single Spark DataFrame.
A temporary view is also created to make querying the output and building local visualizations easier.

The name of the view has the following format: `{name_of_benchmark}_vw`

<img src="./assets/images/metrics_visualization.png" />
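Since the metrics land in a temporary view, you can explore them with ordinary Spark SQL. A minimal sketch follows; the benchmark name and metric column names here are assumptions for illustration, not Beaker's documented schema:

```python
# Hypothetical benchmark name; per the naming convention above,
# the temporary view is named "<benchmark name>_vw"
benchmark_name = "concurrency_test"
view_name = f"{benchmark_name}_vw"

# In a Databricks notebook you could then aggregate the metrics, e.g.
# (column names are assumed for illustration):
# spark.sql(
#     f"SELECT id, AVG(elapsed_time) AS avg_elapsed "
#     f"FROM {view_name} GROUP BY id ORDER BY avg_elapsed DESC"
# ).show()
```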

## Contributing
Please halp! Drop me a line at: [email protected] if you're interested.

## Legal Information
This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, and feature requests can be submitted through the Issues page of this repo. Please see the [legal agreement](LICENSE) and understand that issues with the use of this code will not be answered or investigated by Databricks Support.
Binary file added assets/images/beaker.png
Binary file added assets/images/metrics_visualization.png
