Merge pull request #1 from goodwillpunning/updateReadme
Update README file

Showing 6 changed files with 133 additions and 2 deletions.

@@ -0,0 +1 @@
.idea

@@ -0,0 +1,4 @@
# Contributing
Yes! Please halp!

Drop me a line at: [email protected]

@@ -0,0 +1,43 @@
Beaker

Copyright (2023) Databricks, Inc.

This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant
to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the
Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services,
Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform
Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at
all times with any restrictions applicable to the Downloadable Services and Subscription Services, generally, and must be used in
accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information
under the Agreement.

Additionally, and notwithstanding anything in the Agreement to the contrary:
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
* you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the
Software. For the avoidance of doubt, you may not make derivative works of Software (or make any changes to the Source Code
version of the Software unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license
agreement)).

If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile
the Source Code of the Software.

This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally,
Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all
copies thereof (including the Source Code).

Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with
respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks
Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee
has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.

Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.

Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.

Object Code: the version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and
executable machine code.

Source Code: the human readable portion of the Software.

@@ -1,2 +1,85 @@
# beaker
Execute query benchmarks against Databricks SQL warehouses and clusters.
# Beaker 🧪
Execute query benchmark tests against Databricks SQL warehouses and clusters.

<img src="./assets/images/beaker.png" width="200">

## Getting Started
You can create a new `Benchmark` test either by passing the parameters to the constructor or by setting them later.
```python
# First, create a new Benchmark object
from beaker import Benchmark

benchmark = Benchmark()
```
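
The `Benchmark` class uses a builder pattern: each test parameter is set through its own setter method.
```python
# Placeholder connection details; replace them with your own values
hostname = "<server-hostname>"                    # Databricks workspace hostname
http_path = "/sql/1.0/warehouses/<warehouse-id>"  # SQL warehouse path (cluster paths differ)
pat = "<personal-access-token>"                   # Databricks personal access token
query = "SELECT 1"                                # the query to benchmark

benchmark.setHostname(hostname=hostname)
# HTTP path to an existing warehouse/cluster
benchmark.setWarehouse(http_path=http_path)
benchmark.setConcurrency(concurrency=10)
benchmark.setWarehouseToken(token=pat)
benchmark.setQuery(query=query)
benchmark.setCatalog(catalog="hive_metastore")
benchmark.preWarmTables(tables=["table_1", "table_2", "table_3"])
```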
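The `setConcurrency(concurrency=10)` call controls how many queries are in flight at once. This README does not describe how Beaker schedules queries internally, so the following is only a conceptual sketch, assuming a thread pool whose size equals the concurrency level. `run_concurrently` and `fake_query` are hypothetical names; `fake_query` stands in for a real call to a warehouse.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_concurrently(
    queries: List[str],
    run_query: Callable[[str], None],
    concurrency: int,
) -> List[Dict[str, object]]:
    """Run `queries` with at most `concurrency` in flight, timing each one.

    Hypothetical harness for illustration; not Beaker's implementation.
    """

    def timed(query: str) -> Dict[str, object]:
        start = time.perf_counter()
        run_query(query)  # stand-in for sending the query to a warehouse
        return {"query": query, "seconds": time.perf_counter() - start}

    # The pool size caps how many queries execute at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in the same order as the input queries.
        return list(pool.map(timed, queries))


if __name__ == "__main__":
    def fake_query(query: str) -> None:
        time.sleep(0.05)  # pretend each query takes about 50 ms

    results = run_concurrently(["SELECT 1"] * 20, fake_query, concurrency=10)
    print(len(results), "queries timed")
```

A thread pool is a natural fit here because each query mostly waits on the remote warehouse rather than on local CPU; whether Beaker uses threads, processes, or something else is not stated.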

You can also have Beaker provision a new SQL warehouse for the benchmark.
```python
new_warehouse_config = {
    "type": "warehouse",
    "runtime": "latest",
    "size": "Large",
    "min_num_clusters": 1,
    "max_num_clusters": 3,
    "enable_photon": True
}
benchmark.setWarehouseConfig(new_warehouse_config)
```
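Because `min_num_clusters` and `max_num_clusters` bound how far the warehouse can scale out, it can help to sanity-check a configuration like the one above before passing it to `setWarehouseConfig`. The helper below is hypothetical and not part of Beaker: the required keys are simply those used in the README example, and the set of allowed sizes is an assumption based on the t-shirt sizes offered for Databricks SQL warehouses.

```python
from typing import Any, Dict, List

# Assumption: the t-shirt sizes offered for Databricks SQL warehouses.
ASSUMED_SIZES = {
    "2X-Small", "X-Small", "Small", "Medium", "Large",
    "X-Large", "2X-Large", "3X-Large", "4X-Large",
}
# Keys used in the README example; Beaker itself may accept others.
EXAMPLE_KEYS = {
    "type", "runtime", "size",
    "min_num_clusters", "max_num_clusters", "enable_photon",
}


def check_warehouse_config(config: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems: List[str] = []
    missing = EXAMPLE_KEYS - set(config)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if "size" in config and config["size"] not in ASSUMED_SIZES:
        problems.append(f"unexpected size: {config['size']!r}")
    low, high = config.get("min_num_clusters"), config.get("max_num_clusters")
    # Only compare the scaling bounds when both are present as integers.
    if isinstance(low, int) and isinstance(high, int):
        if low < 1 or low > high:
            problems.append("expected 1 <= min_num_clusters <= max_num_clusters")
    return problems


new_warehouse_config = {
    "type": "warehouse",
    "runtime": "latest",
    "size": "Large",
    "min_num_clusters": 1,
    "max_num_clusters": 3,
    "enable_photon": True,
}
print(check_warehouse_config(new_warehouse_config))  # → []
```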

Finally, calling the `.execute()` method runs the benchmark test and returns the collected metrics.
```python
# Run the benchmark!
metrics = benchmark.execute()
metrics.display()
```

## Setting the benchmark queries to execute
Beaker can execute benchmark queries in several formats:
1. Execute a single query: `benchmark.setQuery(query=query)`
2. Execute several queries from a file: `benchmark.setQueryFile(query_file=query_file)`
3. Execute several query files from a local directory: `benchmark.setQueryFileDir(query_file_dir=query_file_dir)`

If more than one query format is provided, Beaker applies the following order of precedence:
1. **Query File Dir**: if a local directory is provided, Beaker parses all query files under that directory.
2. **Query File**: if no query directory is provided but a query file is, Beaker parses that query file.
3. **Single Query**: if neither a query directory nor a query file is provided, Beaker executes the single query.
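The precedence rules above amount to a first-match check. As a minimal sketch, assuming the three sources arrive as optional strings, the selection could look like this; `resolve_query_source` is a hypothetical function for illustration, not part of Beaker's API.

```python
from typing import Optional, Tuple


def resolve_query_source(
    query: Optional[str] = None,
    query_file: Optional[str] = None,
    query_file_dir: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical illustration of the documented precedence:
    query file directory > query file > single query.
    Returns a (kind, value) pair for the source that wins."""
    if query_file_dir:
        return ("query_file_dir", query_file_dir)
    if query_file:
        return ("query_file", query_file)
    if query:
        return ("query", query)
    raise ValueError("no query, query file, or query file directory was provided")


print(resolve_query_source(query="SELECT 1", query_file="queries.sql"))
# → ('query_file', 'queries.sql')
```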

### Query file format
Within a query file, each query must be preceded by a unique query identifier (header) on its own line, as in the following example:

```sql
-- a unique query identifier (header) followed by a newline
Q1

-- the query body followed by a newline
SELECT * FROM us_population_2016 WHERE state IN ('DE', 'MD', 'VA');
```
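To make the expected layout concrete, here is a minimal parser sketch. It assumes that lines starting with `--` are comments, blank lines are ignored, each header is a single line, and a query body runs until a line ending in `;`. `parse_query_file` is a hypothetical helper; Beaker's own parser may apply different rules.

```python
from typing import Dict, List, Optional


def parse_query_file(text: str) -> Dict[str, str]:
    """Hypothetical parser: map each header (e.g. "Q1") to its query body."""
    queries: Dict[str, str] = {}
    header: Optional[str] = None
    body_lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and full-line SQL comments
        if header is None:
            header = line  # first content line after a query is the identifier
            if header in queries:
                raise ValueError(f"duplicate query identifier: {header}")
            continue
        body_lines.append(line)
        if line.endswith(";"):  # a trailing semicolon closes the current query
            queries[header] = " ".join(body_lines)
            header, body_lines = None, []
    if header is not None:
        raise ValueError(f"query {header!r} is missing a body or a terminating ';'")
    return queries


example = """
-- a unique query identifier (header) followed by a newline
Q1

-- the query body followed by a newline
SELECT * FROM us_population_2016 WHERE state IN ('DE', 'MD', 'VA');

Q2
SELECT COUNT(*)
FROM us_population_2016;
"""
parsed = parse_query_file(example)
print(parsed)
```

Multi-line bodies such as `Q2` are joined with single spaces, which keeps each query on one line for reporting.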

## Viewing the metrics report
Once all benchmark queries have been executed, Beaker consolidates the metrics from each query execution into a single Spark DataFrame.
It also creates a temporary view to make it easier to query the output and build local visualizations.

The view name has the following format: `{name_of_benchmark}_vw`
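For example, a benchmark named `my_benchmark` (a hypothetical name) would produce a view called `my_benchmark_vw`. The sketch below only builds the view name and a query against it using the documented pattern; `metrics_view_name` and `metrics_query` are hypothetical helpers. Actually running the query needs an active `SparkSession`, such as the `spark` object predefined in a Databricks notebook, so that step is left as a comment.

```python
def metrics_view_name(benchmark_name: str) -> str:
    """Build the temporary view name from the documented `{name}_vw` pattern."""
    return f"{benchmark_name}_vw"


def metrics_query(benchmark_name: str) -> str:
    """Hypothetical convenience helper: select everything from the metrics view."""
    return f"SELECT * FROM {metrics_view_name(benchmark_name)}"


print(metrics_query("my_benchmark"))  # → SELECT * FROM my_benchmark_vw

# In a Databricks notebook, where `spark` is already defined:
# spark.sql(metrics_query("my_benchmark")).show()
```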

<img src="./assets/images/metrics_visualization.png" />

## Contributing
Please halp! Drop me a line at [email protected] if you're interested.

## Legal Information
This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, and feature requests can be submitted through the Issues page of this repo. Please see the [legal agreement](LICENSE) and understand that issues with the use of this code will not be answered or investigated by Databricks Support.