Commit
**Summary**: Wrote **Quickstart** and **Overview** sections. Go [here](https://github.com/wangpatrick57/dbgym/tree/readme) to see the README on the website.

**Details**:
* **Overview** summarizes the research motivation behind the project, giving background as necessary.
* **Quickstart** gives a single shell script which compiles Postgres with Boot, generates data, builds a Proto-X embedding, and trains a Proto-X agent.
* I renamed all occurrences of "pgdata" to "dbdata" to match the project's vision of working for multiple DBMSs (as described in the README).
* I removed the startup check.
* I got rid of the `ssd_checker` dependency as it's a very small repository.
* Fixed Postgres compilation code to work with the new `vldb_2024` branch of Boot.

---------

Co-authored-by: Wan Shen Lim <[email protected]>
1 parent 3aecdd1, commit d5cc4c2

Showing 22 changed files with 387 additions and 284 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,86 @@ | ||
# Database Gym | ||
# 🛢️ Database Gym 🏋️ | ||
[\[Slides\]](http://www.cidrdb.org/cidr2023/slides/p27-lim-slides.pdf) [\[Paper\]](https://www.cidrdb.org/cidr2023/papers/p27-lim.pdf) | ||
|
||
*An end-to-end research vehicle for the field of self-driving DBMSs.* | ||
|
||
## Quickstart | ||
|
||
These steps were tested on a fresh repository clone, Ubuntu 22.04. | ||
|
||
``` | ||
# Setup dependencies. | ||
# You may want to create a Python virtual environment (e.g. with conda) before doing this. | ||
./dependency/install_dependencies.sh | ||
# Compile a custom fork of PostgreSQL, load TPC-H (SF 0.01), train the Proto-X agent, and tune. | ||
./scripts/quickstart.sh postgres tpch 0.01 protox | ||
``` | ||
|
||
## Overview | ||
|
||
Autonomous DBMS research often involves more engineering than research. | ||
As new advances in state-of-the-art technology are made, it is common to find that they have | |
reimplemented the database tuning pipeline from scratch: workload capture, database setup, | ||
training data collection, model creation, model deployment, and more. | ||
Moreover, these bespoke pipelines make it difficult to combine different techniques even when they | ||
should be independent (e.g., using a different operator latency model in a tuning algorithm). | ||
|
||
The database gym project is our attempt at standardizing the APIs between these disparate tasks, | ||
allowing researchers to mix-and-match the different pipeline components. | ||
It draws inspiration from the Farama Foundation's Gymnasium (formerly OpenAI Gym), which | ||
accelerates the development and comparison of reinforcement learning algorithms by providing a set | ||
of agents, environments, and a standardized API for communicating between them. | ||
Through the database gym, we hope to save other people time and reimplementation effort by | ||
providing an extensible open-source platform for autonomous DBMS research. | ||
|
||
This project is under active development. | ||
Currently, we decompose the database tuning pipeline into the following components: | ||
|
||
1. Workload: collection, forecasting, synthesis | ||
2. Database: database loading, instrumentation, orchestrating workload execution | ||
3. Agent: identifying tuning actions, suggesting an action | ||
|
||
## Repository Structure | ||
|
||
`task.py` is the entrypoint for all tasks. | ||
The tasks are grouped into categories that correspond to the top-level directories of the repository: | ||
|
||
- `benchmark` - tasks to generate data and queries for different benchmarks (e.g., TPC-H, JOB) | ||
- `dbms` - tasks to build and start DBMSs (e.g., PostgreSQL) | ||
- `tune` - tasks to train autonomous database tuning agents | ||
|
||
## Credits | ||
|
||
The Database Gym project rose from the ashes of the [NoisePage](https://db.cs.cmu.edu/projects/noisepage/) self-driving DBMS project. | ||
|
||
The first prototype was written by [Patrick Wang](https://github.com/wangpatrick57), integrating [Boot (VLDB 2024)](https://github.com/lmwnshn/boot) and [Proto-X (VLDB 2024)](https://github.com/17zhangw/protox) into a cohesive system. | ||
|
||
## Citing This Repository | ||
|
||
If you use this repository in an academic paper, please cite: | ||
|
||
``` | ||
@inproceedings{lim23, | ||
author = {Lim, Wan Shen and Butrovich, Matthew and Zhang, William and Crotty, Andrew and Ma, Lin and Xu, Peijing and Gehrke, Johannes and Pavlo, Andrew}, | ||
title = {Database Gyms}, | ||
booktitle = {{CIDR} 2023, Conference on Innovative Data Systems Research}, | ||
year = {2023}, | ||
url = {https://db.cs.cmu.edu/papers/2023/p27-lim.pdf}, | ||
} | ||
``` | ||
|
||
Additionally, please cite any module-specific paper that is relevant to your use. | ||
|
||
**Accelerating Training Data Generation** | ||
|
||
``` | ||
(citation pending) | ||
Boot, appearing at VLDB 2024. | ||
``` | ||
|
||
**Simultaneously Tuning Multiple Configuration Spaces with Proto Actions** | ||
|
||
``` | ||
(citation pending) | ||
Proto-X, appearing at VLDB 2024. | ||
``` |
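
The Quickstart command in the README above passes four positional arguments to `./scripts/quickstart.sh` (`postgres tpch 0.01 protox`). The sketch below shows one way such arguments could be checked before a long build starts. The function name `check_quickstart_args`, the argument labels in the usage message, and the validation rules are all assumptions made for illustration; the commit does not show how `quickstart.sh` handles its arguments.

```shell
# Hypothetical helper, not part of the repository: checks the four positional
# arguments used in the README's quickstart example.
check_quickstart_args() {
    # Expect exactly four arguments, mirroring "postgres tpch 0.01 protox".
    if [ "$#" -ne 4 ]; then
        echo "usage: <dbms> <benchmark> <scale_factor> <agent>" >&2
        return 1
    fi
    # Assumed rule: the scale factor is a plain decimal number such as 0.01.
    # Reject empty strings, a lone dot, any non-digit/non-dot character,
    # and values containing more than one dot.
    case "$3" in
        '' | . | *[!0-9.]* | *.*.*)
            echo "scale factor must be a decimal number, got: $3" >&2
            return 1
            ;;
    esac
    return 0
}
```

A wrapper could call `check_quickstart_args "$@" && ./scripts/quickstart.sh "$@"` so that a mistyped scale factor fails immediately instead of partway through compilation.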
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,34 +4,34 @@ set -euxo pipefail | |
|
||
REPO_REAL_PARENT_DPATH="$1" | ||
|
||
# download and make postgres from the boot repository | ||
# Download and make postgres from the boot repository. | ||
mkdir -p "${REPO_REAL_PARENT_DPATH}" | ||
cd "${REPO_REAL_PARENT_DPATH}" | ||
git clone git@github.com:lmwnshn/boot.git --single-branch --branch boot --depth 1 | |
git clone git@github.com:lmwnshn/boot.git --single-branch --branch vldb_2024 --depth 1 | |
cd ./boot | ||
./cmudb/build/configure.sh release "${REPO_REAL_PARENT_DPATH}/boot/build/postgres" | ||
make clean | ||
make install-world-bin -j4 | ||
|
||
# download and make bytejack | ||
cd ./cmudb/extension/bytejack_rs/ | ||
# Download and make boot. | ||
cd ./cmudb/extension/boot_rs/ | ||
cargo build --release | ||
cbindgen . -o target/bytejack_rs.h --lang c | ||
cbindgen . -o target/boot_rs.h --lang c | ||
cd "${REPO_REAL_PARENT_DPATH}/boot" | ||
|
||
cd ./cmudb/extension/bytejack/ | ||
cd ./cmudb/extension/boot/ | ||
make clean | ||
make install -j | ||
cd "${REPO_REAL_PARENT_DPATH}/boot" | ||
|
||
# download and make hypopg | ||
# Download and make hypopg. | ||
git clone git@github.com:HypoPG/hypopg.git | |
cd ./hypopg | ||
PG_CONFIG="${REPO_REAL_PARENT_DPATH}/boot/build/postgres/bin/pg_config" make install | ||
cd "${REPO_REAL_PARENT_DPATH}/boot" | ||
|
||
# download and make pg_hint_plan | ||
# we need -L to follow links | ||
# Download and make pg_hint_plan. | ||
# We need -L to follow links. | ||
curl -L https://github.com/ossc-db/pg_hint_plan/archive/refs/tags/REL15_1_5_1.tar.gz -o REL15_1_5_1.tar.gz | ||
tar -xzf REL15_1_5_1.tar.gz | ||
rm REL15_1_5_1.tar.gz | ||
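
The pg_hint_plan steps above repeat the release tag `REL15_1_5_1` in the URL and in each file name. A minimal sketch of how the tag could be factored out is shown below. The function names are hypothetical and do not appear in the commit, and the `-f` flag on `curl` is an addition not present in the original script; it makes `curl` exit with an error on an HTTP failure rather than saving the error page as the tarball.

```shell
# Hypothetical helpers, not part of the commit.

# Print the tarball file name for a pg_hint_plan release tag.
pg_hint_plan_archive() {
    printf '%s.tar.gz\n' "$1"
}

# Print the GitHub archive URL for a pg_hint_plan release tag.
pg_hint_plan_url() {
    printf 'https://github.com/ossc-db/pg_hint_plan/archive/refs/tags/%s\n' \
        "$(pg_hint_plan_archive "$1")"
}

# Download, unpack, and delete the tarball for the given tag.
# Only defined here, never called, so loading this sketch does no network access.
fetch_pg_hint_plan() {
    archive="$(pg_hint_plan_archive "$1")"
    # -L follows redirects, as in the original script; -f is the added flag.
    curl -fL "$(pg_hint_plan_url "$1")" -o "${archive}"
    tar -xzf "${archive}"
    rm "${archive}"
}
```

With these definitions, `fetch_pg_hint_plan REL15_1_5_1` would perform the same three steps as the original lines, and moving to another release would mean changing a single argument.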
|