Commit 8bc5234

fsdvh, Brent Gardner, andygrove, dependabot[bot], and yahoNanJing authored
Update from upstream (#30)
* configure_me_codegen retroactively reserved on our `bind_host` parameter name (apache#520)
* Prepare 0.10.0 release: bump version, CHANGELOG (apache#522)
* Ballista gets a docker image: enable Flight SQL, allow executing startup script and executables (apache#521)
* Remove capture group (apache#527)
* Fix python build in CI: use the same minimum Rust version in all crates, use a newer pyo3 image, do not require protoc, exclude the generated file from rat (apache#528)
* Update docs for simplified instructions (apache#532)
* Remove `--locked` (apache#533)
* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)
* Provide a memory StateBackendClient: rename StateBackend::Standalone to StateBackend::Sled, copy utility files from the sled crate since they cannot be used directly, fix a dashmap deadlock issue (apache#523)
* Only build docker images on rc tags (apache#535)
* docs: fix style in the Helm readme (apache#551)
* Fix Helm chart's image format (apache#550)
* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)
* Launch tasks to executors concurrently (apache#557)
* fix(ui): fix last seen (apache#562)
* Support Alibaba Cloud OSS with ObjectStore (apache#567)
* Fix cargo clippy (apache#571)
* Fix a super minor spelling error (apache#573)
* Update env_logger requirement from 0.9 to 0.10 (apache#539)
* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)
* Update readme to contain correct versions of dependencies (apache#580)
* Fix benchmark image link (apache#596)
* Add support for Azure (apache#599)
* Remove outdated script, use evergreen version of Rust, and use Debian protobuf (apache#597)
* feat: update script so that the ballista-cli image is built as well (apache#601)
* Fix Cargo.toml format issue (apache#616)
* Refactor executor main (apache#614)
* Refactor scheduler main (apache#615)
* Python: add method to get explain output as a string (apache#593)
* Update contributor guide (apache#617)
* Cluster state refactor part 1: customize session builder, add setter for executor slots policy, construct Executor with functions, add queued and completed timestamps to successful job status, add public methods to SchedulerServer, add ClusterState trait, use node-level local limit (#20), plus reverts of the interim commits (apache#560)
* Replace master with main (apache#621)
* Implement new release process (apache#623)
* Add docs on who can release (apache#632)
* Upgrade to DataFusion 16 (again) (apache#636)
* Update datafusion dependency to the latest version; skip the test_window_lead unit test due to apache/datafusion-python#135 (apache#612)
* Upgrade to DataFusion 17, restoring the original error handling functionality (apache#639)
* Expose active job count
* Remove println
* Resubmit jobs when no resources are available for scheduling
* Make parse_physical_expr public
* Reduce log spam
* Fix job submitted metric by ignoring resubmissions
* Record when job is queued in scheduler metrics; add additional buckets for exec times (#28)
* Upstream rebase (#29)
* Post-merge updates: message formatting, GitHub Actions, scripts, clippy, fmt, tomlfmt

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
1 parent 763aa23 commit 8bc5234


93 files changed: +3193 −2952 lines

.dockerignore

+1
```diff
@@ -11,4 +11,5 @@ target/
 **/data
 !target/release/ballista-scheduler
 !target/release/ballista-executor
+!target/release/ballista-cli
 !target/release/tpch
```

CONTRIBUTING.md

+35 −177

````diff
@@ -25,22 +25,28 @@ We welcome and encourage contributions of all kinds, such as:
 2. Documentation improvements
 3. Code (PR or PR Review)
 
-In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.
+In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs.
+Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.
 
 You can find a curated
 [good-first-issue](https://github.com/apache/arrow-ballista/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
 list to help you get started.
 
-# Developer's guide
+# Developer's Guide
 
-This section describes how you can get started at developing DataFusion.
+This section describes how you can get started with Ballista development.
 
-For information on developing with Ballista, see the
-[Ballista developer documentation](docs/developer/README.md).
+## Bootstrap Environment
 
-### Bootstrap environment
+Ballista contains components implemented in the following programming languages:
 
-DataFusion is written in Rust and it uses a standard rust toolkit:
+- Rust (Scheduler and Executor processes, Client library)
+- Python (Python bindings)
+- Javascript (Scheduler Web UI)
+
+### Rust Environment
+
+We use the standard Rust development tools.
 
 - `cargo build`
 - `cargo fmt` to format the code
@@ -50,8 +56,6 @@ DataFusion is written in Rust and it uses a standard rust toolkit:
 Testing setup:
 
 - `rustup update stable` DataFusion uses the latest stable release of rust
-- `git submodule init`
-- `git submodule update`
 
 Formatting instructions:
 
@@ -63,192 +67,46 @@ or run them all at once:
 
 - [dev/rust_lint.sh](dev/rust_lint.sh)
 
-## Test Organization
-
-DataFusion has several levels of tests in its [Test
-Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
-and tries to follow [Testing Organization](https://doc.rust-lang.org/book/ch11-03-test-organization.html) in the The Book.
-
-This section highlights the most important test modules that exist
-
-### Unit tests
-
-Tests for the code in an individual module are defined in the same source file with a `test` module, following Rust convention
-
-### Rust Integration Tests
-
-There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests) directory.
-
-You can run these tests individually using a command such as
-
-```shell
-cargo test -p datafusion --tests sql_integration
-```
-
-One very important test is the [sql_integraton](https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/sql_integration.rs) test which validates DataFusion's ability to run a large assortment of SQL queries against an assortment of data setsups.
-
-### SQL / Postgres Integration Tests
-
-The [integration-tests](https://github.com/apache/arrow-datafusion/blob/master/datafusion/integration-tests] directory contains a harness that runs certain queries against both postgres and datafusion and compares results
-
-#### setup environment
-
-```shell
-export POSTGRES_DB=postgres
-export POSTGRES_USER=postgres
-export POSTGRES_HOST=localhost
-export POSTGRES_PORT=5432
-```
-
-#### Install dependencies
-
-```shell
-# Install dependencies
-python -m pip install --upgrade pip setuptools wheel
-python -m pip install -r integration-tests/requirements.txt
-
-# setup environment
-POSTGRES_DB=postgres POSTGRES_USER=postgres POSTGRES_HOST=localhost POSTGRES_PORT=5432 python -m pytest -v integration-tests/test_psql_parity.py
-
-# Create
-psql -d "$POSTGRES_DB" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U "$POSTGRES_USER" -c 'CREATE TABLE IF NOT EXISTS test (
-c1 character varying NOT NULL,
-c2 integer NOT NULL,
-c3 smallint NOT NULL,
-c4 smallint NOT NULL,
-c5 integer NOT NULL,
-c6 bigint NOT NULL,
-c7 smallint NOT NULL,
-c8 integer NOT NULL,
-c9 bigint NOT NULL,
-c10 character varying NOT NULL,
-c11 double precision NOT NULL,
-c12 double precision NOT NULL,
-c13 character varying NOT NULL
-);'
-
-psql -d "$POSTGRES_DB" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U "$POSTGRES_USER" -c "\copy test FROM '$(pwd)/testing/data/csv/aggregate_test_100.csv' WITH (FORMAT csv, HEADER true);"
-```
-
-#### Invoke the test runner
-
-```shell
-python -m pytest -v integration-tests/test_psql_parity.py
-```
-
-## Benchmarks
+### Rust Process Configuration
 
-### Criterion Benchmarks
+The scheduler and executor processes can be configured using toml files, environment variables and command-line
+arguments. The specification for config options can be found here:
 
-[Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within DataFusion.
+- [ballista/scheduler/scheduler_config_spec.toml](ballista/scheduler/scheduler_config_spec.toml)
+- [ballista/executor/executor_config_spec.toml](ballista/executor/executor_config_spec.toml)
 
-Criterion integrates with Cargo's built-in [benchmark support](https://doc.rust-lang.org/cargo/commands/cargo-bench.html) and a given benchmark can be run with
+Those files fully define Ballista's configuration. If there is a discrepancy between this documentation and the
+files, assume those files are correct.
 
-```
-cargo bench --bench BENCHMARK_NAME
-```
-
-A full list of benchmarks can be found [here](./datafusion/benches).
-
-_[cargo-criterion](https://github.com/bheisler/cargo-criterion) may also be used for more advanced reporting._
-
-#### Parquet SQL Benchmarks
-
-The parquet SQL benchmarks can be run with
-
-```
-cargo bench --bench parquet_query_sql
-```
-
-These randomly generate a parquet file, and then benchmark queries sourced from [parquet_query_sql.sql](./datafusion/core/benches/parquet_query_sql.sql) against it. This can therefore be a quick way to add coverage of particular query and/or data paths.
-
-If the environment variable `PARQUET_FILE` is set, the benchmark will run queries against this file instead of a randomly generated one. This can be useful for performing multiple runs, potentially with different code, against the same source data, or for testing against a custom dataset.
-
-The benchmark will automatically remove any generated parquet file on exit, however, if interrupted (e.g. by CTRL+C) it will not. This can be useful for analysing the particular file after the fact, or preserving it to use with `PARQUET_FILE` in subsequent runs.
+To get a list of command-line arguments, run the binary with `--help`
 
-### Upstream Benchmark Suites
+There is an example config file at [ballista/executor/examples/example_executor_config.toml](ballista/executor/examples/example_executor_config.toml)
 
-Instructions and tooling for running upstream benchmark suites against DataFusion and/or Ballista can be found in [benchmarks](./benchmarks).
+The order of precedence for arguments is: default config file < environment variables < specified config file < command line arguments.
 
-These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
+The executor and scheduler will look for the default config file at `/etc/ballista/[executor|scheduler].toml` To
+specify a config file use the `--config-file` argument.
 
-## How to add a new scalar function
+Environment variables are prefixed by `BALLISTA_EXECUTOR` or `BALLISTA_SCHEDULER` for the executor and scheduler
+respectively. Hyphens in command line arguments become underscores. For example, the `--scheduler-host` argument
+for the executor becomes `BALLISTA_EXECUTOR_SCHEDULER_HOST`
 
-Below is a checklist of what you need to do to add a new scalar function to DataFusion:
+### Python Environment
 
-- Add the actual implementation of the function:
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
-- In [core/src/physical_plan](datafusion/core/src/physical_plan/functions.rs), add:
-  - a new variant to `BuiltinScalarFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_physical_expr`/`create_physical_fun` mapping the built-in to the implementation
-  - tests to the function.
-- In [core/tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
-- In [core/src/logical_plan/expr](datafusion/core/src/logical_plan/expr.rs), add:
-  - a new entry of the `unary_scalar_expr!` macro for the new function.
-- In [core/src/logical_plan/mod](datafusion/core/src/logical_plan/mod.rs), add:
-  - a new entry in the `pub use expr::{}` set.
+Refer to the instructions in the Python Bindings [README](./python/README.md)
 
-## How to add a new aggregate function
+### Javascript Environment
 
-Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
+Refer to the instructions in the Scheduler Web UI [README](./ballista/scheduler/ui/README.md)
 
-- Add the actual implementation of an `Accumulator` and `AggregateExpr`:
-  - [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
-  - [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
-  - [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/src/physical_plan) for other functions
-- In [src/physical_plan/aggregates](datafusion/src/physical_plan/aggregates.rs), add:
-  - a new variant to `BuiltinAggregateFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_aggregate_expr` mapping the built-in to the implementation
-  - tests to the function.
-- In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
+## Integration Tests
 
-## How to display plans graphically
-
-The query plans represented by `LogicalPlan` nodes can be graphically
-rendered using [Graphviz](http://www.graphviz.org/).
-
-To do so, save the output of the `display_graphviz` function to a file.:
-
-```rust
-// Create plan somehow...
-let mut output = File::create("/tmp/plan.dot")?;
-write!(output, "{}", plan.display_graphviz());
-```
-
-Then, use the `dot` command line tool to render it into a file that
-can be displayed. For example, the following command creates a
-`/tmp/plan.pdf` file:
+The integration tests can be executed by running the following command from the root of the repository.
 
 ```bash
-dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
+./dev/integration-tests.sh
 ```
 
-## Specification
-
-We formalize DataFusion semantics and behaviors through specification
-documents. These specifications are useful to be used as references to help
-resolve ambiguities during development or code reviews.
-
-You are also welcome to propose changes to existing specifications or create
-new specifications as you see fit.
-
-Here is the list current active specifications:
-
-- [Output field name semantic](https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html)
-- [Invariants](https://arrow.apache.org/datafusion/specification/invariants.html)
-
-All specifications are stored in the `docs/source/specification` folder.
-
 ## How to format `.md` document
 
 We are using `prettier` to format `.md` files.
````
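The configuration rules added to the contributor guide above are mechanical enough to express in code. Below is a minimal, hypothetical Rust sketch of the documented flag-to-environment-variable mapping; the helper `env_var_name` is illustrative only and is not part of the Ballista codebase.

```rust
/// Illustrative helper (not in Ballista): applies the documented rule that
/// hyphens in command-line arguments become underscores, upper-cased behind
/// a `BALLISTA_EXECUTOR` or `BALLISTA_SCHEDULER` prefix.
fn env_var_name(prefix: &str, cli_flag: &str) -> String {
    let name = cli_flag
        .trim_start_matches('-') // drop the leading "--"
        .replace('-', "_")
        .to_uppercase();
    format!("{prefix}_{name}")
}

fn main() {
    // The example from the guide: the executor's --scheduler-host argument.
    assert_eq!(
        env_var_name("BALLISTA_EXECUTOR", "--scheduler-host"),
        "BALLISTA_EXECUTOR_SCHEDULER_HOST"
    );
}
```

Note that in the documented precedence order, environment variables still rank below a specified config file and command-line arguments.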

Cargo.toml

+2 −2

```diff
@@ -17,13 +17,13 @@
 
 [workspace]
 members = [
-  "benchmarks",
+  "ballista-cli",
   "ballista/client",
   "ballista/core",
   "ballista/executor",
   "ballista/scheduler",
+  "benchmarks",
   "examples",
-  "ballista-cli",
 ]
 exclude = ["python"]
```

ballista-cli/Cargo.toml

+2 −2

```diff
@@ -33,8 +33,8 @@ ballista = { path = "../ballista/client", version = "0.10.0", features = [
   "standalone",
 ] }
 clap = { version = "3", features = ["derive", "cargo"] }
-datafusion = "15.0.0"
-datafusion-cli = "15.0.0"
+datafusion = "17.0.0"
+datafusion-cli = "17.0.0"
 dirs = "4.0.0"
 env_logger = "0.10"
 mimalloc = { version = "0.1", default-features = false }
```

ballista-cli/src/command.rs

+3 −3

```diff
@@ -67,7 +67,7 @@ impl Command {
                 .map_err(BallistaError::DataFusionError)
         }
         Self::DescribeTable(name) => {
-            let df = ctx.sql(&format!("SHOW COLUMNS FROM {}", name)).await?;
+            let df = ctx.sql(&format!("SHOW COLUMNS FROM {name}")).await?;
             let batches = df.collect().await?;
             print_options
                 .print_batches(&batches, now)
@@ -97,10 +97,10 @@
         Self::SearchFunctions(function) => {
             if let Ok(func) = function.parse::<Function>() {
                 let details = func.function_details()?;
-                println!("{}", details);
+                println!("{details}");
                 Ok(())
             } else {
-                let msg = format!("{} is not a supported function", function);
+                let msg = format!("{function} is not a supported function");
                 Err(BallistaError::NotImplemented(msg))
             }
         }
```
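The edits in this file, and the matching ones in ballista-cli/src/exec.rs below, replace positional `format!`/`println!` arguments with inline captured identifiers, a style available since Rust 1.58 and suggested by clippy's `uninlined_format_args` lint. A brief sketch of the equivalence:

```rust
fn main() {
    let name = "lineitem";
    // Positional argument (old style).
    let old = format!("SHOW COLUMNS FROM {}", name);
    // Inline captured identifier (new style); the output is identical.
    let new = format!("SHOW COLUMNS FROM {name}");
    assert_eq!(old, new);

    // Capture also works with Debug formatting, as in the exec.rs changes.
    let err = std::io::Error::new(std::io::ErrorKind::Other, "boom");
    eprintln!("{err:?}");
}
```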

ballista-cli/src/exec.rs

+6 −6

```diff
@@ -51,7 +51,7 @@ pub async fn exec_from_lines(
         if line.ends_with(';') {
             match exec_and_print(ctx, print_options, query).await {
                 Ok(_) => {}
-                Err(err) => println!("{:?}", err),
+                Err(err) => println!("{err:?}"),
             }
             query = "".to_owned();
         } else {
@@ -68,7 +68,7 @@
     if !query.is_empty() {
         match exec_and_print(ctx, print_options, query).await {
             Ok(_) => {}
-            Err(err) => println!("{:?}", err),
+            Err(err) => println!("{err:?}"),
         }
     }
 }
@@ -110,7 +110,7 @@ pub async fn exec_from_repl(ctx: &BallistaContext, print_options: &mut PrintOpti
                     if let Err(e) =
                         command.execute(&mut print_options).await
                     {
-                        eprintln!("{}", e)
+                        eprintln!("{e}")
                     }
                 } else {
                     eprintln!(
@@ -124,7 +124,7 @@ pub async fn exec_from_repl(ctx: &BallistaContext, print_options: &mut PrintOpti
             }
             _ => {
                 if let Err(e) = cmd.execute(ctx, &mut print_options).await {
-                    eprintln!("{}", e)
+                    eprintln!("{e}")
                 }
             }
         }
@@ -136,7 +136,7 @@ pub async fn exec_from_repl(ctx: &BallistaContext, print_options: &mut PrintOpti
             rl.add_history_entry(line.trim_end());
             match exec_and_print(ctx, &print_options, line).await {
                 Ok(_) => {}
-                Err(err) => eprintln!("{:?}", err),
+                Err(err) => eprintln!("{err:?}"),
             }
         }
         Err(ReadlineError::Interrupted) => {
@@ -148,7 +148,7 @@ pub async fn exec_from_repl(ctx: &BallistaContext, print_options: &mut PrintOpti
             break;
         }
         Err(err) => {
-            eprintln!("Unknown error happened {:?}", err);
+            eprintln!("Unknown error happened {err:?}");
             break;
         }
     }
```
