* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)
* configure_me_codegen retroactively reserved on our `bind_host` parameter name
* Add label and pray
* Add more labels why not
* Prepare 0.10.0 Release (apache#522)
* bump version
* CHANGELOG
* Ballista gets a docker image!!! (apache#521)
* Ballista gets a docker image!!!
* Enable flight sql
* Allow executing startup script
* Allow executing executables
* Clippy
* Remove capture group (apache#527)
* fix python build in CI (apache#528)
* fix python build in CI
* save progress
* use same min rust version in all crates
* fix
* use image from pyo3
* use newer image from pyo3
* do not require protoc
* wheels now generated
* rat - exclude generated file
* Update docs for simplified instructions (apache#532)
* Update docs for simplified instructions
* Fix whoopsie
* Update docs/source/user-guide/flightsql.md
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
* remove --locked (apache#533)
* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)
* Provide a memory StateBackendClient (apache#523)
* Rename StateBackend::Standalone to StateBackend:Sled
* Copy utility files from sled crate since they cannot be used directly
* Provide a memory StateBackendClient
* Fix dashmap deadlock issue
* Fix for the comments
Co-authored-by: yangzhong <[email protected]>
* only build docker images on rc tags (apache#535)
* docs: fix style in the Helm readme (apache#551)
* Fix Helm chart's image format (apache#550)
* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)
* Update datafusion requirement from 14.0.0 to 15.0.0
* Fix UT
* Fix python
* Fix python
* Fix Python
Co-authored-by: yangzhong <[email protected]>
* Make it concurrently to launch tasks to executors (apache#557)
* Make it concurrently to launch tasks to executors
* Refine for comments
Co-authored-by: yangzhong <[email protected]>
* fix(ui): fix last seen (apache#562)
* Support Alibaba Cloud OSS with ObjectStore (apache#567)
* Fix cargo clippy (apache#571)
Co-authored-by: yangzhong <[email protected]>
* Super minor spelling error (apache#573)
* Update env_logger requirement from 0.9 to 0.10 (apache#539)
Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)
---
updated-dependencies:
- dependency-name: env_logger
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)
Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)
---
updated-dependencies:
- dependency-name: graphviz-rust
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated readme to contain correct versions of dependencies. (apache#580)
* Fix benchmark image link (apache#596)
* Add support for Azure (apache#599)
* Remove outdated script and use evergreen version of rust (apache#597)
* Remove outdated script and use evergreen version of rust
* Use debian protobuf
* feat: update script such that ballista-cli image is built as well (apache#601)
* Fix Cargo.toml format issue (apache#616)
* Refactor executor main (apache#614)
* Refactor executor main
* copy all configs
* toml fmt
* Refactor scheduler main (apache#615)
* refactor scheduler main
* toml fmt
* Python: add method to get explain output as a string (apache#593)
* Update contributor guide (apache#617)
* Cluster state refactor part 1 (apache#560)
* Customize session builder
* Add setter for executor slots policy
* Construct Executor with functions
* Add queued and completed timestamps to successful job status
* Add public methods to SchedulerServer
* Public method for getting execution graph
* Public method for stage metrics
* Use node-level local limit (#20)
* Use node-level local limit
* serialize limit in shuffle writer
* Revert "Merge pull request #19 from coralogix/sc-5792"
This reverts commit 08140ef, reversing
changes made to a7f1384.
* add log
* make sure we don't forget limit for shuffle writer
* update accum correctly and try to break early
* Check local limit accumulator before polling for more data
* fix build
Co-authored-by: Martins Purins <[email protected]>
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)
* configure_me_codegen retroactively reserved on our `bind_host` parameter name
* Add label and pray
* Add more labels why not
* Add ClusterState trait
* Refactor slightly for clarity
* Revert "Use node-level local limit (#20)"
This reverts commit ff96bcd.
* Revert "Public method for stage metrics"
This reverts commit a802315.
* Revert "Public method for getting execution graph"
This reverts commit 490bda5.
* Revert "Add public methods to SchedulerServer"
This reverts commit 5ad27c0.
* Revert "Add queued and completed timestamps to successful job status"
This reverts commit c615fce.
* Revert "Construct Executor with functions"
This reverts commit 24d4830.
* Always forget the apache header
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
* replace master with main (apache#621)
* implement new release process (apache#623)
* add docs on who can release (apache#632)
* Upgrade to DataFusion 16 (again) (apache#636)
* Update datafusion dependency to the latest version (apache#612)
* Update datafusion dependency to the latest version
* Fix python
* Skip ut of test_window_lead due to apache/datafusion-python#135
* Fix clippy
---------
Co-authored-by: yangzhong <[email protected]>
* Upgrade to DataFusion 17 (apache#639)
* Upgrade to DF 17
* Restore original error handling functionality
* Customize session builder
* Construct Executor with functions
* Add queued and completed timestamps to successful job status
* Add public methods to SchedulerServer
* Public method for getting execution graph
* Public method for stage metrics
* Use node-level local limit (#20)
* Use node-level local limit
* serialize limit in shuffle writer
* Revert "Merge pull request #19 from coralogix/sc-5792"
This reverts commit 08140ef, reversing
changes made to a7f1384.
* add log
* make sure we don't forget limit for shuffle writer
* update accum correctly and try to break early
* Check local limit accumulator before polling for more data
* fix build
Co-authored-by: Martins Purins <[email protected]>
* Add ClusterState trait
* Expose active job count
* Remove println
* Resubmit jobs when no resources available for scheduling
* Make parse_physical_expr public
* Reduce log spam
* Fix job submitted metric by ignoring resubmissions
* Record when job is queued in scheduler metrics (#28)
* Record when job is queueud in scheduler metrics
* add additional buckets for exec times
* Upstream rebase (#29)
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)
* configure_me_codegen retroactively reserved on our `bind_host` parameter name
* Add label and pray
* Add more labels why not
* Prepare 0.10.0 Release (apache#522)
* bump version
* CHANGELOG
* Ballista gets a docker image!!! (apache#521)
* Ballista gets a docker image!!!
* Enable flight sql
* Allow executing startup script
* Allow executing executables
* Clippy
* Remove capture group (apache#527)
* fix python build in CI (apache#528)
* fix python build in CI
* save progress
* use same min rust version in all crates
* fix
* use image from pyo3
* use newer image from pyo3
* do not require protoc
* wheels now generated
* rat - exclude generated file
* Update docs for simplified instructions (apache#532)
* Update docs for simplified instructions
* Fix whoopsie
* Update docs/source/user-guide/flightsql.md
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
* remove --locked (apache#533)
* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)
* Provide a memory StateBackendClient (apache#523)
* Rename StateBackend::Standalone to StateBackend:Sled
* Copy utility files from sled crate since they cannot be used directly
* Provide a memory StateBackendClient
* Fix dashmap deadlock issue
* Fix for the comments
Co-authored-by: yangzhong <[email protected]>
* only build docker images on rc tags (apache#535)
* docs: fix style in the Helm readme (apache#551)
* Fix Helm chart's image format (apache#550)
* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)
* Update datafusion requirement from 14.0.0 to 15.0.0
* Fix UT
* Fix python
* Fix python
* Fix Python
Co-authored-by: yangzhong <[email protected]>
* Make it concurrently to launch tasks to executors (apache#557)
* Make it concurrently to launch tasks to executors
* Refine for comments
Co-authored-by: yangzhong <[email protected]>
* fix(ui): fix last seen (apache#562)
* Support Alibaba Cloud OSS with ObjectStore (apache#567)
* Fix cargo clippy (apache#571)
Co-authored-by: yangzhong <[email protected]>
* Super minor spelling error (apache#573)
* Update env_logger requirement from 0.9 to 0.10 (apache#539)
Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)
---
updated-dependencies:
- dependency-name: env_logger
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)
Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)
---
updated-dependencies:
- dependency-name: graphviz-rust
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated readme to contain correct versions of dependencies. (apache#580)
* Fix benchmark image link (apache#596)
* Add support for Azure (apache#599)
* Remove outdated script and use evergreen version of rust (apache#597)
* Remove outdated script and use evergreen version of rust
* Use debian protobuf
* Customize session builder
* Add setter for executor slots policy
* Construct Executor with functions
* Add queued and completed timestamps to successful job status
* Add public methods to SchedulerServer
* Public method for getting execution graph
* Public method for stage metrics
* Use node-level local limit (#20)
* Use node-level local limit
* serialize limit in shuffle writer
* Revert "Merge pull request #19 from coralogix/sc-5792"
This reverts commit 08140ef, reversing
changes made to a7f1384.
* add log
* make sure we don't forget limit for shuffle writer
* update accum correctly and try to break early
* Check local limit accumulator before polling for more data
* fix build
Co-authored-by: Martins Purins <[email protected]>
* Add ClusterState trait
* Expose active job count
* Remove println
* Resubmit jobs when no resources available for scheduling
* Make parse_physical_expr public
* Reduce log spam
* Fix job submitted metric by ignoring resubmissions
* Record when job is queued in scheduler metrics (#28)
* Record when job is queueud in scheduler metrics
* add additional buckets for exec times
* fmt
* clippy
* tomlfmt
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
* Post merge update
* update message formatting
* post merge update
* another post-merge updates
* update github actions
* clippy
* update script
* fmt
---------
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
CONTRIBUTING.md (+35 −177)
@@ -25,22 +25,28 @@ We welcome and encourage contributions of all kinds, such as:
 2. Documentation improvements
 3. Code (PR or PR Review)

-In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.
+In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs.
+Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.

@@ ... @@
 and tries to follow [Testing Organization](https://doc.rust-lang.org/book/ch11-03-test-organization.html) in The Book.
-
-This section highlights the most important test modules that exist
-
-### Unit tests
-
-Tests for the code in an individual module are defined in the same source file with a `test` module, following Rust convention
-
-### Rust Integration Tests
-
-There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests) directory.
-
-You can run these tests individually using a command such as
-
-```shell
-cargo test -p datafusion --tests sql_integration
-```
-
-One very important test is the [sql_integration](https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/sql_integration.rs) test which validates DataFusion's ability to run a large assortment of SQL queries against an assortment of data sets.
-
-### SQL / Postgres Integration Tests
-
-The [integration-tests](https://github.com/apache/arrow-datafusion/blob/master/datafusion/integration-tests) directory contains a harness that runs certain queries against both postgres and datafusion and compares results
+The scheduler and executor processes can be configured using toml files, environment variables and command-line
+arguments. The specification for config options can be found here:

-[Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within DataFusion.
-Criterion integrates with Cargo's built-in [benchmark support](https://doc.rust-lang.org/cargo/commands/cargo-bench.html) and a given benchmark can be run with
+Those files fully define Ballista's configuration. If there is a discrepancy between this documentation and the
+files, assume those files are correct.

-```
-cargo bench --bench BENCHMARK_NAME
-```
-
-A full list of benchmarks can be found [here](./datafusion/benches).
-
-_[cargo-criterion](https://github.com/bheisler/cargo-criterion) may also be used for more advanced reporting._
-
-#### Parquet SQL Benchmarks
-
-The parquet SQL benchmarks can be run with
-
-```
-cargo bench --bench parquet_query_sql
-```
-
-These randomly generate a parquet file, and then benchmark queries sourced from [parquet_query_sql.sql](./datafusion/core/benches/parquet_query_sql.sql) against it. This can therefore be a quick way to add coverage of particular query and/or data paths.
-
-If the environment variable `PARQUET_FILE` is set, the benchmark will run queries against this file instead of a randomly generated one. This can be useful for performing multiple runs, potentially with different code, against the same source data, or for testing against a custom dataset.
-
-The benchmark will automatically remove any generated parquet file on exit, however, if interrupted (e.g. by CTRL+C) it will not. This can be useful for analysing the particular file after the fact, or preserving it to use with `PARQUET_FILE` in subsequent runs.
+To get a list of command-line arguments, run the binary with `--help`

-### Upstream Benchmark Suites
+There is an example config file at [ballista/executor/examples/example_executor_config.toml](ballista/executor/examples/example_executor_config.toml)

-Instructions and tooling for running upstream benchmark suites against DataFusion and/or Ballista can be found in [benchmarks](./benchmarks).
+The order of precedence for arguments is: default config file < environment variables < specified config file < command line arguments.

-These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
+The executor and scheduler will look for the default config file at `/etc/ballista/[executor|scheduler].toml` To
+specify a config file use the `--config-file` argument.

-## How to add a new scalar function
+Environment variables are prefixed by `BALLISTA_EXECUTOR` or `BALLISTA_SCHEDULER` for the executor and scheduler
+respectively. Hyphens in command line arguments become underscores. For example, the `--scheduler-host` argument
+for the executor becomes `BALLISTA_EXECUTOR_SCHEDULER_HOST`

-Below is a checklist of what you need to do to add a new scalar function to DataFusion:
+### Python Environment

-- Add the actual implementation of the function:
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
-- In [core/src/physical_plan](datafusion/core/src/physical_plan/functions.rs), add:
-  - a new variant to `BuiltinScalarFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_physical_expr`/`create_physical_fun` mapping the built-in to the implementation
-  - tests to the function.
-- In [core/tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
-- In [core/src/logical_plan/expr](datafusion/core/src/logical_plan/expr.rs), add:
-  - a new entry of the `unary_scalar_expr!` macro for the new function.
-- In [core/src/logical_plan/mod](datafusion/core/src/logical_plan/mod.rs), add:
-  - a new entry in the `pub use expr::{}` set.
+Refer to the instructions in the Python Bindings [README](./python/README.md)

-## How to add a new aggregate function
+### Javascript Environment

-Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
+Refer to the instructions in the Scheduler Web UI [README](./ballista/scheduler/ui/README.md)

-- Add the actual implementation of an `Accumulator` and `AggregateExpr`:
-  - [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
-  - [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
-  - [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/src/physical_plan) for other functions
-- In [src/physical_plan/aggregates](datafusion/src/physical_plan/aggregates.rs), add:
-  - a new variant to `BuiltinAggregateFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_aggregate_expr` mapping the built-in to the implementation
-  - tests to the function.
-- In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
+## Integration Tests

-## How to display plans graphically
-
-The query plans represented by `LogicalPlan` nodes can be graphically
-rendered using [Graphviz](http://www.graphviz.org/).
-
-To do so, save the output of the `display_graphviz` function to a file:
-
-```rust
-// Create plan somehow...
-let mut output = File::create("/tmp/plan.dot")?;
-write!(output, "{}", plan.display_graphviz());
-```
-
-Then, use the `dot` command line tool to render it into a file that
-can be displayed. For example, the following command creates a
-`/tmp/plan.pdf` file:
+The integration tests can be executed by running the following command from the root of the repository.

 ```bash
-dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
+./dev/integration-tests.sh
 ```

-## Specification
-
-We formalize DataFusion semantics and behaviors through specification
-documents. These specifications are useful to be used as references to help
-resolve ambiguities during development or code reviews.
-
-You are also welcome to propose changes to existing specifications or create
-new specifications as you see fit.
-
-Here is the list of current active specifications:
-
-- [Output field name semantic](https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html)
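The diff above states a precedence order for configuration sources: default config file < environment variables < specified config file < command-line arguments. That last-writer-wins behavior can be sketched in shell; the option name and all values below are hypothetical, chosen only to illustrate the ordering:

```shell
# Hypothetical values for one option, listed from lowest to highest precedence.
port_from_default_file=50051
port_from_env=50052
port_from_config_file=""   # not set in this run
port_from_cli=50053

# Start from the lowest-precedence source, then let each higher-precedence
# source override it, but only when that source actually provides a value.
port="$port_from_default_file"
if [ -n "$port_from_env" ]; then port="$port_from_env"; fi
if [ -n "$port_from_config_file" ]; then port="$port_from_config_file"; fi
if [ -n "$port_from_cli" ]; then port="$port_from_cli"; fi
echo "$port"   # 50053
```

Note that the config file is skipped here because it is empty, so the command-line value wins over the environment variable, matching the stated order.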
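The diff also describes how CLI flags map to environment variables: a `BALLISTA_EXECUTOR` or `BALLISTA_SCHEDULER` prefix, with hyphens becoming underscores. A minimal shell sketch of that naming rule, using the `--scheduler-host` example from the text (the derivation itself is illustrative, not part of Ballista):

```shell
# Derive the environment-variable name for an executor CLI flag, per the
# documented rule: strip the leading "--", uppercase, replace hyphens with
# underscores, and add the BALLISTA_EXECUTOR prefix.
flag="--scheduler-host"
name="BALLISTA_EXECUTOR_$(printf '%s' "${flag#--}" | tr 'a-z-' 'A-Z_')"
echo "$name"   # BALLISTA_EXECUTOR_SCHEDULER_HOST
```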