perf: re-used build cursors #3

Merged

Conversation

e-dard

@e-dard e-dard commented Jul 15, 2021

This PR should bring in the perf work that re-uses comparators instead of building a new one for each row.

e-dard added 2 commits July 15, 2021 16:56
This commit stores built Arrow comparators for two arrays on each of the sort key cursors, resulting in a significant reduction in the cost associated with merging record batches using the `SortPreservingMerge` operator.

Benchmarks improved as follows:

```
⇒  critcmp master pr
group                               master                                 pr
-----                               ------                                 --
interleave_batches                  1.83   623.8±12.41µs        ? ?/sec    1.00    341.2±6.98µs        ? ?/sec
merge_batches_no_overlap_large      1.56    400.6±4.94µs        ? ?/sec    1.00    256.3±6.57µs        ? ?/sec
merge_batches_no_overlap_small      1.63   425.1±24.88µs        ? ?/sec    1.00    261.1±7.46µs        ? ?/sec
merge_batches_small_into_large      1.18    228.0±3.95µs        ? ?/sec    1.00    193.6±2.86µs        ? ?/sec
merge_batches_some_overlap_large    1.68   505.4±10.27µs        ? ?/sec    1.00    301.3±6.63µs        ? ?/sec
merge_batches_some_overlap_small    1.64    515.7±5.21µs        ? ?/sec    1.00   314.6±12.66µs        ? ?/sec
```
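
For readers who have not seen the optimization, here is a minimal sketch of the idea, not the code in this PR. It assumes single `i64` sort columns, and the names `Cursor`, `build_comparator`, and `merge` are hypothetical stand-ins rather than DataFusion or Arrow APIs. Each cursor builds the comparator for a given pair of arrays once, stores it, and then re-uses it for every row comparison.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Compares row `i` of one column with row `j` of another.
type Comparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// Stand-in for the per-pair setup cost (hypothetical helper, not an Arrow API).
fn build_comparator<'a>(left: &'a [i64], right: &'a [i64]) -> Comparator<'a> {
    Box::new(move |i: usize, j: usize| left[i].cmp(&right[j]))
}

/// Hypothetical cursor over one sorted, single-column batch.
struct Cursor<'a> {
    batch_idx: usize,
    column: &'a [i64],
    row: usize,
    /// Comparators already built, keyed by the other cursor's batch index.
    comparators: HashMap<usize, Comparator<'a>>,
    /// Number of comparators built so far (kept only for illustration).
    builds: usize,
}

impl<'a> Cursor<'a> {
    fn new(batch_idx: usize, column: &'a [i64]) -> Self {
        Cursor { batch_idx, column, row: 0, comparators: HashMap::new(), builds: 0 }
    }

    fn is_finished(&self) -> bool {
        self.row >= self.column.len()
    }

    /// Compare the current rows of `self` and `other`, building the
    /// comparator for this pair of columns only on first use.
    fn compare(&mut self, other: &Cursor<'a>) -> Ordering {
        if !self.comparators.contains_key(&other.batch_idx) {
            let cmp = build_comparator(self.column, other.column);
            self.comparators.insert(other.batch_idx, cmp);
            self.builds += 1;
        }
        let cmp = &self.comparators[&other.batch_idx];
        cmp(self.row, other.row)
    }
}

/// Merge two sorted columns; returns the merged rows and the number of
/// comparators that had to be built along the way.
fn merge<'a>(left: &'a [i64], right: &'a [i64]) -> (Vec<i64>, usize) {
    let mut a = Cursor::new(0, left);
    let mut b = Cursor::new(1, right);
    let mut out = Vec::with_capacity(left.len() + right.len());
    while !a.is_finished() && !b.is_finished() {
        if a.compare(&b) != Ordering::Greater {
            out.push(a.column[a.row]);
            a.row += 1;
        } else {
            out.push(b.column[b.row]);
            b.row += 1;
        }
    }
    // One side is exhausted; copy whatever remains on the other.
    out.extend_from_slice(&a.column[a.row..]);
    out.extend_from_slice(&b.column[b.row..]);
    (out, a.builds)
}

fn main() {
    let (merged, builds) = merge(&[1, 3, 5, 7], &[2, 3, 6]);
    println!("merged = {:?}, comparators built = {}", merged, builds);
}
```

In this sketch the merge performs several row comparisons but builds only one comparator. Building inside `compare` on every call would instead pay the setup cost once per row, which is the overhead the description above says this PR removes.
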
@e-dard e-dard changed the title Er/alamb/perf integration df perf: re-used build cursors Jul 15, 2021
@alamb
Owner

alamb commented Jul 15, 2021

Thanks -- sorry edd -- I am going to merge this into a second branch (the cleverly named alamb/perf_integration_df_2) so I can make sure it works in the context of IOx

@alamb alamb changed the base branch from alamb/perf_integration_df to alamb/perf_integration_df_2 July 15, 2021 17:27
@alamb alamb merged commit d201ebf into alamb:alamb/perf_integration_df_2 Jul 15, 2021
alamb added a commit that referenced this pull request Sep 22, 2021
* Add Display for Expr::BinaryExpr

Update logical_plan/operators tests

rebase and debug display for non binary expr

Add Display for Expr::BinaryExpr

Update logical_plan/operators tests

Updating tests

Update aggregate display

Updating tests without aggregate

More tests

Working on agg/scalar functions

Fix binary_expr in create_name function and attendant tests

More tests

More tests

Doc tests

Rebase and update new tests

* Submodule update

* Restore submodule references from master

Co-authored-by: Andrew Lamb <[email protected]>
alamb pushed a commit that referenced this pull request Jan 13, 2023
* Initial commit

* initial commit

* failing test

* table scan projection

* closer

* test passes, with some hacks

* use DataFrame (#2)

* update README

* update dependency

* code cleanup (#3)

* Add support for Filter operator and BinaryOp expressions (#4)

* GitHub action (#5)

* Split code into producer and consumer modules (#6)

* Support more functions and scalar types (#7)

* Use substrait 0.1 and datafusion 8.0 (#8)

* use substrait 0.1

* use datafusion 8.0

* update datafusion to 10.0 and substrait to 0.2 (#11)

* Add basic join support (#12)

* Added fetch support (#23)

Added fetch to consumer

Added limit to producer

Added unit tests for limit

Added roundtrip_fill_none() for testing when None input can be converted to 0

Update src/consumer.rs

Co-authored-by: Andy Grove <[email protected]>

Co-authored-by: Andy Grove <[email protected]>

* Upgrade to DataFusion 13.0.0 (#25)

* Add sort consumer and producer (#24)

Add consumer

Add producer and test

Modified error string

* Add serializer/deserializer (#26)

* Add plan and function extension support (#27)

* Add plan and function extension support

* Removed unwraps

* Implement GROUP BY (#28)

* Add consumer, producer and tests for aggregate relation

Change function extension registration from absolute to relative anchor
(reference)

Remove operator to/from reference

* Fixed function registration bug

* Add test

* Addressed PR comments

* Changed field reference from mask to direct reference (#29)

* Changed field reference from masked reference to direct reference

* Handle unsupported case (struct with child)

* Handle SubqueryAlias (#30)

Fixed aggregate function register bug

* Add support for SELECT DISTINCT (apache#31)

Add test case

* Implement BETWEEN (apache#32)

* Add case (apache#33)

* Implement CASE WHEN

* Add more case to test

* Addressed comments

* feat: support explicit catalog/schema names in ReadRel (apache#34)

* feat: support explicit catalog/schema names in ReadRel

Signed-off-by: Ruihang Xia <[email protected]>

* fix: use re-exported expr crate

Signed-off-by: Ruihang Xia <[email protected]>

Signed-off-by: Ruihang Xia <[email protected]>

* move files to subfolder

* RAT

* remove rust.yaml

* revert .gitignore changes

* tomlfmt

* tomlfmt

Signed-off-by: Ruihang Xia <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: JanKaul <[email protected]>
Co-authored-by: nseekhao <[email protected]>
Co-authored-by: Ruihang Xia <[email protected]>
alamb added a commit that referenced this pull request Feb 10, 2025
…4544)

* add mut annotation

* fix rust examples

* fix rust examples

* update

* fix first doctest

* fix first doctest

* fix more doctest

* fix more doctest

* fix more doctest

* adopt rustdoc syntax

* adopt rustdoc syntax

* adopt rustdoc syntax

* fix more doctest

* add missing imports

* final udtf

* reenable

* remove dep

* run prettier

* api-health

* update doc

* update doc

* temp fix

* fix doc

* fix async schema provider

* fix async schema provider

* fix doc

* fix doc

* reorder

* refactor

* s

* finish

* minor update

* add missing docs

* add deps (#3)

* fix doctest

* update doc

* fix doctest

* fix doctest

* tweak showkeys

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* update to use user_doc

* add rustdoc preprocessing

* fix dir

* revert to original doc

* add allocator

* mark type

* update

* fix doctest

* add doctest

* add doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* fix doctest

* prettier format

* revert change to datafusion-testing

* add apache header

* install cmake in setup-builder for ci workflow dependency

* taplo + fix snmalloc

* Update function docs

* preprocess user-guide

* Render examples as sql

* fix intro

* fix docs via script

---------

Co-authored-by: Andrew Lamb <[email protected]>
2 participants