Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some more aggregate sqllogictests and remove rust tests #4505

Merged
merged 3 commits into from
Dec 5, 2022

Conversation

mvanschellebeeck
Copy link
Contributor

Ports more aggregate tests from aggregate.rs and removes the existing ones.

Work towards closing #4495

@github-actions github-actions bot added the core Core DataFusion crate label Dec 4, 2022
# TODO: fix decimal places
query TIR
SELECT c1, c2, AVG(c3) FROM aggregate_test_100_by_sql GROUP BY CUBE (c1, c2) ORDER BY c1, c2
----
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to print so many lines? How about using limit to control the output? cc @mvanschellebeeck @alamb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was just a migrated test - should we track simplifying some of these tests in a separate PR/issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplifying during migration or in separate PR both ok to me. -- Seems during migration can reduce our workload

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - simplified in 41fe83f35300

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large outputs there is some sort of sqllogictest standards:

https://duckdb.org/dev/sqllogictest/result_verification

mode output_hash

And

https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki

The "hash-threshold" record sets a limit on the number of values that can appear in a result set. If the number of values exceeds this, then instead of recording each individual value in the full test script, an MD5 hash of all values is computed in stored. This makes the full test scripts much shorter, but at the cost of obscuring the results. If the hash-threshold is 0, then results are never hashed. A hash-threshold of 10 or 20 is recommended. During debugging, it is advantage to set the hash-threshold to zero so that all results can be seen.

@xudong963 xudong963 requested a review from alamb December 4, 2022 17:12
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 5, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the tests that were removed from rs and verified that there were corresponding tests in the .slt file. Thank you @mvanschellebeeck

# TODO: fix decimal places
query TIR
SELECT c1, c2, AVG(c3) FROM aggregate_test_100_by_sql GROUP BY CUBE (c1, c2) ORDER BY c1, c2
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large outputs there is some sort of sqllogictest standards:

https://duckdb.org/dev/sqllogictest/result_verification

mode output_hash

And

https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki

The "hash-threshold" record sets a limit on the number of values that can appear in a result set. If the number of values exceeds this, then instead of recording each individual value in the full test script, an MD5 hash of all values is computed in stored. This makes the full test scripts much shorter, but at the cost of obscuring the results. If the hash-threshold is 0, then results are never hashed. A hash-threshold of 10 or 20 is recommended. During debugging, it is advantage to set the hash-threshold to zero so that all results can be seen.

@@ -274,187 +271,290 @@ SELECT approx_distinct(c9) AS a, approx_distinct(c9) AS b FROM aggregate_test_10
----
100 100

# TODO: csv_query_approx_percentile_cont
## This test executes the APPROX_PERCENTILE_CONT aggregation against the test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thank you for porting this over

@alamb alamb merged commit 237233f into apache:master Dec 5, 2022
@ursabot
Copy link

ursabot commented Dec 5, 2022

Benchmark runs are scheduled for baseline = c806cd1 and contender = 237233f. 237233f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants