Add some more aggregate sqllogictests and remove rust tests #4505

mvanschellebeeck · 2022-12-04T14:46:50Z

Ports more aggregate tests from aggregate.rs to sqllogictest (aggregate.slt) and removes the Rust versions.

Work towards closing #4495

xudong963 · 2022-12-04T15:17:10Z

datafusion/core/tests/sqllogictests/test_files/aggregate.slt

-# TODO: fix decimal places
+query TIR
+SELECT c1, c2, AVG(c3) FROM aggregate_test_100_by_sql GROUP BY CUBE (c1, c2) ORDER BY c1, c2
+----


Do we still need to print so many lines? How about using LIMIT to control the output? cc @mvanschellebeeck @alamb

Yeah this was just a migrated test - should we track simplifying some of these tests in a separate PR/issue?

Simplifying during the migration or in a separate PR are both OK with me. Doing it during the migration seems like it would reduce our workload.

👍 - simplified in 41fe83f35300

For large outputs, there are some established sqllogictest conventions:

https://duckdb.org/dev/sqllogictest/result_verification

mode output_hash

And

https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki

The "hash-threshold" record sets a limit on the number of values that can appear in a result set. If the number of values exceeds this, then instead of recording each individual value in the full test script, an MD5 hash of all values is computed and stored. This makes the full test scripts much shorter, but at the cost of obscuring the results. If the hash-threshold is 0, then results are never hashed. A hash-threshold of 10 or 20 is recommended. During debugging, it is advantageous to set the hash-threshold to zero so that all results can be seen.
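
To make the hash-threshold idea concrete, here is a minimal Python sketch, not the actual sqllogictest or DataFusion implementation. The function name `render_result`, the newline-after-each-value hashing, and the "N values hashing to ..." summary line are assumptions loosely modeled on the description quoted above.

```python
import hashlib


def render_result(values, hash_threshold=8):
    """Return the lines a test script would record for one query result.

    Hypothetical helper: results with more values than `hash_threshold`
    are replaced by a single MD5 summary line.
    """
    rendered = [str(v) for v in values]

    # A threshold of 0 disables hashing, as in the quoted description.
    if hash_threshold == 0 or len(rendered) <= hash_threshold:
        return rendered

    md5 = hashlib.md5()
    for text in rendered:
        # Assumption: each value is hashed followed by a newline.
        md5.update(text.encode("utf-8"))
        md5.update(b"\n")

    return [f"{len(rendered)} values hashing to {md5.hexdigest()}"]


if __name__ == "__main__":
    print(render_result([1, 2, 3], hash_threshold=10))  # short result, kept as-is
    print(render_result(range(100), hash_threshold=10))  # long result, one hash line
```

The trade-off matches the one noted in the quote: the recorded output stays short, but a mismatch only tells you that something differs, not which row.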

alamb

I reviewed the tests that were removed from rs and verified that there were corresponding tests in the .slt file. Thank you @mvanschellebeeck

alamb · 2022-12-05T18:04:56Z

datafusion/core/tests/sqllogictests/test_files/aggregate.slt

@@ -274,187 +271,290 @@ SELECT approx_distinct(c9) AS a, approx_distinct(c9) AS b FROM aggregate_test_10
 ----
 100 100

-# TODO: csv_query_approx_percentile_cont
+## This test executes the APPROX_PERCENTILE_CONT aggregation against the test


👍 thank you for porting this over

ursabot · 2022-12-05T18:17:57Z

Benchmark runs are scheduled for baseline = c806cd1 and contender = 237233f. 237233f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

mvanschellebeeck added 2 commits December 4, 2022 09:25

Expand median tests + fix floats

2e53ee7

Remove rust tests

ba5948c

github-actions bot added the core Core DataFusion crate label Dec 4, 2022

xudong963 reviewed Dec 4, 2022

xudong963 requested a review from alamb December 4, 2022 17:12

simplify csv_query_rollup_avg test

41fe83f

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 5, 2022

alamb mentioned this pull request Dec 5, 2022

Remove Option from window frame #4516

Merged

alamb approved these changes Dec 5, 2022

alamb merged commit 237233f into apache:master Dec 5, 2022
