-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
panic when GROUP BY column order doesn't match USING column order #4873
Comments
Notes:If I comment out the The physical plans are very different. Here's an online diff, where the Left is the working query and Right is the query with USING column order flipped: https://www.diffchecker.com/3jfwJyrn/ The physical plan for the broken query seems to have incorrectly flipped This probably explains the This {
let left_array = l.as_any().downcast_ref::<UInt64Array>().unwrap();
let right_array = r.as_any().downcast_ref::<UInt64Array>().unwrap();
match (left_array.is_null(left), right_array.is_null(right)) {
(false, false) => left_array.value(left) == right_array.value(right),
(true, true) => null_equals_null,
_ => false,
}
} Which performs array type downcasts, and one of these probably fails because the schema was incorrect. |
Thank you for the detailed explanation. This kind of thing should have a test, I will keep an eye on the fix PR to make sure we have a test to avoid regressions in the future. |
Thanks @ozankabak, |
I think I have it sorted out in #4878 |
Describe the bug
VegaFusion implements the Vega impute transform with a query that uses a CROSS JOIN to build a table that contains all combinations of the unique values of certain input columns, then a LEFT OUTER JOIN to a subquery that performs a GROUP BY, grouping by these same columns. This LEFT OUTER JOIN includes a USING constraint with the same columns that are used in the GROUP BY subquery.
When the order of the columns in the GROUP BY doesn't match the order of the columns in the USING constraint, the query panics. If the columns are specified in the same order, the query completes successfully.
Example
For example, here is an input table with 3 columns.
Here is a working query:
The result contains one row for every combination of the unique values in
col1
andcol2
. Thesum_col3
columns contains the sum of the inputcol3
for that combination ofcol1
andcol2
, and zeros are filled in for combinations that weren't present in the input table.Now modify the query by flipping the order of
col1
andcol2
in the USING constraint (FromUSING("col1", "col2")
toUSING("col2", "col1")
). Now the query panics.If the order of the columns in the GROUP BY query is also flipped (from
GROUP BY "col1", "col2"
toGROUP BY "col2", "col1"
) then the query works again:To Reproduce
Here is a Rust test that implements the two queries above:
Output
Expected behavior
These two queries should both complete successfully, regardless of whether the columns are specified in the same order in the GROUP BY and USING constraints.
The text was updated successfully, but these errors were encountered: