fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait #14860
Which issue does this PR close?
Closes #14348
Rationale for this change
Substrait plans are intended to be interpreted literally. When you see plan nodes like:
The output mapping (e.g. `[0, 3]`) contains ordinals representing the offset of the target expression(s) within the [input, output] list. If the DataFusion `LogicalPlanBuilder` introduces additional input expressions, it violates the plan's intent and produces incorrect output mappings. Please see the issue for a concrete example.
What changes are included in this PR?
Adds an `add_implicit_group_by_exprs` option to the logical plan builder. It is disabled by default.
Are these changes tested?
Added a multilayer aggregation Substrait example. The first aggregation produces a unique column, which gives its output a functional dependency. Even so, the second aggregation must not introduce any additional grouping expressions.
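The test scenario can be sketched with a toy model in Rust. Everything here is hypothetical: `BuilderOptions`, `Schema`, `aggregate`, and `emit` only imitate the idea of implicitly grouping by functionally dependent columns and are not DataFusion's `LogicalPlanBuilder` or its Substrait consumer API. The first aggregation groups by `k`, so `k` is unique in its output and determines `sum_v`. When implicit expressions are enabled, the second aggregation also groups by `sum_v`, which widens its output and makes ordinal 1 select `sum_v` instead of `cnt`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the new builder option: it mirrors only the name
// and the `false` default described in this PR, not DataFusion's real API.
#[derive(Debug, Default, Clone, Copy)]
struct BuilderOptions {
    add_implicit_group_by_exprs: bool,
}

// Toy schema: column names plus functional dependencies, mapping a
// determinant column to the columns whose values it uniquely determines.
#[derive(Debug)]
struct Schema {
    columns: Vec<String>,
    deps: HashMap<String, Vec<String>>,
}

// Toy aggregate whose output is its grouping columns followed by its measures.
fn aggregate(input: &Schema, group_by: &[&str], measures: &[&str], opts: BuilderOptions) -> Schema {
    let mut groups: Vec<String> = group_by.iter().map(|g| g.to_string()).collect();
    if opts.add_implicit_group_by_exprs {
        // Also group by every column that an explicit grouping column determines.
        for g in group_by {
            if let Some(dependents) = input.deps.get(*g) {
                for d in dependents {
                    if !groups.contains(d) {
                        groups.push(d.clone());
                    }
                }
            }
        }
    }
    let mut deps: HashMap<String, Vec<String>> = HashMap::new();
    if groups.len() == 1 {
        // A lone grouping column is unique in the output, so it determines the
        // measures (composite keys are ignored in this toy model).
        deps.insert(groups[0].clone(), measures.iter().map(|m| m.to_string()).collect());
    }
    let mut columns = groups;
    columns.extend(measures.iter().map(|m| m.to_string()));
    Schema { columns, deps }
}

// Select output columns by ordinal, as a Substrait output mapping does.
fn emit(schema: &Schema, mapping: &[usize]) -> Vec<String> {
    mapping.iter().map(|&i| schema.columns[i].clone()).collect()
}

fn main() {
    let table = Schema { columns: vec!["k".into(), "v".into()], deps: HashMap::new() };

    // First aggregation: GROUP BY k, so k is unique and determines sum_v.
    let first = aggregate(&table, &["k"], &["sum_v"], BuilderOptions::default());

    // Second aggregation groups by k again, with and without implicit expressions.
    let literal = aggregate(&first, &["k"], &["cnt"], BuilderOptions::default());
    let implicit = aggregate(
        &first,
        &["k"],
        &["cnt"],
        BuilderOptions { add_implicit_group_by_exprs: true },
    );

    println!("{:?}", literal.columns); // ["k", "cnt"]
    println!("{:?}", implicit.columns); // ["k", "sum_v", "cnt"]
    println!("{:?}", emit(&literal, &[1])); // ["cnt"]
    println!("{:?}", emit(&implicit, &[1])); // ["sum_v"]
}
```

With the option at its `false` default, the second aggregate keeps exactly the grouping expressions the plan lists, so output-mapping ordinals resolve the way the producer intended.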
There should be no changes in the non-Substrait paths.
Are there any user-facing changes?
No.
Appendix
This is a continuation of #14553.