-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Expr::InList to Substrait::RexType #6604
Conversation
|
||
let init_val = make_binary_op_scalar_func( | ||
&substrait_expr, | ||
&substrait_list[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
substrait_list[0] -> .first()
? This will return Option so better to check if option is defined and return Err if not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- it is awesome to see our substrait support coming along so well
I looked at the substrait docs a little bit and it seems as if it may have native support for IN list:
https://substrait.io/expressions/specialized_record_expressions/#or-list-equality-expression
A specialized structure that is often used is a large list of possible values. In SQL, these are typically large IN lists. They can be composed from one or more fields. There are two common patterns, single value and multi value. In pseudocode they are represented as:
So perhaps ExprInList
could be serialized as SingularOrList
:
https://docs.rs/substrait/0.11.0/substrait/proto/expression/struct.SingularOrList.html
I wonder if @waynexia and @nseekhao have reaction or comment
@alamb's suggestion makes sense to me. For Interestingly, I inspected the official java implementation of substrait via its cli tool Isthmus, it seems like they don't use Here is the result I get from `SELECT a FROM t where a IN ('a', 'b')`{
"extensionUris": [{
"extensionUriAnchor": 1,
"uri": "/functions_boolean.yaml"
}, {
"extensionUriAnchor": 2,
"uri": "/functions_comparison.yaml"
}],
"extensions": [{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 0,
"name": "or:bool"
}
}, {
"extensionFunction": {
"extensionUriReference": 2,
"functionAnchor": 1,
"name": "equal:any_any"
}
}],
"relations": [{
"root": {
"input": {
"project": {
"common": {
"emit": {
"outputMapping": [3]
}
},
"input": {
"filter": {
"common": {
"direct": {
}
},
"input": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": ["A", "B", "C"],
"struct": {
"types": [{
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}, {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}, {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}],
"typeVariationReference": 0,
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": ["T"]
}
}
},
"condition": {
"scalarFunction": {
"functionReference": 0,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"scalarFunction": {
"functionReference": 1,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}
}, {
"value": {
"cast": {
"type": {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"input": {
"literal": {
"fixedChar": "a",
"nullable": false,
"typeVariationReference": 0
}
},
"failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
}
}
}],
"options": []
}
}
}, {
"value": {
"scalarFunction": {
"functionReference": 1,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}
}, {
"value": {
"cast": {
"type": {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"input": {
"literal": {
"fixedChar": "b",
"nullable": false,
"typeVariationReference": 0
}
},
"failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
}
}
}],
"options": []
}
}
}],
"options": []
}
}
}
},
"expressions": [{
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}]
}
},
"names": ["A"]
}
}],
"expectedTypeUrls": []
}
`SELECT a FROM t where a not IN ('a', 'b')`{
"extensionUris": [{
"extensionUriAnchor": 1,
"uri": "/functions_boolean.yaml"
}, {
"extensionUriAnchor": 2,
"uri": "/functions_comparison.yaml"
}],
"extensions": [{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 0,
"name": "not:bool"
}
}, {
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 1,
"name": "or:bool"
}
}, {
"extensionFunction": {
"extensionUriReference": 2,
"functionAnchor": 2,
"name": "equal:any_any"
}
}],
"relations": [{
"root": {
"input": {
"project": {
"common": {
"emit": {
"outputMapping": [3]
}
},
"input": {
"filter": {
"common": {
"direct": {
}
},
"input": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": ["A", "B", "C"],
"struct": {
"types": [{
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}, {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}, {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
}],
"typeVariationReference": 0,
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": ["T"]
}
}
},
"condition": {
"scalarFunction": {
"functionReference": 0,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"scalarFunction": {
"functionReference": 1,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"scalarFunction": {
"functionReference": 2,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}
}, {
"value": {
"cast": {
"type": {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"input": {
"literal": {
"fixedChar": "a",
"nullable": false,
"typeVariationReference": 0
}
},
"failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
}
}
}],
"options": []
}
}
}, {
"value": {
"scalarFunction": {
"functionReference": 2,
"args": [],
"outputType": {
"bool": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}
}, {
"value": {
"cast": {
"type": {
"string": {
"typeVariationReference": 0,
"nullability": "NULLABILITY_NULLABLE"
}
},
"input": {
"literal": {
"fixedChar": "b",
"nullable": false,
"typeVariationReference": 0
}
},
"failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
}
}
}],
"options": []
}
}
}],
"options": []
}
}
}],
"options": []
}
}
}
},
"expressions": [{
"selection": {
"directReference": {
"structField": {
"field": 0
}
},
"rootReference": {
}
}
}]
}
},
"names": ["A"]
}
}],
"expectedTypeUrls": []
} |
@waynexia To wrap OrList with Expr::Not, I think we need to transform Expr::Not to one of the RexType, how could we do this? Should we need Operator::Not and wrap it with RexType::ScalarFunction? |
@jayzhan211 I believe
|
It seems Operator are all for two arguments, so instead of introducing Operator::Not, I add Producer: Transform to Substrait::ScalarFunction for NOT, e.g. ScalarFunction('not', InList) |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
4192e31
to
dd2a6c2
Compare
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Thanks @jayzhan211 👍
Thank you @jayzhan211 and @waynexia -- I took the liberty of merging this PR with the latest master to resolve (that I caused with #6686) |
I plan to merge this PR when the tests pass |
Which issue does this PR close?
Ref #6410
Closes #.
Rationale for this change
Expr::InList to Substrait::RexType is not supported yet.
What changes are included in this PR?
Transform InList to OrList (AndList) then convert them to Rex with
make_binary_op_scalar_func
, since there is no other equivalent RexType for InListAre these changes tested?
Unit test in datafusion/substrait/tests/roundtrip_logical_plan.rs
Are there any user-facing changes?