-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: Fix edge cases in array_slice #14489
Conversation
// Note, we actually test the contrapositive, index < -length, because negating a | ||
// negative will panic if the negative is equal to the smallest representable value | ||
// while negating a positive is always safe. | ||
if index < (O::zero() - O::one()) * len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anyone knows a simpler way of negating len
, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with it and I could not find anything better than this
39175da
to
4219dc9
Compare
This commit fixes the following edge cases in the array_slice function so that it's semantics match DuckDB: - When begin < 0 and -begin > length, begin is clamped to the beginning of the list. - When step < 0 and begin = end, then the result should be a list with the single element found at index begin/end. Fixes apache#10548
4219dc9
to
fdc8b71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jkosh44 -- I think this PR is nicely implemented, commented and tested 🏆
@@ -1941,12 +1941,12 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -4 | |||
query ?? | |||
select array_slice(make_array(1, 2, 3, 4, 5), -7, -2), array_slice(make_array('h', 'e', 'l', 'l', 'o'), -7, -3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified this is consisent with DuckDB
D select array_slice([1, 2, 3, 4, 5], -7, -2)
;
┌─────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -7, -2) │
│ int32[] │
├─────────────────────────────────────────────────────┤
│ [1, 2, 3, 4] │
└─────────────────────────────────────────────────────┘
D select array_slice(['h', 'e', 'l', 'l', 'o'], -7, -3);
┌───────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value('h', 'e', 'l', 'l', 'o'), -7, -3) │
│ varchar[] │
├───────────────────────────────────────────────────────────────┤
│ [h, e, l] │
└───────────────────────────────────────────────────────────────┘
[1] [h] | ||
|
||
query ?? | ||
select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -9223372036854775808, 1), list_slice(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'), -9223372036854775808, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D select array_slice([1, 2, 3, 4, 5], -2147483648, 1)
;
┌─────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -2147483648, 1) │
│ int32[] │
├─────────────────────────────────────────────────────────────┤
│ [1] │
└─────────────────────────────────────────────────────────────┘
|
||
# array_slice scalar function #25 (with negative step and equal indexes) | ||
query ?? | ||
select array_slice(make_array(1, 2, 3, 4, 5), 2, 2, -1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), 2, 2, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D select array_slice([1, 2, 3, 4, 5], 2, 2, -1);
┌───────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), 2, 2, -1) │
│ int32[] │
├───────────────────────────────────────────────────────┤
│ [2] │
└───────────────────────────────────────────────────────
@@ -1993,6 +1993,28 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 2, | |||
---- | |||
[2, 3, 4] [h, e] | |||
|
|||
# array_slice scalar function #24 (with first negative index larger than len) | |||
query ?? | |||
select array_slice(make_array(1, 2, 3, 4, 5), -2147483648, 1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), -2147483648, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I know this is consistent with the other tests in this file (so we shouldn't change it in this PR), but DataFusion now supports the []
array literal syntax too:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ datafusion-cli
DataFusion CLI v45.0.0
> select array_slice([1, 2, 3, 4, 5], -7, -2);
+-------------------------------------------------------------------------------+
| make_array(Int64(1),Int64(2),Int64(3),Int64(4),Int64(5))[Int64(-7):Int64(-2)] |
+-------------------------------------------------------------------------------+
| [] |
+-------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.049 seconds.
// Note, we actually test the contrapositive, index < -length, because negating a | ||
// negative will panic if the negative is equal to the smallest representable value | ||
// while negating a positive is always safe. | ||
if index < (O::zero() - O::one()) * len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with it and I could not find anything better than this
Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested some other edge cases that I had collected before, and they all passed. Thank you @jkosh44 very much.
This commit fixes the following edge cases in the array_slice function so that it's semantics match DuckDB:
When begin < 0 and -begin > length, begin is clamped to the beginning of the list.
When step < 0 and begin = end, then the result should be a list with the single element found at index begin/end.
Fixes
array_slice
can't correctly handle NULL parameters or some edge cases #10548Which issue does this PR close?
array_slice
can't correctly handle NULL parameters or some edge cases #10548.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the behavior of
array_slice
has changed.