Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Fix edge cases in array_slice #14489

Merged
merged 1 commit into from
Feb 6, 2025
Merged

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Feb 4, 2025

This commit fixes the following edge cases in the array_slice function so that it's semantics match DuckDB:

Which issue does this PR close?

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, the behavior of array_slice has changed.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 4, 2025
// Note, we actually test the contrapositive, index < -length, because negating a
// negative will panic if the negative is equal to the smallest representable value
// while negating a positive is always safe.
if index < (O::zero() - O::one()) * len {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone knows a simpler way of negating len, please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with it and I could not find anything better than this

@jkosh44 jkosh44 force-pushed the array-slice-edge branch 2 times, most recently from 39175da to 4219dc9 Compare February 4, 2025 19:03
This commit fixes the following edge cases in the array_slice function
so that it's semantics match DuckDB:

  - When begin < 0 and -begin > length, begin is clamped to the
    beginning of the list.
  - When step < 0 and begin = end, then the result should be a list with
    the single element found at index begin/end.

Fixes apache#10548
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jkosh44 -- I think this PR is nicely implemented, commented and tested 🏆

@@ -1941,12 +1941,12 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -4
query ??
select array_slice(make_array(1, 2, 3, 4, 5), -7, -2), array_slice(make_array('h', 'e', 'l', 'l', 'o'), -7, -3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified this is consisent with DuckDB

D select array_slice([1, 2, 3, 4, 5], -7, -2)
  ;
┌─────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -7, -2) │
│                       int32[]                       │
├─────────────────────────────────────────────────────┤
│ [1, 2, 3, 4]                                        │
└─────────────────────────────────────────────────────┘

D select array_slice(['h', 'e', 'l', 'l', 'o'], -7, -3);
┌───────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value('h', 'e', 'l', 'l', 'o'), -7, -3) │
│ varchar[] │
├───────────────────────────────────────────────────────────────┤
│ [h, e, l] │
└───────────────────────────────────────────────────────────────┘

[1] [h]

query ??
select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), -9223372036854775808, 1), list_slice(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'), -9223372036854775808, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D select array_slice([1, 2, 3, 4, 5], -2147483648, 1)
  ;
┌─────────────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), -2147483648, 1) │
│                           int32[]                           │
├─────────────────────────────────────────────────────────────┤
│ [1]                                                         │
└─────────────────────────────────────────────────────────────┘


# array_slice scalar function #25 (with negative step and equal indexes)
query ??
select array_slice(make_array(1, 2, 3, 4, 5), 2, 2, -1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), 2, 2, -1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D select array_slice([1, 2, 3, 4, 5], 2, 2, -1);
┌───────────────────────────────────────────────────────┐
│ array_slice(main.list_value(1, 2, 3, 4, 5), 2, 2, -1) │
│                        int32[]                        │
├───────────────────────────────────────────────────────┤
│ [2]                                                   │
└───────────────────────────────────────────────────────

@@ -1993,6 +1993,28 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 2,
----
[2, 3, 4] [h, e]

# array_slice scalar function #24 (with first negative index larger than len)
query ??
select array_slice(make_array(1, 2, 3, 4, 5), -2147483648, 1), list_slice(make_array('h', 'e', 'l', 'l', 'o'), -2147483648, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I know this is consistent with the other tests in this file (so we shouldn't change it in this PR), but DataFusion now supports the [] array literal syntax too:

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ datafusion-cli
DataFusion CLI v45.0.0
> select array_slice([1, 2, 3, 4, 5], -7, -2);
+-------------------------------------------------------------------------------+
| make_array(Int64(1),Int64(2),Int64(3),Int64(4),Int64(5))[Int64(-7):Int64(-2)] |
+-------------------------------------------------------------------------------+
| []                                                                            |
+-------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.049 seconds.

// Note, we actually test the contrapositive, index < -length, because negating a
// negative will panic if the negative is equal to the smallest representable value
// while negating a positive is always safe.
if index < (O::zero() - O::one()) * len {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with it and I could not find anything better than this

@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 5, 2025

I think this PR is nicely implemented, commented and tested 🏆

Thank you!

Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested some other edge cases that I had collected before, and they all passed. Thank you @jkosh44 very much.

@jonahgao jonahgao merged commit 62e23a2 into apache:main Feb 6, 2025
26 checks passed
@jkosh44 jkosh44 deleted the array-slice-edge branch February 6, 2025 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

array_slice can't correctly handle NULL parameters or some edge cases
3 participants