Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9902: [Rust] [DataFusion] Add array() built-in function #8102

Closed
wants to merge 4 commits into from
Closed

ARROW-9902: [Rust] [DataFusion] Add array() built-in function #8102

wants to merge 4 commits into from

Conversation

jorgecarleitao
Copy link
Member

This adds array() built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR.

@github-actions
Copy link

github-actions bot commented Sep 2, 2020

@andygrove
Copy link
Member

@jorgecarleitao please rebase

@jhorstmann
Copy link
Contributor

@jorgecarleitao do you already have followup plans for more array functions? some functions like array_first, array_last, array_index_of or array_contains` would be useful for me too.

@jorgecarleitao
Copy link
Member Author

This is pending #8108, as it requires a bug fix to pass the tests.

@jorgecarleitao
Copy link
Member Author

@jhorstmann , those are great suggestions!

I am not planning to work on that atm, as I wanted first to have an interface to declare UDFs with the same capabilities as built-in functions (#7967 and #8032).

However, please add them to the issue tracker. You are of course also welcome to take them, if you have the time. They seem to be good entry-level issues, but of course the devil is always in the details ^_^

@andygrove
Copy link
Member

@jorgecarleitao I plan on reviewing this one tomorrow.

let mut ctx = ExecutionContext::new();
ctx.register_table("test", Box::new(table));
let sql = "SELECT array(c1, cast(c2 as varchar)) FROM test";
let actual = execute(&mut ctx, sql);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be good to have an assertion here looking at the data type of this array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have not been testing that the final schema is expected in any of these tests, and IMO we should, for all tests in this module. I will open an issue in Jira with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed in ARROW-10036


let mut r = Vec::with_capacity(*n as usize);
for i in 0..*n {
let array = array.value(row_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs null handling since elements within this array could have null values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow:

  • if array.is_null(row_index), then line 613 covers it with array.is_null(row_index)
  • if the elements within the array are null, then line 613 covers it because we call the function recursively on each element, in line 672.

I moved that statement away from the loop to make it more explicit.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple minor comments

}
DataType::FixedSizeList(_, n) => {
let array = array.as_any().downcast_ref::<FixedSizeListArray>().unwrap();
let array = array.value(row_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need a null check here in case the whole array is null. I know we check at the start of this method for a null value within an array, but I don't see a check for a null array. Maybe there are guarantees that no arrays can be null? If so, it would be good to add a comment here to explain.

let array = array.as_any().downcast_ref::<FixedSizeListArray>().unwrap();
if array.is_null(row_index) {
  // this case isn't handled
} else {
  let array = array.value(row_index);
  let mut r = Vec::with_capacity(*n as usize);
  for i in 0..*n {
    r.push(array_str(&array, i as usize));
  }
  format!("[{}]", r.join(","))
}

@jorgecarleitao
Copy link
Member Author

@andygrove , I rebased and added a comment to improve readability of the array_str. Note that this conflicts with the Min/Max of a string, #8215 . Thank you for your patience here 🥇

@andygrove
Copy link
Member

@andygrove , I rebased and added a comment to improve readability of the array_str. Note that this conflicts with the Min/Max of a string, #8215 . Thank you for your patience here 1st_place_medal

Oh ... no, this is my bad, I was misreading the code. The comment helped though. Thank you!

@andygrove andygrove closed this in 24a4a44 Sep 20, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
This adds `array()` built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR.

Closes apache#8102 from jorgecarleitao/array

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
@jorgecarleitao jorgecarleitao deleted the array branch October 28, 2020 04:18
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This adds `array()` built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR.

Closes apache#8102 from jorgecarleitao/array

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants