-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9902: [Rust] [DataFusion] Add array() built-in function #8102
Conversation
@jorgecarleitao please rebase |
@jorgecarleitao do you already have followup plans for more array functions? some functions like |
This is pending #8108, as it requires a bug fix to pass the tests. |
@jhorstmann , those are great suggestions! I am not planning to work on that atm, as I wanted first to have an interface to declare UDFs with the same capabilities as built-in functions (#7967 and #8032). However, please add them to the issue tracker. You are of course also welcome to take them, if you have the time. They seem to be good entry-level issues, but of course the devil is always in the details ^_^ |
@jorgecarleitao I plan on reviewing this one tomorrow. |
let mut ctx = ExecutionContext::new(); | ||
ctx.register_table("test", Box::new(table)); | ||
let sql = "SELECT array(c1, cast(c2 as varchar)) FROM test"; | ||
let actual = execute(&mut ctx, sql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be good to have an assertion here looking at the data type of this array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have not been testing that the final schema is expected in any of these tests, and IMO we should, for all tests in this module. I will open an issue in Jira with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed in ARROW-10036
rust/datafusion/tests/sql.rs
Outdated
|
||
let mut r = Vec::with_capacity(*n as usize); | ||
for i in 0..*n { | ||
let array = array.value(row_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs null handling since elements within this array could have null values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow:
- if
array.is_null(row_index)
, then line 613 covers it witharray.is_null(row_index)
- if the elements within the array are null, then line 613 covers it because we call the function recursively on each element, in line 672.
I moved that statement away from the loop to make it more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple minor comments
} | ||
DataType::FixedSizeList(_, n) => { | ||
let array = array.as_any().downcast_ref::<FixedSizeListArray>().unwrap(); | ||
let array = array.value(row_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need a null check here in case the whole array is null. I know we check at the start of this method for a null value within an array, but I don't see a check for a null array. Maybe there are guarantees that no arrays can be null? If so, it would be good to add a comment here to explain.
let array = array.as_any().downcast_ref::<FixedSizeListArray>().unwrap();
if array.is_null(row_index) {
// this case isn't handled
} else {
let array = array.value(row_index);
let mut r = Vec::with_capacity(*n as usize);
for i in 0..*n {
r.push(array_str(&array, i as usize));
}
format!("[{}]", r.join(","))
}
@andygrove , I rebased and added a comment to improve readability of the |
Oh ... no, this is my bad, I was misreading the code. The comment helped though. Thank you! |
This adds `array()` built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR. Closes apache#8102 from jorgecarleitao/array Authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This adds `array()` built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR. Closes apache#8102 from jorgecarleitao/array Authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This adds
array()
built-in function to most primitive types. For composite types, this is more challenging and I decided to scope out of this PR.