-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10791: [Rust] StreamReader, read_dictionary duplicating schema info #8820
ARROW-10791: [Rust] StreamReader, read_dictionary duplicating schema info #8820
Conversation
I had submitted this before but I think it got lost in a rebase somewhere. I think this is more correct and informative.
This simplifies the code in `read_dictionary` and `StreamReader`; the `ipc_schema` gets parsed and converted into a `Schema` that `StreamReader` also holds onto, so use the `Schema` to find which fields use dictionaries rather than keeping both around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -489,7 +489,12 @@ fn array_from_json( | |||
Ok(Arc::new(array)) | |||
} | |||
DataType::Dictionary(key_type, value_type) => { | |||
let dict_id = field.dict_id(); | |||
let dict_id = field.dict_id().ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologise, I do remember this change, but I'm not sure of what happened; I presume while I was rebasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! I remember I sent you a rebased PR to your branch that was... complicated... so I'm not surprised :) Thanks for the merge!
…info The purpose of this PR is refactoring `read_dictionary` to only need one kind of `Schema`, which lets us then remove the `find_dictionary_field` function and the `ipc_schema` field on `StreamReader` by adding a way to look up schema fields that use a particular dictionary by ID. I'm also resubmitting a change to the `dict_id`/`dict_is_ordered` methods on `Field`; I had submitted this to @nevi-me to become part of apache#8200 but it looks like it got lost in a rebase or something? I think it's more correct to only return values if the fields have a dictionary as their datatype. Closes apache#8820 from carols10cents/simplify-dictionary-field-lookup Authored-by: Carol (Nichols || Goulding) <[email protected]> Signed-off-by: Neville Dipale <[email protected]>
The purpose of this PR is refactoring
read_dictionary
to only need one kind ofSchema
, which lets us then remove thefind_dictionary_field
function and theipc_schema
field onStreamReader
by adding a way to look up schema fields that use a particular dictionary by ID.I'm also resubmitting a change to the
dict_id
/dict_is_ordered
methods onField
; I had submitted this to @nevi-me to become part of #8200 but it looks like it got lost in a rebase or something? I think it's more correct to only return values if the fields have a dictionary as their datatype.