Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10791: [Rust] StreamReader, read_dictionary duplicating schema info #8820

Conversation

carols10cents
Copy link
Contributor

The purpose of this PR is refactoring read_dictionary to only need one kind of Schema, which lets us then remove the find_dictionary_field function and the ipc_schema field on StreamReader by adding a way to look up schema fields that use a particular dictionary by ID.

I'm also resubmitting a change to the dict_id/dict_is_ordered methods on Field; I had submitted this to @nevi-me to become part of #8200 but it looks like it got lost in a rebase or something? I think it's more correct to only return values if the fields have a dictionary as their datatype.

I had submitted this before but I think it got lost in a rebase
somewhere. I think this is more correct and informative.
This simplifies the code in `read_dictionary` and `StreamReader`;
the `ipc_schema` gets parsed and converted into a `Schema` that
`StreamReader` also holds onto, so use the `Schema` to find which fields
use dictionaries rather than keeping both around.
@github-actions
Copy link

github-actions bot commented Dec 2, 2020

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -489,7 +489,12 @@ fn array_from_json(
Ok(Arc::new(array))
}
DataType::Dictionary(key_type, value_type) => {
let dict_id = field.dict_id();
let dict_id = field.dict_id().ok_or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologise, I do remember this change, but I'm not sure of what happened; I presume while I was rebasing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I remember I sent you a rebased PR to your branch that was... complicated... so I'm not surprised :) Thanks for the merge!

@nevi-me nevi-me closed this in 5ed4695 Dec 3, 2020
@carols10cents carols10cents deleted the simplify-dictionary-field-lookup branch December 3, 2020 14:16
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…info

The purpose of this PR is refactoring `read_dictionary` to only need one kind of `Schema`, which lets us then remove the `find_dictionary_field` function and the `ipc_schema` field on `StreamReader` by adding a way to look up schema fields that use a particular dictionary by ID.

I'm also resubmitting a change to the `dict_id`/`dict_is_ordered` methods on `Field`; I had submitted this to @nevi-me to become part of apache#8200 but it looks like it got lost in a rebase or something? I think it's more correct to only return values if the fields have a dictionary as their datatype.

Closes apache#8820 from carols10cents/simplify-dictionary-field-lookup

Authored-by: Carol (Nichols || Goulding) <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants