ARROW-17349: [C++] Allow casting map types #14198
Mapping MapType field names

import pyarrow as pa
# MapType
ty_named = pa.map_(pa.field("x", pa.int32(), nullable=False), pa.int32())
ty = pa.map_(pa.int32(), pa.int32())
arr_named = pa.array([[(1, 2), (2, 4)]], type=ty_named)
arr_named.type, arr_named.cast(ty).type

Before (wrong):
(MapType(map<int32 ('x'), int32>), MapType(map<int32 ('x'), int32>))

After (correct):
(MapType(map<int32 ('x'), int32>), MapType(map<int32, int32>))

Mapping Map<T, List> names

ty_named = pa.map_(pa.string(), pa.field("x", pa.list_(pa.field("x", pa.int32(), nullable=True)), nullable=False))
ty = pa.map_(pa.string(), pa.list_(pa.int32()))
arr_named = pa.array([[("string", [1, 2, 3])]], type=ty_named)
arr_named.type, arr_named.cast(ty).type

Before:
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from map<string, list<x: int32> ('x')> to map<string, list<item: int32>> (no available cast function for target type)

After:
(MapType(map<string, list<x: int32> ('x')>), MapType(map<string, list<item: int32>>))
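As a rough mental model only, casting a map to a list of structs keeps every entry, its order and the nulls, and changes only the names the entries are addressed by. This is not how Arrow implements it (the real cast reuses the existing child buffers instead of copying values); the function name and the dict-per-entry representation below are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation of a map array: one slot per row, each slot is
# None (a null map) or a sequence of (key, value) entries.
MapSlot = Optional[Sequence[Tuple[object, object]]]
StructSlot = Optional[List[Dict[str, object]]]


def cast_map_to_list_of_struct(
    slots: Sequence[MapSlot], key_name: str, value_name: str
) -> List[StructSlot]:
    """Model of map<K, V> -> list<struct<key_name: K, value_name: V>>.

    Values, entry order and nulls are preserved; only the field names used
    to address each entry come from the target type.
    """
    out: List[StructSlot] = []
    for entries in slots:
        if entries is None:
            out.append(None)  # a null map row stays a null list row
        else:
            out.append([{key_name: k, value_name: v} for k, v in entries])
    return out


rows = [[(1, 2), (2, 4)], None, []]
print(cast_map_to_list_of_struct(rows, "k", "v"))
# [[{'k': 1, 'v': 2}, {'k': 2, 'v': 4}], None, []]
```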
The failures seem to be related to Gandiva, not to changes in this PR.
cc @pitrou @jorisvandenbossche FYI: one of the motivations for this is making it easy to cast field names, so that the transition of Parquet to compliant nested types (ARROW-14196) is easier.
Nice improvement @wjones127. Here are some comments.
cpp/src/arrow/compute/cast.cc
@@ -96,8 +96,11 @@ class CastMetaFunction : public MetaFunction {
ARROW_ASSIGN_OR_RAISE(auto cast_options, ValidateOptions(options));
// args[0].type() could be a nullptr so check for that before
// we do anything with it.
// TODO: if types are equal except for field names of list types, we can
Do you want to open a JIRA for this TODO?
I'll be doing this in #13851
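The TODO concerns types that are identical except for the names of list fields. One way such a check could look is sketched below on a small hypothetical type model (these classes are illustrative, not Arrow's C++ `DataType` hierarchy, and nullability is left out): list and map child names are ignored, while struct field names must still match, since for structs the names carry meaning.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Prim:
    name: str  # e.g. "int32", "string"


@dataclass(frozen=True)
class Field:
    name: str
    type: "DataType"


@dataclass(frozen=True)
class ListType:
    value: Field


@dataclass(frozen=True)
class MapType:
    key: Field
    item: Field


@dataclass(frozen=True)
class StructType:
    fields: Tuple[Field, ...]


DataType = Union[Prim, ListType, MapType, StructType]


def equal_ignoring_names(a: DataType, b: DataType) -> bool:
    """True if a and b differ at most in list/map child field names."""
    if type(a) is not type(b):
        return False
    if isinstance(a, Prim):
        return a == b
    if isinstance(a, ListType):
        return equal_ignoring_names(a.value.type, b.value.type)
    if isinstance(a, MapType):
        return (equal_ignoring_names(a.key.type, b.key.type)
                and equal_ignoring_names(a.item.type, b.item.type))
    # Struct: same arity, same names, recursively compatible child types.
    return len(a.fields) == len(b.fields) and all(
        fa.name == fb.name and equal_ignoring_names(fa.type, fb.type)
        for fa, fb in zip(a.fields, b.fields)
    )


i32 = Prim("int32")
print(equal_ignoring_names(ListType(Field("x", i32)), ListType(Field("item", i32))))  # True
print(equal_ignoring_names(StructType((Field("a", i32),)), StructType((Field("b", i32),))))  # False
```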
Force-pushed from 96499b4 to 3a16782.
These CI failures are unrelated.
Thanks for the update. Just a couple more comments/questions about the tests.
Looks good, nice improvement!
One question: in general we allow renaming list fields while casting, but for structs the fields have to match (since for structs the names are much more important / meaningful than for lists).
Here, you also allow renaming the fields when casting to a struct. I think I'm fine with that, but I wanted to mention it explicitly, since it's a grey area between the two cases above (for structs the names are important, but for a map type they don't matter much, so the first cast to struct can ignore them?)
Yes, I was careful to make sure this only happens in the …
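The naming rule discussed in this thread can be summarized as a small decision function. This is a hypothetical sketch of the policy as described in the comments (struct field names must match in an ordinary struct cast, but the key/value names of a map's entries may be renamed when the map is cast to a list of structs); it is not code from the PR, and the function and `from_map_entries` flag are illustrative names.

```python
from typing import Sequence


def struct_names_compatible(
    src_names: Sequence[str],
    dst_names: Sequence[str],
    from_map_entries: bool,
) -> bool:
    """Hypothetical check for struct field names during a cast.

    src_names / dst_names: field names of the source and target structs.
    from_map_entries: True when the source struct is the key/value entry
    of a map being cast to list<struct>; those names are treated as
    positional, so renaming is allowed. Ordinary struct -> struct casts
    keep names strict.
    """
    if len(src_names) != len(dst_names):
        return False  # arity must always match in this sketch
    if from_map_entries:
        return True
    return list(src_names) == list(dst_names)


print(struct_names_compatible(["key", "value"], ["k", "v"], from_map_entries=True))  # True
print(struct_names_compatible(["a", "b"], ["x", "y"], from_map_entries=False))  # False
```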
@pitrou could you take another look?
Sorry for the delay @wjones127 .
Also, I forgot to mention that we should update the list of available casts here:
https://arrow.apache.org/docs/dev/cpp/compute.html#conversions
Force-pushed from be7bc90 to 58929e3.
Thanks a lot @wjones127 !
Benchmark runs are scheduled for baseline = f25d88e and contender = 69aad53. 69aad53 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
- Casting from MapType -> [MapType, list<struct>, large_list<struct>], with support for mapping the field names to new ones.
- A Datum::View() method to map an array to a new type.
- Use of Datum::View() in the early return of cast dispatching (if types are "equal"). Right now this is useful for mapping field names of map arrays, but it could also be used for list types once we have some way to check whether list types are equal except for their field names. See also: https://issues.apache.org/jira/browse/ARROW-14999
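The early-return idea in the last item can be modeled in a few lines. When source and target differ only in child field names, the physical layout is identical, so the result can reuse the input's buffers and carry only the new type. The sketch below is a pure-Python model under that assumption; `ArrayData`, `view`, `strip_names`, `cast` and the tuple encoding of types are hypothetical stand-ins, not the signature of Arrow's `Datum::View()`.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union

# Hypothetical type encoding: "int32" for primitives,
# ("list", field_name, value_type) and
# ("map", key_name, key_type, item_name, item_type) for nested types.
Type = Union[str, tuple]


def strip_names(t: Type) -> Type:
    """Replace every list/map child field name with None."""
    if isinstance(t, str):
        return t
    if t[0] == "list":
        return ("list", None, strip_names(t[2]))
    if t[0] == "map":
        return ("map", None, strip_names(t[2]), None, strip_names(t[4]))
    raise ValueError(f"unsupported type {t!r}")


@dataclass(frozen=True)
class ArrayData:
    type: Type
    buffers: Tuple[bytes, ...]  # physical memory; a view never copies it


def view(data: ArrayData, new_type: Type) -> ArrayData:
    """Zero-copy reinterpretation: new type, same buffer objects."""
    return replace(data, type=new_type)


def cast(data: ArrayData, target: Type) -> ArrayData:
    if data.type == target:
        return data
    if strip_names(data.type) == strip_names(target):
        return view(data, target)  # early return: only names differ
    raise NotImplementedError("would dispatch to a real cast kernel")


src = ArrayData(("map", "x", "int32", "value", "int32"), (b"\x00", b"\x01"))
out = cast(src, ("map", "key", "int32", "value", "int32"))
print(out.type)  # ('map', 'key', 'int32', 'value', 'int32')
print(out.buffers is src.buffers)  # True
```

The design point the model tries to capture is that the relabeling costs nothing proportional to the data: only the type metadata changes, which is why such an early return is attractive before falling back to a cast kernel.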