Core: add variant type support #11831
base: main
cc @rdblue, @RussellSpitzer, @flyrain and @JonasJ-ap. This adds the changes in core needed to support the variant type.
@@ -166,6 +169,10 @@ public static String toJson(Schema schema, boolean pretty) {

private static Type typeFromJson(JsonNode json) {
  if (json.isTextual()) {
    if (VARIANT.equalsIgnoreCase(json.asText())) {
I think fromPrimitiveString should handle this.
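A minimal sketch of that suggestion, assuming a single string parser that recognizes "variant" (case-insensitively, as the diff's equalsIgnoreCase check does) before falling back to primitive names, so typeFromJson would need no special case of its own. TypeStrings, PrimitiveType, VariantType, the small name table, and fromTypeString are hypothetical stand-ins, not Iceberg's classes; only the decimal pattern is copied from the Types diff in this PR.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, stripped-down stand-ins for a type hierarchy; not Iceberg's real API.
final class TypeStrings {
  interface Type {}

  /** A primitive type identified by a canonical name, e.g. "int" or "decimal(9, 2)". */
  record PrimitiveType(String name) implements Type {}

  /** Variant is not a primitive, but it is still named by a single string. */
  enum VariantType implements Type { INSTANCE }

  // Same decimal pattern as the one shown in the Types diff hunk.
  private static final Pattern DECIMAL =
      Pattern.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");

  // A small, illustrative subset of primitive names.
  private static final Map<String, PrimitiveType> PRIMITIVES = Map.of(
      "boolean", new PrimitiveType("boolean"),
      "int", new PrimitiveType("int"),
      "long", new PrimitiveType("long"),
      "string", new PrimitiveType("string"),
      "binary", new PrimitiveType("binary"));

  /** Parses any type expressed as a single string, including "variant". */
  static Type fromTypeString(String typeString) {
    String lower = typeString.trim().toLowerCase(Locale.ROOT);
    if (lower.equals("variant")) {
      return VariantType.INSTANCE;
    }
    PrimitiveType primitive = PRIMITIVES.get(lower);
    if (primitive != null) {
      return primitive;
    }
    Matcher decimal = DECIMAL.matcher(lower);
    if (decimal.matches()) {
      // Normalize precision and scale, e.g. "decimal( 09 ,2)" becomes "decimal(9, 2)".
      int precision = Integer.parseInt(decimal.group(1));
      int scale = Integer.parseInt(decimal.group(2));
      return new PrimitiveType("decimal(" + precision + ", " + scale + ")");
    }
    throw new IllegalArgumentException("Cannot parse type string: " + typeString);
  }
}
```

With one entry point, a JSON reader can pass every textual type node straight to the parser instead of checking for VARIANT first.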
core/src/test/java/org/apache/iceberg/TestMetadataUpdateParser.java
core/src/test/java/org/apache/iceberg/avro/TestAvroSchemaProjection.java
@aihuaxu This is a very important feature that will open up many more options for Iceberg. Thank you for your contribution!
@@ -61,6 +61,14 @@ private Types() {}
  private static final Pattern DECIMAL =
      Pattern.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");

  public static Type typeFromTypeString(String typeString) {
Variant isn't a primitive, but it is still better to reuse fromPrimitiveString. Callers are using this to parse type names, not to restrict parsing to only primitive types.
I also think that if we want an additional method to avoid confusion (i.e. "I don't want to use fromPrimitiveString because I need support for variant"), then we can add a synonym. In that case, there's probably a better name, like fromString. That's not great because it implies that it would support structs, maps, and lists, but so does typeFromTypeString, so fromString is at least more direct. Unfortunately, we don't have a word in Iceberg that means a type expressed in a single string (vs. a JSON-defined type).
I would probably just leave this as fromPrimitiveString and not worry about it.
Instead of adding a new typeFromTypeString(), I changed fromPrimitiveString() to fromTypeString() to avoid confusion. Let me know if that works for you.
I think fromPrimitiveString is a little misleading, so I prefer fromTypeString. But let me know and I can change it back.
It's a public API, and renaming it would cause an incompatibility. I've renamed it back to fromPrimitiveString().
Actually, this still introduces an incompatibility because the return type changes. Let me know if that works, or we could introduce a new fromTypeString() as I did initially.
old: method org.apache.iceberg.types.Type.PrimitiveType org.apache.iceberg.types.Types::fromPrimitiveString(java.lang.String)
new: method org.apache.iceberg.types.Type org.apache.iceberg.types.Types::fromPrimitiveString(java.lang.String)
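The break shown above happens because a JVM method descriptor includes the return type: widening PrimitiveType to Type stops source like `PrimitiveType t = fromPrimitiveString("int")` from compiling, and already-compiled callers fail to link. A minimal sketch of the compatible alternative, assuming the existing signature is kept and a new fromTypeString() is added next to it; CompatTypes and its nested types are hypothetical stand-ins, not Iceberg's classes, and the parser is deliberately simplified.

```java
import java.util.Locale;

// Hypothetical stand-ins for illustration; not Iceberg's real classes.
final class CompatTypes {
  interface Type {}

  record PrimitiveType(String name) implements Type {}

  enum VariantType implements Type { INSTANCE }

  /** New entry point: any single-string type name, including "variant". */
  static Type fromTypeString(String typeString) {
    String lower = typeString.trim().toLowerCase(Locale.ROOT);
    if (lower.equals("variant")) {
      return VariantType.INSTANCE;
    }
    // Simplified: treat every other name as a primitive without validating it.
    return new PrimitiveType(lower);
  }

  /** Existing signature kept unchanged, so old callers still compile and link. */
  static PrimitiveType fromPrimitiveString(String typeString) {
    Type type = fromTypeString(typeString);
    if (type instanceof PrimitiveType primitive) {
      return primitive;
    }
    throw new IllegalArgumentException("Not a primitive type: " + typeString);
  }
}
```

Delegating in this direction keeps a single parser while the old method keeps its narrower contract.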
I'd be +1 on fromTypeString.
@@ -56,6 +56,10 @@ class BuildAvroProjection extends AvroCustomOrderSchemaVisitor<Schema, Schema.Fi
  @Override
  @SuppressWarnings("checkstyle:CyclomaticComplexity")
  public Schema record(Schema record, List<String> names, Iterable<Schema.Field> schemaIterable) {
    if (current.isVariantType()) {
I think that the corresponding visit method should be updated to call a new visitor variant method. That will be cleaner.
The visitor should look for the variant logical type, so we will need to implement the logical type.
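A minimal sketch of that visitor shape, assuming a record schema tagged with a "variant" logical type name: the dispatcher checks the tag and calls a dedicated variant() hook instead of letting record() special-case it. The schema model (Node, Field, Kind) and the Describe visitor are hypothetical and do not use Avro's or Iceberg's API; the metadata/value bytes layout mirrors the test assertions in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical mini schema model and visitor; not Avro's or Iceberg's API.
final class VariantVisitorSketch {
  enum Kind { RECORD, BYTES, LONG }

  /** A schema node with an optional logical type name (null when absent). */
  record Node(Kind kind, String logicalType, List<Field> fields) {
    static Node primitive(Kind kind) {
      return new Node(kind, null, List.of());
    }
  }

  record Field(String name, Node schema) {}

  interface Visitor<T> {
    T record(Node record, List<String> names, List<T> fieldResults);

    T variant(Node variant); // dedicated hook instead of a special case inside record()

    T primitive(Node primitive);
  }

  static <T> T visit(Node schema, Visitor<T> visitor) {
    if (schema.kind() == Kind.RECORD) {
      // Check the logical type before treating the node as an ordinary record.
      if ("variant".equals(schema.logicalType())) {
        return visitor.variant(schema);
      }
      List<String> names = new ArrayList<>();
      List<T> results = new ArrayList<>();
      for (Field field : schema.fields()) {
        names.add(field.name());
        results.add(visit(field.schema(), visitor));
      }
      return visitor.record(schema, names, results);
    }
    return visitor.primitive(schema);
  }

  /** Example visitor that renders a schema as a short type string. */
  static final class Describe implements Visitor<String> {
    @Override
    public String record(Node record, List<String> names, List<String> fieldResults) {
      List<String> parts = new ArrayList<>();
      for (int i = 0; i < names.size(); i++) {
        parts.add(names.get(i) + ": " + fieldResults.get(i));
      }
      return "struct<" + String.join(", ", parts) + ">";
    }

    @Override
    public String variant(Node variant) {
      return "variant";
    }

    @Override
    public String primitive(Node primitive) {
      return primitive.kind().name().toLowerCase(Locale.ROOT);
    }
  }
}
```

Without the logical type tag, the same two-field record would be visited as an ordinary struct, which is why the reviewer ties the visitor change to implementing the logical type.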
You are right. This is a workaround that relies on the Iceberg variant type until the variant logical type is added in Avro. Let me add a comment for this.
core/src/test/java/org/apache/iceberg/TestMetadataUpdateParser.java
assertThat(variantSchema.getType()).isEqualTo(org.apache.avro.Schema.Type.RECORD);
assertThat(variantSchema.getFields().size()).isEqualTo(2);
assertThat(variantSchema.getField("metadata")).isNotNull();
assertThat(variantSchema.getField("value")).isNotNull();
Should we have stronger assertions on the types of these fields (they should be bytes)?
This adds the required changes in the API and core modules for Variant support, including:
Part of: #10392