Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Java collections in GenericDataFile to fix Kryo serialization #546

Merged
merged 2 commits into from
Oct 15, 2019

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Oct 14, 2019

Kryo can't handle guava ImmutableList and ImmutableMap. This commit avoids using those classes in DataFile. The trade-off is that DataFile instances that are copied more than once will not reuse the same immutable metrics maps.

Fixes #446

@jzhuge
Copy link
Member

jzhuge commented Oct 14, 2019

Thanks for the fix, let me try it.

Copy link
Contributor

@rdsr rdsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@jzhuge jzhuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@aokolnychyi
Copy link
Contributor

Seems like there are some unused imports in TestKryoSerialization.

private static <E> List<E> copyList(List<E> toCopy) {
List<E> copy = Lists.newArrayListWithExpectedSize(toCopy.size());
copy.addAll(toCopy);
return copy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenericDataFile also wraps the list into Collections.unmodifiableList. Does it make sense here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is necessary. In the long term, we should move these back to using ImmutableList. Right now, we just need to fix the serialization problem. There are other mutable instances being passed in from Metrics as well. We can fix all that later, but unblock the release now.

Assert.assertEquals("Should match the serialized record upper bounds",
dataFile.upperBounds(), result.upperBounds());
Assert.assertEquals("Should match the serialized record key metadata",
dataFile.keyMetadata(), result.keyMetadata());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we set something in keyMetadata? There is some custom serialization logic to handle this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is empty.

Kryo can't handle ByteBuffers, but Spark's Kryo can. That's why this test uses Kryo provided by Spark. That's how this field is handled.

@rdblue rdblue merged commit 1bfab96 into apache:master Oct 15, 2019
@rdblue
Copy link
Contributor Author

rdblue commented Oct 15, 2019

Thanks @aokolnychyi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KryoException when writing Iceberg tables in Spark
4 participants