-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Java collections in GenericDataFile to fix Kryo serialization #546
Conversation
Thanks for the fix, let me try it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
Seems like there are some unused imports in |
private static <E> List<E> copyList(List<E> toCopy) { | ||
List<E> copy = Lists.newArrayListWithExpectedSize(toCopy.size()); | ||
copy.addAll(toCopy); | ||
return copy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenericDataFile
also wraps the list into Collections.unmodifiableList
. Does it make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is necessary. In the long term, we should move these back to using ImmutableList. Right now, we just need to fix the serialization problem. There are other mutable instances being passed in from Metrics as well. We can fix all that later, but unblock the release now.
Assert.assertEquals("Should match the serialized record upper bounds", | ||
dataFile.upperBounds(), result.upperBounds()); | ||
Assert.assertEquals("Should match the serialized record key metadata", | ||
dataFile.keyMetadata(), result.keyMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we set something in keyMetadata
? There is some custom serialization logic to handle this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is empty.
Kryo can't handle ByteBuffers, but Spark's Kryo can. That's why this test uses Kryo provided by Spark. That's how this field is handled.
Thanks @aokolnychyi! |
Kryo can't handle guava ImmutableList and ImmutableMap. This commit avoids using those classes in DataFile. The trade-off is that DataFile instances that are copied more than once will not reuse the same immutable metrics maps.
Fixes #446