Add Spark custom Kryo registrator #549
Conversation
build.gradle (Outdated)
@@ -429,6 +429,8 @@ project(':iceberg-spark') {
    compile project(':iceberg-parquet')
    compile project(':iceberg-hive')

+   compile 'de.javakaffee:kryo-serializers'
I think this should be included in iceberg-spark-runtime, which means that we will need to make sure we add it to the shaded LICENSE and NOTICE.
Added additional LICENSE and NOTICE.
      .build();

  private SparkConf conf = new SparkConf()
      .set("spark.kryo.registrator", SparkKryoRegistrator.class.getCanonicalName());
Can we add the registration to Spark's KryoSerializer from IcebergSource? It would be nice if users didn't have to add config properties.
Like the idea. Let me explore. Might have to instantiate the global SparkEnv.
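A hypothetical shape for that exploration (purely a sketch: SparkEnv.get() can return null before the driver environment is up, and KryoSerializer.newKryo() builds a fresh instance per call, so registrations made on one instance would not stick, which is why the spark.kryo.registrator hook exists):

import com.esotericsoftware.kryo.Kryo;
import de.javakaffee.kryoserializers.guava.ImmutableMapSerializer;
import org.apache.spark.SparkEnv;
import org.apache.spark.serializer.KryoSerializer;
import org.apache.spark.serializer.Serializer;

public class RegistrationProbe {
  // Hypothetical probe: reach the active Kryo serializer from IcebergSource.
  static void tryRegister() {
    SparkEnv env = SparkEnv.get();
    if (env == null) {
      return; // no active Spark environment yet
    }
    Serializer serializer = env.serializer();
    if (serializer instanceof KryoSerializer) {
      // newKryo() returns a fresh Kryo each call; registrations here are
      // not seen by instances Spark creates later.
      Kryo kryo = ((KryoSerializer) serializer).newKryo();
      ImmutableMapSerializer.registerSerializers(kryo);
    }
  }
}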
versions.lock (Outdated)
@@ -5,8 +5,10 @@ aopalliance:aopalliance:1.0 (1 constraints: 170a83ac)
asm:asm:3.1 (2 constraints: 4f19c3c6)
com.carrotsearch:hppc:0.7.2 (1 constraints: f70cda14)
com.clearspring.analytics:stream:2.7.0 (1 constraints: 1a0dd136)
+com.esotericsoftware:kryo:4.0.2 (1 constraints: 720da324)
Is there a way to substitute kryo with kryo-shaded?

I took a quick stab at Gradle dependency substitution but couldn't get it going because --write-locks still output kryo, not kryo-shaded.
Force-pushed from 9b0b7ff to 4928991.
Rebase after merging #546. TODO:

This PR will also unblock #553 as we need to handle
Force-pushed from 869ea31 to 93eafaa.
Spark KryoSerializer calls Twitter's chill library.
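For background, Spark builds its Kryo instances through chill, roughly like this (a paraphrase from memory, not Spark's exact source):

import com.esotericsoftware.kryo.Kryo;
import com.twitter.chill.AllScalaRegistrar;
import com.twitter.chill.EmptyScalaKryoInstantiator;

public class ChillSketch {
  // Approximation of how Spark's KryoSerializer obtains a Kryo instance:
  static Kryo newKryoLikeSpark() {
    Kryo kryo = new EmptyScalaKryoInstantiator().newKryo();
    // chill registers serializers for common Scala classes; Spark layers
    // its own registrations and any user registrator on top.
    new AllScalaRegistrar().apply(kryo);
    return kryo;
  }
}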
I think we want to avoid requiring a custom registrator for Spark to work with Kryo. Instead, we will need to use non-Guava collections in places that may be serialized with Kryo. @jzhuge, @aokolnychyi, should we close this or is there still value in having a custom registrator even if we try to avoid needing it?
Ok to close.
Are we going to replace Guava maps with non-Guava ones? There are bunches of places using Guava's ImmutableMap. For example, there are 4 Guava maps in class TableMetadata. We hit this issue when we were trying to broadcast one of them.
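A minimal repro of that broadcast failure might look like this (hypothetical sketch; class and app names are made up, and in local mode the driver-side cache can mask the error, so it typically surfaces on a real cluster):

import com.google.common.collect.ImmutableMap;
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastRepro {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setMaster("local")
        .setAppName("guava-kryo-repro")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      Broadcast<ImmutableMap<String, Integer>> b =
          sc.broadcast(ImmutableMap.of("a", 1));
      // Executors deserialize the broadcast with Kryo; without serializers
      // registered for Guava collections, this is where it blows up.
      sc.parallelize(Arrays.asList(1, 2, 3))
          .map(x -> x + b.value().get("a"))
          .collect();
    }
  }
}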
What version of kryo do you use? Newer versions support ImmutableMap.
Thanks for the information. Nice to see you here, @jzhuge. We are using kryo-shaded-4.0.2.jar in Spark. Which version has the change in https://github.com/magro/kryo-serializers/blob/master/src/main/java/de/javakaffee/kryoserializers/guava/ImmutableMapSerializer.java?
Oops, I referenced the wrong repo. If possible, you can add the following code to Spark's KryoSerializer. Be sure to add a dependency on de.javakaffee:kryo-serializers.
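Presumably the suggestion was along these lines (a sketch based on the linked ImmutableMapSerializer; the exact code isn't shown here):

import com.esotericsoftware.kryo.Kryo;
import de.javakaffee.kryoserializers.guava.ImmutableMapSerializer;

public class GuavaKryoRegistration {
  // Presumed registration to run wherever Spark configures its Kryo instances:
  static void registerGuavaSerializers(Kryo kryo) {
    ImmutableMapSerializer.registerSerializers(kryo);
  }
}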
Hi @jzhuge, not sure I understand. Do I add these lines to my Spark application or Spark itself?
Spark itself, if you can.
Kryo can't handle Guava ImmutableList and ImmutableMap. This commit adds a Spark custom Kryo registrator for these classes used by GenericDataFile.

Unit test TestKryoSerialization is adapted from the one in #546. A downstream project works fine after adding the spark.kryo.registrator conf shown in the snippet above.
Fixes #446.
This PR replaces #546, which may be merged first before the release.