Spark ReadTask is expensive to serialize #553
Comments
I can confirm the issue is resolved if we avoid serializing
As a short-term solution, we can broadcast @rdblue thoughts?
Using a broadcast sounds good to me for now. Can you open a PR for this?
Will open a PR today.
In some Spark jobs, we see a substantial scheduler delay. I assume it happens in TaskSetManager when Spark serializes IcebergReadTask. The latter contains a couple of large strings (if you have a lot of columns) and an instance of FileIO (which can contain a full Hadoop conf).
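To make the cost described above concrete, here is a minimal sketch of the general idea, not Iceberg or Spark code. It assumes Python's pickle as a stand-in for Spark's task serialization, and every name in it (fat_task, slim_task, the column count, the conf keys, the file paths) is hypothetical. It compares embedding a large schema string and a configuration map in every task against serializing that shared state once and letting each task carry only a small id, which is roughly what a broadcast does.

```python
import pickle

# Made-up sizes for illustration: a wide table and a large configuration.
schema = ",".join(f"col_{i}: string" for i in range(2000))
conf = {f"hadoop.key.{i}": f"value-{i}" for i in range(1000)}
paths = [f"/data/part-{i:05d}.parquet" for i in range(500)]


def fat_task(path):
    """Hypothetical task that embeds the schema and conf directly."""
    return {"path": path, "schema": schema, "conf": conf}


def slim_task(path, shared_id):
    """Hypothetical task that only references shared state by id."""
    return {"path": path, "shared_id": shared_id}


# Per-task serialization: schema and conf are serialized again for every task.
fat_bytes = sum(len(pickle.dumps(fat_task(p))) for p in paths)

# Broadcast-style: shared state is serialized once, tasks stay small.
shared_bytes = len(pickle.dumps((schema, conf)))
slim_bytes = shared_bytes + sum(len(pickle.dumps(slim_task(p, 0))) for p in paths)

print(f"embedded in every task: {fat_bytes:,} bytes")
print(f"shared state sent once: {slim_bytes:,} bytes")
```

The exact numbers depend on the made-up sizes, but the embedded variant grows with the number of tasks times the size of the shared state, while the broadcast-style variant pays for the shared state only once. In Spark itself, the analogous mechanism is a broadcast variable created on the driver.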