Issue79 pickle 2 #81
Current coverage is 100% (diff: 100%)
@@            master    #81   diff @@
===================================
  Files            8      8
  Lines          398    416    +18
  Methods          0      0
  Messages         0      0
  Branches        88     93     +5
===================================
+ Hits           398    416    +18
  Misses           0      0
  Partials         0      0
|
This will bring a slight performance penalty to pickling (details at the end). This is not just theoretical for me: we pickle stuff at the end of every request. Obviously slower is better than not working, but I would suggest adding a special case for Python 3: add the special methods iff slots=True and (Python 2 or frozen=True). You can look in Benchmark example:
(pickling doesn't seem to work with just creating the class in the timeit invocation) |
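The benchmark itself did not survive in this thread, so the following is only a rough sketch of how such a comparison could be run, not the original measurement. The class names `Plain` and `WithState` and the helper `bench` are hypothetical. The classes are defined at module level because, as noted above, pickle needs to look them up by name.

```python
import pickle
import timeit


class Plain:
    """Slots class relying on pickle's native slots support (protocol >= 2)."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithState(Plain):
    """Same layout, plus explicit state methods (hand-written, not attrs' code)."""
    __slots__ = ()

    def __getstate__(self):
        # Return the slot values as a plain tuple.
        return tuple(getattr(self, name) for name in Plain.__slots__)

    def __setstate__(self, state):
        # Restore the slot values in the same order.
        for name, value in zip(Plain.__slots__, state):
            setattr(self, name, value)


def bench(obj, protocol=2, number=50000):
    """Seconds taken to pickle and unpickle `obj` `number` times."""
    return timeit.timeit(
        lambda: pickle.loads(pickle.dumps(obj, protocol)), number=number
    )


if __name__ == "__main__":
    for cls in (Plain, WithState):
        print(cls.__name__, round(bench(cls(1, 2)), 3))
```

Absolute timings depend on the interpreter and machine, so only the relative difference between the two classes is meaningful.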
It gets more complicated: pickle protocol 2 and up supports slots natively. Python 3.5 uses protocol 3 by default. Python 2 only supports protocols 0, 1 and 2, and defaults to 0. So do I:
Option 1 causes a performance downgrade with Python 3, and probably with Python 2 using protocol 2. Option 2 means Python 3 wouldn't work if pickle is used with protocol 0 or 1, and it probably slows down Python 2 if protocol 2 is used. Option 3 is more flexible, but could be more problematic, as a library could define attrs classes that are not compatible with your pickle protocol of choice. Option 4 is probably the best. It would work like option 2 by default, but it would be possible to say, at the beginning of your program, "I use protocol 2 even though I'm on Python 2, so don't add a __getstate__ that would slow me down." Most of the time, you wouldn't need to think about it. What do you think? |
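The protocol dependence can be observed without attrs at all. Below is a minimal sketch, assuming Python 3 and a hand-written slots class (`Point` and `reduce_or_error` are hypothetical names). It asks the object what it would hand to pickle under each protocol, via the standard `__reduce_ex__` hook.

```python
class Point:
    """Plain slots class with no __getstate__/__setstate__ of its own."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def reduce_or_error(obj, protocol):
    """Return what pickle would be given for `obj`, or the TypeError message."""
    try:
        return obj.__reduce_ex__(protocol)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


if __name__ == "__main__":
    p = Point(1, 2)
    for protocol in (0, 1, 2):
        # Protocols 0 and 1 refuse slots classes that lack __getstate__;
        # protocol 2 carries the slot values in the state tuple.
        print(protocol, reduce_or_error(p, protocol))
```

Under protocol 2 the third element of the result holds the slot values, which is the native support mentioned above. Under protocols 0 and 1 the call raises `TypeError` unless the class defines `__getstate__`.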
Before we consider it, is it possible, and if so, how convenient would it be for you and any other PySpark user to use protocol version 2 on Python 2.7? If it's no problem, we could just document the requirement. |
Great piece of research, btw. |
PySpark actually uses protocol 2. So it only had an issue because it tried to serialize @hynek what do you think? I don't want to force a performance penalty on everybody. A few more options:
|
My personal recommendations (identical to @mathieulongtin except for the third bullet):
Minimal performance impact (only for classes that are both slots and frozen, and these don't work without it anyway), and I think requiring pickle protocol >= 2 is not a great burden. The fact of the matter is, ordinary (i.e. non-attrs) slot classes require a custom Of course @hynek's is the last word. :) |
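To illustrate why a class that is both slots and frozen cannot be unpickled without help: pickle restores slot values with `setattr`, which a frozen class rejects. The sketch below is a hand-written stand-in, not the code attrs generates, and `FrozenPoint` is a hypothetical name. Its `__setstate__` bypasses the frozen `__setattr__` by calling `object.__setattr__` directly.

```python
import pickle


class FrozenPoint:
    """Hand-written stand-in for a slots=True, frozen=True class."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        raise AttributeError("FrozenPoint is immutable")

    def __getstate__(self):
        # Slot values as a tuple, in __slots__ order.
        return tuple(getattr(self, name) for name in self.__slots__)

    def __setstate__(self, state):
        # Restore without going through the frozen __setattr__.
        for name, value in zip(self.__slots__, state):
            object.__setattr__(self, name, value)


if __name__ == "__main__":
    copy = pickle.loads(pickle.dumps(FrozenPoint(1, 2), protocol=2))
    print(copy.x, copy.y)
```

Removing the two state methods from this class should make unpickling fail with the `AttributeError` raised by `__setattr__`, which is the case the recommendation targets.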
Requiring protocol 2 sounds good to me. |
I applied the last suggestion by @Tinche. I consider this complete now. It works fine with PySpark. |
I’ve made a few minor adjustments and merged it. Thanks to everyone involved! I don’t care about pickles myself so it’s great to have domain experts. :) |
Fix for issue #79. Add __getstate__ and __setstate__ to classes with slots=True, including attr.Attribute. Added test to make sure objects work with pickle.