Safer serialization of ::SubArray{T,N,A<:Array} #14625
When dealing with SubArrays-of-Arrays, we currently make serialization more efficient by "trimming" any unused data. However, we were not careful to reconstruct the exact type, and when that type appears as a type parameter of another type, this can lead to problems. This PR reconstructs the type exactly, making the trimming strategy safe. Note this only works for SubArrays-of-Arrays; we still serialize the entire parent array for any other kind of SubArray.
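A minimal sketch of the situation described above, assuming a recent Julia 1.x where serialization lives in the `Serialization` standard library (at the time of this PR it was part of Base and the syntax differed). The `Wrapper` type and the variable names are hypothetical, chosen only to show a view's exact type appearing as a type parameter of another type.

```julia
using Serialization  # stdlib in Julia 1.x (assumption: not the 0.4-era Base API)

# Hypothetical container whose type parameter records the exact type of the view.
struct Wrapper{V<:AbstractArray}
    data::V
end

parent_array = rand(1000, 1000)          # large parent, 8 MB of Float64
v = view(parent_array, 1:10, 1:10)       # small SubArray of an Array
w = Wrapper(v)

# Round-trip through an in-memory buffer.
io = IOBuffer()
serialize(io, w)
bytes = take!(io)
w2 = deserialize(IOBuffer(bytes))

# The deserialized field must have exactly the type named in Wrapper's parameter,
# otherwise reconstructing the Wrapper could not succeed.
println(typeof(w2) == typeof(w))
println(w2.data == v)

# If the parent was trimmed, the payload is far smaller than the parent array
# and the reconstructed parent only holds the viewed region.
println("serialized bytes: ", length(bytes), " vs parent bytes: ", sizeof(parent_array))
println("size of reconstructed parent: ", size(parent(w2.data)))
```

The last two lines print sizes rather than asserting them, since whether trimming applies depends on the Julia version and on the kind of indices used to build the view.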
One potential issue for backporting: currently
I could see that interacting with parallel code in unfortunate ways. That code might be currently sort-of working but making copies rather than subarrays? Would have to think about a test case.
If you have an array and a SubArray of that array and serialize/deserialize both, I assume changing one no longer changes the other? While I can imagine cases where you don't want to serialize the whole array (e.g. splitting up a big array between workers), I'm not sure it's kosher for serialization to change semantics in this way...
@simonster, yes, serialization breaks the coupling. It's been this way for ages; this PR doesn't change anything there. Formerly, not only would it break the coupling, but upon deserialization you'd get an Array. We could alternatively consider dropping this optimization altogether and just serializing the whole parent array. Naturally that copies all the data in the parent. If what you're serializing is, e.g., a single image slice from an hour-long movie, you'll have to serialize the whole movie, and that's a pretty expensive operation. But now that we have SharedArrays, there are good ways around that problem. So, rather than trying to improve this method, it's a valid option to just delete it altogether.
Link to the current code: lines 193 to 201 in a6770b0 (this is what gives you an Array out when deserializing).
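The coupling question raised above can be checked with a short sketch. It assumes Julia 1.x with the `Serialization` standard library; `roundtrip` is a hypothetical helper, not part of any API. It serializes an array together with a view of it, then mutates the deserialized array to see whether the deserialized view follows.

```julia
using Serialization  # stdlib in Julia 1.x (assumption)

# Hypothetical helper: serialize to an in-memory buffer and read the value back.
function roundtrip(x)
    io = IOBuffer()
    serialize(io, x)
    seekstart(io)
    return deserialize(io)
end

A = zeros(Int, 4, 4)
v = view(A, 2:3, 2:3)

# Before serialization the view aliases its parent.
A[2, 2] = 7
println(v[1, 1])                 # the view sees the write to A

# Serialize both objects in one call, then mutate the deserialized array.
A2, v2 = roundtrip((A, v))
A2[2, 2] = 99
println(v2[1, 1])                # shows whether v2 followed the write to A2
println(parent(v2) === A2)       # shows whether v2 still shares memory with A2
```

If the view was trimmed during serialization, as described in this thread, the last two lines should show that `v2` kept its old value and no longer shares memory with `A2`.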
I think the current behavior is too surprising and we shouldn't try to trim the SubArrays, at least not by default. It would be nice to have some way to say that SubArrays should be trimmed when serialized (since e.g. SharedArrays won't work with worker processes on separate machines) but I'm not sure what that would look like.
Are you saying that because, secretly, you want some way of preserving the link between the parent and the subarray when both get serialized? If we're not going to be happy until that's true, then I suspect there's little point changing one without the other, and we should just wait until we come up with a way to do that. If we don't think we can preserve the link, then is it really worth serializing the whole parent? Keep in mind that in the future, most subarrays will be created as |
Just ran into this again. I'd like to get this bug fixed, and I confess I'm not exactly sure I understand where people sit on this issue. So to clarify:
I'd be a bit unhappy about the last one, because we currently have a bug that needs fixing and the last choice seems unlikely to be a quick fix.
This is definitely an improvement over the current behavior, so I think we should merge it.
OK, I'll merge this and file a separate issue to act as a reminder to reevaluate behavior.
I don't know whether there will be another 0.4 release, but I just fixed a local bug by backporting this.
Why wouldn't there be? Could you describe the bug a bit? I was hesitant to backport this because it's a bit of a behavior change.
Fair enough. Let's take off the label, then.
When dealing with SubArrays-of-Arrays, we currently make serialization more efficient by "trimming" any unused data. However, we were not careful to reconstruct the exact type, and when the type was listed as a type-parameter in another type this can lead to problems.
Demo of failure on current master:
Results:
This PR reconstructs the type exactly, making the trimming strategy safe. (And look ma, no generated functions!)
Note this only affects SubArrays-of-Arrays; as before, we still serialize the entire parent array for any other kind of SubArray.