-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative merge algorithm without the builder #78
Conversation
@jpivarski , the code in akimbo.io includes, for now, both the previous The merge function takes a trivial amount of time compared to the other things going on in the join function, so I am not sure if this is actually any faster at all. Am I missing something? |
That There is an ak.numba.GrowableBuffer that you can use instead of over-allocating and trimming. (Does trim really release the memory for other objects, or is it just a slice that holds a reference to the original allocation as its In principle, a numba.typed.List of numbers is the same kind of thing as a GrowableBuffer, and presumably this is better optimized than our GrowableBuffer. It could be because our GrowableBuffer passes around two array objects, which are separately reference-counted, while Numba's typed list would have only one reference count. In both the Numba-LayoutBuilder project and yours, we're seeing the dynamic ArrayBuilder being faster than it's supposed to be. It's even accessed through external pointers (no possibility for LLVM to optimize), so I really don't know what's going on there. It shouldn't be too hard to beat, if there aren't other bottlenecks elsewhere, hiding the performance of this part. |
Yes it does: it must own the memory and the memory must be contiguous. You would pass a flag asserting that no other reference points to the same array (although python references would be fine and updated in place). |
do you think there's any point in this PR, then? At scale, the over-allocation would be costly, so I would need one of the growable implementations anyway. |
I don't think there's anything wrong with the PR. In principle, one should be able to do better than ArrayBuilder. I've always taken it as some issue hiding in the implementation of As an alternative to the alternative, can the merge be implemented by creating "tags" and "index" arrays of integers using Numba and then pass them into ak.contents.UnionArray.simplified to mix them appropriately? The return value of this function is not necessarily a union array; it tries to homogenize types. |
Maybe next time! I think this is nice for now. |
No description provided.