
Remove unnecessary indexing in Chunk combinators #2630

Merged · 7 commits · Sep 23, 2021

Conversation

@mpilquist (Member) commented Sep 22, 2021:

Fixes #2603 and #2627. Also addresses the issue described here: circe/circe-fs2#273 (cc @samspills)

The Chunk.Queue type provides the ability to efficiently create a single chunk out of any number of individual chunks. In 2.x, Chunk.Queue was not itself a subtype of Chunk. After various individual chunks had been accumulated into a Chunk.Queue, the queue had to be converted to a single chunk via toChunk. This operation returned a single array-backed chunk. Unfortunately, it relied on reflection and various tricks to build the array-backed chunk without access to a ClassTag at the call site, and was ultimately unsound. We fixed this in #2181.

One of the main changes in #2181 was making Chunk.Queue a subtype of Chunk. Doing so meant that Chunk could no longer guarantee O(1) lookup by index -- rather, lookup became O(number of chunks in the chunk queue). When a chunk queue consists only of singleton chunks, that becomes O(n). We added a Chunk#compact operation which builds a chunk backed by an array, potentially a primitive array.
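To see why a queue of chunks cannot promise O(1) indexed access, here is a minimal self-contained sketch (not fs2's actual code -- `QueueChunk` is a hypothetical stand-in using `Vector` for the sub-chunks). `apply` must first walk the queue to find the sub-chunk containing the requested index:

```scala
// Hypothetical sketch of a chunk backed by a queue of sub-chunks.
final case class QueueChunk[O](chunks: Vector[Vector[O]]) {
  val size: Int = chunks.map(_.size).sum

  // O(number of sub-chunks) in the worst case: when every sub-chunk is a
  // singleton, a single lookup degrades to O(size).
  def apply(i: Int): O = {
    var remaining = i
    var c = 0
    while (remaining >= chunks(c).size) {
      remaining -= chunks(c).size
      c += 1
    }
    chunks(c)(remaining)
  }
}
```

A combinator that indexes every element of such a chunk therefore pays the queue walk once per element, which is the O(n^2) behavior this PR is eliminating.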

As a result of #2181, the various combinators that used to call Chunk.Queue#toChunk no longer had to, yielding a significant performance improvement since they no longer copied all the accumulated elements to a new array-backed chunk. This tradeoff normally pays off, but not when the O(n) lookup ends up dominating. The performance regressions reported in #2603 and #2627 are examples of such cases.

So why not add a call to .compact in the implementation of operations like chunkN? In short, we can't: we don't have a ClassTag[O] in those cases, so we can't create primitive arrays. That would put us right back at the issue #2181 was addressing -- unsoundness as a result of overly clever reflection.
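The ClassTag constraint can be illustrated with a free-standing sketch (`toArrayBacked` is a hypothetical helper, not fs2 API). Allocating an array whose runtime type matches `O` -- e.g. an unboxed `Array[Int]` -- is only possible when a `ClassTag[O]` is in scope; inside a generic combinator like `chunkN`, no such tag is available:

```scala
import scala.reflect.ClassTag

// Sketch: compacting a sequence of chunks into one array-backed chunk
// requires a ClassTag to allocate the correctly-typed (possibly primitive)
// array. Without the tag, the allocation below does not compile.
def toArrayBacked[O: ClassTag](chunks: List[List[O]]): Array[O] = {
  val out = new Array[O](chunks.map(_.size).sum) // needs ClassTag[O]
  var i = 0
  chunks.foreach(_.foreach { o => out(i) = o; i += 1 })
  out
}
```

Calling `toArrayBacked(List(List(1, 2), List(3)))` produces a primitive `Array[Int]`; the pre-#2181 code tried to recover this array type reflectively without the tag, which is what proved unsound.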

This PR takes a different approach: we keep the worst-case O(n) lookup and leave chunkN/unconsN as-is. Instead, we reimplement all the combinators on Chunk to avoid indexed lookup. If a user wants O(1) indexed lookup, they must call compact.
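The reimplementation strategy can be sketched as follows (hypothetical code -- `takeVia` stands in for combinators like `Chunk#take`, and plain `Vector`s stand in for chunks). The combinator is written against a single linear traversal via `iterator` rather than repeated indexed `apply` calls:

```scala
// Sketch: a take-like combinator over a queue of sub-chunks that never
// performs indexed lookup. One flattened iterator gives a single linear
// pass, so the cost is O(elements consumed) regardless of how fragmented
// the underlying queue is.
def takeVia[O](chunks: Vector[Vector[O]], n: Int): Vector[O] = {
  val b = Vector.newBuilder[O]
  var remaining = n
  val it = chunks.iterator.flatMap(_.iterator)
  while (remaining > 0 && it.hasNext) {
    b += it.next()
    remaining -= 1
  }
  b.result()
}
```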

This PR also adds a new compactUntagged operation which stores the elements in an Array[Any] (boxed).
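The idea behind the new operation can be sketched like this (a free-standing illustration, not fs2's implementation): when no ClassTag is available, everything can still be copied into an `Array[Any]`, restoring O(1) indexed lookup at the cost of keeping the elements boxed:

```scala
// Sketch of the compactUntagged idea: no ClassTag needed because the
// backing array is Array[Any]. Elements stay boxed, but indexed access
// over the result is O(1).
def compactUntagged[O](chunks: List[List[O]]): Array[Any] = {
  val out = new Array[Any](chunks.map(_.size).sum)
  var i = 0
  chunks.foreach(_.foreach { o => out(i) = o; i += 1 })
  out
}
```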

@mpilquist changed the title from "Remove unnecessary indexing in Chunk combinators and add new IndexedChunk type" to "Remove unnecessary indexing in Chunk combinators" on Sep 23, 2021.
@mpilquist marked this pull request as ready for review on September 23, 2021 at 12:54.
```scala
@@ -583,33 +524,7 @@ object Chunk
    if (v.isEmpty) empty
    else if (v.size == 1) // Use size instead of tail.isEmpty as vectors know their size
      singleton(v.head)
    else new VectorChunk(v)

  private final class VectorChunk[O](v: Vector[O]) extends Chunk[O] {
```
@mpilquist (Member, Author) commented on this diff:
Identical to IndexedSeqChunk

@SystemFw (Collaborator) commented Sep 23, 2021:

So if I understand correctly, by using foreach in the combinators instead of apply directly, we can override foreach in Chunk.Queue to just traverse linearly once and keep the combinators O(N) rather than O(N^2), right?

@mpilquist (Member, Author) commented:

> So if I understand correctly, by using foreach in the combinators instead of apply directly, we can override foreach in Chunk.Queue to just traverse linearly once and keep the combinators O(N) rather than O(N^2) right?

Yep, that's right -- likewise for iterator
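The point discussed above can be sketched with a toy hierarchy (hypothetical names -- `MiniChunk`, `MiniQueue`, and `Leaf` are stand-ins for fs2's `Chunk` and `Chunk.Queue`): the queue overrides `foreach` to visit each sub-chunk once, so any combinator built on `foreach` stays linear in the total element count instead of paying a queue walk per element:

```scala
// Toy model of the Chunk / Chunk.Queue relationship.
trait MiniChunk[O] {
  def foreach(f: O => Unit): Unit
}

// A leaf chunk traverses its own elements directly.
final class Leaf[O](os: List[O]) extends MiniChunk[O] {
  def foreach(f: O => Unit): Unit = os.foreach(f)
}

// The queue's override visits each sub-chunk exactly once, so a full
// traversal is O(total elements) -- no per-element index resolution.
final class MiniQueue[O](chunks: List[MiniChunk[O]]) extends MiniChunk[O] {
  def foreach(f: O => Unit): Unit = chunks.foreach(_.foreach(f))
}
```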

@SystemFw (Collaborator) left a review:

I think this is a good idea for our internal code. We might still want to explore a better apply for client code, but that's an orthogonal concern, so I'd go ahead with this.

Successfully merging this pull request may close these issues:

- performance degradation when using chunkN