
Remove unnecessary indexing in Chunk combinators #2630

Merged · 7 commits · Sep 23, 2021

Conversation

@mpilquist (Member) commented Sep 22, 2021:

Fixes #2603 and #2627. Also addresses the issue described here: circe/circe-fs2#273 (cc @samspills)

The Chunk.Queue type provides the ability to efficiently create a single chunk out of any number of individual chunks. In 2.x, Chunk.Queue was not itself a subtype of Chunk. After various individual chunks had been accumulated into a Chunk.Queue, the queue had to be converted to a single chunk via toChunk. This operation returned a single array-backed chunk. Unfortunately, it relied on reflection and various tricks to build the array-backed chunk without access to a ClassTag at the call site, and was ultimately unsound. We fixed this in #2181.

One of the main changes in #2181 was making Chunk.Queue a subtype of Chunk. Doing so meant that Chunk could no longer guarantee O(1) lookup by index -- rather, lookup became O(number of chunks in the chunk queue). When a chunk queue consists only of singleton chunks, that becomes O(n). We added a Chunk#compact operation which builds a chunk backed by an array, potentially a primitive array.
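To see why a queue of chunks cannot promise O(1) indexed access, here is a minimal self-contained sketch (not fs2's actual code -- `QueueChunk` is a hypothetical stand-in using `Vector` for the sub-chunks). `apply` must first walk the queue to find the sub-chunk containing the requested index:

```scala
// Hypothetical sketch of a chunk backed by a queue of sub-chunks.
final case class QueueChunk[O](chunks: Vector[Vector[O]]) {
  val size: Int = chunks.map(_.size).sum

  // O(number of sub-chunks) in the worst case: when every sub-chunk is a
  // singleton, a single lookup degrades to O(size).
  def apply(i: Int): O = {
    var remaining = i
    var c = 0
    while (remaining >= chunks(c).size) {
      remaining -= chunks(c).size
      c += 1
    }
    chunks(c)(remaining)
  }
}
```

A combinator that indexes every element of such a chunk therefore pays the queue walk once per element, which is the O(n^2) behavior this PR is eliminating.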

As a result of #2181, the various combinators that used to call Chunk.Queue#toChunk no longer had to, yielding a significant performance improvement since they no longer copied all the accumulated elements to a new array-backed chunk. This tradeoff normally pays off, but not when the O(n) lookup ends up dominating. The performance regressions reported in #2603 and #2627 are examples of such cases.

So why not add a call to .compact in the implementation of operations like chunkN? In short, we can't: we don't have a ClassTag[O] in those cases, so we can't create primitive arrays. That would put us right back at the issue #2181 was addressing -- unsoundness as a result of overly clever reflection.
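The ClassTag constraint can be illustrated with a free-standing sketch (`toArrayBacked` is a hypothetical helper, not fs2 API). Allocating an array whose runtime type matches `O` -- e.g. an unboxed `Array[Int]` -- is only possible when a `ClassTag[O]` is in scope; inside a generic combinator like `chunkN`, no such tag is available:

```scala
import scala.reflect.ClassTag

// Sketch: compacting a sequence of chunks into one array-backed chunk
// requires a ClassTag to allocate the correctly-typed (possibly primitive)
// array. Without the tag, the allocation below does not compile.
def toArrayBacked[O: ClassTag](chunks: List[List[O]]): Array[O] = {
  val out = new Array[O](chunks.map(_.size).sum) // needs ClassTag[O]
  var i = 0
  chunks.foreach(_.foreach { o => out(i) = o; i += 1 })
  out
}
```

Calling `toArrayBacked(List(List(1, 2), List(3)))` produces a primitive `Array[Int]`; the pre-#2181 code tried to recover this array type reflectively without the tag, which is what proved unsound.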

This PR takes a different approach: we keep the worst-case O(n) lookup and leave chunkN/unconsN as-is. Instead, we reimplement all the combinators on Chunk to avoid indexed lookup. If a user wants O(1) indexed lookup, they must call compact.
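The reimplementation strategy can be sketched as follows (hypothetical code -- `takeVia` stands in for combinators like `Chunk#take`, and plain `Vector`s stand in for chunks). The combinator is written against a single linear traversal via `iterator` rather than repeated indexed `apply` calls:

```scala
// Sketch: a take-like combinator over a queue of sub-chunks that never
// performs indexed lookup. One flattened iterator gives a single linear
// pass, so the cost is O(elements consumed) regardless of how fragmented
// the underlying queue is.
def takeVia[O](chunks: Vector[Vector[O]], n: Int): Vector[O] = {
  val b = Vector.newBuilder[O]
  var remaining = n
  val it = chunks.iterator.flatMap(_.iterator)
  while (remaining > 0 && it.hasNext) {
    b += it.next()
    remaining -= 1
  }
  b.result()
}
```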

This PR also adds a new compactUntagged operation which stores the elements in an Array[Any] (boxed).
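The idea behind the new operation can be sketched like this (a free-standing illustration, not fs2's implementation): when no ClassTag is available, everything can still be copied into an `Array[Any]`, restoring O(1) indexed lookup at the cost of keeping the elements boxed:

```scala
// Sketch of the compactUntagged idea: no ClassTag needed because the
// backing array is Array[Any]. Elements stay boxed, but indexed access
// over the result is O(1).
def compactUntagged[O](chunks: List[List[O]]): Array[Any] = {
  val out = new Array[Any](chunks.map(_.size).sum)
  var i = 0
  chunks.foreach(_.foreach { o => out(i) = o; i += 1 })
  out
}
```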

@mpilquist changed the title from "Remove unnecessary indexing in Chunk combinators and add new IndexedChunk type" to "Remove unnecessary indexing in Chunk combinators" on Sep 23, 2021.
@mpilquist marked this pull request as ready for review on September 23, 2021 at 12:54.
```scala
@@ -583,33 +524,7 @@ object Chunk
    if (v.isEmpty) empty
    else if (v.size == 1) // Use size instead of tail.isEmpty as vectors know their size
      singleton(v.head)
    else new VectorChunk(v)

  private final class VectorChunk[O](v: Vector[O]) extends Chunk[O] {
```
@mpilquist (Member, Author) commented on this diff:
Identical to IndexedSeqChunk

@SystemFw (Collaborator) commented Sep 23, 2021:

So if I understand correctly, by using foreach in the combinators instead of apply directly, we can override foreach in Chunk.Queue to just traverse linearly once and keep the combinators O(N) rather than O(N^2), right?

@mpilquist (Member, Author) commented:

> So if I understand correctly, by using foreach in the combinators instead of apply directly, we can override foreach in Chunk.Queue to just traverse linearly once and keep the combinators O(N) rather than O(N^2) right?

Yep, that's right -- likewise for iterator
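The point discussed above can be sketched with a toy hierarchy (hypothetical names -- `MiniChunk`, `MiniQueue`, and `Leaf` are stand-ins for fs2's `Chunk` and `Chunk.Queue`): the queue overrides `foreach` to visit each sub-chunk once, so any combinator built on `foreach` stays linear in the total element count instead of paying a queue walk per element:

```scala
// Toy model of the Chunk / Chunk.Queue relationship.
trait MiniChunk[O] {
  def foreach(f: O => Unit): Unit
}

// A leaf chunk traverses its own elements directly.
final class Leaf[O](os: List[O]) extends MiniChunk[O] {
  def foreach(f: O => Unit): Unit = os.foreach(f)
}

// The queue's override visits each sub-chunk exactly once, so a full
// traversal is O(total elements) -- no per-element index resolution.
final class MiniQueue[O](chunks: List[MiniChunk[O]]) extends MiniChunk[O] {
  def foreach(f: O => Unit): Unit = chunks.foreach(_.foreach(f))
}
```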

@SystemFw (Collaborator) left a review:

I think this is a good idea for our internal code. We might still want to explore a better apply for client code, but that's an orthogonal concern, so I'd go ahead with this.

Successfully merging this pull request may close these issues:

- performance degradation when using chunkN