Stream.fromIterator doesn't create proper chunks #2010

susuro · 2020-08-25T19:42:44Z

Stream.fromIterator fetches values one by one, effectively creating single-value chunks. This has a negative impact on performance, especially when working with primitive values.
The situation is even worse for Stream.fromBlockingIterator, because the context switch for each fetched value increases the performance penalty.

I think this could be improved with the following implementation of PartiallyAppliedFromIterator:

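private[fs2] final class PartiallyAppliedFromIterator[F[_]](
    private val dummy: Boolean
) extends AnyVal {
  // chunkSize defaults to 1, which keeps the current one-element-per-chunk behavior.
  def apply[A](iterator: Iterator[A], chunkSize: Int = 1)(implicit F: Sync[F]): Stream[F, A] = {
    // Pull up to chunkSize elements inside a single F.delay, so the effect is
    // suspended once per chunk rather than once per element.
    def getNextChunk(i: Iterator[A]): F[Option[(Chunk[A], Iterator[A])]] =
      F.delay {
        for (_ <- 1 to chunkSize if i.hasNext) yield i.next()
      }.map { s =>
        // An empty batch means the iterator is exhausted, which ends the stream.
        if (s.isEmpty) None else Some((Chunk.seq(s), i))
      }

    Stream.unfoldChunkEval(iterator)(getNextChunk)
  }
}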

Similarly with PartiallyAppliedFromBlockingIterator.

If you find this solution OK I would be happy to prepare and submit a PR.
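
The batching idea behind the proposal can be sketched without fs2, using only the Scala standard library. This is a minimal illustration, not the fs2 implementation: the object `IteratorBatching` and the helpers `nextBatch` and `batches` are hypothetical names introduced here, and a plain `Vector` stands in for an fs2 `Chunk`.

```scala
object IteratorBatching {
  // Hypothetical helper: pulls up to chunkSize elements from the iterator
  // in one tight loop. Returns an empty Vector once the iterator is exhausted.
  def nextBatch[A](it: Iterator[A], chunkSize: Int): Vector[A] = {
    val buf = Vector.newBuilder[A]
    var n = 0
    while (n < chunkSize && it.hasNext) {
      buf += it.next()
      n += 1
    }
    buf.result()
  }

  // Hypothetical helper: drains the iterator batch by batch.
  // An empty batch is treated as the end-of-input signal.
  def batches[A](it: Iterator[A], chunkSize: Int): List[Vector[A]] = {
    val out = List.newBuilder[Vector[A]]
    var batch = nextBatch(it, chunkSize)
    while (batch.nonEmpty) {
      out += batch
      batch = nextBatch(it, chunkSize)
    }
    out.result()
  }
}
```

With `chunkSize = 1` every batch holds a single element, which mirrors the single-value chunks described at the top of this issue; larger values amortize the per-batch cost over more elements.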


mpilquist · 2020-08-26T11:42:17Z

Any thoughts on Stream.fromIterator(iterator).chunkN(n) instead?

djspiewak · 2020-08-26T14:53:54Z

> Any thoughts on Stream.fromIterator(iterator).chunkN(n) instead?

It's more general, but it would incur the chunk management overhead. It's going to be significantly slower than batching next calls within a tight loop. I'm a little skeptical of using Iterator in performance-sensitive code to begin with, but if you are, then chunkN isn't going to solve the problem.

susuro · 2020-08-26T18:32:17Z

I haven't done proper performance testing, only a few tests within a CSV parsing library, so my claims below may not be fully correct. However, the processing time I measured for Stream.fromIterator(iterator).chunkN(chunkSize).flatMap(Stream.chunk) is about 2x shorter than for plain Stream.fromIterator(iterator), while the tight loop I proposed gives a 10x better result.
In the case of fromBlockingIterator, chunkN doesn't provide any significant improvement, while the tight loop executes 100x faster.

You're probably right that we should avoid using Iterator for performance-sensitive operations. But sometimes, in existing projects, an iterator may be the only interface you have available, and even if the code isn't really performance-sensitive, making it more efficient can still be valuable.

mpilquist · 2020-08-26T19:07:02Z

OK, let's move on to a PR then. Thanks for testing.


susuro · 2020-08-29T18:00:36Z

Solved with PR #2013.

susuro added a commit to susuro/fs2 that referenced this issue Aug 28, 2020

Create chunks in Stream.fromIterator (typelevel#2010)

12a4eeb

This was referenced Aug 28, 2020

Create chunks in Stream.fromIterator (#2010) #2013

Merged

Create chunks in Stream.fromIterator and Stream.fromBlockingIterator #2015

Merged

mpilquist added a commit that referenced this issue Aug 28, 2020

Merge pull request #2013 from susuro/iterator-chunk

c10bd88

Create chunks in Stream.fromIterator (#2010)

susuro closed this as completed Aug 29, 2020
