
ZIO2 performance issue while performing streaming calls on version 0.6.x #513

Closed
cipriansofronia opened this issue Jun 15, 2023 · 15 comments


@cipriansofronia
Contributor

We recently migrated to ZIO2, so zio-grpc got bumped to 0.6.0-rc5. We've noticed some performance issues after the upgrade while performing streaming calls. I'm attaching some screenshots from the IntelliJ Profiler.

In the upper half of the screenshot is the timeline of the previous zio-grpc version (0.5.1) running on ZIO1, and below it is zio-grpc 0.6.0-rc5 on ZIO2.
The streaming request is kept alive for a few seconds in both versions, but there are clear differences between the two. On 0.5 there is a small spike when the request is triggered, then the CPU drops and stays low; on 0.6 the CPU load is higher and stays that way for the duration of the streaming request (not shown in the screenshots, but if we trigger more requests the CPU goes even higher).
There are no other changes in our service, only the migration to ZIO2 and the zio-grpc bump, so we are not doing anything extra on the new version for the same streaming call. The only difference I noticed is that 0.6 allocates more resources while performing serverStreamingWithBackpressure, which underneath calls ZIO2 internals. Using ZIO2 streaming in other parts of the app has so far not shown any performance issues.

[Screenshots: IntelliJ Profiler CPU timelines from 2023-06-15 comparing zio-grpc 0.5.1 on ZIO1 with 0.6.0-rc5 on ZIO2]

We isolated the streaming call even further: we created a separate project with zio-grpc 0.6 where we only do:

```scala
ZStream(<zio_grpc_generated_class>.defaultInstance)
  .repeat(Schedule.spaced(10.second))
```

So, nothing CPU-intensive, yet we saw the same behaviour.
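For reference, a self-contained version of that repro looks roughly like this (a sketch only: `Foo` is a hypothetical stand-in for the generated message class, and only the stream is shown, not the gRPC wiring):

```scala
import zio._
import zio.stream._

object Repro {
  // Hypothetical stand-in for the generated message class used in the repro.
  final case class Foo()

  // Emits one element immediately, then repeats the one-element stream every
  // 10 seconds -- essentially no work, yet the 0.6.0-rc5 server kept the CPU
  // busy for as long as this stream was open.
  val responses: UStream[Foo] =
    ZStream(Foo()).repeat(Schedule.spaced(10.seconds))
}
```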

Please let me know if more information is needed.

@regiskuckaertz
Contributor

Have you tried increasing the backpressure queue size? The default is very small, and that may explain this behaviour. IIRC we use a queue size of 512k in our service.
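For intuition, here is a rough sketch in plain ZIO of how a bounded queue acts as a backpressure boundary (illustrative only, not zio-grpc's internals; all names are made up). With a small capacity, the producer suspends on `offer` far more often:

```scala
import zio._
import zio.stream._

// Illustrative only: a bounded queue between a fast producer and a slow
// consumer. Comparing a capacity of 16 with a much larger one shows how
// often the producer has to suspend, which is roughly the effect the
// backpressure queue size has in a streaming server.
object BackpressureSketch extends ZIOAppDefault {
  def run =
    for {
      queue    <- Queue.bounded[Int](16) // try 16 vs. a much larger capacity
      producer <- ZStream
                    .iterate(0)(_ + 1)
                    .mapZIO(n => queue.offer(n)) // suspends while the queue is full
                    .runDrain
                    .fork
      _        <- ZStream
                    .fromQueue(queue)
                    .mapZIO(n => ZIO.sleep(1.millis).as(n)) // deliberately slow consumer
                    .take(1000)
                    .runDrain
      _        <- producer.interrupt
    } yield ()
}
```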

@cipriansofronia
Contributor Author

No, I did not, will give it a try! Thanks! 🙌🏻

@cipriansofronia
Contributor Author

cipriansofronia commented Jun 16, 2023

Unfortunately, it did not help. At first I was not sure the config was actually reaching that backpressure queue, but after debugging the service I could see the new value I had set, and sadly there was no change in CPU usage.
I was able to reproduce the issue with the example service (helloworld) provided in the zio-grpc repo as well; I can push my changes to my fork if that helps.

@Gregory-Berkman-Imprivata

We are running into a very similar performance problem. We are not using streaming, but we notice that as we receive gRPC requests over time, the number of FiberId$Runtime instances continuously increases and never drops. Eventually performance degrades significantly and Kubernetes kills the node.
For unary requests we can see that a forkDaemon is being called (also called for streaming requests) here:


Is the new fiber correctly being released?

@regiskuckaertz
Contributor

@cipriansofronia hello - could you give this one a try: #514? It may have been a mistake to use toQueueOfElements, but I also wonder whether calling isReady that much is contributing to it. The grpc-java internal buffer is currently fixed at 32 KB, so it may kick in very often depending on the workload. Another road to explore would be to use the stream observer instead; I'll look into that later. See grpc/grpc-java#5433
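For context, the observer-driven approach from grpc/grpc-java#5433 reacts to readiness via setOnReadyHandler rather than repeatedly polling isReady. A rough sketch in raw grpc-java (not zio-grpc's implementation; `pull` is a hypothetical callback that produces the next response, or None when the stream is finished):

```scala
import io.grpc.stub.{ServerCallStreamObserver, StreamObserver}

object OnReadySketch {
  def drainWhenReady[A](observer: StreamObserver[A], pull: () => Option[A]): Unit = {
    val call = observer.asInstanceOf[ServerCallStreamObserver[A]]
    call.setOnReadyHandler(new Runnable {
      def run(): Unit = {
        var continue = true
        // Push only while the transport buffer has room, instead of polling
        // isReady in a loop.
        while (continue && call.isReady) {
          pull() match {
            case Some(a) => call.onNext(a)
            case None    => call.onCompleted(); continue = false
          }
        }
      }
    })
  }
}
```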

@Gregory-Berkman-Imprivata that looks like a separate issue, but you are right that we should instead fork in a scope that is closed when the call terminates.
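To illustrate the difference (a sketch only, not zio-grpc's actual code): a fiber started with forkDaemon keeps running and stays referenced after the request that forked it, whereas a fiber started with forkScoped inside ZIO.scoped is interrupted when the scope closes, e.g. when the call terminates.

```scala
import zio._

object ScopingSketch {
  // A daemon fiber keeps running (and stays referenced) until it finishes on
  // its own, regardless of what happens to the request that forked it.
  def handleWithDaemon(work: UIO[Unit]): UIO[Unit] =
    work.forkDaemon.unit

  // A scoped fiber is tied to the surrounding Scope and is interrupted when
  // that scope closes.
  def handleWithScope(work: UIO[Unit]): UIO[Unit] =
    ZIO.scoped {
      work.forkScoped *>      // fiber lives only inside this scope
        ZIO.sleep(1.second)   // stand-in for "the call is in flight"
    }                         // closing the scope interrupts the fiber if still running
}
```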

@cipriansofronia
Contributor Author

@regiskuckaertz, I published your changes from #514 locally and tested them with the helloworld example, and I can confirm that this version no longer stresses the CPU: there is a small spike when the request is made, but then it drops to almost 0 while the request is still active and the stream is emitting elements. I tried performing multiple requests at the same time and the CPU stayed the same. It performs well even with the default queue size of 16. Thank you for looking into it! 🙏🏻
[Screenshots: IntelliJ Profiler CPU timelines from 2023-06-17 with the #514 changes applied, showing near-zero CPU while the streaming request is active]
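For anyone repeating this kind of test, the local setup looks roughly like the following (a sketch; the artifact coordinates are taken from memory of the zio-grpc docs, and the locally published version string is whatever `sbt publishLocal` reports in the zio-grpc checkout):

```scala
// In a zio-grpc checkout of the PR branch, publish the artifacts locally:
//   git fetch origin pull/514/head:pr-514 && git checkout pr-514
//   sbt publishLocal   // note the version it prints
//
// Then, in the test project's build.sbt, point at that locally published
// version (placeholder below; fill in what publishLocal reported):
val zioGrpcVersion = "<locally-published-version>"

libraryDependencies +=
  "com.thesamet.scalapb.zio-grpc" %% "zio-grpc-core" % zioGrpcVersion
```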

@regiskuckaertz
Contributor

Yiihaaa! That is great to hear, thanks for trying it out. It's also weird to come back to something you wrote months ago and think "wow was I on the crack pipe back then? this is way too complex" 😄

@regiskuckaertz
Contributor

@Gregory-Berkman-Imprivata I think this will help #515

@cipriansofronia
Contributor Author

cipriansofronia commented Jun 19, 2023

@thesamet thank you for merging these PRs; unfortunately, it appears there is an issue publishing the snapshots.
edit: Actually, it appears the snapshot was published here, so I'm not sure what that error is about.

@cipriansofronia
Contributor Author

cipriansofronia commented Jun 19, 2023

@thesamet, @regiskuckaertz, could you release another RC with these changes, please?

@thesamet
Contributor

Sure, will cut a release this week.

@cipriansofronia
Contributor Author

@regiskuckaertz and @thesamet, I really appreciate your help and fast replies. The issue is solved now, so I'm closing it. Cheers!

@ghostdogpr
Contributor

We did a round of load testing using RC5 (which reproduced this issue; performance was bad) and then using the latest snapshot, and the latest snapshot gave us great performance (better than our ZIO 1 code!). How about a first official release for ZIO 2, finally? 😄

@Gregory-Berkman-Imprivata

Gregory-Berkman-Imprivata commented Jul 28, 2023

> We are running into a very similar performance problem. We are not using streaming, but we notice that as we receive gRPC requests over time, the number of FiberId$Runtime instances continuously increases and never drops. Eventually performance degrades significantly and Kubernetes kills the node. For unary requests we can see that a forkDaemon is being called (also called for streaming requests) here:
>
> Is the new fiber correctly being released?

@regiskuckaertz I'm not sure this has actually been fixed. I can open a new issue for this, but I am still seeing the number of FiberId$Runtime instances increasing continuously.

This is from running load tests locally on my machine; the number of FiberId instances only increases:

```
/service $ jmap -histo 1 | grep FiberId
   8:         86715        2774880  zio.FiberId$Runtime
```
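One way to corroborate this from inside the app (a sketch, not part of zio-grpc; the workload passed to `probe` is a hypothetical stand-in for the real server under load) is to run the main effect under a tracking Supervisor and log how many fibers it currently supervises:

```scala
import zio._

object FiberCountProbe extends ZIOAppDefault {
  // Runs `app` under a tracking Supervisor and periodically logs the number
  // of fibers the supervisor currently knows about.
  def probe[R, E, A](app: ZIO[R, E, A]): ZIO[R, E, A] =
    for {
      supervisor <- Supervisor.track(true) // weakly referenced tracking
      reporter   <- supervisor.value
                      .flatMap(fibers => ZIO.logInfo(s"live fibers: ${fibers.size}"))
                      .repeat(Schedule.spaced(10.seconds))
                      .forkDaemon
      result     <- app.supervised(supervisor).ensuring(reporter.interrupt)
    } yield result

  // Hypothetical workload standing in for the real gRPC server under load.
  def run = probe(ZIO.sleep(1.minute))
}
```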

@ghostdogpr
Contributor

I think you can close this issue, since this is unrelated to streaming. Let's discuss it in #537.

@thesamet thesamet closed this as completed Aug 1, 2023