Question about performance tuning #72

Closed
BaekGeunYoung opened this issue Jun 8, 2023 · 7 comments
Labels
question Further information is requested

Comments

@BaekGeunYoung

Hello, I'm trying to migrate my project from akka-cluster to shardcake.

I finished migrating my code, and I'm comparing the overall performance of akka & shardcake, using K6.

I expected no big difference in performance between the two libraries, but performance with akka appears to be about 4-5x better than with shardcake.

Is there any configuration I should adjust further, or is this a structural limitation of shardcake?

K6 test result

shardcake version:
image

akka version:
image

Infrastructure and configuration for shardcake

  • Deployed using k8s
  • Used single Redis for storage
  • number of pods: 5
  • numberOfShards: 50
  • sendTimeout: 60 seconds
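
For reference, the two shardcake settings in the list above would be set roughly as follows. This is a minimal sketch assuming shardcake 2.x's `Config.default` and ZIO 2; the value name `shardingConfig` is made up for illustration, and every other field keeps its default.

```scala
import com.devsisters.shardcake.Config
import zio._

// Assumption: only the two values from the list above are overridden;
// all other fields of Config keep whatever Config.default provides.
val shardingConfig: ULayer[Config] =
  ZLayer.succeed(
    Config.default.copy(
      numberOfShards = 50,      // "numberOfShards: 50"
      sendTimeout = 60.seconds  // "sendTimeout: 60 seconds"
    )
  )
```

The Shard Manager is configured separately, and its shard count presumably needs to agree with the value used by the pods.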
BaekGeunYoung changed the title from "Question about performance" to "Question about performance tuning" on Jun 8, 2023
@ghostdogpr
Collaborator

In our case, both in load tests and production, performance was pretty much the same between Akka and Shardcake. So it may come from other factors/changes but it's hard to tell without knowing the code. Out of curiosity, were you using zio with akka?
I would try to use profiling or tracing to find where the bottleneck is.

@BaekGeunYoung
Author

BaekGeunYoung commented Jun 8, 2023

Wasn't there any bottleneck in pod-to-pod communication over gRPC? I would expect remote actor communication to be much faster in akka, since it uses plain TCP.

@ghostdogpr
Collaborator

gRPC is pretty fast, though not as fast as direct TCP. If you only measure the transport and your actors do nothing else, you might see a difference but in our case of a real world actor that does stuff, the difference was not significant. We had the exact same latency and throughput before and after.

Note that if transport-layer speed is critical for you, you can provide your own implementation of the Pods interface instead of using GrpcPods.

@BaekGeunYoung
Author

@ghostdogpr I've traced my RPC, and the result is below:

image

The span shardcake-actor-service-execute is created when calling Messenger.send and finishes after receiving the Response from the entity. Its child spans are created after the entity receives the command from outside. As you can see, the durations of the children are trivial, and "something" occupies the dominant portion of the shardcake-actor-service-execute span.
I think this is time spent waiting in a queue or something like that. Do you have any idea what causes this latency, or how to reduce it?
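
One way to narrow down where the unexplained time goes is to split the parent span into the part before the entity starts working, the part inside the entity, and the part after it finishes. The sketch below is plain Scala with a hypothetical, simplified `Span` type (not any tracing library's API), and it assumes the clocks of the pods involved are reasonably synchronized.

```scala
object SpanGap {
  // Hypothetical, simplified span: start and end as epoch milliseconds.
  final case class Span(name: String, startMs: Long, endMs: Long) {
    def durationMs: Long = endMs - startMs
  }

  // Time spent before the first child starts, between first child start
  // and last child end, and after the last child ends.
  final case class Breakdown(beforeEntityMs: Long, insideEntityMs: Long, afterEntityMs: Long)

  // Returns None when there are no child spans to compare against.
  def breakdown(parent: Span, children: Seq[Span]): Option[Breakdown] =
    if (children.isEmpty) None
    else {
      val firstStart = children.map(_.startMs).min
      val lastEnd    = children.map(_.endMs).max
      Some(
        Breakdown(
          beforeEntityMs = firstStart - parent.startMs,
          insideEntityMs = lastEnd - firstStart,
          afterEntityMs = parent.endMs - lastEnd
        )
      )
    }
}
```

A large `beforeEntityMs` would point at the request path (serialization, transport, queueing before the entity picks up the command), while a large `afterEntityMs` would point at the response path.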

@ghostdogpr
Collaborator

Hmm that definitely looks wrong.
Is this real production code? Any chance you could share it?
I would look at a CPU profile (if CPU usage is high) and at the threads (to see whether threads are blocked or whether something blocking runs on the ZIO async thread pool).
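
A quick first look at thread states does not need a profiler. The sketch below uses only the JVM's `Thread.getAllStackTraces` and Scala 2.13's collection converters; the object and method names are made up for illustration, and the thread-name prefix to pass in is an assumption (ZIO 2 worker threads are typically named `ZScheduler-Worker-N`, but check the names in your own thread dump).

```scala
import scala.jdk.CollectionConverters._

object ThreadCensus {
  // Counts live JVM threads per state, restricted to names starting with namePrefix.
  def stateCounts(namePrefix: String): Map[Thread.State, Int] =
    Thread.getAllStackTraces().keySet.asScala.toList
      .filter(_.getName.startsWith(namePrefix))
      .groupBy(_.getState)
      .map { case (state, threads) => state -> threads.size }

  // Top stack frame of every matching thread, to see what each one is doing.
  def topFrames(namePrefix: String): Map[String, String] =
    Thread.getAllStackTraces().asScala.toMap.collect {
      case (thread, frames) if thread.getName.startsWith(namePrefix) && frames.nonEmpty =>
        thread.getName -> frames(0).toString
    }
}
```

Calling `ThreadCensus.topFrames("ZScheduler-Worker")` a few times under load and seeing the same JDBC, file, or `Thread.sleep` frame on the worker threads would suggest that blocking work is running on the async pool. In ZIO 2, wrapping such calls in `ZIO.attemptBlocking` moves them to the blocking pool.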

ghostdogpr added the "question" (Further information is requested) label on Jul 7, 2023
@BaekGeunYoung
Author

I think #73 might have solved my issue. After applying 2.0.6+11-e9d97295-SNAPSHOT, the performance got much better! I'll close this issue. thanks!

@ghostdogpr
Collaborator

> I think #73 might have solved my issue. After applying 2.0.6+11-e9d97295-SNAPSHOT, the performance got much better! I'll close this issue. thanks!

Ohh that was it? I discovered recently with our own load test with zio 2 that zio-grpc had a severe performance issue, and that it was fixed in the latest snapshot. We're using zio 1 in prod so I wasn't aware of this at the time you sent the first message. Glad that it's resolved!


2 participants