-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TRD-1749: Escape send queued when blocked on connection side #18
TRD-1749: Escape send queued when blocked on connection side #18
Conversation
session.go
Outdated
s.messageOut <- msg | ||
s.stateTimer.Reset(s.HeartBtInt) | ||
select { | ||
case <-time.After(5 * time.Millisecond): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 1ms
2024/01/30 18:55:18 >>>>>>>>>>> INBOUND STATS <<<<<<<<<<<
2024/01/30 18:55:18 NumCPU: 10
2024/01/30 18:55:18 GOMAXPROCS: 10
2024/01/30 18:55:18 Sample mean is 3744.244753299986 us
2024/01/30 18:55:18 Sample max is 57088.334 us (4482)
2024/01/30 18:55:18 Standard Dev is 8178.456956331603 us
2024/01/30 18:55:18 Processed 10000 msg in 5.38205325s [effective rate: 1858.0270 msg/s]
2024/01/30 18:55:18 ----------- INBOUND STATS -----------
2024/01/30 18:55:18 >>>>>>>>>>> OUTBOUND STATS <<<<<<<<<<<
2024/01/30 18:55:18 NumCPU: 10
2024/01/30 18:55:18 GOMAXPROCS: 10
2024/01/30 18:55:18 Sample mean is 3753.0946988000005 us
2024/01/30 18:55:18 Sample max is 69729.459 us (1917)
2024/01/30 18:55:18 Standard Dev is 9290.683106959043 us
2024/01/30 18:55:18 Processed 10000 msg in 5.387390125s [effective rate: 1856.1863 msg/s]
2024/01/30 18:55:18 ----------- OUTBOUND STATS -----------
With 5ms
2024/01/30 18:57:57 >>>>>>>>>>> INBOUND STATS <<<<<<<<<<<
2024/01/30 18:57:57 NumCPU: 10
2024/01/30 18:57:57 GOMAXPROCS: 10
2024/01/30 18:57:57 Sample mean is 13627.323359899992 us
2024/01/30 18:57:57 Sample max is 179786.5 us (3812)
2024/01/30 18:57:57 Standard Dev is 28109.224029722398 us
2024/01/30 18:57:57 Processed 10000 msg in 20.271821667s [effective rate: 493.2956 msg/s]
2024/01/30 18:57:57 ----------- INBOUND STATS -----------
2024/01/30 18:57:57 >>>>>>>>>>> OUTBOUND STATS <<<<<<<<<<<
2024/01/30 18:57:57 NumCPU: 10
2024/01/30 18:57:57 GOMAXPROCS: 10
2024/01/30 18:57:57 Sample mean is 14487.111722499989 us
2024/01/30 18:57:57 Sample max is 254143.25 us (1879)
2024/01/30 18:57:57 Standard Dev is 35900.6777926075 us
2024/01/30 18:57:57 Processed 10000 msg in 20.272896125s [effective rate: 493.2694 msg/s]
2024/01/30 18:57:57 ----------- OUTBOUND STATS -----------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this locally and can see that the lesser the value, the better the throughput. Wondering how you choose 5ms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why even wait 1ms? how's the throughput if you just have a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh ok I get your point. I avoided default because I think that as long as the connection is blocked this loop will keep the CPU very busy, since every time we try to send and then we trigger again the messageEvent
.
Though it makes sense. So I will use default instead of some timer.
Addressed: 21384e6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, tested with modified quickfixgo/perf
session.go
Outdated
s.messageOut <- msg | ||
s.stateTimer.Reset(s.HeartBtInt) | ||
select { | ||
case <-time.After(5 * time.Millisecond): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while time.After
is a nice convenience method, it shouldn't be used in hotpaths where you call it a many times as internally it creates a timer but never stops it, so you should rather just create a timer yourself so you can stop it if you end up returning before it fires.
it is also documented as such: https://pkg.go.dev/time#After
(although I also commented above that I think maybe there shouldn't be a timer at all and just use a default?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed: 21384e6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
s.stateTimer.Reset(s.HeartBtInt) | ||
return true | ||
default: | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation: with this change, sometimes the benchmark on the outbound side stops responding. I'll capture the stuck goroutine and share later. But when it worked this is the benchmark result with the change
2024/01/31 09:54:40 >>>>>>>>>>> OUTBOUND STATS <<<<<<<<<<<
2024/01/31 09:54:40 NumCPU: 10
2024/01/31 09:54:40 GOMAXPROCS: 10
2024/01/31 09:54:40 Sample mean is 127.8728962000003 us
2024/01/31 09:54:40 Sample max is 2632.417 us (204)
2024/01/31 09:54:40 Standard Dev is 181.40784398506725 us
2024/01/31 09:54:40 Processed 10000 msg in 628.1435ms [effective rate: 15919.9291 msg/s]
2024/01/31 09:54:40 ----------- OUTBOUND STATS -----------
2024/01/31 09:54:40 >>>>>>>>>>> INBOUND STATS <<<<<<<<<<<
2024/01/31 09:54:40 NumCPU: 10
2024/01/31 09:54:40 GOMAXPROCS: 10
2024/01/31 09:54:40 Sample mean is 19.812591999999984 us
2024/01/31 09:54:40 Sample max is 941.417 us (1231)
2024/01/31 09:54:40 Standard Dev is 35.00415617401569 us
2024/01/31 09:54:40 Processed 10000 msg in 622.807375ms [effective rate: 16056.3288 msg/s]
2024/01/31 09:54:40 ----------- INBOUND STATS -----------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also curious about adding in GC runtime settings as part of the benchmark..will try
@AlexandrosKyriakakis can you also share a comparison like |
@raja Here is the one from Pyroscope |
Issue
Observations
session.sendMutex
So, if there are many messages in the slice
toSend
, session from our side is blocked until all the messages are sent.At the same time, if halfway (when half of the messages in
toSend
have been sent) the other side cannot accept more messages (maybe cause of a buffer overflow) in order to be unblocked, we need to consume some of the incoming messages, which with the current implementation cannot happen.Proposed solution
When sending queued messages, if any of the messages take more than 5ms to be sent, then we escape the
toSend
loop and leave the rest of the messages for the next session loop.