UDP service listener performance enhancements #4676
@@ -98,11 +101,13 @@ func (s *Service) Open() (err error) {
s.Logger.Printf("Failed to set up UDP listener at address %s: %s", s.addr, err)
return err
}
s.conn.SetReadBuffer(s.config.ReadBuffer)
This is probably the most significant change in this entire PR: increasing the read buffer size dramatically improves the ability to handle large bursts of UDP traffic. We are defaulting to 8 MB. The previous default was whatever the OS was configured for, which is usually ~128 KB.
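For readers following along, here is a minimal standalone sketch of the idea in the comment above: bind a UDP socket, then request a larger kernel receive buffer with `SetReadBuffer`. The function name `openUDP` and the constant `defaultReadBuffer` are hypothetical and are not taken from the PR; only `net.ListenUDP` and `(*net.UDPConn).SetReadBuffer` are real standard library calls.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// defaultReadBuffer is the 8 MB default mentioned in the comment above.
// The constant name is an assumption made for this sketch.
const defaultReadBuffer = 8 * 1024 * 1024

// openUDP binds a UDP socket on addr and asks the kernel for a receive
// buffer of readBuffer bytes. A failed buffer request is logged rather than
// treated as fatal, since the socket still works with the OS default.
func openUDP(addr string, readBuffer int) (*net.UDPConn, error) {
	udpAddr, err := net.ResolveUDPAddr("udp", addr)
	if err != nil {
		return nil, fmt.Errorf("resolve %s: %w", addr, err)
	}
	conn, err := net.ListenUDP("udp", udpAddr)
	if err != nil {
		return nil, fmt.Errorf("listen %s: %w", addr, err)
	}
	if readBuffer > 0 {
		if err := conn.SetReadBuffer(readBuffer); err != nil {
			log.Printf("could not set UDP read buffer to %d bytes: %s", readBuffer, err)
		}
	}
	return conn, nil
}

func main() {
	conn, err := openUDP("127.0.0.1:0", defaultReadBuffer)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	fmt.Println("listening on", conn.LocalAddr())
}
```

Note that the kernel may grant less than what was requested. On Linux, for example, the request is capped by `net.core.rmem_max`, so the OS limit may also need raising for an 8 MB buffer to take effect.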
Change for the UDP service listener. This does 2 things:
@@ -64,6 +66,7 @@ func NewService(c Config) *Service {
return &Service{
config: d,
done: make(chan struct{}),
bytebuf: make(chan []byte, 1000),
Where does this magic number come from?
This needs a better name. parserChan perhaps.
+1 to parserChan
I really don't have any particular reason to set it to 1000. In my testing I rarely saw the length of this channel get beyond 10. So I thought I'd just go up two orders of magnitude, since 1000 isn't going to consume much memory at all and should allow it to handle very high loads with smaller packets. I will make it a constant.
OK
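A rough sketch of what this thread converges on: a buffered channel, sized by a named constant, that decouples the reader from a separate parser goroutine. The names `parserChanSize` and `process` are hypothetical, and the slice of packets stands in for the UDP read loop, so this illustrates the pattern rather than the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// parserChanSize replaces the bare 1000 from the diff. The name is an
// assumption made for this sketch.
const parserChanSize = 1000

// process pushes every packet through a buffered channel to a single parser
// goroutine and returns the number of packets that were parsed.
func process(packets [][]byte, parse func([]byte)) int {
	parserChan := make(chan []byte, parserChanSize)
	var wg sync.WaitGroup
	parsed := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		for buf := range parserChan {
			parse(buf)
			parsed++
		}
	}()

	// The producer side stands in for the UDP read loop.
	for _, p := range packets {
		parserChan <- p // blocks only when the channel is full
	}
	close(parserChan)
	wg.Wait() // after this, reading parsed is safe
	return parsed
}

func main() {
	packets := [][]byte{
		[]byte("cpu value=1"),
		[]byte("cpu value=2"),
		[]byte("cpu value=3"),
	}
	n := process(packets, func(b []byte) {})
	fmt.Println("parsed", n)
}
```

With a buffered channel the reader can return to the socket immediately after handing off a packet, which is the point of the split: slow parsing no longer delays the next read until the channel itself fills up.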
OK, this generally looks good but I have some feedback before we can merge. We need to remove the magic number. What testing have you actually performed to show that this improves performance?
I would also like us to wait for feedback from the others CC'd on this ticket before we merge (wait 24-48 hours anyway) since they have all been good enough to help us with UDP performance.
If you SIGINT influxd you'll get an error message. Fixing it requires changing the order you close stuff, adding a timeout to the UDP read, and doing a continue in the UDP read loop if you get a timeout error. That will allow you to stop the read loop cleanly. It's not a big deal, but I can see the message causing confusion.
@otoolep Re: performance testing: I've been doing local testing with 64 KB packet sizes. When bursting very large numbers of UDP packets (100,000 metrics in one batch, so 100s of 64 KB packets within nanoseconds) I would see ~20% loss on the old listener. After splitting out I wrote some test scripts at various burst sizes; I can send you them if you want for reference.
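The test scripts mentioned above are not shown in this thread. As a stand-in, here is a simplified sketch of a burst test on the loopback interface. It is an assumption-laden illustration, not the author's script: the function name `burst` is hypothetical, it uses 1 KB datagrams instead of 64 KB ones, and it sends the whole burst before reading anything so that only the kernel receive buffer absorbs the load.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// burst sends count datagrams of size bytes to a local listener as fast as
// possible and returns how many the listener actually received.
func burst(count, size, readBuffer int) (int, error) {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if readBuffer > 0 {
		if err := conn.SetReadBuffer(readBuffer); err != nil {
			return 0, err
		}
	}

	sender, err := net.DialUDP("udp", nil, conn.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return 0, err
	}
	defer sender.Close()

	// Send the whole burst before reading, so the kernel receive buffer
	// has to hold everything that is not dropped.
	payload := make([]byte, size)
	for i := 0; i < count; i++ {
		if _, err := sender.Write(payload); err != nil {
			return 0, err
		}
	}

	// Drain until the socket has been quiet for 200ms.
	received := 0
	buf := make([]byte, 65536)
	for {
		conn.SetReadDeadline(time.Now().Add(200 * time.Millisecond))
		if _, _, err := conn.ReadFromUDP(buf); err != nil {
			break
		}
		received++
	}
	return received, nil
}

func main() {
	const count = 2000
	for _, rb := range []int{128 * 1024, 8 * 1024 * 1024} {
		got, err := burst(count, 1024, rb)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("read buffer %d bytes: received %d of %d\n", rb, got, count)
	}
}
```

Results depend heavily on the OS limits and the machine, so the numbers from a run like this are only useful for comparing buffer sizes on the same host.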
When I last tested Go's network code, adding that timeout on the read had a significant impact on performance. I would not add it without testing. It could be a distinct PR.
Sounds great @sparrc -- sounds like you may have cracked it. What you were seeing sounds rather like what others were seeing.
+1 on increasing the read buffer size. It made my UDP service much more performant as well. I also stumbled upon
@nkatsaros I think that issue with the SIGINT is not specific to this change though, correct? We could fix that in a separate PR. I also wonder if simply changing the ordering of these lines would fix that: https://github.com/influxdb/influxdb/blob/master/services/udp/service.go#L178-L180
No, it wasn't specific to this change and it wasn't related to performance but I figured I'd mention it. I don't think changing the order of those lines would fix it unless you were actively receiving UDP packets.
+1 on a distinct PR for that.
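For whoever picks up that separate PR, the shutdown approach described earlier in the thread (a read timeout, a `continue` on timeout errors, and closing the socket only after the loop has stopped) might look roughly like the sketch below. The names `serve` and `runAndStop` and the 100 ms deadline are assumptions made for illustration. As noted above, the deadline may cost performance, so it would need benchmarking before being adopted.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"time"
)

// serve reads datagrams until done is closed. The read deadline makes
// ReadFromUDP return periodically, so the loop can notice the shutdown
// signal instead of blocking forever on an idle socket.
func serve(conn *net.UDPConn, done <-chan struct{}, handle func([]byte)) error {
	buf := make([]byte, 65536)
	for {
		select {
		case <-done:
			return nil
		default:
		}

		if err := conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond)); err != nil {
			return err
		}
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			var nerr net.Error
			if errors.As(err, &nerr) && nerr.Timeout() {
				continue // idle socket: loop around and re-check done
			}
			return err
		}
		// Copy before handing off, since buf is reused on the next read.
		pkt := make([]byte, n)
		copy(pkt, buf[:n])
		handle(pkt)
	}
}

// runAndStop starts serve on a loopback socket, signals shutdown, and
// reports whether the loop exited on its own within one second.
func runAndStop() error {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	defer conn.Close() // closed only after the read loop has stopped

	done := make(chan struct{})
	exited := make(chan error, 1)
	go func() { exited <- serve(conn, done, func([]byte) {}) }()

	close(done)
	select {
	case err := <-exited:
		return err
	case <-time.After(time.Second):
		return errors.New("read loop did not stop")
	}
}

func main() {
	if err := runAndStop(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("read loop stopped cleanly")
}
```

Because the loop checks `done` after every timeout, it exits even when no packets are arriving, which is the case that reordering the close calls alone would not cover.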
Out of curiosity, is the collectd input plugin derived from this same UDP listener code? Back when we were still using the collectd input, I was aware that InfluxDB was simply using the OS default receive buffer size. I don't remember exactly which version this was, but we experimented with increasing the OS defaults, and confirmed that InfluxDB was inheriting these. I think we pushed it as high as 32 MB, which is starting to get a bit silly. The buffer still ultimately filled up, however, and the kernel started to drop packets. Bear in mind, however, that this was long before the tsm1 engine, when we still had major IOPS issues. I suspect the DB simply wasn't keeping up, and the buffer bloat eventually overflowed.
@dswarbrick, that is true. @otoolep opened a case to increase the buffer size of those input plugins as well: #4678
849df98 to 471902d
Looks great @sparrc
+1