We currently send alerts at every eval interval, and we send them synchronously (see https://github.com/prometheus/alertmanager/blob/master/dispatch/dispatch.go#L371-L373). After the reader takes an alert from the channel, it does not read the next one until that alert has been successfully sent to every client. Sending alert messages to clients involves network I/O, which is slow, so the consumer cannot keep up with the producer and the channel saturates.
I therefore propose sending alerts asynchronously:
// current synchronous code at https://github.com/prometheus/alertmanager/blob/master/dispatch/dispatch.go#L371-L373
ag.flush(func(alerts ...*types.Alert) bool {
	return nf(ctx, alerts...)
})

// asynchronous code I propose:
go func() {
	ag.flush(func(alerts ...*types.Alert) bool {
		return nf(ctx, alerts...)
	})
}()
With this simple optimization, the time spent in runtime.mach_semaphore_signal decreases from 33.3% to 16.6%.
profile for current synchronous code
profile for asynchronous code I proposed:
One thing to watch for is that we currently send at most one notification per group at a time. If this is made asynchronous, it could exacerbate overload on the receiver. I think we should maintain the property that at most one notification attempt is ongoing for a group at once.
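To make that property concrete, here is a minimal sketch of one way an asynchronous flush could still allow only one notification attempt per group at a time. It is not Alertmanager code: `groupSender`, `tryFlushAsync`, and `wait` are hypothetical names invented for this illustration, and the sketch assumes that a skipped flush would simply be retried at the next interval.

```go
package main

import (
	"fmt"
	"sync"
)

// groupSender is a hypothetical stand-in for an aggregation group.
// It is NOT the Alertmanager type; it only tracks whether a
// notification attempt is currently running.
type groupSender struct {
	mu       sync.Mutex
	inFlight bool
	wg       sync.WaitGroup
}

// tryFlushAsync runs notify in a new goroutine unless a previous
// notification for this group is still running. It reports whether
// a goroutine was started.
func (g *groupSender) tryFlushAsync(notify func()) bool {
	g.mu.Lock()
	if g.inFlight {
		g.mu.Unlock()
		return false // skip: an attempt is already ongoing
	}
	g.inFlight = true
	g.wg.Add(1)
	g.mu.Unlock()

	go func() {
		// Deferred calls run in reverse order: the flag is cleared
		// first, then the WaitGroup is released.
		defer g.wg.Done()
		defer func() {
			g.mu.Lock()
			g.inFlight = false
			g.mu.Unlock()
		}()
		notify()
	}()
	return true
}

// wait blocks until any in-flight notification has finished.
func (g *groupSender) wait() { g.wg.Wait() }

func main() {
	g := &groupSender{}
	release := make(chan struct{})

	first := g.tryFlushAsync(func() { <-release }) // simulates a slow receiver
	second := g.tryFlushAsync(func() {})           // skipped: first is still running
	close(release)
	g.wait()
	third := g.tryFlushAsync(func() {}) // allowed again after the first finished
	g.wait()

	fmt.Println(first, second, third) // prints "true false true"
}
```

With a guard like this the dispatcher loop would not block on the network, while a slow receiver would still see at most one concurrent request per group. Whether skipping or queueing the overlapping flush is the right behaviour is a design choice this sketch does not settle.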
@brancz
cc @brian-brazil