		level.Error(a.logger).Log("msg", "error on set alert", "err", err)
		continue
	}

	a.mtx.Lock()
	for _, l := range a.listeners {
		select {
		case l.alerts <- alert:
		case <-l.done:
		}
	}
	a.mtx.Unlock()
}

return nil
}
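For context, the select over l.alerts and l.done is the usual Go pattern for handing an alert to a subscriber without blocking forever if that subscriber has gone away. A minimal standalone illustration (the listener type and send helper here are invented for the example, not Alertmanager API):

```go
package main

import "fmt"

// listener mirrors the shape used in the snippet above: a channel that
// receives alerts and a done channel closed when the subscriber leaves.
type listener struct {
	alerts chan string
	done   chan struct{}
}

// send delivers an alert to l, but gives up if the listener is done, so
// a departed subscriber cannot block the ingestion loop indefinitely.
// It reports whether the alert was actually delivered.
func send(l listener, alert string) bool {
	select {
	case l.alerts <- alert:
		return true
	case <-l.done:
		return false
	}
}

func main() {
	live := listener{alerts: make(chan string, 1), done: make(chan struct{})}
	gone := listener{alerts: make(chan string), done: make(chan struct{})}
	close(gone.done) // this subscriber has already unsubscribed

	fmt.Println(send(live, "firing"), send(gone, "firing")) // true false
}
```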
Currently, every alert received via the API is sent directly to the internal memory store for alerts:
API: alertmanager/api/v2/api.go, line 348 (at 8ca1f66)
Memory store: alertmanager/provider/mem/mem.go, lines 149 to 180 (at 8ca1f66)
Every Set locks the internal store, which can become a performance issue (cf. #1201). Prometheus sends alerts in batches of 50, but nothing enforces this on the Alertmanager side. Anecdotally, I've seen a single Alertmanager become unresponsive when receiving ~50 ingest requests per second, with each request containing a single alert.
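To make the contention claim concrete, here is a self-contained toy sketch (not Alertmanager code; toyStore and its lock counter are invented for illustration) showing the quantity batching reduces: lock acquisitions per ingested alert.

```go
package main

import (
	"fmt"
	"sync"
)

// toyStore mimics the shape of the in-memory alert store: a map guarded
// by a mutex. LockCount records how often the mutex is taken.
type toyStore struct {
	mtx       sync.Mutex
	alerts    map[int]struct{}
	LockCount int
}

// SetOne stores a single alert, taking the lock once per call — the
// per-alert ingestion pattern described above.
func (s *toyStore) SetOne(id int) {
	s.mtx.Lock()
	s.LockCount++
	s.alerts[id] = struct{}{}
	s.mtx.Unlock()
}

// SetBatch stores a whole batch under one lock acquisition — the
// batched alternative this issue proposes.
func (s *toyStore) SetBatch(ids []int) {
	s.mtx.Lock()
	s.LockCount++
	for _, id := range ids {
		s.alerts[id] = struct{}{}
	}
	s.mtx.Unlock()
}

func main() {
	perAlert := &toyStore{alerts: map[int]struct{}{}}
	for i := 0; i < 50; i++ {
		perAlert.SetOne(i)
	}

	batched := &toyStore{alerts: map[int]struct{}{}}
	ids := make([]int, 50)
	for i := range ids {
		ids[i] = i
	}
	batched.SetBatch(ids)

	fmt.Println(perAlert.LockCount, batched.LockCount) // 50 1
}
```

The same shape, wrapped in testing.B loops, would be a reasonable starting point for the basic benchmark suggested below.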
Batching at the store level is probably the right place to implement this: alertmanager/store/store.go, line 95 (at 8ca1f66)
Adding a basic benchmark and a receive queue would be a good first step.
This is more or less inspired by common sense and by being reminded of how Kafka ingests messages.