
Use proper locks when modifying alert status #760

Merged: 1 commit merged into prometheus:master from fix-types-lock on May 4, 2017

Conversation

@prymitive (Contributor) commented Apr 28, 2017

Read locks are used where the alert status map gets modified, which will cause panics. @fabxc @stuartnelson3

fatal error: concurrent map read and map write

goroutine 58 [running]:
runtime.throw(0x9fdd14, 0x21)
	/home/lukasz/usr/go/src/runtime/panic.go:596 +0x95 fp=0xc42058e730 sp=0xc42058e710
runtime.mapaccess2_fast64(0x946e20, 0xc4201472f0, 0x24e4121619386f95, 0xffffffffffffffff, 0xc42058e798)
	/home/lukasz/usr/go/src/runtime/hashmap_fast.go:168 +0x1b2 fp=0xc42058e758 sp=0xc42058e730
github.com/prometheus/alertmanager/types.(*memMarker).SetSilenced(0xc42014cb80, 0x24e4121619386f95, 0x0, 0x0, 0x0)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/types/types.go:115 +0x92 fp=0xc42058e7a8 sp=0xc42058e758
github.com/prometheus/alertmanager/notify.(*SilenceStage).Exec(0xc4201d8620, 0xc7a360, 0xc4200f3bf0, 0xc4200caf40, 0x3, 0x4, 0xc7a360, 0xc4200f3bf0, 0xc4200caf40, 0x3, ...)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/notify/notify.go:360 +0x2dd fp=0xc42058e8a8 sp=0xc42058e7a8
github.com/prometheus/alertmanager/notify.MultiStage.Exec(0xc42023d410, 0x3, 0x3, 0xc7a360, 0xc4200f3bf0, 0xc4200cad40, 0x3, 0x3, 0xc4200f3b01, 0x91cfa0, ...)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/notify/notify.go:259 +0xa5 fp=0xc42058e930 sp=0xc42058e8a8
github.com/prometheus/alertmanager/notify.(*MultiStage).Exec(0xc4201d86c0, 0xc7a360, 0xc4200f3bf0, 0xc4200cad40, 0x3, 0x3, 0xffffffffffffffff, 0xc42058ea40, 0x7252d5, 0xc420232640, ...)
	<autogenerated>:6 +0xad fp=0xc42058e9b8 sp=0xc42058e930
github.com/prometheus/alertmanager/notify.RoutingStage.Exec(0xc42023d380, 0xc7a360, 0xc4200f3bf0, 0xc4200cad40, 0x3, 0x3, 0x6, 0x91e020, 0xc4200135e0, 0xc420014b40, ...)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/notify/notify.go:245 +0xde fp=0xc42058ea30 sp=0xc42058e9b8
github.com/prometheus/alertmanager/dispatch.(*Dispatcher).processAlert.func1(0xc7a360, 0xc4200f3bf0, 0xc4200cad40, 0x3, 0x3, 0x2)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:264 +0x86 fp=0xc42058eae8 sp=0xc42058ea30
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run.func1(0xc4200cad40, 0x3, 0x3, 0x2)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:372 +0x58 fp=0xc42058eb28 sp=0xc42058eae8
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).flush(0xc420192360, 0xc42058ef20)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:432 +0x47f fp=0xc42058ed80 sp=0xc42058eb28
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run(0xc420192360, 0xc4201de480)
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:373 +0x88c fp=0xc42058efd0 sp=0xc42058ed80
runtime.goexit()
	/home/lukasz/usr/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42058efd8 sp=0xc42058efd0
created by github.com/prometheus/alertmanager/dispatch.(*Dispatcher).processAlert
	/home/lukasz/work/go/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:269 +0x32c
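
For readers unfamiliar with the failure mode: Go's sync.RWMutex only protects the map if every writer takes the write lock; a goroutine that mutates the map while holding only RLock() races with concurrent readers, and the runtime aborts with exactly the "concurrent map read and map write" error above. Below is a minimal hand-written sketch of the intended locking rule; the names mirror types.go but the fields and signatures are simplified, so it is an illustration rather than the actual Alertmanager code.

package marker

import "sync"

// Fingerprint stands in for model.Fingerprint in this sketch.
type Fingerprint uint64

// AlertStatus is a trimmed-down stand-in for types.AlertStatus.
type AlertStatus struct {
	SilencedBy []string
}

// memMarker mirrors the shape of the marker in types.go: alert fingerprints
// mapped to their status, guarded by a single RWMutex.
type memMarker struct {
	mu sync.RWMutex
	m  map[Fingerprint]*AlertStatus
}

// SetSilenced mutates the map, so it must hold the write lock. Holding only
// mu.RLock() while writing is what produces the abort in the trace above.
func (m *memMarker) SetSilenced(alert Fingerprint, ids ...string) {
	m.mu.Lock()
	defer m.mu.Unlock()

	s, found := m.m[alert]
	if !found {
		s = &AlertStatus{}
		m.m[alert] = s
	}
	s.SilencedBy = ids
}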

@prymitive (Contributor, Author) commented

Hmm, CI is taking a long time; it might be that there's a lock that isn't getting released and is blocking.

@prymitive (Contributor, Author) commented

Oh, that's likely because SetSilenced can call SetActive, which also takes the lock.
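
To spell out the hang (continuing the hand-written sketch above; an illustration of the problem, not the repository code): sync.Mutex and sync.RWMutex in Go are not reentrant, so if SetSilenced takes the write lock and then calls SetActive, which tries to take it again, the goroutine blocks on itself forever, which would show up as CI never finishing.

// SetActive also takes the lock, so calling it from a method that already
// holds m.mu never returns: the goroutine waits on itself.
func (m *memMarker) SetActive(alert Fingerprint) {
	m.mu.Lock() // never acquired if the caller already holds m.mu
	defer m.mu.Unlock()

	if s, ok := m.m[alert]; ok {
		s.SilencedBy = nil
	}
}

// A variant of SetSilenced that reproduces the hang: the write lock is held
// when SetActive tries to take it again.
func (m *memMarker) SetSilenced(alert Fingerprint, ids ...string) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if len(ids) == 0 {
		m.SetActive(alert) // deadlock: sync.RWMutex is not reentrant
		return
	}
	// ... record the silences as in the sketch above ...
}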

@@ -175,16 +180,15 @@ func (m *memMarker) Status(alert model.Fingerprint) AlertStatus {
	s, found := m.m[alert]
	if !found {
		s = &AlertStatus{}
		m.m[alert] = s
@prymitive (Contributor, Author) commented

Is it OK to get rid of populating the marker with an empty state in the Status() method? This way we avoid lock switching, and as soon as a Set*() call happens we'll insert the proper status into the map.
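
One way to read this suggestion (a sketch under the assumption that callers only need the status value, not a shared pointer into the map): Status() stays entirely under the read lock and returns a zero AlertStatus for unknown fingerprints instead of inserting one, so all map writes are left to the Set*() methods.

// Status returns a copy of the alert's status. Unknown fingerprints get a
// zero value instead of being inserted, so the map is only ever written by
// the Set*() methods and a plain read lock is sufficient here.
func (m *memMarker) Status(alert Fingerprint) AlertStatus {
	m.mu.RLock()
	defer m.mu.RUnlock()

	if s, found := m.m[alert]; found {
		return *s
	}
	return AlertStatus{}
}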

@prymitive (Contributor, Author) commented

Updated and now passing tests.

Locking is a bit tricky there since some state-modifying methods call other methods that also modify marker state, so a simple defer won't work.
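
Two common ways around that in Go (sketches of general patterns, not necessarily the exact shape of this PR): unlock explicitly before calling the other locked method, or factor the shared mutation into an unexported helper that assumes the lock is already held, for example:

// setActiveLocked does the actual mutation and assumes the caller already
// holds m.mu; it never touches the lock itself.
func (m *memMarker) setActiveLocked(alert Fingerprint) {
	if s, ok := m.m[alert]; ok {
		s.SilencedBy = nil
	}
}

// SetSilenced takes the write lock exactly once; everything it calls below
// is lock-free, so a plain defer is safe again.
func (m *memMarker) SetSilenced(alert Fingerprint, ids ...string) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if len(ids) == 0 {
		m.setActiveLocked(alert)
		return
	}

	s, found := m.m[alert]
	if !found {
		s = &AlertStatus{}
		m.m[alert] = s
	}
	s.SilencedBy = ids
}
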
@davidkarlsen commented

Should this get merged and 0.6.2 be released?

@prymitive (Contributor, Author) commented May 4, 2017

I would hope so. I'm sure @fabxc or @stuartnelson3 will review this when they get a chance; until that happens it's best to stay on 0.6.0, which shouldn't have any issues.

@fabxc (Contributor) commented May 4, 2017

Sorry about not being responsive. LGTM

@fabxc merged commit cb8b507 into prometheus:master on May 4, 2017
@prymitive deleted the fix-types-lock branch on May 4, 2017 at 13:21
@prymitive (Contributor, Author) commented

Does this issue justify a new 0.6.x release?
