Add deadman's switch #149
@gunnaraasen @rossmcdonald I know you both had some original input. How does the final result look?
@nathanielc Just to make sure, if I wanted to alert if throughput drops below 1 event every 10s, would that be the following?
I can see this notation being a little confusing, since it's not obvious that it samples every second. Without looking at any documentation, I would assume that this:
Means "alert when you receive less than 100 events in 10 seconds".
@rossmcdonald Correct. I was torn as well on which I liked. With this one For this one
To be clear, I can easily do it either way, I just decided that once you know it's always events/second it becomes easier to use.
I'll second @rossmcdonald's confusion on the arguments to the It would be nice if the rate interval in the Also, what would be the recommendation for alerting on an absolute time since the last event? Say, alerting only when a service hasn't sent any events in the last 10 minutes.
@gunnaraasen So do you want a third argument, like
or is
good enough?
@gunnaraasen As for alerting on an absolute time since the last event, that is the same as a 0 threshold for that time period, correct?
If you want both, it's doable (but not via the configuration):
var data = stream.from()...
data.deadman(10m, 0.0)
data.deadman(1h, 5.0) // using Ross's notation, read as less than 5 events for the hour
// Do normal data processing
data....
As an aside, I am leaning towards Ross's notation since otherwise it's a pain to calculate.
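To make the "events per interval" reading concrete, here is a minimal Python sketch of the check being discussed. It is not Kapacitor code: the function names, the timestamps, and the inclusive `<=` comparison are all assumptions made for illustration. The inclusive comparison is what lets a 0 threshold express "no events at all in this period"; with a strict "less than", a 0 threshold could never fire.

```python
def count_in_window(event_times, start, interval):
    """Count events whose timestamp falls in [start, start + interval)."""
    return sum(1 for t in event_times if start <= t < start + interval)


def deadman_fires(event_times, start, interval, threshold):
    """Hypothetical check: fire when the window holds `threshold` events or fewer."""
    return count_in_window(event_times, start, interval) <= threshold


TEN_MINUTES = 600  # seconds
ONE_HOUR = 3600    # seconds

# Made-up event timestamps (seconds): three events early on, then silence.
events = [30, 90, 150]

# Mirrors deadman(10m, 0.0): fire only if nothing arrived in the window.
busy_10m = deadman_fires(events, start=0, interval=TEN_MINUTES, threshold=0)
quiet_10m = deadman_fires(events, start=600, interval=TEN_MINUTES, threshold=0)

# Mirrors deadman(1h, 5.0): fire if the whole hour saw 5 events or fewer.
slow_hour = deadman_fires(events, start=0, interval=ONE_HOUR, threshold=5)
```

With these made-up timestamps the first ten-minute window does not fire, the second (silent) one does, and the hourly check fires because only three events arrived.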
Regarding:
I like the idea, but I think in practice it will just end up becoming confusing since it's not explicitly obvious from the function name. It also requires a little bit of mental arithmetic that I think may be a bit too cumbersome.
I like Ross' notation as well. Although switching the order of the arguments might be slightly more readable:
However, then it's unclear how often the rate is checked. Does it default to checking once an hour? Also, the 0 rate threshold works perfectly for checking the absolute time since the last event.
I also think that switching the parameters would help improve readability.
Fixes #137
A deadman's switch can now be added to any node within any task. This is accomplished by exposing each node's internal statistics to the TICKscript itself via the `stats` method.
The `stats` method emits the internal stats of a node at a given interval. The deadman's switch uses the `collected` stat to trigger an alert if it drops below a threshold.
The method `deadman` is available on all nodes and is a helper function to easily create a deadman's switch.
This:
is equivalent to this:
In addition to the `deadman` method, you can globally configure all stream tasks to have deadman's switches.
The `id` and `message` fields can also be configured globally.
EDIT: Changed examples to match final implementation
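As a rough illustration of the mechanism described above (periodic stats snapshots feeding a threshold alert), here is a small Python simulation. It is a sketch under stated assumptions, not the Kapacitor implementation: it assumes `collected` is a cumulative counter, that the switch compares the change between consecutive snapshots against the threshold, and that the comparison is inclusive. All function names and sample timestamps are hypothetical.

```python
def stats_snapshots(event_times, interval, duration):
    """Emit the cumulative `collected` count at the end of each interval."""
    snapshots = []
    ticks = int(duration // interval)
    for i in range(1, ticks + 1):
        end = i * interval
        snapshots.append(sum(1 for t in event_times if t < end))
    return snapshots


def deadman_alerts(snapshots, threshold):
    """Flag each interval whose growth in `collected` is at or below threshold."""
    alerts = []
    previous = 0
    for collected in snapshots:
        delta = max(collected - previous, 0)  # assumed: ignore counter resets
        alerts.append(delta <= threshold)
        previous = collected
    return alerts


# Made-up event timestamps in seconds.
events = [1, 2, 3, 4, 12, 13, 35]

snapshots = stats_snapshots(events, interval=10, duration=40)
alerts = deadman_alerts(snapshots, threshold=1)
```

Here the per-interval growth is 4, 2, 0 and 1 events, so with a threshold of 1 the last two intervals would raise an alert.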