Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circuit breaker #362

Merged
merged 13 commits into from
Jul 13, 2017
Merged

Circuit breaker #362

merged 13 commits into from
Jul 13, 2017

Conversation

aryszka
Copy link
Contributor

@aryszka aryszka commented Jul 3, 2017

OP:

In the simplest case, the proxy gets a breaker for the current backend host from the registry, if configured. The breakers are always bound to a backend host, even if they share their settings. There are two breaker types, one opens after N consecutive failures, the other opens when the failure rate crosses a threshold within a sliding window (numeric window, not time based). The idle breakers are automatically dropped from the registry when new breakers need to be created, e.g. when some backend hosts were removed from the configuration but new ones were added.

CONFIG:

  1. it is possible to define global breaker settings (common settings for each backend host breaker)
  2. it is possible to override or disable the global settings for each host individually
  3. it is possible to override the global or host based settings for each route individually, using standard filters

DOCS:
https://github.com/zalando/skipper/blob/circuit-breaker/circuit/doc.go

@mo-gr mo-gr requested review from evangelion1204 and removed request for evangelion1204 July 10, 2017 11:37
Copy link
Contributor

@DeTeam DeTeam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few notes on docs.

The Host field indicates to which backend host should the current set of settings be applied. Leaving it empty
indicates global settings.

Command line name: host.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is glob supported (*)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* The answer is no, but mb it's a featue worth considering..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would it work? currently if the host is not set, that means global settings (still with individual breaker instances for the hosts)


The window value sets the size of the sliding counter window of the failure rate breaker.

Command line name: window. Possible command line values: any positive integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any defaults worth mentioning here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are.. but currently they are used from sony/gobreaker and may be replaced in a later wave with https://github.com/ideahitme/circuit-breaker, so considered not declaring the defaults in the docs atm. Basically, 5 consecutive failures, 1 minute timeout, 1 half-open request. The default IdleTTL is documented.


The failures value sets the max failure count for both the "consecutive" and "rate" breakers.

Command line name: failures. Possible command line values: any positive integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any defaults worth mentioning here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@zalando zalando deleted a comment from DeTeam Jul 10, 2017
@zalando zalando deleted a comment from DeTeam Jul 10, 2017
@DeTeam
Copy link
Contributor

DeTeam commented Jul 10, 2017

@aryszka
Copy link
Contributor Author

aryszka commented Jul 10, 2017

sorry, didn't mean to delete those comments 🤕

@DeTeam
Copy link
Contributor

DeTeam commented Jul 10, 2017

👍

circuit/list.go Outdated
// It can return the first consecutive items that match a
// condition (in our case, the ones that were inactive for a
// while).
type list struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the first time ever since university, that I see a linked list implementation :-)

Anyway. Why not just use an array? This seems like premature optimisation and just adds fuckup-surface. Please consider just using an array. Yes, it might be slower, no it won't matter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also been waiting long for the use case ;)

i think it is simply optimization, but not premature, since it is already in the PR. Using an array would be like not using anything, because then we could also just iterate over the timestamps to see which breaker is idle. The reason for the optimization here is that this is in a synchronized scope, happening on every request, and blocking every concurrent request. Why do you think that it won't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that thinking about it.. maybe there is a way to bind the breakers to each route. Will play with that as a separate effort.

}

settings, _ := c.stateBag[circuitfilters.RouteSettingsKey].(circuit.BreakerSettings)
settings.Host = c.outgoingHost
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I set breakers for <shunt> or <loopback>? Is that desirable behaviour? Or should we check and early escape in those cases?

Copy link
Contributor Author

@aryszka aryszka Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot set breakers for <shunt> and <loopback> . I think the "shunt", by its name, already breaks the circuit by itself. For the loopback, not sure, but i believe it is enough to see where it loops to.

@mo-gr
Copy link
Contributor

mo-gr commented Jul 11, 2017

👍

@mo-gr
Copy link
Contributor

mo-gr commented Jul 11, 2017

A general question, maybe more like a feature request: If a circuit breaker flips open, is it possible to purge the DNS cache for that host? Seems like a nice-to-have feature...

@@ -515,13 +547,26 @@ func (p *Proxy) do(ctx *context) error {
ctx.outgoingDebugRequest = debugReq
ctx.setResponse(&http.Response{Header: make(http.Header)}, p.flags.PreserveOriginal())
} else {
done, allow := p.checkBreaker(ctx)
if !allow {
return errCircuitBreakerOpen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read this correct, that we will only log that a circuit breaker is open, but not which one? If a breaker flips open (or more general flips state), I would like to see a log line stating clearly, which breaker for which host goes from which old state into which new state.

Is this not there or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid point. It should be there.

Besides the logs, the plan is to add an endpoint to the "metrics" -> "support" listener that reports all the breaker states. After #364 gets finished.

@aryszka
Copy link
Contributor Author

aryszka commented Jul 11, 2017

purging the DNS cache. Will look into it.

@mo-gr
Copy link
Contributor

mo-gr commented Jul 11, 2017

👍

@aryszka
Copy link
Contributor Author

aryszka commented Jul 13, 2017

👍

@DeTeam
Copy link
Contributor

DeTeam commented Jul 13, 2017

👍

@aryszka aryszka merged commit a6f2131 into master Jul 13, 2017
@aryszka aryszka deleted the circuit-breaker branch July 13, 2017 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants