Hystrix circuitbreaker / dashboard issue #236
Comments
Sorry, I don't understand the problem, nor do I see how the That doesn't mean a bug doesn't exist, I'm just not seeing it as I read through the code. Can you please provide a unit test demonstrating the issue? The existing unit tests are here: https://github.com/Netflix/Hystrix/blob/master/hystrix-core/src/test/java/com/netflix/hystrix/HystrixCircuitBreakerTest.java Some background on the related code ... The When the circuit is tripped, The only way it will start returning false again is if an actual request (not the dashboard) gets permitted via The "single test" route is permitted via this code inside return !isOpen() || allowSingleTest();
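To make the background above concrete, here is a minimal sketch of the gating logic it describes. It is not the Hystrix source: the class name SketchCircuitBreaker, the trip() method, and the explicit nowMillis parameter are assumptions made for illustration, and only the expression return !isOpen() || allowSingleTest() is taken from the comment itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a circuit breaker; not the Hystrix implementation.
final class SketchCircuitBreaker {
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong openedOrLastTestedAt = new AtomicLong(0);
    private final long sleepWindowMillis;

    SketchCircuitBreaker(long sleepWindowMillis) {
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Assumed helper: trips the circuit at the given time.
    void trip(long nowMillis) {
        openedOrLastTestedAt.set(nowMillis);
        circuitOpen.set(true);
    }

    boolean isOpen() {
        return circuitOpen.get();
    }

    // While open, lets exactly one caller through per sleep window.
    boolean allowSingleTest(long nowMillis) {
        long last = openedOrLastTestedAt.get();
        return circuitOpen.get()
                && nowMillis > last + sleepWindowMillis
                && openedOrLastTestedAt.compareAndSet(last, nowMillis);
    }

    // The expression quoted in the comment above.
    boolean allowRequest(long nowMillis) {
        return !isOpen() || allowSingleTest(nowMillis);
    }

    // Called when a permitted request succeeds; this is what closes the circuit again.
    void markSuccess() {
        circuitOpen.set(false);
    }
}
```

In this model an open circuit only closes when a real request is let through by allowSingleTest() and then reports success; a caller that merely reads isOpen() never closes it.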
I created a little unit test to recreate the problem.
I solved this (in my local version of Hystrix) by changing the resetCounter() logic in HystrixCommandMetrics like so:
because getHealthCounts() used an invalid/old version of HealthCounts, and this invalid/old version is also what the HystrixCircuitBreaker isOpen() method retrieves via metrics.getHealthCounts(). Can you fix this in the code for 1.3.14?
@rvdwenden Out of curiosity, what kind of load do you have on your circuits? This might explain some flapping behavior we've seen.
@rvdwenden I'll be looking at this soon, been busy elsewhere.
We have applications running at roughly 50 tps (order of magnitude).
In addition to the fix proposed above, I still strongly believe that the HystrixCircuitBreaker
I'll be looking at this closely in the near future. I'm not jumping on it as it has been this way in production for over 2 years with circuits ranging from <1rps to >5000rps and I've been busy recently on some other higher priority things.
Don't hurry. Now that we can explain this behavior, we can take mitigating measures on our side! By the way, we changed the load program described above a little, and it can now easily produce loads between 1,000 and 10,000 tps.
Great bug report everyone ... I've fixed it in both 1.3.x and master (1.4) and will release both versions shortly. Amazing this has escaped detection for years.
Thank you!
Used Hystrix version (hystrix-core, hystrix-metrics-event-stream, hystrix-codahale-metrics-publisher): 1.3.13
Used Hystrix dashboard: 1.3.9
The circuit breaker works fine (i.e. opens/closes) in our application, as one expects, as long as no dashboard session is running for this application.
But as soon as we use the dashboard on this application, the circuit breaker no longer works properly: it opens after failures (that's OK), but doesn't recover (close) after the services work fine again.
When we stop the dashboard for a little while and start it again, we see that the open circuit breaker is closed again! We see this behavior over and over.
I looked at the Java circuit breaker logic and found out that the isOpen() method does some extra bookkeeping. Both the circuit breaker and the streaming servlet call isOpen().
I changed the Hystrix circuit breaker code: isOpen() now only does 'return circuitOpen.get()'. I created a new method isOpenPlus() based on the existing isOpen() functionality and changed the isOpen() call in allowRequest() into a call to isOpenPlus().
And hey presto, this works fine together with the dashboard. We cannot explain why this approach works (or at least seems to work).
Can you explain and/or fix this in a new release of Hystrix (no need to change the streaming servlet or the dashboard)?
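The shape of the local change described above can be sketched roughly as follows. This is an illustration only, not the patched Hystrix code: the class name SplitReadBreaker, the plain failure counter standing in for Hystrix's health counts, and the failureThreshold parameter are all assumptions. Only the names isOpen(), isOpenPlus(), allowRequest() and circuitOpen come from the description, and the single-test path is left out to keep the sketch short.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the workaround: a side-effect-free isOpen() for
// monitoring callers and a separate evaluating check for the request path.
final class SplitReadBreaker {
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    // Assumed stand-in for the real health counts.
    private final AtomicInteger failures = new AtomicInteger(0);
    private final int failureThreshold;

    SplitReadBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    void recordFailure() {
        failures.incrementAndGet();
    }

    void markSuccess() {
        failures.set(0);
        circuitOpen.set(false);
    }

    // Pure read: only reports the current flag, so a metrics stream can call it freely.
    boolean isOpen() {
        return circuitOpen.get();
    }

    // Evaluating check: inspects the counters and may trip the circuit.
    boolean isOpenPlus() {
        if (circuitOpen.get()) {
            return true;
        }
        if (failures.get() < failureThreshold) {
            return false;
        }
        circuitOpen.set(true);
        return true;
    }

    // Only the request path uses the evaluating check.
    boolean allowRequest() {
        return !isOpenPlus();
    }
}
```

The design point is that read-only observers and the request path no longer share a method with side effects. It does not by itself explain why the original code misbehaved under the dashboard.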