Use new rabbitmqctl features for monitoring #916
if [ "$rc_timeouts" -eq 2 ]; then
    master_score 0
    return $OCF_ERR_GENERIC
elif [ $rc -ne 0 ]; then
This should be something like (rc_timeouts == 0 AND rc != 0), because (rc_timeouts == 1 AND rc == 137) is a timeout, which should be ignored.
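A minimal sketch of the suggested condition as a standalone shell function. The name `classify_check_result` and its "error"/"ignore"/"ok" outputs are hypothetical and not part of the resource agent; it assumes `rc_timeouts` already counts the timed-out checks in the current monitor run, and that exit code 137 (128 + SIGKILL, as returned by `timeout -s KILL`) marks a timeout.

```shell
#!/bin/sh
# Hypothetical helper illustrating the reviewer's suggested logic.
# $1: number of checks that timed out in this monitor run (rc_timeouts)
# $2: exit code of the latest check (rc); 137 is assumed to mean a timeout
classify_check_result() {
    if [ "$1" -eq 2 ]; then
        # Two timed-out checks: hard failure (mirrors the existing first branch).
        echo "error"
    elif [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        # Non-zero exit with no timeouts involved: a genuine error.
        echo "error"
    elif [ "$2" -ne 0 ]; then
        # e.g. rc_timeouts == 1 and rc == 137: a single timeout, tolerated.
        echo "ignore"
    else
        echo "ok"
    fi
}
```

For example, `classify_check_result 1 137` prints `ignore`, while `classify_check_result 0 1` prints `error`.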
stopped/demoted.
</longdesc>
<shortdesc lang="en">Use --local option for list_queues</shortdesc>
<content type="string" default="${OCF_RESKEY_rmq_feature_local_list_queues_default}" />
Here and above you could use type="boolean", as is done for the 'debug' parameter, for example. I'm not sure it will affect anything at all, though.
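A sketch of how a boolean-typed parameter could drive the `--local` flag, assuming the metadata line becomes `<content type="boolean" ... />` and the value arrives as `$OCF_RESKEY_rmq_feature_local_list_queues` (following the `_default` variable above). `is_true` is a hypothetical stand-in loosely modelled on `ocf_is_true` from ocf-shellfuncs, and `list_queues_cmd` is likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical truthiness check, loosely modelled on ocf_is_true.
is_true() {
    case "$1" in
        yes|YES|true|TRUE|True|1|on|ON) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical: build the list_queues command line from the feature flag ($1).
list_queues_cmd() {
    if is_true "$1"; then
        echo "rabbitmqctl list_queues --local"
    else
        echo "rabbitmqctl list_queues"
    fi
}
```

Usage would be `list_queues_cmd "$OCF_RESKEY_rmq_feature_local_list_queues"`. Treating an unset or empty value as false keeps the feature off unless it is explicitly enabled.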
@binarin: overall the patch looks good to me; it should remove the load generated by our current monitoring.
a81272b to 464f54b
What's the conclusion on this? Should we merge it?
+1, but please remove DO NOT MERGE if you think it's done and it works for you.
@michaelklishin Please don't merge it yet; I'm still testing.
This currently doesn't merge cleanly.
This will stop wasting network bandwidth on monitoring. For example, a 200-node OpenStack installation produces around 10k queues and 10k channels. A single list_queues/list_channels call across the cluster in this environment generates 27k TCP packets and around 12 megabytes of network traffic. Given that these calls happen ~10 times a minute with 3 controllers, this adds up to significant overhead.

To enable these features, your RabbitMQ must contain the following patches:
- rabbitmq#883
- rabbitmq#911
- rabbitmq#915
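A back-of-the-envelope estimate of that overhead. It assumes, as one reading of the message, that each of the 3 controllers issues ~10 calls per minute and that each call costs ~12 MB and ~27k packets; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical estimate of monitoring cost per minute.
# $1: cost per call (MB, packets, ...), $2: calls per minute per controller,
# $3: number of controllers
monitoring_cost_per_minute() {
    echo $(( $1 * $2 * $3 ))
}

monitoring_cost_per_minute 12 10 3     # prints 360 (MB per minute)
monitoring_cost_per_minute 27000 10 3  # prints 810000 (TCP packets per minute)
```

Under those assumptions, monitoring alone moves roughly 360 MB and 810k TCP packets per minute.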
464f54b to 99f2a48
Rebased and tested again. Now I'm happy with this patch.
Thank you!
This will stop wasting network bandwidth on monitoring. For example, a 200-node OpenStack installation produces around 10k queues and 10k channels. A single list_queues/list_channels call across the cluster in this environment generates 27k TCP packets and around 12 megabytes of network traffic. Given that these calls happen ~10 times a minute with 3 controllers, this adds up to significant overhead.

Upstream change:
- rabbitmq/rabbitmq-server#916

To enable these features, your RabbitMQ must contain the following patches:
- rabbitmq/rabbitmq-server#883
- rabbitmq/rabbitmq-server#911
- rabbitmq/rabbitmq-server#915

Change-Id: Icfde3360b42a841ad3a219b94f65a69b2a18cea7
Closes-Bug: 1614071
@dmitrymex @bogdando WDYT? I haven't tested it yet, but if you are OK with the overall shape of this patch, I'll start polishing and testing it.
To stop wasting network bandwidth during health checks (e.g. list_queues
in a 3-node cluster with 10k queues costs on average 12 megabytes of
traffic and 27k TCP packets).
The features are disabled by default to preserve compatibility, but they
SHOULD be enabled when the following patches are present in the RabbitMQ
version in use:
- node_health_check completely node-local #883