Use new rabbitmqctl features for monitoring #916

@binarin binarin commented Aug 10, 2016

@dmitrymex @bogdando WDYT? I haven't tested it yet, but if you are OK with the overall shape of this patch, I'll start polishing and testing it.

This stops wasting network bandwidth during health checks (e.g. a single
list_queues in a 3-node cluster with 10k queues costs on average 12
megabytes of traffic and 27k TCP packets).

The features are disabled by default to preserve compatibility, but they
SHOULD be enabled when the following patches are present in the rabbitmq
version in use:

if [ "$rc_timeouts" -eq 2 ]; then
    master_score 0
    return $OCF_ERR_GENERIC
elif [ $rc -ne 0 ]; then

@dmitrymex dmitrymex Aug 10, 2016

This should be something like (rc_timeouts == 0 AND rc != 0), because if (rc_timeouts == 1 AND rc == 137), that is a timeout, which should be ignored.
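
A minimal sketch of the condition suggested in this comment, written as a standalone function so it can be tried outside the resource agent. The function name `classify_check`, its arguments, and the printed labels are hypothetical and not part of the patch. It assumes `rc_timeouts` counts consecutive timed-out checks and that exit code 137 comes from a check killed on timeout.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only; not the code from this PR.
# $1: exit code of the health-check command
# $2: number of consecutive timeouts seen so far (assumed meaning of rc_timeouts)
classify_check() {
    rc=$1
    rc_timeouts=$2

    if [ "$rc_timeouts" -eq 2 ]; then
        # Second timeout in a row: treat as a real failure.
        echo "generic_error"
    elif [ "$rc_timeouts" -eq 0 ] && [ "$rc" -ne 0 ]; then
        # Non-zero exit code that is not attributable to a timeout.
        echo "error"
    else
        # Success, or a first timeout (e.g. rc == 137) that is ignored.
        echo "pass"
    fi
}
```

With this shape, `classify_check 137 1` prints `pass`, while the original `elif [ $rc -ne 0 ]` would have treated the same inputs as an error.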

@michaelklishin michaelklishin self-assigned this Aug 11, 2016
stopped/demoted.
</longdesc>
<shortdesc lang="en">Use --local option for list_queues</shortdesc>
<content type="string" default="${OCF_RESKEY_rmq_feature_local_list_queues_default}" />

Here and above you can use type="boolean", as is done for the 'debug' parameter, for example, though I am not sure it will affect anything at all.

@dmitrymex

@binarin: overall the patch looks good to me; it should remove the load caused by our current monitoring.

@binarin binarin force-pushed the rabbitmq-server-new-shiny-ocf-health-check branch from a81272b to 464f54b Compare August 17, 2016 12:19
@michaelklishin

What's the conclusion on this? Should we merge it?

@bogdando

bogdando commented Aug 18, 2016

+1, but please remove DO NOT MERGE if you think it's done and it works for you

@binarin

binarin commented Aug 18, 2016

@michaelklishin Please don't merge it yet, I'm still testing.

@michaelklishin

This currently doesn't merge cleanly.

This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels in a cluster in this
environment results in 27k TCP packets and around 12 megabytes of
network traffic. Given that these calls happen ~10 times a minute with 3
controllers, this results in pretty significant overhead.

To enable these features you should have a rabbitmq containing the following
patches:
- rabbitmq#883
- rabbitmq#911
- rabbitmq#915
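
A back-of-the-envelope estimate of the overhead described in the commit message above, using only the figures it quotes. It assumes that ~10 calls a minute is the total rate across the cluster; the message does not say whether the rate is per controller or cluster-wide, so treat the result as a rough order of magnitude.

```shell
#!/bin/sh
# Rough estimate only; both inputs are the approximate figures quoted above.
mb_per_call=12        # megabytes of traffic per list_queues/list_channels call
calls_per_minute=10   # assumed to be the cluster-wide call rate

mb_per_minute=$((mb_per_call * calls_per_minute))
mb_per_hour=$((mb_per_minute * 60))

echo "${mb_per_minute} MB/min, ${mb_per_hour} MB/hour"
```

Under these assumptions the monitoring traffic alone is about 120 MB per minute.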
@binarin binarin force-pushed the rabbitmq-server-new-shiny-ocf-health-check branch from 464f54b to 99f2a48 Compare August 23, 2016 09:27
@binarin binarin changed the title DO NOT MERGE Use new rabbitmqctl features for monitoring Use new rabbitmqctl features for monitoring Aug 23, 2016
@binarin

binarin commented Aug 23, 2016

Rebased and tested again. Now I'm happy with this patch.

@michaelklishin michaelklishin added this to the 3.6.6 milestone Aug 23, 2016
@michaelklishin michaelklishin merged commit 29a12b6 into rabbitmq:stable Aug 23, 2016
@michaelklishin

Thank you!

openstack-gerrit pushed a commit to openstack-archive/fuel-library that referenced this pull request Aug 24, 2016
This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels in a cluster in this
environment results in 27k TCP packets and around 12 megabytes of
network traffic. Given that these calls happen ~10 times a minute with 3
controllers, this results in pretty significant overhead.

Upstream change:
- rabbitmq/rabbitmq-server#916

To enable these features you should have a rabbitmq containing the following patches:
- rabbitmq/rabbitmq-server#883
- rabbitmq/rabbitmq-server#911
- rabbitmq/rabbitmq-server#915

Change-Id: Icfde3360b42a841ad3a219b94f65a69b2a18cea7
Closes-Bug: 1614071