This repository has been archived by the owner on Nov 14, 2020. It is now read-only.

Prevent non-zero publisher count in Grafana when aggregating metrics #61

Merged
gerhard merged 1 commit into master from grafana-publisher-fix
Nov 13, 2020

Conversation

Contributor

@coro coro commented Nov 12, 2020

In the case where there are 0 channels (and as such 0 publishers), the
dashboard reports there are actually n publishers in an n-node
cluster.

Proposed Changes

This changes the calculation of publishers to be the number of
channels (which is always known) minus the number of consumers (which is
also always known). This avoids the ambiguity around a channel that is
meant for publishing but has not actually published anything yet.
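For illustration only, the arithmetic described above can be sketched in Python. This is not the dashboard's actual query: the `NodeSample` type, its field names, and the sample values are hypothetical, and the second example assumes every PerfTest publisher and consumer uses its own channel.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeSample:
    """Hypothetical per-node aggregated counts (names are illustrative only)."""
    node: str
    channels: int
    consumers: int


def estimated_publishers(samples: Iterable[NodeSample]) -> int:
    """Estimate cluster-wide publishers as total channels minus total consumers."""
    samples = list(samples)
    total_channels = sum(s.channels for s in samples)
    total_consumers = sum(s.consumers for s in samples)
    return total_channels - total_consumers


# An idle three-node cluster: no channels, no consumers, so no publishers.
idle_cluster = [
    NodeSample("rabbit-0", channels=0, consumers=0),
    NodeSample("rabbit-1", channels=0, consumers=0),
    NodeSample("rabbit-2", channels=0, consumers=0),
]

# Assumed load: 5 publishers and 5 consumers, each on its own channel,
# spread arbitrarily over the three nodes (10 channels in total).
busy_cluster = [
    NodeSample("rabbit-0", channels=4, consumers=2),
    NodeSample("rabbit-1", channels=3, consumers=2),
    NodeSample("rabbit-2", channels=3, consumers=1),
]

if __name__ == "__main__":
    print(estimated_publishers(idle_cluster))
    print(estimated_publishers(busy_cluster))
```

The estimate only holds as long as each channel is used either for publishing or for consuming; a channel that does both, or neither, would skew the result.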

Prior to this change, a cluster configured with prometheus.return_per_object_metrics = false will always report a non-zero number of publishers, even if there is no traffic on that cluster. For my example screenshots, I am using a three-node cluster with a random RabbitMQ pod being killed every minute, over the course of 5 minutes. You can see that there are always n publishers reported in Grafana for n nodes:
image

After this change, this is correctly reported as 0:
image

I confirmed that the metric still behaves as before when there are non-zero publishers, using PerfTest with 5 consumers & publishers:
image

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • [N/A?] All tests pass locally with my changes
  • [N/A] I have added tests that prove my fix is effective or that my feature works
  • [N/A] I have added necessary documentation (if appropriate)
  • [N/A] Any dependent changes have been merged and published in related repositories

Further Comments

Discovered this with @gerhard while we were working on rabbitmq/tgir#19

@coro coro requested a review from gerhard as a code owner November 12, 2020 15:44
Contributor

gerhard commented Nov 13, 2020

Works as expected, thanks! We need to update the dashboard JSON to the one generated by Grafana 7 before we can upload this to grafana.com. We will do this post-merge.

The failing tests are not related to this change, since no source has changed. We need to drop OTP v21.3 support, bump Elixir to >= 1.10, and add OTP v23.1 support. This will happen as part of the monorepo migration, so there is no point in fixing these GitHub Actions.

@gerhard gerhard merged commit 9b97a33 into master Nov 13, 2020
@gerhard gerhard deleted the grafana-publisher-fix branch November 13, 2020 12:45