cluster: fix health metrics #2987

Merged
merged 2 commits into from
Sep 17, 2020

Conversation

disksing
Contributor

Signed-off-by: disksing [email protected]

What problem does this PR solve?

What is changed and how it works?

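  • Add a metrics reset so that removed members no longer appear in the output. The reset only happens when PD restarts or the leader is transferred, so one of the two is still required after removing a member.
  • Fix the wrong value reported by the health metric.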
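
The two changes above can be sketched as follows. This is a minimal, stdlib-only illustration, not the code from this PR: `gaugeVec`, `member`, `snapshot`, and the shape of `collectHealthStatus` are hypothetical stand-ins. PD itself uses a Prometheus `GaugeVec` named `healthStatusGauge`, whose `Reset()` method drops all label series. The sketch also assumes that 1 means healthy and 0 means unhealthy.

```go
package main

import (
	"fmt"
	"sort"
)

// gaugeVec is a hypothetical stand-in for a labeled gauge such as
// PD's healthStatusGauge (a Prometheus GaugeVec in the real code).
type gaugeVec struct {
	values map[string]float64
}

func newGaugeVec() *gaugeVec { return &gaugeVec{values: map[string]float64{}} }

// Set records the value for the series labeled with name.
func (g *gaugeVec) Set(name string, v float64) { g.values[name] = v }

// Reset drops every series, so members that no longer exist stop being reported.
func (g *gaugeVec) Reset() { g.values = map[string]float64{} }

// member is a hypothetical simplification of a PD member.
type member struct {
	ID   uint64
	Name string
}

// collectHealthStatus sets 1 for members found in the healthy set and 0 otherwise.
func collectHealthStatus(g *gaugeVec, members []member, healthy map[uint64]struct{}) {
	for _, m := range members {
		var v float64 // zero value: unhealthy
		if _, ok := healthy[m.ID]; ok {
			v = 1
		}
		g.Set(m.Name, v)
	}
}

// snapshot renders the gauge in a /metrics-like form, sorted by member name.
func snapshot(g *gaugeVec) []string {
	names := make([]string, 0, len(g.values))
	for name := range g.values {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, fmt.Sprintf("pd_cluster_health_status{name=%q} %g", name, g.values[name]))
	}
	return lines
}

func main() {
	g := newGaugeVec()
	all := []member{{1, "pd1"}, {2, "pd2"}, {3, "pd3"}}
	collectHealthStatus(g, all, map[uint64]struct{}{1: {}, 2: {}, 3: {}})

	// pd3 is removed. Without a reset its series would linger, so the
	// gauge is reset (assumed here to happen on restart or leader change).
	g.Reset()
	collectHealthStatus(g, all[:2], map[uint64]struct{}{1: {}}) // pd2 is unhealthy

	for _, line := range snapshot(g) {
		fmt.Println(line)
	}
}
```

Without the `Reset()` call, the `pd3` series would keep its last value indefinitely, which is the stale-member symptom described in the linked issue.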

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. start 3pd+1tikv cluster
  2. check /metrics output
pd_cluster_health_status{name="pd1"} 1
pd_cluster_health_status{name="pd2"} 1
pd_cluster_health_status{name="pd3"} 1
  3. delete member pd3
  4. transfer leader
  5. check /metrics output again; the removed member is no longer displayed
pd_cluster_health_status{name="pd1"} 1
pd_cluster_health_status{name="pd2"} 1
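
The manual check above could also be scripted. The following is only a sketch under stated assumptions: `healthMembers` is a hypothetical helper (not part of PD), it assumes the `/metrics` text has already been fetched into a string, and it assumes `name` is the only label on `pd_cluster_health_status`, as in the output shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// healthMembers is a hypothetical helper that returns the member names
// found in pd_cluster_health_status lines of a /metrics text dump.
func healthMembers(metrics string) []string {
	const prefix = `pd_cluster_health_status{name="`
	var names []string
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // comment lines and other metrics are skipped
		}
		rest := line[len(prefix):]
		if end := strings.Index(rest, `"`); end >= 0 {
			names = append(names, rest[:end])
		}
	}
	return names
}

func main() {
	// Sample text copied from the output after removing pd3.
	after := `pd_cluster_health_status{name="pd1"} 1
pd_cluster_health_status{name="pd2"} 1`

	for _, name := range healthMembers(after) {
		if name == "pd3" {
			fmt.Println("removed member pd3 is still reported")
			return
		}
	}
	fmt.Println("removed member pd3 is no longer reported")
}
```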

Release note

  • Fix the issue that member health metrics are not correct

@disksing disksing added component/metrics Metrics. type/bugfix This PR fixes a bug. labels Sep 17, 2020
@disksing disksing added this to the v4.0.7 milestone Sep 17, 2020
  }
- healthStatusGauge.WithLabelValues(member.GetName()).Set(1)
+ healthStatusGauge.WithLabelValues(member.GetName()).Set(v)
Member

It looks like 0 is unhealthy now.

Member

@HunDunDM HunDunDM left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 17, 2020
Contributor

@nolouch nolouch left a comment

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Sep 17, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 17, 2020
@nolouch
Contributor

nolouch commented Sep 17, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 17, 2020
@ti-srebot
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit 66c2616 into tikv:master Sep 17, 2020
@disksing disksing deleted the reset-metrics branch September 17, 2020 16:35
@nolouch nolouch added needs-cherry-pick-release-3.0 The PR needs to cherry pick to release-3.0 branch. needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. labels Jan 20, 2021
@nolouch
Contributor

nolouch commented Jan 20, 2021

/run-cherry-picker

ti-srebot pushed a commit to ti-srebot/pd that referenced this pull request Jan 20, 2021
@ti-srebot
Contributor

cherry pick to release-4.0 in PR #3368

ti-srebot pushed a commit to ti-srebot/pd that referenced this pull request Jan 20, 2021
@ti-srebot
Contributor

cherry pick to release-3.0 in PR #3369

ti-chi-bot added a commit that referenced this pull request Jan 26, 2021
* cherry pick #2987 to release-4.0

Signed-off-by: ti-srebot <[email protected]>

* resolve conflict

Signed-off-by: nolouch <[email protected]>

Co-authored-by: disksing <[email protected]>
Co-authored-by: nolouch <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
Labels
component/metrics Metrics. needs-cherry-pick-release-3.0 The PR needs to cherry pick to release-3.0 branch. needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Development

Successfully merging this pull request may close these issues.

collectHealthStatus() in server/cluster/cluster.go doesn't handle removed member
4 participants