Harvest should create resolution metrics for health alerts #2804

rahulguptajss · 2024-04-03T08:23:06Z

Thanks, @faguayot, for raising this here.

rahulguptajss · 2024-06-24T07:00:10Z

@faguayot This feature is available through nightly builds. Please do provide your feedback once you've had a chance to try these changes.

We publish a value of 1 for health metrics when an alert is detected and 0 once it is resolved.

faguayot · 2024-06-25T09:22:11Z

Hello @rahulguptajss,
I tried the nightly build, but I don't know the name of the new metric you created for health. I checked health_ha_alerts, which is an existing metric, but I can't find any information there.

The other health-related alerts that I have in my Prometheus database are the following:

Thanks.

rahulguptajss · 2024-06-25T09:39:00Z

@faguayot The names of the alerts will remain the same. The only difference is in the value: a value of 1 indicates that the alert is raised or active, while a value of 0 indicates that the alert is resolved.

For example, health_ha_alerts == 1 means there is an HA issue. Once this issue is resolved, a metric health_ha_alerts == 0 will be published to mark the earlier alert as resolved. Note that health_ha_alerts == 0 is not always published; it is published only when an issue that raised health_ha_alerts == 1 is resolved, and only once per resolved issue instance.
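To make the 1/0 semantics above concrete, here is a minimal Python sketch of how a consumer might interpret these samples. It is not Harvest or Prometheus code: the `Sample` shape, the `active_alerts` helper, and the `node` and `lif` labels are all hypothetical, and it assumes the most recent sample for a series decides whether that alert is still active.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sample shape: (timestamp, series_key, value).
# series_key stands in for the metric name plus its labels.
Sample = Tuple[float, str, int]


def active_alerts(samples: Iterable[Sample]) -> Dict[str, bool]:
    """Return, per series, whether the most recent sample marks it active."""
    latest: Dict[str, Tuple[float, int]] = {}
    for ts, key, value in samples:
        # Keep only the newest sample for each series.
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, value)
    # 1 means raised/active; 0 means resolved.
    return {key: value == 1 for key, (_, value) in latest.items()}


samples = [
    (100.0, 'health_ha_alerts{node="a"}', 1),
    (160.0, 'health_ha_alerts{node="a"}', 1),
    (220.0, 'health_ha_alerts{node="a"}', 0),  # published once on resolution
    (220.0, 'health_lif_alerts{lif="lif1"}', 1),
]
state = active_alerts(samples)
```

Because the 0 is published only once, a series with no samples at all simply never appears in `state`; under this sketch, absence means "no alert has been raised", not "resolved".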

faguayot · 2024-06-25T10:55:58Z

OK, in that case I can't check the information until something happens with HA. I thought you would always write the good state (in this case, a 0) and write a 1 (failed state) only when something happens. Could you test it and show me the information available for the metric?

When something happens in our environment, I will review these metrics. Thanks for the implementation.

rahulguptajss · 2024-06-26T07:24:18Z

@faguayot We only record a failed state (1) when an issue occurs. Once the issue is resolved, we write a good state (0) to signal the resolution. Below is an example with health_lif_alerts.

While the LIF is not home, we continuously publish the following for as long as the failure state is detected:

Once the LIF is back home, we publish a good state (0) once to indicate the issue is resolved:
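The publishing behavior described in this comment can be sketched roughly as below. This is a hypothetical illustration rather than Harvest's actual implementation: `metrics_to_publish`, the poll loop, and the instance name `lif1` are assumptions made for the example.

```python
from typing import Dict, List, Set


def metrics_to_publish(previous: Set[str], current: Set[str]) -> Dict[str, int]:
    """Compute the values to emit for one poll.

    previous: alert instances that were active on the last poll
    current:  alert instances that are active now
    """
    out: Dict[str, int] = {}
    for instance in current:
        out[instance] = 1  # still failing: keep publishing 1
    for instance in previous - current:
        out[instance] = 0  # just resolved: publish 0 once
    return out


# Four consecutive polls: the LIF is away from home twice, then back home.
polls: List[Set[str]] = [{"lif1"}, {"lif1"}, set(), set()]
previous: Set[str] = set()
history: List[Dict[str, int]] = []
for current in polls:
    history.append(metrics_to_publish(previous, current))
    previous = current
```

In this sketch the third poll emits the single 0 for `lif1`, and the fourth poll emits nothing, which matches the "once per resolution" behavior described earlier in the thread.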

Hardikl · 2024-08-09T09:17:23Z

Verified in 24.08 with commit 4e3945c

When the home port is changed via the UI

When the port is reverted to the home port via the UI

rahulguptajss added the feature and customer labels Apr 3, 2024

rahulguptajss mentioned this issue Apr 3, 2024

Alert for HA Pair Going Down #2315

Closed

rahulguptajss self-assigned this Apr 8, 2024

cgrinds added the 24.08 label May 23, 2024

rahulguptajss added status/open and removed status/open labels May 28, 2024

rahulguptajss linked a pull request Jun 10, 2024 that will close this issue

feat: Create resolution metrics for health alerts #2977

Merged

cgrinds closed this as completed in #2977 Jun 12, 2024

rahulguptajss added status/testme and removed status/open labels Jun 12, 2024

cgrinds unassigned rahulguptajss Aug 6, 2024

Hardikl added status/done and removed status/testme labels Aug 9, 2024
