
Some job metrics are missing if job has no conditions #2443

Closed
DerRockWolf opened this issue Jul 7, 2024 · 4 comments · Fixed by #2485


DerRockWolf commented Jul 7, 2024

What happened:
Some job metrics (kube_job_status_failed, kube_job_complete, kube_job_failed) are missing for Jobs without conditions.

What you expected to happen:
Metrics are present regardless of whether conditions exist.

How to reproduce it (as minimally and precisely as possible):

  1. Create this job:

job.yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  template:
    spec:
      containers:
      - image: alpine
        name: test
        command:
          - "sh"
          - "-c"
        args:
          - sleep 5 && exit 1
      restartPolicy: Never
```

  2. Observe that the `.status` is missing a `conditions` object (before the backoffLimit is reached).
  3. Observe that, e.g., the kube_job_status_failed metric is missing (`curl localhost:8080/metrics | grep kube_job_status_failed`).

Anything else we need to know?:

The reason for this bug is that the labels and values are only set inside a for loop over the Job's conditions, i.e. only when a condition of type Failed exists:

```go
for _, c := range j.Status.Conditions {
	condition := c
	if condition.Type == v1batch.JobFailed {
		// labels and values are only appended in here
		// ...
	}
}
```

I can provide a fix that sets the value regardless of whether any condition with type Failed exists.
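
For concreteness, a minimal sketch of what such a fix could look like (hypothetical code, not the actual patch that landed in #2485, and assuming the `ms []*metric.Metric` accumulator used by the surrounding generator):

```go
// Hypothetical sketch: emit the failed-pod count unconditionally,
// and keep only the reason-labeled series behind the condition check.
ms = append(ms, &metric.Metric{
	Value: float64(j.Status.Failed), // present even when .status.conditions is empty
})

for _, c := range j.Status.Conditions {
	if c.Type == v1batch.JobFailed {
		// reason-labeled variants remain condition-dependent
		// ...
	}
}
```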

Environment:

  • kube-state-metrics version: v2.12.0 (master 85d1423)
  • Kubernetes version (use kubectl version): v1.29.6
  • Cloud provider or hardware configuration: homelab
@DerRockWolf DerRockWolf added the kind/bug Categorizes issue or PR as related to a bug. label Jul 7, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 7, 2024
@dgrisonnet (Member)

/assign @richabanker
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 8, 2024
@richabanker (Contributor)

I just tried to reproduce this; so far I am actually able to see the kube_job_status_failed metric being reported:

```
# HELP kube_job_status_failed The number of pods which reached Phase Failed.
# TYPE kube_job_status_failed gauge
kube_job_status_failed{namespace="default",job_name="test"} 0
```

But yes, I don't see the kube_job_complete and kube_job_failed metrics, which I believe is working as intended (WAI), since these metrics should only be reported when the job's Status.Condition.Type changes to Complete or Failed.
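
(For illustration, a simplified paraphrase of how the condition-gated metrics are produced; this is a sketch of the pattern, not a verbatim quote from the job store:)

```go
// Sketch: kube_job_complete only yields series while a Complete
// condition exists on the Job; no condition means no metric.
for _, c := range j.Status.Conditions {
	if c.Type == v1batch.JobComplete {
		// one series per condition status (true/false/unknown)
		// ...
	}
}
```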

@DerRockWolf (Author)

kube_job_status_failed was there for roughly 30 seconds and then disappeared.

From my perspective, the kube_job_status_succeeded, kube_job_status_failed & kube_job_status_active metrics should all be emitted regardless of the overall job conditions. kube_job_status_failed is currently the only one that depends on the condition being present.

I listed kube_job_complete & kube_job_failed because they also depend on the existence of a condition, but I agree that these probably work as designed.

I also found that kube_job_status_failed isn't correctly implemented.
The description states: "The number of pods which reached Phase Failed and the reason for failure", but currently the number of failed pods is only present if reasonKnown is false...
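
(For context, the logic in question looks roughly like this; paraphrased from the job store's generator at the time, with names like `jobFailureReasons`, `failureReason`, and `boolFloat64` taken from that file:)

```go
reasonKnown := false
for _, reason := range jobFailureReasons {
	reasonKnown = reasonKnown || failureReason(&condition, reason)
	// known reasons only ever get a 0/1 value, never the failed-pod count
	ms = append(ms, &metric.Metric{
		LabelKeys:   []string{"reason"},
		LabelValues: []string{reason},
		Value:       boolFloat64(failureReason(&condition, reason)),
	})
}
if !reasonKnown {
	// the failed-pod count from j.Status.Failed is only emitted here,
	// on the unknown-reason fallback path
	ms = append(ms, &metric.Metric{
		LabelKeys:   []string{"reason"},
		LabelValues: []string{""},
		Value:       float64(j.Status.Failed),
	})
}
```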

@richabanker (Contributor)

Discussed briefly with @dgrisonnet; the suggestion to emit kube_job_status_failed even when there are no job conditions seems worthwhile. @DerRockWolf would you be open to creating a PR for that? If not, I can get started on one; please let us know what you prefer. Thanks!
