Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add k8s.container.status.waiting metric to semantic conventions #1672

Open
povilasv opened this issue Dec 11, 2024 · 7 comments · May be fixed by #1784
Open

Add k8s.container.status.waiting metric to semantic conventions #1672

povilasv opened this issue Dec 11, 2024 · 7 comments · May be fixed by #1784
Labels
area:k8s enhancement New feature or request

Comments

@povilasv
Copy link
Contributor

Area(s)

area:k8s

Is your change request related to a problem? Please describe.

K8s Cluster receiver uses would like to monitor K8s CrashLoop Back off state. We had a multiple issues about it:

Previously I tried to model status.waiting as Resource attribute, but because it's mutable it cannot be Resource attribute. See #997

I would like to propose adding this as metric to unblock this PR open-telemetry/opentelemetry-collector-contrib#35668

Describe the solution you'd like

  # k8s.container.* metrics
  - id: metric.k8s.container.status.waiting
    type: metric
    metric_name: k8s.container.status.waiting
    stability: experimental
    brief: "Whether container is in waiting state. (0 for no, 1 for yes)"
    instrument: gauge
    unit: ""

Describe alternatives you've considered

No response

Additional context

No response

@povilasv
Copy link
Contributor Author

@TylerHelmuth / @ChrsMark / @dmitryax woukd appreciate thoughts / feeedback 🙇

@ChrsMark
Copy link
Member

Thank's for filing this @povilasv, modeling this as a metric makes sense!

There is a proposal on how we should model such "state"/"phase"/"status" metrics in SemConv. It is already in use by hw metrics and jmx metrics as mentioned at #1032 (comment) (see #1554).
Process' status will also be modeled like this: #1212 (comment)
I have filed another one for k8s.namespace.phase: #1668 which I think will help us verify the proposal.

My only question here would be how the modeling should actually look like. What you propose aligns with kube-state-metrics but I would like to see how this will be combined with other statuses like running and terminated as well as how we reflect the reason part. Can we deal with them all at once so as to ensure that the decision will scale and cover all?

FWIWI, If I'm not mistaken we can't have k8s.container.status.waiting and k8s.container.status.waiting.reason (metric name cannot be namespace at the same time).

@povilasv
Copy link
Contributor Author

Thanks for review, I'll try to work on this next week to model this differently

@povilasv
Copy link
Contributor Author

I did some experimenting and looked at some of the k8s code (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L2057-L2423)

Some examples of the data for different states:

Terminated

  containerStatuses:
  - containerID: containerd://be5d7d76ab64bdc354b97bb7d11ea81068f70d1a633574f72afe44d374f89628
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5
    lastState:
      terminated:
        containerID: containerd://67c60d2ab1e31bbf82d27d96ffb507a3d407dc4314b72e328442fa2b5a63df1b
        exitCode: 1
        finishedAt: "2024-12-17T14:30:21Z"
        reason: Error
        startedAt: "2024-12-17T14:30:21Z"
    name: crashloop-container
    ready: false
    restartCount: 2
    started: false
    state:
      terminated:
        containerID: containerd://be5d7d76ab64bdc354b97bb7d11ea81068f70d1a633574f72afe44d374f89628
        exitCode: 1
        finishedAt: "2024-12-17T14:30:39Z"
        reason: Error
        startedAt: "2024-12-17T14:30:39Z"

Running:

- containerID: containerd://9e1c366c3ef52f6a5df772b7b26d5950612d75e5c96b2fb27b49882a48ecce35
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5
    lastState: {}
    name: always-running-container
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-12-17T14:33:50Z"

Waiting

  containerStatuses:
  - containerID: containerd://dba57c3f17b046e4ebad10351495d7f6f86db568abec77a611f81ffd3b10a378
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5
    lastState:
      terminated:
        containerID: containerd://dba57c3f17b046e4ebad10351495d7f6f86db568abec77a611f81ffd3b10a378
        exitCode: 1
        finishedAt: "2024-12-17T14:46:23Z"
        reason: Error
        startedAt: "2024-12-17T14:46:23Z"
    name: crashloop-container
    ready: false
    restartCount: 8
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=crashloop-container pod=crashloop-pod_default(f8e3ee00-f1ed-465a-bd70-e88de51219c2)
        reason: CrashLoopBackOff

Terminated is rather short state it quickly goes to "Waiting" state. All those states are exclusive.

I think we can model something like this:

  - id: metric.k8s.container.status.state
    type: metric
    metric_name: k8s.container.status.state
    stability: experimental
    brief: ""
    instrument: updowncounter
    unit: ""
    attributes:
      - ref: k8s.container.state
        requirement_level: required
      - ref: k8s.container.reason
        requirement_level: required
      - ref: k8s.pod.name
        requirement_level: required
      - ref: k8s.container.name
        requirement_level: required

Thinks to consider:

  1. Should reason be in seperate metric? As it increases cardinality quite a bit? Though we would have two metrics something like k8s.container.status.state and k8s.container.status.reason
  2. "reason" is a "string" (not an enum) in k8s and can contain any value, but usually it has format like ErrImagePull / ImagePullBackOff / ContainerStatusUnknown / CrashLoopBackOff
  3. There is also the last_terimated_reason, which is very useful to troubleshoot Waiting containers (was it oomed killed? Did process errored out? ) Should this be seperate metric?
  4. Also Container can be Readyto accept traffic or not, so likely in future we would have k8s.container.status.ready metric
  5. There is also seperate state for init cointainers, so we would likely need k8s.initcontainer.status metric?

FYI Kube-state-metrics did this:

kube_pod_container_status_terminated{container=""} 0 | 1
kube_pod_container_status_terminated_reason{container="", reason=""} 1
kube_pod_container_status_waiting{container=""} 0 | 1
kube_pod_container_status_waiting_reason{container="", reason=""} 1
kube_pod_container_status_running{container=""} 1
kube_pod_container_status_ready{container=""} 0 | 1
kube_pod_init_container_status_running{container=""} 0 | 1

@ChrsMark
Copy link
Member

ChrsMark commented Jan 9, 2025

Thank's for investigating this @povilasv!

We discussed this in the K8s SemConv WG (Wed Jan 8th) and one note here is that we should probably ensure that those attributes should be ready to be re-used in Entities in the future.

Based on the above I think we need to define the attributes explicitly in the attributes registry under a full meaningful namespace like k8s.container.status.reason and k8s.container.status.state. This would allow us to use them both as metric attributes (for now) and as entity attributes potentially in the future.

As far as having 1 or 2 different metrics is concerned, I don't have a strong preference here. Having them under 1 metric is more tidy but less flexible in case we want to change only one of them in the future. I'm not sure if this decision actually affects cardinality.

@povilasv is there any reason that you found on why KSM splits the states in different metrics? For example the kube_namespace_status_phase is a single metric where the phases are reflected in the label/attribute: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/cluster/namespace-metrics.md. Whatever we decide here should be consistent across the similar metrics in SemConv.

@povilasv
Copy link
Contributor Author

@povilasv is there any reason that you found on why KSM splits the states in different metrics? For example the kube_namespace_status_phase is a single metric where the phases are reflected in the label/attribute:

I've looked thru PRs and issues and couldn't find why. they've been like that since the beginning. Ref:

https://github.com/kubernetes/kube-state-metrics/pull/256/files
kubernetes/kube-state-metrics#276

I guess, there is one problem that "running" state doesn't have a reason. So we will either have to do set reason to empty string like k8s.container.status.state{8s.container.status.state="Running",k8s.container.status.reason="" }, or split into different metrics like KSM.

One interesting point about reasons - KSM white lists only certain reasons to prevent cardinality explosion:

Agreed. We can white-list these reasons and prevent carnality explosion, but still have useful information.
As a side note: it more and more feels like we're building the infra-store rather than time-series, which is not necessarily a bad thing (if using Prometheus 2.0), just a thought. Actually kind of neat.

I think maybe we should do the same?

Thoughts @ChrsMark ? :)

@ChrsMark
Copy link
Member

I guess, there is one problem that "running" state doesn't have a reason. So we will either have to do set reason to empty string like k8s.container.status.state{8s.container.status.state="Running",k8s.container.status.reason="" }, or split into different metrics like KSM.

Sounds good!

One interesting point about reasons - KSM white lists only certain reasons to prevent cardinality explosion:

Agreed. We can white-list these reasons and prevent carnality explosion, but still have useful information.
As a side note: it more and more feels like we're building the infra-store rather than time-series, which is not necessarily a bad thing (if using Prometheus 2.0), just a thought. Actually kind of neat.

I think maybe we should do the same?

Yeap, limiting this with an enum would make sense!

@povilasv povilasv linked a pull request Jan 22, 2025 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:k8s enhancement New feature or request
Projects
Status: In Review
2 participants