Commit

fix: GPU metrics scraping (#3202) (#3205)
Co-authored-by: Christophe Jauffret <[email protected]>
mesosphere-ci and tuxtof authored Feb 24, 2025
1 parent e52cd50 commit 3e46edb
Showing 2 changed files with 4 additions and 15 deletions.
15 changes: 0 additions & 15 deletions services/kube-prometheus-stack/69.1.2/defaults/cm.yaml
@@ -225,21 +225,6 @@ data:
         source_labels:
           - __meta_kubernetes_pod_name
         target_label: pod
-  - job_name: 'gpu_metrics'
-    metrics_path: /metrics
-    tls_config:
-      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
-    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
-    kubernetes_sd_configs:
-      - role: node
-    relabel_configs:
-      - source_labels: [__address__]
-        regex: '(.*):10250'
-        replacement: '$${1}:9400'
-        target_label: __address__
-      - source_labels: [__meta_kubernetes_node_labelpresent_nvidia_com_gpu_count]
-        regex: true
-        action: keep
   - job_name: 'kubernetes-calico-node'
     metrics_path: /metrics
     tls_config:
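
For reviewers unfamiliar with Prometheus relabeling, the sketch below mimics in plain Python what the two relabel rules of the removed `gpu_metrics` job appear to do: rewrite each node's kubelet address from port 10250 to port 9400, then keep only nodes that carry the `nvidia.com/gpu.count` label. The function name `relabel` and the sample addresses are hypothetical, and the sketch assumes that `$${1}` is an escaped form of the capture-group reference `${1}` and that, as in Prometheus, the regex must match the whole value.

```python
import re

# Hypothetical stand-in for the two relabel rules of the removed job.
ADDRESS_RE = re.compile(r"(.*):10250")
GPU_LABEL = "__meta_kubernetes_node_labelpresent_nvidia_com_gpu_count"


def relabel(labels):
    """Return the rewritten label set, or None if the target is dropped."""
    out = dict(labels)

    # Rule 1 (default action, replace): swap the kubelet port for 9400.
    # fullmatch mirrors the fully anchored matching Prometheus applies.
    match = ADDRESS_RE.fullmatch(out.get("__address__", ""))
    if match:
        out["__address__"] = f"{match.group(1)}:9400"

    # Rule 2 (action: keep): drop targets whose GPU-count label is absent.
    if not re.fullmatch("true", out.get(GPU_LABEL, "")):
        return None
    return out


if __name__ == "__main__":
    gpu_node = {"__address__": "10.0.0.5:10250", GPU_LABEL: "true"}
    cpu_node = {"__address__": "10.0.0.6:10250"}
    print(relabel(gpu_node))
    print(relabel(cpu_node))
```

With this commit the hand-written job is gone, and target discovery is left to the `serviceMonitor` enabled for `dcgmExporter` in the second file.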
4 changes: 4 additions & 0 deletions services/nvidia-gpu-operator/24.9.2/defaults/cm.yaml
@@ -32,6 +32,10 @@ data:
       version: 3.3.9-1-ubuntu22.04
     dcgmExporter:
       enabled: true
+      serviceMonitor:
+        enabled: true
+        additionalLabels:
+          prometheus.kommander.d2iq.io/select: "true"
       version: 4.0.0-4.0.1-ubuntu22.04
     validator:
       repository: nvcr.io/nvidia/cloud-native
