The add_kubernetes_metadata function of filebeat cannot collect .out file logs and match the metadata information of k8s pod #42318

Open
LTP7534 opened this issue Jan 16, 2025 · 1 comment
Labels
needs_team Indicates that the issue/PR needs a Team:* label

LTP7534 commented Jan 16, 2025

In a Kubernetes environment, we need to collect log files written to an emptyDir volume, and we want each output event to include the pod's metadata.

I can reproduce this with the following config:

- type: log
  enabled: true
  tags: ["emptydir"]
  fields_under_root: true
  ignore_older: 72h
  fields:
    filebeat_cluster: ai-offline-dev
    hostname: ${NODE_NAME}
    host_ip: ${NODE_IP}
  paths:
    - /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/ray-log/session_*/logs/*
  recursive_glob.enabled: true
  encoding: utf-8
  scan_frequency: 5s
  harvester_buffer_size: 16384
  max_bytes: 10485760
  multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
  multiline.match: after
  multiline.negate: false
  multiline.max_lines: 1000
  multiline.timeout: 5s
  processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      in_cluster: true
      labels.dedot: true
      annotations.dedot: true
      default_indexers.enabled: false
      default_matchers.enabled: false
      indexers:
        - pod_uid:
      matchers:
        - logs_path:
            logs_path: '/var/lib/kubelet/pods/'
            resource_type: 'pod'

The directory /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/ray-log/session_*/logs/ contains log files ending in .out and .log, such as a.out and b.log. Only the files ending in .log are matched to the pod's metadata.

2025-01-15T07:54:43.791Z DEBUG [kubernetes] add_kubernetes_metadata/matchers.go:88 Incoming log.file.path value: /var/lib/kubelet/pods/988c369d-baaf-43ba-a576-b489244a806b/volumes/kuberetes.io~empty-dir/ray-log/session_2024-01-14/logs/gcs_server.out
2025-01-14T09:42:57.774Z DEBUG [kubernetes] add_kubernetes_metadata/kubernetes.go:248 No container match string, not adding kubernetes data   {"libbeat.processor": "add_kubernetes_metadata"}
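
The path in the first debug line still carries the pod UID at a fixed position, even though the file ends in .out. A minimal sketch of that observation, using only the Go standard library; `podUIDFromKubeletPath` is a hypothetical name for illustration and not a function in Beats, and the path is copied verbatim from the debug line above:

```go
package main

import (
	"fmt"
	"strings"
)

// podUIDFromKubeletPath is a hypothetical helper (not part of Beats).
// It splits the path on "/" and returns the segment at index 5, which is
// where the pod UID sits under /var/lib/kubelet/pods/<pod_uid>/...
func podUIDFromKubeletPath(path string) string {
	const podUIDPos = 5
	parts := strings.Split(path, "/")
	if len(parts) > podUIDPos {
		return parts[podUIDPos]
	}
	return ""
}

func main() {
	// Path copied as-is from the DEBUG line above.
	path := "/var/lib/kubelet/pods/988c369d-baaf-43ba-a576-b489244a806b/volumes/kuberetes.io~empty-dir/ray-log/session_2024-01-14/logs/gcs_server.out"
	fmt.Println(podUIDFromKubeletPath(path)) // prints 988c369d-baaf-43ba-a576-b489244a806b
}
```

So the UID itself is recoverable from the .out path; the "No container match string" message suggests the matcher never gets as far as the split.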

Looking at the code, the pod UID is only extracted when the file path contains .log. Is it possible to remove this restriction?
https://github.com/elastic/beats/blob/main/filebeat/processor/add_kubernetes_metadata/matchers.go

		if strings.Contains(source, ".log") && !strings.HasSuffix(source, ".gz") {
			// Specify a pod resource type when writing logs into manually mounted log volume,
			// those logs apper under under "/var/lib/kubelet/pods/<pod_id>/volumes/..."
			if strings.HasPrefix(f.LogsPath, podKubeletLogsPath()) {
				pathDirs := strings.Split(source, pathSeparator)
				podUIDPos := 5
				if len(pathDirs) > podUIDPos {
					podUID := strings.Split(source, pathSeparator)[podUIDPos]
					f.logger.Debugf("Using pod uid: %s", podUID)
					return podUID
				}
			}
			// In case of the Kubernetes log path "/var/log/pods/",
			// the pod ID will be extracted from the directory name,
			// file name example: "/var/log/pods/'<namespace>_<pod_name>_<pod_uid>'/container_name/0.log".
			if strings.HasPrefix(f.LogsPath, podLogsPath()) {
				pathDirs := strings.Split(source, pathSeparator)
				podUIDPos := 4
				if len(pathDirs) > podUIDPos {
					podUID := strings.Split(pathDirs[podUIDPos], "_")
					if len(podUID) > 2 {
						f.logger.Debugf("Using pod uid: %s", podUID[2])
						return podUID[2]
					}
				}
			}

			f.logger.Error("Error extracting pod uid - source value does not contains matcher's logs_path")
			return ""
		}
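
To make the effect of the outer condition concrete, here is a small self-contained sketch. `currentGate` mirrors only the first `if` of the snippet above. `relaxedPodUID` is a hypothetical variant, written for this issue and not taken from Beats, that drops the `.log` requirement while still skipping `.gz` files. As a simplification it checks the prefix on the source path rather than on the configured `logs_path`. The file names a.out and b.log and the full paths are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeletPodsPath matches the logs_path value used in the config above.
const kubeletPodsPath = "/var/lib/kubelet/pods/"

// currentGate mirrors the outer condition of the quoted snippet:
// the path must contain ".log" and must not end in ".gz".
func currentGate(source string) bool {
	return strings.Contains(source, ".log") && !strings.HasSuffix(source, ".gz")
}

// relaxedPodUID is a hypothetical variant (an assumption, not Beats code).
// It keeps the ".gz" exclusion but does not require ".log" in the path.
func relaxedPodUID(source string) string {
	if strings.HasSuffix(source, ".gz") || !strings.HasPrefix(source, kubeletPodsPath) {
		return ""
	}
	const podUIDPos = 5
	parts := strings.Split(source, "/")
	if len(parts) > podUIDPos {
		return parts[podUIDPos]
	}
	return ""
}

func main() {
	dir := kubeletPodsPath + "988c369d-baaf-43ba-a576-b489244a806b/volumes/kubernetes.io~empty-dir/ray-log/session_2024-01-14/logs/"

	fmt.Println(currentGate(dir + "a.out"))   // false: no ".log" in the path
	fmt.Println(currentGate(dir + "b.log"))   // true
	fmt.Println(relaxedPodUID(dir + "a.out")) // 988c369d-baaf-43ba-a576-b489244a806b
}
```

Whether dropping the check is safe for the `/var/log/pods/` case is something the maintainers would need to confirm; the sketch only covers the kubelet volume path.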

botelastic bot added the needs_team label on Jan 16, 2025

botelastic bot commented Jan 16, 2025

This issue doesn't have a Team:<team> label.
