Add optional native OTel logs collection (#197)
* Add optional native OTel logs collection

We want to replace fluentd logs collection with native OTel logs collection. For now, we are adding this as an option; later it will become the default way to collect logs.

The configuration for native logs collection is mostly borrowed from https://github.com/splunk/sck-otel

Extra changes to the borrowed code:
- move "stream" from resource to log attribute
- fix otel agent container path: otelcollector -> otel-collector
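For context, the first of these changes comes down to where the stanza `metadata` operator records the `stream` field. A minimal before/after sketch (the "before" shape is assumed from the borrowed sck-otel config; the "after" shape matches the operator in the template diff below):

```yaml
# Before (assumed shape of the borrowed config): "stream" recorded as a resource attribute
- type: metadata
  resource:
    stream: 'EXPR($$.stream)'

# After (this commit): "stream" recorded as a log record attribute
- type: metadata
  attributes:
    stream: 'EXPR($$.stream)'
```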
dmitryax authored Sep 23, 2021
1 parent 61a0926 commit c343815
Showing 8 changed files with 267 additions and 10 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## Unreleased

### Added

- Add native OTel logs collection as an option (#197)

### Removed

- Remove PodSecurityPolicy installation option (#195)
29 changes: 29 additions & 0 deletions README.md
@@ -207,6 +207,35 @@ $ helm install my-splunk-otel-collector \
splunk-otel-collector-chart/splunk-otel-collector
```

## Logs collection

The Helm chart currently uses [fluentd](https://docs.fluentd.org/) for Kubernetes logs
collection. Logs collected with fluentd are sent through the Splunk OTel Collector agent, which
performs all the necessary metadata enrichment.

The OpenTelemetry Collector also has
[native functionality for logs collection](https://github.com/open-telemetry/opentelemetry-log-collection).
This chart will soon be migrated from fluentd to OpenTelemetry logs collection.

You can already opt in to OpenTelemetry logs collection instead of fluentd.
The following configuration enables it:

```yaml
fluentd:
enabled: false
logsCollection:
enabled: true
```

Native OTel logs collection currently has the following known limitations:

- The container attributes `container.id` and `container.image.name` are missing.
  This means that correlation between Splunk Log Observer and Splunk Infrastructure Monitoring works
  only at the Kubernetes pod level, not at the container level.
- The `service.name` attribute is not automatically constructed in an Istio environment.
  This means that correlation between logs and traces will not work in Splunk Observability.
  Logs collection with fluentd is still recommended if the chart is deployed with `autodetect.istio=true`,
  as in the example below.
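For example, a cluster that relies on Istio auto-detection should keep fluentd-based logs collection. A minimal sketch of the user-supplied values (fluentd is enabled by default; the values are shown explicitly for clarity):

```yaml
autodetect:
  istio: true
# Keep the fluentd pipeline until native OTel logs collection
# can construct service.name in Istio environments.
fluentd:
  enabled: true
logsCollection:
  enabled: false
```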

## Additional telemetry sources

Use the `autodetect` config option to enable additional telemetry sources.
151 changes: 149 additions & 2 deletions helm-charts/splunk-otel-collector/templates/config/_otel-agent.tpl
@@ -4,6 +4,11 @@ The values can be overridden in .Values.otelAgent.config
*/}}
{{- define "splunk-otel-collector.otelAgentConfig" -}}
extensions:
{{- if .Values.logsCollection.enabled}}
file_storage:
directory: {{ .Values.logsCollection.checkpointPath }}
{{- end }}

health_check:

k8s_observer:
@@ -94,6 +99,137 @@ receivers:
listenAddress: 0.0.0.0:9080
{{- end }}

{{- if and .Values.logsCollection.enabled .Values.logsCollection.containers.enabled }}
filelog:
include: ["/var/log/pods/*/*/*.log"]
# Exclude logs. The file format is
# /var/log/pods/<namespace_name>_<pod_name>_<pod_uid>/<container_name>/<run_id>.log
exclude:
{{- if .Values.logsCollection.containers.excludeAgentLogs }}
- /var/log/pods/{{ .Release.Namespace }}_{{ include "splunk-otel-collector.fullname" . }}*_*/otel-collector/*.log
{{- end }}
{{- range $_, $excludePath := .Values.logsCollection.containers.excludePaths }}
- {{ $excludePath }}
{{- end }}
start_at: beginning
include_file_path: true
include_file_name: false
poll_interval: 200ms
max_concurrent_files: 1024
encoding: nop
fingerprint_size: 1kb
max_log_size: 1MiB
operators:
{{- if not .Values.logsCollection.containers.containerRuntime }}
- type: router
id: get-format
routes:
- output: parser-docker
expr: '$$$$body matches "^\\{"'
- output: parser-crio
expr: '$$$$body matches "^[^ Z]+ "'
- output: parser-containerd
expr: '$$$$body matches "^[^ Z]+Z"'
{{- end }}
{{- if or (not .Values.logsCollection.containers.containerRuntime) (eq .Values.logsCollection.containers.containerRuntime "cri-o") }}
# Parse CRI-O format
- type: regex_parser
id: parser-crio
regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$'
timestamp:
parse_from: time
layout_type: gotime
layout: '2006-01-02T15:04:05.000000000-07:00'
- type: recombine
id: crio-recombine
combine_field: log
is_last_entry: "($$.logtag) == 'F'"
- type: restructure
id: crio-handle_empty_log
output: filename
if: $$.log == nil
ops:
- add:
field: log
value: ""
{{- end }}
{{- if or (not .Values.logsCollection.containers.containerRuntime) (eq .Values.logsCollection.containers.containerRuntime "containerd") }}
# Parse CRI-Containerd format
- type: regex_parser
id: parser-containerd
regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$'
timestamp:
parse_from: time
layout: '%Y-%m-%dT%H:%M:%S.%LZ'
- type: recombine
id: containerd-recombine
combine_field: log
is_last_entry: "($$.logtag) == 'F'"
- type: restructure
id: containerd-handle_empty_log
output: filename
if: $$.log == nil
ops:
- add:
field: log
value: ""
{{- end }}
{{- if or (not .Values.logsCollection.containers.containerRuntime) (eq .Values.logsCollection.containers.containerRuntime "docker") }}
# Parse Docker format
- type: json_parser
id: parser-docker
timestamp:
parse_from: time
layout: '%Y-%m-%dT%H:%M:%S.%LZ'
{{- end }}
- type: metadata
id: filename
resource:
com.splunk.source: EXPR($$$$attributes["file.path"])
# Extract metadata from file path
- type: regex_parser
id: extract_metadata_from_filepath
regex: '^\/var\/log\/pods\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$'
parse_from: $$$$attributes["file.path"]
# Move extracted metadata to resource and log attributes
- type: metadata
resource:
k8s.pod.uid: 'EXPR($$.uid)'
run_id: 'EXPR($$.run_id)'
k8s.container.name: 'EXPR($$.container_name)'
k8s.namespace.name: 'EXPR($$.namespace)'
k8s.pod.name: 'EXPR($$.pod_name)'
com.splunk.sourcetype: 'EXPR("kube:container:"+$$.container_name)'
attributes:
stream: 'EXPR($$.stream)'
{{- if .Values.logsCollection.containers.multilineConfigs }}
- type: router
routes:
{{- range $.Values.logsCollection.containers.multilineConfigs }}
- output: {{ .containerName | quote }}
expr: '($$$$resource["k8s.container.name"]) == {{ .containerName | quote }}'
{{- end }}
default: clean-up-log-record
{{- range $.Values.logsCollection.containers.multilineConfigs }}
- type: recombine
id: {{.containerName | quote }}
output: clean-up-log-record
combine_field: log
is_first_entry: '($$.log) matches {{ .first_entry_regex | quote }}'
{{- end }}
{{- end }}
{{- with .Values.logsCollection.containers.extraOperators }}
{{ . | toYaml | nindent 6 }}
{{- end }}
# Clean up log record
- type: restructure
id: clean-up-log-record
ops:
- move:
from: log
to: $$
{{- end }}
# By default, the k8s_tagger and batch processors are enabled.
processors:
# k8s_tagger enriches traces and metrics with k8s metadata
@@ -231,7 +367,13 @@ exporters:
sync_host_metadata: true

service:
extensions: [health_check, k8s_observer, zpages]
extensions:
{{- if .Values.logsCollection.enabled }}
- file_storage
{{- end }}
- health_check
- k8s_observer
- zpages

# By default there are two pipelines sending metrics and traces to standalone otel-collector otlp format
# or directly to signalfx backend depending on otelCollector.enabled configuration.
@@ -240,7 +382,12 @@
pipelines:
{{- if .Values.logsEnabled }}
logs:
receivers: [fluentforward, otlp]
receivers:
{{- if and .Values.logsCollection.enabled .Values.logsCollection.containers.enabled }}
- filelog
{{- end }}
- fluentforward
- otlp
processors:
- memory_limiter
- groupbyattrs/logs
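Taken together, with default values and `logsCollection.enabled=true`, the template changes above render roughly to the following additions in the agent configuration (an illustrative sketch, not the exact rendered output; the filelog operators are omitted):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otel_pos   # .Values.logsCollection.checkpointPath

receivers:
  filelog:
    include: ["/var/log/pods/*/*/*.log"]
    start_at: beginning
    include_file_path: true
    # ... format-detection and metadata operators as defined in the template ...

service:
  extensions: [file_storage, health_check, k8s_observer, zpages]
  pipelines:
    logs:
      receivers: [filelog, fluentforward, otlp]
```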
@@ -1,4 +1,4 @@
{{ if and .Values.logsEnabled .Values.otelAgent.enabled }}
{{ if and .Values.logsEnabled .Values.otelAgent.enabled .Values.fluentd.enabled }}
{{/*
Fluentd config parts applied only to clusters with containerd/cri-o runtime.
*/}}
@@ -1,4 +1,4 @@
{{ if and .Values.logsEnabled .Values.otelAgent.enabled }}
{{ if and .Values.logsEnabled .Values.otelAgent.enabled .Values.fluentd.enabled }}
{{/*
Fluentd config parts applied only to clusters with docker runtime.
*/}}
@@ -1,4 +1,4 @@
{{ if and .Values.logsEnabled .Values.otelAgent.enabled }}
{{ if and .Values.logsEnabled .Values.otelAgent.enabled .Values.fluentd.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
30 changes: 27 additions & 3 deletions helm-charts/splunk-otel-collector/templates/daemonset.yaml
@@ -8,7 +8,7 @@ metadata:
chart: {{ template "splunk-otel-collector.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
{{- if .Values.logsEnabled }}
{{- if and .Values.logsEnabled .Values.fluentd.enabled }}
engine: fluentd
{{- end }}
{{- if .Values.otelAgent.annotations }}
@@ -54,7 +54,7 @@ spec:
tolerations:
{{ toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.logsEnabled }}
{{- if and .Values.logsEnabled .Values.fluentd.enabled }}
initContainers:
- name: prepare-fluentd-config
image: {{ .Values.image.fluentd.initContainer.image }}
@@ -86,7 +86,7 @@ spec:
mountPath: /fluentd/etc/cri
{{- end }}
containers:
{{- if .Values.logsEnabled }}
{{- if and .Values.logsEnabled .Values.fluentd.enabled }}
- name: fluentd
image: {{ template "splunk-otel-collector.image.fluentd" . }}
imagePullPolicy: {{ .Values.image.fluentd.pullPolicy }}
@@ -220,12 +220,23 @@ spec:
readOnly: true
mountPropagation: HostToContainer
{{- end }}
{{- if and .Values.logsEnabled .Values.logsCollection.enabled }}
- name: varlog
mountPath: /var/log
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: checkpoint
mountPath: {{ .Values.logsCollection.checkpointPath }}
{{- end }}
{{- if .Values.otelAgent.extraVolumeMounts }}
{{- toYaml .Values.otelAgent.extraVolumeMounts | nindent 8 }}
{{- end }}
terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
volumes:
{{- if .Values.logsEnabled }}
{{- if .Values.fluentd.enabled }}
- name: varlog
hostPath:
path: {{ .Values.fluentd.config.containers.path }}
@@ -250,6 +261,19 @@ spec:
configMap:
name: {{ template "splunk-otel-collector.fullname" . }}-fluentd-json
{{- end}}
{{- if .Values.logsCollection.enabled }}
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: checkpoint
hostPath:
path: {{ .Values.logsCollection.checkpointPath }}
type: DirectoryOrCreate
{{- end}}
{{- end}}
{{- if .Values.metricsEnabled }}
- name: hostfs
hostPath:
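The new `checkpoint` hostPath volume exists so that filelog read offsets survive agent pod restarts: it is mounted into the `otel-collector` container at the same path that the `file_storage` extension uses for its checkpoint directory. A condensed sketch of how the pieces line up (default `logsCollection.checkpointPath` assumed):

```yaml
# daemonset.yaml: node-local directory mounted into the agent container
volumes:
  - name: checkpoint
    hostPath:
      path: /var/lib/otel_pos
      type: DirectoryOrCreate
# agent config: the file_storage extension keeps filelog checkpoints there
extensions:
  file_storage:
    directory: /var/lib/otel_pos
```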
57 changes: 55 additions & 2 deletions helm-charts/splunk-otel-collector/values.yaml
@@ -234,11 +234,64 @@ otelK8sClusterReceiver:
# existing fields can be disabled by setting them to null value.
config: {}

################################################################################
# Fluentd configuration for logs collection
#################################################################
# Native OTel logs collection using
# https://github.com/open-telemetry/opentelemetry-log-collection.
# Status: Experimental / disabled by default in favor of fluentd.
#################################################################

logsCollection:
# Native OTel logs collection is disabled by default.
# If you want to enable it, make sure to disable fluentd to avoid log duplication.
enabled: false

# Container logs collection
containers:
enabled: true
# Container runtime. One of `docker`, `cri-o`, or `containerd`
# Automatically discovered if not set.
containerRuntime: ""
# Paths of log files to exclude. The value must be an array,
# e.g. to exclude logs from the `kube-system` namespace:
# excludePaths: ["/var/log/pods/kube-system_*/*/*.log"]
excludePaths: []
# Whether to exclude the agent's own logs from collection
excludeAgentLogs: true
# Extra operators for container logs.
# https://github.com/open-telemetry/opentelemetry-log-collection/blob/main/docs/operators/README.md#what-operators-are-available
extraOperators: []

# Multiline log configs currently match only by container name, which is not unique within the cluster.
# TODO: support k8s object owner name (deployment, daemonset, statefulset) along with namespace
multilineConfigs: []
# Example configuration for handling multiline logs (a Java stack trace) from a test
# container "buttercup-app" by specifying a regular expression (first_entry_regex) that matches
# the first entry of a multiline log series.
# Sample Java stack trace (multiline log) from container "buttercup-app":
# .........
# Exception in thread "main" java.lang.NumberFormatException: For input string: "3.1415"
# at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
# at java.lang.Integer.parseInt(Integer.java:580)
# at ExampleCli.parseNumericArgument(ExampleCli.java:47)
# at ExampleCli.parseCliOptions(ExampleCli.java:27)
# at ExampleCli.main(ExampleCli.java:11)
# .........
# Sample configuration to handle multiline java stack trace from buttercup-app container
# multilineConfigs:
# - containerName: buttercup-app
# first_entry_regex: ^.+Exception[^\n]++(\s+at.*)+

checkpointPath: "/var/lib/otel_pos"

################################################################################
# Fluentd sidecar configuration for logs collection.
# As of now, this is the recommended way to collect Kubernetes logs,
# but it will be replaced by the native OTel logs collection soon.
################################################################################

fluentd:
enabled: true

resources:
limits:
cpu: 500m
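As a usage example, the new values might be combined as follows (a sketch of user-supplied values; the container name and regex are hypothetical):

```yaml
fluentd:
  enabled: false
logsCollection:
  enabled: true
  containers:
    # Skip logs from the kube-system namespace
    excludePaths: ["/var/log/pods/kube-system_*/*/*.log"]
    # Recombine multiline entries from a hypothetical "my-java-app" container:
    # lines that do not start with a timestamp are appended to the previous entry
    multilineConfigs:
      - containerName: my-java-app
        first_entry_regex: '^\d{4}-\d{2}-\d{2}'
```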
