OTel Agent Feature Support #1559
Conversation
func volumeMountsForInitConfig() []corev1.VolumeMount {
	return []corev1.VolumeMount{
		common.GetVolumeMountForLogs(),
Why delete the initContainer configs? Is this even in the otel agent code path?
This was a mistake; I meant to remove the otel agent volume mounts. Undid the change.
@@ -44,6 +44,8 @@ type DatadogAgentSpec struct {
type DatadogFeatures struct {
	// Application-level features

	// OTelAgent configuration.
	OTelAgent *OTelAgentFeatureConfig `json:"otelAgent,omitempty"`
Is otelAgent really the best name for this feature? Should this be otelCollector or just otel? Other features don't mention 'Agent'.
Renamed feature from otelAgent to otelCollector.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1559 +/- ##
==========================================
+ Coverage 48.56% 48.96% +0.39%
==========================================
Files 226 227 +1
Lines 20435 20626 +191
==========================================
+ Hits 9925 10100 +175
- Misses 9989 10001 +12
- Partials 521 525 +4
@@ -104,6 +104,11 @@ func agentImage() string {
	return fmt.Sprintf("%s/%s:%s", v2alpha1.DefaultImageRegistry, v2alpha1.DefaultAgentImageName, defaulting.AgentLatestVersion)
}

func otelAgentImage() string {
	// todo(mackjmr): make this dynamic once we have a non-dev image.
	return "datadog/agent-dev:nightly-ot-beta-main"
}
What's the ETA of the non-dev image? Will this todo be resolved in this PR?
Will this todo be resolved in this PR?
No, and deferring to @dineshg13 for the ETA of the non-dev image.
Can this be moved with the rest of the image constants?
datadog-operator/pkg/defaulting/images.go
Lines 25 to 37 in 756ed08
	GCRContainerRegistry ContainerRegistry = "gcr.io/datadoghq"
	// DockerHubContainerRegistry corresponds to the datadoghq docker.io registry
	DockerHubContainerRegistry ContainerRegistry = "docker.io/datadog"
	// PublicECSContainerRegistry corresponds to the datadoghq PublicECSContainerRegistry registry
	PublicECSContainerRegistry ContainerRegistry = "public.ecr.aws/datadog"
	// DefaultImageRegistry corresponds to the datadoghq containers registry
	DefaultImageRegistry = GCRContainerRegistry // TODO: this is also defined elsewhere and not used; consolidate
	// JMXTagSuffix prefix tag for agent JMX images
	JMXTagSuffix = "-jmx"
	agentImageName = "agent"
	clusterAgentImageName = "cluster-agent"
)
Still discussing with Dinesh what image to use, I'll keep this in mind when I update it.
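For reference, a minimal sketch of what the suggested consolidation could look like once the image is settled; the otelAgentDev* constant names are assumptions for illustration, not code from this PR (note the original returns the bare "datadog/agent-dev:…" reference, which Docker resolves to docker.io/datadog):

```go
package main

import "fmt"

// Sketch of folding the otel-agent dev image into the defaulting
// constants, mirroring pkg/defaulting/images.go. The otelAgentDev*
// names and the explicit registry are assumptions.
const (
	DockerHubContainerRegistry = "docker.io/datadog"
	agentImageName             = "agent"
	clusterAgentImageName      = "cluster-agent"
	// Temporary until a non-dev otel-agent image exists.
	otelAgentDevImageName = "agent-dev"
	otelAgentDevImageTag  = "nightly-ot-beta-main"
)

// otelAgentImage builds the image reference from the shared constants
// instead of a hard-coded string in the test helpers.
func otelAgentImage() string {
	return fmt.Sprintf("%s/%s:%s", DockerHubContainerRegistry, otelAgentDevImageName, otelAgentDevImageTag)
}

func main() {
	fmt.Println(otelAgentImage()) // docker.io/datadog/agent-dev:nightly-ot-beta-main
}
```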
		},
	}
}
o.ports = dda.Spec.Features.OtelCollector.Ports
Why not assign the configured ports' structure directly to o.ports? As a general practice, we don't modify the DDA within the features; instead, save the relevant config in the feature struct otelCollectorFeature.
Makes sense, updated in: ff7868e.
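The pattern the reviewer describes, copying config out of the DDA spec into the feature struct during Configure rather than mutating the DDA, can be sketched as follows. All type names here are simplified stand-ins for the operator's real types, and the default port numbers are assumptions based on the standard OTLP ports:

```go
package main

import "fmt"

// Port is a simplified stand-in for the operator's port type.
type Port struct {
	Name          string
	ContainerPort int32
}

// OtelCollectorFeatureConfig is a stand-in for the feature config in the DDA spec.
type OtelCollectorFeatureConfig struct {
	Ports []*Port
}

// DatadogAgentSpecStub is a stand-in for the relevant part of the DDA spec.
type DatadogAgentSpecStub struct {
	OtelCollector *OtelCollectorFeatureConfig
}

// otelCollectorFeature holds the config the feature needs at reconcile time.
type otelCollectorFeature struct {
	ports []*Port
}

// Configure copies the configured ports (or defaults) into the feature
// struct, leaving the DDA spec untouched.
func (o *otelCollectorFeature) Configure(spec *DatadogAgentSpecStub) {
	if spec.OtelCollector != nil && len(spec.OtelCollector.Ports) > 0 {
		o.ports = spec.OtelCollector.Ports
		return
	}
	// Fall back to the standard OTLP gRPC/HTTP ports when none are set.
	o.ports = []*Port{
		{Name: "otel-grpc", ContainerPort: 4317},
		{Name: "otel-http", ContainerPort: 4318},
	}
}

func main() {
	f := &otelCollectorFeature{}
	f.Configure(&DatadogAgentSpecStub{})
	fmt.Println(len(f.ports), f.ports[0].Name) // 2 otel-grpc
}
```

Keeping the DDA read-only inside features avoids surprising side effects when several features reconcile against the same spec.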
Reviewed the latest commits and it looks good to me. Please verify the test cluster state isn't caused by this change before merging.
How would one enable OTel prior to this PR if using the operator? I see that there's a command-line flag, but it's unclear from the docs how to use it in a Datadog manifest. Relatedly, is there any target release planned that will include this feature?
Hi @wspurgin, the command-line flag was added to enable internal testing using the overrides section in the DatadogAgent CRD. This PR delivers feature support and will be released with v1.12, ETA early February.
Thanks for the response @levan-m - so am I right in my understanding that there is no way a user of the DD operator helm chart could enable the "beta" OTel flag themselves? Is there any concern with allowing us to opt in for the time being (for users of the operator from versions v1.8 -> v1.11.1)? Otherwise I'll just have to drop the operator entirely and configure the agents in Helm directly (the operator is the recommended way, so obviously I'd prefer not to undo my last week of work to do the non-recommended approach 😓)
@wspurgin, there are ways to enable it (render the helm chart and edit the manifest, or use the annotation added here #1475), however the main challenge will be correctly configuring the OTel agent. I'd recommend waiting till next week for the 1.12 release candidate and using the feature added with this PR. For the RC setup you will need to install CRDs manually, as the helm chart isn't getting updated until the final release.
Thanks @levan-m - I'll just wait for that RC. I tried the annotation (but the agent would crash with an error saying
This PR removes support for enabling the otel-agent via annotations in favor of enabling it via the otelCollector feature: #1559. The annotation use has already been removed in staging: DataDog/k8s-datadog-agent-ops#4602.
What does this PR do?
Motivation
OTEL-2290
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
This PR adds support for deploying the otel-agent in a new manner. To confirm these changes work as expected, we need to validate not only that the otel agent runs without error, but also that it can receive data from OTel clients and forward it to Datadog.
The following Docker image: mackjmr/app-all:v2 contains an app that sends OTLP traces every 5 seconds, which we can use to send data to the otel agent. This app can be deployed using the following deployment.yaml:
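The original deployment.yaml was not captured in this page; the sketch below is a reconstruction from the description (image name and env var behavior are from the text; the Downward API wiring and the service.name value are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-all
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-all
  template:
    metadata:
      labels:
        app: app-all
    spec:
      containers:
        - name: app-all
          image: mackjmr/app-all:v2
          env:
            # Node IP via the Downward API, so the app can reach the
            # otel agent's host port.
            - name: MY_NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://$(MY_NODE_IP):4317
            # Illustrative resource attributes; adjust to your service.
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: service.name=app-all
```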
Noteworthy:
- OTEL_EXPORTER_OTLP_ENDPOINT is set to send traces to NODE_IP:4317 (the otel agent listens on 4317 and binds the container port to the host port)
- OTEL_RESOURCE_ATTRIBUTES
Known issues when testing:
The Agent will fail to start up in some Kubernetes environments (e.g. a Kind cluster) if it fails to detect the hostname. You can prevent this by setting the hostname yourself:
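The hostname override snippet was not captured here; one way to set it is via a node agent env override in the DatadogAgent manifest (a sketch: the DD_HOSTNAME env var is a standard Agent setting, while the placeholder value is an assumption):

```yaml
spec:
  override:
    nodeAgent:
      env:
        # Replace "my-kind-node" with your node's hostname.
        - name: DD_HOSTNAME
          value: my-kind-node
```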
In Kind/Minikube, you also need to disable TLS verification:
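The TLS snippet was also lost in extraction; a sketch assuming the v2alpha1 kubelet settings in the DatadogAgent CRD:

```yaml
spec:
  global:
    kubelet:
      # Skip kubelet certificate verification (Kind/Minikube only).
      tlsVerify: false
```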
Scenarios to test:
1. Otel Agent Feature with collector config in configData:
Test 1: Default ports
Deploy the agent using the following manifest: https://github.com/DataDog/datadog-operator/blob/1a69194e7c958641288a3f0489372acb2cecfa34/examples/datadogagent/datadog-agent-with-otel-agent.yaml.
Test 2: Non default ports:
Same steps as above, but in manifest add ports sections:
and change collector config port:
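The ports and collector-config snippets referenced here were not captured; a hedged sketch of the ports section, assuming the feature's ports list takes container-port-style entries and using 3333 as the non-default port from this test:

```yaml
spec:
  features:
    otelCollector:
      enabled: true
      ports:
        # Non-default OTLP gRPC port for this test.
        - name: otel-grpc
          containerPort: 3333
          hostPort: 3333
```

In the collector config, point the OTLP gRPC receiver endpoint at 0.0.0.0:3333 instead of the default 4317.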
and in the app-all deployment change OTEL_EXPORTER_OTLP_ENDPOINT to http://$(MY_NODE_IP):3333.
2. Otel Agent Feature with collector config in configMap:
Deploy the agent using the following manifest: https://github.com/DataDog/datadog-operator/blob/1a69194e7c958641288a3f0489372acb2cecfa34/examples/datadogagent/datadog-agent-with-otel-agent-configmap.yaml.
3. Otel Agent Feature without passing collector config:
When we don't pass a collector config, a default config is used.
Test 1: Default ports
Deploy the agent using the following:
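The manifest for this test was not captured; a minimal sketch of a DatadogAgent that enables the otelCollector feature with no configData/configMap, so the default collector config applies (field names assume the v2alpha1 CRD shape from this PR):

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
  features:
    otelCollector:
      enabled: true
      # No configData/configMap: the agent falls back to its
      # default collector config with the default OTLP ports.
```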
Test 2: Non default ports:
Deploy the agent using the following:
4. Otel Agent via Annotation:
We need to validate that deploying otel agent with annotations still works.
Deploy the agent using the following manifest: https://github.com/DataDog/datadog-operator/blob/1a69194e7c958641288a3f0489372acb2cecfa34/examples/datadogagent/datadog-agent-with-otel-agent-annotations.yaml.
5. Flare:
Test 1: Core config disabled
Disable the core config:
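The snippet for disabling the core config was lost; a sketch assuming the coreConfig block added by this PR lives under the otelCollector feature:

```yaml
spec:
  features:
    otelCollector:
      enabled: true
      coreConfig:
        # Disables otelcollector.enabled in the core agent config.
        enabled: false
```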
Trigger a flare (you can check the flare locally, or if you want to submit it, you can use ticket #1970459), and ensure that in otel/otel-agent.log you see the log:
Test 2: Core config enabled
The otelcollector.enabled setting should be enabled by default when the otel collector feature is enabled, so you can leave the coreConfig empty (this config: https://github.com/DataDog/datadog-operator/blob/1a69194e7c958641288a3f0489372acb2cecfa34/examples/datadogagent/datadog-agent-with-otel-agent.yaml will do). Trigger a flare (you can check the flare locally, or if you want to submit it, you can use ticket #1970459), and ensure that the otel directory contains the following data:
Test 3: Core config other params
Add the following params:
Describe the pod (kubectl describe pod -n <namespace> <pod_name>), and ensure the following env vars are set:
Checklist
- bug, enhancement, refactoring, documentation, tooling, and/or dependencies
- qa/skip-qa label