Embed a PodTemplate or Container[] list in the OpentelemetryCollector CRD #901
The `OpentelemetryCollector` CRD currently defines a few configuration options for env-var injection from `ConfigMap` or `Secret` resources, and a couple of other options. But it does not expose most of the configuration of a `Pod`. You cannot control:

- the `Pod`'s annotations (Propagate annotations to Deployment's PodTemplate #900)
- `imagePullSecrets` (DaemonSet deloyment with private container registry unstable (imagePullSecrets not configurable) #846)
- `readinessProbe`

These issues share a common underlying problem: the operator does not use a `PodTemplate` in the `OpentelemetryCollector` resource. It instead constructs the `Deployment` and its embedded `PodTemplate` (`spec.template`) entirely from individual configuration settings. That won't scale or be maintainable. What about security policies? What about ... anything you can name that can appear in a pod?

The future-resistant solution is to add an optional `PodTemplate` to the `OpentelemetryCollector` CRD, and deprecate the current keys `image`, `serviceAccount`, `env`, etc. For example, instead of spelling each of these out as dedicated CRD fields, the resource would carry a pod template, and the operator would then inject the appropriate collector container and generated configuration into it. This has many advantages. Existing CRD options could be retained for backwards compatibility, to prevent the need for an incompatible API version bump, or they could move to a new API version. Here's an expanded example of what the resulting resource could look like.
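A minimal sketch of such a resource, assuming a hypothetical `podTemplate` field; neither the field name nor its exact placement is taken from the operator's actual API:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging]
  # Hypothetical field: a full pod template that the operator would merge
  # its generated collector container, args, and config volume into.
  podTemplate:
    metadata:
      annotations:
        example.com/scrape: "true"
    spec:
      imagePullSecrets:
        - name: private-registry
      containers:
        - name: otc-container        # assumed name of the collector container
          readinessProbe:
            httpGet:
              path: /
              port: 13133
          resources:
            limits:
              memory: 512Mi
```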
Comments

I'm pretty new to the stack but I could possibly attempt this, if I had some indication the idea would be welcomed. It doesn't look excessively complex. But I'd really want project owner feedback before I launched into it.
I agree with the design if the objective is to be as flexible as possible. However, the goal of a CRD is to abstract deployment and operational details. Exposing the pod spec in the CRD gives a lot of flexibility, but it is more error-prone.
@pavolloffay Yeah, that's definitely a concern. And there are some places where the operator will have to override what's in the pod spec. But if you look at the list of linked issues, there are lots of different things people are asking for that boil down to "expose elements of the Pod and Container specs for configuration via the operator". That currently involves custom CRD additions and code for each such configuration element. An alternative would be to cherry-pick subsections of the Pod and Container specs and expose those directly, rather than embedding the whole template. The more I think about it, the more I think this should really be something the operator-sdk helps operators out with. But it'd be worth doing here first.
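As a hypothetical illustration of the cherry-picking alternative, the CR could expose selected upstream subsections verbatim instead of a full template (all field names below are assumptions, not the operator's actual API):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example
spec:
  mode: deployment
  # Selected pod/container subsections exposed as-is, reusing the core/v1
  # schemas, instead of one bespoke option per setting:
  podAnnotations:
    example.com/scrape: "true"
  imagePullSecrets:
    - name: private-registry
  resources:
    limits:
      memory: 512Mi
  securityContext:
    runAsNonRoot: true
  tolerations:
    - key: dedicated
      operator: Equal
      value: telemetry
      effect: NoSchedule
```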
I'm going to withdraw this, as I've retired my use of the opentelemetry-operator in favour of directly managed manifests. Would you like the issue left open in case someone else wants to adopt it, or just close it?
We can keep it open and see what other people think. What was the main pain point that made you move to plain manifests?
I'm currently evaluating this operator and one pain point I'm having is that there is no ReadinessProbe. I've seen how to provide one (it's a one-line code change), but having the PodTemplate described in this issue would let me do that directly in configuration today, with an official release. Right now I either contribute it, get it accepted, and wait for the next release to include it, or I have to fork this project to complete our testing.
@miguelbernadi could you please share your readiness probe definition for the OTELcol? |
We have several tenants with diverse backgrounds and needs in our systems, and we want to evaluate OTEL as a scheme to simplify our internal observability pipelines and custom data processors. So we need to reproduce and support the current setups, just replacing the internal tooling, before we can extend into the other features available in OTEL. The reason we want to use readiness probes is that we prefer the Deployment approach, and when we change configurations or scale replicas out/in we may lose data if the pods are not ready but the rollout continues. We are choosing the Deployment approach for now as we need to aggregate some infrastructure metrics currently present in an existing Prometheus server and application metrics sent with statsd. The result of these should be sent to the tenants' DataDog and GrafanaCloud accounts, for some tenants to only one of them and for others to both.

I'm using the livenessProbe already present in the generated manifests as a reference; it is enough for me to add an identical ReadinessProbe.

We may need to add other requirements to these manifests once we go into production, so a PodTemplate would allow us to set these values as configuration in the CRD itself instead of requiring code changes. We will eventually need resources and securityContext as well, though our current testing effort does not get into those yet.
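A minimal sketch of what such an addition to the generated pod spec could look like, assuming the collector's health_check extension is enabled on its default port 13133 (both the container name and the probe endpoint are assumptions, not taken from this thread):

```yaml
# Fragment of the generated Deployment's pod template.
containers:
  - name: otc-container            # assumed collector container name
    livenessProbe:                 # probe assumed to exist already
      httpGet:
        path: /
        port: 13133
    readinessProbe:                # identical probe, gating rollouts
      httpGet:
        path: /
        port: 13133
```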
Another related issue: #1684
^ Yes, I filed the above. I agree that overriding the entire podspec is maybe a bit too much, given how useful it is for the operator to manage a lot of the defaults (which would probably mean that the podspec overrides would have to be intelligently merged, which is hard to do well and probably comes with various footguns). It does seem that, in general, k8s operators end up exposing knobs on most of the pod spec fields eventually, because questions about overriding X or Y always come up at scale :)

Re: the initial comment: this approach, where the operator mutates a podspec given by the user, as opposed to the user mutating the default podspec provided by the operator, could be interesting. I can see it being a bit difficult to explain that, e.g., you have to make sure there's some container with the name the operator expects so it has something to inject the collector into.
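To make that merge-by-name requirement concrete, a purely hypothetical user-supplied template under this model might look like the following (the container name and the `podTemplate` field are assumptions):

```yaml
# Hypothetical: the operator finds the container by a well-known name and
# merges its generated image, args, ports, and config volume into it;
# everything else in the template passes through untouched.
podTemplate:
  spec:
    containers:
      - name: otc-container            # must match the name the operator expects
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
        securityContext:
          readOnlyRootFilesystem: true
      - name: envoy-sidecar            # extra user containers are left as-is
        image: envoyproxy/envoy:v1.27.0
```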
I'm convinced this is actually a fundamental defect in the concept of how operators are designed and used in k8s. It's completely impractical to have each operator expose its own custom, CR-specific configuration for each of these. But right now there's no sensible k8s base type to embed in a CR to allow their configuration in a generic way. And it gets even worse if a CR defines multiple different resources, where you might have to apply some specific annotation/label/setting to only one of the operator-deployed sub-resources but not others...

So if you use operators that define workloads you're pretty much forced to introduce mutating webhooks to munge what the operator deploys into what you actually need. Environments will need to control security settings like capabilities flags, apparmor or seccomp filters; CSP-specific annotations for managed identity/pod authorization; service mesh annotations; resource requests/limits; etc. But then you can face issues with the operator entering an infinite reconciler loop if it notices the changes made by the webhook and decides it needs to update the object, as I've seen with env-var-injecting webhooks amongst others. It's a nightmare.

This isn't specific to the otel operator. It's really something the k8s SIGs need to deal with. But I still think embedding a PodTemplate is the least-bad way to work around it until or unless the base k8s operator design and sample implementation is improved with a proper solution to the problem. I'm no longer offering to work on it for the otel operator though, as I had to retire my use of it due to the aforementioned issues. My org has had to replace many common operators recently, or fork them with org-specific patches, due to issues with controlling resources or security-related configs.
The Prometheus operator supports this by allowing a `containers` list in its CRDs; entries that share the name of an operator-generated container are strategic-merge-patched onto it.
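For comparison, a sketch of that Prometheus operator pattern; the resource values here are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2
  containers:
    # Shares the name of the operator-generated container, so these fields
    # are strategic-merge-patched onto it instead of adding a new container.
    - name: prometheus
      readinessProbe:
        httpGet:
          path: /-/ready
          port: web
      resources:
        limits:
          memory: 2Gi
```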
Hello! I just merged a PR for our v2 spec that will have common fields for any of our CRDs that deploy pods. We plan on keeping that spec as close to the full pod template as possible going forward. Please refer to our v2 milestone if there are fields you find missing.