Prometheus Integration #222
Comments
Agreed. We have been thinking about similar things but have not gotten to those yet. (https://github.com/operator-framework/operator-sdk/blob/master/doc/design/long-term/simple-extensions.md) We would welcome any proposal if you want to give it a shot.
@spahl I'm working on adding basic Prometheus metrics in the operator. Should we generate a Service and ServiceMonitor to expose the metrics port?
I think that makes sense @etiennecoutaud. The main consideration is whether there will be a cluster-wide Prometheus watching for all ServiceMonitors or one per namespace. We're leaning towards the per-namespace method for most use cases, as this gives more control over metrics retention and access control to the data.
Sorry I missed that. It does make a lot of sense.
@spahl @robszumski Who is working on Prometheus integration? I will add a Service and ServiceMonitor when the Kubernetes manifests are generated.
No one at this point. It is in the top list of things we want to add though. Any help is welcome.
We need to consider how the SDK should help an operator expose its metrics to Prometheus via a Service and ServiceMonitor. One way is to generate the manifests at build time and expect the user to configure and create them. This is the approach in #241. There are a couple of issues with this approach:
A preferred approach would be to have the operator expose the metrics on a desired port and create only the Service at runtime by default. The user can pass in some configuration to the operator to override the default values for the port and Service. The following fields can be made configurable via flags passed to the operator:
With the above flags the operator can be configured to set up the Prometheus integration in 3 ways:
Creating the ServiceMonitor: So for a start I propose we only have the operator expose the port and create a Service at runtime. @etiennecoutaud Let me know what you think. I think it's important to decide whether we want to generate manifests or let the operator create the resources before we move forward with #241.
Regarding creating the ServiceMonitor, the operator could probe the available CRDs and, if the ServiceMonitor CRD exists, attempt to create one. This auto-creation could be on by default, with the ability to turn it off using a flag (since RBAC will probably be a factor too).
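The probe-then-create decision described above could be sketched roughly as follows. This is only an illustration: `resourceLister`, `fakeCluster`, and `shouldCreateServiceMonitor` are hypothetical names, and the interface stands in for a real Kubernetes discovery client so that the example runs without a cluster. The group/version `monitoring.coreos.com/v1` is the one under which the Prometheus Operator registers ServiceMonitor.

```go
package main

import "fmt"

// resourceLister is a hypothetical stand-in for a Kubernetes discovery client.
// It reports which kinds the API server serves under a group/version.
type resourceLister interface {
	KindsFor(groupVersion string) ([]string, error)
}

// fakeCluster maps a group/version to the kinds it serves (illustration only).
type fakeCluster map[string][]string

func (f fakeCluster) KindsFor(groupVersion string) ([]string, error) {
	kinds, ok := f[groupVersion]
	if !ok {
		return nil, fmt.Errorf("group version %q is not registered", groupVersion)
	}
	return kinds, nil
}

// shouldCreateServiceMonitor returns true only when auto-creation is enabled
// and the ServiceMonitor kind is actually available in the cluster.
func shouldCreateServiceMonitor(l resourceLister, autoCreate bool) bool {
	if !autoCreate {
		return false
	}
	kinds, err := l.KindsFor("monitoring.coreos.com/v1")
	if err != nil {
		// Group not registered (or lookup failed): skip creation quietly.
		return false
	}
	for _, kind := range kinds {
		if kind == "ServiceMonitor" {
			return true
		}
	}
	return false
}

func main() {
	withCRD := fakeCluster{"monitoring.coreos.com/v1": {"Prometheus", "ServiceMonitor"}}
	withoutCRD := fakeCluster{"apps/v1": {"Deployment"}}

	fmt.Println(shouldCreateServiceMonitor(withCRD, true))
	fmt.Println(shouldCreateServiceMonitor(withoutCRD, true))
	fmt.Println(shouldCreateServiceMonitor(withCRD, false))
}
```

Treating a failed lookup as "do not create" keeps the operator working in clusters without the Prometheus Operator installed, and in clusters where RBAC does not let it list those resources.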
Might I pretty strongly suggest we not add flags until a real user asks? And even then only if they have what we agree to be an extremely compelling use case. Even if we think someone 'will want this', just do the right thing, always, for them. Don't let them customize. Pick a random container port that has an extremely low chance of conflict (something like 41783) and just always turn it on. Don't let it be turned off. Don't let it be configured. Just make all operators the same. I'm good with the operator creating its own service definition at run time instead of build time, but if the service is just static, it could come later... I'm big on no config options. Just do the right thing for people. Our users should follow the best practice because it is the only path we give them.
You are right that limiting flags is a wiser approach. What we could easily do is drop --disable-metrics=true, and likewise --metrics-port=8080, by choosing a port like you described. I don't know how to get rid of the service-selector. Maybe we could come up with a naming convention for all operators @hasbro17?
I agree with @spahl and @eparis on limiting flags. Today we are discussing Prometheus integration, but tomorrow it will be something else, and we will end up with a ton of options and a ton of possibilities. We are generating code, and we consider it normal that users add code, but is it normal that users remove some generated parts because they don't need them? My opinion matches @eparis's: we generate the operator in a state-of-the-art way and provide everything needed to run it in the best way. Users are free to remove parts and change what we propose. We cannot handle all the possibilities; we have to choose the smartest path and propose it.
For serviceSelector, how about generating a label on the pod which we know to be unique and using that?
@eparis @etiennecoutaud Thanks for the responses. I agree that the number of flags could easily pile up if we add more and more configuration options so perhaps it's best to just pick one approach and stick with that.
Right now we generate the label
With the service being static, should we still have the operator create it at runtime? The benefit of that is the service is always created by default now. If not, then we simply go back to generating the manifests as in #241 and let the user create it as needed.
@etiennecoutaud @eparis I'm hoping we can move forward based on the discussion so far. If we're all agreed on not having configuration flags, then for #241 can we start with the following:
So, for example, the generated Deployment manifest would look like:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: memcached-operator
      labels:
        name: memcached-operator
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: memcached-operator
      template:
        metadata:
          labels:
            name: memcached-operator
        spec:
          containers:
            - name: memcached-operator
              image: quay.io/example/memcached-operator:v0.0.1
              ports:
                - containerPort: 9090
                  name: metrics
              command:
                - memcached-operator
              imagePullPolicy: Always
              env:
                - name: WATCH_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: OPERATOR_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name

And the Service created by the operator would in turn be:

    apiVersion: v1
    kind: Service
    metadata:
      name: memcached-operator
      labels:
        name: memcached-operator
    spec:
      selector:
        name: memcached-operator
      ports:
        - protocol: TCP
          targetPort: metrics
          port: 9090
          name: metrics

Let's leave out the ServiceMonitor from #241 for now.
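Since the Service is static apart from the operator name, deriving it at runtime could look something like the sketch below. It only renders the manifest text; a real operator would presumably build a typed Service object and create it through the Kubernetes API instead. `serviceManifest` is a hypothetical helper, and the port 9090 and port name `metrics` are taken from the example in this comment.

```go
package main

import "fmt"

// serviceManifest renders the metrics Service for an operator. The operator
// name is reused for the Service name, its label, and the pod selector,
// which matches the `name: <operator>` label on the Deployment's pods.
// Hypothetical helper, for illustration only.
func serviceManifest(name string, port int) string {
	return fmt.Sprintf(`apiVersion: v1
kind: Service
metadata:
  name: %[1]s
  labels:
    name: %[1]s
spec:
  selector:
    name: %[1]s
  ports:
  - protocol: TCP
    targetPort: metrics
    port: %[2]d
    name: metrics
`, name, port)
}

func main() {
	fmt.Print(serviceManifest("memcached-operator", 9090))
}
```

Because `targetPort` refers to the named container port `metrics` rather than a number, the Service keeps working if the container port in the Deployment is changed later.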
Sounds good to me, I will update #241 according to this change.
@lilic just making you aware of this issue (if you aren't already). Can this be closed once your metrics work is merged?
We should consider adding in some basic Prometheus support in the generated operator. Just including the client library and exposing the metrics path I think would be a good start, perhaps define/increase a counter in the handler function to set the pattern up.
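To make the pattern concrete, here is a minimal sketch of a counter incremented in the handler and exposed on a metrics path. It deliberately uses only the Go standard library and writes the Prometheus text exposition format by hand so that it runs without dependencies; a generated operator would more likely use the Prometheus Go client library as suggested above. The metric name `operator_handled_events_total` and the functions `handle`, `renderMetrics`, and `metricsHandler` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// handledEvents counts how many events the handler has processed.
var handledEvents int64

// handle stands in for the operator's event handler (hypothetical).
func handle(event string) {
	atomic.AddInt64(&handledEvents, 1)
	// ... reconcile logic for the event would go here ...
}

// renderMetrics returns the counter in the Prometheus text exposition format.
func renderMetrics() string {
	return fmt.Sprintf(
		"# HELP operator_handled_events_total Events seen by the handler.\n"+
			"# TYPE operator_handled_events_total counter\n"+
			"operator_handled_events_total %d\n",
		atomic.LoadInt64(&handledEvents))
}

// metricsHandler serves the metrics over HTTP.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics())
}

func main() {
	handle("add")
	handle("update")

	// Exercise the handler in-process so the example terminates. An operator
	// would instead register it on "/metrics" and call http.ListenAndServe.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```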