chore(ansible): Create service monitor #2179
Hi @djzager. Thanks for your PR. I'm waiting for an operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@fabianvf @jmrodri One thing that would help us avoid these kinds of PRs in the future would be to pull the startup tasks common to all operators out into an operator-sdk library that the scaffolding uses. I'm sure there are risks in that kind of effort, but it's worth considering.
/ok-to-test
/retest
Hi @djzager,
Thank you very much for your contribution 🥇
What's missing before we can move forward is a CHANGELOG entry.
It would also be great to have a test that ensures the ServiceMonitor is created. Could you please address these small nits?
/lgtm
It appears that my suspicion was correct: the ServiceMonitor resource doesn't exist in the CI environment.
It may be possible to create the ServiceMonitor CRD in CI simply so that a ServiceMonitor can be created and verified. WDYT?
pkg/ansible/run.go
```go
services := []*v1.Service{service}
_, err = metrics.CreateServiceMonitors(cfg, namespace, services)
if err != nil {
	log.Info("Could not create ServiceMonitor object", "error", err.Error())
```
Looks like CI is unhappy because the error shows up in the logs (https://travis-ci.org/operator-framework/operator-sdk/jobs/635443111#L1665-L1667). We might need to either change that log or change the way we search for errors in the test.
Maybe we could call the `error` field something like `reason` instead?
Also, a minor nit: I don't think we need to log twice here when the ServiceMonitor is not present. Maybe turn these two logs into an `if ... else`, like:
```go
if err == metrics.ErrServiceMonitorNotPresent {
	// log about installing the operator
} else {
	// generic error log
}
```
just to cut down on the noise in the logs.
I changed `error` to `reason`. However, I chose not to avoid logging twice, for two reasons:
- It matches the scaffolding provided for Go-based operators.
- It only happens once, at startup, so I don't think it adds an unreasonable amount of noise.
If you would still like me to restructure the logging here, @fabianvf, I can do that.
@djzager we need to test locally as well, with `operator-sdk run --local`. I will do a full test and let you know.
Commits updated from ec08132 to 9423960.
pkg/ansible/run.go
```go
// necessary to configure Prometheus to scrape metrics from this operator.
services := []*v1.Service{service}
_, err = metrics.CreateServiceMonitors(cfg, namespace, services)
if err == metrics.ErrServiceMonitorNotPresent {
```
I went ahead and made this ServiceMonitor creation block equivalent to the one in `internal/scaffold/cmd.go`.
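For context on what the `metrics.CreateServiceMonitors` call in this block is expected to produce: roughly, each Service is mirrored into a ServiceMonitor with the same name, namespace and labels, which selects the Service by those labels and scrapes one endpoint per named Service port. The sketch below models that mapping with simplified, hypothetical stand-in types (not the real `v1.Service` or prometheus-operator `ServiceMonitor` types) and hypothetical example values; treat it as an illustration of the idea, not the library's implementation.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes Service and prometheus-operator
// ServiceMonitor types; these are assumptions for illustration only.
type ServicePort struct {
	Name string
	Port int32
}

type Service struct {
	Name      string
	Namespace string
	Labels    map[string]string
	Ports     []ServicePort
}

type Endpoint struct {
	Port string // name of the Service port to scrape
}

type ServiceMonitor struct {
	Name        string
	Namespace   string
	Labels      map[string]string
	MatchLabels map[string]string // selects the Service by its labels
	Endpoints   []Endpoint
}

// generateServiceMonitor mirrors a Service into a ServiceMonitor that selects it.
func generateServiceMonitor(s Service) ServiceMonitor {
	labels := make(map[string]string, len(s.Labels))
	for k, v := range s.Labels {
		labels[k] = v
	}
	endpoints := make([]Endpoint, 0, len(s.Ports))
	for _, p := range s.Ports {
		endpoints = append(endpoints, Endpoint{Port: p.Name})
	}
	return ServiceMonitor{
		Name:        s.Name,
		Namespace:   s.Namespace,
		Labels:      labels,
		MatchLabels: labels,
		Endpoints:   endpoints,
	}
}

func main() {
	// Hypothetical metrics Service for an example operator.
	svc := Service{
		Name:      "example-operator-metrics",
		Namespace: "default",
		Labels:    map[string]string{"name": "example-operator"},
		Ports: []ServicePort{
			{Name: "http-metrics", Port: 8383},
			{Name: "cr-metrics", Port: 8686},
		},
	}
	sm := generateServiceMonitor(svc)
	fmt.Println(sm.Name, len(sm.Endpoints), sm.Endpoints[0].Port)
}
```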
I mean, shouldn't we do the following?
- Add this to the main func:
```go
// Add the Metrics Service
addMetrics(ctx, cfg, namespace)
```
- Add the full `addMetrics` implementation at the end.
- Add the customization requested by @fabianvf in `addMetrics`.
WDYT?
Now I see what you are saying. I missed the `addMetrics` piece from the scaffold.
I'm not sure which customization request from @fabianvf you are referring to:
- If it's `reason` vs `error`, this isn't needed anymore, because we are now testing with the ServiceMonitor CRD installed in the cluster.
- If it's removing one of the `log.Info` calls, I just don't agree that it's a valuable deviation from the scaffold. It would be different if this were in the Reconcile loop and added a line of log output on every reconcile. However, in most cases it will be two lines of output at the very beginning of operator startup: "couldn't create serviceMonitor" and "install prometheus-operator to create ServiceMonitor objects".
CI will fail until I make the necessary updates to the Molecule-based testing. I will wait to do that until #2425 is merged.
Ansible-based operators should also create the ServiceMonitor as appropriate.
- Add a function to common that applies the ServiceMonitor CRD
- Update the e2e ansible and e2e ansible molecule tests to verify the ServiceMonitor is created
@fabianvf @camilamacedo86 I believe I have addressed all of the comments, and CI is passing. Please take another look.
/lgtm
CHANGELOG.md
```diff
@@ -5,6 +5,7 @@
 - Add a new option to set the minimum log level that triggers stack trace generation in logs (`--zap-stacktrace-level`) ([#2319](https://github.com/operator-framework/operator-sdk/pull/2319))
 - Added `pkg/status` with several new types and interfaces that can be used in `Status` structs to simplify handling of [status conditions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties). ([#1143](https://github.com/operator-framework/operator-sdk/pull/1143))
 - Added support for relative Ansible roles and playbooks paths in the Ansible operator's file. ([#2273](https://github.com/operator-framework/operator-sdk/pull/2273))
+- Ansible based operators now creates prometheus service monitor, if available. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))
```
```diff
-- Ansible based operators now creates prometheus service monitor, if available. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))
+- Add Prometheus metrics support to Ansible-based operators. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))
```
Tested locally as well. 👍 Great contribution.
@djzager could you just address the nit in the CHANGELOG so we can merge this one?
It runs successfully both locally and in the cluster without Prometheus.
/lgtm
/lgtm