Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(ansible): Create service monitor #2179

Merged
merged 7 commits into from
Feb 7, 2020

Conversation

djzager
Copy link
Contributor

@djzager djzager commented Nov 12, 2019

Ansible based operator's should also create the service monitor as
appropriate.

@openshift-ci-robot
Copy link

Hi @djzager. Thanks for your PR.

I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 12, 2019
@djzager
Copy link
Contributor Author

djzager commented Nov 12, 2019

@fabianvf @jmrodri I think one thing that would be helpful, in the future to prevent needing these kinds of PRs, would be to pull out the startup tasks that are common to all operators into an operator-sdk lib that is used in the scaffolding. I'm sure there are risks in that kind of effort but worth considering.

@jmrodri
Copy link
Member

jmrodri commented Nov 12, 2019

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 12, 2019
@djzager
Copy link
Contributor Author

djzager commented Nov 12, 2019

/retest

1 similar comment
@djzager
Copy link
Contributor Author

djzager commented Nov 13, 2019

/retest

@camilamacedo86 camilamacedo86 added language/ansible Issue is related to an Ansible operator project kind/feature Categorizes issue or PR as related to a new feature. labels Nov 16, 2019
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @djzager,

Really thank you for your contribution 🥇
What is missing here for we go forward is a Changelog entry.
Also, would be great to have a test to ensure that the Service Monitor was created on it. Could you please add these small nits?

pkg/ansible/run.go Outdated Show resolved Hide resolved
@camilamacedo86 camilamacedo86 added kind/bug Categorizes issue or PR as related to a bug. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Dec 4, 2019
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2020
@camilamacedo86 camilamacedo86 self-requested a review January 10, 2020 20:29
@camilamacedo86 camilamacedo86 removed the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2020
@djzager
Copy link
Contributor Author

djzager commented Jan 17, 2020

It appears that my suspicion was correct (that the ServiceMonitor resource doesn't exist in the CI environment):

Ensure that no errors appear in the log
{"level":"info","ts":1578688227.027858,"logger":"cmd","msg":"Could not create ServiceMonitor object","Namespace":"default","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1578688227.0278876,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","Namespace":"default","error":"no ServiceMonitor registered with the API"}

It may be possible to create the ServiceMonitor CRD simply to allow for it to be created and verified. wdyt?

services := []*v1.Service{service}
_, err = metrics.CreateServiceMonitors(cfg, namespace, services)
if err != nil {
log.Info("Could not create ServiceMonitor object", "error", err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like CI is unhappy because the error is showing up in the logs (https://travis-ci.org/operator-framework/operator-sdk/jobs/635443111#L1665-L1667), we might need to either change that log or change the way we search for errors in the test.

Maybe we could call the error field something like reason instead?

Also minor not, I don't think we need to log twice here when the service monitor is not present, maybe make these two logs into an if .. else like

if err == metrics.ServiceMonitorNotPresent {
    // log about installing the operator
} else {
    // generic error log
}

just to cut down on the noise in the logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the error to reason. However, I elected not to prevent logging 2x for two reasons:

  1. Match the scaffolding provided for go based operators
  2. This only happens once on startup which makes me believe this isn't an unreasonable contribution to noise.

If you would still like me to restructure the logging here @fabianvf , I can do that.

Copy link
Contributor

@camilamacedo86 camilamacedo86 Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @djzager
I think we should keep the same impl done in go and then, since it was updated to it lastest version, I do not think that the errors will appear too. WDYT @fabianvf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djzager we need test locally as well operator-sdk run --local. I will do fully test with and let you know.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 6, 2020
hack/lib/common.sh Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@djzager djzager force-pushed the ao-monitor branch 2 times, most recently from ec08132 to 9423960 Compare February 6, 2020 19:20
// necessary to configure Prometheus to scrape metrics from this operator.
services := []*v1.Service{service}
_, err = metrics.CreateServiceMonitors(cfg, namespace, services)
if err == metrics.ErrServiceMonitorNotPresent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See that the golang scaffolded was changed indeed to not show log errors regards it when should not. See #2190. So, could you please update the implementation in order to reflect the current one made in go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and made this ServiceMonitor creation block equivalent to what is found in internal/scaffold/cmd.go

Copy link
Contributor

@camilamacedo86 camilamacedo86 Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean: Should we not?

  1. Add in the main func.

// Add the Metrics Service
addMetrics(ctx, cfg, namespace)

  1. At the end, add all addMetrics implementation

  2. Add the customization requested by @fabianvf in the addMetrics

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you are saying. I missed the addMetrics piece from the scaffold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about which customization request you are referring to from @fabianvf :

  1. If reason vs error, this isn't needed anymore because we are testing with the servicemonitor CRD installed in the cluster
  2. If referring to removing one of the log.Info's .. I just don't agree that it's a valuable deviation from the scaffold. It would be one thing to me if this were in the Reconcile loop and it was an additional line of log output on every reconcile. However, in the majority of cases it will be two lines of output at the very beginning of operator startup saying "couldn't create serviceMonitor" and "install prometheus-operator to create ServiceMonitor objects".

@djzager
Copy link
Contributor Author

djzager commented Feb 6, 2020

CI will fail until I make the necessary updates to the molecule based testing. I will wait to do that until #2425 is merged.

Ansible based operator's should also create the service monitor as
appropriate.
- Add a function to common that applies the servicemonitor CRD
- Update the e2e ansible and e2e ansible molecule tests to verify the
  service monitor is created
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 6, 2020
@djzager
Copy link
Contributor Author

djzager commented Feb 7, 2020

@fabianvf @camilamacedo86 I believe I have addressed all of the comments and CI is passing. Please, take another look.

Copy link
Member

@fabianvf fabianvf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2020
CHANGELOG.md Outdated
@@ -5,6 +5,7 @@
- Add a new option to set the minimum log level that triggers stack trace generation in logs (`--zap-stacktrace-level`) ([#2319](https://github.com/operator-framework/operator-sdk/pull/2319))
- Added `pkg/status` with several new types and interfaces that can be used in `Status` structs to simplify handling of [status conditions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties). ([#1143](https://github.com/operator-framework/operator-sdk/pull/1143))
- Added support for relative Ansible roles and playbooks paths in the Ansible operator's file. ([#2273](https://github.com/operator-framework/operator-sdk/pull/2273))
- Ansible based operators now creates prometheus service monitor, if available. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Ansible based operators now creates prometheus service monitor, if available. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))
- Add Prometheus metrics support to Ansible - based operators. ([#2179](https://github.com/operator-framework/operator-sdk/pull/2179))

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally as well. 👍 Great contribution.

@djzager could just address the nit in the CHANGELOG for we merge this one?

Running locally and in the cluster whiout promethues with success:

Screenshot 2020-02-07 at 17 47 21

Screenshot 2020-02-07 at 17 49 51

@camilamacedo86
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2020
@fabianvf
Copy link
Member

fabianvf commented Feb 7, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2020
@fabianvf fabianvf merged commit 96ce467 into operator-framework:master Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. language/ansible Issue is related to an Ansible operator project lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants