
Ansible: how to expose CR status condition in metrics? #2576

Closed
tomsucho opened this issue Feb 18, 2020 · 11 comments
Labels
kind/feature, language/ansible, lifecycle/rotten, needs discussion, priority/important-longterm
Comments

@tomsucho

Type of question

how to implement a specific feature

Question

I can see there are some built-in metrics endpoints, but they only provide overall counts (like ansible_operator_reconciles_count or controller_runtime_reconcile_errors_total).
But is there any way to expose metrics that would let us quantify how many CRs have a status condition of True|False|Unknown? I think this is more or less a standard, as per:
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
If the condition status were part of the metric labels, together with the CR name and namespace, one could easily monitor, for example, when a CR's status changes to False or Unknown.

@bharathi-tenneti added the language/ansible and triage/support labels Feb 18, 2020
@tomsucho
Author

tomsucho commented Mar 4, 2020

Hey guys, any thoughts on this one?

@asmacdo added the kind/feature and metrics labels and removed the triage/support label Mar 4, 2020
@djzager
Contributor

djzager commented Mar 5, 2020

Disclaimer: I'm not an authoritative source on controller metrics. I have only tied the work @lilic did to expose custom resource metrics (#1277) into Ansible-based operators (specifically #1723), and I have looked at the metrics provided by controller-runtime.

Short answer: No. I do not believe that this is currently possible for operator-sdk based operators in general; this issue isn't specific to Ansible.

It looks like kube-state-metrics does something similar for Deployments (kubernetes/kube-state-metrics#889), but handling this generically for custom resources would probably take some investigation.
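Part of what "generic" would mean here is not depending on a per-kind type: any resource that follows the API conventions reports status.conditions in the same shape. As a rough Python sketch (the Database kind and the example.com group below are made up), conditions can be flattened from a JSON List document such as the one kubectl get <kind> -o json prints, reading each item's kind if present:

```python
import json


def extract_conditions(list_json):
    """Flatten status.conditions from every item of a Kubernetes List document.

    Relies only on the conventional condition shape, so it works for any kind;
    items without status or conditions are skipped. Returns tuples of
    (kind, namespace, name, condition type, condition status).
    """
    rows = []
    for obj in json.loads(list_json).get("items") or []:
        meta = obj.get("metadata") or {}
        for cond in (obj.get("status") or {}).get("conditions") or []:
            rows.append((
                obj.get("kind", ""),
                meta.get("namespace", ""),
                meta.get("name", ""),
                cond.get("type", ""),
                cond.get("status", "Unknown"),
            ))
    return rows


# A hypothetical List with one CR reporting a condition and one without status.
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {"apiVersion": "example.com/v1alpha1", "kind": "Database",
         "metadata": {"name": "db-a", "namespace": "prod"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        {"apiVersion": "example.com/v1alpha1", "kind": "Database",
         "metadata": {"name": "db-b", "namespace": "prod"}},
    ],
})
print(extract_conditions(sample))  # [('Database', 'prod', 'db-a', 'Ready', 'False')]
```

Cluster-scoped objects simply end up with an empty namespace. Each tuple maps onto the label set a per-CR condition metric would need.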

@tomsucho
Author

tomsucho commented Mar 5, 2020

Thanks for checking this. I think it would be a great feature to have :)

@estroz added this to the Backlog milestone Mar 23, 2020
@estroz added the priority/important-longterm label Mar 23, 2020
@joelanford
Member

Another possible route for having a feature like this would be the hybrid operator approach that we're considering.

Basically, the idea is that we would work toward making the Ansible (and Helm) operator Go code more reusable and extensible. Then, for example, folks would be able to instantiate an Ansible controller in a Go operator and add custom code to support this kind of thing.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label Jun 21, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 22, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.


@camilamacedo86
Contributor

/reopen

@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.


Projects
None yet
Development

No branches or pull requests

9 participants