Ensure controller names are unique when emitting metrics #5799

Merged 1 commit on Nov 11, 2024


chaosi-zju
Member

@chaosi-zju chaosi-zju commented Nov 11, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

set fixed names for each controller to prevent metrics conflicts

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
controller-runtime validates that controller names are unique starting with v0.19.x.
See kubernetes-sigs/controller-runtime#2902 for reference.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Use unique controller names to remove ambiguity when reporting metrics.

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Nov 11, 2024
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 11, 2024
Member

@RainbowMango RainbowMango left a comment


Generally looks good.
I'm currently trying to bump Kubernetes dependencies (#5796). I will test this commit to ensure all controllers have a unique name.

pkg/controllers/cluster/taint_manager.go (resolved)
@codecov-commenter

codecov-commenter commented Nov 11, 2024


Codecov Report

Attention: Patch coverage is 2.27273% with 43 lines in your changes missing coverage. Please review.

Project coverage is 42.99%. Comparing base (b5c6660) to head (b605d9d).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/controllers/mcs/endpointslice_controller.go | 0.00% | 4 Missing ⚠️ |
| pkg/controllers/mcs/service_import_controller.go | 0.00% | 4 Missing ⚠️ |
| pkg/controllers/binding/binding_controller.go | 0.00% | 3 Missing ⚠️ |
| ...ers/binding/cluster_resource_binding_controller.go | 0.00% | 3 Missing ⚠️ |
| ...lusterservice/endpointslice_dispatch_controller.go | 0.00% | 3 Missing ⚠️ |
| ...ependenciesdistributor/dependencies_distributor.go | 0.00% | 3 Missing ⚠️ |
| pkg/controllers/status/crb_status_controller.go | 0.00% | 2 Missing ⚠️ |
| pkg/controllers/status/rb_status_controller.go | 0.00% | 2 Missing ⚠️ |
| operator/pkg/controller/karmada/controller.go | 0.00% | 1 Missing ⚠️ |
| ...ionfailover/crb_application_failover_controller.go | 0.00% | 1 Missing ⚠️ |

... and 17 more


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5799      +/-   ##
==========================================
+ Coverage   42.88%   42.99%   +0.11%     
==========================================
  Files         656      656              
  Lines       55888    55921      +33     
==========================================
+ Hits        23968    24044      +76     
+ Misses      30369    30333      -36     
+ Partials     1551     1544       -7     
Flag Coverage Δ
unittests 42.99% <2.27%> (+0.11%) ⬆️


@RainbowMango
Member

/retitle Ensure controller names are unique when emitting metrics

@karmada-bot karmada-bot changed the title from "set fixed names for each controller to prevent metrics conflicts" to "Ensure controller names are unique when emitting metrics" Nov 11, 2024
@RainbowMango
Member

/assign

@RainbowMango RainbowMango added this to the v1.12 milestone Nov 11, 2024
@chaosi-zju
Member Author

Background and Test result

ResourceBindingController and RBStatusController are both defined without an explicit controller name:

// ResourceBindingController
controllerruntime.NewControllerManagedBy(mgr).
        For(&workv1alpha2.ResourceBinding{}).
	...
	Complete(c)

// RBStatusController
controllerruntime.NewControllerManagedBy(mgr).
	For(&workv1alpha2.ResourceBinding{}, bindingPredicateFn).
	...
	Complete(c)

Both controllers reconcile ResourceBinding objects, so their metrics all use resourcebinding as the controller label, for example:

controller_runtime_active_workers{controller="resourcebinding"} 0
controller_runtime_max_concurrent_reconciles{controller="resourcebinding"} 5
controller_runtime_reconcile_errors_total{controller="resourcebinding"} 72
controller_runtime_reconcile_total{controller="resourcebinding",result="error"} 72
controller_runtime_reconcile_total{controller="resourcebinding",result="requeue"} 0
controller_runtime_reconcile_total{controller="resourcebinding",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="resourcebinding",result="success"} 278
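The collision can be modelled without controller-runtime. The following is a minimal sketch, assuming that an unnamed controller falls back to the lowercased kind of the object it reconciles (which matches the resourcebinding label above). `defaultName`, `reconcileTotals` and `record` are hypothetical helpers for illustration, not controller-runtime APIs, and the counts are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultName models the fallback name of an unnamed controller:
// the lowercased kind of the reconciled object (an assumption of this sketch).
func defaultName(kind string) string {
	return strings.ToLower(kind)
}

// reconcileTotals models one counter series per "controller" label value.
type reconcileTotals map[string]int

// record adds n reconciles to the series identified by the controller label.
func (t reconcileTotals) record(controller string, n int) {
	t[controller] += n
}

func main() {
	totals := reconcileTotals{}

	// Two different controllers, both watching ResourceBinding and both unnamed.
	bindingCtrl := defaultName("ResourceBinding")
	statusCtrl := defaultName("ResourceBinding")

	totals.record(bindingCtrl, 3) // illustrative count
	totals.record(statusCtrl, 2)  // illustrative count

	// Both controllers wrote to the same series, so their counts are merged.
	fmt.Println(len(totals), totals["resourcebinding"]) // prints: 1 5
}
```

Because the label value is the only thing separating the series, two controllers with the same derived name are indistinguishable in the scrape.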

So this PR sets a unique name for each controller, and their metrics can then be distinguished as follows:

controller_runtime_active_workers{controller="binding-controller"} 0
controller_runtime_max_concurrent_reconciles{controller="binding-controller"} 5
controller_runtime_reconcile_errors_total{controller="binding-controller"} 0
controller_runtime_reconcile_total{controller="binding-controller",result="error"} 0
controller_runtime_reconcile_total{controller="binding-controller",result="requeue"} 0
controller_runtime_reconcile_total{controller="binding-controller",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="binding-controller",result="success"} 0


controller_runtime_active_workers{controller="resource-binding-status-controller"} 0
controller_runtime_max_concurrent_reconciles{controller="resource-binding-status-controller"} 5
controller_runtime_reconcile_errors_total{controller="resource-binding-status-controller"} 0
controller_runtime_reconcile_total{controller="resource-binding-status-controller",result="error"} 0
controller_runtime_reconcile_total{controller="resource-binding-status-controller",result="requeue"} 0
controller_runtime_reconcile_total{controller="resource-binding-status-controller",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="resource-binding-status-controller",result="success"} 0
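One way to check a scrape like the one above is to group the series by their controller label. This is a minimal sketch using only the standard library; `controllerLabel` and `seriesPerController` are hypothetical helper names, and the parsing is deliberately naive (it looks for the first `controller="` substring rather than implementing the full Prometheus text format).

```go
package main

import (
	"fmt"
	"strings"
)

// controllerLabel extracts the value of the controller label from one
// line of metrics text, e.g. metric{controller="x"} 0.
func controllerLabel(line string) (string, bool) {
	const key = `controller="`
	start := strings.Index(line, key)
	if start < 0 {
		return "", false
	}
	rest := line[start+len(key):]
	end := strings.Index(rest, `"`)
	if end < 0 {
		return "", false
	}
	return rest[:end], true
}

// seriesPerController counts how many series each controller label owns.
func seriesPerController(lines []string) map[string]int {
	counts := map[string]int{}
	for _, line := range lines {
		if name, ok := controllerLabel(line); ok {
			counts[name]++
		}
	}
	return counts
}

func main() {
	scrape := []string{
		`controller_runtime_active_workers{controller="binding-controller"} 0`,
		`controller_runtime_active_workers{controller="resource-binding-status-controller"} 0`,
		`controller_runtime_reconcile_total{controller="binding-controller",result="success"} 0`,
		`controller_runtime_reconcile_total{controller="resource-binding-status-controller",result="success"} 0`,
	}
	// Each controller now owns its own set of series.
	for name, n := range seriesPerController(scrape) {
		fmt.Printf("%s: %d series\n", name, n)
	}
}
```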

@RainbowMango
Member

/retest
https://github.com/karmada-io/karmada/actions/runs/11771917143/job/32786708528?pr=5799
Seems the failed test is unrelated:

[FAILED] Unexpected error:
      <*errors.StatusError | 0xc000c7a5a0>: 
      etcdserver: request timed out
      {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "etcdserver: request timed out",
              Reason: "",
              Details: nil,
              Code: 500,
          },
      }
  occurred
  In [It] at: /home/runner/work/karmada/karmada/test/e2e/failover_test.go:158 @ 11/11/24 04:50:28.118

@RainbowMango
Member

/lgtm

Hold for a while, as I'm testing it on my private branch, which bumps controller-runtime.
If the E2E tests pass there, the patch is complete.
https://github.com/RainbowMango/karmada/actions/runs/11773915359

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2024
@RainbowMango
Member

/approve

I verified on my side with the bumped controller-runtime, which validates that controller names are unique.

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango


@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2024
@karmada-bot karmada-bot merged commit ed3f48c into karmada-io:master Nov 11, 2024
18 checks passed
@chaosi-zju
Member Author

I verified on my side with the bumped controller-runtime, which validates that controller names are unique.

Is there any relevant log that shows what it looks like when the validation fails?

@RainbowMango
Member

RainbowMango commented Nov 11, 2024

The karmada-controller-manager or karmada-agent would fail to start and emit logs like:

controller with name namespace-sync-controller already exists. Controller names must be unique to avoid multiple controllers reporting to the same metric
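The failure mode can be illustrated with a small name registry. This is a minimal sketch, not controller-runtime's implementation: `nameRegistry`, `newNameRegistry` and `claim` are hypothetical names, and only the error text is taken from the log line quoted above.

```go
package main

import (
	"fmt"
	"sync"
)

// nameRegistry remembers every controller name that has been claimed.
type nameRegistry struct {
	mu    sync.Mutex
	names map[string]struct{}
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{names: map[string]struct{}{}}
}

// claim registers name, or returns an error if it was already taken.
func (r *nameRegistry) claim(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, taken := r.names[name]; taken {
		return fmt.Errorf("controller with name %s already exists. Controller names must be unique to avoid multiple controllers reporting to the same metric", name)
	}
	r.names[name] = struct{}{}
	return nil
}

func main() {
	reg := newNameRegistry()
	names := []string{"namespace-sync-controller", "binding-controller", "namespace-sync-controller"}
	for _, name := range names {
		// The third entry repeats the first, so setup stops there.
		if err := reg.claim(name); err != nil {
			fmt.Println("failed to start:", err)
			return
		}
		fmt.Println("registered", name)
	}
}
```

Failing at registration time surfaces a duplicate name as a startup error rather than as silently merged metrics.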
