[bugfix] workqueue metrics are missing #5954

Open
9 of 10 tasks
RainbowMango opened this issue Dec 14, 2024 · 12 comments
Assignees
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
Milestone

Comments

@RainbowMango
Member

RainbowMango commented Dec 14, 2024

What happened:
We received reports that the controller's work queue metrics are not emitted. (See #5945 for more details; thanks to @CharlesQQ for reporting this.)

This issue occurred before and was fixed in release 1.0 by #945, followed by subsequent improvements in #2899 and #3012. (Thanks to @Garrybest for doing this.)

However, after #2998 and #4706, the issue was reintroduced in release-1.10.

This issue is used to track any relevant fix to ensure the problem is fully resolved without introducing any potential risk.

Root Cause:
TBD

Iteration tasks:

Other Potential risks

References:

  • Since Go 1.21, package initialization order has been specified more precisely. See the Go 1.21 release notes for more details.

Quoted from the release notes:

Package initialization order is now specified more precisely. The new algorithm is:

  • Sort all packages by import path.
  • Repeat until the list of packages is empty:
    • Find the first package in the list for which all imports are already initialized.
    • Initialize that package and remove it from the list.
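The quoted algorithm can be sketched in a few lines of Go. This is only an illustrative model, not the toolchain's implementation: the function name `initOrder`, the `deps` map, and the `example.com/...` import paths are all made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// initOrder models the initialization order described in the Go 1.21 release
// notes. deps maps each package's import path to the paths it imports.
// Hypothetical helper for illustration only.
func initOrder(deps map[string][]string) []string {
	// Step 1: sort all packages by import path.
	pending := make([]string, 0, len(deps))
	for path := range deps {
		pending = append(pending, path)
	}
	sort.Strings(pending)

	initialized := make(map[string]bool, len(deps))
	order := make([]string, 0, len(deps))

	// Step 2: repeat until the list of packages is empty.
	for len(pending) > 0 {
		picked := -1
		// Find the first package whose imports are all initialized.
		for i, path := range pending {
			ready := true
			for _, imp := range deps[path] {
				if !initialized[imp] {
					ready = false
					break
				}
			}
			if ready {
				picked = i
				break
			}
		}
		if picked < 0 {
			// Only reachable with a cycle or an import missing from deps.
			panic("import cycle or unknown import")
		}
		// Initialize that package and remove it from the list.
		path := pending[picked]
		initialized[path] = true
		order = append(order, path)
		pending = append(pending[:picked], pending[picked+1:]...)
	}
	return order
}

func main() {
	deps := map[string][]string{
		"example.com/app":       {"example.com/b/metrics", "example.com/a/queue"},
		"example.com/a/queue":   nil,
		"example.com/b/metrics": nil,
	}
	fmt.Println(initOrder(deps))
}
```

In this model, two packages that do not import each other are initialized in import-path order, regardless of the order in which a third package lists them in its imports.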
RainbowMango added the kind/bug label on Dec 14, 2024
RainbowMango added this to the v1.13 milestone on Dec 14, 2024
RainbowMango changed the title from "workqueue metrics are missing" to "[bugfix] workqueue metrics are missing" on Dec 18, 2024
RainbowMango moved this to Planned In Release 1.13 in Karmada Overall Backlog on Dec 18, 2024
RainbowMango added the help wanted label on Dec 24, 2024
@chaosi-zju
Member

PR #6003 addresses the task "Introduce tests to cover the metrics".

@RainbowMango
Member Author

Got it.

@chaosi-zju
Member

chaosi-zju commented Jan 9, 2025

I found another sub-issue:

https://github.com/karmada-io/karmada/blob/master/pkg/estimator/server/metrics/metrics.go

you can refer to this file and find that karmada-scheduler-estimator has some custom metrics, but they are also missing.

The same goes for the scheduler's custom metrics:

https://github.com/karmada-io/karmada/blob/master/pkg/estimator/server/metrics/metrics.go

@chaosi-zju
Member

you can refer to this file and find that karmada-scheduler-estimator has some custom metrics, but they are also missing.

The actual situation is that these custom metrics are not available right after installing the two components (not even zero values).

They can only be viewed once resources are actually scheduled.

Besides, Ginkgo does not guarantee the order of specs. In fact, it prefers randomized spec order to prevent spec pollution; refer to onsi/ginkgo#1165 (comment).

@RainbowMango
Member Author

The actual situation is that these custom metrics are not available right after installing the two components (not even zero values).
They can only be viewed once resources are actually scheduled.

Just a guess: this is because these metrics need label values before they are initialized, like result and type in estimating_request_total.

Show an example of the metrics?

@chaosi-zju
Member

Show an example of the metrics?

# HELP karmada_scheduler_schedule_attempts_total Number of attempts to schedule a ResourceBinding or ClusterResourceBinding
# TYPE karmada_scheduler_schedule_attempts_total counter
karmada_scheduler_schedule_attempts_total{result="scheduled",schedule_type="ReconcileSchedule"} 5
# HELP karmada_scheduler_scheduling_algorithm_duration_seconds Scheduling algorithm latency in seconds(exclude scale scheduler)
# TYPE karmada_scheduler_scheduling_algorithm_duration_seconds histogram
karmada_scheduler_scheduling_algorithm_duration_seconds_bucket{schedule_step="AssignReplicas",le="0.001"} 5
karmada_scheduler_scheduling_algorithm_duration_seconds_bucket{schedule_step="AssignReplicas",le="0.002"} 5
karmada_scheduler_scheduling_algorithm_duration_seconds_bucket{schedule_step="AssignReplicas",le="0.004"} 5
karmada_scheduler_scheduling_algorithm_duration_seconds_bucket{schedule_step="AssignReplicas",le="0.008"} 5
...

@RainbowMango
Member Author

Thanks. That's the reason.
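To illustrate the behavior discussed here: in Prometheus-style clients, a labelled metric typically has no series to expose until a label combination is first used, which would explain why not even zero values show up right after installation. The sketch below is a simplified, standard-library-only model, not the real client library; `counterVec`, `ensure`, `inc`, `expose`, and the label values are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// counterVec is a hypothetical, simplified model of a labelled counter:
// a series exists only once its label combination has been touched.
type counterVec struct {
	name   string
	series map[string]float64 // rendered label set -> value
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, series: make(map[string]float64)}
}

// ensure creates the series for a label set at zero if it does not exist yet.
func (c *counterVec) ensure(labels string) {
	if _, ok := c.series[labels]; !ok {
		c.series[labels] = 0
	}
}

// inc increments the series, creating it on first use.
func (c *counterVec) inc(labels string) {
	c.series[labels]++
}

// expose renders every existing series in a stable (sorted) order.
func (c *counterVec) expose() []string {
	keys := make([]string, 0, len(c.series))
	for k := range c.series {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s{%s} %g", c.name, k, c.series[k]))
	}
	return lines
}

func main() {
	requests := newCounterVec("estimating_request_total")

	// Right after start-up: no label combination has been used, nothing is exposed.
	fmt.Println("series after start-up:", len(requests.expose()))

	// After the first event with concrete label values, the series appears.
	labels := `result="success",type="demo"` // hypothetical label values
	requests.inc(labels)
	for _, line := range requests.expose() {
		fmt.Println(line)
	}
}
```

If zero-valued series are wanted from the start, one option in this model is to call `ensure` for each known label combination at start-up; whether that is appropriate for the real metrics is a design decision for the maintainers.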

@stulzq
Contributor

stulzq commented Jan 14, 2025

v1.9.0 can not find metric workqueue_adds_total

@XiShanYongYe-Chang
Member

Hi @stulzq, thanks for your feedback~

@chaosi-zju
Member

chaosi-zju commented Jan 14, 2025

v1.9.0 can not find metric workqueue_adds_total

Hi @stulzq, from our previous investigation:
In Karmada v1.9.9, we still used Go v1.20, and the Karmada controller did not miss workqueue metrics.
Starting with Karmada v1.10.0, we use Go v1.21, which is when this problem appeared and the metrics went missing.

Can you please tell me how you installed karmada?

@chaosi-zju
Member

v1.9.0 can not find metric workqueue_adds_total

There is another possibility: if you use Karmada v1.9.0 but build it with Go v1.21 or higher, you will also hit this problem.
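The root cause is still marked TBD in this issue, so the following is only a sketch of why the Go version can matter at all. It assumes, hypothetically, that a metrics provider is registered on a first-come, first-served basis from package init functions; under that assumption, a change in initialization order changes which provider wins. The names `registry`, `setProvider`, `winner`, `noopProvider`, and `realProvider` are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical first-writer-wins holder for a metrics provider.
type registry struct {
	once     sync.Once
	provider string
}

// setProvider keeps only the first provider it is given; later calls are ignored.
func (r *registry) setProvider(name string) {
	r.once.Do(func() { r.provider = name })
}

// winner replays registrations in the given order, as if each came from a
// package init function, and reports which provider ended up active.
func winner(initOrder []string) string {
	r := &registry{}
	for _, p := range initOrder {
		r.setProvider(p)
	}
	return r.provider
}

func main() {
	// Same code, two different initialization orders, two different outcomes.
	fmt.Println(winner([]string{"realProvider", "noopProvider"}))
	fmt.Println(winner([]string{"noopProvider", "realProvider"}))
}
```

With such a pattern, the same source tree can behave differently depending on the toolchain used to build it, which matches the observation that the Go version, not only the Karmada version, determines whether the metrics appear.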

@husnialhamdani
Contributor

/assign
