Enhance CPU manager with L3 cache aware #2621

Closed
hustcat opened this issue Apr 13, 2021 · 27 comments

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@hustcat commented Apr 13, 2021

Enhancement Description

  • One-line enhancement description (can be used as a release note):
    On some CPUs, such as AMD Rome, each CPU package (socket) has multiple L3 caches. When allocating CPUs, the L3 cache topology should be considered (see the sketch after this description).

  • Kubernetes Enhancement Proposal:

  • Discussion Link:

  • Primary contact (assignee): @hustcat @ranchothu

  • Responsible SIGs: node

  • Enhancement target (which target equals to which milestone):

    • Alpha release target (x.y): v1.22
    • Beta release target (x.y):
    • Stable release target (x.y):
  • Alpha

    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
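For background on where the L3 topology comes from: on Linux, the kernel reports which logical CPUs share each L3 instance through sysfs. Below is a minimal Go sketch, illustrative only and not part of the KEP, that groups CPUs by shared L3 cache:

```go
// Illustrative only (not part of the KEP): group logical CPUs by the
// L3 cache they share, using the standard Linux sysfs topology files.
// On most x86 machines cache/index3 is the L3; a robust version would
// check the cache "level" file instead of assuming the index.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	entries, err := os.ReadDir("/sys/devices/system/cpu")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// shared_cpu_list names every logical CPU sharing that L3 instance,
	// so it doubles as a natural key for grouping.
	groups := map[string][]string{}
	for _, e := range entries {
		name := e.Name() // cpu0, cpu1, ... (plus cpufreq etc., filtered below)
		data, err := os.ReadFile("/sys/devices/system/cpu/" + name +
			"/cache/index3/shared_cpu_list")
		if err != nil {
			continue // not a cpuN directory, or no L3 information exposed
		}
		key := strings.TrimSpace(string(data))
		groups[key] = append(groups[key], name)
	}
	for shared, members := range groups {
		fmt.Printf("L3 group %s: %v\n", shared, members)
	}
}
```

On an AMD Rome machine this typically prints one group per CCX, whereas many Intel parts show a single group per socket, which is why socket-level allocation alone is not enough there.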

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 13, 2021
@hustcat (Author) commented Apr 13, 2021

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 13, 2021
@hustcat (Author) commented Apr 13, 2021

/assign @ranchothu @hustcat

@k8s-ci-robot (Contributor)

@hustcat: GitHub didn't allow me to assign the following users: ranchothu.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @ranchothu @hustcat

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ffromani (Contributor) commented May 7, 2021

Hi! Is there already a conversation (maybe a GitHub issue?) on the cadvisor project about reporting LLC information so the kubelet can consume it? If so, could you please link it here?

@ffromani (Contributor) commented May 7, 2021

> Hi! Is there already a conversation (maybe a GitHub issue?) on the cadvisor project about reporting LLC information so the kubelet can consume it? If so, could you please link it here?

Never mind, I just realized you documented it in the KEP text itself.

@hustcat (Author) commented May 8, 2021

This is the design document: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit?usp=sharing

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 6, 2021
@ffromani (Contributor) commented Aug 6, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ffromani (Contributor) commented Jan 3, 2022

/reopen

@k8s-ci-robot (Contributor)

@fromanirh: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jan 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ffromani (Contributor)

/reopen

@k8s-ci-robot (Contributor)

@ffromani: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot reopened this Jun 22, 2024
@ffromani (Contributor)

@hustcat hi! Are you still interested in pushing this KEP forward?

@sphrasavath commented Jul 17, 2024

@ffromani Thanks for reopening this KEP. I am interested in pushing this forward; however, I would like to make changes to the design. The motivation for the KEP is the same. After reading through your comments on PR #2684, I agree with the approach of implementing this feature as a static policy option.

I’m still working out the details of the design, but in general, here is what we’re thinking:

  • Add a policy option in pkg/kubelet/cm/cpumanager/policy_options.go:
    GroupByL3CacheOption string = "group-by-l3-preferred"
  • Modify allocateCPUs in static_policy.go to check for the option flag
  • Implement the logic, TakeByL3Preferred, in cpu_assignment.go (a standalone sketch of the packing idea follows below)

Revised design doc
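To make the packing idea concrete, here is a self-contained sketch. The function name mirrors the proposed TakeByL3Preferred, but the signature and the free-CPU layout are invented for illustration; the real implementation would operate on the kubelet's cpuset and topology types, not plain int slices.

```go
// Illustrative sketch of "take by L3 preferred": satisfy a request
// from a single L3 cache group when possible, and only otherwise
// spill across groups, packing as few L3 domains as possible.
package main

import "fmt"

func takeByL3Preferred(groups [][]int, n int) []int {
	// First preference: one L3 group that can hold the whole request,
	// so every allocated CPU shares the same L3 cache.
	for _, g := range groups {
		if len(g) >= n {
			return g[:n]
		}
	}
	// Fallback: pack group by group, minimizing the number of L3
	// domains the allocation spans.
	var out []int
	for _, g := range groups {
		for _, cpu := range g {
			out = append(out, cpu)
			if len(out) == n {
				return out
			}
		}
	}
	return nil // not enough free CPUs to satisfy the request
}

func main() {
	// Two 4-CPU L3 groups, loosely modeled on an AMD Rome CCX (illustrative).
	free := [][]int{{0, 1, 2, 3}, {4, 5, 6, 7}}
	fmt.Println(takeByL3Preferred(free, 3)) // [0 1 2]: fits in one L3 group
	fmt.Println(takeByL3Preferred(free, 6)) // [0 1 2 3 4 5]: must spill
}
```

This "pack" preference is what keeps small guaranteed pods from being scattered across L3 domains and paying cross-cache latency.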

@sphrasavath commented Aug 18, 2024

@ffromani: created PR #126750 for this KEP.

Thanks for the feedback! Per your comments in the design doc regarding the goal of supporting multiple sockets, given the decreased performance of cross-die placement, our intention is to add that as part of beta. However, the pull request above does not include 2P (dual-socket) support yet; we will create a new PR for it.

In response to your comments about how the new policy option interacts with existing options: "align-cpus-by-uncorecache" attempts to take full cores whether or not "full-pcpus-only" is enabled. Because this new option follows the "pack" CPU sorting strategy, it will not be allowed if the "distribute-cpus-across-numa" or "distribute-cpus-across-cores" policy options are enabled.
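As an illustration of that mutual-exclusion rule, a hedged sketch: the option strings are the ones named above, but the validation function itself is invented for the example and is not the kubelet's actual code.

```go
// Illustrative only: reject a pack-style option combined with
// distribute-style options, per the rule described above.
package main

import (
	"errors"
	"fmt"
)

func validateOptions(opts map[string]bool) error {
	if opts["align-cpus-by-uncorecache"] &&
		(opts["distribute-cpus-across-numa"] || opts["distribute-cpus-across-cores"]) {
		return errors.New(
			"align-cpus-by-uncorecache packs CPUs and cannot be combined " +
				"with distribute-cpus-across-numa or distribute-cpus-across-cores")
	}
	return nil
}

func main() {
	err := validateOptions(map[string]bool{
		"align-cpus-by-uncorecache":   true,
		"distribute-cpus-across-numa": true,
	})
	fmt.Println(err) // conflict: pack vs. distribute strategies
}
```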

I saw your comments, and @kannon92's, regarding e2e tests. I will update.

@ffromani (Contributor) commented Aug 18, 2024

> @ffromani: created PR #126750 for this KEP.

Thanks. We now need to resume the conversation about this KEP, incorporating the elements from the design doc you shared previously and, of course, all the feedback from the community.

Lacking better options (cc @SergeyKanzhelev @mrunalp, please suggest any), the best option is probably to create a new PR superseding the old one. We can't transfer ownership of GitHub issues, so we would likely need a new issue in order to interact efficiently with the release team.

@kannon92 (Contributor)

/assign @sphrasavath

@kannon92 (Contributor)

> > @ffromani: created PR #126750 for this KEP.
>
> Thanks. We now need to resume the conversation about this KEP, incorporating the elements from the design doc you shared previously and, of course, all the feedback from the community.
>
> Lacking better options (cc @SergeyKanzhelev @mrunalp, please suggest any), the best option is probably to create a new PR superseding the old one. We can't transfer ownership of GitHub issues, so we would likely need a new issue in order to interact efficiently with the release team.

Yes, let's get a new PR up for the KEP and close the old one.

And I think the best thing would be to close this issue and open a new one with @sphrasavath so that it can be updated with the correct details.

@sphrasavath

@kannon92 Per your request, new issue created: #4800

@kannon92 (Contributor)

/close

@k8s-ci-robot (Contributor)

@kannon92: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
