
KEP-2621: Add llc affinity to cpu manager. #2684

Closed
wants to merge 3 commits

Conversation


@enzoyes commented May 6, 2021

design for issue 2621

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot added the `cncf-cla: no` label (Indicates the PR's author has not signed the CNCF CLA) on May 6, 2021
@k8s-ci-robot
Contributor

Welcome @ranchothu!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the `needs-ok-to-test` label (Indicates a PR that requires an org member to verify it is safe to test) on May 6, 2021
@k8s-ci-robot
Contributor

Hi @ranchothu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the `kind/kep` (Categorizes KEP tracking issues and PRs modifying the KEP directory) and `sig/node` (Categorizes an issue or PR as relevant to SIG Node) labels on May 6, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ranchothu
To complete the pull request process, please assign derekwaynecarr after the PR has been reviewed.
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the `size/L` label (Denotes a PR that changes 100-499 lines, ignoring generated files) on May 6, 2021
@enzoyes force-pushed the master branch 3 times, most recently from ba5f08d to ccf74f1 on May 6, 2021 13:43
@k8s-ci-robot
Contributor

@ranchothu: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ehashman
Member

ehashman commented May 6, 2021

Hi @ranchothu,

This currently isn't being tracked for SIG Node planning for 1.22: https://docs.google.com/document/d/1U10J0WwgWXkdYrqWGGvO8iH2HKeerQAlygnqgDgWv4E/edit#

/hold

It also appears that you will need to sign the Kubernetes CLA (see the bot comment above: #2684 (comment)).

@k8s-ci-robot added the `do-not-merge/hold` label (Indicates that a PR should not merge because someone has issued a /hold command) on May 6, 2021
@pacoxu
Member

pacoxu commented May 7, 2021


I suspect you committed with a different GitHub user name. Check the Git user settings (environment variables or config) in your development environment.

@enzoyes force-pushed the master branch 3 times, most recently from 0a40710 to 40c8fd4 on May 7, 2021 03:08
@enzoyes
Author

enzoyes commented May 7, 2021


@pacoxu maybe an ok-to-test is needed; it seems the CLA is not re-checked after force-pushes, and replying with "I signed it" also doesn't work.

@pacoxu
Member

pacoxu commented May 7, 2021

/ok-to-test

@k8s-ci-robot added the `ok-to-test` label (Indicates a non-member PR verified by an org member that is safe to test) and removed the `needs-ok-to-test` label (Indicates a PR that requires an org member to verify it is safe to test) on May 7, 2021
@enzoyes
Author

enzoyes commented May 7, 2021

/retest

@enzoyes
Author

enzoyes commented May 7, 2021

I signed it

@k8s-ci-robot removed the `cncf-cla: no` label (Indicates the PR's author has not signed the CNCF CLA) on May 7, 2021
Contributor

@swatisehgal left a comment


Took an initial pass at the KEP. The motivation is clear to me, but I would recommend adding more specific use cases in terms of the performance-sensitive workloads that strictly need resource allocation that takes the L3 cache into consideration.

### Risks and Mitigations

+ Currently no risks was found.
+ Feature is enbled by a gate - a new kube feature with default false, potential risk effects could be limited.
Contributor


NIT: typo enbled -> enabled


- Feature Gate
- Add `CPUManagerUncoreCacheAlign` to kubelet's feature-gates to enable(true)/disable(false) the feature.
- Also, more than one l3 cache should exist in a single socket/package.
Contributor


It is probably implied, but maybe we should explicitly capture the scenario where CPUManagerUncoreCacheAlign is enabled and only one L3 cache is present; in that case we would obtain the current behaviour.
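A minimal Go sketch of that fallback, using hypothetical helper and parameter names (not part of the KEP or the kubelet code):

```go
package cpumanager

// uncoreCacheAlignmentActive is a hypothetical helper illustrating the
// fallback described above: when the CPUManagerUncoreCacheAlign feature
// gate is off, or a socket exposes only one L3 (uncore) cache, the
// allocator keeps today's behaviour.
func uncoreCacheAlignmentActive(featureGateEnabled bool, uncoreCachesPerSocket int) bool {
	if !featureGateEnabled {
		return false
	}
	// With a single L3 cache per socket there is nothing to align against.
	return uncoreCachesPerSocket > 1
}
```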


- General Design
- Logic Elaboration
Try to allocate cpus sharing the same cache if demand is larger than one core. Add L3 cache affinity before tring core affinity best-fit.
Contributor


NIT: typo tring -> trying


![design_overview](design_overview.png "design_overview")

- feature-gates `CPUManagerUncoreCacheAlign`
Contributor


Looks like some formatting issue in the way this line is rendered?

@enzoyes
Author

enzoyes commented May 19, 2021

@swatisehgal, thanks, the changes have been updated.

- Add `CPUManagerUncoreCacheAlign` to kubelet's feature-gates to enable(true)/disable(false) the feature.
- Also, more than one l3 cache should exist in a single socket/package.

- C1: Add `CPUManagerUncoreCacheAlign` to kubelet's feature-gates to enable(true)/disable(false) the feature.
Contributor


So this feature (per my previous comment, see history) will need to depend on a feature gate. But you don't necessarily need a separate feature gate; you can depend on the one we added in #2626.

This CPUManager feature gate will most likely be alpha in the 1.23 cycle, which fits the needs of this KEP.

I think you can conditionally enable this optimization depending on a CPU manager policy option. This way you keep the conditional logic you already need to support the feature gate, without extra burden.

Fitting into the new CPUManager policy options is also a very nice and clean design.

Author


Hi @fromanirh, since I've posted a patch (kubernetes/kubernetes#102307), it may help you understand why I chose a kubelet runtime option rather than a separate CPU manager policy. IMHO, attaching this to a policy is also an option, but it would introduce some redundant logic.

Contributor


Thanks for sharing the implementation. I don't see yet where we need redundant logic, however. We could:

  1. get the enable/disable flag from cpumanager options. Probably the option should be on by default
  2. propagate the flag down to the cpuAccumulator
  3. consume the flag into isUncoreCacheAlignEnabled

IMHO this is also a nicer and more integrated design.
Now, implementation-wise, this flow seems compliant with the production-readiness review (see the details in https://kubernetes.slack.com/archives/CPNHUMN74/p1620312071045800) because new features should depend on a compatible and related feature gate; a new feature gate would be alpha level, and the cpuManagerOptions feature gate will still be alpha in 1.23.

So reshaping this to use the cpuManagerOptions still seems a totally valid option, and I think it is cleaner from an overall design perspective.
Let's see what other reviewers (@klueska :) ) think, and please let me know if there are concerns or requirements that I missed.
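To make the three steps above concrete, here is a rough Go sketch of that flow; the types, field names, and the option name are simplified and hypothetical, not the actual kubelet code:

```go
package cpumanager

// staticPolicyOptions stands in for the parsed cpuManagerPolicyOptions.
type staticPolicyOptions struct {
	AlignByUncoreCache bool // hypothetical option name
}

// cpuAccumulator is a stand-in for the kubelet's CPU accumulator.
type cpuAccumulator struct {
	alignByUncoreCache bool
}

// Steps 1 and 2: read the flag from the policy options and propagate it down.
func newCPUAccumulator(opts staticPolicyOptions) *cpuAccumulator {
	return &cpuAccumulator{alignByUncoreCache: opts.AlignByUncoreCache}
}

// Step 3: the allocation logic consults the flag.
func (a *cpuAccumulator) isUncoreCacheAlignEnabled() bool {
	return a.alignByUncoreCache
}
```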

- Also, more than one l3 cache should exist in a single socket/package.

- C1: Add `CPUManagerUncoreCacheAlign` to kubelet's feature-gates to enable(true)/disable(false) the feature.
- C2: More than one l3 cache should exist in a single socket/package(uncore-cache exists).
Contributor


I'm not quite sure what "C1" and "C2" mean in this context.

@dchen1107
Member

Thanks for making cadvisor aware of the LLC cache first. Here are some naive questions off the top of my head, since I didn't find the answers on a first pass of the KEP:

  1. Are you proposing to include the current CPU management proposal as one of the policies, or as an enhancement to the existing policy?
  2. Do you plan to extend this to a cluster-aware scheduling policy, or keep it at the kubelet/node level? From the KEP, it looks like you proposed a node-level optimization.
  3. If the kubelet cannot satisfy such requests, should it reject the admission of some pods, or is this best effort?

Id int `json:"core_id"`
Threads []int `json:"thread_ids"`
Caches []Cache `json:"caches"`
+ UncoreCaches []Cache `json:"uncore_caches"`
Contributor


Hmm, this looks like an unnecessary/confusing kludge. Would it be possible to present this similarly to how the Linux kernel does, i.e. add the information about CPUs to the Cache structure (like /sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_list)?
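For context, a small hedged Go example of reading the kernel view mentioned above (Linux-only; the sysfs path is the one referenced in the comment, with index3 usually being the L3 cache):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Prints which CPUs share cpu0's L3 cache, as exposed by the kernel.
func main() {
	data, err := os.ReadFile("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list")
	if err != nil {
		fmt.Println("could not read shared_cpu_list (non-Linux system or no L3 cache?):", err)
		return
	}
	fmt.Println("CPUs sharing cpu0's L3 cache:", strings.TrimSpace(string(data)))
}
```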

Author


Hi @marquiz, it is an interesting problem. Previously, I gave it some consideration here. I wanted the design here to be decoupled, but no other structure seems a better fit for holding the uncore-cache information.
I think:

  1. A cache itself shouldn't know whether it is an uncore cache; that is not a property of the cache. The info in /sys/devices/system/cpu/cpu*/cache/index*/ cannot tell us whether a cache is uncore without information about the socket/core.
  2. But a core is aware of the caches and uncore caches it uses, so for a core the information includes {id, threads, caches, uncore caches}.

Discussion is welcome, thanks.
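For reference, the proposed cadvisor structure from the diff above would look roughly like this (the Cache fields are simplified for illustration):

```go
package info

// Cache is simplified here; cadvisor's real type carries more detail.
type Cache struct {
	Size  uint64 `json:"size"`
	Type  string `json:"type"`
	Level int    `json:"level"`
}

// Core shows where the proposed field sits: each core lists its private
// caches plus the uncore (shared L3) caches it belongs to.
type Core struct {
	Id           int     `json:"core_id"`
	Threads      []int   `json:"thread_ids"`
	Caches       []Cache `json:"caches"`
	UncoreCaches []Cache `json:"uncore_caches"`
}
```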


Hi @ranchothu, in my understanding you'd like to differentiate between node-level cache (typically on Intel chips) and uncore cache (typically on AMD chips). If so, why do we need to differentiate between them? They are both L3 cache, and it seems that they can both be aligned when assigning CPUs.

Contributor


Here I agree that we need an additional layer of abstraction, since having groups of CPUs per L3 cache is not a feature of only AMD's processors; for example, the Kunpeng 920 (armv8) has 4 CPUs per L3 cache and 24 such blocks.

@enzoyes
Author

enzoyes commented Jun 13, 2021

Hi @dchen1107, thanks for your advice, and sorry for missing the meeting.

  1. I am adding an enhancement to the existing policy, with logic added to the basic CPU allocation (your 1st question).
  2. As described in the Feature Gate section, a new feature gate is added to enable/disable the feature. Also, if no uncore cache exists in the architecture (as captured from the cadvisor API), the feature is disabled even when the feature gate is on. (2nd)
  3. In the General Design, I think the kubelet CPU allocation at each level is always best effort (try node -> socket -> core -> cpu); the path for me is node -> socket -> uncore-cache -> core -> cpu, where each level is a cluster of CPUs sharing some common characteristic. It will try CPUs in the same core if uncore-cache affinity fails (3rd). See the sketch below.

I hope the explanation above answers your questions. Looking forward to further progress. :smile:
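A minimal, illustrative Go sketch of that best-effort ordering, with an example topology; this is not the actual static policy code:

```go
package main

import "fmt"

// Illustration of the best-effort path node -> socket -> uncore-cache ->
// core -> cpu: take whole groups at the widest level that still fits,
// then fall through to finer levels with the remainder.
type level struct {
	name      string
	groupSize int // CPUs per group at this level (example values)
}

func main() {
	levels := []level{
		{"socket", 32},      // example: 32 CPUs per socket
		{"uncore-cache", 8}, // new step introduced by this KEP
		{"core", 2},         // 2 hyperthreads per core
		{"cpu", 1},
	}
	remaining := 12 // example request
	for _, l := range levels {
		for remaining >= l.groupSize {
			fmt.Printf("take one whole %s (%d CPUs)\n", l.name, l.groupSize)
			remaining -= l.groupSize
		}
	}
}
```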

### Graduation Criteria
#### Alpha

- Implement the new policy.

According to the implementation, strictly speaking it is not a new policy but an enhancement to the existing static policy, right?

@ffromani
Contributor

Hi! Just a friendly reminder that the deadline for 1.23 KEP planning is 9th of September 2021: https://docs.google.com/document/d/1U10J0WwgWXkdYrqWGGvO8iH2HKeerQAlygnqgDgWv4E/edit# . If we want this enhancement to be in 1.23, we need to take some action. SIG Node is planning the 1.23 enhancements in today's meeting: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit - please consider proposing this change in this meeting or next week's meeting.

@k8s-ci-robot
Contributor

@ranchothu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-test | 09ffea5 | link | /test pull-enhancements-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the `lifecycle/stale` label (Denotes an issue or PR has remained open with no activity and has become stale) on Dec 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the `lifecycle/rotten` label (Denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the `lifecycle/stale` label (Denotes an issue or PR has remained open with no activity and has become stale) on Jan 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ffromani
Contributor

@enzoyes hi! Are you still interested in pushing this work forward?

Labels
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory.)
  • lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)