KEP-2621: Add llc affinity to cpu manager.
enzoyes committed May 7, 2021
1 parent 2340226 commit 40c8fd4
Showing 5 changed files with 165 additions and 0 deletions.
128 changes: 128 additions & 0 deletions keps/sig-node/2621-cpu-allocation-llc-affinity/README.md
@@ -0,0 +1,128 @@
# KEP-2621: Add LLC Affinity to CPU manager

<!-- toc -->
- [KEP-2621: Add LLC Affinity to CPU manager](#kep-2621-add-llc-affinity-to-cpu-manager)
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Future Work](#future-work)
- [Proposal](#proposal)
- [User Stories (Optional)](#user-stories-optional)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The current Kubernetes cpu-manager does not take caches into account. On some architectures, each socket/package contains more than one L3 cache, and containers may suffer performance degradation from L3 cache interference and lower hit rates.
This KEP adds support for L3 cache affinity during container CPU allocation: within a socket/package, try to satisfy a container's demand with CPUs that share an L3 cache, rather than choosing freely from all CPUs in the socket/package.

## Motivation

The Kubernetes cpu-manager tries to allocate CPUs within the same core and socket/package to gain better performance. On traditional architectures, where a single L3 cache is shared across the whole socket, the current CPU allocator works well.
However, the allocation algorithm can run into problems on processors such as 2nd Gen AMD EPYC™, where each CCX (a term used by AMD for a cluster of physical cores together with their shared L3 cache) owns its own L3 cache, so more than one L3 cache exists per socket/package. Relying on the current CPU allocation can therefore lead to L3 cache interference. For example, with 4 hyper-threaded cores per CCX, a container requesting 8 CPUs may not get a whole CCX but instead receive some CPUs from another CCX (see the figure below); containers A and B can then degrade each other's performance whenever one of them flushes the L3 cache. In our opinion, a container's CPU locality should be taken into account.

![allocation_motivation](allocation_motivation.png "allocation_motivation")

### Goals

Support L3 cache affinity during CPU allocation on architectures with more than one L3 cache per socket/package.

### Future Work

Cross-die allocation may also decrease process performance. We plan to add die affinity in the future.

## Proposal

- Add cache ID to cAdvisor
  In cAdvisor PR https://github.com/google/cadvisor/pull/2847, read /sys/devices/system/cpu/cpu*/cache/index3/id to obtain the L3 cache ID of each CPU and store it in the CPU topology (see the sketch after this list).
- Add uncore cache to cAdvisor
  In cAdvisor PR https://github.com/google/cadvisor/pull/2849, add L3 caches that are not shared by the whole socket ("uncore" caches) to the core info in the CPU topology.
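
For illustration, here is a minimal standalone Go sketch of how per-CPU L3 cache IDs can be read from the sysfs path named above and grouped by shared cache. This is not the actual cAdvisor code; the `l3CacheIDs` helper and its error handling are our own assumptions.

```go
// Sketch only: group online CPUs by the L3 cache they share, using
// /sys/devices/system/cpu/cpu*/cache/index3/id (requires a kernel that
// exposes cache IDs; cAdvisor's real implementation differs in detail).
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// l3CacheIDs maps each CPU number to its L3 cache ID.
func l3CacheIDs() (map[int]int, error) {
	ids := make(map[int]int)
	cpuDirs, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*")
	if err != nil {
		return nil, err
	}
	for _, dir := range cpuDirs {
		cpu, err := strconv.Atoi(strings.TrimPrefix(filepath.Base(dir), "cpu"))
		if err != nil {
			continue // not a cpuN directory
		}
		raw, err := os.ReadFile(filepath.Join(dir, "cache/index3/id"))
		if err != nil {
			return nil, err // no L3, or the kernel does not expose cache IDs
		}
		id, err := strconv.Atoi(strings.TrimSpace(string(raw)))
		if err != nil {
			return nil, err
		}
		ids[cpu] = id
	}
	return ids, nil
}

func main() {
	ids, err := l3CacheIDs()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	byCache := make(map[int][]int) // L3 cache ID -> CPUs sharing that cache
	for cpu, id := range ids {
		byCache[id] = append(byCache[id], cpu)
	}
	fmt.Println(byCache)
}
```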

### User Stories (Optional)

This feature can reduce memory (L3 cache) latency for memory-sensitive workloads.
We also benchmarked with STREAM2 DAXPY; as the figure below shows, allocations that cross a CCX (i.e., cross an L3 cache) achieve lower bandwidth.

![stream2_daxpy](stream2_daxpy.png "stream2_daxpy")

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

L3 cache affinity will not always yield better performance; however, we believe a workload in one container should not influence other containers. Reducing L3 cache misses within an individual container is a matter of how the workload itself is written, or of separate L3 cache allocation and isolation technologies, which are outside the scope of this KEP.

## Design Details

- Feature Gate
  The feature is enabled automatically during CPU allocation when more than one L3 cache exists in a single socket/package.
- General Design
  If the demand is larger than one core, try to allocate CPUs sharing the same L3 cache. L3 cache affinity is applied before the existing core-affinity best-fit (see the sketch after the figure below).

![design_overview](design_overview.png "design_overview")
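
To make the intended ordering concrete, below is a self-contained Go sketch under stated assumptions: the `cpusByL3` type and `takeByL3` helper are hypothetical, not the kubelet CPU manager's real data structures. It tries to satisfy a request from a single free L3 group, and returns nil so the caller can fall back to the existing core/socket best-fit when no single group can hold the request.

```go
// Illustrative sketch, not the kubelet implementation.
package main

import (
	"fmt"
	"sort"
)

// cpusByL3 maps an L3 cache ID to the free CPUs sharing that cache.
type cpusByL3 map[int][]int

// takeByL3 picks `want` CPUs from the smallest free L3 group that can hold
// the whole request, so one container does not straddle caches. It returns
// nil when no single group fits, signalling the caller to fall back to the
// existing core-affinity best-fit.
func takeByL3(free cpusByL3, want int) []int {
	best := -1
	for id, cpus := range free {
		if len(cpus) < want {
			continue
		}
		if best == -1 || len(cpus) < len(free[best]) {
			best = id
		}
	}
	if best == -1 {
		return nil
	}
	picked := append([]int(nil), free[best][:want]...)
	free[best] = free[best][want:]
	sort.Ints(picked)
	return picked
}

func main() {
	// Example: two CCXs with 8 hyper-threaded CPUs each, as in the figure.
	free := cpusByL3{0: {0, 1, 2, 3, 4, 5, 6, 7}, 1: {8, 9, 10, 11, 12, 13, 14, 15}}
	fmt.Println(takeByL3(free, 8)) // all 8 CPUs come from a single L3 group
	fmt.Println(takeByL3(free, 9)) // nil: fall back to the existing best-fit
}
```

Choosing the smallest group that fits mirrors the best-fit idea already used for core affinity; it keeps larger contiguous L3 groups available for later, larger requests, though this is only one possible policy.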

### Test Plan

Tests should cover two scenarios:
- On AMD Rome/Milan or other architectures with more than one L3 cache per socket, CPU allocation for a container should always try to place all requested CPUs on a single shared L3 cache. Check the container's cpuset.cpus for verification; a sketch of such a check follows this list.
- On other architectures, CPU allocation should behave exactly as before.
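
One possible shape for the first check, as a standalone Go sketch: the `parseCPUList` and `sharedL3` helpers are illustrative assumptions, not part of any existing test suite. It parses a cpuset.cpus list and verifies that every CPU in it reports the same L3 cache ID in sysfs.

```go
// Sketch only: verify that all CPUs in a cpuset list share one L3 cache.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUList expands a cpuset list such as "0-3,8" into CPU numbers.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if lo, hi, ok := strings.Cut(part, "-"); ok {
			a, errA := strconv.Atoi(lo)
			b, errB := strconv.Atoi(hi)
			if errA != nil || errB != nil {
				return nil, fmt.Errorf("bad range %q", part)
			}
			for c := a; c <= b; c++ {
				cpus = append(cpus, c)
			}
		} else {
			c, err := strconv.Atoi(part)
			if err != nil {
				return nil, fmt.Errorf("bad cpu %q", part)
			}
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

// sharedL3 reports whether all given CPUs have the same L3 cache ID.
func sharedL3(cpus []int) (bool, error) {
	first := ""
	for _, c := range cpus {
		path := fmt.Sprintf("/sys/devices/system/cpu/cpu%d/cache/index3/id", c)
		raw, err := os.ReadFile(path)
		if err != nil {
			return false, err
		}
		id := strings.TrimSpace(string(raw))
		if first == "" {
			first = id
		} else if id != first {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// In a real test this would be read from the container's cpuset.cpus.
	cpus, err := parseCPUList("0-3,8")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sharedL3(cpus))
}
```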

### Graduation Criteria

### Upgrade / Downgrade Strategy

### Version Skew Strategy

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

### Rollout, Upgrade and Rollback Planning

### Monitoring Requirements

### Dependencies

A recent cAdvisor version is required, one in which cache IDs and uncore cache information are stored in the CPU topology (see the cAdvisor PRs linked above).

### Scalability

### Troubleshooting

## Implementation History

Original design doc with solutions considered: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit#
(Two binary image files added; not displayed in this view.)
37 changes: 37 additions & 0 deletions keps/sig-node/2621-cpu-allocation-llc-affinity/kep.yaml
@@ -0,0 +1,37 @@
title: Add llc affinity to cpu manager
kep-number: 2621
authors:
- "@ranchothu"
owning-sig: sig-node
participating-sigs:
- sig-node
status: provisional
creation-date: 2021-05-06
reviewers:
- "@derekwaynecarr"
- "@dchen1107"
approvers:
- "@derekwaynecarr"
- TBD
prr-approvers:
- TBD
labels:
- sig/node

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.22"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.22"
beta: "v1.23"
stable: "v1.24"

# The following PRR answers are required at beta release
metrics:
- N/A
(Binary image file added; not displayed in this view.)
