KEP-2621: Add feature gate.
enzoyes committed May 10, 2021
1 parent 62bc695 commit 7fb3c8c
Showing 2 changed files with 72 additions and 20 deletions.
92 changes: 72 additions & 20 deletions keps/sig-node/2621-cpu-allocation-llc-affinity/README.md
@@ -13,6 +13,9 @@
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
+  - [Alpha](#alpha)
+  - [Alpha to Beta Graduation](#alpha-to-beta-graduation)
+  - [Beta to GA Graduation](#beta-to-ga-graduation)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -33,7 +36,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
@@ -47,81 +50,130 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

-Caches are not considered in current Kubernetes cpu-manager, in some architectures, each socket/package owns more than one L3 cache, containers may encounter performance degradation for L3 cache interference and lower hit rate.
-Add support for L3 cache affinity during container cpu allocation, while in the same package/socket, try to use cpus sharing L3 cache for container demand but not just choose from all cpus in the package/socket.
+The current Kubernetes cpu-manager does not consider caches. In some architectures, each socket/package owns more than one L3 cache, and containers may encounter performance degradation from L3 cache interference and a lower hit rate.
+We propose to support L3 cache affinity during container cpu allocation: while staying within the same package/socket, try to use cpus that share an L3 cache to satisfy a container's demand, rather than just choosing from all cpus in the package/socket.

## Motivation

-Kubernetes cpu-manager tries to allocate cpus in the same core, socket/package, gaining better performance. In traditional architecture, L3 cache is shared between the whole socket, current cpus allocator works well.
-However, the allocation algorithm may encounter problem in processors like 2nd Gen AMD EPYC™, each ccx(a term used by AMD to describe a cluster of physical cores along with the shared level 3 cache) owns its L3 cache, more than one L3 cache exists in a socket/package. Depending on current cpu allocation may face L3 cache interference. For example, 4 cores with HT in ccx, a container demand for 8 cpus may not get the whole ccx, but get some cpus in other ccx(see figure below), container A and B may affect each other while the other flush l3 cache. In our opinion, container's cpu locality should be considered.
+The Kubernetes cpu-manager tries to allocate cpus in the same core and socket/package to gain better performance. In traditional architectures, the L3 cache is shared by the whole socket, and the current cpu allocator works well.
+However, the allocation algorithm may encounter problems on processors like the `2nd Gen AMD EPYC™`: each `ccx` (a term used by AMD to describe a cluster of physical cores along with their shared L3 cache) owns its own L3 cache, so more than one L3 cache exists in a socket/package; we call such an L3 cache an uncore-cache throughout this design. Relying on the current cpu allocation may therefore cause uncore-cache interference. For example, with 4 cores (HT enabled) per ccx, a container demanding 8 cpus may not get a whole ccx, but may instead get some cpus in another ccx (see the figure below); containers A and B may then affect each other whenever one flushes the uncore-cache. In our opinion, a container's cpu locality should be considered.

![allocation_motivation](allocation_motivation.png "allocation_motivation")

### Goals

-Support L3 cache affinity in cpu allocation in architecture with more than one l3 cache in socket/package.
+Support uncore-cache affinity in cpu allocation on architectures with more than one L3 cache in a socket/package.

### Future Work

-Cross-die may also decrease process performance. We will add die affinity in the future.
+Crossing dies may also decrease process performance. We will add die affinity in the future, along with the corresponding cpu assignment algorithm implementation.

## Proposal

To make cpu allocation decisions with uncore-cache affinity, kubelet must be aware of the uncore-cache information. Kubelet currently gets the cpu topology from cadvisor, which does not expose these details, so we added cache id and uncore-cache items to cadvisor (all merged).
- Add cache id to cadvisor
In the cadvisor PR (https://github.com/google/cadvisor/pull/2847/), use `/sys/devices/system/cpu/cpu*/cache/index3/id` to get the L3 cache id of the current cpu and store it in the cpu topology; see the sketch after the struct below.
```go
type Cache struct {
+	// Id of memory cache
+	Id int `json:"id"`
	// Size of memory cache in bytes.
	Size uint64 `json:"size"`
	// Type of memory cache: data, instruction, or unified.
	Type string `json:"type"`
	// Level (distance from cpus) in a multi-level cache hierarchy.
	Level int `json:"level"`
}
```
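
For illustration, a minimal sketch (hypothetical helper, not the actual cadvisor code) of reading the L3 cache id from sysfs:

```go
package sysfs

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// l3CacheID returns the L3 cache id of the given logical cpu, read from
// /sys/devices/system/cpu/cpu<N>/cache/index3/id.
func l3CacheID(cpu int) (int, error) {
	path := fmt.Sprintf("/sys/devices/system/cpu/cpu%d/cache/index3/id", cpu)
	raw, err := os.ReadFile(path)
	if err != nil {
		return -1, err
	}
	return strconv.Atoi(strings.TrimSpace(string(raw)))
}
```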
- Add uncore cache to cadvisor
-In cadvisor PR(https://github.com/google/cadvisor/pull/2849), add L3 cache not shared among the whole socket(uncore cache) to core info in cpu topology.
+In the cadvisor PR (https://github.com/google/cadvisor/pull/2849), L3 caches that are not shared by the whole socket (uncore caches) are added to the core info in the cpu topology, giving us the core->uncore-cache mappings.
```go
type Core struct {
	Id int `json:"core_id"`
	Threads []int `json:"thread_ids"`
	Caches []Cache `json:"caches"`
+	UncoreCaches []Cache `json:"uncore_caches"`
	SocketID int `json:"socket_id"`
}
```
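
Given these mappings, grouping the cpus that share an uncore cache is straightforward; a sketch using the cadvisor types above (helper name hypothetical):

```go
// cpusByUncoreCache groups logical cpus (threads) by the id of the
// uncore (L3) cache their core belongs to.
func cpusByUncoreCache(cores []Core) map[int][]int {
	groups := make(map[int][]int)
	for _, core := range cores {
		for _, uc := range core.UncoreCaches {
			groups[uc.Id] = append(groups[uc.Id], core.Threads...)
		}
	}
	return groups
}
```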

### User Stories (Optional)

-Workload is memory sensitive, this feature can reduce memory(L3 cache) latency.
-Also, we make a bench with stream2 DAXPY, as we can see, cross ccx(cross l3 cache) gets lower bandwidth.
+Before this change, when kubelet allocates cpus for containers, the uncore-cache is not considered, and a container may get cpus across caches even when there are free cpus that share an uncore-cache.
+We benchmarked with `stream2` DAXPY; as shown below, crossing ccx (crossing the uncore-cache) yields lower bandwidth.

![stream2_daxpy](stream2_daxpy.png "stream2_daxpy")

+Moreover, when the workload is memory sensitive, this feature can improve memory bandwidth significantly (by more than 20%).

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

-L3 cache affinity will not always get a better performance, however, we do think, workload in containers should not influence other containers. Decreasing L3 cache-miss in individual containers should be taken into consideration during programming workload or use other L3 cache allocation and isolation technology, which are not our topic.
+ Currently, no risks have been found.
+ The feature is enabled by a gate (a new kube feature with default false), so potential risk effects are limited.

## Design Details

- Feature Gate
-More than one l3 cache should exist in a single socket/package, the feature will be auto enabled during cpu allocation.
+- Add `CPUManagerLLCAlign` to kubelet's feature-gates to enable (`true`) or disable (`false`) the feature.
+- Also, more than one L3 cache must exist in a single socket/package.

-- General Design
-Try to allocate cpus sharing the same cache if demand is larger than one core. Add L3 cache affinity before tring core affinity best-fit.
+- Logic Elaboration
+Try to allocate cpus sharing the same cache if the demand is larger than one core, adding an L3 cache affinity best-fit step before trying the core-affinity best-fit.
+If we cannot find llc-satisfied cpus, continue with the original process (find available cores); see the sketch after the figure below.

![design_overview](design_overview.png "design_overview")
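
A sketch of the llc-affinity best-fit step (names hypothetical; the real change lives in `takeByTopology`):

```go
// takeByUncoreCache tries to satisfy the demand from a single uncore
// cache: among caches whose free cpus can hold the whole demand, pick
// the one with the fewest free cpus (best fit). It returns nil when no
// single cache fits, and the caller falls back to the original
// core/socket allocation path.
func takeByUncoreCache(freeByCache map[int][]int, demand int) []int {
	bestID := -1
	bestFree := int(^uint(0) >> 1) // max int
	for id, cpus := range freeByCache {
		if len(cpus) >= demand && len(cpus) < bestFree {
			bestID, bestFree = id, len(cpus)
		}
	}
	if bestID < 0 {
		return nil
	}
	return freeByCache[bestID][:demand]
}
```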

+- feature-gates `CPUManagerLLCAlign`
+`CPUManagerLLCAlign` is set to `false` by default in `defaultKubernetesFeatureGates`, and `takeByTopology` branches on it: `enable` -> do the L3 cache affinity best-fit, `disable` -> skip it, as sketched below.
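
How the gate could be consulted in the allocator, assuming the helpers sketched above (`takeByTopologyOriginal` is a hypothetical stand-in for the existing path):

```go
import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// allocate attempts the llc-affinity best-fit only when the proposed
// CPUManagerLLCAlign gate is enabled; otherwise, or when no single
// uncore cache fits the demand, it keeps the existing behavior.
func allocate(freeByCache map[int][]int, demand int) []int {
	if utilfeature.DefaultFeatureGate.Enabled(features.CPUManagerLLCAlign) {
		if cpus := takeByUncoreCache(freeByCache, demand); cpus != nil {
			return cpus
		}
	}
	return takeByTopologyOriginal(freeByCache, demand)
}
```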
### Test Plan

-Test should work on two scenarios:
-- For AMD rome/milan or other architectures with more than one L3 cache in a socket, cpu allocation for a container should always try to get all demand cpus sharing one L3 cache. Check containers’ cpuset.cpus for verification.
-- For other architectures, cpu allocation should be the same as before.
+- Unit tests for the newly added allocation algorithm.
+- E2E tests should cover two scenarios:
+  - For AMD rome/milan or other architectures with more than one L3 cache in a socket, cpu allocation for a container should always try to get all demanded cpus sharing one L3 cache. Check containers’ cpuset.cpus for verification.
+  - For other architectures, cpu allocation should be the same as before.

### Graduation Criteria
+#### Alpha
+
+- Implement the new policy.
+- Ensure proper e2e node tests are in place.
+
+#### Alpha to Beta Graduation
+
+- Gather feedback from consumers of the policy.
+- No major bugs reported in the previous cycle.
+
+#### Beta to GA Graduation
+
+- Allow time for feedback (1 year).
+- Risks have been addressed.
### Upgrade / Downgrade Strategy

+We expect no impact. The new kube feature is opt-in.

### Version Skew Strategy
+No changes needed.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

+- Feature gate
+We add a kube feature `CPUManagerLLCAlign`, with default `false`, to kubelet's feature gates.

### Rollout, Upgrade and Rollback Planning

### Monitoring Requirements

### Dependencies

A recent cadvisor version is required, in which the cache id and uncore-cache info are stored in the cpu topology.

### Scalability

### Troubleshooting

## Implementation History

-Original design doc with solutions considered: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit#
+- 2021.5.10: KEP updated; added CPUManagerLLCAlign as a kube feature.
+- 2021.5.6: KEP created.
+- 2021.5.1: Original design doc with solutions considered: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit#
