KEP-2621: Add feature gate.
enzoyes committed May 10, 2021
1 parent 62bc695 commit 7fb3c8c
Showing 2 changed files with 72 additions and 20 deletions.
92 changes: 72 additions & 20 deletions keps/sig-node/2621-cpu-allocation-llc-affinity/README.md
@@ -13,6 +13,9 @@
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
+  - [Alpha](#alpha)
+  - [Alpha to Beta Graduation](#alpha-to-beta-graduation)
+  - [Beta to GA Graduation](#beta-to-ga-graduation)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -33,7 +36,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
@@ -47,81 +50,130 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

-Caches are not considered in current Kubernetes cpu-manager, in some architectures, each socket/package owns more than one L3 cache, containers may encounter performance degradation for L3 cache interference and lower hit rate.
-Add support for L3 cache affinity during container cpu allocation, while in the same package/socket, try to use cpus sharing L3 cache for container demand but not just choose from all cpus in the package/socket.
+The current Kubernetes cpu-manager does not consider caches. In some architectures, each socket/package owns more than one L3 cache, and containers may encounter performance degradation from L3 cache interference and a lower hit rate.
+We propose to support L3 cache affinity during container cpu allocation: while staying within the same package/socket, try to use cpus that share an L3 cache to satisfy a container's demand, rather than just choosing from all cpus in the package/socket.

## Motivation

-Kubernetes cpu-manager tries to allocate cpus in the same core, socket/package, gaining better performance. In traditional architecture, L3 cache is shared between the whole socket, current cpus allocator works well.
-However, the allocation algorithm may encounter problem in processors like 2nd Gen AMD EPYC™, each ccx(a term used by AMD to describe a cluster of physical cores along with the shared level 3 cache) owns its L3 cache, more than one L3 cache exists in a socket/package. Depending on current cpu allocation may face L3 cache interference. For example, 4 cores with HT in ccx, a container demand for 8 cpus may not get the whole ccx, but get some cpus in other ccx(see figure below), container A and B may affect each other while the other flush l3 cache. In our opinion, container's cpu locality should be considered.
+The Kubernetes cpu-manager tries to allocate cpus in the same core and socket/package to gain better performance. In traditional architectures, the L3 cache is shared by the whole socket, and the current cpu allocator works well.
+However, the allocation algorithm may encounter problems on processors like the `2nd Gen AMD EPYC™`: each `ccx` (a term used by AMD to describe a cluster of physical cores along with their shared L3 cache) owns its own L3 cache, so more than one L3 cache exists in a socket/package; we call such an L3 cache an uncore-cache throughout this design. Relying on the current cpu allocation may therefore cause uncore-cache interference. For example, with 4 cores (HT enabled) per ccx, a container demanding 8 cpus may not get a whole ccx, but may instead get some cpus in another ccx (see the figure below); containers A and B may then affect each other whenever one flushes the uncore-cache. In our opinion, a container's cpu locality should be considered.

![allocation_motivation](allocation_motivation.png "allocation_motivation")

### Goals

-Support L3 cache affinity in cpu allocation in architecture with more than one l3 cache in socket/package.
+Support uncore-cache affinity in cpu allocation on architectures with more than one L3 cache in a socket/package.

### Future Work

-Cross-die may also decrease process performance. We will add die affinity in the future.
+Crossing dies may also decrease process performance. We will add die affinity in the future, along with the corresponding cpu assignment algorithm implementation.

## Proposal

To make cpu allocation decisions with uncore-cache affinity, kubelet must be aware of the uncore-cache information. Kubelet currently gets the cpu topology from cadvisor, which does not expose these details, so we added cache id and uncore-cache items to cadvisor (all merged).
- Add cache id to cadvisor
In the cadvisor PR (https://github.com/google/cadvisor/pull/2847/), use `/sys/devices/system/cpu/cpu*/cache/index3/id` to get the L3 cache id of the current cpu and store it in the cpu topology; see the sketch after the struct below.
```go
type Cache struct {
+	// Id of memory cache
+	Id int `json:"id"`
	// Size of memory cache in bytes.
	Size uint64 `json:"size"`
	// Type of memory cache: data, instruction, or unified.
	Type string `json:"type"`
	// Level (distance from cpus) in a multi-level cache hierarchy.
	Level int `json:"level"`
}
```
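
For illustration, a minimal sketch (hypothetical helper, not the actual cadvisor code) of reading the L3 cache id from sysfs:

```go
package sysfs

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// l3CacheID returns the L3 cache id of the given logical cpu, read from
// /sys/devices/system/cpu/cpu<N>/cache/index3/id.
func l3CacheID(cpu int) (int, error) {
	path := fmt.Sprintf("/sys/devices/system/cpu/cpu%d/cache/index3/id", cpu)
	raw, err := os.ReadFile(path)
	if err != nil {
		return -1, err
	}
	return strconv.Atoi(strings.TrimSpace(string(raw)))
}
```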
- Add uncore cache to cadvisor
-In cadvisor PR(https://github.com/google/cadvisor/pull/2849), add L3 cache not shared among the whole socket(uncore cache) to core info in cpu topology.
+In the cadvisor PR (https://github.com/google/cadvisor/pull/2849), L3 caches that are not shared by the whole socket (uncore caches) are added to the core info in the cpu topology, giving us the core->uncore-cache mappings.
```go
type Core struct {
	Id int `json:"core_id"`
	Threads []int `json:"thread_ids"`
	Caches []Cache `json:"caches"`
+	UncoreCaches []Cache `json:"uncore_caches"`
	SocketID int `json:"socket_id"`
}
```
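
Given these mappings, grouping the cpus that share an uncore cache is straightforward; a sketch using the cadvisor types above (helper name hypothetical):

```go
// cpusByUncoreCache groups logical cpus (threads) by the id of the
// uncore (L3) cache their core belongs to.
func cpusByUncoreCache(cores []Core) map[int][]int {
	groups := make(map[int][]int)
	for _, core := range cores {
		for _, uc := range core.UncoreCaches {
			groups[uc.Id] = append(groups[uc.Id], core.Threads...)
		}
	}
	return groups
}
```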

### User Stories (Optional)

-Workload is memory sensitive, this feature can reduce memory(L3 cache) latency.
-Also, we make a bench with stream2 DAXPY, as we can see, cross ccx(cross l3 cache) gets lower bandwidth.
+Before this change, when kubelet allocates cpus for containers, the uncore-cache is not considered, and a container may get cpus across caches even when there are free cpus that share an uncore-cache.
+We benchmarked with `stream2` DAXPY; as shown below, crossing ccx (crossing the uncore-cache) yields lower bandwidth.

![stream2_daxpy](stream2_daxpy.png "stream2_daxpy")

+Moreover, when the workload is memory sensitive, this feature can improve memory bandwidth significantly (by more than 20%).

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

-L3 cache affinity will not always get a better performance, however, we do think, workload in containers should not influence other containers. Decreasing L3 cache-miss in individual containers should be taken into consideration during programming workload or use other L3 cache allocation and isolation technology, which are not our topic.
+ Currently, no risks have been found.
+ The feature is enabled by a gate (a new kube feature with default false), so potential risk effects are limited.

## Design Details

- Feature Gate
-More than one l3 cache should exist in a single socket/package, the feature will be auto enabled during cpu allocation.
+- Add `CPUManagerLLCAlign` to kubelet's feature-gates to enable (`true`) or disable (`false`) the feature.
+- Also, more than one L3 cache must exist in a single socket/package.

-- General Design
-Try to allocate cpus sharing the same cache if demand is larger than one core. Add L3 cache affinity before tring core affinity best-fit.
+- Logic Elaboration
+Try to allocate cpus sharing the same cache if the demand is larger than one core, adding an L3 cache affinity best-fit step before trying the core-affinity best-fit.
+If we cannot find llc-satisfied cpus, continue with the original process (find available cores); see the sketch after the figure below.

![design_overview](design_overview.png "design_overview")
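
A sketch of the llc-affinity best-fit step (names hypothetical; the real change lives in `takeByTopology`):

```go
// takeByUncoreCache tries to satisfy the demand from a single uncore
// cache: among caches whose free cpus can hold the whole demand, pick
// the one with the fewest free cpus (best fit). It returns nil when no
// single cache fits, and the caller falls back to the original
// core/socket allocation path.
func takeByUncoreCache(freeByCache map[int][]int, demand int) []int {
	bestID := -1
	bestFree := int(^uint(0) >> 1) // max int
	for id, cpus := range freeByCache {
		if len(cpus) >= demand && len(cpus) < bestFree {
			bestID, bestFree = id, len(cpus)
		}
	}
	if bestID < 0 {
		return nil
	}
	return freeByCache[bestID][:demand]
}
```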

+- feature-gates `CPUManagerLLCAlign`
+`CPUManagerLLCAlign` is set to `false` by default in `defaultKubernetesFeatureGates`, and `takeByTopology` branches on it: `enable` -> do the L3 cache affinity best-fit, `disable` -> skip it, as sketched below.
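
How the gate could be consulted in the allocator, assuming the helpers sketched above (`takeByTopologyOriginal` is a hypothetical stand-in for the existing path):

```go
import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// allocate attempts the llc-affinity best-fit only when the proposed
// CPUManagerLLCAlign gate is enabled; otherwise, or when no single
// uncore cache fits the demand, it keeps the existing behavior.
func allocate(freeByCache map[int][]int, demand int) []int {
	if utilfeature.DefaultFeatureGate.Enabled(features.CPUManagerLLCAlign) {
		if cpus := takeByUncoreCache(freeByCache, demand); cpus != nil {
			return cpus
		}
	}
	return takeByTopologyOriginal(freeByCache, demand)
}
```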
### Test Plan

-Test should work on two scenarios:
-- For AMD rome/milan or other architectures with more than one L3 cache in a socket, cpu allocation for a container should always try to get all demand cpus sharing one L3 cache. Check containers’ cpuset.cpus for verification.
-- For other architectures, cpu allocation should be the same as before.
+- Unit tests for the newly added allocation algorithm.
+- E2E tests should cover two scenarios:
+  - For AMD rome/milan or other architectures with more than one L3 cache in a socket, cpu allocation for a container should always try to get all demanded cpus sharing one L3 cache. Check containers’ cpuset.cpus for verification.
+  - For other architectures, cpu allocation should be the same as before.

### Graduation Criteria
+#### Alpha
+
+- Implement the new policy.
+- Ensure proper e2e node tests are in place.
+
+#### Alpha to Beta Graduation
+
+- Gather feedback from consumers of the policy.
+- No major bugs reported in the previous cycle.
+
+#### Beta to GA Graduation
+
+- Allow time for feedback (1 year).
+- Risks have been addressed.
### Upgrade / Downgrade Strategy

+We expect no impact. The new kube feature is opt-in.

### Version Skew Strategy
+No changes needed.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

+- Feature gate
+We add a kube feature `CPUManagerLLCAlign`, with default `false`, to kubelet's feature gates.

### Rollout, Upgrade and Rollback Planning

### Monitoring Requirements

### Dependencies

A recent cadvisor version is required, in which the cache id and uncore-cache info are stored in the cpu topology.

### Scalability

### Troubleshooting

## Implementation History

-Original design doc with solutions considered: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit#
+- 2021.5.10: KEP updated; added CPUManagerLLCAlign as a kube feature.
+- 2021.5.6: KEP created.
+- 2021.5.1: Original design doc with solutions considered: https://docs.google.com/document/d/1BuiBgsittUnU3heKHRCQ66YYxzAItT5gcPlu3N83PfA/edit#
