Fix BalanceSimilarNodeGroups when scaling from zero #3608
Conversation
Use real-world nodeInfo from similar groups when BalanceSimilarNodeGroups.

Where available, the CA evaluates node group capacities and labels using real-world nodes, for optimal accuracy (accounting for kubelet and system reserved resources, for instance). It falls back to synthetic nodeInfos inferred from ASG/MIG/VMSS templates when no real nodes are available yet (i.e. upscaling from a zero/empty node group). That asymmetric evaluation can prevent `BalanceSimilarNodeGroups` from working reliably when upscaling from zero:

* The first upscaled nodeGroup gets a node (capacity initially evaluated from the ASG/MIG/VMSS template, then from the first real node and its runtime labels)
* Other empty nodeGroups will likely be considered dissimilar to the first one: we'd be comparing ASG templates with real-world nodeInfos
* With the `least-waste` expander, for instance, the CA might favor the nodegroup backed by a real-world node (accounting for reserved resources makes it look like the best usage) over the others

This change set implements [Maciek Pytel's suggestion (from a previous attempt at fixing this issue)](kubernetes#2892 (comment)): we try to use real-world node examples from ~similar node groups, before a last-resort fallback to synthetic nodeInfos built from ASG templates.

We compare nodeInfos using the `FindSimilarNodeGroups` processor primitives in order to leverage per-cloud-provider comparators (e.g. label exclusions). We look for similar nodegroups using synthetic nodeInfos from ASG/MIG templates (where available): comparing their nodeInfos from real-world nodes to a synthetic nodeInfo from the empty nodegroup wouldn't work (that's the root cause of this issue). When ~similar nodegroups are found that way, we use their nodeInfos built from real-world nodes as models for the empty nodegroups (but keep the original location-related labels).

Tested with AWS ASGs, GCP MIGs and Azure VMSS.
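The selection logic described above can be sketched as follows. This is a minimal illustration only: `NodeInfo`, `NodeGroup`, and the `similar` comparator are hypothetical stand-in types, not the Cluster Autoscaler's actual API (the real code uses `schedulerframework.NodeInfo` and the per-cloud-provider comparators behind `FindSimilarNodeGroups`).

```go
package main

import "fmt"

// NodeInfo and NodeGroup are simplified stand-ins for the CA's real types.
type NodeInfo struct {
	AllocatableMilliCPU int64
	Labels              map[string]string
	FromRealNode        bool
}

type NodeGroup struct {
	ID           string
	RealNodeInfo *NodeInfo // nil when the group is empty (scaling from zero)
	TemplateInfo *NodeInfo // synthetic info inferred from the ASG/MIG/VMSS template
}

// similar stands in for the per-cloud-provider comparator; here we only
// compare template capacity.
func similar(a, b *NodeInfo) bool {
	return a.AllocatableMilliCPU == b.AllocatableMilliCPU
}

// nodeInfoForGroup borrows a real-node NodeInfo from a similar group when the
// target group is empty, and only falls back to the template as a last resort.
// Location labels from the empty group's own template are preserved.
func nodeInfoForGroup(target *NodeGroup, all []*NodeGroup) *NodeInfo {
	if target.RealNodeInfo != nil {
		return target.RealNodeInfo
	}
	for _, g := range all {
		if g.ID == target.ID || g.RealNodeInfo == nil {
			continue
		}
		// Compare template to template: comparing a template against a
		// real-node NodeInfo is exactly the asymmetry this change avoids.
		if similar(g.TemplateInfo, target.TemplateInfo) {
			borrowed := *g.RealNodeInfo
			borrowed.Labels = map[string]string{}
			for k, v := range g.RealNodeInfo.Labels {
				borrowed.Labels[k] = v
			}
			// Keep the empty group's own location-related labels.
			for _, k := range []string{"topology.kubernetes.io/zone"} {
				if v, ok := target.TemplateInfo.Labels[k]; ok {
					borrowed.Labels[k] = v
				}
			}
			return &borrowed
		}
	}
	return target.TemplateInfo
}

func main() {
	a := &NodeGroup{
		ID: "asg-a",
		TemplateInfo: &NodeInfo{AllocatableMilliCPU: 4000,
			Labels: map[string]string{"topology.kubernetes.io/zone": "us-east-1a"}},
		RealNodeInfo: &NodeInfo{AllocatableMilliCPU: 3920, FromRealNode: true,
			Labels: map[string]string{"topology.kubernetes.io/zone": "us-east-1a"}},
	}
	b := &NodeGroup{
		ID: "asg-b",
		TemplateInfo: &NodeInfo{AllocatableMilliCPU: 4000,
			Labels: map[string]string{"topology.kubernetes.io/zone": "us-east-1b"}},
	}
	info := nodeInfoForGroup(b, []*NodeGroup{a, b})
	// asg-b is empty, so it borrows asg-a's real-node capacity but keeps its own zone.
	fmt.Println(info.FromRealNode, info.AllocatableMilliCPU, info.Labels["topology.kubernetes.io/zone"])
}
```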
This is a second attempt at kubernetes#1021 (it might also provide an alternate solution for kubernetes#2892 and kubernetes#3608). Per the kubernetes#1021 discussion, a flag might be acceptable if it defaults to false and documents the limitations (not all cloud providers are supported) and the risks of using it (loss of accuracy, risk of upscaling unusable nodes or leaving pods pending). Some use cases include:

* Balancing similar node groups when upscaling from zero (but I believe kubernetes#3608 is a better approach)
* Edited/updated ASG/MIG taints and labels
* Updated instance types

Per the previous discussion, the latter two cases could be covered in the long term by custom node-shape discovery Processors (open to discussing that option too).
I think this should be implemented as a processor:
Those processors were added specifically to allow custom logic to replace balancing (NodeGroupSetProcessor) and to allow custom overrides to the NodeInfo map produced by GetNodeInfosForGroups (NodeInfoProcessor). My reasons are:
Ping. Please respond to @MaciekPytel's comment.
@mwielgus: Maciek's comment makes perfect sense; I want to secure a bit of time to try implementing the suggested options (I suspect I'll come back with questions, if you don't mind).
Comparing synthetic NodeInfos obtained from a nodegroup's TemplateInfo() to NodeInfos obtained from real-world nodes is bound to fail, even with kube reservations provided through nodegroup labels/annotations (for instance, kernel memory reservation is hard to predict). This makes `balance-similar-node-groups` likely to misbehave when `scale-up-from-zero` is enabled (and a first nodegroup gets a real node), for instance.

Following [Maciek Pytel's suggestion](kubernetes#3608 (comment)) (from discussions on a previous attempt at solving this), we can implement a NodeInfo Processor that improves template-generated NodeInfos whenever a node was created off a similar nodegroup. We're storing the node's virtual origin through its machineid, which works fine but is a bit ugly (suggestions welcome).

Tested that this solves balance-similar-node-groups + scale-up-from-zero with various instance types on AWS and GCP.

Previous attempts to solve that issue/discussions:

* kubernetes#2892 (comment)
* kubernetes#3608 (comment)
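The processor approach can be sketched roughly as below. This is a hypothetical, heavily simplified rendition of the NodeInfoProcessor idea: the types, the interface shape, and the `InstanceType`-based comparator are stand-ins for illustration, not the Cluster Autoscaler's real API (which passes the autoscaling context and uses the FindSimilarNodeGroups comparators).

```go
package main

import "fmt"

// NodeInfo is a stand-in type; Template marks infos built from an
// ASG/MIG template rather than from a real node.
type NodeInfo struct {
	InstanceType string
	MilliCPU     int64
	Template     bool
}

// NodeInfoProcessor lets custom logic rewrite the nodeGroup -> NodeInfo map
// produced by GetNodeInfosForGroups before scale-up decisions use it.
type NodeInfoProcessor interface {
	Process(nodeInfos map[string]*NodeInfo) map[string]*NodeInfo
}

// templateOnlyProcessor replaces template-generated NodeInfos with a copy of
// a real-node NodeInfo from a similar group, when one exists.
type templateOnlyProcessor struct {
	similar func(a, b *NodeInfo) bool
}

// Compile-time check that the processor satisfies the interface.
var _ NodeInfoProcessor = (*templateOnlyProcessor)(nil)

func (p *templateOnlyProcessor) Process(nodeInfos map[string]*NodeInfo) map[string]*NodeInfo {
	for id, info := range nodeInfos {
		if !info.Template {
			continue // already backed by a real node
		}
		for otherID, other := range nodeInfos {
			if otherID == id || other.Template {
				continue
			}
			if p.similar(info, other) {
				copied := *other // copy, so later mutations don't alias groups
				nodeInfos[id] = &copied
				break
			}
		}
	}
	return nodeInfos
}

func main() {
	// Toy comparator: same instance type means similar. The real change
	// compares template-built infos through per-provider comparators.
	p := &templateOnlyProcessor{similar: func(a, b *NodeInfo) bool {
		return a.InstanceType == b.InstanceType
	}}
	infos := map[string]*NodeInfo{
		"group-1": {InstanceType: "m5.large", MilliCPU: 1900, Template: false}, // has a real node
		"group-2": {InstanceType: "m5.large", MilliCPU: 2000, Template: true},  // scaled from zero
	}
	infos = p.Process(infos)
	// group-2 now reflects group-1's real-node capacity.
	fmt.Println(infos["group-2"].MilliCPU, infos["group-2"].Template)
}
```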
Implemented @MaciekPytel's suggestion in PR #3761; closing in favour of the new PR.
This is a third attempt at kubernetes#1021 (it might also provide an alternate solution for kubernetes#2892, kubernetes#3608 and kubernetes#3609). Some use cases include: balancing similar node groups when upscaling from zero, edited/updated ASG/MIG taints and labels, and updated instance types. Per the kubernetes#1021 discussion, a flag might be acceptable if it defaults to false and documents the limitations (not all cloud providers) and the risks of using it (loss of accuracy, risk of upscaling unusable nodes or leaving pods pending). Per the kubernetes#3609 discussion, using a NodeInfo processor is preferred. The fate of `GetNodeInfosForGroups()` (and the opportunity to split NodeInfoProcessor into NodeInfoProcessor and NodeInfoDecoratorProcessor) is left open for discussion, as I'd like to collect feedback on refactoring plans here.