
Change max pods per node on cli #13420

Closed
GustavoAmerico opened this issue May 9, 2020 · 15 comments
Labels
AKS (az aks/acs/openshift) · customer-reported (Issues that are reported by GitHub users external to the Azure organization) · Service Attention (This issue is the responsibility of the Azure service team)

Comments

@GustavoAmerico

Change max pods per node on cli

Problem

I'm facing a problem with my AKS cluster: the max pods number is 30 per node, and when I checked the deployed pods I found that roughly 10 of them belong to the AKS system itself. I think the max pods number should not include AKS system pods, since this effectively reduces my limit to about 20, not the 30 mentioned in the MS docs.

Expectation:

Run az aks nodepool update --cluster-name <your aks cluster name> --name <pool name> --max-pods=$maxpodsize (currently this does not work).

Workaround:

  1. Create a new node pool:
    az aks nodepool add --cluster-name <your aks cluster name> -g <your-aks-resource-group> --max-pods $maxpodsize --name newpool --enable-cluster-autoscaler --min-count 1 --max-count 1 -c 1 -s <machine size>

  2. Scale down the nodes of the original agentpool one by one and scale up newpool as necessary (a rough sketch of this step follows below).
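
A minimal sketch of step 2, assuming the original pool is named agentpool and that the cluster autoscaler is disabled on it before manual scaling; node, cluster, and resource group names are placeholders:

    # Move workloads off an old node, then repeat for each node in the old pool
    kubectl cordon <old-node-name>
    kubectl drain <old-node-name> --ignore-daemonsets

    # Shrink the old pool (manual scaling requires the cluster autoscaler to be disabled on it)
    az aks nodepool scale --cluster-name <your aks cluster name> -g <your-aks-resource-group> --name agentpool --node-count <lower count>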

Workaround problem:

I tried to remove the default pool, but the CLI shows an error.

The message is: Operation failed with status: 'Bad Request'. Details: There has to be at least one system agent pool.

Workaround expectation:

Allow deleting the default pool when another pool is running.
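
For reference, a sketch of the delete call that produces the error above (names are placeholders, assuming the default pool is called agentpool):

    az aks nodepool delete --cluster-name <your aks cluster name> -g <your-aks-resource-group> --name agentpool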

@ghost commented May 9, 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/aks-pm.

@yonzhan (Collaborator) commented May 9, 2020

aks

@yungezz (Member) commented May 11, 2020

routing to appropriate team

@jluk (Contributor) commented May 11, 2020

This is by design: the pods-per-node limit has to represent the actual count of pods as seen from Kubernetes. The system pod count depends on the cluster configuration; if you add multiple add-ons, the number of "system pods" will increase. While these pods are managed by the AKS service, they still reside in user space, so you can debug and transparently view them as needed.

It is also by design that there must be at least 1 system pool; you can delete the whole cluster if you need all the system pools removed.

Closing since this is by design, but comment if this doesn't clarify things and we can revisit.

@jluk closed this as completed May 11, 2020
@GustavoAmerico (Author)

I understand; displaying the system pods in the total makes sense. But I did not understand how you determine that the maximum number of pods per node is 30. Does a cluster with DS12-v2 nodes run only 30 pods per node, the same as a cluster with B2ms nodes?

This issue should not be closed without action; I reported 2 problems with the CLI.
Problems:

  1. az aks nodepool update --max-pods does not work.

  2. It is not possible to delete a pool, even if another one exists.

@jluk (Contributor) commented May 11, 2020

The max-pods setting per node is user defined and can only be set at create time for each agent pool. There is no update functionality for it, as seen from the available options on update:

Command
    az aks nodepool update : Update a node pool to enable/disable cluster-autoscaler or change min-
    count or max-count.

Arguments
    --cluster-name       [Required] : The cluster name.
    --name -n            [Required] : The node pool name.
    --resource-group -g  [Required] : Name of resource group. You can configure the default group
                                      using `az configure --defaults group=<name>`.
    --disable-cluster-autoscaler -d : Disable cluster autoscaler.
    --enable-cluster-autoscaler -e  : Enable cluster autoscaler.
    --max-count                     : Maximum nodes count used for autoscaler, when "--enable-
                                      cluster-autoscaler" specified. Please specify the value in the
                                      range of [1, 100].
    --min-count                     : Minimum nodes count used for autoscaler, when "--enable-
                                      cluster-autoscaler" specified. Please specify the value in the
                                      range of [1, 100].
    --mode                          : The mode for a node pool which defines a node pool's primary
                                      function. If set as "System", AKS prefers system pods
                                      scheduling to node pools with mode `System`. Learn more at
                                      https://aka.ms/aks/nodepool/mode.  Allowed values: System,
                                      User.
    --no-wait                       : Do not wait for the long-running operation to finish.
    --tags                          : Space-separated tags: key[=value] [key[=value] ...]. Use '' to
                                      clear existing tags.
    --update-cluster-autoscaler -u  : Update min-count or max-count for cluster autoscaler.

Clusters require at least 1 system pool, as the error message states. This is detailed here:
https://docs.microsoft.com/en-us/azure/aks/use-system-pools
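
Since the update help above does list --mode, one possible way to finish the workaround from the issue description, sketched here under the assumption that newpool from that workaround exists and is healthy, is to promote it to a System pool and then delete the old default pool:

    # Keep at least one System pool by promoting the replacement pool
    az aks nodepool update --cluster-name <your aks cluster name> -g <your-aks-resource-group> --name newpool --mode System

    # The original default pool can then be deleted
    az aks nodepool delete --cluster-name <your aks cluster name> -g <your-aks-resource-group> --name agentpool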

@pvorb commented Nov 12, 2020

But why can't you update the maximum number of pods for an existing nodepool? Could this be added to the cli, please?

@palma21 (Member) commented Nov 12, 2020

Because that would involve re-wiring the pod CIDR, which requires a node reboot across the node pool; at that point, creating a new pool and deleting the previous one becomes the more controlled scenario.

Happy to evaluate your scenario if it's a major pain for you to do this.

@pvorb commented Nov 12, 2020

Okay, I fully understand. My original problem is that the AKS collection for Ansible doesn't support specifying --max-pods upon cluster creation: ansible-collections/azure#308.

So now I'm trying to build a workaround:

  1. set up cluster with system nodepool via ansible
  2. increase --max-pods for the existing cluster via az cli

But now I think I should wait for an action on the other issue.

@scheung38 commented Jan 8, 2023

Hi @palma21 and anyone: is this still the case, that after AKS cluster creation we cannot later increase maxPods (up to 250) on worker nodes? I ask because I see the default value of 30 in the portal for worker nodes, even though my bicep config says 250:

@description('Specifies the maximum number of pods that can run on a node in the user node pool. The maximum number of pods per node in an AKS cluster is 250. The default maximum number of pods per node varies between kubenet and Azure CNI networking, and the method of cluster deployment.')
param userAgentPoolMaxPods int = 250 //30

Why is it not showing 250?
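
One way to check which value was actually applied, independent of what the portal shows (a hedged sketch; cluster, resource group, and pool names are placeholders):

    # Effective maxPods on the node pool resource
    az aks nodepool show --cluster-name <your aks cluster name> -g <your-aks-resource-group> --name <pool name> --query maxPods

    # What the kubelet actually advertises on each node
    kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods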

@Bv-Lucas

bump

@iamandymcinnes

250 is the Kubernetes max and 110 is the upstream default, but on AKS the default is 30. I'm more interested in why AKS defaults to 30. This makes it very easy to hit the scenario where the node pool autoscaler doesn't scale because there is no memory or CPU pressure at 30 pods. Perhaps you should align with the Kubernetes default of 110, or fix the autoscaler to also pay attention to the max pod count and the current allocation?

@Bv-Lucas

I believe this is because AKS doesn't support changing this value after creation.
Since, when using Azure CNI, the "--max-pods" parameter is used to reserve IP addresses on the associated subnet upfront, it kind of makes sense to have a rather low value as the default for a managed service.
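
A rough illustration of that upfront reservation, assuming a /24 node subnet (about 251 usable addresses) and that each node takes one address for itself plus max-pods addresses for its pods:

    # Approximate node capacity of a /24 subnet with Azure CNI
    #   max-pods 30  -> 251 / (30 + 1)  ≈ 8 nodes
    #   max-pods 110 -> 251 / (110 + 1) ≈ 2 nodes
    # Higher max-pods multiplies the addresses reserved per node, which is
    # presumably why the managed default stays conservative.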

@iamandymcinnes commented Mar 22, 2023

I'm sure this is the same for any cloud-hosted K8s cluster, but GKE etc. stick to that default of 110, and really, are 80 extra address reservations per node that big of a deal, especially when they recommend an address space of /16 on your vnet? Either way, 30 is very low, so the autoscaler just doesn't scale when you hit that 30-pod limit and there's nothing you can do about it. So the way I see it: either set a sensible default that is likely to generate some pressure to trigger the autoscaler, or, probably better still, make the autoscaler also watch the pod allocation and limit to trigger scaling that way.

@Bv-Lucas commented Mar 22, 2023

30 is the default only when using Azure CNI. They don't explicitly state why; I'm just making an educated guess, although apparently if you deploy from the portal it defaults to 110 even with Azure CNI.
If you use kubenet, the AKS default is 110, like the Kubernetes default:
https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni
