
[EKS] Managed worker nodes #139

Closed
tabern opened this issue Jan 30, 2019 · 49 comments
Labels
EKS Amazon Elastic Kubernetes Service

Comments

@tabern
Contributor

tabern commented Jan 30, 2019

Managed Kubernetes worker nodes will allow you to provision, scale, and update groups of EC2 worker nodes through EKS.

This feature fulfills #57


EKS Managed Node Groups are now GA!

@tabern tabern added the EKS Amazon Elastic Kubernetes Service label Jan 30, 2019
@dcherman

As a quick note, can we make sure that this will interact/play nicely with cluster-autoscaler? If we can get managed, autoscaling worker nodes, this would be amazing.

@jaredeis

jaredeis commented Feb 5, 2019

Along with draining nodes in an upgrade situation.

@dnobre

dnobre commented Feb 23, 2019

Shouldn't this one be solved as part of implementing Fargate for EKS? #32

@danmx

danmx commented Jul 29, 2019

You could use Virtual Kubelet

@whereisaaron

Fargate for EKS is a different thing, @dnobre; with Fargate there are no worker nodes to manage, so this issue about managing worker nodes is not relevant to Fargate.

@tabern being able to support cluster-autoscaler is important to people. At the moment it expects to manipulate ASGs through the ASG API. But if you add an EKS API, then please contribute a patch or fork to cluster-autoscaler so it can use the new API.

If you provide your own autoscaling instead, it has to be aware of the cluster workload and all ASGs; you’d need a k8s service or daemonset to provide custom metrics to the ASGs, and, when there are many ASGs, some way to choose which one to scale up/down next, as cluster-autoscaler does.

cluster-autoscaler has to use some tricks to scale to/from zero nodes in an ASG, because it doesn’t know what a node would look like when there are none. The improvement the EKS API could provide would be to expose what a node would be (instance type, AZ, node labels, node taints, tags) when the node group is scaled to zero.

cluster-autoscaler also has trouble when scaling up multi-AZ ASGs, because it can’t specify which AZ the new node will be in (e.g. when the un-scheduled workload is AZ-specific). The ability to specify an AZ when scaling up an EKS node group would be great.
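For context, cluster-autoscaler’s current workaround for scale-from-zero is to read node-template hints from tags on the ASG itself. A rough sketch with the AWS CLI, assuming a hypothetical ASG called my-eks-workers and example label/taint values:

# Tag the ASG so cluster-autoscaler knows what a node would look like at size zero
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-eks-workers,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/node-template/label/workload,Value=batch" \
  "ResourceId=my-eks-workers,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/node-template/taint/dedicated,Value=batch:NoSchedule"

An EKS node group API that exposed the same information natively would remove the need for these tags.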

@cdenneen

@whereisaaron interesting, that's exactly what I would consider Fargate for EKS to be. Since it was never released 2 years ago, its implementation is pretty hypothetical, but theoretically I would expect an endpoint for your kubeconfig and to be able to deploy via kubectl apply or helm install.
I wouldn't expect things like "task" definitions because that's what Fargate for ECS already is.
How would "managed worker nodes" be any different?

Granted, the fact that "Fargate for EKS" was never released means we are all just spitballing here.

@whereisaaron

With Fargate, whether ECS Fargate or EKS Fargate, there are no worker nodes. That’s why you use a Fargate solution: so you do not have to manage worker nodes. So this issue has no overlap with a Fargate product.

@cdenneen not sure I understand, but what you describe sounds correct, just like EKS endpoint, except no (real) worker nodes, just a virtual-kubelet running as a sidecar to the Pod. The virtual-kubelet and Pod run who-knows-where, because the instances they run on are not our problem with a Fargate solution.

@groodt

groodt commented Oct 1, 2019

Any updates on this issue?

@tabern
Contributor Author

tabern commented Oct 1, 2019

@groodt coming soon.... we'll be sure to update when there are updates to share!

@ejlp12

ejlp12 commented Oct 6, 2019

Will this feature add the capability to create worker node groups from the AWS console (UI)?

@tabern
Contributor Author

tabern commented Oct 6, 2019

@ejlp12 yes.

@lilley2412

@tabern will there be an option to add a userdata script or otherwise modify the instances?

@pfremm

pfremm commented Oct 17, 2019

I am curious about logging aggregation as well for managed workers. Any details on how we can aggregate logs as part of this feature?

@tabern
Contributor Author

tabern commented Oct 18, 2019

@lilley2412 not at launch, but we plan to add this in the future.

@pfremm yes. You'll be able to use EC2 Autoscaling for reporting group-level metrics. Since managed nodes are standard EC2 instances that run in your account, you will be able to implement any log forwarding/aggregation tooling that you are using today, such as FluentBit/S3 and Fluentd/CloudWatch.
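To make the log-forwarding point concrete, here is a minimal sketch of a Fluent Bit output section using the cloudwatch_logs output plugin; the region, log group name, and stream prefix are placeholders, and a real setup also needs the input/filter sections plus a DaemonSet to run the agent on each node:

[OUTPUT]
    Name              cloudwatch_logs
    Match             kube.*
    region            us-west-2
    log_group_name    /eks/my-cluster/containers
    log_stream_prefix node-
    auto_create_group On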

@bcmedeiros

@tabern will this support windows worker nodes?

@mikestef9 mikestef9 changed the title from "Managed worker nodes" to "[EKS] Managed worker nodes" Oct 30, 2019
@hrushig

hrushig commented Nov 13, 2019

Who manages security patches or addresses CVEs on these managed worker nodes? Will this still fall under the "Security in the Cloud" customer responsibility?

@virgofx

virgofx commented Nov 18, 2019

Released GA 11/18 👍

@pkoch

pkoch commented Nov 18, 2019

Can we have a link to the docs?

@nrdlngr

nrdlngr commented Nov 18, 2019

Hi! The documentation is deploying now. It should be available shortly, and I'll update with a link here when it is.

@tabern
Contributor Author

tabern commented Nov 18, 2019

We're excited to announce that Amazon EKS Managed Node Groups are now generally available!

With Amazon EKS managed node groups you don’t need to separately provision or connect the EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, update, or terminate nodes for your cluster with a single command. Nodes run using the latest EKS-optimized AMIs in your AWS account while node updates and terminations gracefully drain nodes to ensure your applications stay available.

Today, EKS managed node groups are available for new Amazon EKS clusters running Kubernetes version 1.14 with platform version eks.3. You can also update clusters (1.13 or lower) to version 1.14 to take advantage of this feature. Support for existing version 1.14 clusters is coming soon.

Learn more
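For reference, the single-command workflow looks roughly like this with the AWS CLI (the cluster name, node group name, subnets, and IAM role ARN are placeholders):

# Create a managed node group attached to an existing EKS cluster
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name standard-workers \
  --subnets subnet-0abc123 subnet-0def456 \
  --node-role arn:aws:iam::111122223333:role/eksNodeRole \
  --scaling-config minSize=2,maxSize=5,desiredSize=3 \
  --instance-types t3.large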

@tabern tabern closed this as completed Nov 18, 2019
@robgott

robgott commented Nov 18, 2019

@tabern congrats on the release!
Is CF support in a future release or is doco just pending updates?
Ready to use this but can't use without CF support :/

@MarcusNoble

Does something need to be done to enable this on existing clusters?

Latest EKS 1.14

(screenshot: CleanShot 2019-11-19 at 06 20 07)

Also, as @nxf5025 mentions, it doesn't look like there's any ability to pass in userdata or kubelet flags?

Also, will there be support for spot instances?

@tabern
Contributor Author

tabern commented Nov 19, 2019

Thanks all! We're pretty excited to introduce this new feature.

@robgott @pc-rshetty CloudFormation support for managed node groups is there today; it's just that the documentation is taking a bit longer to publish than we had originally expected.

Specifically, EKS managed node groups introduce a new resource type "AWS::EKS::Nodegroup" and an update to the existing resource type "AWS::EKS::Cluster" to add ClusterSecurityGroupId in CloudFormation. The documentation updates for these changes will be published by 11/21.
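A minimal CloudFormation sketch of the new resource type, with placeholder values (consult the official docs once published for the full property list):

Resources:
  ManagedNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: my-cluster
      NodeRole: arn:aws:iam::111122223333:role/eksNodeRole
      Subnets:
        - subnet-0abc123
        - subnet-0def456
      ScalingConfig:
        MinSize: 2
        DesiredSize: 3
        MaxSize: 5
      InstanceTypes:
        - t3.large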

@pc-rshetty Cluster Autoscaler should continue to work just like it does today. The biggest change from our end is that we tag every node for auto discovery by cluster autoscaler. Overprovisioner should work. Seems like a helm chart that basically implements the method described here?
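For anyone wiring this up, Cluster Autoscaler's ASG auto-discovery mode can pick up those tags with flags roughly like the following; the cluster name is a placeholder, and this assumes the node group's underlying ASG carries the standard k8s.io/cluster-autoscaler tags:

# Container args for the cluster-autoscaler deployment
- ./cluster-autoscaler
- --cloud-provider=aws
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
- --balance-similar-node-groups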

@nxf5025 @MarcusNoble today you cannot pass this to managed node groups. However! we're planning to add this in the future as part of support for EC2 Launch Templates #585

Yes, we also will be working on spot support - tracking in #583

The other feature we're currently tracking on the roadmap is Windows Support (#584) but feel free to add more if there are important features you think we should be looking at.

@JanneEN

JanneEN commented Nov 19, 2019

Are managed Ubuntu node groups also being worked on, or should that be added to the roadmap? That was mentioned in the blog post when comparing the EKS API with eksctl; it's a feature we need.

@drewhemm

drewhemm commented Nov 19, 2019

In addition to spot instances, we need to be able to utilise a mixed instances policy, as per kubernetes/autoscaler#1886, i.e. t3.large and t3a.large, or m5.large and m5d.large, etc. This increases the probability of successful instance fulfilment. We are currently using this functionality to good effect and would need the same ability with managed worker nodes, along with the ability to specify userdata.

In the UI, this would simply be represented by being able to select multiple instance types, and preferably being able to sort them in order of preference. This is how launch template mixed instances policy and overrides currently work:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-autoscaling-autoscalinggroup-launchtemplateoverrides.html

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-autoscaling-autoscalinggroup-launchtemplate.html
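For illustration, the CloudFormation shape described in those two pages looks roughly like this (the launch template ID, subnets, and instance types are placeholders):

Resources:
  MixedWorkerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "2"
      MaxSize: "6"
      VPCZoneIdentifier:
        - subnet-0abc123
        - subnet-0def456
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 0
          SpotAllocationStrategy: lowest-price
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: lt-0123456789abcdef0
            Version: "1"
          Overrides:
            # Multiple instance types increase the chance of fulfilment
            - InstanceType: t3.large
            - InstanceType: t3a.large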

@omerfsen

omerfsen commented Nov 19, 2019 via email

@tabern
Contributor Author

tabern commented Nov 21, 2019

@omerfsen we're tracking spot support in #583

@drewhemm we're considering mixed instance groups part of spot support; agreed that without them spot will be difficult to use properly.

@JanneEN that's a good call out, we'd love to make this happen. Thanks for adding as #588

@tabern
Contributor Author

tabern commented Nov 22, 2019

CloudFormation documentation for EKS Managed Node Groups is now published: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html

@stevenoctopus

@tabern Doesn't look like docs have been updated still. That link redirects me to: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html

@drewhemm

Interesting. The link worked when I clicked it 12 hours ago...

@stevenoctopus

It's working for me now! 👍

@splieth

splieth commented Nov 27, 2019

@tabern How are rolling updates supposed to work? Draining nodes actually works, but apparently leads to downtime.

Let's say I have an existing node group and want to rotate the nodes. To do this (manually), I would replace the node group by creating a new one, waiting for it to become available, and then deleting the old one afterwards. When doing this, I can see that the nodes get drained before the instances get terminated. However, the running pods are more or less terminated simultaneously, which leads to downtime.

In terraform, the mechanism is basically the same, leading to the same result.

Am I doing something wrong?

Edit:
I can also see this behavior when just scaling an existing node group (e.g. by scaling from 3 nodes to 6 and back from 6 to 3).

2nd edit:
Downtime in this case means that I can see some failing requests.

@reegnz

reegnz commented Nov 28, 2019

@splieth look into pod disruption budgets; that's what you need to avoid all pods terminating at once.
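A minimal sketch of such a budget, assuming a workload labelled app: my-api (at the time of this thread the API group is policy/v1beta1):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
spec:
  minAvailable: 2        # keep at least 2 pods up during node drains
  selector:
    matchLabels:
      app: my-api

With a budget in place, the drains triggered by node group scaling, updates, or terminations should evict pods gradually rather than all at once.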

@groodt

groodt commented Dec 1, 2019

When will we see support for existing 1.14 clusters? My clusters are currently stuck on platform version eks.2.

@reegnz

reegnz commented Dec 2, 2019

@groodt I don't think there's much hope there. You could just create a new cluster and move your workloads there.
Hopefully everyone realizes that you're not supposed to build pet clusters either.

@MarcusNoble

I don't see why it couldn't support existing clusters. A little more involved maybe, but a cluster can have multiple ASGs associated with it, so the new managed nodes could be brought up alongside the existing self-managed ones, and then the self-managed nodes removed once the new nodes are stable.

@groodt

groodt commented Dec 3, 2019

This isn't a new Kubernetes version. Presumably it's some additional process running in the control plane that is aware of the ASGs, and that's it.

I saw this in the original announcement:

Today, EKS managed node groups are available for new Amazon EKS clusters running Kubernetes version 1.14 with platform version eks.3. You can also update clusters (1.13 or lower) to version 1.14 to take advantage of this feature. Support for existing version 1.14 clusters is coming soon.

So presumably they do plan to upgrade existing clusters; I'm curious about the timeline. If it takes too long, sure, I can create a new cluster and migrate workloads easily enough, but it's still annoying to do without downtime.

@PerGon

PerGon commented Jan 3, 2020

My 1.14 clusters are still stuck on platform version eks.2. The newest platform version is eks.7 - it seems that the rollout of new platform versions for existing clusters is really slow.
While it's fair to expect new clusters for new K8s versions, it seems a bit excessive to create new clusters just for new platform versions.
What expectations can we have for the timeline of platform version updates for existing clusters?

@llamahunter

FWIW, my clusters weren’t updating either, but when I updated all my workers to a newer AMI ahead of the control plane version, all my control planes updated within 48 hours. Coincidence?

@pfremm

pfremm commented Feb 10, 2020

Been trying out managed worker nodes, and unless I am missing something, do I have no ability to see kubelet-related logs unless I provision with an SSH key?

@splieth

splieth commented Feb 27, 2020

@pfremm I didn't find another method apart from deploying the SSM agent as a DaemonSet and accessing the logs via SSM rather than SSH. But imho that's the better option, since the SSH key doesn't need to be shared.
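For what it's worth, once the agent is running and the node's instance profile allows Session Manager, the kubelet logs can be reached without any SSH key (the instance ID is a placeholder):

aws ssm start-session --target i-0123456789abcdef0
# then, on the node, the kubelet runs as a systemd unit:
journalctl -u kubelet --no-pager | tail -n 100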

@reegnz

reegnz commented Feb 27, 2020

@pfremm I suggest you set up Container Insights to ship logs and metrics into CloudWatch Logs; that worked great for me.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html

All logs from pods and the kubelet + kube-proxy are then shipped and viewable in CloudWatch. You can then ship that further into Elasticsearch as well, so that's also an option if you don't like CloudWatch queries.

@pdonorio

@tabern will there be an option to add a userdata script or otherwise modify the instances?

Is there any update on this?

I couldn't find any reference in https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html

This is quite critical to set up nodes for different purposes.

@mikestef9
Contributor

Hi @pdonorio, that feature request is being tracked in this issue #596

@dcherman

Out of curiosity, does the current managed node group setup specify any kind of flags for --kube-reserved and friends? If so, what are the values based on? I know we'll be able to control those values once #596 has been addressed, but I'm wondering if managed nodegroups would be usable for us today.
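One way to check what the current bootstrap actually sets, assuming the API server is allowed to proxy to the kubelet, is to read the live kubelet configuration (the node name is a placeholder):

kubectl get --raw "/api/v1/nodes/ip-10-0-1-23.ec2.internal/proxy/configz" \
  | jq '.kubeletconfig | {kubeReserved, systemReserved, evictionHard}'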

@erez-rabih

Is there any support for setting taints on a nodegroup?
Is there any support for customizing the bootstrap script that is run on the nodegroup instances?

@amitkatyal

I am not able to scale down the managed node group. I had deployed a 2-node cluster (managed node group) and was able to scale the number of nodes from 2 to 3, but scaling down doesn't work. I tried from the AWS management console and a CF template, but no luck.

On scaling down, the node does go into the SchedulingDisabled state, but the workload is not evicted from that node and eventually the node becomes Ready again.

To scale the managed node group up or down, I am only updating its scaling config.

NAME STATUS ROLES AGE VERSION
ip-10-0-24-220.ap-northeast-1.compute.internal Ready,SchedulingDisabled 101m v1.21.2-eks-c1718fb
ip-10-0-36-47.ap-northeast-1.compute.internal Ready 5h48m v1.21.2-eks-c1718fb
ip-10-0-5-244.ap-northeast-1.compute.internal Ready 25m v1.21.2-eks-c1718fb
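For reference, updating only the scaling config via the CLI looks roughly like this (the cluster and node group names are placeholders):

aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name standard-workers \
  --scaling-config minSize=2,maxSize=4,desiredSize=2

If the resulting drain cannot complete, for example because a PodDisruptionBudget cannot be satisfied, the node may come back to Ready as described above; the node group's update history (aws eks list-updates / describe-update) is a good place to check for a failure reason.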
