Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue operation if parsing some ASG fails #1728

Closed
nielsole opened this issue Feb 26, 2019 · 14 comments · Fixed by #2385
Closed

Continue operation if parsing some ASG fails #1728

nielsole opened this issue Feb 26, 2019 · 14 comments · Fixed by #2385
Assignees
Labels
area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug.

Comments

@nielsole
Copy link
Contributor

nielsole commented Feb 26, 2019

One of our ASGs uses mixed instance types and has 0 instances.

Expected Behaviour:
Autoscaler behaves like described in: #1647.
The autoscaler fails to parse the mixed ASG with
Unable to build proper template node for {{ASG_NAME}}: Unable to get instance type from launch config or launch template
and continues with the existing others.

Actual Behaviour:
The autoscaler fails to parse the mixed ASG with
Unable to build proper template node for {{ASG_NAME}}: Unable to get instance type from launch config or launch template and stops processing ASGs with the following message:
static_autoscaler.go:312] Failed to scale up: failed to build node infos for node groups: Unable to get instance type from launch config or launch template

I just had to remove the ASG and everything was fine, but a single invalid ASG shouldn't incapacitate the Autoscaler IMO.

@nielsole
Copy link
Contributor Author

This happens not only when there are 0 instances in an ASG but also if all nodes are cordoned in that ASG :/

@bskiba bskiba added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug. labels Mar 4, 2019
@Jeffwan
Copy link
Contributor

Jeffwan commented Mar 4, 2019

Yeah. As you know, Mixed instance type in one ASG are not supported. Maybe we can change to skip invalid ASG instead to prevent CA stop working. What do you think?

@nielsole
Copy link
Contributor Author

sounds good.

@Jeffwan
Copy link
Contributor

Jeffwan commented Apr 2, 2019

/sig aws

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 1, 2019
@jsharpe
Copy link

jsharpe commented Jul 30, 2019

/remove-lifecycle stale
I'm hitting this issue with one of my ASGs - at the very least processing should continue to scale up as this effectively disables autoscaling for the whole cluster.

@k8s-ci-robot k8s-ci-robot removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/aws labels Jul 30, 2019
@jaypipes
Copy link
Contributor

@jsharpe I believe this issue has been fixed when #1886 merged. It has been cherry-picked to 1.14 cluster autoscaler release branch as well. Can you check to see if you are still seeing this issue?

Thanks!

@jsharpe
Copy link

jsharpe commented Aug 27, 2019

@jaypipes I think I am still seeing this issue; the root cause I identified in my comment: #2248 (comment) in PR #2248 but this was merged without handling this case... I still think that the code should be robust against provider implementation errors though as turning off the autoscaling when an error starts occurring is a bad undesirable behaviour (and could be very costly!)

@jsharpe
Copy link

jsharpe commented Aug 27, 2019

For the record its clusters create via kops that generates ASGs which fallback to the default version for the template and don't explicitly set it.

@jaypipes
Copy link
Contributor

@jsharpe OK, I'm not understanding whether this is an issue in the autoscaler or this is an issue with kops. Are you recommending that the cluster-autoscaler AWS cloud provider code that builds a launch template from the AWS SDK's autoscaling launchTemplateSpec should set the returned launch template version to "$Default" when when the AWS SDK returned an empty string for the launch template spec's version?

@jsharpe
Copy link

jsharpe commented Aug 27, 2019

Yes, I'm currently running a fork with that setup and it seems to resolve the issue I was seeing. I think the issue is with the autoscaler as the aws sdk says that the version parameter field is optional and has a default of "$Default" which I take to mean that you should use "$Default" if the field is empty? I guess ideally the AWS Go SDK would fill this in for us.

jaypipes added a commit to jaypipes/k8s-autoscaler that referenced this issue Sep 25, 2019
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Fixes kubernetes#1728
jaypipes added a commit to jaypipes/k8s-autoscaler that referenced this issue Sep 25, 2019
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
@jaypipes
Copy link
Contributor

Hi @jsharpe, sorry for delay, was on vacation and dealing with other stuff! Please check out #2385 and let me know if this addresses the problem for you. Thanks!

@jaypipes
Copy link
Contributor

/assign @jaypipes

@jsharpe
Copy link

jsharpe commented Sep 26, 2019

@jaypipes Yes #2385 looks like it should address the problem.

Jeffwan pushed a commit to Jeffwan/autoscaler that referenced this issue Oct 10, 2019
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
Jeffwan pushed a commit to Jeffwan/autoscaler that referenced this issue Oct 10, 2019
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
Jeffwan pushed a commit to Jeffwan/autoscaler that referenced this issue Oct 10, 2019
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
aksentyev pushed a commit to aksentyev/autoscaler that referenced this issue Jan 28, 2020
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
aksentyev pushed a commit to aksentyev/autoscaler that referenced this issue Apr 9, 2021
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
tim-smart pushed a commit to arisechurch/autoscaler that referenced this issue Nov 22, 2022
The LaunchTemplateSpecification.Version is a pointer to
string. When the pointer is nil, EC2 AutoScaling API considers the value
to be "$Default", however aws.StringValue(ltSpec.Version) will return an
empty string (which is not considered the same as "$Default" or a nil
string pointer. So, in order to not pass an empty string as the version
for the launch template when we communicate with the EC2 AutoScaling API
using the information in the launchTemplate, we store the string
"$Default" when the ltSpec.Version is a nil pointer.

Issue kubernetes#1728
yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants