Managed Cluster Topology and cluster-autoscaler #8217
Comments
In my opinion this is intended behavior and a limitation/feature of the autoscaler. If you use a MachineDeployment without ClusterClass you would get the same effect. Please also see my comment here: #7293 (comment) |
i guess my question here is, if using MachineDeployment without ClusterClass, i could update the min size annotation and then update the replicas to the new value. how would i do that with a ClusterClass managing the MachineDeployment? |
You would update the annotation, then set the replica field in Cluster.spec.topology and then unset it again. |
Not ideal, but I think it's better than trying to fix up the replica field on the MD based on the annotation, which would break the behavior that the autoscaler doesn't act when replicas is outside of the range |
or instead of setting / unsetting the replica field in Cluster.spec.topology you can also only set it directly on the MD |
i thought we had to leave this field unset if we are using with the autoscaler? (sorry if i've got the details wrong here)
this is my concern too
ack, i'm curious to hear what @MaxFedotov has to say as he is using this feature much more than i am currently. |
It has to be unset for autoscaler to be able to control the field continuously (otherwise they both would try to set the field continuously). But if the goal is basically to temporarily take control of the field to bring it in the min/max range you can set it and directly unset it afterwards |
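The manual workaround described above (update the annotation, set the replica field, then unset it again) can be sketched as three successive states of one MachineDeployment entry in Cluster.spec.topology. This is only an illustrative model on plain dictionaries: the function name `bring_into_range` is hypothetical, nothing here talks to a real API server, and the dictionary layout is an assumption based on the field paths mentioned in this thread.

```python
import copy

# Annotation key as quoted in this issue.
MIN_ANNOTATION = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"


def bring_into_range(md_topology: dict, new_min: int) -> list:
    """Return the three successive states of one MachineDeployment topology entry.

    Hypothetical helper: models the workaround only, it does not apply anything.
    """
    # Step 1: update the min-size annotation.
    step1 = copy.deepcopy(md_topology)
    annotations = step1.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[MIN_ANNOTATION] = str(new_min)

    # Step 2: temporarily take control of replicas to bring it into range.
    step2 = copy.deepcopy(step1)
    step2["replicas"] = new_min

    # Step 3: unset replicas again so the autoscaler controls the field.
    step3 = copy.deepcopy(step2)
    del step3["replicas"]

    return [step1, step2, step3]


states = bring_into_range({"name": "md-0", "class": "default-worker"}, 3)
```

Each state would have to be applied as a separate update, which is why the sequence is not atomic.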
that sounds like the user can at least work around this issue manually |
I think it's definitely not worse than with the standalone MachineDeployment. In both cases they:
|
Seems like I don't understand what can go wrong in this case :( If we will set
I have to argue about it. In the case of a standalone MD, a user needs a single atomic update operation (set annotations, remove replicas). In the case of a Cluster topology, a user needs at least 2 operations (in practice 3: update/get/update), which can be hard to perform atomically. |
Autoscaler today intentionally does not act when the value of the replica field is outside of the min/max range. If we want to preserve this behavior we can't implement something that always changes the replica field to be inside of the range.
That's a valid point |
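To make the behavior under discussion concrete, here is a toy model of it as described in this thread: when the current replica count is outside the min/max range the autoscaler leaves the node group alone, otherwise it clamps its desired size to the range. The function `autoscaler_target` is hypothetical and is not taken from the cluster-autoscaler code base; the real decision logic involves more inputs.

```python
from typing import Optional


def autoscaler_target(current: int, desired: int,
                      min_size: int, max_size: int) -> Optional[int]:
    """Return the replica count the autoscaler would set, or None if it does not act.

    Toy model based only on the behavior described in this thread.
    """
    if current < min_size or current > max_size:
        # Replicas are out of range: the autoscaler intentionally does nothing
        # and expects the user to take action.
        return None
    # In range: never go below min_size or above max_size.
    return max(min_size, min(desired, max_size))


out_of_range = autoscaler_target(current=1, desired=5, min_size=3, max_size=10)
capped = autoscaler_target(current=4, desired=20, min_size=3, max_size=10)
floored = autoscaler_target(current=4, desired=1, min_size=3, max_size=10)
```

In this model, raising the minimum above the current replica count yields `None`, which is the situation this issue is about.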
But if the user intentionally specified, using the cluster autoscaler annotation, that he wants the min number of nodes in the node group to be equal to some number, wouldn't the correct behavior be to set the current number of replicas to this number (because the autoscaler won't do it, it will only ensure that the number of replicas won't drop below this number)? |
this is really the central question imo, if we decide to do this then we are diverging from the autoscaler's default behavior. at which point it becomes an expected cluster-api behavior. currently, the autoscaler has chosen not to adjust the cluster when the min/max limits are changed because there might be external factors for why this happened and it is expected that the user will take action. in this case, the proposal would be for the cluster-api controllers to automatically take the action that the user would. my next question would be, what if a user didn't want the replicas to be automatically adjusted, how would we allow that? |
That depends on a user's intention :) If he specified a min number of nodes to be equal N, I think he wants them to be adjusted to the value N (and not to perform some other manual steps to achieve it), otherwise, he just won't do it. |
that's generally what i would expect too, i'm just trying to think about corner cases where a user would adjust min value but not want the replicas automatically updated. it does sound bizarre when said aloud though XD |
Completely agree :) |
I agree that it feels surprising that when replicas is outside of the range it isn't brought into range. But I think if this is the expected behavior then we should improve autoscaler to handle this case correctly. This will benefit not only the ClusterClass use case. It just feels off to implement a workaround/hack in ClusterClass/Cluster API because the behavior of the autoscaler is not what we expect it to be. Essentially by setting the annotations we hand over control of the replica field to the autoscaler. Then the autoscaler should also be responsible to bring the replica field in range. I think when using Cluster API with GitOps folks run into the exact same issue as with ClusterClass. They somehow have to bring replicas in the min/max range but they cannot continuously apply the replica field. As far as I can tell the only scenario where the current behavior works well is when someone manually modifies a MachineDeployment or if someone implements some sort of "one time operation". But continuous reconcile doesn't work, which doesn't seem ideal. |
@sbueringer thanks, understood! |
@MaxFedotov i have a feeling this would be extremely difficult to change in the core autoscaler. this behavior has been long-standing there and i doubt they will want to change it. it's possible that we could try to detect the annotation change in our cloud provider and then adjust the replicas from that side, but since we don't really have a controller there it might not work exactly as we want. we would need to do some research on the cloud-provider side. i have a feeling this won't work though since we won't know when a user has updated the min value. my gut feeling after this conversation is that we should probably leave this behavior as it is and add more documentation to instruct users how to automate this. i'm willing to do some research about it, or happy to collaborate if you want to do some research as well, but i have a feeling the manual approach might be the most straightforward solution. |
Seems like we are in a deadlock situation here.
Although, to be honest, this looks a bit ugly from a design perspective for me :( |
Reading through this thread it seems that there is a common agreement on a couple of points
Considering that, I think we should open an issue in the autoscaler repository, and possibly advocate for it at the autoscaler office hours. If we can trigger discussion about the autoscaler acting when the number of replicas is out of the min/max boundary, and get some traction around it, we can solve this issue and also get rid of the CAPI workaround. Documentation could be a stop-gap while this discussion goes on, but the primary work for this should happen in the autoscaler IMO. |
+1
i think this would be a good discussion if only to help surface the issue we are facing and to see if the SIG might have other ideas as well. the behavior of the autoscaler with respect to min/max nodes has been in place for a long time, so i'm not quite sure how the SIG will react to a proposal to change that behavior, but i do think they will have some ideas to help move the conversation forward, whatever the result may be. |
Not sure if it's a realistic option, but maybe this could also be an optional behavior of the autoscaler controlled via some sort of flag or annotation. |
it's certainly possible and might give us a way to break this jam, that seems like a reasonable path forward to me. |
/triage accepted |
@MaxFedotov @elmiko Given that there is a flag in autoscaler to control this behavior (looks like this flag was merged ~ a year ago), do you think we should update our documentation? (https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling) |
/priority backlog |
/assign |
Hi @elmiko and @sbueringer.
Want to go back to this issue after #7990 was merged.
If I understand correctly, now it is possible to add a new `machineDeployment` to managed `Cluster` and specify `cluster-autoscaler` annotations in spec like:
and defaulting webhook will set `replicas` field of created `machineDeployment` to a value specified in `cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size` annotation.
But if I will update this field, for example, set `cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"` in `Cluster` spec, defaulting webhook will not update `replicas` field, due to this check
And this completely prevents an ability to use cluster-autoscaler annotations when using managed Cluster topologies and `machineDeployment` is already created. Because if `machineDeployments` are managed by `topology controller` and `workers.machineDeployments.[].metadata.annotations` are updated it generates a `Patch` which includes only updated fields. So a user doesn't have any ability to set `newMD.Spec.Replicas` to `nil` value.
Maybe it is possible to modify this check and exclude `machineDeployments` created by `topology controller` from it?
Thanks!
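The defaulting behavior described in this report can be modelled roughly as follows. This is a simplified sketch based only on the description above, not the actual Cluster API webhook code: the function name `default_replicas` is hypothetical, and the fallback value of 1 when no annotation is present is an assumption.

```python
from typing import Optional

# Annotation key as quoted in this issue.
MIN_ANNOTATION = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"


def default_replicas(new_replicas: Optional[int], annotations: dict) -> int:
    """Simplified model of the replica defaulting described in this issue."""
    if new_replicas is not None:
        # The check in question: a value that is already set is kept as-is,
        # so a later change to the annotation has no effect.
        return new_replicas
    if MIN_ANNOTATION in annotations:
        # On creation (replicas unset), default to the annotated minimum.
        return int(annotations[MIN_ANNOTATION])
    return 1  # assumption: fallback when no autoscaler annotation is present


# Creation: replicas is unset, so the minimum from the annotation is used.
on_create = default_replicas(None, {MIN_ANNOTATION: "2"})

# Update: the patch only carries the changed annotation, replicas stays set to 2.
on_update = default_replicas(on_create, {MIN_ANNOTATION: "3"})
```

In this model the second call keeps the old replica count even though the minimum was raised, which matches the problem reported here: the user has no way to make `replicas` unset again through the topology.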
/kind feature