[BUG] AKS v1.30.5 autoscale issue on node pool #4755

raffialqurran · 2025-01-20T07:14:54Z

Describe the bug
When autoscaling is enabled on a node pool of an AKS cluster running Kubernetes 1.30.5, deploying an AML online endpoint to the attached Kubernetes cluster fails with the error "No route found or no valid deployments to route to, please check that the endpoint has at least one deployment with positive weight values or use a deployment specific header to route."

To Reproduce
Steps to reproduce the behavior:
1. Create an AKS cluster running v1.30.5.
2. Enable autoscaling on the node pool.
3. Install the AML extension.
4. From AML, attach the Kubernetes cluster.
5. Try to deploy an online endpoint on the AML-attached AKS cluster.

Expected behavior
The online endpoint should deploy successfully. Instead, the deployment fails and no logs appear for the endpoint at all.

Environment (please complete the following information):

CLI Version: 2.65.0
Kubernetes version: 1.30.5
CLI Extension version: 1.6.2
Browser: Edge

{"msg":"odd number of arguments passed as key-value pairs for logging","ignored key":{"name":"cluster-autoscaler-status","namespace":"kube-system","uid":"69505184-79d4-4e46-8a45-32ebdbee6be6","resourceVersion":"596181363","creationTimestamp":"2025-01-14T04:15:34.0000000Z","annotations":{"cluster-autoscaler.kubernetes.io/last-updated":"2025-01-15 15:11:06.456845329 +0000 UTC"},"managedFields":[{"manager":"cluster-autoscaler","operation":"Update","apiVersion":"v1","time":"2025-01-15T15:11:06.0000000Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:status":{}},"f:metadata":{"f:annotations":{".":{},"f:cluster-autoscaler.kubernetes.io/last-updated":{}}}}}]},"stacktrace":"goms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.needToWaitForAutoScaler\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:548\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.CheckPodScheduleErr\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:502\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.checkPodErr\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:247\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.propagateReplicaSetStatus\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:203\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.Propagate\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:89\ngoms.io/azureml/inference-operator/pkg/reconciler.(*ReconcileAndPropagateStatusComposer).Reconcile\n\t/workspace/inference-operator/build/pkg/reconciler/composer_reconcileandpropagatestatus.go:47\ngoms.io/azureml/inference-operator/controllers.(*OnlineDeploymentReconciler).Reconcile\n\t/workspace/inference-operator/build/controllers/onlinedeployment_controller.go:193\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}
panic: odd number of arguments passed as key-value pairs for logging
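
For readers unfamiliar with this panic: structured loggers in the Go controller ecosystem take a message followed by alternating keys and values, and some logger configurations panic when a key has no matching value. In the log above, the "ignored key" is the whole cluster-autoscaler-status ConfigMap, which suggests (this is an assumption, the operator source is not shown here) that the object was passed to a log call without a key in front of it. The sketch below illustrates the check using only the standard library; `splitKeyValues` is a hypothetical helper, not a function from the inference operator or from any logging library.

```go
package main

import "fmt"

// splitKeyValues is a hypothetical helper, not part of the inference
// operator or of any logging library. It pairs up variadic logging
// arguments and returns an error for a dangling argument instead of
// panicking.
func splitKeyValues(kv ...any) (map[string]any, error) {
	fields := make(map[string]any, len(kv)/2)
	for i := 0; i < len(kv); i += 2 {
		// The last argument has no partner: the count is odd.
		if i == len(kv)-1 {
			return fields, fmt.Errorf("odd number of arguments: ignored key %v", kv[i])
		}
		key, ok := kv[i].(string)
		if !ok {
			return fields, fmt.Errorf("non-string key at position %d: %v", i, kv[i])
		}
		fields[key] = kv[i+1]
	}
	return fields, nil
}

func main() {
	// Well-formed call: every key is followed by a value.
	fields, err := splitKeyValues("name", "cluster-autoscaler-status", "namespace", "kube-system")
	fmt.Println(fields, err)

	// Malformed call: a single object is passed with no key in front of it,
	// which mirrors the shape of the "ignored key" entry in the log above.
	_, err = splitKeyValues("cluster-autoscaler-status")
	fmt.Println(err)
}
```

Returning an error rather than panicking is a design choice for this illustration only; it says nothing about how the fix in the extension was actually implemented.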

raffialqurran · 2025-01-22T23:37:49Z

The fix has been released in AMLArcExtension version 1.1.70.

raffialqurran added the bug label Jan 20, 2025

raffialqurran changed the title ~~[BUG]~~ [BUG] AKS v1.30.5 autoscale issue on node pool Jan 20, 2025

olsenme closed this as completed Feb 3, 2025
