[BUG] AKS v1.30.5 autoscale issue on node pool #4755

Closed
raffialqurran opened this issue Jan 20, 2025 · 1 comment

@raffialqurran

Describe the bug
When an AKS cluster runs version 1.30.5 with autoscaling enabled on the node pool, deploying an AML online endpoint to the attached Kubernetes cluster fails with the error "No route found or no valid deployments to route to, please check that the endpoint has at least one deployment with positive weight values or use a deployment specific header to route."

To Reproduce
Steps to reproduce the behavior:
1. Create an AKS cluster running v1.30.5
2. Enable autoscaling on the node pool
3. Install the AML extension
4. Attach the Kubernetes cluster from AML
5. Try to deploy an online endpoint on the AML-attached AKS cluster

Expected behavior
The online endpoint deploys successfully. Instead, the deployment fails and no logs appear for the endpoint at all.

Environment (please complete the following information):

  • CLI Version 2.65.0
  • Kubernetes version 1.30.5
  • CLI Extension version 1.6.2
  • Browser Edge

{"msg":"odd number of arguments passed as key-value pairs for logging","ignored key":{"name":"cluster-autoscaler-status","namespace":"kube-system","uid":"69505184-79d4-4e46-8a45-32ebdbee6be6","resourceVersion":"596181363","creationTimestamp":"2025-01-14T04:15:34.0000000Z","annotations":{"cluster-autoscaler.kubernetes.io/last-updated":"2025-01-15 15:11:06.456845329 +0000 UTC"},"managedFields":[{"manager":"cluster-autoscaler","operation":"Update","apiVersion":"v1","time":"2025-01-15T15:11:06.0000000Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:status":{}},"f:metadata":{"f:annotations":{".":{},"f:cluster-autoscaler.kubernetes.io/last-updated":{}}}}}]},"stacktrace":"goms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.needToWaitForAutoScaler\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:548\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.CheckPodScheduleErr\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:502\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.checkPodErr\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:247\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.propagateReplicaSetStatus\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:203\ngoms.io/azureml/inference-operator/controllers/status.DeploymentToOnlineDeploymentPropagator.Propagate\n\t/workspace/inference-operator/build/controllers/status/deployment_propagator.go:89\ngoms.io/azureml/inference-operator/pkg/reconciler.(*ReconcileAndPropagateStatusComposer).Reconcile\n\t/workspace/inference-operator/build/pkg/reconciler/composer_reconcileandpropagatestatus.go:47\ngoms.io/azureml/inference-operator/controllers.(*OnlineDeploymentReconciler).Reconcile\n\t/workspace/inference-operator/build/controllers/onlinedeployment_controller.go:193\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}
panic: odd number of arguments passed as key-value pairs for logging

@raffialqurran raffialqurran changed the title from "[BUG]" to "[BUG] AKS v1.30.5 autoscale issue on node pool" on Jan 20, 2025
@raffialqurran
Author

The fix has been released in AMLArcExtension version 1.1.70.

@olsenme olsenme closed this as completed Feb 3, 2025