Unable to create AKS cluster due to Service Principal Not Found occurring in multiple regions #1206
Comments
This issue is intermittent - with a pre-created service principal you can run |
Reopening. Engineering teams have put mitigations in place for this failure in the Azure portal; customers using the CLI or other tools are advised to continue using the other mitigations. |
@jluk is there any timeframe for this to be fixed as it still affects terraform? |
Is there any workaround for this issue? The first terraform apply command fails; the next run succeeds. |
We are planning the AKS-side short term mitigation for CLI and other clients such as Terraform now that portal has been resolved. I've reached out to Tom for us to figure out how to mitigate the SP propagation latency for TF. The long term improvement for AAD propagation is being discussed from Active Directory. |
I am no longer seeing this when creating a cluster via the Azure Portal GUI 👍 |
@jnoller @jluk Is Managed Identity, which recently became GA, a solution? I am not sure whether AKS cluster creation waits for or retries the AAD sync of the Managed Identity, but I hope so. https://docs.microsoft.com/en-us/azure/aks/use-managed-identity |
@torumakabe are you still seeing this error occur? We've introduced some improvements for both Portal and CLI which should mitigate this problem, but curious if you're still seeing it and if so what clients are you using. |
I am still seeing it when creating a service principal then deploying an AKS cluster using an ARM template |
@jluk I use Terraform, so I implemented a workaround (a sleep) after AAD app creation, like this: https://github.com/ToruMakabe/container-handson/blob/1342101525cfd2b8de7f357a0cf8481ee85f16f9/prep/modules/aks/main.tf#L25 If Managed Identity could solve this AAD propagation problem, I would not use an SP anymore. |
I worked around it by running the same command 35 times:
az aks create --resource-group kubernetes-cluster-group --name kubernetes-cluster-v3 --node-count 1 --generate-ssh-keys --node-vm-size Standard_D4s_v3
Finished service principal creation[##################################] 100.0000%
Operation failed with status: 'Bad Request'. Details: The credentials in ServicePrincipalProfile were invalid. Please see https://aka.ms/aks-sp-help for more details. (Details: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"unauthorized_client","error_description":"jfggk: Application with identifier 'aafe21' was not found in the directory '8f'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.\r\nTrace ID: bc\r\nCorrelation ID: 6f\r\nTimestamp: 2020-04-18 10:56:07Z","error_codes":[700016],"timestamp":"2020-04-18 10:56:07Z","trace_id":"bc","correlation_id":"6e","error_uri":"https://login.microsoftonline.com/error?code=700016"}) Then the 35th time it worked. Confidence inspiring |
This issue has been automatically marked as stale because it has not had activity in 90 days. It will be closed if no further activity occurs. Thank you! |
This is still active. |
This should not be experienced on the latest client versions. Are you still seeing this? |
Okay, I've run some tests again and it looks like it has been fixed for ARM templates. |
Closing this as I believe this is addressed, but we can revisit if the issue lingers |
Tracking issue for resolution of this problem
AKS and the larger Azure team have been investigating an issue in which creating a new AKS cluster without passing in a pre-created Service Principal (SP) may fail with a Service Principal Not Found error.
This error affects cluster creation in all regions, through both the CLI and the Azure Portal.
Azure engineering has traced the root cause to a data replication / caching issue. Teams are working on both short-term mitigation and longer-term changes. This issue will be updated as the fixes are deployed globally.
Mitigation/Working around the error
Use the following workarounds:
Please see the AKS FAQ for more information.
Issue Details
AKS creates a Service Principal (SP) on behalf of the user and then attempts to look up the newly created SP within 15 seconds (with retries). The lookup fails even though the SP was created.
The lookup fails because the response does not return the SP. Lookup requests are geo-load-balanced, so traffic is directed to a data center other than the one that accepted the write request. The Not Found error results from increased global replication time as well as replica propagation at the storage layer.
The error is non-destructive. Users may use the linked workarounds until the mitigations are deployed.
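The lookup-after-write behavior described above can be illustrated with a small client-side retry sketch. This is not the AKS or Azure CLI implementation; `wait_until_visible` and `LaggingDirectory` are hypothetical names, and the attempt count and delays are assumptions chosen for illustration. The idea is simply to keep polling a lookup with exponential backoff until the replica being read has received the write.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_visible(
    lookup: Callable[[], Optional[T]],
    max_attempts: int = 6,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `lookup` until it returns a non-None value, backing off between tries."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        result = lookup()
        if result is not None:
            return result
        if attempt < max_attempts:
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"not visible after {max_attempts} attempts")


class LaggingDirectory:
    """Hypothetical stand-in for a replica that sees a write only after some reads."""

    def __init__(self, app_id: str, misses_before_visible: int) -> None:
        self.app_id = app_id
        self.misses_left = misses_before_visible

    def find(self) -> Optional[str]:
        if self.misses_left > 0:
            self.misses_left -= 1
            return None  # this replica has not received the write yet
        return self.app_id


if __name__ == "__main__":
    directory = LaggingDirectory("example-app-id", misses_before_visible=3)
    waits = []  # record the delays instead of actually sleeping
    found = wait_until_visible(directory.find, sleep=waits.append)
    print(found, waits)
```

In a real script, `lookup` would wrap whatever call checks that the service principal exists, and cluster creation would start only after it returns. A fixed sleep, as used in some of the comments above, is the simplest variant of the same idea.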