Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro ### AKS The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. Finally, the file `linkerd-cni.conf.default` has been removed as is not used. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
- Loading branch information