Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): add workload identity in capz #3583

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,11 @@ E2E_CONF_FILE ?= $(ROOT_DIR)/test/e2e/config/azure-dev.yaml
E2E_CONF_FILE_ENVSUBST := $(ROOT_DIR)/test/e2e/config/azure-dev-envsubst.yaml
SKIP_CLEANUP ?= false
SKIP_LOG_COLLECTION ?= false
SKIP_CREATE_MGMT_CLUSTER ?= false
# @sonasingh46: Skip creating mgmt cluster for ci as workload identity needs kind cluster
# to be created with extra mounts for key pairs which is not yet supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with extra mounts for key pairs which is not yet supported by existing e2e framework

did you open an issue in CAPI to add this support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not file it yet. I will file that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# by existing e2e framework. A mgmt cluster(kind) is created as part of e2e suite
# that meets workload identity pre-requisites.
SKIP_CREATE_MGMT_CLUSTER ?= true
WIN_REPO_URL ?=

# Build time versioning details.
Expand Down Expand Up @@ -663,8 +667,12 @@ test-cover: test ## Run tests with code coverage and generate reports.
./hack/codecov-ignore.sh
go tool cover -html=coverage.out -o coverage.html

.PHONY: kind-create-bootstrap
kind-create-bootstrap: $(KUBECTL) ## Create capz kind bootstrap cluster.
export AZWI=true KIND_CLUSTER_NAME=capz-e2e && ./scripts/kind-with-registry.sh

.PHONY: test-e2e-run
test-e2e-run: generate-e2e-templates install-tools ## Run e2e tests.
test-e2e-run: generate-e2e-templates install-tools kind-create-bootstrap ## Run e2e tests.
$(ENVSUBST) < $(E2E_CONF_FILE) > $(E2E_CONF_FILE_ENVSUBST) && \
$(GINKGO) -v --trace --timeout=4h --tags=e2e --focus="$(GINKGO_FOCUS)" --skip="$(GINKGO_SKIP)" --nodes=$(GINKGO_NODES) --no-color=$(GINKGO_NOCOLOR) --output-dir="$(ARTIFACTS)" --junit-report="junit.e2e_suite.1.xml" $(GINKGO_ARGS) ./test/e2e -- \
-e2e.artifacts-folder="$(ARTIFACTS)" \
Expand Down
16 changes: 15 additions & 1 deletion api/v1beta1/azureclusteridentity_webhook.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,10 @@ limitations under the License.
package v1beta1

import (
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/util/validation/field"
webhookutils "sigs.k8s.io/cluster-api-provider-azure/util/webhook"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
)
Expand All @@ -40,7 +43,18 @@ func (c *AzureClusterIdentity) ValidateCreate() error {

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type.
func (c *AzureClusterIdentity) ValidateUpdate(oldRaw runtime.Object) error {
return c.validateClusterIdentity()
var allErrs field.ErrorList
old := oldRaw.(*AzureClusterIdentity)
mboersma marked this conversation as resolved.
Show resolved Hide resolved
if err := webhookutils.ValidateImmutable(
field.NewPath("Spec", "Type"),
old.Spec.Type,
c.Spec.Type); err != nil {
allErrs = append(allErrs, err)
}
if len(allErrs) == 0 {
return c.validateClusterIdentity()
}
return apierrors.NewInvalid(GroupVersion.WithKind("AzureClusterIdentity").GroupKind(), c.Name, allErrs)
CecileRobertMichon marked this conversation as resolved.
Show resolved Hide resolved
}

// ValidateDelete implements webhook.Validator so a webhook will be registered for the type.
Expand Down
81 changes: 81 additions & 0 deletions api/v1beta1/azureclusteridentity_webhook_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -93,3 +93,84 @@ func TestAzureClusterIdentity_ValidateCreate(t *testing.T) {
})
}
}

func TestAzureClusterIdentity_ValidateUpdate(t *testing.T) {
g := NewWithT(t)

tests := []struct {
name string
oldClusterIdentity *AzureClusterIdentity
clusterIdentity *AzureClusterIdentity
wantErr bool
}{
{
name: "azureclusteridentity with no change",
clusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: ServicePrincipal,
ClientID: fakeClientID,
TenantID: fakeTenantID,
},
},
oldClusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: ServicePrincipal,
ClientID: fakeClientID,
TenantID: fakeTenantID,
},
},
wantErr: false,
},
{
name: "azureclusteridentity with a change in type",
clusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: ServicePrincipal,
ClientID: fakeClientID,
TenantID: fakeTenantID,
ResourceID: fakeResourceID,
},
},
oldClusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: WorkloadIdentity,
ClientID: fakeClientID,
TenantID: fakeTenantID,
ResourceID: fakeResourceID,
},
},
wantErr: true,
},
{
name: "azureclusteridentity with a change in client ID",
clusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: ServicePrincipal,
ClientID: fakeClientID,
TenantID: fakeTenantID,
ResourceID: fakeResourceID,
},
},
oldClusterIdentity: &AzureClusterIdentity{
Spec: AzureClusterIdentitySpec{
Type: WorkloadIdentity,
ClientID: "diff-fake-Client-ID",
TenantID: fakeTenantID,
ResourceID: fakeResourceID,
},
},
wantErr: true,
},
}

for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
err := tc.clusterIdentity.ValidateUpdate(tc.oldClusterIdentity)
if tc.wantErr {
g.Expect(err).To(HaveOccurred())
} else {
g.Expect(err).NotTo(HaveOccurred())
}
})
}
}
5 changes: 4 additions & 1 deletion api/v1beta1/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -557,7 +557,7 @@ const (
)

// IdentityType represents different types of identities.
// +kubebuilder:validation:Enum=ServicePrincipal;UserAssignedMSI;ManualServicePrincipal;ServicePrincipalCertificate
// +kubebuilder:validation:Enum=ServicePrincipal;UserAssignedMSI;ManualServicePrincipal;ServicePrincipalCertificate;WorkloadIdentity
type IdentityType string

const (
Expand All @@ -572,6 +572,9 @@ const (

// ServicePrincipalCertificate represents a service principal using a certificate as secret.
ServicePrincipalCertificate IdentityType = "ServicePrincipalCertificate"

// WorkloadIdentity represents a WorkloadIdentity.
WorkloadIdentity IdentityType = "WorkloadIdentity"
)

// OSDisk defines the operating system disk for a VM.
Expand Down
10 changes: 10 additions & 0 deletions azure/scope/identity.go
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,16 @@ func (p *AzureCredentialsProvider) GetAuthorizer(ctx context.Context, resourceMa
var authErr error
var cred azcore.TokenCredential
switch p.Identity.Spec.Type {
case infrav1.WorkloadIdentity:
azwiCredOptions, err := NewWorkloadIdentityCredentialOptions().
WithTenantID(p.Identity.Spec.TenantID).
WithClientID(p.Identity.Spec.ClientID).
WithDefaults()
if err != nil {
return nil, errors.Wrapf(err, "failed to setup azwi options for identity %s", p.Identity.Name)
}
cred, authErr = NewWorkloadIdentityCredential(azwiCredOptions)

case infrav1.ServicePrincipal, infrav1.ServicePrincipalCertificate, infrav1.UserAssignedMSI:
if err := createAzureIdentityWithBindings(ctx, p.Identity, resourceManagerEndpoint, activeDirectoryEndpoint, clusterMeta, p.Client); err != nil {
return nil, err
Expand Down
155 changes: 155 additions & 0 deletions azure/scope/workload_identity.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
/*
Copyright 2023 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package scope

import (
"context"
"os"
"strings"
"time"

"github.com/Azure/azure-sdk-for-go/sdk/azcore"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
"github.com/pkg/errors"
)

/*

For workload identity to work we need the following.

|-----------------------------------------------------------------------------------|
|AZURE_AUTHORITY_HOST | The Azure Active Directory (AAD) endpoint. |
|AZURE_CLIENT_ID | The client ID of the Azure AD |
| | application or user-assigned managed identity. |
|AZURE_TENANT_ID | The tenant ID of the Azure subscription. |
|AZURE_FEDERATED_TOKEN_FILE | The path of the projected service account token file. |
|-----------------------------------------------------------------------------------|

With the current implementation, AZURE_CLIENT_ID and AZURE_TENANT_ID are read via AzureClusterIdentity
object and fallback to reading from env variables if not found on AzureClusterIdentity.

AZURE_FEDERATED_TOKEN_FILE is the path of the projected service account token which is by default
"/var/run/secrets/azure/tokens/azure-identity-token".
The path can be overridden by setting "AZURE_FEDERATED_TOKEN_FILE" env variable.

*/

const (
// azureFederatedTokenFileEnvKey is the env key for AZURE_FEDERATED_TOKEN_FILE.
azureFederatedTokenFileEnvKey = "AZURE_FEDERATED_TOKEN_FILE"
// azureClientIDEnvKey is the env key for AZURE_CLIENT_ID.
azureClientIDEnvKey = "AZURE_CLIENT_ID"
// azureTenantIDEnvKey is the env key for AZURE_TENANT_ID.
azureTenantIDEnvKey = "AZURE_TENANT_ID"
// azureTokenFilePath is the path of the projected token.
azureTokenFilePath = "/var/run/secrets/azure/tokens/azure-identity-token" // #nosec G101
// azureFederatedTokenFileRefreshTime is the time interval after which it should be read again.
azureFederatedTokenFileRefreshTime = 5 * time.Minute
)

type workloadIdentityCredential struct {
assertion string
file string
cred *azidentity.ClientAssertionCredential
lastRead time.Time
}

// WorkloadIdentityCredentialOptions contains the configurable options for azwi.
type WorkloadIdentityCredentialOptions struct {
azcore.ClientOptions
ClientID string
TenantID string
TokenFilePath string
}

// NewWorkloadIdentityCredentialOptions returns an empty instance of WorkloadIdentityCredentialOptions.
func NewWorkloadIdentityCredentialOptions() *WorkloadIdentityCredentialOptions {
return &WorkloadIdentityCredentialOptions{}
}

// WithClientID sets client ID to WorkloadIdentityCredentialOptions.
func (w *WorkloadIdentityCredentialOptions) WithClientID(clientID string) *WorkloadIdentityCredentialOptions {
w.ClientID = strings.TrimSpace(clientID)
return w
}

// WithTenantID sets tenant ID to WorkloadIdentityCredentialOptions.
func (w *WorkloadIdentityCredentialOptions) WithTenantID(tenantID string) *WorkloadIdentityCredentialOptions {
w.TenantID = strings.TrimSpace(tenantID)
return w
}

// getProjectedTokenPath return projected token file path from the env variable.
func getProjectedTokenPath() string {
tokenPath := strings.TrimSpace(os.Getenv(azureFederatedTokenFileEnvKey))
if tokenPath == "" {
return azureTokenFilePath
}
return tokenPath
}

// WithDefaults sets token file path. It also sets the client tenant ID from injected env in
// case empty values are passed.
func (w *WorkloadIdentityCredentialOptions) WithDefaults() (*WorkloadIdentityCredentialOptions, error) {
w.TokenFilePath = getProjectedTokenPath()

// Fallback to using client ID from env variable if not set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the workaround for #3409?

I'm looking at that but not sure what needs to be fixed (yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
It is still allowed to leave client ID and tenant ID fields on AzureClusterIdentity object to be empty. So this is fallback for cases where those fields are not set on AzureClusterIdentity.

So the behaviour is like this:

  • CAPZ uses the tenant Id and the client ID as specified on AzureClusterIdentity.
  • If client ID is not specified, it tries to read from the env.
  • If tenant ID is not specified, it tries to read from the env.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to make client ID and tenant ID mandatory if identity type is WorkloadIdentity? IMO having fallbacks to env vars could cause confusion and requires additional documentation on the behavior when both CRD and env vars are configured.

Copy link
Contributor Author

@sonasingh46 sonasingh46 May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things here:

  1. AzureClusterIdentity is referenced by AzureCluster.
  2. So if I want to create a workload cluster by using/referencing a AzureClusterIdentity capz should always use the IDs from it.

Also, I do not know why client ID and tenant ID are optionally required and not strictly required on AzureClusterIdentity.

So I am not sure if we should make a special case for workload identity, though it is possible to do so.

Having said that, if I leave the workload identity discussion aside, then still it makes more sense to use IDs present on AzureClusterIdentity. (e.g ManualServicePrincipal )

I am also thinking about the following in deciding the priority:

  1. Read IDs from AzureClusterIdentity and fall back to read ID from env if not present on it. Or
  2. Read IDs from env and fall back to using IDs present on AzureClusterIdentity.

Here [2] looks semantically confusing/incorrect to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to make client ID and tenant ID required, AFAIK there is not Identity type that doesn't need them. Can we create an issue and do this as a follow up?

if w.ClientID == "" {
w.ClientID = strings.TrimSpace(os.Getenv(azureClientIDEnvKey))
if w.ClientID == "" {
return nil, errors.New("empty client ID")
}
}

// Fallback to using tenant ID from env variable.
if w.TenantID == "" {
w.TenantID = strings.TrimSpace(os.Getenv(azureTenantIDEnvKey))
if w.TenantID == "" {
return nil, errors.New("empty tenant ID")
}
}
return w, nil
}

// NewWorkloadIdentityCredential returns a workload identity credential.
func NewWorkloadIdentityCredential(options *WorkloadIdentityCredentialOptions) (azcore.TokenCredential, error) {
w := &workloadIdentityCredential{file: options.TokenFilePath}
cred, err := azidentity.NewClientAssertionCredential(options.TenantID, options.ClientID, w.getAssertion, &azidentity.ClientAssertionCredentialOptions{ClientOptions: options.ClientOptions})
if err != nil {
return nil, err
}
w.cred = cred
return w, nil
}

// GetToken returns the token for azwi.
func (w *workloadIdentityCredential) GetToken(ctx context.Context, opts policy.TokenRequestOptions) (azcore.AccessToken, error) {
return w.cred.GetToken(ctx, opts)
}

func (w *workloadIdentityCredential) getAssertion(context.Context) (string, error) {
if now := time.Now(); w.lastRead.Add(azureFederatedTokenFileRefreshTime).Before(now) {
content, err := os.ReadFile(w.file)
if err != nil {
return "", err
}
w.assertion = string(content)
w.lastRead = now
}
return w.assertion, nil
}
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,7 @@ spec:
- UserAssignedMSI
- ManualServicePrincipal
- ServicePrincipalCertificate
- WorkloadIdentity
type: string
required:
- clientID
Expand Down
16 changes: 16 additions & 0 deletions config/manager/manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,9 @@ metadata:
namespace: system
labels:
control-plane: capz-controller-manager
#ToDo: (@sonasingh46): Remove this label as part of aad pod identity deprecation
aadpodidbinding: capz-controller-aadpodidentity-selector

spec:
selector:
matchLabels:
Expand All @@ -16,6 +18,7 @@ spec:
labels:
control-plane: capz-controller-manager
aadpodidbinding: capz-controller-aadpodidentity-selector
azure.workload.identity/use: "true"
annotations:
kubectl.kubernetes.io/default-container: manager
spec:
Expand All @@ -28,6 +31,10 @@ spec:
image: controller:latest
imagePullPolicy: Always
name: manager
volumeMounts:
- mountPath: /var/run/secrets/azure/tokens
name: azure-identity-token
readOnly: true
ports:
- containerPort: 9440
name: healthz
Expand Down Expand Up @@ -75,3 +82,12 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
volumes:
- name: azure-identity-token
projected:
defaultMode: 420
sources:
- serviceAccountToken:
audience: api://AzureADTokenExchange
expirationSeconds: 3600
path: azure-identity-token
3 changes: 2 additions & 1 deletion config/rbac/service_account.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
name: manager
namespace: system
namespace: system
Loading