Commit

Improve docs (#3909)
* Add new entry to diagnosing problems page about ASO pod restarts due to crdPattern.
* Add entry on how to run ASO on an AKS cluster using the task command we have for that.
matthchr authored Apr 8, 2024
1 parent be29b9c commit a66c25f
Showing 5 changed files with 125 additions and 67 deletions.
71 changes: 5 additions & 66 deletions docs/hugo/content/contributing/_index.md
@@ -5,9 +5,10 @@ weight: 50
menu:
main:
weight: 50
layout: single
cascade:
- type: docs
- render: always
- type: docs
- render: always
description: "How to contribute new resources to Azure Service Operator v2"
---

@@ -16,6 +17,8 @@ description: "How to contribute new resources to Azure Service Operator v2"
* [Developer Setup]( {{< relref "developer-setup" >}} ).
* [Adding a new code-generator resource]( {{< relref "add-a-new-code-generated-resource" >}} ).
* [Generator code overview]( {{< relref "generator-overview" >}} ).
* [Running a development version of ASO]( {{< relref "running-a-development-version" >}} ).
* [Testing]( {{< relref "testing" >}} ).

## Directory structure of the operator

@@ -29,70 +32,6 @@ Key folders of note include:

The size of each dot reflects the size of the file; the legend in the corner shows the meaning of colour.

## Running integration tests

Basic use: run `task controller:test-integration-envtest`.

### Record/replay

The task `controller:test-integration-envtest` runs the tests in a record/replay mode by default, so that it does not touch any live Azure resources. (This uses the [go-vcr](https://github.com/dnaeon/go-vcr) library.) If you change the controller or other code in such a way that the required requests/responses from ARM change, you will need to update the recordings.

To do this, delete the recordings for the failing tests (under `{test-dir}/recordings/{test-name}.yml`), and re-run `controller:test-integration-envtest`. If the test passes, a new recording will be saved, which you can commit to include with your change. All authentication and subscription information is removed from the recording.

To run the test and produce a new recording you will need to have set the required authentication environment variables for an Azure Service Principal: `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET`. This Service Principal will need access to the subscription to create and delete resources.

A few tests also need the `TEST_BILLING_ID` variable set to a valid Azure Billing ID when running in record mode. In replay mode this variable is never required. Note that the billing ID is redacted from all recording files so that the resulting file can be replayed by anybody, even somebody who does not know the Billing ID the test was recorded with.

Some Azure resources take longer to provision or delete than the default test timeout of 15m. To change the timeout, set `TIMEOUT` to a suitable value when running task. For example, to give your test a 60m timeout, use:

``` bash
TIMEOUT=60m task controller:test-integration-envtest
```

If you need to create a new Azure Service Principal, run the following commands:

```console
$ az login
… follow the instructions …
$ az account set --subscription {the subscription ID you would like to use}
Creating a role assignment under the scope of "/subscriptions/{subscription ID you chose}"
$ az ad sp create-for-rbac --role contributor --name {the name you would like to use}
{
"appId": "…",
"displayName": "{name you chose}",
"name": "{name you chose}",
"password": "…",
"tenant": "…"
}
```
The output contains `appId` (`AZURE_CLIENT_ID`), `password` (`AZURE_CLIENT_SECRET`), and `tenant` (`AZURE_TENANT_ID`). Store these somewhere safe as the password cannot be viewed again, only reset. The Service Principal will be created as a “contributor” to your subscription which means it can create and delete resources, so **ensure you keep the secrets secure**.

### Running live tests

If you want to skip all recordings and run all tests directly against live Azure resources, you can use the `controller:test-integration-envtest-live` task. This will also require you to set the authentication environment variables, as detailed above.

### Running a single test
By default `task controller:test-integration-envtest` and its variants run all tests. This is often undesirable as you may just be working on a single feature or test. In order to run a subset of tests, use the `TEST_FILTER`:

```bash
TEST_FILTER=<test_name_regex> task controller:test-integration-envtest
```

## Running the operator locally
If you would like to try something out but do not want to write an integration test, you can run the operation locally in a [kind](https://kind.sigs.k8s.io) cluster.

Before launching `kind`, make sure that your shell has the `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` environment variables set. See [above](#recordreplay) for more details about them.

Once you've set the environment variables above, run one of the following commands to create a `kind` cluster:

1. Service Principal authentication cluster: `task controller:kind-create-with-service-principal`.
2. AAD Pod Identity authentication enabled cluster (emulates Managed Identity): `controller:kind-create-with-podidentity`.

You can use `kubectl` to interact with the local `kind` cluster.

When you're done with the local cluster, tear it down with `task controller:kind-delete`.

## Submitting a pull request
Pull requests opened from forks of the azure-service-operator repository will initially have a `skipped` `Validate Pull Request / integration-tests` check which
will prevent merging even if all other checks pass. Once a maintainer has looked at your PR and determined it is ready they will comment `/ok-to-test sha=<sha>`
@@ -297,7 +297,7 @@ More information on the naming convention can be found in that folders [README](

### Record the test passing

See [the code generator README](../#running-integration-tests) for how to run tests and record their HTTP interactions to allow replay.
See [the code generator test README](../testing/#running-integration-tests) for how to run tests and record their HTTP interactions to allow replay.

## Add a new sample

33 changes: 33 additions & 0 deletions docs/hugo/content/contributing/running-a-development-version.md
@@ -0,0 +1,33 @@
---
title: Running a Development Version
---

## Locally

If you would like to try something out but do not want to write an integration test, you can run your local build of the
operator in a [kind](https://kind.sigs.k8s.io) cluster.

Before launching `kind`, make sure that your shell has the `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`,
and `AZURE_CLIENT_SECRET` environment variables set. See [testing](../testing/#recordreplay) for more details about them.

Once you've set the environment variables above, run one of the following commands to create a `kind` cluster:

1. Service Principal authentication cluster: `task controller:kind-create-with-service-principal`.
2. AAD Pod Identity authentication enabled cluster (emulates Managed Identity): `task controller:kind-create-with-podidentity`.

You can use `kubectl` to interact with the local `kind` cluster.

When you're done with the local cluster, tear it down with `task controller:kind-delete`.
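Because `task` is a separate process, it only sees variables that are exported from your shell, not ones that are merely set. The sketch below is a hypothetical helper, `check_aso_env`, which is not part of the ASO repository. It uses `printenv`, which also only sees exported variables, to report any of the four credential variables that are unset or empty before you create the cluster.

```bash
# Hypothetical helper, not part of the ASO repository.
# Reports every required credential variable that is unset or empty in the
# exported environment (the environment `task` and its child processes see).
check_aso_env() {
  local missing=0
  local var
  for var in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET; do
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_aso_env && task controller:kind-create-with-service-principal`.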

## On AKS

Sometimes running in `kind` does not suffice and a real cluster is needed. The `controller:aks-create-helm-install` task
performs the following actions:
- Create an AKS cluster named `{{.HOSTNAME}}-aso-aks` in a resource group `{{.HOSTNAME}}-aso-rg`. These resources
are created in the subscription set in the `AZURE_SUBSCRIPTION_ID` environment variable.
- By default, the cluster is created in `westus3`, but you can override this by passing the `LOCATION` variable to
the `task` command, for example: `task controller:aks-create-helm-install LOCATION=mylocation`.
- Create an ACR in the `{{.HOSTNAME}}-aso-rg` associated with the AKS cluster.
- Install `cert-manager` into the cluster (required for ASO).
- Build and push your local container image into the ACR.
- Install ASO into the cluster, using the ACR image as the source for the controller pod.
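If your kubeconfig does not already point at the new cluster, you can fetch its credentials with the Azure CLI. This sketch assumes that `{{.HOSTNAME}}` in the Taskfile resolves to the output of the `hostname` command (check the Taskfile to confirm). The helpers `aso_rg_name` and `aso_cluster_name` are hypothetical and simply rebuild the resource names listed above.

```bash
# Hypothetical helpers, not part of the ASO Taskfile.
# Assumption: {{.HOSTNAME}} resolves to the output of `hostname`; pass a
# different host name as the first argument if yours differs.
aso_rg_name() {
  echo "${1:-$(hostname)}-aso-rg"
}

aso_cluster_name() {
  echo "${1:-$(hostname)}-aso-aks"
}
```

For example, `az aks get-credentials --resource-group "$(aso_rg_name)" --name "$(aso_cluster_name)"` merges the cluster's credentials into your kubeconfig so that `kubectl` targets it.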
74 changes: 74 additions & 0 deletions docs/hugo/content/contributing/testing.md
@@ -0,0 +1,74 @@
---
title: Testing
---

## Running integration tests

Basic use: run `task controller:test-integration-envtest`.

### Record/replay

The task `controller:test-integration-envtest` runs the tests in a record/replay mode by default, so that it does not
touch any live Azure resources. (This uses the [go-vcr](https://github.com/dnaeon/go-vcr) library.) If you change the controller or other code in
such a way that the required requests/responses from ARM change, you will need to update the recordings.

To do this, delete the recordings for the failing tests (under `{test-dir}/recordings/{test-name}.yaml`), and re-run
`controller:test-integration-envtest`. If the test passes, a new recording will be saved, which you can commit to
include with your change. All authentication and subscription information is removed from the recording.
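As a sketch of that workflow, the hypothetical `delete_recordings` helper below (not part of the ASO repository) removes the recording files for the tests you name, following the `{test-dir}/recordings/{test-name}.yaml` layout described above. The test directory and test names are placeholders you supply.

```bash
# Hypothetical helper, not part of the ASO repository.
# Usage: delete_recordings <test-dir> <test-name>...
# Removes <test-dir>/recordings/<test-name>.yaml for each name, so the next run
# of `task controller:test-integration-envtest` records that test again.
delete_recordings() {
  local test_dir=$1
  local name
  shift
  for name in "$@"; do
    rm -f -- "$test_dir/recordings/$name.yaml"
  done
}
```

You can then re-record just those tests with `TEST_FILTER=<test_name_regex> task controller:test-integration-envtest`, with the credentials described below set.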

To run the test and produce a new recording you will need to have set the required authentication environment variables
for an Azure Service Principal: `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET`.
This Service Principal will need access to the subscription to create and delete resources.

A few tests also need the `TEST_BILLING_ID` variable set to a valid Azure Billing ID when running in record mode.
In replay mode this variable is never required. Note that the billing ID is redacted from all recording files so that
the resulting file can be replayed by anybody, even somebody who does not know the Billing ID the test was recorded with.

Some Azure resources take longer to provision or delete than the default test timeout of 15m. To change the timeout,
set `TIMEOUT` to a suitable value when running `task`. For example, to give your test a 60m timeout, use:

``` bash
TIMEOUT=60m task controller:test-integration-envtest
```

If you need to create a new Azure Service Principal, run the following commands:

```console
$ az login
… follow the instructions …
$ az account set --subscription {the subscription ID you would like to use}
Creating a role assignment under the scope of "/subscriptions/{subscription ID you chose}"
$ az ad sp create-for-rbac --role contributor --name {the name you would like to use}
{
"appId": "…",
"displayName": "{name you chose}",
"name": "{name you chose}",
"password": "…",
"tenant": "…"
}
```
The output contains `appId` (`AZURE_CLIENT_ID`), `password` (`AZURE_CLIENT_SECRET`), and `tenant` (`AZURE_TENANT_ID`).
Store these somewhere safe as the password cannot be viewed again, only reset. The Service Principal will be created as
a “contributor” to your subscription which means it can create and delete resources, so
**ensure you keep the secrets secure**.

### Running live tests

If you want to skip all recordings and run all tests directly against live Azure resources, you can use the
`controller:test-integration-envtest-live` task. This will also require you to set the authentication environment
variables, as detailed above.

### Running a single test
By default `task controller:test-integration-envtest` and its variants run all tests. This is often undesirable
when you are working on a single feature or test. To run a subset of tests, set `TEST_FILTER` to a regular expression matching the test names:

```bash
TEST_FILTER=<test_name_regex> task controller:test-integration-envtest
```

or, with a timeout:

```bash
TIMEOUT=10m TEST_FILTER=<test_name_regex> task controller:test-integration-envtest
```
12 changes: 12 additions & 0 deletions docs/hugo/content/guide/diagnosing-problems.md
@@ -70,6 +70,18 @@ If you see this problem, the resource wasn't ever created successfully in Azure
skip deletion of the Azure resource. This can be done by adding the `serviceoperator.azure.com/reconcile-policy: skip`
annotation to the resource in your cluster.

### Resource reports webhook error when applied

The error may look like this:
```
"Error from server (InternalError): error when creating "/tmp/asd": Internal error occurred: failed calling webhook"
```

This may be caused by ASO pod restarts, which you can check with `kubectl get pods -n azureserviceoperator-system`. If
the ASO pod restarts periodically, check its logs to see what is causing it to exit. A common cause
is installing too many CRDs on a Free tier AKS cluster, which overloads the API server. See
[CRD management](../crd-management/) for more details.
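To spot restarts quickly, you can filter the `kubectl get pods` output. The sketch below is a hypothetical `aso_restarted_pods` helper, not part of ASO. It assumes the default column layout (`NAME READY STATUS RESTARTS AGE`) and prints the name and restart count of each pod that has restarted at least once.

```bash
# Hypothetical helper, not part of ASO.
# Reads `kubectl get pods` output on stdin, skips the header row, and prints
# "<name> <restarts>" for every pod whose RESTARTS count is above zero.
# Newer kubectl versions append "(5m ago)" to RESTARTS; awk splits that into
# later fields, so $4 is still the count.
aso_restarted_pods() {
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}
```

For example, pipe `kubectl get pods -n azureserviceoperator-system` into it, then run `kubectl logs -n azureserviceoperator-system <pod-name> --previous` to see the logs from before the most recent restart.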

## Getting ASO controller pod logs
The last stop when investigating most issues is to look at the ASO pod logs. We expect that
most resource issues can be resolved using the resource's `.status.conditions` without resorting to