Merge release v0.1.14
Release v0.1.14
roehrich-hpe authored Feb 6, 2025
2 parents f5f55a8 + 82b779f commit 85fab38
Showing 6 changed files with 279 additions and 2 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -30,4 +30,3 @@ INFO - Documentation built in 0.22 seconds
INFO - [10:59:28] Watching paths for changes: 'docs', 'mkdocs.yml'
INFO - [10:59:28] Serving on http://127.0.0.1:8000/
```

5 changes: 5 additions & 0 deletions docs/guides/index.md
@@ -27,3 +27,8 @@

* [Disable or Drain a Node](node-management/drain.md)
* [Debugging NVMe Namespaces](node-management/nvme-namespaces.md)

## Monitoring the Cluster

* [Auditing](monitoring-cluster/auditing.md)
* [API Priority and Fairness](monitoring-cluster/api-priority-and-fairness.md)
107 changes: 107 additions & 0 deletions docs/guides/monitoring-cluster/api-priority-and-fairness.md
@@ -0,0 +1,107 @@
# Kubernetes API Priority And Fairness

Kubernetes [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) (APF) allows requests to the Kubernetes API server to be classified, isolated, and queued in a fine-grained way.

The APF metrics can be monitored to determine how well the API servers are handling the workload. The metrics are intended to be interpreted by tools like [Prometheus](https://prometheus.io/) or [VictoriaMetrics](https://victoriametrics.com/). This document will use them in their raw form.

The metrics covered by this document are of the **counter** type. Counters are only incremented, never decremented. When sampled in raw form, a counter's value may appear to bounce between samples, but on an idle system its current high value should become apparent after it has appeared in 3-5 samples.
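
One way to apply this is to sample the same series several times and keep the largest value seen. The following is a minimal sketch, not a supported tool: the helper name `counter_high_value` is hypothetical, and the series selector in the usage comment is only an illustration.

```shell
# Hypothetical helper: read raw metric lines on stdin (one sample per line,
# all for the same series) and print the largest value seen.
counter_high_value() {
  awk '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    NF >= 2 {
      v = $NF + 0                      # the sample value is the last field
      if (!seen || v > max) { max = v; maxtext = $NF; seen = 1 }
    }
    END { if (seen) print maxtext }    # print the value exactly as it was read
  '
}

# Example use against a live cluster (assumes kubectl access; the
# priority level chosen here is illustrative):
#   for i in 1 2 3 4 5; do
#     kubectl get --raw /metrics |
#       grep '^apiserver_flowcontrol_rejected_requests_total{' |
#       grep 'priority_level="workload-low"' | head -1
#     sleep 1
#   done | counter_high_value
```

Taking the maximum is reasonable here only because counters never decrease; it would not be appropriate for gauge metrics.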

## Concepts

Requests coming into the API server are classified by `FlowSchemas` and assigned to priority levels. The FlowSchema assigns the request to a **flow** and gives it a **flow distinguisher**. The flow distinguisher indicates the origin of the request--a user, service account, controller, namespace, or nothing. A priority level may take requests from multiple flows. The priority level attempts to give equal response time to each flow.

To view FlowSchemas and their assigned priority levels:

```console
kubectl get flowschemas
```

FlowSchema sample output:

```bash
NAME                       PRIORITYLEVEL     MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE    MISSINGPL
[...]
system-leader-election     leader-election   100                  ByUser                112d   False
endpoint-controller        workload-high     150                  ByUser                112d   False
workload-leader-election   leader-election   200                  ByUser                112d   False
system-node-high           node-high         400                  ByUser                112d   False
system-nodes               system            500                  ByUser                112d   False
[...]
```

To view priority levels:

```console
kubectl get prioritylevelconfiguration
```

Priority level sample output:

```bash
NAME              TYPE      NOMINALCONCURRENCYSHARES   QUEUES   HANDSIZE   QUEUELENGTHLIMIT   AGE
[...]
global-default    Limited   20                         128      6          50                 112d
leader-election   Limited   10                         16       4          50                 112d
node-high         Limited   40                         64       6          50                 112d
system            Limited   30                         64       6          50                 112d
workload-high     Limited   40                         128      6          50                 112d
workload-low      Limited   100                        128      6          50                 112d
[...]
```

## Metric types

As noted earlier, the metrics will be viewed in their raw form and they are all of **counter** type. An individual counter must be sampled multiple times before its current high value can be clearly identified.

To view a counter's type:

```console
kubectl get --raw /metrics | grep flowcontrol_rejected | grep '^#'
```

The output will describe the counter and its type:

```bash
# HELP apiserver_flowcontrol_rejected_requests_total [BETA] Number of requests rejected by API Priority and Fairness subsystem
# TYPE apiserver_flowcontrol_rejected_requests_total counter
```

## Examples

A quick way to get a summary of requests by priority level:

```console
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
```

From here one can drill down into the `FlowSchemas` that feed a given priority level to see which one is generating the traffic.
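
As a starting point for that drill-down, the FlowSchemas assigned to one priority level can be picked out of the `kubectl get flowschemas` listing. This is a minimal sketch that assumes the default column layout shown in the Concepts section (name in the first column, priority level in the second); the helper name `flowschemas_for_level` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get flowschemas` output on stdin and
# print the names of the FlowSchemas assigned to the priority level in $1.
flowschemas_for_level() {
  # NR > 1 skips the header row; column 2 is PRIORITYLEVEL.
  awk -v pl="$1" 'NR > 1 && $2 == pl { print $1 }'
}

# Example use against a live cluster (assumes kubectl access):
#   kubectl get flowschemas | flowschemas_for_level leader-election
```

Each name it prints can then be used as the `flow_schema` label value when searching the raw metrics.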

View activity that uses the **nnf-clientmount** credentials:

```console
kubectl get --raw /metrics | grep 'flow_schema=\"nnf-clientmount\"' | head -6
```

View activity that uses the **viewer** user credential:

```console
kubectl get --raw /metrics | grep 'flow_schema=\"nodediag-kubectls\"' | head -6
```
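
To see which priority levels are rejecting requests, the rejected-request counter can be totalled per priority level from a single scrape. This is a rough sketch: the helper name `rejected_by_level` is hypothetical, and it assumes each series carries a `priority_level` label in the usual Prometheus text format. Because raw counter values can bounce between samples, treat one scrape as indicative only.

```shell
# Hypothetical helper: read raw /metrics text on stdin and print the summed
# apiserver_flowcontrol_rejected_requests_total value per priority level.
rejected_by_level() {
  awk '
    /^apiserver_flowcontrol_rejected_requests_total[{]/ {
      if (match($0, /priority_level="[^"]*"/)) {
        # The label value starts 16 characters into the match;
        # the extra 1 removed from the length drops the closing quote.
        pl = substr($0, RSTART + 16, RLENGTH - 17)
        total[pl] += $NF               # the sample value is the last field
      }
    }
    END { for (pl in total) printf "%s %d\n", pl, total[pl] }
  ' | sort
}

# Example use against a live cluster (assumes kubectl access):
#   kubectl get --raw /metrics | rejected_by_level
```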

## Resources

### Kubernetes

A description of APF:
[API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)

Debugging guide:
[Flow Control](https://kubernetes.io/docs/reference/debug-cluster/flow-control/)

### Other sources

An excellent, though dated, description of tunables:
[Kubernetes API and flow control: Managing request quantity and queuing procedure](https://blog.palark.com/kubernetes-api-flow-control-management/)

Slide deck that gets into the algorithms:
[Kubernetes API Priority and Fairness](https://speakerdeck.com/ladicle/kubernetes-api-priority-and-fairness)
164 changes: 164 additions & 0 deletions docs/guides/monitoring-cluster/auditing.md
@@ -0,0 +1,164 @@
# Kubernetes Auditing

Auditing provides a record of each request that arrives at the kube-apiserver. The audit record indicates what happened and who requested it.

## Enable Auditing

Enable auditing by installing an audit policy configuration file on each k8s master, creating a directory on the master to hold the audit logs, and providing the appropriate command-line options to kube-apiserver.

### Install an audit policy

The audit policy file will be installed on each k8s master node as `/etc/kubernetes/policies/audit-policy.yaml`.

The following is an example audit policy file that captures events for the NNF stack. Other examples can be found later in this document.

```bash
apiVersion: audit.k8s.io/v1
kind: Policy

omitStages:
  - RequestReceived

rules:
  - level: Metadata
    verbs: ["get", "list", "watch", "create", "patch", "update"]
    resources:
      - group: lus.cray.hpe.com
      - group: dataworkflowservices.github.io
      - group: nnf.cray.hpe.com
      - group: dm.cray.hpe.com
```

### Create a log directory

Create a directory on each k8s master to contain the audit logs.

```console
mkdir /var/log/kubernetes
```

### Configure the kube-apiserver

The following is an example patch to apply to the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each k8s master node. The arguments in this patch refer to the audit policy file location and audit log location used earlier in this document.

**Do not copy the `kube-apiserver.yaml` file to other master nodes. It contains IP addresses that are specific to one master node.**

After applying this patch to `kube-apiserver.yaml`, clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.

The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.

```bash
--- a/kube-apiserver.yaml-orig 2024-05-13 12:18:48.256680095 -0700
+++ b/kube-apiserver.yaml 2024-05-28 13:39:50.342694448 -0700
@@ -41,6 +41,9 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
+    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
+    - --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log
+    - --audit-log-maxsize=100
     image: registry.k8s.io/kube-apiserver:v1.29.3
     imagePullPolicy: IfNotPresent
     livenessProbe:
@@ -86,6 +89,12 @@
     - mountPath: /etc/kubernetes/pki
       name: k8s-certs
       readOnly: true
+    - mountPath: /etc/kubernetes/policies/audit-policy.yaml
+      name: k8s-policies
+      readOnly: true
+    - mountPath: /var/log/kubernetes/
+      name: k8s-log
+      readOnly: false
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
@@ -105,4 +114,12 @@
       path: /etc/kubernetes/pki
       type: DirectoryOrCreate
     name: k8s-certs
+  - hostPath:
+      path: /etc/kubernetes/policies/audit-policy.yaml
+      type: File
+    name: k8s-policies
+  - hostPath:
+      path: /var/log/kubernetes/
+      type: DirectoryOrCreate
+    name: k8s-log
 status: {}
```
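
After editing the manifest, it may be worth confirming that all three audit options from the patch are present. This is a small sketch rather than part of the documented procedure; the helper name `check_audit_flags` is hypothetical, and it checks only that the text is present, not that the YAML is valid.

```shell
# Hypothetical check: report whether the manifest named in $1 contains each
# of the audit options added by the patch above.
check_audit_flags() {
  manifest=$1
  for flag in --audit-policy-file= --audit-log-path= --audit-log-maxsize=; do
    # -F matches the text literally; -e is needed because it begins with "-".
    if ! grep -qF -e "$flag" "$manifest"; then
      echo "missing: $flag"
      return 1
    fi
  done
  echo "ok"
}

# Example use on a master node:
#   check_audit_flags /etc/kubernetes/manifests/kube-apiserver.yaml
```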

## Disable auditing

Disable auditing by editing the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each master to remove the `--audit-*` command-line options from the kube-apiserver configuration. The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.

Clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.

## Auditing in KIND

The KIND environment that is created by the tools in nnf-deploy already has auditing enabled. See the notes in nnf-deploy's [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) to access the audit log.

## Reading the audit log

The `jq(1)` command can be used to make sense of the audit logs. The following `jq` commands have proven useful to the NNF project:

Pretty-print the log events:

```console
jq -M . kube-apiserver-audit.log | less
```

Dump a quick-to-digest summary of the log events:

```console
jq -M '[.auditID,.verb,.requestURI,.user.username,.responseStatus.code,.stageTimestamp]' kube-apiserver-audit.log | less
```

Extract a specific event record from the log:

```console
jq -M '. | select(.auditID=="d1053ee5-0734-4b40-815f-3f6831f82bac")' kube-apiserver-audit.log | less
```
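
A per-user event count can also help show who is generating the most traffic. This is a minimal sketch assuming the log is in the JSON-lines format, one event per line; the helper name `audit_count_by_user` is hypothetical. Note that `jq -s` loads the whole log into memory.

```shell
# Hypothetical helper: print "<username> <event count>" for each user found
# in the audit log named in $1, ordered by username.
audit_count_by_user() {
  # -s slurps all events into one array; group_by sorts and groups them.
  jq -s -r 'group_by(.user.username) | .[] | "\(.[0].user.username) \(length)"' "$1"
}

# Example use:
#   audit_count_by_user kube-apiserver-audit.log
```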

## Example audit policies

Log all activity from the clientmountd daemon. Extract records from the log with:

```console
jq -M '.|select(.user.username=="system:serviceaccount:nnf-system:nnf-clientmount")' kube-apiserver-audit.log
```

The following policy logs that activity. Adjust the `users` entry to isolate any other ServiceAccount.

```bash
apiVersion: audit.k8s.io/v1
kind: Policy

omitStages:
  - RequestReceived

rules:
  - level: Metadata
    users: ["system:serviceaccount:nnf-system:nnf-clientmount"]
    resources:
      - group: "" # core
      - group: lus.cray.hpe.com
      - group: dataworkflowservices.github.io
      - group: nnf.cray.hpe.com
      - group: dm.cray.hpe.com
```

A more complex [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) can be found in the nnf-deploy configuration for KIND environments.

## References

### Kubernetes

[Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)

### nnf-deploy

Nnf-deploy contains a more complex audit policy:
[audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml)
2 changes: 1 addition & 1 deletion external/nnf-dm
Submodule nnf-dm updated 129 files
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -21,6 +21,8 @@ nav:
- 'Lustre External MGT': 'guides/external-mgs/readme.md'
- 'Global Lustre': 'guides/global-lustre/readme.md'
- 'Disable or Drain a Node': 'guides/node-management/drain.md'
- 'Auditing': 'guides/monitoring-cluster/auditing.md'
- 'API Priority and Fairness': 'guides/monitoring-cluster/api-priority-and-fairness.md'
- 'Debugging NVMe Namespaces': 'guides/node-management/nvme-namespaces.md'
- 'Directive Breakdown': 'guides/directive-breakdown/readme.md'
- 'System Storage': 'guides/system-storage/readme.md'