Release v0.1.14
Showing 6 changed files with 279 additions and 2 deletions.
107 changes: 107 additions & 0 deletions
docs/guides/monitoring-cluster/api-priority-and-fairness.md
@@ -0,0 +1,107 @@
# Kubernetes API Priority And Fairness

Kubernetes [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) (APF) allows requests to the Kubernetes API server to be classified, isolated, and queued in a fine-grained way.

The APF metrics can be monitored to determine how well the API servers are handling the workload. The metrics are intended to be interpreted by tools like [Prometheus](https://prometheus.io/) or [VictoriaMetrics](https://victoriametrics.com/). This document uses them in their raw form.

The metrics covered by this document are of **counter** type. Counters are incremented, never decremented. When counters are sampled in raw form, their values may appear to bounce from one sample to the next, but on an idle system a given counter's current high value should become apparent after 3-5 samples.

## Concepts

Requests coming into the API server are classified by `FlowSchemas` and assigned to priority levels. The FlowSchema assigns the request to a **flow** and gives it a **flow distinguisher**. The flow distinguisher indicates the origin of the request: a user, service account, controller, namespace, or nothing. A priority level may take requests from multiple flows. The priority level attempts to give equal response time to each flow.

To view FlowSchemas and their assigned priority levels:

```console
kubectl get flowschemas
```

FlowSchema sample output:

```bash
NAME                       PRIORITYLEVEL     MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE    MISSINGPL
[...]
system-leader-election     leader-election   100                  ByUser                112d   False
endpoint-controller        workload-high     150                  ByUser                112d   False
workload-leader-election   leader-election   200                  ByUser                112d   False
system-node-high           node-high         400                  ByUser                112d   False
system-nodes               system            500                  ByUser                112d   False
[...]
```
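
To narrow that list to the FlowSchemas that feed one priority level, the table can be filtered on its second column. The following is a minimal sketch, not part of the NNF tooling: the function name `flowschemas_for_level` is made up for illustration, and it assumes the default column layout shown in the sample output (NAME first, PRIORITYLEVEL second).

```bash
# Print the names of the FlowSchemas assigned to one priority level.
# Input (stdin): the table printed by `kubectl get flowschemas`.
# Argument 1:    the priority level name to match.
flowschemas_for_level() {
  # NR > 1 skips the header row; column 2 is PRIORITYLEVEL.
  awk -v level="$1" 'NR > 1 && $2 == level { print $1 }'
}

# Hypothetical usage against a live cluster:
#   kubectl get flowschemas | flowschemas_for_level leader-election
```

With the sample output above, filtering on `leader-election` would select the `system-leader-election` and `workload-leader-election` rows.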

To view priority levels:

```console
kubectl get prioritylevelconfiguration
```

Priority level sample output:

```bash
NAME              TYPE      NOMINALCONCURRENCYSHARES   QUEUES   HANDSIZE   QUEUELENGTHLIMIT   AGE
[...]
global-default    Limited   20                         128      6          50                 112d
leader-election   Limited   10                         16       4          50                 112d
node-high         Limited   40                         64       6          50                 112d
system            Limited   30                         64       6          50                 112d
workload-high     Limited   40                         128      6          50                 112d
workload-low      Limited   100                        128      6          50                 112d
[...]
```

## Metric types

As noted earlier, the metrics will be viewed in their raw form and they are all of **counter** type. An individual counter must be sampled multiple times before its current high value can be clearly identified.

To view a counter's type:

```console
kubectl get --raw /metrics | grep flowcontrol_rejected | grep '^#'
```

The output will describe the counter and its type:

```bash
# HELP apiserver_flowcontrol_rejected_requests_total [BETA] Number of requests rejected by API Priority and Fairness subsystem
# TYPE apiserver_flowcontrol_rejected_requests_total counter
```
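
Because a counter's current high value only becomes clear across several samples, it can help to collect a handful of samples and keep the largest value seen for each series. The following is a minimal sketch under stated assumptions: the function name `max_per_series` is hypothetical, and each input line is assumed to have the form `series value` with no trailing timestamp.

```bash
# Report the highest value seen for each metric series across several samples.
# Input (stdin): metric lines in the text format returned by /metrics;
# comment lines (starting with '#') and blank lines are skipped.
max_per_series() {
  awk '
    /^#/ || NF < 2 { next }
    {
      value = $NF
      series = $0
      sub(/[ \t]+[^ \t]+[ \t]*$/, "", series)   # strip the trailing value
      if (!(series in best) || value + 0 > best[series] + 0) best[series] = value
    }
    END { for (s in best) print s, best[s] }
  ' | sort
}

# Hypothetical usage against a live cluster: five samples, one second apart.
#   for i in 1 2 3 4 5; do
#     kubectl get --raw /metrics | grep '^apiserver_flowcontrol_rejected_requests_total'
#     sleep 1
#   done | max_per_series
```

Keeping the maximum is safe here only because counters never decrease; the same approach would not be meaningful for other metric types.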

## Examples

A quick way to get a summary of requests by priority level:

```console
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
```

From here one can drill down into the `FlowSchemas` that feed a given priority level to see which one is generating the traffic.

View activity that uses the **nnf-clientmount** credentials:

```console
kubectl get --raw /metrics | grep 'flow_schema="nnf-clientmount"' | head -6
```

View activity that uses the **viewer** user credential:

```console
kubectl get --raw /metrics | grep 'flow_schema="nodediag-kubectls"' | head -6
```
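
The grep commands above show the raw series for one FlowSchema at a time. To compare FlowSchemas side by side, the rejected-request counters from a single sample can be totaled per FlowSchema. This is a sketch only: the function name `rejected_by_flow_schema` is invented for illustration, and it assumes every `apiserver_flowcontrol_rejected_requests_total` series carries a `flow_schema` label, as the grep patterns above imply.

```bash
# Total the rejected-request counters per FlowSchema from ONE metrics sample.
# Input (stdin): the text returned by `kubectl get --raw /metrics`.
rejected_by_flow_schema() {
  awk '
    /^apiserver_flowcontrol_rejected_requests_total[{]/ {
      if (match($0, /flow_schema="[^"]*"/)) {
        # The match covers flow_schema="NAME"; slice out NAME.
        name = substr($0, RSTART + 13, RLENGTH - 14)
        total[name] += $NF
      }
    }
    END { for (n in total) print n, total[n] }
  ' | sort
}

# Hypothetical usage against a live cluster:
#   kubectl get --raw /metrics | rejected_by_flow_schema
```

Feed it one sample at a time; piping several samples through it at once would add the same counter to the total repeatedly.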

## Resources

### Kubernetes

A description of APF:
[API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)

Debugging guide:
[Flow Control](https://kubernetes.io/docs/reference/debug-cluster/flow-control/)

### Other sources

An excellent, though dated, description of tunables:
[Kubernetes API and flow control: Managing request quantity and queuing procedure](https://blog.palark.com/kubernetes-api-flow-control-management/)

Slide deck that gets into the algorithms:
[Kubernetes API Priority and Fairness](https://speakerdeck.com/ladicle/kubernetes-api-priority-and-fairness)
@@ -0,0 +1,164 @@
# Kubernetes Auditing

Auditing records each request that arrives at the kube-apiserver. Each audit record indicates what happened and who requested it.

## Enable Auditing

Enable auditing by installing an audit policy configuration file on each k8s master, creating a directory on the master to hold the audit logs, and providing the appropriate command-line options to kube-apiserver.

### Install an audit policy

The audit policy file will be installed on each k8s master node as `/etc/kubernetes/policies/audit-policy.yaml`.

The following is an example audit policy file that captures events for the NNF stack. Other examples can be found later in this document.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy

omitStages:
  - RequestReceived

rules:
  - level: Metadata
    verbs: ["get", "list", "watch", "create", "patch", "update"]
    resources:
      - group: lus.cray.hpe.com
      - group: dataworkflowservices.github.io
      - group: nnf.cray.hpe.com
      - group: dm.cray.hpe.com
```

### Create a log directory

Create a directory on each k8s master to contain the audit logs.

```console
mkdir /var/log/kubernetes
```

### Configure the kube-apiserver

The following is an example patch to apply to the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each k8s master node. The arguments in this patch refer to the audit policy file location and audit log location used earlier in this document.

**Do not copy the `kube-apiserver.yaml` file to other master nodes. It contains IP addresses that are specific to one master node.**

After applying this patch to `kube-apiserver.yaml`, clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.

The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.

```diff
--- a/kube-apiserver.yaml-orig 2024-05-13 12:18:48.256680095 -0700
+++ b/kube-apiserver.yaml 2024-05-28 13:39:50.342694448 -0700
@@ -41,6 +41,9 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
+    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
+    - --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log
+    - --audit-log-maxsize=100
     image: registry.k8s.io/kube-apiserver:v1.29.3
     imagePullPolicy: IfNotPresent
     livenessProbe:
@@ -86,6 +89,12 @@
     - mountPath: /etc/kubernetes/pki
       name: k8s-certs
       readOnly: true
+    - mountPath: /etc/kubernetes/policies/audit-policy.yaml
+      name: k8s-policies
+      readOnly: true
+    - mountPath: /var/log/kubernetes/
+      name: k8s-log
+      readOnly: false
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
@@ -105,4 +114,12 @@
       path: /etc/kubernetes/pki
       type: DirectoryOrCreate
     name: k8s-certs
+  - hostPath:
+      path: /etc/kubernetes/policies/audit-policy.yaml
+      type: File
+    name: k8s-policies
+  - hostPath:
+      path: /var/log/kubernetes/
+      type: DirectoryOrCreate
+    name: k8s-log
 status: {}
```

## Disable auditing

Disable auditing by editing the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each master to remove the `--audit-*` command-line options from the kube-apiserver configuration. The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.

Clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.
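
A quick check for leftover files can be scripted. The following is a rough sketch, not an NNF tool: the function name `list_stray_manifests` is hypothetical, and the list of suffixes it looks for (`.orig`, `.bak`, `.rej`, `.patch`, and a trailing `~`) is only an assumption about what editors and `patch(1)` commonly leave behind. It lists candidates and removes nothing.

```bash
# List likely patch or backup leftovers in a static-manifest directory.
# Argument 1: directory to inspect (defaults to /etc/kubernetes/manifests).
list_stray_manifests() {
  dir=${1:-/etc/kubernetes/manifests}
  # The suffix list is an assumption; extend it to match local habits.
  for f in "$dir"/*.orig "$dir"/*.bak "$dir"/*.rej "$dir"/*.patch "$dir"/*~; do
    # An unmatched glob stays literal and fails the -f test, so it is skipped.
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Hypothetical usage on a master node:
#   list_stray_manifests
```

Review the list by hand before deleting anything, since only the files that are meant to be static pod manifests should remain in that directory.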

## Auditing in KIND

The KIND environment that is created by the tools in nnf-deploy already has auditing enabled. See the notes in nnf-deploy's [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) to access the audit log.

## Reading the audit log

The `jq(1)` command can be used to make sense of the audit logs. The following `jq` commands have proven useful to the NNF project:

Pretty-print the log events:

```console
jq -M . kube-apiserver-audit.log | less
```

Dump a quick-to-digest summary of the log events:

```console
jq -M '[.auditID,.verb,.requestURI,.user.username,.responseStatus.code,.stageTimestamp]' kube-apiserver-audit.log | less
```

Extract a specific event record from the log:

```console
jq -M '. | select(.auditID=="d1053ee5-0734-4b40-815f-3f6831f82bac")' kube-apiserver-audit.log | less
```
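
For a first impression of a large log, a tally of events per verb can be useful before reaching for the `jq` filters above. The sketch below uses only `grep`, `cut`, `sort`, and `uniq`, so it also works where `jq` is not installed. The function name `count_audit_verbs` is made up for illustration, and the pattern assumes the log holds one compact JSON event per line with no space after the colon (for example `"verb":"get"`); if the log is formatted differently, the pattern will need adjusting.

```bash
# Count audit events per verb, most frequent first.
# Arguments: one or more audit log files (reads stdin when none are given).
count_audit_verbs() {
  cat "$@" \
    | grep -o '"verb":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c | sort -rn
}

# Hypothetical usage:
#   count_audit_verbs /var/log/kubernetes/kube-apiserver-audit.log
```

This is a text match rather than a JSON parse, so it would also count a `"verb":"..."` string that happened to appear elsewhere in an event; treat the numbers as a rough guide.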

## Example audit policies

Log all activity from the clientmountd daemon. Extract records from the log with:

```console
jq -M '.|select(.user.username=="system:serviceaccount:nnf-system:nnf-clientmount")' kube-apiserver-audit.log
```

This could also be adjusted to isolate any other ServiceAccount.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy

omitStages:
  - RequestReceived

rules:

  - level: Metadata
    users: ["system:serviceaccount:nnf-system:nnf-clientmount"]
    resources:
      - group: "" # core
      - group: lus.cray.hpe.com
      - group: dataworkflowservices.github.io
      - group: nnf.cray.hpe.com
      - group: dm.cray.hpe.com
```

A more complex [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) can be found in the nnf-deploy configuration for KIND environments.

## References

### Kubernetes

[Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)

### nnf-deploy

The nnf-deploy repository contains a more complex audit policy:
[audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml)