Merge release v0.1.14

Release v0.1.14
NearNodeFlash · Feb 6, 2025 · 85fab38 · 85fab38
2 parents f5f55a8 + 82b779f
commit 85fab38
Showing 6 changed files with 279 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -30,4 +30,3 @@ INFO     -  Documentation built in 0.22 seconds
 INFO     -  [10:59:28] Watching paths for changes: 'docs', 'mkdocs.yml'
 INFO     -  [10:59:28] Serving on http://127.0.0.1:8000/
 ```
-
diff --git a/docs/guides/index.md b/docs/guides/index.md
@@ -27,3 +27,8 @@
 
 * [Disable or Drain a Node](node-management/drain.md)
 * [Debugging NVMe Namespaces](node-management/nvme-namespaces.md)
+
+## Monitoring the Cluster
+
+* [Auditing](monitoring-cluster/auditing.md)
+* [API Priority and Fairness](monitoring-cluster/api-priority-and-fairness.md)
diff --git a/docs/guides/monitoring-cluster/api-priority-and-fairness.md b/docs/guides/monitoring-cluster/api-priority-and-fairness.md
@@ -0,0 +1,107 @@
+# Kubernetes API Priority And Fairness
+
+Kubernetes [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) (APF) allows requests to the Kubernetes API server to be classified, isolated, and queued in a fine-grained way.
+
+The APF metrics can be monitored to determine how well the API servers are handling the workload. The metrics are intended to be interpreted by tools like [Prometheus](https://prometheus.io/) or [VictoriaMetrics](https://victoriametrics.com/). This document will use them in their raw form.
+
+The metrics covered by this document are of **counter** type. Counters are only ever incremented, never decremented. When sampled in raw form, a counter's value may appear to bounce between samples, but on an idle system its current high value should become apparent after it has appeared in 3-5 samples.
+
+## Concepts
+
+Requests coming into the API server are classified by `FlowSchemas` and assigned to priority levels. The FlowSchema assigns the request to a **flow** and gives it a **flow distinguisher**. The flow distinguisher indicates the origin of the request--a user, service account, controller, namespace, or nothing. A priority level may take requests from multiple flows. The priority level attempts to give equal response time to each flow.
+
+To view FlowSchemas and their assigned priority levels:
+
+```console
+kubectl get flowschemas
+```
+
+FlowSchema sample output:
+
+```bash
+NAME                           PRIORITYLEVEL     MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE    MISSINGPL
+[...]
+system-leader-election         leader-election   100                  ByUser                112d   False
+endpoint-controller            workload-high     150                  ByUser                112d   False
+workload-leader-election       leader-election   200                  ByUser                112d   False
+system-node-high               node-high         400                  ByUser                112d   False
+system-nodes                   system            500                  ByUser                112d   False
+[...]
+```
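+
+As a sketch of how these columns map onto a manifest, the hypothetical FlowSchema below routes requests from one ServiceAccount to an existing priority level. The name `example-clientmount`, the precedence value, and the choice of `workload-low` are assumptions made for illustration; this is not a FlowSchema shipped by NNF.
+
+```yaml
+# Hypothetical example; names and values are illustrative assumptions.
+apiVersion: flowcontrol.apiserver.k8s.io/v1
+kind: FlowSchema
+metadata:
+  name: example-clientmount
+spec:
+  # PRIORITYLEVEL column: the priority level that receives matching requests.
+  priorityLevelConfiguration:
+    name: workload-low
+  # MATCHINGPRECEDENCE column.
+  matchingPrecedence: 1000
+  # DISTINGUISHERMETHOD column: group requests into flows by user.
+  distinguisherMethod:
+    type: ByUser
+  rules:
+  - subjects:
+    - kind: ServiceAccount
+      serviceAccount:
+        name: nnf-clientmount
+        namespace: nnf-system
+    resourceRules:
+    - verbs: ["*"]
+      apiGroups: ["nnf.cray.hpe.com"]
+      resources: ["*"]
+      namespaces: ["*"]
+```
+
+When a request matches more than one FlowSchema, the one with the numerically lowest `matchingPrecedence` wins.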
+
+To view priority levels:
+
+```console
+kubectl get prioritylevelconfiguration
+```
+
+Priority level sample output:
+
+```bash
+NAME              TYPE      NOMINALCONCURRENCYSHARES   QUEUES   HANDSIZE   QUEUELENGTHLIMIT   AGE
+[...]
+global-default    Limited   20                         128      6          50                 112d
+leader-election   Limited   10                         16       4          50                 112d
+node-high         Limited   40                         64       6          50                 112d
+system            Limited   30                         64       6          50                 112d
+workload-high     Limited   40                         128      6          50                 112d
+workload-low      Limited   100                        128      6          50                 112d
+[...]
+```
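+
+For orientation, the columns above correspond to fields of the PriorityLevelConfiguration spec. The manifest below is a hypothetical sketch; the name `example-level` is an assumption, and the values simply mirror the `global-default` row of the sample output.
+
+```yaml
+# Hypothetical example; the name is an illustrative assumption.
+apiVersion: flowcontrol.apiserver.k8s.io/v1
+kind: PriorityLevelConfiguration
+metadata:
+  name: example-level
+spec:
+  type: Limited                      # TYPE column
+  limited:
+    nominalConcurrencyShares: 20     # NOMINALCONCURRENCYSHARES column
+    limitResponse:
+      type: Queue                    # queue excess requests rather than reject them
+      queuing:
+        queues: 128                  # QUEUES column
+        handSize: 6                  # HANDSIZE column
+        queueLengthLimit: 50         # QUEUELENGTHLIMIT column
+```
+
+`nominalConcurrencyShares` is a relative weight: the API server's total concurrency limit is divided among the priority levels in proportion to their shares.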
+
+## Metric types
+
+As noted earlier, the metrics will be viewed in their raw form and they are all of **counter** type. An individual counter must be sampled multiple times before its current high value can be clearly identified.
+
+To view a counter's type:
+
+```console
+kubectl get --raw /metrics | grep flowcontrol_rejected | grep '^#'
+```
+
+The output will describe the counter and its type:
+
+```bash
+# HELP apiserver_flowcontrol_rejected_requests_total [BETA] Number of requests rejected by API Priority and Fairness subsystem
+# TYPE apiserver_flowcontrol_rejected_requests_total counter
+```
+
+## Examples
+
+A quick way to get a summary of requests by priority level:
+
+```console
+kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
+```
+
+From here one can drill down into the `FlowSchemas` that feed a given priority level to see which one is generating the traffic.
+
+View activity that uses the **nnf-clientmount** credentials:
+
+```console
+kubectl get --raw /metrics | grep 'flow_schema="nnf-clientmount"' | head -6
+```
+
+View activity that uses the **viewer** user credential:
+
+```console
+kubectl get --raw /metrics | grep 'flow_schema="nodediag-kubectls"' | head -6
+```
+
+## Resources
+
+### Kubernetes
+
+A description of APF:
+[API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)
+
+Debugging guide:
+[Flow Control](https://kubernetes.io/docs/reference/debug-cluster/flow-control/)
+
+### Other sources
+
+An excellent, though dated, description of tunables:
+[Kubernetes API and flow control: Managing request quantity and queuing procedure](https://blog.palark.com/kubernetes-api-flow-control-management/)
+
+Slide deck that gets into the algorithms:
+[Kubernetes API Priority and Fairness](https://speakerdeck.com/ladicle/kubernetes-api-priority-and-fairness)
diff --git a/docs/guides/monitoring-cluster/auditing.md b/docs/guides/monitoring-cluster/auditing.md
@@ -0,0 +1,164 @@
+# Kubernetes Auditing
+
+Auditing provides records of each request that arrives in the kube-apiserver. The audit record will indicate what happened and who requested it.
+
+## Enable Auditing
+
+Enable auditing by installing an audit policy configuration file on each k8s master, creating a directory on each master to hold the audit logs, and providing the appropriate command-line options to kube-apiserver.
+
+### Install an audit policy
+
+The audit policy file will be installed on each k8s master node as `/etc/kubernetes/policies/audit-policy.yaml`.
+
+The following is an example audit policy file that captures events for the NNF stack. Other examples can be found later in this document.
+
+```yaml
+apiVersion: audit.k8s.io/v1
+kind: Policy
+
+omitStages:
+- RequestReceived
+
+rules:
+- level: Metadata
+  verbs: ["get", "list", "watch", "create", "patch", "update"]
+  resources:
+
+  - group: lus.cray.hpe.com
+  - group: dataworkflowservices.github.io
+  - group: nnf.cray.hpe.com
+  - group: dm.cray.hpe.com
+```
+
+### Create a log directory
+
+Create a directory on each k8s master to contain the audit logs.
+
+```console
+mkdir /var/log/kubernetes
+```
+
+### Configure the kube-apiserver
+
+The following is an example patch to apply to the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each k8s master node. The arguments in this patch refer to the audit policy file location and audit log location used earlier in this document.
+
+**Do not copy the `kube-apiserver.yaml` file to other master nodes. It contains IP addresses that are specific to one master node.**
+
+After applying this patch to `kube-apiserver.yaml`, clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.
+
+The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.
+
+```diff
+--- a/kube-apiserver.yaml-orig 2024-05-13 12:18:48.256680095 -0700
++++ b/kube-apiserver.yaml 2024-05-28 13:39:50.342694448 -0700
+@@ -41,6 +41,9 @@
+     - --service-cluster-ip-range=10.96.0.0/12
+     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
+     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
++    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
++    - --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log
++    - --audit-log-maxsize=100
+     image: registry.k8s.io/kube-apiserver:v1.29.3
+     imagePullPolicy: IfNotPresent
+     livenessProbe:
+@@ -86,6 +89,12 @@
+     - mountPath: /etc/kubernetes/pki
+       name: k8s-certs
+       readOnly: true
++    - mountPath: /etc/kubernetes/policies/audit-policy.yaml
++      name: k8s-policies
++      readOnly: true
++    - mountPath: /var/log/kubernetes/
++      name: k8s-log
++      readOnly: false
+   hostNetwork: true
+   priority: 2000001000
+   priorityClassName: system-node-critical
+@@ -105,4 +114,12 @@
+       path: /etc/kubernetes/pki
+       type: DirectoryOrCreate
+     name: k8s-certs
++  - hostPath:
++      path: /etc/kubernetes/policies/audit-policy.yaml
++      type: File
++    name: k8s-policies
++  - hostPath:
++      path: /var/log/kubernetes/
++      type: DirectoryOrCreate
++    name: k8s-log
+ status: {}
+ ```
+
+## Disable auditing
+
+Disable auditing by editing the `/etc/kubernetes/manifests/kube-apiserver.yaml` file on each master to remove the `--audit-*` command-line options from the kube-apiserver configuration. The kubelet on that master will detect the change to the `kube-apiserver.yaml` file and will restart the kube-apiserver.
+
+Clear any extra patch or backup files out of `/etc/kubernetes/manifests` because kubelet will read all of them, regardless of the file suffix.
+
+## Auditing in KIND
+
+The KIND environment that is created by the tools in nnf-deploy already has auditing enabled. See the notes in nnf-deploy's [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) to access the audit log.
+
+## Reading the audit log
+
+The `jq(1)` command can be used to make sense of the audit logs. The following `jq` commands have proven useful to the NNF project:
+
+Pretty-print the log events:
+
+```console
+jq -M . kube-apiserver-audit.log | less
+```
+
+Dump a quick-to-digest summary of the log events:
+
+```console
+jq -M '[.auditID,.verb,.requestURI,.user.username,.responseStatus.code,.stageTimestamp]' kube-apiserver-audit.log | less
+```
+
+Extract a specific event record from the log:
+
+```console
+jq -M '. | select(.auditID=="d1053ee5-0734-4b40-815f-3f6831f82bac")' kube-apiserver-audit.log | less
+```
+
+## Example audit policies
+
+The policy at the end of this section logs all activity from the clientmountd daemon. Extract its records from the log with:
+
+```console
+jq -M '.|select(.user.username=="system:serviceaccount:nnf-system:nnf-clientmount")' kube-apiserver-audit.log
+```
+
+The policy itself follows. It can be adjusted to isolate any other ServiceAccount by changing the `users` entry.
+
+```yaml
+apiVersion: audit.k8s.io/v1
+kind: Policy
+
+omitStages:
+- RequestReceived
+
+rules:
+
+- level: Metadata
+  users: ["system:serviceaccount:nnf-system:nnf-clientmount"]
+  resources:
+  - group: "" # core
+  - group: lus.cray.hpe.com
+  - group: dataworkflowservices.github.io
+  - group: nnf.cray.hpe.com
+  - group: dm.cray.hpe.com
+```
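+
+Audit rules are evaluated in order and the first matching rule sets the level for an event, so a narrower rule can be placed ahead of a broader one. The variation below is a hypothetical sketch, not a policy used by NNF: it assumes that `watch` requests from this ServiceAccount are noise and drops them before the `Metadata` rule applies.
+
+```yaml
+# Hypothetical variation; dropping "watch" requests is an illustrative assumption.
+apiVersion: audit.k8s.io/v1
+kind: Policy
+
+omitStages:
+- RequestReceived
+
+rules:
+
+# First match wins: do not log watch requests from this ServiceAccount.
+- level: None
+  users: ["system:serviceaccount:nnf-system:nnf-clientmount"]
+  verbs: ["watch"]
+
+# Log metadata for everything else it does in these API groups.
+- level: Metadata
+  users: ["system:serviceaccount:nnf-system:nnf-clientmount"]
+  resources:
+  - group: "" # core
+  - group: lus.cray.hpe.com
+  - group: dataworkflowservices.github.io
+  - group: nnf.cray.hpe.com
+  - group: dm.cray.hpe.com
+```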
+
+A more complex [audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml) can be found in the nnf-deploy configuration for KIND environments.
+
+## References
+
+### Kubernetes
+
+[Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)
+
+### nnf-deploy
+
+The nnf-deploy repository contains a more complex audit policy:
+[audit-policy.yaml](https://github.com/NearNodeFlash/nnf-deploy/blob/master/config/audit-policy.yaml)
diff --git a/external/nnf-dm b/external/nnf-dm
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -21,6 +21,8 @@ nav:
       - 'Lustre External MGT': 'guides/external-mgs/readme.md'
       - 'Global Lustre': 'guides/global-lustre/readme.md'
       - 'Disable or Drain a Node': 'guides/node-management/drain.md'
+      - 'Auditing': 'guides/monitoring-cluster/auditing.md'
+      - 'API Priority and Fairness': 'guides/monitoring-cluster/api-priority-and-fairness.md'
       - 'Debugging NVMe Namespaces': 'guides/node-management/nvme-namespaces.md'
       - 'Directive Breakdown': 'guides/directive-breakdown/readme.md'
       - 'System Storage': 'guides/system-storage/readme.md'