Add documentation for K8s-onprem StartupProbe (#5257)

Co-authored-by: dyastremsky <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
triton-inference-server · Oct 26, 2023 · 3dfa18f
1 parent b5c2e38
commit 3dfa18f

Showing 2 changed files with 24 additions and 1 deletion.
diff --git a/deploy/k8s-onprem/README.md b/deploy/k8s-onprem/README.md
@@ -234,6 +234,16 @@ EOF
 $ helm install example -f config.yaml .
 ```
 
+## Probe Configuration
+
+`templates/deployment.yaml` contains the configuration for the `livenessProbe`, `readinessProbe`, and `startupProbe` of the Triton server container.
+By default, Triton loads all the models before starting the HTTP server that responds to the probes. Loading can take several minutes, depending on the model sizes.
+If loading does not complete within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, Kubernetes treats this as a failure and restarts the container,
+which can lead to an endless restart loop, so set these values high enough for your use case.
+The liveness and readiness probes are sent only after the startup probe first succeeds.
+
+For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).
+
 ## Using Triton Inference Server
 
 Now that the inference server is running you can send HTTP or GRPC
@@ -316,4 +326,4 @@ CRDs](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-
 
 ```
 $ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com
-```
+```
diff --git a/deploy/k8s-onprem/templates/deployment.yaml b/deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
             - containerPort: 8002
               name: metrics
           livenessProbe:
+            initialDelaySeconds: 15
+            failureThreshold: 3
+            periodSeconds: 10
             httpGet:
               path: /v2/health/live
               port: http
           readinessProbe:
             initialDelaySeconds: 5
             periodSeconds: 5
+            failureThreshold: 3
+            httpGet:
+              path: /v2/health/ready
+              port: http
+          startupProbe:
+            # gives Triton up to failureThreshold * periodSeconds = 30 * 10 = 300 s (5 min) to load the models
+            # the other probes are checked only after this one succeeds
+            # for details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
+            periodSeconds: 10
+            failureThreshold: 30
             httpGet:
               path: /v2/health/ready
               port: http
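
As a rough illustration of how the startup budget scales, the fragment below is a minimal sketch of a `startupProbe` for a deployment whose models might need up to about 15 minutes to load. The 15-minute target and the `failureThreshold: 90` value are assumptions chosen for the example, not values from this commit; the path and port are the ones used in the template above.

```yaml
# Hypothetical override: illustrative values, not part of this commit.
# Startup budget = failureThreshold * periodSeconds = 90 * 10 s = 900 s = 15 min.
startupProbe:
  periodSeconds: 10
  failureThreshold: 90
  httpGet:
    path: /v2/health/ready
    port: http
```

Raising `failureThreshold` rather than `periodSeconds` keeps the probe interval short, so a server that finishes loading early is still detected within about 10 seconds.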