Add documentation for K8s-onprem StartupProbe (#5257)
Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
3 people authored Oct 26, 2023
1 parent b5c2e38 commit 3dfa18f
Showing 2 changed files with 24 additions and 1 deletion.
12 changes: 11 additions & 1 deletion deploy/k8s-onprem/README.md
@@ -234,6 +234,16 @@ EOF
$ helm install example -f config.yaml .
```

## Probe Configuration

`templates/deployment.yaml` configures the `livenessProbe`, `readinessProbe`, and `startupProbe` for the Triton server container.
By default, Triton loads all models before starting the HTTP server that responds to the probes. Depending on model sizes, this can take several minutes.
If loading does not finish within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, the kubelet treats the Triton container as failed and restarts it,
which can lead to an endless restart loop. Make sure these values are large enough for your use case.
The liveness and readiness probes are sent only after the startup probe has succeeded once.
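As a rough sizing aid: the startup window is `failureThreshold * periodSeconds`, so for a fixed `periodSeconds` the smallest sufficient `failureThreshold` is the expected model-load time divided by the period, rounded up. The sketch below assumes you have measured or estimated the load time yourself; the helper names and the 600-second load time are hypothetical illustrations, not part of the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing the Triton startupProbe.

# Total time (seconds) the kubelet keeps retrying the startup probe:
# failureThreshold * periodSeconds.
startup_window() {
  local failure_threshold=$1 period_seconds=$2
  echo $(( failure_threshold * period_seconds ))
}

# Smallest failureThreshold whose window covers the expected load time,
# i.e. ceil(load_seconds / period_seconds) with integer arithmetic.
min_failure_threshold() {
  local load_seconds=$1 period_seconds=$2
  echo $(( (load_seconds + period_seconds - 1) / period_seconds ))
}

# Chart values (periodSeconds: 10, failureThreshold: 30) give 300 s.
echo "default window: $(startup_window 30 10)s"

# Example: models assumed to take about 600 s (10 min) to load.
echo "failureThreshold needed: $(min_failure_threshold 600 10)"
```

Since load times vary between runs and nodes, it is probably wise to leave some headroom above the computed minimum rather than sizing the window exactly.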

For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).

## Using Triton Inference Server

Now that the inference server is running you can send HTTP or GRPC
@@ -316,4 +326,4 @@ CRDs](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-

```
$ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com
```
```
13 changes: 13 additions & 0 deletions deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
  - containerPort: 8002
    name: metrics
livenessProbe:
  initialDelaySeconds: 15
  failureThreshold: 3
  periodSeconds: 10
  httpGet:
    path: /v2/health/live
    port: http
readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  httpGet:
    path: /v2/health/ready
    port: http
startupProbe:
  # allows Triton to load the models during 30*10 = 300 sec = 5 min
  # starts checking the other probes only after the success of this one
  # for details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
  periodSeconds: 10
  failureThreshold: 30
  httpGet:
    path: /v2/health/ready
    port: http
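To check by hand what these probes see, you can request the same endpoints yourself. Kubernetes treats an `httpGet` probe as successful when the response status code is at least 200 and below 400. The sketch below assumes the container's `http` port is Triton's default HTTP port 8000 and has been forwarded to your machine (for example with `kubectl port-forward`); the `probe_ok` and `check_endpoint` helpers and the deployment name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring how the kubelet judges an httpGet probe:
# any HTTP status code in [200, 400) counts as success.
probe_ok() {
  local code=$1
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Query one Triton health endpoint and print a probe-style verdict.
# Assumes the server's HTTP port is reachable at localhost:8000.
check_endpoint() {
  local path=$1 code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status.
  # curl prints 000 as the status if the connection fails.
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8000${path}")
  if probe_ok "$code"; then
    echo "${path}: OK (${code})"
  else
    echo "${path}: FAIL (${code})"
  fi
}

# Example usage (replace <your-deployment> with the actual name):
#   kubectl port-forward deployment/<your-deployment> 8000:8000 &
#   check_endpoint /v2/health/live    # what livenessProbe requests
#   check_endpoint /v2/health/ready   # what readinessProbe and startupProbe request
```

While models are still loading, `/v2/health/ready` is expected to fail, which is exactly the period the `startupProbe` budget has to cover.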