feat: Use secrets to configure HF_TOKEN #22

Merged 1 commit on Jan 27, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -93,3 +93,4 @@ perf-test.py
/try

values-*.yaml
helm/examples
2 changes: 1 addition & 1 deletion helm/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.0.1
version: 0.0.2

maintainers:
- name: apostac
7 changes: 7 additions & 0 deletions helm/templates/deployment-vllm-multi.yaml
@@ -65,6 +65,13 @@ spec:
          env:
          - name: HF_HOME
            value: /data
          {{- if $modelSpec.hf_token }}
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: {{ .Release.Name }}-secrets
                key: hf_token_{{ $modelSpec.name }}
          {{- end }}
          {{- with $modelSpec.env }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
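With this change, any model spec that sets `hf_token` gets the token injected as an `HF_TOKEN` environment variable sourced from the chart's Secret. As a rough sketch of the rendered container env, assuming a release named `vllm` and a model spec named `llama3` (illustrative names, not from this PR):

```yaml
# Hypothetical rendered output for release "vllm" and modelSpec "llama3"
env:
  - name: HF_HOME
    value: /data
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: vllm-secrets     # {{ .Release.Name }}-secrets
        key: hf_token_llama3   # hf_token_{{ $modelSpec.name }}
```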
14 changes: 14 additions & 0 deletions helm/templates/secrets.yaml
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Secret
metadata:
  name: "{{ .Release.Name }}-secrets"
  namespace: {{ .Release.Namespace }}
type: Opaque
data:
{{- range $modelSpec := .Values.servingEngineSpec.modelSpec }}
{{- with $ -}}
{{- if $modelSpec.hf_token }}
  hf_token_{{ $modelSpec.name }}: {{ $modelSpec.hf_token | b64enc | quote }}
{{- end }}
{{- end }}
{{- end }}
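For each model spec that defines `hf_token`, this template emits one base64-encoded key in a single shared Secret. A sketch of the rendered object, assuming a release named `vllm` in the `default` namespace with two model specs `llama3` and `mistral` whose tokens are the placeholder strings `hf_abc` and `hf_xyz` (all illustrative values):

```yaml
# Hypothetical rendered Secret; token values are placeholders, not real tokens
apiVersion: v1
kind: Secret
metadata:
  name: "vllm-secrets"
  namespace: default
type: Opaque
data:
  hf_token_llama3: aGZfYWJj    # b64enc of "hf_abc"
  hf_token_mistral: aGZfeHl6   # b64enc of "hf_xyz"
```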
3 changes: 3 additions & 0 deletions helm/values.schema.json
@@ -53,6 +53,9 @@
},
"required": ["enabled", "cpuOffloadingBufferSize"]
},
"hf_token": {
"type": "string"
},
"env": {
"type": "array",
"items": {
2 changes: 2 additions & 0 deletions helm/values.yaml
@@ -35,6 +35,8 @@ servingEngineSpec:
# - enabled: (optional, bool) Enable LMCache, e.g., true
# - cpuOffloadingBufferSize: (optional, string) The CPU offloading buffer size, e.g., "30"
#
# - hf_token: (optional, string) The Hugging Face token for this model
#
# - env: (optional, list) The environment variables to set in the container, e.g., your HF_TOKEN
#
# - nodeSelectorTerms: (optional, list) The node selector terms to match the nodes
12 changes: 5 additions & 7 deletions tutorials/02-basic-vllm-config.md
@@ -12,13 +12,13 @@ This tutorial guides you through the basic configurations required to deploy a v
## Prerequisites
- A Kubernetes environment with GPU support, as set up in the [00-install-kubernetes-env tutorial](00-install-kubernetes-env.md).
- Helm installed on your system.
- Access to a Hugging Face token (`HF_TOKEN`).
- Access to a Hugging Face token (`HF_TOKEN`).

## Step 1: Preparing the Configuration File

1. Locate the example configuration file `tutorials/assets/values-02-basic-config.yaml`.
2. Open the file and update the following fields:
- Replace `<USERS SHOULD PUT THEIR HF_TOKEN HERE>` with your actual Hugging Face token.
- Set `hf_token: <YOUR HF TOKEN>` in the YAML file to your actual Hugging Face token.

### Explanation of Key Items in `values-02-basic-config.yaml`

@@ -37,7 +37,8 @@ This tutorial guides you through the basic configurations required to deploy a v
- `maxModelLen`: The maximum sequence length the model can handle.
- `dtype`: Data type for computations, e.g., `bfloat16` for faster performance on modern GPUs.
- `extraArgs`: Additional arguments passed to the vLLM engine for fine-tuning behavior.
- **`env`**: Environment variables such as `HF_TOKEN` for authentication with Hugging Face.
- **`hf_token`**: The Hugging Face token used to authenticate with the Hugging Face Hub; the chart stores it in a Kubernetes Secret and injects it into the container as `HF_TOKEN`.
- **`env`**: Extra environment variables to pass to the model-serving engine.

### Example Snippet
```yaml
@@ -62,10 +63,7 @@ servingEngineSpec:
dtype: "bfloat16"
extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.8"]

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN>

hf_token: <YOUR HF TOKEN>
```

## Step 2: Applying the Configuration
6 changes: 2 additions & 4 deletions tutorials/03-load-model-from-pv.md
@@ -84,14 +84,12 @@ servingEngineSpec:
vllmConfig:
maxModelLen: 4096

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN>
hf_token: <YOUR HF TOKEN>
```

> **Explanation:** The `pvcMatchLabels` field specifies the labels to match an existing Persistent Volume. In this example, it ensures that the deployment uses the PV with the label `model: "llama3-pv"`. This provides a way to link a specific PV to your application.

> **Note:** Make sure to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token in the `env` section.
> **Note:** Make sure to replace `<YOUR HF TOKEN>` with your actual Hugging Face token in the YAML.
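For readers unfamiliar with the label matching, a minimal sketch of a PersistentVolume that this `pvcMatchLabels` would select is shown below; the name, capacity, and hostPath are assumptions for illustration, not part of this PR:

```yaml
# Hypothetical PV carrying the label referenced by pvcMatchLabels
apiVersion: v1
kind: PersistentVolume
metadata:
  name: llama3-pv
  labels:
    model: "llama3-pv"
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/llama3    # local path holding the downloaded model
```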

2. Deploy the Helm chart:

10 changes: 3 additions & 7 deletions tutorials/04-launch-multiple-model.md
@@ -36,9 +36,7 @@ servingEngineSpec:
pvcStorage: "50Gi"
vllmConfig:
maxModelLen: 4096
env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_FOR_LLAMA3.1>
hf_token: <YOUR HF TOKEN FOR LLAMA 3.1>

- name: "mistral"
repository: "vllm/vllm-openai"
@@ -51,12 +49,10 @@ servingEngineSpec:
pvcStorage: "50Gi"
vllmConfig:
maxModelLen: 4096
env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_FOR_MISTRAL>
hf_token: <YOUR HF TOKEN FOR MISTRAL>
```

> **Note:** Replace `<YOUR_HF_TOKEN_FOR_LLAMA3.1>` and `<YOUR_HF_TOKEN_FOR_MISTRAL>` with your Hugging Face tokens.
> **Note:** Replace `<YOUR HF TOKEN FOR LLAMA 3.1>` and `<YOUR HF TOKEN FOR MISTRAL>` with your Hugging Face tokens.


## Step 2: Deploying the Helm Chart
6 changes: 2 additions & 4 deletions tutorials/05-offload-kv-cache.md
@@ -44,12 +44,10 @@ servingEngineSpec:
enabled: true
cpuOffloadingBufferSize: "20"

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_HERE>
hf_token: <YOUR HF TOKEN>
```

> **Note:** Replace `<YOUR_HF_TOKEN_HERE>` with your actual Hugging Face token.
> **Note:** Replace `<YOUR HF TOKEN>` with your actual Hugging Face token.

The `lmcacheConfig` field enables LMCache and sets the CPU offloading buffer size to `20`GB. You can adjust this value based on your workload.

4 changes: 1 addition & 3 deletions tutorials/assets/values-02-basic-config.yaml
@@ -19,6 +19,4 @@ servingEngineSpec:
dtype: "bfloat16"
extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.8"]

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN>
hf_token: <YOUR HF TOKEN>
4 changes: 1 addition & 3 deletions tutorials/assets/values-03-match-pv.yaml
@@ -17,6 +17,4 @@ servingEngineSpec:
vllmConfig:
maxModelLen: 4096

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN>
hf_token: <YOUR HF TOKEN>
8 changes: 2 additions & 6 deletions tutorials/assets/values-04-multiple-models.yaml
@@ -11,9 +11,7 @@ servingEngineSpec:
pvcStorage: "50Gi"
vllmConfig:
maxModelLen: 4096
env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_FOR_LLAMA3.1>
hf_token: <YOUR HF TOKEN FOR LLAMA3.1>

- name: "mistral"
repository: "vllm/vllm-openai"
@@ -26,6 +24,4 @@ servingEngineSpec:
pvcStorage: "50Gi"
vllmConfig:
maxModelLen: 4096
env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_FOR_MISTRAL>
hf_token: <YOUR HF TOKEN FOR MISTRAL>
4 changes: 1 addition & 3 deletions tutorials/assets/values-05-cpu-offloading.yaml
@@ -18,6 +18,4 @@ servingEngineSpec:
enabled: true
cpuOffloadingBufferSize: "20"

env:
- name: HF_TOKEN
value: <YOUR_HF_TOKEN_HERE>
hf_token: <YOUR HF TOKEN>