This repository was archived by the owner on Jul 24, 2025. It is now read-only.

Commit c4a1e73 (parent: 9f3abad)

Document behaviors for model artifacts (#212)

* Document behaviors for model artifacts
* Incorporate link in user guide
* Fix numbering

Signed-off-by: Jing Chen <[email protected]>

File tree: 2 files changed, +105 -2 lines

docs/userguide.md (1 addition, 2 deletions)

```diff
@@ -13,8 +13,7 @@ Understand how `ModelService` fits into the Kubernetes ecosystem, what resources
 1. **[Model Name](userguide/model-name.md)**
    How inference clients refer to your model using OpenAI-compatible APIs.
 
-<!-- 2. **[Model Artifacts](userguide/model-artifacts.md)** -->
-2. **Model Artifacts**
+2. **[Model Artifacts](userguide/model-artifacts.md)**
    Load models from Hugging Face, PVCs, or OCI images and mount them into serving containers.
 
 <!-- 3. **[Templating Reference](userguide/templating-reference.md)** -->
```

docs/userguide/model-artifacts.md (new file, 104 additions)
@@ -0,0 +1,104 @@
# Model Artifacts

The `modelArtifacts` section under the `spec` of a `ModelService` defines how model files, such as weights and metadata configurations, are retrieved and loaded into inference backends like vLLM. This abstraction lets users specify the model source without configuring low-level details such as environment variables, volumes, or volume mounts.

## Purpose

Without `ModelService`, users must manually configure vLLM arguments, environment variables, and pod/container specifications, which requires a deep understanding of both vLLM and the composition of model artifacts. The `ModelService` controller automates these configurations, so users only need to specify the model source.

## Model Artifact Sources and Behaviors

The `modelArtifacts.uri` field determines the source of the model artifacts. Each supported prefix results in specific behaviors in the prefill and decode deployments. The following sources are supported:
### 1. Downloading a Model Directly from Hugging Face

If the `uri` begins with the `hf://` prefix, the model is downloaded directly from Hugging Face into an `emptyDir` volume.

#### URI Format

The repo and model IDs must exactly match those found in the Hugging Face model registry, as required by vLLM.

`hf://<repo-id>/<model-id>`

Example: `hf://facebook/opt-125m`

#### Additional Fields

- **`authSecretName`**: Specifies the Kubernetes Secret containing the `HF_TOKEN` for gated models.
- **`size`**: Defines the size of the `emptyDir` volume.

#### Behavior

- An `emptyDir` volume named `model-storage` is created.
- Containers with `mountModelVolume: true` get a `volumeMount` at `/model-cache`.
- The `HF_HOME` environment variable is set to `/model-cache`.
- If `authSecretName` is provided, the `HF_TOKEN` environment variable is created from that Secret.
#### Example Deployment Snippet

```yaml
volumes:
  - name: model-storage
    emptyDir: {}
containers:
  - name: vllm
    env:
      - name: HF_HOME
        value: /model-cache
      - name: HF_TOKEN
        valueFrom:
          secretKeyRef:
            name: hf-secret
            key: HF_TOKEN
    volumeMounts:
      - mountPath: /model-cache
        name: model-storage
```
57+
58+
#### Template variables
59+
60+
Various template variables are exposed as a result of using the `"hf://"` prefix, namely
61+
62+
- `{{ .HFModelName }}`: this is `<repo-id>/<model-id>` in the URI, which might be useful for vLLM arguments. Note that this is different from `{{ .ModelName }}`, which is the `spec.routing.modelName`, used for client requests
63+
- `{{ .MountedModelPath }}`: this is equal to `/model-cache`
64+
65+
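To tie these fields together, here is a sketch of what a user-facing `ModelService` spec for the Hugging Face source might look like. The field names `modelArtifacts.uri`, `authSecretName`, `size`, `mountModelVolume`, and `spec.routing.modelName` come from this guide; the `apiVersion`, the placement of the container list under `decode`, and the argument wiring are illustrative assumptions, not the controller's confirmed schema.

```yaml
# Hypothetical ModelService using the hf:// source.
# Field names are from this guide; overall layout is an assumption.
apiVersion: llm-d.ai/v1alpha1           # assumed apiVersion
kind: ModelService
metadata:
  name: opt-125m
spec:
  routing:
    modelName: opt-125m                 # name clients use in OpenAI-style requests
  modelArtifacts:
    uri: hf://facebook/opt-125m         # downloaded into an emptyDir volume
    authSecretName: hf-secret           # Secret holding HF_TOKEN (gated models)
    size: 10Gi                          # size of the emptyDir volume
  decode:                               # assumed pod-spec section name
    containers:
      - name: vllm
        mountModelVolume: true          # triggers the /model-cache volumeMount
        args:
          - "--model"
          - "{{ .HFModelName }}"        # interpolates to facebook/opt-125m
```

From a spec like this, the controller would generate the deployment snippet shown above: the volume, the `HF_HOME`/`HF_TOKEN` environment variables, and the `volumeMount` all follow from the three `modelArtifacts` fields.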
### 2. Loading a Model Directly from a PVC

Downloading large models from Hugging Face can take a significant amount of time. If a PVC already pre-populated with the model files is available, mounting it and pointing vLLM at that path can drastically shorten the engine's warm-up time.

#### URI Format

`pvc://<pvc-name>/<path/to/model>`

Example: `pvc://granite-pvc/path/to/granite`

#### Behavior

- A read-only PVC volume named `model-storage` is created for the deployment.
- A read-only `volumeMount` with `mountPath: /model-cache` is created for each container where `mountModelVolume: true`.
#### Example Deployment Snippet

```yaml
volumes:
  - name: model-storage
    persistentVolumeClaim:
      claimName: granite-pvc
      readOnly: true
containers:
  - name: vllm
    volumeMounts:
      - mountPath: /model-cache
        name: model-storage
```
95+
96+
#### Template variables
97+
98+
Various template variable are exposed as a result of using the `"pvc://"` prefix, with `.MountedModelPath` being particularly useful if vLLM arguments require it.
99+
100+
- `{{ .MountedModelPath }}`: this is equal to `/model-cache/<path/to/model>` where `</path/to/model>` comes from the URI. In the above example, `{{ .MountedModelPath }}` interpolates to `/model-cache/path/to/granite`
101+
102+
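Analogously, a sketch of a `ModelService` spec using the PVC source, showing where `{{ .MountedModelPath }}` would be consumed. Only `modelArtifacts.uri`, `mountModelVolume`, and the template variable come from this guide; the surrounding schema is assumed for illustration.

```yaml
# Hypothetical ModelService using the pvc:// source.
apiVersion: llm-d.ai/v1alpha1                 # assumed apiVersion
kind: ModelService
metadata:
  name: granite
spec:
  modelArtifacts:
    uri: pvc://granite-pvc/path/to/granite    # mounted read-only at /model-cache
  decode:                                     # assumed pod-spec section name
    containers:
      - name: vllm
        mountModelVolume: true                # triggers the read-only volumeMount
        args:
          - "--model"
          - "{{ .MountedModelPath }}"         # /model-cache/path/to/granite
```

Because the files are already on the PVC, no download step or `HF_TOKEN` wiring is needed; the controller only has to mount the claim and hand vLLM the local path.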
### 3. Loading the Model from an Image Volume

Not yet implemented.
