Commit 44ee8ac

Authored by saturley-hall, biswapanda, FortunaZhang
feat: GKE examples (#2721) (#3839)
Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: FortunaZhang <[email protected]> Signed-off-by: Harrison King Saturley-Hall <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: FortunaZhang <[email protected]>
1 parent f938d03 commit 44ee8ac

File tree

3 files changed: +311 -0 lines changed

3 files changed

+311
-0
lines changed

examples/deployments/GKE/README.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Dynamo Deployment on GKE

## Prerequisites

### Install the gcloud CLI

https://cloud.google.com/sdk/docs/install

### Create a GKE cluster

```bash
export PROJECT_ID=<>
export REGION=<>
export ZONE=<>
export CLUSTER_NAME=<>
export CLUSTER_MACHINE_TYPE=n2-standard-4
export NODE_POOL_MACHINE_TYPE=g2-standard-24
export GPU_TYPE=nvidia-l4
export GPU_COUNT=2
export CPU_NODE=2
export GPU_NODE=2
export DISK_SIZE=200

gcloud container clusters create ${CLUSTER_NAME} \
    --project=${PROJECT_ID} \
    --location=${ZONE} \
    --subnetwork=default \
    --disk-size=${DISK_SIZE} \
    --machine-type=${CLUSTER_MACHINE_TYPE} \
    --num-nodes=${CPU_NODE}
```
#### Create a GPU node pool

```bash
gcloud container node-pools create gpu-pool \
    --accelerator type=${GPU_TYPE},count=${GPU_COUNT},gpu-driver-version=latest \
    --project=${PROJECT_ID} \
    --location=${ZONE} \
    --cluster=${CLUSTER_NAME} \
    --machine-type=${NODE_POOL_MACHINE_TYPE} \
    --disk-size=${DISK_SIZE} \
    --num-nodes=${GPU_NODE} \
    --enable-autoscaling \
    --min-nodes=1 \
    --max-nodes=3
```
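Before running the `kubectl` commands in the following sections, make sure your local kubeconfig points at the new cluster. If it does not, fetch credentials first (a standard gcloud step; the flags reuse the variables exported above):

```bash
gcloud container clusters get-credentials ${CLUSTER_NAME} \
    --location=${ZONE} \
    --project=${PROJECT_ID}
```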
### Clone the Dynamo GitHub repository

**Note:** Make sure the checked-out branch/commit matches the versions of the Dynamo platform and the vLLM container you deploy.

```bash
git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo

# Check out the desired branch
git checkout release/0.6.0
```

### Set environment variables for GKE

```bash
export NAMESPACE=dynamo-system
kubectl create namespace $NAMESPACE
kubectl config set-context --current --namespace=$NAMESPACE

export HF_TOKEN=<HF_TOKEN>
kubectl create secret generic hf-token-secret \
    --from-literal=HF_TOKEN=${HF_TOKEN} \
    -n ${NAMESPACE}
```
## Install Dynamo Kubernetes Platform

[See installation steps](/docs/kubernetes/installation_guide.md#overview)

After installation, verify that the platform pods are running.

**Expected output**

```bash
kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
dynamo-platform-dynamo-operator-controller-manager-69b9794fpgv9   2/2     Running   0          4m27s
dynamo-platform-etcd-0                                            1/1     Running   0          4m27s
dynamo-platform-nats-0                                            2/2     Running   0          4m27s
```
## Deploy Inference Graph

We will deploy an LLM on the Dynamo platform. As an example, we use the `Qwen/Qwen3-0.6B` model with vLLM in a disaggregated deployment.

In the deployment YAML file, some adjustments are required and others are optional:

- **(Required)** Add args that set `LD_LIBRARY_PATH` and `PATH` in the worker containers, so that GKE can find the correct GPU driver
- Change the vLLM image to the desired one on NGC
- Add a namespace to the metadata
- Adjust GPU/CPU requests and limits
- Change the model to deploy

For more configurations, refer to https://github.com/ai-dynamo/dynamo/tree/main/examples/deployments/GKE/vllm

### Highlighted configurations in the YAML file

Note that `LD_LIBRARY_PATH` needs to be set properly in GKE, as described in [Run GPUs in GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus).

The following snippet needs to be present in the `args` field of the deployment YAML file:

```bash
export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
/sbin/ldconfig
```
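As a quick local sanity check before applying a manifest, you can confirm these lines are present with a plain substring search. The sketch below is a hypothetical helper, not part of Dynamo; it uses only the Python standard library and does not parse YAML:

```python
# Check that a GKE deployment manifest contains the required
# GPU-driver path-setup lines in its worker args.
REQUIRED_LINES = [
    "export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH",
    "export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64",
    "/sbin/ldconfig",
]

def missing_driver_setup(manifest_text: str) -> list:
    """Return the required lines that are absent from the manifest text."""
    return [line for line in REQUIRED_LINES if line not in manifest_text]

# Example usage with an inline fragment of a worker spec:
fragment = """
args:
  - |
    export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
    export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
    /sbin/ldconfig
    python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B
"""
print(missing_driver_setup(fragment))  # [] when nothing is missing
```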
For example, refer to the following snippet from [`examples/deployments/GKE/vllm/disagg_gke.yaml`](./vllm/disagg_gke.yaml):

```yaml
metadata:
  name: vllm-disagg
  namespace: dynamo-system
spec:
  services:
    Frontend:
      image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
    VllmDecodeWorker:
      resources:
        limits:
          gpu: "3"
      image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
      args:
        - |
          export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
          export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
          /sbin/ldconfig
          python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B
```
## Deploy the model

```bash
cd dynamo/examples/deployments/GKE/vllm

kubectl apply -f disagg_gke.yaml -n ${NAMESPACE}
```

**Expected output after successful deployment**

```bash
kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
dynamo-platform-dynamo-operator-controller-manager-c665684ssqkx   2/2     Running   0          65m
dynamo-platform-etcd-0                                            1/1     Running   0          65m
dynamo-platform-nats-0                                            2/2     Running   0          65m
vllm-disagg-frontend-5954ddc4dd-4w2cb                             1/1     Running   0          11m
vllm-disagg-vllmdecodeworker-77844cfcff-ddn4v                     1/1     Running   0          11m
vllm-disagg-vllmprefillworker-55d5b74b4f-zrskh                    1/1     Running   0          11m
```
## Test the Deployment

```bash
export DEPLOYMENT_NAME=vllm-disagg

# Find the frontend pod
export FRONTEND_POD=$(kubectl get pods -n ${NAMESPACE} | grep "${DEPLOYMENT_NAME}-frontend" | sort -k1 | tail -n1 | awk '{print $1}')

# Forward the pod's port to localhost
kubectl port-forward pod/${FRONTEND_POD} 8000:8000 -n ${NAMESPACE}

# disagg
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {
        "role": "user",
        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
      }
    ],
    "stream": false,
    "max_tokens": 30
  }'
```
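The pod-selection pipeline above can be exercised locally against canned `kubectl get pods` output (the pod names below are taken from the expected output shown earlier):

```shell
# Simulate `kubectl get pods` output and pick the frontend pod name,
# mirroring the grep | sort | tail | awk pipeline above.
DEPLOYMENT_NAME=vllm-disagg
PODS='dynamo-platform-etcd-0                          1/1   Running   0   65m
vllm-disagg-frontend-5954ddc4dd-4w2cb           1/1   Running   0   11m
vllm-disagg-vllmdecodeworker-77844cfcff-ddn4v   1/1   Running   0   11m'

FRONTEND_POD=$(printf '%s\n' "$PODS" | grep "${DEPLOYMENT_NAME}-frontend" | sort -k1 | tail -n1 | awk '{print $1}')
echo "$FRONTEND_POD"   # vllm-disagg-frontend-5954ddc4dd-4w2cb
```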
### Response

```json
{"id":"chatcmpl-bd0670d9-0342-4eea-97c1-99b69f1f931f","choices":[{"index":0,"message":{"content":"Okay, here’s a detailed character background for your intrepid explorer, tailored to fit the premise of Aeloria, with a focus on a","refusal":null,"tool_calls":null,"role":"assistant","function_call":null,"audio":null},"finish_reason":"stop","logprobs":null}],"created":1756336263,"model":"Qwen/Qwen3-0.6B","service_tier":null,"system_fingerprint":null,"object":"chat.completion","usage":{"prompt_tokens":190,"completion_tokens":29,"total_tokens":219,"prompt_tokens_details":null,"completion_tokens_details":null}}
```
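A small sketch of extracting the reply and token counts from such a response with the Python standard library (applied here to an abbreviated copy of the response above):

```python
import json

# Abbreviated copy of the chat-completions response shown above.
raw = '''{"id":"chatcmpl-bd0670d9-0342-4eea-97c1-99b69f1f931f",
"choices":[{"index":0,
            "message":{"content":"Okay, here is a detailed character background for your intrepid explorer",
                       "role":"assistant"},
            "finish_reason":"stop"}],
"model":"Qwen/Qwen3-0.6B","object":"chat.completion",
"usage":{"prompt_tokens":190,"completion_tokens":29,"total_tokens":219}}'''

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
tokens = resp["usage"]["total_tokens"]
print(answer[:20])  # first 20 characters of the reply
print(tokens)       # 219
```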
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: sglang-disagg
spec:
  services:
    Frontend:
      dynamoNamespace: sglang-disagg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: my-registry/sglang-runtime:my-tag
    decode:
      envFromSecret: hf-token-secret
      dynamoNamespace: sglang-disagg
      componentType: worker
      subComponentType: decode
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: my-registry/sglang-runtime:my-tag
          workingDir: /workspace/components/backends/sglang
          command:
            - /bin/sh
            - -c
          args:
            - |
              export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
              export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
              /sbin/ldconfig
              nvidia-smi
              exec python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --page-size 16 --tp 1 --trust-remote-code --skip-tokenizer-init --disaggregation-mode decode --disaggregation-transfer-backend nixl --disaggregation-bootstrap-port "12345" --host "0.0.0.0"
    prefill:
      envFromSecret: hf-token-secret
      dynamoNamespace: sglang-disagg
      componentType: worker
      subComponentType: prefill
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: my-registry/sglang-runtime:my-tag
          workingDir: /workspace/components/backends/sglang
          command:
            - /bin/sh
            - -c
          args:
            - |
              export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
              export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
              /sbin/ldconfig
              nvidia-smi
              exec python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --page-size 16 --tp 1 --trust-remote-code --skip-tokenizer-init --disaggregation-mode prefill --disaggregation-transfer-backend nixl --disaggregation-bootstrap-port "12345" --host "0.0.0.0"
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: my-registry/vllm-runtime:my-tag
    VllmDecodeWorker:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      subComponentType: decode
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          startupProbe:
            initialDelaySeconds: 180
          image: my-registry/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - |
              export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
              export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
              /sbin/ldconfig
              python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B
    VllmPrefillWorker:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      subComponentType: prefill
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: my-registry/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - |
              export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
              export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib64
              /sbin/ldconfig
              python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --is-prefill-worker
