
Commit a22cf24

hhzhang16 and tedzhouhk authored

fix: bug fixes for planner tests (#3821) (#3835)

Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Co-authored-by: hongkuanz <[email protected]>

1 parent 44ee8ac commit a22cf24

File tree

5 files changed: +72 −121 lines changed

components/src/dynamo/planner/utils/prometheus.py

Lines changed: 18 additions & 2 deletions

```diff
@@ -63,6 +63,11 @@ def _get_average_metric(
         Average metric value or 0 if no data/error
         """
         try:
+            # Prepend the frontend metric prefix if not already present
+            if not full_metric_name.startswith(prometheus_names.name_prefix.FRONTEND):
+                full_metric_name = (
+                    f"{prometheus_names.name_prefix.FRONTEND}_{full_metric_name}"
+                )
             query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
             result = self.prom.custom_query(query=query)
             if not result:
@@ -75,8 +80,10 @@ def _get_average_metric(
 
             values = []
             for container in metrics_containers:
+                # Frontend lowercases model names for Prometheus labels so we need to do case-insensitive comparison
                 if (
-                    container.metric.model == model_name
+                    container.metric.model
+                    and container.metric.model.lower() == model_name.lower()
                     and container.metric.dynamo_namespace == self.dynamo_namespace
                 ):
                     values.append(container.value[1])
@@ -120,14 +127,23 @@ def get_avg_request_count(self, interval: str, model_name: str):
         # This function follows a different query pattern than the other metrics
         try:
             requests_total_metric = prometheus_names.frontend_service.REQUESTS_TOTAL
+            # Prepend the frontend metric prefix if not already present
+            if not requests_total_metric.startswith(
+                prometheus_names.name_prefix.FRONTEND
+            ):
+                requests_total_metric = (
+                    f"{prometheus_names.name_prefix.FRONTEND}_{requests_total_metric}"
+                )
             raw_res = self.prom.custom_query(
                 query=f"increase({requests_total_metric}[{interval}])"
             )
             metrics_containers = parse_frontend_metric_containers(raw_res)
             total_count = 0.0
             for container in metrics_containers:
+                # Frontend lowercases model names for Prometheus labels so we need to do case-insensitive comparison
                 if (
-                    container.metric.model == model_name
+                    container.metric.model
+                    and container.metric.model.lower() == model_name.lower()
                     and container.metric.dynamo_namespace == self.dynamo_namespace
                 ):
                     total_count += container.value[1]
```
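The two fixes in this file, prefix normalization and case-insensitive model matching, can be sketched as standalone helpers (a minimal sketch: `build_avg_query`, `model_matches`, and the literal `dynamo_frontend` prefix are illustrative assumptions, not the module's actual names):

```python
def build_avg_query(metric: str, interval: str, prefix: str = "dynamo_frontend") -> str:
    """Build the average-over-interval PromQL query, prepending the
    frontend prefix only when it is not already present (idempotent)."""
    if not metric.startswith(prefix):
        metric = f"{prefix}_{metric}"
    return f"increase({metric}_sum[{interval}])/increase({metric}_count[{interval}])"


def model_matches(label_value, requested_model: str) -> bool:
    """Case-insensitive model comparison: the frontend lowercases model
    names in Prometheus labels, and the label may be missing entirely."""
    return bool(label_value) and label_value.lower() == requested_model.lower()
```

Making the prefix check idempotent means callers can pass either the short metric name or the fully qualified one without producing a double prefix.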

docs/planner/sla_planner_quickstart.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -38,7 +38,7 @@ flowchart TD
 
 Before deploying the SLA planner, ensure:
 - **Dynamo platform installed** (see [Installation Guide](/docs/kubernetes/installation_guide.md))
-- **[kube-prometheus-stack](/docs/kubernetes/metrics.md) installed and running.** By default, the prometheus server is not deployed in the `monitoring` namespace. If it is deployed to a different namespace, set `dynamo-operator.dynamo.metrics.prometheusEndpoint="http://prometheus-kube-prometheus-prometheus.<namespace>.svc.cluster.local:9090"`.
+- **[kube-prometheus-stack](/docs/kubernetes/metrics.md) installed and running.** By default, the prometheus server is deployed in the `monitoring` namespace. If it is deployed to a different namespace, set `dynamo-operator.dynamo.metrics.prometheusEndpoint="http://prometheus-kube-prometheus-prometheus.<namespace>.svc.cluster.local:9090"`.
 - **Benchmarking resources setup** (see [Kubernetes utilities for Dynamo Benchmarking and Profiling](../../deploy/utils/README.md)) The script will create a `dynamo-pvc` with `ReadWriteMany` access, if your cluster's default storageClassName does not allow `ReadWriteMany`, you need to specify a different storageClassName in `deploy/utils/manifests/pvc.yaml` which does support `ReadWriteMany`.
```
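The endpoint override described in the corrected doc line follows a fixed in-cluster DNS pattern; a minimal sketch of building it (the helper name is hypothetical, and the service name is the kube-prometheus-stack default):

```python
def prometheus_endpoint(namespace: str = "monitoring") -> str:
    """Build the in-cluster Prometheus endpoint URL expected by the
    dynamo-operator.dynamo.metrics.prometheusEndpoint override.
    The service name matches the kube-prometheus-stack default."""
    return (
        "http://prometheus-kube-prometheus-prometheus."
        f"{namespace}.svc.cluster.local:9090"
    )
```

For example, a stack installed in an `observability` namespace (a placeholder name) would need the operator pointed at `prometheus_endpoint("observability")`.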

tests/planner/README.md

Lines changed: 48 additions & 4 deletions

````diff
@@ -160,20 +160,64 @@ PYTHONPATH=../../components/src python -m pytest test_replica_calculation.py -v
 **Note**: The unit tests automatically mock external dependencies (prometheus_client, runtime modules) to ensure they can run in isolation without requiring the full Dynamo environment.
 
 #### Run Full End-to-End Test
-Test complete scaling behavior including Kubernetes deployment and load generation:
+
+Test complete scaling behavior including Kubernetes deployment and load generation.
+
+**Prerequisites:**
+
+- **[kube-prometheus-stack](../../docs/kubernetes/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
+- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/planner/sla_planner_quickstart.md#prerequisites) for details).
+
+**Prepare the test deployment manifest:**
+
+The test requires modifying `components/backends/vllm/deploy/disagg_planner.yaml` with test-specific planner arguments:
+
+1. Copy the base deployment:
 
 ```bash
-./scaling/run_scaling_test.sh
+cp components/backends/vllm/deploy/disagg_planner.yaml tests/planner/scaling/disagg_planner.yaml
 ```
 
-With custom namespace:
+2. Edit `tests/planner/scaling/disagg_planner.yaml`. Ensure all services use the correct image. Modify the Planner service args:
+
+```yaml
+spec:
+  services:
+    Planner:
+      extraPodSpec:
+        mainContainer:
+          args:
+            - --environment=kubernetes
+            - --backend=vllm
+            - --adjustment-interval=60
+            - --profile-results-dir=/workspace/tests/planner/profiling_results/H200_TP1P_TP1D
+            - --ttft=100
+            - --itl=10
+            - --load-predictor=constant
+            - --no-correction
+```
+
+3. Update the model in VllmPrefillWorker and VllmDecodeWorker services:
+
+```yaml
+args:
+  - -m
+  - dynamo.vllm
+  - --model
+  - nvidia/Llama-3.1-8B-Instruct-FP8
+  - --migration-limit=3
+  - --max-model-len=8192
+```
+
+**Run the test:**
+
 ```bash
 ./scaling/run_scaling_test.sh --namespace <namespace>
 ```
 
 To save results to `tests/planner/e2e_scaling_results` instead of `/tmp`:
 ```bash
-./scaling/run_scaling_test.sh --save-results
+./scaling/run_scaling_test.sh --namespace <namespace> --save-results
 ```
 
 **E2E Test Deployment Management:**
````

tests/planner/scaling/disagg_planner.yaml

Lines changed: 0 additions & 111 deletions
This file was deleted.

tests/planner/test_scaling_e2e.py

Lines changed: 5 additions & 3 deletions

```diff
@@ -97,13 +97,15 @@ def get_pod_counts(self) -> Optional[PodCounts]:
         for pod in data.get("items", []):
             pod_phase = pod.get("status", {}).get("phase", "")
             pod_labels = pod.get("metadata", {}).get("labels", {})
-            component = pod_labels.get("nvidia.com/dynamo-component", "")
+            sub_component = pod_labels.get(
+                "nvidia.com/dynamo-sub-component-type", ""
+            )
 
             # Only count Running pods
             if pod_phase == "Running":
-                if component == "VllmPrefillWorker":
+                if sub_component == "prefill":
                     prefill_pods += 1
-                elif component == "VllmDecodeWorker":
+                elif sub_component == "decode":
                     decode_pods += 1
             else:
                 continue
```
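The label-based counting this change switches to can be sketched as a standalone helper over the pod-list JSON shape the test reads (a minimal sketch; `count_worker_pods` is a hypothetical name):

```python
def count_worker_pods(pod_list: dict) -> tuple:
    """Count Running prefill/decode pods by the
    nvidia.com/dynamo-sub-component-type label instead of the
    per-service component name, so renamed services still match."""
    prefill_pods = decode_pods = 0
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "")
        labels = pod.get("metadata", {}).get("labels", {})
        sub_component = labels.get("nvidia.com/dynamo-sub-component-type", "")
        if phase != "Running":
            continue  # only count Running pods
        if sub_component == "prefill":
            prefill_pods += 1
        elif sub_component == "decode":
            decode_pods += 1
    return prefill_pods, decode_pods
```

Keying on the role-level label rather than the component name (`VllmPrefillWorker`/`VllmDecodeWorker`) is what makes the count robust to service renames in the deployment manifest.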
