[VLM] Fixing request timeout error, and enabling VllmDeployer to fail fast if the underlying vllm serve process has already failed
#2409
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
johncalesp left a comment:
Looks good, thank you!
Victor49152 left a comment:
Verified for a few runs and it works well.
This PR is ready. @mrmhodak @hanyunfan Could you help to approve and merge this PR?
@nvzhihanj do you wanna take another look? If it LGTY, we can merge it.
This PR fixes two issues:
1. When using `DefaultAioHttpClient` in `AsyncOpenAI`, it turns out that `DefaultAioHttpClient`'s HTTP request timeout setting would override `AsyncOpenAI`'s chat completion request timeout setting; therefore, I also need to set the timeout in `DefaultAioHttpClient` as well.
2. With `VllmDeployer`, it would still wait 20 mins (or whatever the timeout is) to fail. Now it can fail fast as soon as the underlying `vllm serve` process fails.

In addition, this PR also:
3. Introduces example Slurm scripts that demonstrate how to run the benchmark in a GPU cluster managed by Slurm.
4. Changes the default `server_expected_qps` and `server_target_latency` to the values that we are proposing:
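The fail-fast behavior in item 2 can be sketched roughly as follows. This is a minimal stdlib sketch of the pattern, not the actual `VllmDeployer` code; the names `wait_for_server`, `is_ready`, and the timeout values are illustrative:

```python
import subprocess
import time


def wait_for_server(proc: subprocess.Popen, is_ready, timeout_s: float,
                    poll_interval_s: float = 0.5) -> None:
    """Block until is_ready() reports the server is up, but fail fast if
    the child process has already exited instead of waiting out the full
    timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rc = proc.poll()
        if rc is not None:
            # The server process died; no point waiting out the timeout.
            raise RuntimeError(f"server process exited early with code {rc}")
        if is_ready():
            return
        time.sleep(poll_interval_s)
    raise TimeoutError(f"server not ready within {timeout_s}s")
```

Here `is_ready()` stands in for whatever readiness probe the deployer uses (e.g. polling the server's health endpoint); the key point is checking `proc.poll()` on every iteration, so a crashed `vllm serve` surfaces immediately rather than after the full 20-minute wait.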