@wangshangsam commented Dec 10, 2025

This PR fixes two issues:

  1. When DefaultAioHttpClient is used in AsyncOpenAI, DefaultAioHttpClient's HTTP request timeout overrides AsyncOpenAI's chat completion request timeout, so the timeout must also be set on DefaultAioHttpClient.
  2. Previously, VllmDeployer would still wait 20 mins (or whatever the timeout is) before failing. Now it fails fast as soon as the underlying vllm serve process fails.
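
For illustration, the fail-fast behavior in item 2 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `wait_until_ready`, `ServerExitedError`, and the `is_ready` callback are hypothetical names, and a real deployer would presumably probe the server's health endpoint inside `is_ready`.

```python
import subprocess
import sys
import time
from typing import Callable


class ServerExitedError(RuntimeError):
    """Hypothetical error: the server process exited before becoming ready."""


def wait_until_ready(
    proc: subprocess.Popen,
    is_ready: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.1,
) -> None:
    """Wait for readiness, but stop as soon as the child process has exited."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Popen.poll() returns None while the process is still running,
        # and its exit code once it has terminated.
        returncode = proc.poll()
        if returncode is not None:
            raise ServerExitedError(
                f"server process exited early with code {returncode}"
            )
        if is_ready():
            return
        time.sleep(poll_interval_s)
    raise TimeoutError(f"server not ready after {timeout_s} seconds")


if __name__ == "__main__":
    # Stand-in for a crashing server: a child process that exits immediately.
    crashing = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    try:
        wait_until_ready(crashing, is_ready=lambda: False, timeout_s=1200)
    except ServerExitedError as exc:
        print(f"failed fast: {exc}")
```

Checking the process state on every poll iteration means a crash surfaces within one poll interval instead of after the full readiness timeout.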

In addition, this PR also:
3. Introduces example Slurm scripts that demonstrate how to run the benchmark on a GPU cluster managed by Slurm.
4. Changes the default server_expected_qps and server_target_latency to the values that we are proposing:

  • server_expected_qps = 5
  • server_target_latency = timedelta(seconds=12)
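
As a rough illustration of the proposed defaults, they might look like this in a settings object. This is only a sketch: the `ServerSettings` class and the `target_latency_ns` helper are hypothetical and not taken from this PR; only the two field names and their values come from the description above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for the two defaults proposed in this PR."""

    server_expected_qps: float = 5
    server_target_latency: timedelta = timedelta(seconds=12)

    def target_latency_ns(self) -> int:
        # Illustrative helper: exact integer conversion to nanoseconds.
        return (self.server_target_latency // timedelta(microseconds=1)) * 1000


if __name__ == "__main__":
    settings = ServerSettings()
    print(settings.server_expected_qps, settings.target_latency_ns())
```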

@wangshangsam requested a review from a team as a code owner December 10, 2025 06:21
github-actions bot commented Dec 10, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@johncalesp left a comment

Looks good, thank you!

@Victor49152 left a comment

Verified over a few runs; works well.

@wangshangsam

This PR is ready. @mrmhodak @hanyunfan Could you help approve and merge this PR?

@wangshangsam

@nvzhihanj do you wanna take another look? If it LGTY, we can merge it.
