🚀 The feature, motivation and pitch
I'm trying to re-create the InferenceMAX benchmark suite: https://inferencemax.semianalysis.com/
It essentially uses extracted vLLM bench code to test multiple frameworks (vLLM, sglang, trt-llm) and builds Pareto-front curves of e.g. throughput vs. latency.
I'm only interested in vLLM, so I can use vllm bench sweep directly. However, I run into one problem: I don't see an option for passthrough parameters, i.e. parameters that are linked between the serve and bench serve commands.
For example, --max-num-seqs and --max-concurrency are fundamentally the same variable (for my purposes), but one lives in the serve parameter space and the other in the bench serve parameter space.
Because the sweep takes the Cartesian product of the two configs, I end up with many nonsensical runs: sweeping three values of each gives nine runs, of which only the three "diagonal" combinations where both values match are meaningful.
The only option I have at the moment is to run an external loop over bench sweep, setting these variables to the same fixed value, and then merging the summary.json files myself at the end (sketched below), which is not great.
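For concreteness, here is roughly what that workaround looks like: an outer loop fixes the linked value, runs one sweep per value, and merges the resulting summary.json files. This is only a sketch; the --serve-args / --bench-args / --output-dir flags and the summary.json layout are assumptions for illustration, not the actual vllm bench sweep interface.

```python
# Sketch of the current workaround: drive the linked parameter from an outer
# loop, run one sweep per value, then merge the per-run summary.json files.
# The exact CLI flags and output layout below are assumptions, not the real
# `vllm bench sweep` interface.
import json
import subprocess
from pathlib import Path

SHARED_VALUES = [16, 32, 64, 128]  # shared by --max-num-seqs / --max-concurrency
merged = []

for value in SHARED_VALUES:
    out_dir = Path(f"sweep_out/linked_{value}")
    out_dir.mkdir(parents=True, exist_ok=True)

    # One sweep per fixed value of the linked parameter (flag names assumed).
    subprocess.run(
        [
            "vllm", "bench", "sweep",
            "--serve-args", json.dumps({"max_num_seqs": value}),
            "--bench-args", json.dumps({"max_concurrency": value}),
            "--output-dir", str(out_dir),
        ],
        check=True,
    )

    # Merge every summary.json this sweep produced, assuming each one holds a
    # list of per-run record dicts; tag each record with the linked value so
    # the merged result can still be plotted as a single Pareto front.
    for summary_path in out_dir.rglob("summary.json"):
        data = json.loads(summary_path.read_text())
        for entry in data if isinstance(data, list) else [data]:
            entry["linked_value"] = value
            merged.append(entry)

Path("sweep_out/merged_summary.json").write_text(json.dumps(merged, indent=2))
```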
I'm not entirely sure how one could go about implementing such a feature; clearly the parameters would have to become parametric in some way. Perhaps defining another config with fixed values and referencing those values from the other configs could do the trick (roughly as sketched below).
However, as far as I know, there is no built-in way to make JSON values parametric in Python.
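Purely as an illustration of the idea, a small resolver could expand a shared config once and substitute its values into both the serve and bench configs, so linked parameters always move together instead of being crossed. The ${name} placeholder syntax and file names below are made up for this sketch, not an existing vLLM feature.

```python
# Hypothetical sketch of "linked" parameters: a shared config holds the swept
# values, and the serve/bench configs reference them with a ${name}
# placeholder. Each shared variable is expanded once, so both configs always
# receive the same value instead of a Cartesian product of the two.
# None of this is existing vLLM behaviour; syntax and file names are made up.
import itertools
import json
import re
from pathlib import Path

_PLACEHOLDER = re.compile(r"^\$\{(?P<name>[\w.-]+)\}$")

def substitute(node, values):
    """Recursively replace ${name} string values with values[name]."""
    if isinstance(node, dict):
        return {k: substitute(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, values) for v in node]
    if isinstance(node, str) and (m := _PLACEHOLDER.match(node)):
        return values[m.group("name")]
    return node

# shared.json:       {"concurrency": [16, 32, 64, 128]}
# serve_params.json: {"max_num_seqs": "${concurrency}"}
# bench_params.json: {"max_concurrency": "${concurrency}", "request_rate": "inf"}
shared = json.loads(Path("shared.json").read_text())
serve_tpl = json.loads(Path("serve_params.json").read_text())
bench_tpl = json.loads(Path("bench_params.json").read_text())

# Expand each shared variable once and substitute it into both templates,
# yielding linked (serve, bench) pairs rather than a cross product.
names = list(shared)
for combo in itertools.product(*(shared[n] for n in names)):
    values = dict(zip(names, combo))
    print(substitute(serve_tpl, values), substitute(bench_tpl, values))
```

With one shared variable of four values, this yields four linked runs instead of the sixteen that a Cartesian product of the two configs would produce.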