
[Feature]: Add passthrough arguments for vllm bench sweep #28548

@robogast

Description


🚀 The feature, motivation and pitch

I'm trying to recreate the InferenceMAX benchmark suite: https://inferencemax.semianalysis.com/
It essentially uses extracted vLLM bench code to test multiple frameworks (vLLM, SGLang, TRT-LLM) and builds Pareto-front curves of, e.g., throughput vs. latency.

I'm only interested in vLLM, so I can use `vllm bench sweep` directly. However, I run into one problem: I don't see an option for passthrough parameters, i.e. parameters that are linked between the `serve` and `bench serve` commands.
For example, `--max-num-seqs` and `--max-concurrency` are fundamentally the same variable (for my purposes), but one lives in the `serve` parameter space and the other in the `bench serve` parameter space.
Because the Cartesian product is taken between the two configs, I end up with many nonsensical runs.

The only option I have at the moment is to run an external loop over `bench sweep`, setting these variables to the same fixed value on each iteration, and then merging the summary.json files myself at the end, which is not so nice.
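As a rough sketch of the merging step in that workaround (the file layout is an assumption on my side, not the actual `vllm bench sweep` output format):

```python
import json
from pathlib import Path

def merge_summaries(summary_paths, out_path):
    """Combine per-iteration summary.json files from the external loop into
    one file. Assumes each file holds a JSON list of result records; the
    real vllm bench sweep layout may differ."""
    merged = []
    for path in summary_paths:
        merged.extend(json.loads(Path(path).read_text()))
    Path(out_path).write_text(json.dumps(merged, indent=2))
    return merged
```

This keeps the combined results in one place for plotting the Pareto fronts, but it still means re-implementing bookkeeping that `bench sweep` already does internally.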

I'm not entirely sure how one would implement such a feature; clearly the parameters would have to become parametric in some way. Perhaps defining another config with fixed values and referencing those values from the other configs would do the trick.
As far as I know, however, there is no standard way of making JSON values parametric in Python.
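To illustrate the kind of expansion I have in mind (the config shape and the `shared`/axis names are hypothetical, not existing `bench sweep` options; only `--max-num-seqs` and `--max-concurrency` are real flags): linked values are iterated once and injected into both parameter spaces, so the Cartesian product only runs over the genuinely independent axes.

```python
import itertools

# Hypothetical config layout: one linked axis plus independent axes per side.
shared = {"concurrency": [32, 64, 128]}           # linked serve/bench axis
serve_space = {"--tensor-parallel-size": [1, 2]}  # independent serve axis
bench_space = {"--request-rate": [1.0, "inf"]}    # independent bench axis

def expand(shared, serve_space, bench_space):
    """Expand configs so each linked value appears once on both sides,
    instead of taking the full Cartesian product across the two spaces."""
    runs = []
    for c in shared["concurrency"]:
        serve_keys, serve_vals = zip(*serve_space.items())
        bench_keys, bench_vals = zip(*bench_space.items())
        for sv in itertools.product(*serve_vals):
            for bv in itertools.product(*bench_vals):
                serve = dict(zip(serve_keys, sv))
                bench = dict(zip(bench_keys, bv))
                serve["--max-num-seqs"] = c      # linked on the serve side...
                bench["--max-concurrency"] = c   # ...and on the bench side
                runs.append((serve, bench))
    return runs
```

With these example axes this yields 3 × 2 × 2 = 12 consistent runs rather than the 36 a full Cartesian product would produce, a third of which would pair mismatched `--max-num-seqs`/`--max-concurrency` values.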
