🚀 The feature, motivation and pitch
I'm trying to re-create the InferenceMAX benchmark suite: https://inferencemax.semianalysis.com/
It essentially uses extracted vLLM bench code to test multiple frameworks (vLLM, sglang, trt-llm) and builds Pareto-front curves of e.g. throughput vs. latency.
I'm only interested in vLLM, so I can use vllm bench sweep directly. However, I run into one problem: I don't see an option for passthrough parameters, i.e. parameters that are linked between the serve and bench serve commands.
For example, --max-num-seqs and --max-concurrency are fundamentally the same variable (for my purposes), but one lives in the serve parameter space and the other in the bench serve parameter space.
Because the sweep takes the Cartesian product of the two configs, I end up with many nonsensical runs: sweeping three values of each gives nine runs, of which only the three "diagonal" combinations where both values match are meaningful.
The only option I have at the moment is to run an external loop over bench sweep, setting these variables to the same fixed value, and then merging the summary.json files myself at the end (sketched below), which is not great.
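For concreteness, here is roughly what that workaround looks like: an outer loop fixes the linked value, runs one sweep per value, and merges the resulting summary.json files. This is only a sketch; the --serve-args / --bench-args / --output-dir flags and the summary.json layout are assumptions for illustration, not the actual vllm bench sweep interface.

```python
# Sketch of the current workaround: drive the linked parameter from an outer
# loop, run one sweep per value, then merge the per-run summary.json files.
# The exact CLI flags and output layout below are assumptions, not the real
# `vllm bench sweep` interface.
import json
import subprocess
from pathlib import Path

SHARED_VALUES = [16, 32, 64, 128]  # shared by --max-num-seqs / --max-concurrency
merged = []

for value in SHARED_VALUES:
    out_dir = Path(f"sweep_out/linked_{value}")
    out_dir.mkdir(parents=True, exist_ok=True)

    # One sweep per fixed value of the linked parameter (flag names assumed).
    subprocess.run(
        [
            "vllm", "bench", "sweep",
            "--serve-args", json.dumps({"max_num_seqs": value}),
            "--bench-args", json.dumps({"max_concurrency": value}),
            "--output-dir", str(out_dir),
        ],
        check=True,
    )

    # Merge every summary.json this sweep produced, assuming each one holds a
    # list of per-run record dicts; tag each record with the linked value so
    # the merged result can still be plotted as a single Pareto front.
    for summary_path in out_dir.rglob("summary.json"):
        data = json.loads(summary_path.read_text())
        for entry in data if isinstance(data, list) else [data]:
            entry["linked_value"] = value
            merged.append(entry)

Path("sweep_out/merged_summary.json").write_text(json.dumps(merged, indent=2))
```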
I'm not entirely sure how one could go about implementing such a feature; clearly the parameters would have to become parametric in some way. Perhaps defining another config with fixed values and referencing those values from the other configs could do the trick (roughly as sketched below).
However, as far as I know, there is no built-in way to make JSON values parametric in Python.
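Purely as an illustration of the idea, a small resolver could expand a shared config once and substitute its values into both the serve and bench configs, so linked parameters always move together instead of being crossed. The ${name} placeholder syntax and file names below are made up for this sketch, not an existing vLLM feature.

```python
# Hypothetical sketch of "linked" parameters: a shared config holds the swept
# values, and the serve/bench configs reference them with a ${name}
# placeholder. Each shared variable is expanded once, so both configs always
# receive the same value instead of a Cartesian product of the two.
# None of this is existing vLLM behaviour; syntax and file names are made up.
import itertools
import json
import re
from pathlib import Path

_PLACEHOLDER = re.compile(r"^\$\{(?P<name>[\w.-]+)\}$")

def substitute(node, values):
    """Recursively replace ${name} string values with values[name]."""
    if isinstance(node, dict):
        return {k: substitute(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, values) for v in node]
    if isinstance(node, str) and (m := _PLACEHOLDER.match(node)):
        return values[m.group("name")]
    return node

# shared.json:       {"concurrency": [16, 32, 64, 128]}
# serve_params.json: {"max_num_seqs": "${concurrency}"}
# bench_params.json: {"max_concurrency": "${concurrency}", "request_rate": "inf"}
shared = json.loads(Path("shared.json").read_text())
serve_tpl = json.loads(Path("serve_params.json").read_text())
bench_tpl = json.loads(Path("bench_params.json").read_text())

# Expand each shared variable once and substitute it into both templates,
# yielding linked (serve, bench) pairs rather than a cross product.
names = list(shared)
for combo in itertools.product(*(shared[n] for n in names)):
    values = dict(zip(names, combo))
    print(substitute(serve_tpl, values), substitute(bench_tpl, values))
```

With one shared variable of four values, this yields four linked runs instead of the sixteen that a Cartesian product of the two configs would produce.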