Replies: 4 comments
-
|
Example router from dynamo: https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/components/kv_router.py |
-
|
There's this repo - https://github.com/VectorInstitute/vector-inference
Note also that Slurm autoscaling, with an option to backfill Slurm capacity, is the next obvious feature. |
-
|
@terrykong this is the updated standalone router: https://github.com/ai-dynamo/dynamo/tree/main/examples/deployments/router_standalone |
-
|
Thanks for the pointer, @euronymous-aithal. Converting this discussion to an issue; please continue the discussion on 1210. |
-
We need a router sidecar that queries telemetry from vLLM, determines which vLLM instance has the least load, and sends more prompts there.
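To make the request concrete, here is a rough sketch of the least-load selection such a sidecar might do. It assumes each vLLM server exposes Prometheus-format text at `/metrics` and that the gauges `vllm:num_requests_running` and `vllm:num_requests_waiting` are a reasonable proxy for load; both should be checked against the vLLM version in use. The function names (`parse_load`, `fetch_load`, `pick_least_loaded`, `route`) are made up for illustration and are not taken from dynamo or vLLM.

```python
import urllib.request

# Assumed gauge names; verify them against your vLLM version's /metrics output.
LOAD_METRICS = ("vllm:num_requests_running", "vllm:num_requests_waiting")


def parse_load(metrics_text: str) -> float:
    """Sum the load gauges found in a Prometheus text-format payload."""
    load = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Sample lines look like: name{label="x"} value [timestamp]
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name in LOAD_METRICS:
            rest = line.rsplit("}", 1)[-1] if "{" in line else line.split(" ", 1)[1]
            load += float(rest.split()[0])
    return load


def fetch_load(base_url: str, timeout: float = 1.0) -> float:
    """Scrape one instance's /metrics endpoint and return its load score."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return parse_load(resp.read().decode("utf-8"))


def pick_least_loaded(loads: dict[str, float]) -> str:
    """Return the instance URL with the smallest load score."""
    if not loads:
        raise RuntimeError("no reachable vLLM instances")
    return min(loads, key=loads.get)


def route(instances: list[str]) -> str:
    """Poll every instance and choose a target, skipping unreachable ones."""
    loads = {}
    for url in instances:
        try:
            loads[url] = fetch_load(url)
        except OSError:
            continue  # unreachable or timed out; leave it out of this round
    return pick_least_loaded(loads)
```

Calling `route(["http://host-a:8000", "http://host-b:8000"])` (hypothetical addresses) would return the URL to send the next prompt to. Polling on every request is the simplest thing that works; a real sidecar would more likely cache the scores and refresh them on a timer.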