Unify vllm and api-endpoint providers into a single provider for OpenAI-compatible APIs
#81
Description
Currently, HTTP requests to the vllm server are sent out sequentially instead of in batches.
See #67 and #68 for reference.
I initially ran into this during the synthetic data & agents hackathon and solved it with a thread pool via concurrent.futures. But since we are not actually doing any work (on the synthetic-data-kit side) in these threads, the more lightweight asyncio approach from @HarshVaragiya is much better: we are essentially just spawning HTTP requests and "waiting in parallel".
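For illustration only, a minimal sketch of the asyncio idea (not the actual implementation from #68; the endpoint URL, API key, and model name are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

# Placeholder endpoint: vLLM's OpenAI-compatible server is assumed to listen on /v1.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def complete(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def complete_batch(prompts: list[str]) -> list[str]:
    # All requests are in flight at the same time; we only "wait in parallel".
    return await asyncio.gather(*(complete(p) for p in prompts))

if __name__ == "__main__":
    print(asyncio.run(complete_batch(["What is vLLM?", "What is asyncio?"])))
```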
Benchmarks
OpenAI Batching (Provider=api-endpoint) OLD:
Single request latency (overall): count=10 min=18.235s median=35.391s mean=35.944s max=55.202s
Single request latency (short prompt): count=5 min=18.235s median=25.101s mean=24.447s max=29.872s
Single request latency (long prompt): count=5 min=40.909s median=47.113s mean=47.440s max=55.202s
Batch latency (overall, batch_size=8): count=10 min=3.706s median=5.348s mean=5.503s max=7.385s
Batch latency (short-only, batch_size=8): count=5 min=3.706s median=4.149s mean=4.667s max=6.908s
Batch latency (1 long prompt(s), batch_size=8): count=5 min=4.671s median=6.634s mean=6.339s max=7.385s
-> 13 min 40s
VLLM "Batching" (Provider=vllm) OLD:
Single request latency (overall): count=10 min=15.975s median=30.439s mean=31.472s max=50.811s
Single request latency (short prompt): count=5 min=15.975s median=18.694s mean=20.331s max=31.216s
Single request latency (long prompt): count=5 min=29.662s median=49.092s mean=42.613s max=50.811s
Batch latency (overall, batch_size=8): count=10 min=18.593s median=22.035s mean=21.966s max=25.405s
Batch latency (short-only, batch_size=8): count=5 min=18.593s median=21.603s mean=20.949s max=23.406s
Batch latency (1 long prompt(s), batch_size=8): count=5 min=21.281s median=22.955s mean=22.984s max=25.405s
-> 34 min 50s
-> essentially as slow as processing sequentially (because it is)
VLLM Batching (Provider=vllm) NEW (HarshVaragiya) #68:
Single request latency (overall): count=10 min=13.990s median=31.602s mean=29.933s max=47.919s
Single request latency (short prompt): count=5 min=13.990s median=19.956s mean=21.075s max=31.802s
Single request latency (long prompt): count=5 min=31.403s median=38.369s mean=38.791s max=47.919s
Batch latency (overall, batch_size=8): count=10 min=3.025s median=4.228s mean=4.842s max=6.649s
Batch latency (short-only, batch_size=8): count=5 min=3.025s median=3.740s mean=3.635s max=4.367s
Batch latency (1 long prompt(s), batch_size=8): count=5 min=4.090s median=6.498s mean=6.050s max=6.649s
-> 11 min 56s
However, since vllm serve essentially exposes an OpenAI-compatible API, it makes sense to merge the vllm and api-endpoint providers into a single one. That way the logic is centralized and synthetic-data-kit can be easily extended with new providers.
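A hypothetical sketch of what such a unified provider could look like (class, parameter, and model names are made up for illustration and are not the actual synthetic-data-kit API):

```python
from openai import AsyncOpenAI

class OpenAICompatibleProvider:
    """One provider for any OpenAI-compatible API (vLLM, hosted endpoints, ...)."""

    def __init__(self, base_url: str, api_key: str, model: str):
        self.client = AsyncOpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    async def chat(self, messages: list[dict]) -> str:
        resp = await self.client.chat.completions.create(
            model=self.model, messages=messages
        )
        return resp.choices[0].message.content

# The same class would back both former providers; only the configuration differs.
vllm_provider = OpenAICompatibleProvider(
    base_url="http://localhost:8000/v1", api_key="EMPTY",
    model="meta-llama/Llama-3.1-8B-Instruct",
)
hosted_provider = OpenAICompatibleProvider(
    base_url="https://example.com/v1", api_key="<API_KEY>",
    model="some-hosted-model",
)
```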
Fixes #67
Type of change
Please delete non-relevant options.