[R3] Enable R3 with new inference #1428
Conversation
…nference codepath

Adds a custom `/skyrl/v1/generate` endpoint to `VLLMServerActor` that calls the vLLM engine directly and returns `routed_experts` alongside token output. The standard `/inference/v1/generate` endpoint's `GenerateResponseChoice` does not include `routed_experts` (only available on the Python `CompletionOutput` object), so a custom endpoint is required.

Changes:
- `vllm_server_actor.py`: Add `/skyrl/v1/generate` endpoint with correct logprobs serialisation (placeholder `-9999.0` for missing entries, matching vLLM's `ChatCompletionLogProb` default) and `routed_experts` extraction. Raises `NotImplementedError` if LoRA is enabled.
- `remote_inference_client.py`: Switch `_generate_single` to `/skyrl/v1/generate`; extract and propagate `routed_experts` through to `InferenceEngineOutput.rollout_expert_indices`.
- `inference_servers/utils.py`: Pass `enable_return_routed_experts` to vLLM CLI args so the engine computes routed experts.
- `train/utils/utils.py`: Gate the `mp` backend assertion for R3 behind `if not _SKYRL_USE_NEW_INFERENCE` (the new path uses the ray backend); remove the `ValueError` blocking R3 on the new inference path; add startup validation that LoRA + R3 cannot be combined on the new path.
- `main_base.py`, `tests/gpu/utils.py`: Pass `enable_return_routed_experts` when constructing `RemoteInferenceClient`.
- `test_remote_inference_client.py`: Update mock endpoint to `/skyrl/v1/generate` returning a single choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
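The logprobs serialisation described above (filling missing entries with `-9999.0`, matching vLLM's `ChatCompletionLogProb` default) can be sketched as a standalone helper. This is an illustrative sketch only: the function name `serialize_logprobs` and the input shape are assumptions, not SkyRL's actual API.

```python
# Illustrative sketch of the logprob serialisation described in the commit message.
# `serialize_logprobs` and its input shape are hypothetical, not SkyRL's real API.

# Placeholder for missing logprob entries, matching vLLM's
# ChatCompletionLogProb default as noted in the commit message.
PLACEHOLDER_LOGPROB = -9999.0


def serialize_logprobs(token_ids, logprobs_per_step):
    """Serialise per-token logprobs for a JSON response.

    `logprobs_per_step` is one dict per generated token mapping
    token id -> logprob; a step dict may lack the sampled token's
    entry, in which case the placeholder is emitted instead.
    """
    out = []
    for token_id, step in zip(token_ids, logprobs_per_step):
        entry = step.get(token_id)
        out.append(entry if entry is not None else PLACEHOLDER_LOGPROB)
    return out
```

A missing entry thus survives JSON round-tripping as a sentinel value rather than `null`, which keeps the response schema uniform.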
…pported

R3 requires the mp backend to avoid hangs, but mp is not yet supported on the new inference path (tracked in NovaSky-AI#1309). Restore the ValueError blocking R3 on new inference, and un-gate the mp assertion so it applies to both old and new inference paths consistently. The infrastructure changes (the `/skyrl/v1/generate` endpoint and `RemoteInferenceClient` propagation) remain as pre-work for when mp support lands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
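The startup validation the commits describe, restoring the ValueError that blocks R3 on the new inference path and enforcing the mp backend on the old one, might look roughly like the sketch below. All names here (`validate_r3_config` and its parameters) are hypothetical; the real checks live in `train/utils/utils.py` with SkyRL's own config fields.

```python
# Hypothetical sketch of the R3 startup validation described in the commits.
# Function and parameter names are illustrative, not SkyRL's actual code.

def validate_r3_config(use_r3: bool, use_new_inference: bool,
                       lora_enabled: bool, backend: str) -> None:
    if not use_r3:
        return
    if use_new_inference:
        # mp is not yet supported on the new inference path (NovaSky-AI#1309),
        # so R3 is blocked there until that support lands.
        raise ValueError("R3 is not supported with the new inference path yet")
    if backend != "mp":
        # R3 requires the mp backend to avoid hangs.
        raise ValueError("R3 requires the mp backend")
    if lora_enabled:
        # LoRA + R3 cannot be combined.
        raise ValueError("LoRA cannot be combined with R3")
```

Running the check unconditionally at startup keeps the failure mode an explicit config error rather than a hang deep in rollout.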
For reference: We ran the Moonlight-16B script with the old and the new inference and got matching curves: https://api.wandb.ai/links/sky-posttraining-uc-berkeley/lwaaqy73
SumanthRH
left a comment
Let's address the issue with the sample API
tests/backends/skyrl_train/inference_servers/test_remote_inference_client.py
…ence_client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

…client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
```diff
  # needed for megatron tests
  env_vars["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
- env_vars["NVTE_FUSED_ATTN"] = "0"
+ env_vars["NVTE_FUSED_ATTN"] = "1"
```
SkyRL/skyrl/train/utils/utils.py
Line 582 in 7ba2490
It was needed to run the R3 tests, but I can revert it.
Could we move this to scripts? It will be lost here
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
SumanthRH
left a comment
Can you add `test_router_replay.py` to GPU CI with `_SKYRL_USE_NEW_INFERENCE=1`? The primary integration test is currently skipped, but it will be good to have it in the CI script. @erictang000 is working on re-enabling this test, so we will have coverage as soon as his changes land.
Add it here:
SkyRL/ci/gpu_ci_run_skyrl_train.sh
Line 39 in 7ba2490
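The requested CI entry might look like the line below. This is a guess at the invocation: the exact test path and runner flags in `gpu_ci_run_skyrl_train.sh` are not shown in this thread, so treat every path here as an assumption.

```shell
# Hypothetical CI line for gpu_ci_run_skyrl_train.sh; the test path and
# pytest invocation style are assumptions based on the review comment.
_SKYRL_USE_NEW_INFERENCE=1 pytest tests/backends/skyrl_train/gpu/test_router_replay.py
```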
Can you fix lint @hao-aaron