[R3] Enable R3 with new inference#1428

Merged
SumanthRH merged 16 commits into NovaSky-AI:main from hao-aaron:r3-new-inference
Apr 8, 2026

Conversation

@hao-aaron (Contributor) commented Apr 2, 2026

SumanthRH and others added 9 commits March 19, 2026 17:45
…nference codepath

Adds a custom `/skyrl/v1/generate` endpoint to `VLLMServerActor` that
calls the vLLM engine directly and returns `routed_experts` alongside
token output. The standard `/inference/v1/generate` endpoint's
`GenerateResponseChoice` does not include `routed_experts` (only available
on the Python `CompletionOutput` object), so a custom endpoint is required.

Changes:
- `vllm_server_actor.py`: Add `/skyrl/v1/generate` endpoint with correct
  logprobs serialisation (placeholder `-9999.0` for missing entries,
  matching vLLM's `ChatCompletionLogProb` default) and `routed_experts`
  extraction. Raises `NotImplementedError` if LoRA is enabled.
- `remote_inference_client.py`: Switch `_generate_single` to
  `/skyrl/v1/generate`; extract and propagate `routed_experts` through
  to `InferenceEngineOutput.rollout_expert_indices`.
- `inference_servers/utils.py`: Pass `enable_return_routed_experts` to
  vLLM CLI args so the engine computes routed experts.
- `train/utils/utils.py`: Gate the `mp` backend assertion for R3 behind
  `if not _SKYRL_USE_NEW_INFERENCE` (new path uses ray backend); remove
  the `ValueError` blocking R3 on the new inference path; add startup
  validation that LoRA + R3 cannot be combined on the new path.
- `main_base.py`, `tests/gpu/utils.py`: Pass `enable_return_routed_experts`
  when constructing `RemoteInferenceClient`.
- `test_remote_inference_client.py`: Update mock endpoint to
  `/skyrl/v1/generate` returning a single choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pported

R3 requires the mp backend to avoid hangs, but mp is not yet supported
on the new inference path (tracked in NovaSky-AI#1309). Restore the ValueError
blocking R3 on new inference, and un-gate the mp assertion so it applies
to both old and new inference paths consistently.

The infrastructure changes (/skyrl/v1/generate endpoint, RemoteInferenceClient
propagation) remain as pre-work for when mp support lands.
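An illustrative sketch of the gating restored by this commit. The function name, the `cfg`-style arguments, and the exact messages are assumptions based on the commit message; only the rules themselves (R3 blocked on the new inference path, R3 requiring the mp backend) come from the PR:

```python
import os


def validate_r3_config(enable_r3: bool, backend: str) -> None:
    """Reject unsupported R3 configurations at startup (sketch only)."""
    use_new_inference = os.environ.get("_SKYRL_USE_NEW_INFERENCE") == "1"
    if enable_r3 and use_new_inference:
        # mp is not yet supported on the new inference path (NovaSky-AI#1309)
        raise ValueError("R3 is not yet supported on the new inference path")
    if enable_r3 and backend != "mp":
        # R3 requires the mp backend to avoid hangs, on both inference paths
        raise ValueError("R3 requires the 'mp' backend")
```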

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron hao-aaron marked this pull request as ready for review April 2, 2026 22:46
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH (Member) commented:

For reference: We ran the Moonlight-16B script with the old and the new inference and got matching curves:

https://api.wandb.ai/links/sky-posttraining-uc-berkeley/lwaaqy73

[Screenshot 2026-04-03: matching training curves for old vs. new inference]

@SumanthRH (Member) left a comment:

Let's address the issue with the sample API

hao-aaron and others added 2 commits April 7, 2026 10:27
…ence_client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

…client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  # needed for megatron tests
  env_vars["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
- env_vars["NVTE_FUSED_ATTN"] = "0"
+ env_vars["NVTE_FUSED_ATTN"] = "1"
A Member commented on this diff:
Why was this change made?

A Member commented:
# disable fused attention for megatron with flash_attn

@hao-aaron (Contributor, Author) commented:

It was needed to run the R3 tests, but I can revert it.

A Member commented:
Could we move this to scripts? It will be lost here

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@SumanthRH (Member) left a comment:
Can you add test_router_replay.py to GPU CI with _SKYRL_USE_NEW_INFERENCE=1? The primary integration test is currently skipped, but it will be good to have it in the CI script. @erictang000 is working on re-enabling this test, so we will have coverage as soon as his changes land.

Add it here:

_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_expert_parallel_inference.py
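The requested addition would presumably mirror the existing line; the exact path to test_router_replay.py is an assumption modeled on the command above, not confirmed by the PR:

```shell
# Hypothetical CI line for the router-replay test; the test file path is
# assumed to sit next to the expert-parallel test referenced above.
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_router_replay.py
```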

x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH (Member) commented:

Can you fix lint, @hao-aaron?

x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH SumanthRH merged commit 7f5eba1 into NovaSky-AI:main Apr 8, 2026
4 of 6 checks passed