[R3] Enable R3 with new inference#1428

Merged
SumanthRH merged 16 commits into NovaSky-AI:main from hao-aaron:r3-new-inference
Apr 8, 2026

Conversation

@hao-aaron (Contributor) commented Apr 2, 2026

SumanthRH and others added 9 commits March 19, 2026 17:45
…nference codepath

Adds a custom `/skyrl/v1/generate` endpoint to `VLLMServerActor` that
calls the vLLM engine directly and returns `routed_experts` alongside
token output. The standard `/inference/v1/generate` endpoint's
`GenerateResponseChoice` does not include `routed_experts` (only available
on the Python `CompletionOutput` object), so a custom endpoint is required.

Changes:
- `vllm_server_actor.py`: Add `/skyrl/v1/generate` endpoint with correct
  logprobs serialisation (placeholder `-9999.0` for missing entries,
  matching vLLM's `ChatCompletionLogProb` default) and `routed_experts`
  extraction. Raises `NotImplementedError` if LoRA is enabled.
- `remote_inference_client.py`: Switch `_generate_single` to
  `/skyrl/v1/generate`; extract and propagate `routed_experts` through
  to `InferenceEngineOutput.rollout_expert_indices`.
- `inference_servers/utils.py`: Pass `enable_return_routed_experts` to
  vLLM CLI args so the engine computes routed experts.
- `train/utils/utils.py`: Gate the `mp` backend assertion for R3 behind
  `if not _SKYRL_USE_NEW_INFERENCE` (new path uses ray backend); remove
  the `ValueError` blocking R3 on the new inference path; add startup
  validation that LoRA + R3 cannot be combined on the new path.
- `main_base.py`, `tests/gpu/utils.py`: Pass `enable_return_routed_experts`
  when constructing `RemoteInferenceClient`.
- `test_remote_inference_client.py`: Update mock endpoint to
  `/skyrl/v1/generate` returning a single choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pported

R3 requires the mp backend to avoid hangs, but mp is not yet supported
on the new inference path (tracked in NovaSky-AI#1309). Restore the ValueError
blocking R3 on new inference, and un-gate the mp assertion so it applies
to both old and new inference paths consistently.

The infrastructure changes (/skyrl/v1/generate endpoint, RemoteInferenceClient
propagation) remain as pre-work for when mp support lands.
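An illustrative sketch of the gating restored by this commit. The function name, the `cfg`-style arguments, and the exact messages are assumptions based on the commit message; only the rules themselves (R3 blocked on the new inference path, R3 requiring the mp backend) come from the PR:

```python
import os


def validate_r3_config(enable_r3: bool, backend: str) -> None:
    """Reject unsupported R3 configurations at startup (sketch only)."""
    use_new_inference = os.environ.get("_SKYRL_USE_NEW_INFERENCE") == "1"
    if enable_r3 and use_new_inference:
        # mp is not yet supported on the new inference path (NovaSky-AI#1309)
        raise ValueError("R3 is not yet supported on the new inference path")
    if enable_r3 and backend != "mp":
        # R3 requires the mp backend to avoid hangs, on both inference paths
        raise ValueError("R3 requires the 'mp' backend")
```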

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron hao-aaron marked this pull request as ready for review April 2, 2026 22:46
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH (Member) commented:

For reference: We ran the Moonlight-16B script with the old and the new inference and got matching curves:

https://api.wandb.ai/links/sky-posttraining-uc-berkeley/lwaaqy73

[Screenshot 2026-04-03: matching training curves for old vs. new inference]

@SumanthRH (Member) left a comment:

Let's address the issue with the sample API

hao-aaron and others added 2 commits April 7, 2026 10:27
…ence_client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

…client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  # needed for megatron tests
  env_vars["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
- env_vars["NVTE_FUSED_ATTN"] = "0"
+ env_vars["NVTE_FUSED_ATTN"] = "1"
A Member commented on this diff:
Why was this change made?

A Member commented:
# disable fused attention for megatron with flash_attn

@hao-aaron (Contributor, Author) commented:

It was needed to run the R3 tests, but I can revert it.

A Member commented:
Could we move this to scripts? It will be lost here

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@SumanthRH (Member) left a comment:
Can you add test_router_replay.py to GPU CI with _SKYRL_USE_NEW_INFERENCE=1? The primary integration test is currently skipped, but it will be good to have it in the CI script. @erictang000 is working on re-enabling this test, so we will have coverage as soon as his changes land.

Add it here:

_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_expert_parallel_inference.py
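The requested addition would presumably mirror the existing line; the exact path to test_router_replay.py is an assumption modeled on the command above, not confirmed by the PR:

```shell
# Hypothetical CI line for the router-replay test; the test file path is
# assumed to sit next to the expert-parallel test referenced above.
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_router_replay.py
```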

x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH (Member) commented:

Can you fix lint, @hao-aaron?

x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@SumanthRH SumanthRH merged commit 7f5eba1 into NovaSky-AI:main Apr 8, 2026
4 of 6 checks passed