Conversation

hlahkar commented Oct 28, 2025

This PR enables GPT-OSS with naive attention. Features enabled:

  1. Sinks in Attention
  2. Bias in MoE

Signed-off-by: Himangshu Lahkar <[email protected]>
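For reference, here is a minimal sketch of the attention-sink mechanism as it would look in naive attention. This is not the PR's kernel code; the function name, shapes, and the scale argument are illustrative. Each head carries a learned sink logit that joins the softmax normalization, so a head can park probability mass on no token at all.

import torch

def naive_attention_with_sinks(query, key, value, sinks, scale):
    # query/key/value: [batch, heads, seq, head_dim]; sinks: [heads] learned logits.
    scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    # Append one extra "sink" column per head before the softmax.
    sink_col = sinks.view(1, -1, 1, 1).to(scores.dtype).expand(*scores.shape[:-1], 1)
    probs = torch.softmax(torch.cat([scores, sink_col], dim=-1), dim=-1)
    # Drop the sink column afterwards: its probability mass is simply discarded.
    return torch.matmul(probs[..., :-1].to(value.dtype), value)

The attn_sink.exp() and dtype-cast lines quoted later in the review appear to handle the same normalization explicitly inside the ops. The review thread below quotes the new bias_fused_weights call and discusses why it currently fails in CI.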
if self.bias is not None:
    w1_bias_list = [self.w13_list[i].bias.squeeze() for i in experts_range]
    w2_bias_list = [self.w2_list[i].bias.squeeze() for i in experts_range]
    return torch.ops.hpu.mixture_of_experts.bias_fused_weights(hidden_states=hidden_states,
Collaborator

Test fails with:
"The underlying op of 'hpu.mixture_of_experts' has no overload name 'bias_fused_weights'. Did you mean: 'fp8_fused_weights'" please fix

Author

The CI is on 1.22.0, but this change needs the 1.23.0 software stack; that's why it's failing. We can merge this only after CI moves to the 1.23.0 release.
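One hedged way to keep the test green on both stacks until then, sketched here rather than taken from the PR, is to probe for the 1.23.0 overload and skip or fall back when it is absent. The probe relies on the fact that accessing a missing overload name on an op packet raises AttributeError with exactly the message quoted above.

import torch

def hpu_moe_supports_bias() -> bool:
    # True only when the bias_fused_weights overload (1.23.0+) is registered.
    try:
        torch.ops.hpu.mixture_of_experts.bias_fused_weights
    except (AttributeError, RuntimeError):
        # Older stacks (e.g. 1.22.0) or a build without the HPU op namespace.
        return False
    return True

A test could then use pytest.mark.skipif(not hpu_moe_supports_bias(), reason="needs 1.23.0") instead of failing outright.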

Copilot AI review requested due to automatic review settings November 10, 2025 05:04
Copilot AI left a comment

Pull Request Overview

This PR enables GPT-OSS model support with two main features: attention sinks for improved context handling and bias support in Mixture of Experts (MoE) layers.

Key Changes:

  • Added sink attention mechanism to handle long-context scenarios across naive, FSDPA, and flat attention implementations
  • Implemented bias support in MoE operations for models requiring biased expert computations
  • Added model-specific routing logic for GPT-OSS in the MoE forward pass

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • vllm_gaudi/ops/hpu_fused_moe.py: added bias handling in MoE layers and GPT-OSS-specific router weight processing
  • vllm_gaudi/extension/utils.py: extended the FSDPA forward method to accept a sinks parameter
  • vllm_gaudi/extension/ops.py: implemented sink attention logic across multiple attention implementations and added bias support to MoE operations
  • vllm_gaudi/attention/backends/hpu_attn.py: added a sinks parameter to attention implementations with validation and dtype conversion
  • tests/unit_tests/sinks/test_gpt_oss.py: added an integration test for the GPT-OSS model with expected outputs
Comments suppressed due to low confidence (2)

vllm_gaudi/attention/backends/hpu_attn.py

  • Missing space after '#' in comment. Should be '# causal' for proper comment formatting.

vllm_gaudi/attention/backends/hpu_attn.py

  • Inconsistent TODO format: should be 'TODO:' with a colon instead of a dash.

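As a plain-PyTorch reference for the "bias in MoE" feature summarized above, the sketch below shows the intended arithmetic, not the fused HPU kernel: every expert's w13 (gate/up) and w2 (down) projections carry bias terms, and expert outputs are mixed by the router weights. The helper name, the SwiGLU-style activation, and all shapes are assumptions for illustration.

import torch
import torch.nn.functional as F

def moe_reference_with_bias(x, w13_list, b13_list, w2_list, b2_list, topk_ids, topk_probs):
    # x: [tokens, hidden]; topk_ids/topk_probs: [tokens, top_k] from the router.
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(topk_ids.shape[1]):
            e = int(topk_ids[t, slot])
            gate_up = x[t] @ w13_list[e].T + b13_list[e]      # biased gate/up projection
            gate, up = gate_up.chunk(2, dim=-1)
            h = F.silu(gate) * up                             # activation choice is an assumption
            out[t] += topk_probs[t, slot] * (h @ w2_list[e].T + b2_list[e])  # biased down projection
    return out

The inline comments that follow concern how the fused kernel receives these per-expert weights and biases when experts are processed in slices.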

Comment on lines +566 to +572
w12=w1_list,
w3=w2_list,
w12_bias=w1_bias_list_slice,
w3_bias=w2_bias_list_slice,
permuted_weights=permuted_weights,
experts_min=self.experts_min,
experts_max=self.experts_max)
Copilot AI Nov 10, 2025

Incorrect weight lists passed to MoE operation. Should use sliced lists w1_list_slice and w2_list_slice instead of full lists w1_list and w2_list to match the expert range being processed.

Suggested change
w12=w1_list,
w3=w2_list,
w12_bias=w1_bias_list_slice,
w3_bias=w2_bias_list_slice,
permuted_weights=permuted_weights,
experts_min=self.experts_min,
experts_max=self.experts_max)
w12=w1_list_slice,
w3=w2_list_slice,
w12_bias=w1_bias_list_slice,
w3_bias=w2_bias_list_slice,
permuted_weights=permuted_weights,
experts_min=min_expert,
experts_max=max_expert)

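To make the comment above concrete, here is a hedged sketch of the per-slice bookkeeping it expects; the helper name, slice_size parameter, and yielding style are illustrative, not the PR's code. Each slice must receive its own weight and bias sub-lists together with the matching experts_min/experts_max values.

def iter_expert_slices(w1_list, w2_list, w1_bias_list, w2_bias_list, slice_size):
    # Yields (w1_slice, w2_slice, w1_bias_slice, w2_bias_slice, min_expert, max_expert)
    # for each contiguous block of experts; these are the values that should feed
    # w12/w3/w12_bias/w3_bias/experts_min/experts_max in the fused MoE call.
    num_experts = len(w1_list)
    for min_expert in range(0, num_experts, slice_size):
        max_expert = min(min_expert + slice_size, num_experts) - 1
        sl = slice(min_expert, max_expert + 1)
        yield (w1_list[sl], w2_list[sl], w1_bias_list[sl], w2_bias_list[sl],
               min_expert, max_expert)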
Comment on lines +571 to +572
experts_min=self.experts_min,
experts_max=self.experts_max)
Copilot AI Nov 10, 2025

Incorrect expert range parameters. Should use min_expert and max_expert (computed for the current slice) instead of self.experts_min and self.experts_max to correctly process the expert slice.

Suggested change
experts_min=self.experts_min,
experts_max=self.experts_max)
experts_min=min_expert,
experts_max=max_expert)

Comment on lines 596 to 598
# TODO - change 128 to proper window size
window_size = (
128,
Copilot AI Nov 10, 2025

Magic number 128 used for window size. Consider defining this as a named constant or deriving it from self.sliding_window as indicated by the TODO comment.

Suggested change
# TODO - change 128 to proper window size
window_size = (
128,
# Use self.sliding_window for window size instead of hardcoded 128
window_size = (
self.sliding_window,

tensor_parallel_size=4,
)
generated_texts = do_sample(llm, original_output=original_output_120, rtol=1e-01, atol=1e-01, max_num_seqs=1)
assert generated_texts == expected_output
Copilot AI Nov 10, 2025

Assertion compares single generated text with expected output incorrectly. The function returns a list but only validates the first element earlier. This assertion will fail unless generated_texts contains exactly one element matching expected_output[0]. Consider assert generated_texts[0] == expected_output[0] or assert generated_texts == expected_output after validating the list length.

Suggested change
assert generated_texts == expected_output
assert len(generated_texts) == len(expected_output)
assert generated_texts[0] == expected_output[0]

attn_sink = attn_sink.exp()
if attn_sink.dtype == torch.float32:
    attn_sink = attn_sink.to(value.dtype)
#TODO: Removing this .sum and using attn_sink directly
Copilot AI Nov 10, 2025

Corrected spacing in TODO comment: should be 'TODO:' with a space after the colon for consistency.

Suggested change
#TODO: Removing this .sum and using attn_sink directly
# TODO: Removing this .sum and using attn_sink directly

attn_bias = None
window_size = (self.sliding_window, 0)
common_args['window_size'] = window_size
# TODO - change 128 to proper window size
Copilot AI Nov 10, 2025

Inconsistent TODO format: should be 'TODO:' with a colon instead of a dash for consistency with project conventions.

Suggested change
# TODO - change 128 to proper window size
# TODO: change 128 to proper window size
