Automatically use zero tolerance for bitwise comparison for fp8 dtypes during autotuning #1158
Conversation
Force-pushed from b5bbd30 to fa64d26
This change automatically sets atol=0.0 and rtol=0.0 for autotuning accuracy checks when fp8 dtypes are detected in kernel outputs, while respecting user-specified tolerance values.

Changes:
- Added _user_set_atol and _user_set_rtol flags to Settings to track whether the user explicitly set tolerance values
- Added a _compute_effective_tolerances() method to BaseSearch that detects fp8 dtypes and automatically sets both tolerances to 0.0 when the user has not explicitly specified them
- Updated _validate_against_baseline() to use the effective tolerances instead of reading the settings values directly

Behavior:
- If the user sets neither tolerance: 0.0 is used automatically for fp8 dtypes
- If the user sets either tolerance: both values are respected (no override)
- Supports all fp8 variants: float8_e4m3fn, float8_e5m2, float8_e4m3fnuz, float8_e5m2fnuz, float8_e8m0fnu

Tests:
- test_autotune_fp8_automatic_tolerance: verifies automatic detection
- test_autotune_fp8_explicit_tolerance_override: verifies user values are respected
- test_autotune_fp8_explicit_default_tolerance: verifies an explicit 1e-2 is respected
- test_autotune_fp8_partial_tolerance_override: verifies partial specification is respected

Fixes an issue where fp8 kernels would fail autotuning with: "Rtol=0.01 and atol=0.01 are not supported for bitwise comparison of low dimensional floats"
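A rough sketch of the mechanism described above (the names `compute_effective_tolerances` and `_FP8_DTYPES` are illustrative, not the exact helpers in this PR):

```python
from __future__ import annotations

import torch

# Collect the fp8 dtypes present in this torch build; some variants only
# exist in newer releases, hence the hasattr guard.
_FP8_DTYPES = frozenset(
    getattr(torch, name)
    for name in (
        "float8_e4m3fn",
        "float8_e5m2",
        "float8_e4m3fnuz",
        "float8_e5m2fnuz",
        "float8_e8m0fnu",
    )
    if hasattr(torch, name)
)


def compute_effective_tolerances(
    outputs: object,
    atol: float | None,
    rtol: float | None,
    default: float = 1e-2,
) -> tuple[float, float]:
    """Return (atol, rtol) for the baseline check, forcing 0.0 for fp8 outputs."""
    if atol is not None or rtol is not None:
        # The user explicitly set at least one tolerance: respect both values.
        return (
            atol if atol is not None else default,
            rtol if rtol is not None else default,
        )
    tensors = outputs if isinstance(outputs, (list, tuple)) else [outputs]
    if any(isinstance(t, torch.Tensor) and t.dtype in _FP8_DTYPES for t in tensors):
        # fp8 outputs and no user override: fall back to an exact, bitwise comparison.
        return (0.0, 0.0)
    return (default, default)
```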
Force-pushed from fa64d26 to de76da9
helion/runtime/settings.py
Outdated
    autotune_baseline_rtol: float = dataclasses.field(default=1e-2)
    # Internal fields to track if user explicitly set tolerance values
    _user_set_atol: bool = dataclasses.field(default=False, init=False, repr=False)
    _user_set_rtol: bool = dataclasses.field(default=False, init=False, repr=False)
Not sure if it's possible, but I would really love to avoid needing these private fields on the Settings class, to keep things simpler.
Good point, I changed the implementation a bit. Now the autotune_baseline_atol field is an Optional[float]; a None value represents an unset tolerance.
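A minimal sketch of that revised approach (the surrounding Settings class is simplified here; only the tolerance fields are shown, and the helper property is hypothetical):

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class Settings:
    # None means "unset": the autotuner may then choose 0.0 for fp8 outputs.
    autotune_baseline_atol: float | None = None
    autotune_baseline_rtol: float | None = None

    @property
    def user_set_tolerances(self) -> bool:
        # Hypothetical convenience helper, not part of the PR: True if the
        # user explicitly specified either tolerance.
        return (
            self.autotune_baseline_atol is not None
            or self.autotune_baseline_rtol is not None
        )
```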
gmagogsfm
left a comment
PTAL
This change automatically sets atol=0.0 and rtol=0.0 for autotuning accuracy checks when fp8 dtypes are detected in kernel outputs, while respecting user-specified tolerance values.

Behavior:
- If the user sets neither tolerance: 0.0 is used automatically for fp8 dtypes
- If the user sets either tolerance: both values are respected (no override)

Fixes an issue where kernels returning fp8 types would fail autotuning with: "Rtol=0.01 and atol=0.01 are not supported for bitwise comparison of low dimensional floats", due to PyTorch's bitwise comparison of low-precision floats.
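For illustration, assuming the baseline check ultimately relies on torch.testing.assert_close (an assumption about the underlying comparison; exact behavior may vary between torch releases), fp8 outputs behave roughly like this:

```python
import torch

a = torch.randn(8).to(torch.float8_e4m3fn)
b = a.clone()

# Zero tolerances request a bitwise comparison, which fp8 tensors support.
torch.testing.assert_close(b, a, rtol=0.0, atol=0.0)

# Nonzero tolerances trigger the error quoted above for low-precision floats.
# torch.testing.assert_close(b, a, rtol=1e-2, atol=1e-2)
```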