
Reduce test parametrization by 65% in top 6 files for 45% CI speedup#374

Draft
Copilot wants to merge 4 commits into main from copilot/optimize-test-suite-phase-1-again

Copilot AI commented Feb 15, 2026

Test Suite Optimization Phase 1: Reduce Parametrization

Completed ✅

  • Update test_zeros_like.py: Reduce 8 dtypes × 6 shapes to 4 dtypes × 4 shapes
  • Update test_empty.py: Reduce 8 dtypes × 6 shapes to 4 dtypes × 4 shapes
  • Update test_full.py: Reduce 9 fill_values × 6 shapes to 4 fill_values × 4 shapes
  • Update test_randint.py: Reduce 5 dtypes × 6 shapes to 4 dtypes × 4 shapes
  • Update test_ones.py: Reduce 8 dtypes × 6 shapes to 4 dtypes × 4 shapes
  • Update test_zeros.py: Reduce 8 dtypes × 6 shapes to 4 dtypes × 4 shapes
  • Add edge case tests to maintain coverage
  • Run linting validation
  • Fix OOM error by increasing heap size in edge case tests
  • Wait for CI tests to validate changes

Recent Fix (addressing PR review)

Fixed the OOM error in the edge case tests by increasing the symmetric heap size from 1 MB to 16 MB in all test_*_edge_cases() functions. The large tensor tests (1024×1024) need up to 8 MB for the float64 dtype.
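The 8 MB figure follows from element count times element width. The sketch below works that out for the dtypes named in this PR; it counts raw payload bytes only and assumes nothing about allocator overhead or alignment in the real heap. The helper name `tensor_bytes` is illustrative and not part of the test suite.

```python
# Standard element widths, in bytes, for the dtypes mentioned in this PR.
BYTES_PER_ELEMENT = {"int8": 1, "float16": 2, "float32": 4, "int32": 4, "float64": 8}

OLD_HEAP_BYTES = 1 << 20  # 1 MB, the previous heap size
NEW_HEAP_BYTES = 1 << 24  # 16 MB, the heap size after the fix


def tensor_bytes(shape, dtype_name):
    """Raw payload size of a dense tensor (illustrative helper, overhead ignored)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype_name]


for name in BYTES_PER_ELEMENT:
    size = tensor_bytes((1024, 1024), name)
    print(
        f"{name:8s} {size:>9d} bytes  "
        f"fits_old={size <= OLD_HEAP_BYTES}  fits_new={size <= NEW_HEAP_BYTES}"
    )
```

Every dtype wider than one byte already exceeds the old 1 MB heap at this shape, so the failure is not specific to float64.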

Test Reduction Summary

Reduced the parametrized test count by 65.2% (276 → 96 tests) in the main test functions. Explicit edge case tests were added to maintain coverage of the removed combinations.

| File | Before | After | Reduction |
|---|---|---|---|
| test_zeros_like.py | 48 | 16 | 66.7% |
| test_empty.py | 48 | 16 | 66.7% |
| test_full.py | 54 | 16 | 70.4% |
| test_randint.py | 30 | 16 | 46.7% |
| test_ones.py | 48 | 16 | 66.7% |
| test_zeros.py | 48 | 16 | 66.7% |

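The per-file counts and the 65.2% total can be rechecked from the parametrization sizes listed in the checklist. This is a small arithmetic sketch; the dictionary and the function name `reduction_pct` are illustrative and do not come from the repository.

```python
# (before, after) parametrization counts per file, taken from the checklist:
# before = values on axis 1 × shapes, after = 4 × 4.
COUNTS = {
    "test_zeros_like.py": (8 * 6, 4 * 4),
    "test_empty.py": (8 * 6, 4 * 4),
    "test_full.py": (9 * 6, 4 * 4),
    "test_randint.py": (5 * 6, 4 * 4),
    "test_ones.py": (8 * 6, 4 * 4),
    "test_zeros.py": (8 * 6, 4 * 4),
}


def reduction_pct(before, after):
    """Percentage of parametrized cases removed, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


total_before = sum(before for before, _ in COUNTS.values())
total_after = sum(after for _, after in COUNTS.values())

for name, (before, after) in COUNTS.items():
    print(f"{name:20s} {before:3d} -> {after:3d}  {reduction_pct(before, after)}%")
print(f"{'total':20s} {total_before:3d} -> {total_after:3d}  "
      f"{reduction_pct(total_before, total_after)}%")
```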
Original prompt

This section details the original issue you should resolve

<issue_title>Test Suite Optimization Phase 1: Reduce Parametrization in Critical Path</issue_title>
<issue_description>Description:

Implement Phase 1 of the parallelism-aware optimization plan to reduce CI wall clock time by 45% (103 min → 56 min).

Background

Current CI performance (after PR #370 parallelization):

- Wall clock: 102.6 minutes (1.7 hours)
- Parallelization: 3.6× speedup
- Critical path bottleneck: 8-rank jobs take 52.9 min (this limits overall wall clock time)

Analysis in this PR identified that, with parallelization, the critical path (the longest-running jobs) determines wall clock time. The top 6 tensor creation test files contain 480K tests with excessive parametrization (8 dtypes × 8 shapes).

Scope: Phase 1 - Critical Path Optimization

Goal: Reduce parametrization in top 6 test files while maintaining multi-rank testing for all tests.

Changes needed:

Reduce parametrization in top 6 files (tests/unittests/):

- test_zeros_like.py: 139,216 tests → 27,000 tests (80% reduction)
- test_empty.py: 95,872 tests → 19,000 tests (80% reduction)
- test_full.py: 76,608 tests → 15,000 tests (80% reduction)
- test_randint.py: 59,360 tests → 12,000 tests (80% reduction)
- test_ones.py: 59,136 tests → 12,000 tests (80% reduction)
- test_zeros.py: 50,176 tests → 10,000 tests (80% reduction)

Parametrization strategy:

- Current: 8 dtypes × 8 shapes = 64 base combinations
- Target: 4 dtypes × 4 shapes = 16 base combinations (75% reduction)
- Representative dtypes: torch.float32, torch.float16, torch.int32, torch.bool
- Representative shapes: (1,), (100,), (32, 32), (4, 8, 16)

Add explicit edge case tests:

- Large tensors: (1024, 1024) for memory validation
- Edge dtypes: torch.int8, torch.float64 for numerical precision
- Complex shapes: (2, 3, 4, 5) for multi-dimensional handling
- Ensures coverage of removed parametrization combinations

Keep all multi-rank testing:

- Unlike PR #356, "Reduce CI time 30% via marker-based multi-rank test filtering" (closed), this approach does NOT remove multi-rank testing
- All tests still run on 1, 2, 4, 8 ranks
- Only reduces the number of dtype/shape combinations tested

Expected impact:

- Unittests (8-rank): 50 min → 29 min (42% reduction)
- Examples (8-rank): 53 min → 35 min (34% reduction)
- Wall clock: 103 min → 56 min (45% reduction)
- Test count: 530,877 → ~95,000 (82% reduction)
- Annual cost savings: $102K

Implementation approach:

- Create parametrization constants for representative values
- Update @pytest.mark.parametrize decorators in top 6 files
- Add explicit edge case test functions
- Verify coverage with pytest-cov

Reference: See PARALLELISM_AWARE_OPTIMIZATION_PLAN.md in this PR for complete analysis and implementation details.</issue_description>
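
The parametrization strategy in the issue amounts to a Cartesian product over two short lists of constants. The sketch below builds that product using only the standard library. The constant names are assumptions, and dtypes are written as strings so the block runs without torch; a real test would resolve them with something like `getattr(torch, name)`.

```python
from itertools import product

# Representative values copied from the issue text; constant names are hypothetical.
REPRESENTATIVE_DTYPES = ["float32", "float16", "int32", "bool"]
REPRESENTATIVE_SHAPES = [(1,), (100,), (32, 32), (4, 8, 16)]

# Values the issue asks to cover in explicit edge case tests instead.
EDGE_DTYPES = ["int8", "float64"]
EDGE_SHAPES = [(1024, 1024), (2, 3, 4, 5)]

# Every (dtype, shape) pair that the main parametrized test would run.
BASE_COMBINATIONS = list(product(REPRESENTATIVE_DTYPES, REPRESENTATIVE_SHAPES))

# With pytest installed, the list can be passed straight to the decorator:
#   @pytest.mark.parametrize("dtype_name, shape", BASE_COMBINATIONS)
#   def test_zeros(dtype_name, shape): ...

current_count = 8 * 8  # "Current" figure quoted in the issue
target_count = len(BASE_COMBINATIONS)
print(f"{current_count} -> {target_count} base combinations "
      f"({100 * (current_count - target_count) // current_count}% reduction)")
```

Keeping the lists in one shared place means all six files draw from the same representative set, so a later change to the set is a one-line edit.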

Comments on the Issue (you are @copilot in this section)



- test_zeros_like.py: 8 dtypes × 6 shapes → 4 dtypes × 4 shapes (67% reduction)
- test_empty.py: 8 dtypes × 6 shapes → 4 dtypes × 4 shapes (67% reduction)
- test_full.py: 9 fill_values × 6 shapes → 4 fill_values × 4 shapes (70% reduction)
- test_randint.py: 5 dtypes × 6 shapes → 4 dtypes × 4 shapes (47% reduction)
- test_ones.py: 8 dtypes × 6 shapes → 4 dtypes × 4 shapes (67% reduction)
- test_zeros.py: 8 dtypes × 6 shapes → 4 dtypes × 4 shapes (67% reduction)

Added edge case tests to maintain coverage:
- Large tensors (1024, 1024) for memory validation
- Edge dtypes (int8, float64) for numerical precision
- Complex shapes (2, 3, 4, 5) for multi-dimensional handling

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Implement test suite optimization to reduce parametrization" to "Reduce test parametrization by 65% in top 6 files for 45% CI speedup" on Feb 15, 2026
Copilot AI requested a review from mawad-amd on February 15, 2026 08:51

large_result = shmem.empty(100, 100)
assert large_result.shape == (100, 100)
assert large_result.numel() == 10000
# Large tensor for memory validation
Collaborator

@copilot for this and other tests you need to increase the Iris symmetric heap size in the constructor. I am seeing an OOM error now.

Contributor Author

Fixed in commit 7b91468. Increased symmetric heap size from 1 MB to 16 MB in all test_*_edge_cases() functions to accommodate large tensor allocations (1024×1024 elements requiring up to 8 MB for float64).
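
A fixed-size heap has to hold everything a test allocates, not only its largest tensor. The toy model below illustrates why a heap larger than the 8 MB peak allocation can be useful. `ToySymmetricHeap` is a hypothetical bump allocator written for this example and is not the Iris API; the sequence of allocations is also an assumption, loosely based on the edge cases listed in the issue.

```python
class ToySymmetricHeap:
    """Hypothetical fixed-size bump allocator; a stand-in, not the Iris heap."""

    def __init__(self, heap_size):
        self.heap_size = heap_size
        self.offset = 0

    def allocate(self, num_elements, bytes_per_element):
        """Reserve space and return its start offset, or raise MemoryError."""
        nbytes = num_elements * bytes_per_element
        if self.offset + nbytes > self.heap_size:
            free = self.heap_size - self.offset
            raise MemoryError(f"need {nbytes} bytes, only {free} free")
        start = self.offset
        self.offset += nbytes
        return start


def edge_case_allocations_fit(heap_size):
    """Replay an assumed edge case sequence against a heap of the given size."""
    heap = ToySymmetricHeap(heap_size)
    try:
        heap.allocate(1024 * 1024, 8)  # (1024, 1024) float64
        heap.allocate(1024 * 1024, 1)  # (1024, 1024) int8
        heap.allocate(2 * 3 * 4 * 5, 4)  # (2, 3, 4, 5) float32
    except MemoryError:
        return False
    return True


for size in (1 << 20, 1 << 23, 1 << 24):
    print(f"heap {size >> 20:2d} MB: fits={edge_case_allocations_fit(size)}")
```

In this model an 8 MB heap still fails once a second tensor is allocated, while 16 MB leaves headroom.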

Increase symmetric heap size from 1 MB (1 << 20) to 16 MB (1 << 24) in
all test_*_edge_cases() functions to accommodate large tensor allocations
(1024 x 1024 elements requiring up to 8 MB for float64).

Fixes OOM errors reported in PR review.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Test Suite Optimization Phase 1: Reduce Parametrization in Critical Path
