Skip to content

Conversation

@tenpercent
Copy link
Contributor

Summary

Adds a new command-line profiling tool (ck-rocprof) to simplify GPU performance analysis workflow using AMD rocprof-compute.

Key Features

  • Easy setup - Automatic Python venv configuration with one command
  • Simple CLI - Intuitive commands: setup, run, analyze, compare, list
  • Auto GPU detection - Automatically detects GPU architecture (e.g., gfx950, MI350)
  • LDS metrics focus - Specializes in Local Data Share (Block 12) metrics for bank conflict analysis
  • Comprehensive docs - Detailed documentation with examples and troubleshooting

Usage Examples

# One-time setup
ck-rocprof setup

# Profile an executable
ck-rocprof run baseline ./bin/tile_example_gemm_universal

# Analyze LDS metrics
ck-rocprof analyze baseline

# Compare two profiling runs
ck-rocprof compare baseline optimized

# List available runs
ck-rocprof list

Files Added

  • script/tools/ck-rocprof (414 lines) - Main profiling tool script
  • script/tools/ck-rocprof.md (415 lines) - Complete documentation

Benefits

  • Streamlines GPU profiling workflow for CK developers
  • Reduces complexity of using rocprof-compute directly
  • Enables easier performance optimization and bank conflict detection
  • Provides convenient comparison tools for A/B testing kernel changes

Test Plan

  • Manual testing with various CK executables
  • Verified setup process creates proper Python venv
  • Validated profiling runs capture LDS metrics correctly
  • Tested comparison functionality between multiple runs
  • Confirmed documentation examples work as expected

Adds a command-line profiling tool to simplify GPU performance
analysis workflow using AMD rocprof-compute.

Features:
- Easy setup with automatic Python venv configuration
- Simple CLI: setup, run, analyze, compare, list
- Automatic GPU architecture detection
- Focus on LDS metrics (Block 12) for bank conflict analysis
- Comprehensive documentation with examples and troubleshooting

Usage:
  ck-rocprof setup                    # One-time environment setup
  ck-rocprof run <name> <executable>  # Profile executable
  ck-rocprof analyze <name> [block]   # Analyze metrics
  ck-rocprof compare <name1> <name2>  # Compare two runs
  ck-rocprof list                     # List available runs
- Streamlined documentation from 416 to 157 lines (62% reduction)
- Focused on essential commands, metrics, and workflows
- Enhanced script to run all operations inside Docker containers
- Fixed workload directory path and improved container management
- Added automatic rocprofiler-compute installation and dependency handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants