[CK Profiler] Initialize tensors on GPU in CK profiler #3550

johannes-graner · 2026-01-12T12:24:12Z

Proposed changes

This PR migrates CK profilers from CPU-side to GPU-side tensor initialization, eliminating a performance bottleneck. The convolution profilers now use the DeviceMemory API methods for on-device data generation, removing expensive CPU-GPU transfers.

| Profiler | Before | After | Speedup |
| --- | --- | --- | --- |
| Grouped Conv Forward | 45.0 s | 16.0 s | 2.8x |
| Grouped Conv Backward Data | 15.0 s | 8.8 s | 1.7x |
| Grouped Conv Backward Weight | 20.0 s | 6.9 s | 2.9x |

In total, this saves roughly 50 s per CI run (the three reductions above sum to 48.3 s).
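As a rough illustration of why on-device generation removes the transfer step, here is a minimal sketch of stateless, index-based filling. It is not the code from this PR and does not use CK's DeviceMemory API: `hash_index` and `fill_uniform_int` are hypothetical names, and the loop runs on the host so the example stays portable. The assumption is that a GPU version would run the loop body as a kernel with one element per thread, writing straight into device memory so that no host buffer or copy is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CK): mixes a seed and an element index
// into a pseudo-random 32-bit value. It keeps no shared state, so every
// element can be computed independently of all others.
inline std::uint32_t hash_index(std::uint32_t seed, std::uint32_t index)
{
    std::uint32_t x = seed ^ (index * 0x9E3779B9u);
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

// Hypothetical stand-in for an on-device fill: writes integer values in
// [lo, hi] into `buf`. On a GPU the loop body would be the kernel body and
// `i` the global thread index; here it is an ordinary host loop.
inline void fill_uniform_int(std::vector<float>& buf, int lo, int hi, std::uint32_t seed)
{
    assert(lo <= hi);
    const std::uint32_t range = static_cast<std::uint32_t>(hi - lo) + 1u;
    for(std::size_t i = 0; i < buf.size(); ++i)
    {
        const std::uint32_t r = hash_index(seed, static_cast<std::uint32_t>(i));
        buf[i] = static_cast<float>(lo + static_cast<int>(r % range));
    }
}
```

Calling `fill_uniform_int(buf, -5, 5, 42u)` twice with the same seed produces identical contents, which keeps profiler runs reproducible even though no element depends on any other.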

Checklist

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers to understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

AviralGoelAMD

Great improvement!


Initialize tensors on GPU in CK profiler

9946381

johannes-graner requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners January 12, 2026 12:24

AviralGoelAMD approved these changes Jan 12, 2026


Kick CI

65ed81d

bartekxk approved these changes Jan 14, 2026


johannes-graner merged commit 3ccb15e into develop Jan 14, 2026
21 checks passed

johannes-graner deleted the jograner/gpu-tensor-init branch January 14, 2026 15:04

shumway pushed a commit that referenced this pull request Jan 15, 2026

[CK Profiler] Initialize tensors on GPU in CK profiler (#3550)

615178a

* Initialize tensors on GPU in CK profiler
* Kick CI

johannes-graner mentioned this pull request Jan 16, 2026

[CK Profiler] Restore CPU tensor initialization when verification is not done on GPU #3594

Merged

7 tasks
