
中文 | English

Kernel Agent: LLM-Driven GPU Kernel Development

Overview

Kernel Agent is an emerging direction: using LLMs (Large Language Models) to assist with, or even automate, the development, optimization, and debugging of GPU kernels.

Why Kernel Agents Are Needed

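Pain points of GPU kernel development:

High barrier to entry: requires a deep understanding of GPU architecture, the memory hierarchy, and warp scheduling
Time-consuming: a single high-performance kernel can take weeks of iteration
Hardware fragmentation: NVIDIA, AMD, and Intel each have their own instruction sets and optimization strategies
Complex tuning: parameters such as tile size, pipeline stages, and memory layout lead to a combinatorial explosion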
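
To give a rough sense of the combinatorial explosion, the sketch below counts the configurations in a small tuning space. The parameter names and value ranges are illustrative assumptions and are not taken from any specific kernel or library.

```python
import itertools
import math

# Hypothetical tuning space for a tiled GEMM-like kernel (all values are illustrative).
TUNING_SPACE = {
    "tile_m": [32, 64, 128, 256],
    "tile_n": [32, 64, 128, 256],
    "tile_k": [16, 32, 64],
    "pipeline_stages": [2, 3, 4, 5],
    "num_warps": [2, 4, 8],
    "memory_layout": ["row_major", "col_major"],
}


def count_configs(space):
    """Number of points in the full Cartesian product of the space."""
    return math.prod(len(values) for values in space.values())


def enumerate_configs(space):
    """Lazily yield every configuration as a dict of parameter -> value."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))
```

Here `count_configs(TUNING_SPACE)` evaluates to 4 * 4 * 3 * 4 * 3 * 2 = 1152. Assuming a few seconds per compile-and-benchmark run, exhaustively searching even this toy space would take on the order of an hour for a single problem shape.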

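Opportunities for LLMs:

A large body of open-source kernel code is available to learn from (CUTLASS, Triton, FlashAttention...)
Kernel correctness can be verified automatically through numerical testing
Performance can be evaluated automatically through profiling
Together these form a closed generate → verify → profile → iterate loop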
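
The generate → verify → profile → iterate loop can be sketched in a few lines. This is a minimal CPU-only illustration, not a real agent: `propose` is a hypothetical stand-in for the LLM that replays a fixed script of hand-written candidates, and plain Python functions stand in for GPU kernels.

```python
import time


def reference_op(xs):
    """Trusted baseline that candidates are checked against: sum of squares."""
    return sum(x * x for x in xs)


def buggy(xs):
    return sum(x * x for x in xs[:-1])  # deliberately wrong: drops the last element


def naive(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total


def builtin(xs):
    return sum(x * x for x in xs)


def propose(history):
    """Hypothetical stand-in for the LLM: returns the next scripted candidate.

    A real agent would condition on `history` (errors, timings) in its prompt.
    """
    script = [buggy, naive, builtin]
    return script[len(history)] if len(history) < len(script) else None


def verify(candidate, inputs, rel_tol=1e-9):
    """Numerical test: compare against the reference within a relative tolerance."""
    expected = reference_op(inputs)
    return abs(candidate(inputs) - expected) <= rel_tol * max(1.0, abs(expected))


def profile(candidate, inputs, repeats=5):
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        candidate(inputs)
        best = min(best, time.perf_counter() - start)
    return best


def run_loop(inputs, max_iters=5):
    """Iterate until the proposer runs out of ideas or the budget is spent."""
    history = []  # (name, status, seconds) records fed back to the proposer
    best = None   # (name, seconds) of the fastest correct candidate so far
    for _ in range(max_iters):
        candidate = propose(history)
        if candidate is None:
            break
        if not verify(candidate, inputs):
            history.append((candidate.__name__, "wrong_result", None))
            continue
        seconds = profile(candidate, inputs)
        history.append((candidate.__name__, "ok", seconds))
        if best is None or seconds < best[1]:
            best = (candidate.__name__, seconds)
    return best, history
```

Calling `run_loop([float(i) for i in range(1000)])` rejects `buggy` at the verification step and only profiles the two correct candidates. The key design point is that verification gates profiling, so an incorrect but fast candidate can never be selected.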

Current Research Directions

                    Kernel Agent Landscape
                    
    ┌─────────────────────────────────────────────┐
    │              Kernel Generation               │
    │  (LLM generates CUDA/Triton kernel code)     │
    ├─────────────────────────────────────────────┤
    │            Kernel Optimization               │
    │  (LLM suggests tuning configs / code edits)  │
    ├─────────────────────────────────────────────┤
    │            Kernel Verification               │
    │  (Automated testing, numerical accuracy)     │
    ├─────────────────────────────────────────────┤
    │           LLM-Guided Auto-Tuning            │
    │  (Replace blind search with LLM reasoning)   │
    ├─────────────────────────────────────────────┤
    │             Agent Architecture               │
    │  (ReAct, tool-augmented, multi-agent)        │
    └─────────────────────────────────────────────┘

Contents

Topic	Description
Kernel Generation	LLM-based kernel code generation: KernelBench, ChatCUDA, Triton codegen
Kernel Verification	How to verify generated kernels: numerical accuracy, correctness testing
LLM-Guided Auto-Tuning	Using LLMs to predict tuning configs, replacing blind search
Agent Architectures	ReAct agents, tool-augmented agents, multi-agent systems for kernel dev
Practical Tutorial	End-to-end example: using an LLM agent to write and optimize a Triton kernel
Kernel Skills Framework	Three-tier skill system (orchestration / optimization techniques / operator patterns), 6-layer Playbook, verification and evaluation layer [WIP]
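
As a rough illustration of the "replace blind search with LLM reasoning" idea behind LLM-guided auto-tuning, the sketch below compares exhaustive search with measuring only a ranked shortlist. Everything here is an assumption made for illustration: `measure` is a synthetic latency formula rather than a real benchmark, and `llm_rank` is a hand-written heuristic standing in for a model's suggestions.

```python
import itertools

# Hypothetical search space (values are illustrative).
SPACE = {
    "block_size": [32, 64, 128, 256, 512],
    "num_stages": [1, 2, 3, 4],
    "num_warps": [1, 2, 4, 8],
}


def all_configs(space):
    """Every configuration in the Cartesian product, as a list of dicts."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]


def measure(config):
    """Synthetic stand-in for a benchmark run; returns a made-up latency in ms.

    The formula is arbitrary and only keeps the example deterministic.
    """
    return (1.0
            + abs(config["block_size"] - 128) / 64
            + abs(config["num_stages"] - 3) * 0.5
            + abs(config["num_warps"] - 4) * 0.25)


def llm_rank(configs):
    """Stand-in for LLM reasoning: order configs by a crude, imperfect prior.

    This prior prefers mid-sized blocks and more warps; it is not exactly
    right (the synthetic optimum uses 4 warps, not 8).
    """
    return sorted(configs, key=lambda c: (abs(c["block_size"] - 128), -c["num_warps"]))


def blind_search(configs):
    """Measure every configuration; returns (best_config, latency, num_measurements)."""
    latency, idx = min((measure(c), i) for i, c in enumerate(configs))
    return configs[idx], latency, len(configs)


def guided_search(configs, budget):
    """Measure only the top-`budget` ranked configurations."""
    shortlist = llm_rank(configs)[:budget]
    latency, idx = min((measure(c), i) for i, c in enumerate(shortlist))
    return shortlist[idx], latency, len(shortlist)
```

In this toy setup, `guided_search(all_configs(SPACE), budget=8)` reaches the same best configuration as `blind_search` with 8 measurements instead of 80. The ranking does not need to be exactly right, because every shortlisted configuration is still measured; it only needs to place a good configuration within the budget.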

Key Projects & Papers

Project	Description	Link
KernelBench	Benchmark for LLM kernel generation	GitHub
KernelLLM	Meta's LLM for kernel generation	Blog
KernelAgent	Official PyTorch multi-agent kernel optimization	Blog
CUDA-Agent	RL-based kernel optimization (ByteDance/Tsinghua)	GitHub
TritonForge	RL + compiler kernel generation	GitHub
AutoKernel	Systematic optimization loop	GitHub
AutoTuner-Agent	LLM-guided Triton autotuning	Research
SWE-bench	General software engineering benchmark (not specific to GPU kernels)	swe-bench.github.io

Community Kernel Skills & Tools

Profiling & Optimization Skills

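Skill	Description	Link
ncu-cuda-profiling-skill	Automatically identifies 5 types of GPU bottleneck (DRAM/L1/Latency/Compute/Occupancy)	GitHub
cuda-optimization-skill	AI-agent-based closed loop for automatic CUDA kernel optimization	GitHub
sparse-mask-attention	Using an LLM to write GPU operators that outperform community implementations	GitHub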

Kernel Ecosystem

Project	Description	Link
HF kernels	Hugging Face's CUDA kernel + Agent Skills system	GitHub
HF upskills	Distills skills from strong-model traces, then evaluates the skill lift on small models	Blog
TritonLLM	Triton-kernel-first inference framework	GitHub
triton-index	Library of Triton patterns	GitHub
FlashKernel	Collection of real-world kernel implementations	GitHub
KernelLLM (Meta)	Meta's LLM specialized for kernel generation	HuggingFace
KernelGYM & Dr.Kernel	Kernel training and evaluation	GitHub

Benchmarks

Benchmark	Link
KernelBench (Stanford)	GitHub
SOL-ExecBench (NVIDIA)	Link
LeetGPU	Link
FlashInfer Bench	Link
