How to Build Agents to Generate Kernels for Faster LLMs (and Other Models!)
Abstract
The compute demanded by modern AI has been exploding since 2016: the FLOPs used to train frontier models have grown at roughly 2.4x per year [0], and inference is growing even faster, already accounting for an estimated 80% of total AI electricity use [1]. Large language models and other deep networks rely on highly tuned GPU kernels to reach state-of-the-art performance, and efficient kernels translate directly into cost and energy savings. In this 2.5-hour in-person tutorial, we demonstrate how LLM-powered agents can generate and optimize GPU kernels for CUDA, HIP/ROCm, and Triton. We begin with a unified primer on GPU-programming fundamentals and common tooling (memory hierarchy, occupancy, profilers), then introduce an agentic loop: prompt engineering, compiler/profiler feedback as tools, iterative kernel refinement, correctness validation, and automated benchmarking. We provide additional benchmarking examples for HIP and Triton on top of Stanford's KernelBench, which covers CUDA [2], and KernelBot, a reliable source of human-curated, heterogeneous GPU code [3], and show how to turn runtime and profiler metrics into reward signals that drive kernel optimization. On top of this loop, we build an inference-scaling framework in which the LLM proposes candidate kernels, compiles them, measures latency, throughput, and energy, and feeds those signals back as rewards. By combining test-time scaling techniques, the agent iteratively discovers increasingly accurate and efficient kernels. Attendees will compare generated code against expert-written kernels and inspect both wins and losses. By the end, participants will walk away with a reproducible pipeline for LLM-driven GPU-kernel optimization.
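To make the agentic loop concrete, the sketch below shows one way the propose-compile-measure-reward cycle could be wired together. It is illustrative only, not the tutorial's actual pipeline: `query_llm` is a hypothetical placeholder for any LLM API call, and each candidate is assumed to be a self-contained CUDA program that checks its own output against a reference and prints its latency in milliseconds, compiled here with plain `nvcc`.

```python
"""Minimal sketch of an agentic kernel-optimization loop (illustrative only)."""
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Candidate:
    source: str        # CUDA C++ source proposed by the LLM
    latency_ms: float  # measured latency; float("inf") if the attempt failed


def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call that returns CUDA source code."""
    raise NotImplementedError("wire up your LLM provider here")


def compile_and_run(source: str) -> float:
    """Compile a candidate with nvcc and run it; return latency in ms.

    The benchmark harness is assumed to validate correctness against a
    reference output and print a single latency number on success.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "kernel.cu"
        binary = Path(tmp) / "kernel"
        src.write_text(source)
        build = subprocess.run(
            ["nvcc", "-O3", str(src), "-o", str(binary)],
            capture_output=True, text=True,
        )
        if build.returncode != 0:
            return float("inf")  # compile error: worst possible reward
        run = subprocess.run([str(binary)], capture_output=True, text=True)
        if run.returncode != 0:
            return float("inf")  # wrong results or runtime failure
        try:
            return float(run.stdout.strip())
        except ValueError:
            return float("inf")  # unexpected harness output


def optimize(task_description: str, rounds: int = 8) -> Candidate:
    """Iteratively propose, measure, and refine kernels; keep the best one."""
    best = Candidate(source="", latency_ms=float("inf"))
    feedback = "No previous attempt."
    for _ in range(rounds):
        prompt = (
            f"Task: {task_description}\n"
            f"Feedback from the last attempt: {feedback}\n"
            "Write a faster, correct CUDA kernel and benchmark harness."
        )
        source = query_llm(prompt)
        latency = compile_and_run(source)
        # Compiler and profiler output could also be appended here as feedback.
        feedback = f"Measured latency: {latency} ms."
        if latency < best.latency_ms:
            best = Candidate(source=source, latency_ms=latency)
    return best
```

Treating compile or runtime failures as infinite latency lets the loop discard broken candidates without special-casing them, and the measured latency doubles as both the feedback text for the next prompt and the reward used to keep the best kernel so far.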
Schedule
| Time | Topic |
| --- | --- |
| 1:40 PM | |
| 3:30 PM | |
| 3:45 PM | |