QuickRunCUDA


This is the microbenchmarking framework I used to build the project that won the SemiAnalysis GPU Hackathon ("Optimizing NVIDIA Blackwell’s Split L2"): https://semianalysis.com/2025-hackathon-eol/

The finished & polished project code is available here: https://github.com/ademeure/QuickRunCUDA/blob/main/tests/side_aware.cu

Example command to run the L2 Side Aware reduction that calculates the FP32 absmax of an input array (on H100/GH200/GB200):

make

./QuickRunCUDA -i -p -t 1024 -A 1000000000 -0 1000000000 -T 100 -P 4.0 -U GB/s tests/side_aware.cu
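To make "FP32 absmax" concrete, here is a minimal CPU reference sketch that could be used to sanity-check a GPU result on a small array. It is not the side-aware kernel from tests/side_aware.cu, and the name `absmax_reference` is hypothetical (it does not exist in QuickRunCUDA). It assumes the result is simply the largest absolute value in the array, with 0 for an empty array.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of QuickRunCUDA: CPU reference for FP32 absmax.
// Returns max(|data[i]|) over n elements, or 0.0f when n == 0.
// NaN inputs are ignored by this sketch, because std::max(result, NaN)
// keeps `result` when the comparison is false.
float absmax_reference(const float* data, std::size_t n) {
    float result = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        result = std::max(result, std::fabs(data[i]));
    }
    return result;
}

// Convenience overload for std::vector inputs.
float absmax_reference(const std::vector<float>& values) {
    return absmax_reference(values.data(), values.size());
}
```

Because absmax is a plain max-reduction, the order in which elements are visited does not change the result, so a reference like this should match a GPU reduction exactly for NaN-free inputs.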

You can uncomment "FORCE_RANDOM_SIDE" to disable the optimization while keeping some of its overhead. Comparing the two runs shows that the side-aware optimization doesn't significantly improve performance, but it does reduce power consumption by up to ~9% on GH200 with random data ('-r' flag)!

It is possible to extend this to any elementwise operation or memcpy, but that requires very complicated manual memory management to make it work on both the input and output sides simultaneously, so it can't really be done as part of this kind of microbenchmarking framework. It might be possible to do it in PyTorch using a custom allocator and mempool, but I'm not 100% sure at this point.

Let me know if you have any questions about the L2 Side Aware project or the QuickRunCUDA framework in general!
