
🦆 QuACK: A Quirky Assortment of CuTe Kernels 🦆

Kernels are written in the CuTe-DSL.

Installation

# For CUDA 12.9:
pip install quack-kernels

# For CUDA 13.1:
pip install 'quack-kernels[cu13]' --extra-index-url https://download.pytorch.org/whl/cu130

# Or using uv (faster):
uv pip install 'quack-kernels[cu13]'

# Optional: install NVIDIA matmul heuristics for better untuned GEMM configs
pip install 'quack-kernels[heuristics]'

Requirements

  • H100 or B200/B300 GPU
  • CUDA toolkit 12.9+
  • Python 3.12
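
The requirements above can be checked before installing. The following is a minimal sketch, not part of QuACK: the helper names `python_ok`, `gpu_ok`, and `detect_capability` are hypothetical, and it assumes that H100 reports CUDA compute capability 9.x and that B200/B300 report 10.x. The GPU query uses PyTorch only if it is already installed.

```python
import sys

# Assumption: Hopper (H100) reports compute capability 9.x and
# datacenter Blackwell (B200/B300) reports 10.x.
SUPPORTED_MAJORS = (9, 10)


def python_ok(version=None):
    """Return True if the interpreter is Python 3.12, the version the README lists."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 12)


def gpu_ok(capability):
    """Return True if a (major, minor) compute capability is in the assumed set."""
    return capability[0] in SUPPORTED_MAJORS


def detect_capability():
    """Query the current GPU via PyTorch if available; return None otherwise."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = detect_capability()
    print("Python 3.12:", python_ok())
    print("GPU capability:", cap, "supported:", cap is not None and gpu_ok(cap))
```

The CUDA toolkit version is not checked here; `nvcc --version` reports it on systems where the toolkit is on the path.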

Kernels 🐥

  • 🦆 RMSNorm forward + backward
  • 🦆 Softmax forward + backward
  • 🦆 Cross entropy forward + backward
  • 🦆 LayerNorm forward
  • 🦆 Hopper GEMM + epilogue
  • 🦆 Blackwell GEMM + epilogue

Usage

from quack import rmsnorm, softmax, cross_entropy
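
For orientation, here is a pure-Python reference for what these three operations compute on a single row, using the standard textbook definitions. It is a sketch for checking numerical expectations, not QuACK's implementation. The function names (`rmsnorm_ref`, `softmax_ref`, `cross_entropy_ref`), the argument order, and the `eps` default are assumptions and may not match the library's signatures.

```python
import math


def rmsnorm_ref(x, weight, eps=1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i for one row."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def softmax_ref(x):
    """Numerically stable softmax: subtract the row max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_ref(logits, target):
    """Loss = logsumexp(logits) - logits[target] for one row and an integer class index."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[target]
```

Comparing a GPU kernel's output against a reference like this, within a tolerance suited to the data type, is a common sanity check.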

Documentation

[2025-07-10] We have a comprehensive blogpost on how to get memory-bound kernels to speed-of-light, right in the comfort of Python thanks to the CuTe-DSL.

Performance

<div align="center"> <figure> <img src="media/bf16_kernel_benchmarks_single_row.svg" > </figure> </div>

See our blogpost for the details.

Development

To set up the development environment:

pip install -e '.[dev]'
pre-commit install

# For CUDA 13.1:
pip install 'quack-kernels[dev,cu13]' --extra-index-url https://download.pytorch.org/whl/cu130

# Or using uv:
uv pip install 'quack-kernels[dev,cu13]'