Teenygrad
teaching software 2.0 to programmers of software 1.0
The Structure and Interpretation of Tensor Programs' capstone project
Motivation
The SITP book and the teenygrad project try to fill a pedagogical gap in the discipline of deep learning systems.
With traditional software 1.0, the languages and runtimes that make up production-grade systems such as LLVM and Linux
carry far too much tail-end complexity (both fundamental and accidental) to serve as learning vehicles.
Instead, there exist teaching compilers and operating systems, to name a few:
- a mini Lisp-like interpreter, the metacircular evaluator
- a mini C-like compiler, chibicc (in turn inspired by tcc and lcc)
- a mini LLVM-like SSA instruction set, Bril
- a mini Unix-like operating system, xv6
- a mini educational instruction set, LC-3
For deep learning systems, given that the discipline is relatively new (the 2020-2025(?) era of scaling has just passed), the pedagogical material is quite nascent.
While there are some great resources, such as Sasha Rush's minitorch course at Cornell and
Tianqi Chen's needle course at Carnegie Mellon,
there are a few gaps that I personally would like to see filled, which is what SITP and teenygrad are trying to do.
SITP Installation (Book)
- Install mdbook
- Then serve the book locally:

```shell
cd sitp/
mdbook serve
```
teenygrad Installation (Codebase)
Follow these instructions for a quick setup.
To understand the physical layout of the project repo, refer to the ARCHITECTURE.md
Eager Mode
teenygrad eager mode (developed in parts 1 and 2 of the book)
has mixed sources in Python, Rust, and CUDA Rust in order to support CPU and GPU acceleration.
The Python to Rust interop is implemented using CPython extension modules via PyO3,
with the shared object files compiled by driving cargo via PyO3's build tool maturin.
CPU kernels (RISC-V)
- CPU kernels do not use the docker container (for now).
```shell
cd teeny/
uv pip install maturin           # install maturin (which drives pyo3)
cd rust && cargo run             # run cpu accelerated gemm kernel
maturin develop                  # build shared object for cpython's extension modules
uv run examples/abstractions.py  # run cpu accelerated gemm kernel from python
```
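For orientation, the gemm these commands exercise is ordinary matrix multiplication. As a reference for what the kernels compute, here is a naive pure-Python version (illustrative only, with made-up names and shapes, not teenygrad's actual API):

```python
# Naive gemm reference: C = A @ B for row-major nested lists.
# Illustrative only; the accelerated kernels live in the Rust/CUDA sources.
def gemm(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(gemm(A, B))  # → [[19, 22], [43, 50]]
```

The Rust and CUDA kernels compute the same result; the point of the acceleration work is tiling, vectorization, and parallelism, not different semantics.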
GPU kernels (PTX)
To enable GPU acceleration, teenygrad uses CUDA Rust,
which in turn requires a specific version matrix (notably NVVM, the LLVM subset, pinned to LLVM 7.x,
because CUDA Rust targets NVVM rather than using LLVM's PTX codegen),
and so the docker containers and shell scripts provided by CUDA Rust
are reused for teenygrad development.
- Install NVIDIA Container Toolkit on your machine
- Then run the following in your shell:
```shell
cd teeny/
sudo nvidia-ctk runtime configure --runtime=docker  # set nvidia's container runtime to docker
sudo systemctl restart docker                       # restart docker
./dcr.sh                                            # create container with old version of llvm for cuda rust
./dex.sh "cd eagkers && cargo run --features gpu"   # run gpu accelerated gemm kernel
./dex.sh "maturin develop"                          # build the shared object for cpython's extension modules
./dex.sh "uv run examples/abstractions.py"          # run gpu accelerated gemm kernel from python
```

Also note that `./dcr.sh` creates the production container, so any commands that run the Rust with `cargo`, build the Rust with `maturin`, or run the Python with `uv` must be qualified with `./dex.sh`.
- For VSCode development, when you open the project with VS Code you will be prompted with
"Folder contains a Dev Container configuration file. Reopen folder to develop in a container", at which point you press the Reopen in Container button. This restarts VSCode with the development container specified at `.devcontainer`, using the containers provided by CUDA Rust, in order to enable `rust-analyzer`. The final step is to point `rust-analyzer` to the Rust and CUDA Rust source in `settings.json`:

```json
{
  // other fields in settings.json
  "rust-analyzer.linkedProjects": ["teeny/eagkers/Cargo.toml"],
  "rust-analyzer.cargo.features": ["gpu"]
}
```

Note that when VSCode opens the project's development container, none of the `./dex.sh` commands from step 2 will work, since the development container doesn't have docker. For those, either enter the commands in the shell of a second VSCode editor, or simply use different shell software.
Graph Mode
teenygrad graph mode (developed in part 3 of the book) is a pure-Python tensor compiler.
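As rough intuition for what "graph mode" means, here is a minimal sketch of the general lazy-evaluation idea (hypothetical classes and op names, not teenygrad's actual design): operations record graph nodes instead of computing immediately, and a later evaluation pass walks the graph.

```python
# Minimal lazy-graph sketch: building an expression records nodes,
# and evaluate() walks the recorded graph. Illustrative only.
class Node:
    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value

    def __add__(self, other):
        return Node("add", (self, other))  # record, don't compute

    def __mul__(self, other):
        return Node("mul", (self, other))  # record, don't compute

    def evaluate(self):
        # post-order walk: evaluate sources, then apply this node's op
        if self.op == "const":
            return self.value
        lhs, rhs = (s.evaluate() for s in self.srcs)
        return lhs + rhs if self.op == "add" else lhs * rhs

def const(x):
    return Node("const", value=x)

# build the graph (no arithmetic happens yet), then evaluate it
expr = const(2) * const(3) + const(4)
print(expr.evaluate())  # → 10
```

A real tensor compiler inserts optimization passes between graph construction and evaluation (fusion, scheduling, codegen), which is exactly what deferring computation makes possible.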