
Teenygrad

teaching software 2.0 to programmers of software 1.0


teenygrad

The capstone project of The Structure and Interpretation of Tensor Programs (SITP)


Motivation

The SITP and teenygrad project is trying to fill a pedagogical gap in the discipline of deep learning systems. In traditional software 1.0, the languages and runtimes that make up production-grade systems such as LLVM and Linux carry far too much tail-end complexity (both fundamental and accidental), which makes them inappropriate as learning vehicles. Instead, there exist teaching interpreters, compilers, operating systems, and instruction sets. To name a few:

  • a mini Lisp-like interpreter: a metacircular evaluator
  • a mini C-like compiler: chibicc (in turn inspired by tcc and lcc)
  • a mini LLVM-like SSA intermediate representation: Bril
  • a mini Unix-like operating system: xv6
  • a mini educational instruction set: LC-3

For deep learning systems, the discipline is relatively new (the 2020-2025? era of scaling has just passed), so the pedagogical material is still nascent. While there are some great resources, such as Sasha Rush's minitorch course at Cornell and Tianqi Chen's needle course at Carnegie Mellon, there are a few gaps that I would personally like to see filled, and that is what SITP and teenygrad are trying to do.

SITP Installation (Book)

  1. Install mdbook (for example, with cargo install mdbook)
  2. Serve the book locally:
    cd sitp/
    mdbook serve
    

teenygrad Installation (Codebase)

Follow these instructions for a quick setup. To understand the physical layout of the project repo, refer to ARCHITECTURE.md.

Eager Mode

teenygrad eager mode (developed in parts 1 and 2 of the book) mixes Python, Rust, and CUDA Rust sources to support CPU and GPU acceleration. Python-to-Rust interop is implemented with CPython extension modules via PyO3, and the shared object files are compiled by maturin, PyO3's build tool, which drives cargo.

CPU kernels (RISC-V)

  1. CPU kernels do not use the Docker container (for now), so run the following directly in your shell:
    cd teeny/
    uv pip install maturin                             # install maturin (PyO3's build tool, which drives cargo)
    (cd rust && cargo run)                             # run cpu accelerated gemm kernel (subshell keeps cwd at teeny/)
    maturin develop                                    # build shared object for cpython's extension modules
    uv run examples/abstractions.py                    # run cpu accelerated gemm kernel from python
    
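When checking the accelerated GEMM kernel's output from Python, a slow pure-Python reference can be handy. The sketch below is not part of teenygrad: the names `gemm_reference` and `allclose`, and the row-major flat-list layout (element (i, j) of a matrix with `cols` columns at index i * cols + j), are assumptions for illustration. It computes C = A·B for A of shape m×k and B of shape k×n.

```python
def gemm_reference(a: list[float], b: list[float], m: int, k: int, n: int) -> list[float]:
    """Naive GEMM on row-major flat buffers: returns C = A @ B with length m * n.

    Hypothetical helper (not part of teenygrad). A is m x k, B is k x n, and
    element (i, j) of a matrix with `cols` columns lives at index i * cols + j.
    """
    if len(a) != m * k or len(b) != k * n:
        raise ValueError("buffer lengths do not match the given shapes")
    c = [0.0] * (m * n)
    for i in range(m):          # row of A and C
        for j in range(n):      # column of B and C
            acc = 0.0
            for p in range(k):  # reduction over the shared dimension
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def allclose(x: list[float], y: list[float], tol: float = 1e-5) -> bool:
    """Elementwise comparison with an absolute tolerance."""
    return len(x) == len(y) and all(abs(u - v) <= tol for u, v in zip(x, y))


# Usage: a 2x3 matrix times a 3x2 matrix.
a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]
b = [7.0, 8.0,
     9.0, 10.0,
     11.0, 12.0]
c = gemm_reference(a, b, m=2, k=3, n=2)
print(c)  # [58.0, 64.0, 139.0, 154.0]
```

Comparing a kernel's result with `allclose` rather than `==` is deliberate: an optimized kernel may accumulate the reduction in a different order, so its floating-point results can differ from the reference in the last few bits.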

GPU kernels (PTX)

To enable GPU acceleration, teenygrad uses CUDA Rust, which requires a specific version matrix (notably NVVM, the LLVM subset pinned to LLVM 7.x, because CUDA Rust targets NVVM rather than using LLVM's own NVPTX backend). For this reason, the Docker containers and shell scripts provided by CUDA Rust are reused for teenygrad development.

  1. Install the NVIDIA Container Toolkit on your machine.
  2. Then run the following in your shell:
    cd teeny/
    sudo nvidia-ctk runtime configure --runtime=docker # configure docker to use nvidia's container runtime
    sudo systemctl restart docker                      # restart docker
    ./dcr.sh                                           # create container with old version of llvm for cuda rust
    ./dex.sh "cd eagkers && cargo run --features gpu"  # run gpu accelerated gemm kernel
    ./dex.sh "maturin develop"                         # build the shared object for cpython's extension modules
    ./dex.sh "uv run examples/abstractions.py"         # run gpu accelerated gemm kernel from python
    
    Note that ./dcr.sh creates the production container, so any command that runs Rust with cargo, builds the extension module with maturin, or runs Python with uv must be prefixed with ./dex.sh.
  3. For VS Code development, opening the project prompts "Folder contains a Dev Container configuration file. Reopen folder to develop in a container." Click Reopen in Container, and VS Code restarts inside the development container specified in .devcontainer (built on the containers provided by CUDA Rust), which enables rust-analyzer. The final step is to point rust-analyzer at the Rust and CUDA Rust sources in settings.json:
    {
      // other fields in settings.json
      "rust-analyzer.linkedProjects": ["teeny/eagkers/Cargo.toml"],
      "rust-analyzer.cargo.features": ["gpu"]
    }
    
    Note that while VS Code has the project open in the development container, none of the ./dex.sh commands from step 2 will work from its terminal, since the development container does not have Docker. Run those commands from a second VS Code window opened outside the container, or from a separate terminal application.
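As a mental model for the GPU GEMM step, a naive CUDA-style GEMM launches a 2D grid of thread blocks and has each thread compute one output element C[row][col], with row = blockIdx.y * blockDim.y + threadIdx.y and col = blockIdx.x * blockDim.x + threadIdx.x. The Python sketch below only simulates that indexing sequentially, under the assumption of row-major flat buffers; it is not teenygrad's CUDA Rust kernel, and `launch_naive_gemm`, `naive_gemm_thread`, and the default 16×16 block size are hypothetical.

```python
def ceil_div(x: int, y: int) -> int:
    """Number of blocks of size y needed to cover x elements."""
    return (x + y - 1) // y


def naive_gemm_thread(a: list[float], b: list[float], c: list[float],
                      m: int, k: int, n: int, row: int, col: int) -> None:
    """Body of one simulated GPU thread: computes C[row][col] on row-major buffers."""
    if row >= m or col >= n:  # the grid is rounded up, so edge threads may fall outside C
        return
    acc = 0.0
    for p in range(k):
        acc += a[row * k + p] * b[p * n + col]
    c[row * n + col] = acc


def launch_naive_gemm(a: list[float], b: list[float], m: int, k: int, n: int,
                      block: tuple[int, int] = (16, 16)) -> list[float]:
    """Sequentially simulate a 2D launch of ceil(n/bx) x ceil(m/by) blocks of bx x by threads."""
    bx, by = block
    grid_x, grid_y = ceil_div(n, bx), ceil_div(m, by)
    c = [0.0] * (m * n)
    for block_y in range(grid_y):
        for block_x in range(grid_x):
            for thread_y in range(by):
                for thread_x in range(bx):
                    row = block_y * by + thread_y  # blockIdx.y * blockDim.y + threadIdx.y
                    col = block_x * bx + thread_x  # blockIdx.x * blockDim.x + threadIdx.x
                    naive_gemm_thread(a, b, c, m, k, n, row, col)
    return c


# Usage: a 2x3 matrix times a 3x2 matrix, covered by a single 16x16 block.
result = launch_naive_gemm([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                           [7.0, 8.0, 9.0, 10.0, 11.0, 12.0], m=2, k=3, n=2)
print(result)  # [58.0, 64.0, 139.0, 154.0]
```

On real hardware the threads run in parallel rather than in nested loops; the bounds check is what keeps threads in the rounded-up edge blocks from writing past the end of C.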

Graph Mode

teenygrad graph mode (developed in part 3 of the book) is a pure Python Tensor compiler.
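To make "graph mode" concrete: rather than computing each operation immediately as eager mode does, a graph-mode tensor library typically records operations into an expression graph and only evaluates (or compiles) it when a result is requested. The pure Python sketch below illustrates that idea under stated assumptions; the `LazyTensor` class, its `realize` method, the `toposort` helper, and the elementwise-only op set are all hypothetical, and teenygrad's actual IR and API may differ entirely.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical op table: only elementwise binary ops, to keep the sketch small.
BINARY_OPS: dict[str, Callable[[float, float], float]] = {
    "ADD": operator.add,
    "MUL": operator.mul,
}


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class LazyTensor:
    """Hypothetical graph node: a LOAD leaf holding data, or an op over source nodes."""
    op: str
    srcs: tuple[LazyTensor, ...] = ()
    data: Optional[list[float]] = None  # only set for LOAD leaves

    @staticmethod
    def load(values: list[float]) -> LazyTensor:
        return LazyTensor("LOAD", data=list(values))

    # Operators only record a new node; no arithmetic happens here.
    def __add__(self, other: LazyTensor) -> LazyTensor:
        return LazyTensor("ADD", (self, other))

    def __mul__(self, other: LazyTensor) -> LazyTensor:
        return LazyTensor("MUL", (self, other))

    def realize(self) -> list[float]:
        """Evaluate the graph once, visiting nodes in topological order."""
        values: dict[LazyTensor, list[float]] = {}
        for node in toposort(self):  # every source precedes its consumers
            if node.op == "LOAD":
                values[node] = node.data
            else:
                fn = BINARY_OPS[node.op]
                lhs, rhs = (values[s] for s in node.srcs)
                values[node] = [fn(x, y) for x, y in zip(lhs, rhs)]
        return values[self]


def toposort(root: LazyTensor) -> list[LazyTensor]:
    """Depth-first post-order walk; shared subexpressions are visited only once."""
    order: list[LazyTensor] = []
    seen: set[LazyTensor] = set()

    def visit(node: LazyTensor) -> None:
        if node in seen:
            return
        seen.add(node)
        for src in node.srcs:
            visit(src)
        order.append(node)

    visit(root)
    return order


x = LazyTensor.load([1.0, 2.0, 3.0])
y = LazyTensor.load([10.0, 20.0, 30.0])
z = (x + y) * x  # builds a 4-node graph; nothing is computed yet
print([node.op for node in toposort(z)])  # ['LOAD', 'LOAD', 'ADD', 'MUL']
print(z.realize())                        # [11.0, 44.0, 99.0]
```

This sketch merely interprets the graph. The point at which `realize` walks the nodes is where a tensor compiler would typically schedule work, fuse elementwise ops, and generate kernels instead.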
