# Grilly

<p align="center"> <img src="https://raw.githubusercontent.com/grillcheese-ai/grilly/main/assets/grilly_mascott_github.png" alt="Grilly" width="400"> </p>

Deep learning, well done.
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU -- AMD, NVIDIA, Intel -- no CUDA dependency. 231 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer with automatic CPU fallback.
Alpha software (v0.6.1). APIs may change between minor versions.
## Why Grilly?
- Any GPU: Vulkan runs on AMD, NVIDIA, Intel, and Apple (via MoltenVK). No CUDA lock-in.
- PyTorch-like API: `nn.Module`, `F.relu`, `AdamW` -- familiar patterns, new backend.
- Always works: pure-Python numpy fallback if no GPU is available. Same code, same results.
- Research-ready: Spiking neural networks, Vector Symbolic Architectures, Mixture of Experts, cognitive controllers, temporal reasoning -- all GPU-accelerated.
- Lightweight: Core dependency is numpy only. Optional extras for torch, HuggingFace, ONNX.
## Installation

### Option 1: Python-only (no GPU acceleration)

```bash
pip install grilly
```

Works immediately with numpy. No GPU, no Vulkan SDK, no C++ compiler needed.
### Option 2: With Vulkan GPU acceleration

#### Linux / Google Colab (one-liner)

```bash
# Full build (~30 min -- includes validation layers, all SDK tools)
curl -sSL https://raw.githubusercontent.com/Grillcheese-AI/grilly/main/scripts/install.sh | bash

# Fast build (~5 min -- shaderc + loader only, recommended for Colab/CI)
curl -sSL https://raw.githubusercontent.com/Grillcheese-AI/grilly/main/scripts/install.sh | bash -s -- --fast
```

On Colab:

```bash
# Recommended: fast mode for Colab (5 min instead of 30)
!wget -qO- https://raw.githubusercontent.com/Grillcheese-AI/grilly/main/scripts/install.sh | bash -s -- --fast
```

This installs system deps, downloads and builds Vulkan SDK 1.4, compiles the grilly C++ extension, and installs the Python package. The `--fast` flag builds only the components grilly needs (shaderc, loader, headers) and skips validation layers.
#### Linux (manual step-by-step)

```bash
# 1. System dependencies (Ubuntu/Debian)
sudo apt-get install -y cmake g++ ninja-build pkg-config \
    libxcb-dri3-0 libxcb-present0 libpciaccess0 libpng-dev \
    libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev libwayland-dev \
    libxrandr-dev libxcb-randr0-dev libx11-xcb-dev wayland-protocols

# 2. Vulkan SDK (download from https://vulkan.lunarg.com/sdk/home)
wget https://sdk.lunarg.com/sdk/download/1.4.341.1/linux/vulkansdk-linux-x86_64-1.4.341.1.tar.xz
tar xf vulkansdk-linux-x86_64-1.4.341.1.tar.xz
cd 1.4.341.1 && ./vulkansdk all -j $(nproc)
export VULKAN_SDK=$(pwd)/x86_64
export PATH=$VULKAN_SDK/bin:$PATH
export LD_LIBRARY_PATH=$VULKAN_SDK/lib:$LD_LIBRARY_PATH

# 3. Build grilly
git clone --recurse-submodules https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel $(nproc)

# 4. Install the compiled extension
cp build/grilly_core.*.so $(python -c "import grilly; print(grilly.__path__[0])")/
```
#### Windows

```powershell
# 1. Install Vulkan SDK from https://vulkan.lunarg.com/sdk/home (Windows installer)
# 2. Install Visual Studio 2022 with C++ workload
git clone --recurse-submodules https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build\Release\grilly_core.*.pyd .
```

Pre-built binary (Windows x64, Python 3.12): download `grilly_core.cp312-win_amd64.pyd` from the latest release and copy it into your grilly install directory.
#### macOS

```bash
# 1. Install Vulkan SDK from https://vulkan.lunarg.com/sdk/home#mac
brew install cmake ninja
# 2. Follow the Linux build steps above (uses MoltenVK)
```
#### Verify installation

```python
import grilly
print(f"grilly {grilly.__version__}")

# Check GPU backend
try:
    from grilly.backend import _bridge
    print(f"Vulkan: {'enabled' if _bridge.is_available() else 'not available'}")
except ImportError:
    print("Vulkan: not installed (numpy fallback active)")
```
## Requirements

| | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.1+ | 1.4 (latest SDK) |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series), Apple (M1+ via MoltenVK).
## Quick Start

```python
import numpy as np
from grilly import nn
from grilly.optim import AdamW

# Build a model
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Train
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)
model.zero_grad()
model.backward(grad)
optimizer.step()
```
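Unlike torch, the loss object's backward is called explicitly with the upstream gradient. What `CrossEntropyLoss` backpropagates is, in the standard formulation, softmax minus one-hot averaged over the batch -- a numpy sketch of that math, not grilly's actual internals:

```python
import numpy as np

def cross_entropy_backward(logits, targets):
    """Gradient of mean cross-entropy w.r.t. logits: softmax(logits) - one_hot."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(len(targets)), targets] = 1.0
    return (probs - one_hot) / len(targets)              # averaged over the batch

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]], dtype=np.float32)
targets = np.array([0, 1])
grad = cross_entropy_backward(logits, targets)
print(grad.shape)  # (2, 3)
```

Each row of the gradient sums to zero, and the entry at the target class is negative, pushing that logit up.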
### Autograd

```python
from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
```
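The `[2.0, 4.0, 6.0]` gradient can be sanity-checked numerically without any framework; central finite differences recover d/dx of `(x * x).sum()`, which is `2x`:

```python
import numpy as np

def f(x):
    return (x * x).sum()

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
# Perturb one coordinate at a time; central differences approximate the gradient
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
print(grad.round(4))  # [2. 4. 6.]
```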
### Functional API

```python
import grilly.functional as F

out = F.linear(x, weight, bias)
out = F.relu(out)
out = F.softmax(out, dim=-1)
attn = F.flash_attention2(q, k, v)
```

See `notebooks/01_getting_started.ipynb` for a complete walkthrough.
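`F.flash_attention2` is an exact attention algorithm: it should match naive scaled dot-product attention up to floating-point error, just computed tile-by-tile to save memory. A naive numpy reference for comparison (shapes here are illustrative, not a grilly requirement):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 8, 16), dtype=np.float32) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (2, 8, 16)
```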
## Features

### Layers (100+)
| Category | Modules |
|----------|---------|
| Linear | Linear, Embedding, CapsuleEmbedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d/2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish |
| Attention | FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE |
| LoRA | LoRALinear, LoRAAttention, LoRAModel |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d/AvgPool2d |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
| Multimodal | PerceiverIO, ImageBindFusion, FlamingoFusion, VisionLanguageModel |
| Memory | MemoryRead, MemoryWrite, MemoryContextAggregate |
| Routing | DomainRouter, DomainPredictor, ExpertCombiner |
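As a concrete example of the Normalization row, `RMSNorm` rescales by the root mean square of the last axis without mean-centering. A numpy reference of the standard formulation (grilly's defaults and signature may differ):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by RMS over the last axis, then apply a learned gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 2.0]], dtype=np.float32)
out = rms_norm(x, weight=np.ones(3, dtype=np.float32))
print(out.round(3))  # [[0.577 1.155 1.155]]
```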
### Spiking Neural Networks
Full SNN framework with GPU-accelerated spike dynamics:
- Neurons: `IFNode`, `LIFNode`, `ParametricLIFNode`
- Surrogate gradients: `ATan`, `Sigmoid`, `FastSigmoid`
- Synapses: `STPSynapse`, `DualTimescaleSynapse`, `SynapseFilter`
- Temporal containers: `SeqToANNContainer`, `MultiStepContainer`
- Spiking attention: `SpikingSelfAttention`, `QKAttention`, `TemporalWiseAttention`
- ANN-to-SNN conversion: `Converter`, `VoltageScaler`
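The neuron nodes above implement leaky integrate-and-fire dynamics. A minimal numpy sketch of one generic LIF timestep (not grilly's exact parameterization or surrogate-gradient handling):

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """One LIF timestep: membrane leaks toward the input, spikes and resets at threshold."""
    v = v + (x - v) / tau                            # leaky integration
    spikes = (v >= v_threshold).astype(np.float32)   # fire where threshold is crossed
    v = np.where(spikes > 0, v_reset, v)             # hard reset after a spike
    return v, spikes

v = np.zeros(4, dtype=np.float32)
total = np.zeros(4, dtype=np.float32)
inputs = np.array([0.0, 0.5, 1.0, 2.0], dtype=np.float32)
for _ in range(10):
    v, s = lif_step(v, inputs)
    total += s
print(total)  # stronger constant inputs spike more often
```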
### Optimizers
| Optimizer | Description |
|-----------|-------------|
| Adam | Classic Adam |
| AdamW | Adam with decoupled weight decay |
| SGD | Stochastic gradient descent |
| NLMS | Normalized Least Mean Squares |
| NaturalGradient | Fisher-preconditioned |
| HypergradientAdamW | OSGM-style auto learning rate |
| AutoHypergradientAdamW | Fully automatic hypergradient |
| AffectAdam | Emotion-weighted updates |
Schedulers: `StepLR`, `CosineAnnealingLR`, `ReduceLROnPlateau`, `OneCycleLR`.
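The decoupled weight decay that distinguishes `AdamW` from `Adam` can be sketched in a few lines of numpy (a generic AdamW update, not grilly's implementation):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW step: Adam moment updates, decay applied to the weights directly."""
    m = b1 * m + (1 - b1) * g                # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g            # second moment (mean of squares)
    m_hat = m / (1 - b1 ** t)                # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)   # decoupled decay term
    return w, m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
g = np.array([0.1, -0.2, 0.3])
w, m, v = adamw_step(w, g, m, v, t=1)
print(w.round(5))
```

Because the decay is added outside the adaptive `m_hat / sqrt(v_hat)` term, it is not rescaled by gradient statistics, which is the "decoupled" part.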
### Experimental Modules
| Module | Description |
|--------|-------------|
| experimental.vsa | Vector Symbolic Architectures (binary, holographic, block-codes, resonator networks) |
| experimental.moe | Mixture of Experts (relational encoder, resonator routing) |
| experimental.temporal | Temporal reasoning (causal chains, counterfactuals, world models) |
| experimental.cognitive | Cognitive controller (working memory, simulation, understand-think-speak) |
| experimental.language | Language processing (encoding, generation, parsing) |
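The core VSA operations (bind, bundle, similarity) are easy to picture in plain numpy with bipolar hypervectors; this conceptual sketch is not `experimental.vsa`'s API:

```python
import numpy as np

rng = np.random.default_rng(42)
D = 4096                                         # hypervector dimensionality

def hv():
    return rng.choice([-1.0, 1.0], size=D)       # random bipolar hypervector

def bind(a, b):
    return a * b                                 # elementwise multiply, self-inverse

def bundle(*vs):
    return np.sign(np.sum(vs, axis=0))           # majority-vote superposition

def sim(a, b):
    return float(a @ b) / D                      # normalized dot product

color, shape = hv(), hv()                        # role vectors
red, circle = hv(), hv()                         # filler vectors

# "red circle": bind fillers to roles, bundle the pairs into one vector
scene = bundle(bind(color, red), bind(shape, circle))

# Query the color role: unbinding recovers something close to `red`
print(sim(bind(scene, color), red) > sim(bind(scene, color), circle))  # True
```

Resonator networks, as in the table above, iteratively factor such bound vectors back into their components.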
Architecture
Python API C++ Bridge GPU Shaders
───────────── ────────── ───────────
nn.Module layers pybind11 bindings 231 SPIR-V kernels
F.* stateless ops → dual-validity tensors → AMD / NVIDIA / Intel
optim.* optimizers zero CPU↔GPU ping-pong No CUDA dependency
autograd engine buffer pool management Vulkan 1.1+ compute
### 3-Level GPU Fallback

Every operation has automatic fallback, tried in order:

1. grilly C++ / Vulkan -- native compute shaders (fastest)
2. PyTorch CUDA -- if torch is available (fast)
3. NumPy CPU -- always available (correct)

Same API, same results, different speed. Your code never changes.
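The three levels amount to a try/except dispatch chain. A hypothetical sketch of the pattern -- the `_bridge.relu` call is invented for illustration; only `_bridge.is_available()` appears elsewhere in this README:

```python
import numpy as np

def relu(x):
    """Dispatch with graceful degradation: Vulkan -> torch -> numpy.
    (_bridge.relu is hypothetical; the real bridge API may differ.)"""
    try:
        from grilly.backend import _bridge           # level 1: native Vulkan
        return _bridge.relu(x)
    except (ImportError, AttributeError):
        pass
    try:
        import torch                                 # level 2: torch, CUDA if present
        return torch.relu(torch.from_numpy(x)).numpy()
    except ImportError:
        pass
    return np.maximum(x, 0.0)                        # level 3: numpy, always available

print(relu(np.array([-1.0, 0.0, 2.0])))
```

Every level computes the same result, so callers never need to know which backend ran.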
### GPU Kernels (231 operations)

| Category | Count | Examples |
|----------|-------|----------|
| Linear algebra | 20+ | GEMM, FFT, SVD, matmul |
| Attention | 15+ | flash attention, multi-head, spiking |
| Convolution | 10+ | conv2d forward/backward, im2col |
| Learning | 20+ | Adam, STDP, Hebbian, EWC, NLMS |
| VSA | 10+ | bind, bundle, similarity, resonator |
| SNN | 15+ | LIF/IF neuron, synapse, spike generation |
| Normalization | 10+ | layer norm, batch norm, RMS norm |
| Activations | … | … |
