ShallowSpeed
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
A tiny proof-of-concept implementation of distributed training for sequential deep learning models, written in plain NumPy and mpi4py.

Currently implements:
- Sequential models / deep MLPs, trained with SGD.
- Data parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel.
- Pipeline parallel training:
  - Naive schedule without interleaved stages.
  - GPipe schedule with interleaved FWD & interleaved BWD.
  - (soon) PipeDream Flush schedule with additional inter-FWD & BWD interleaving.
- Any combination of DP & PP algorithms.
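
To make the difference between the naive and GPipe schedules listed above concrete, here is a minimal single-process sketch that lays out which pipeline stage is busy at each time step. It is not taken from the ShallowSpeed code: the function names `gpipe_timeline` and `bubble_fraction` are hypothetical, every FWD and BWD is assumed to take one time unit, and the naive schedule is modelled as GPipe with a single microbatch.

```python
def gpipe_timeline(n_stages, n_microbatches):
    """Return timeline[stage][t]: 'F<m>' (forward), 'B<m>' (backward) or '.' (idle).

    Assumption: each forward or backward pass of one microbatch on one
    stage takes exactly one time step.
    """
    fwd_steps = n_stages + n_microbatches - 1  # steps until the last FWD finishes
    n_steps = 2 * fwd_steps                    # BWD phase mirrors the FWD phase
    timeline = [["."] * n_steps for _ in range(n_stages)]
    for s in range(n_stages):
        for m in range(n_microbatches):
            # FWD of microbatch m reaches stage s after s hops.
            timeline[s][s + m] = f"F{m}"
            # BWD starts at the last stage once all FWDs are done,
            # then flows back towards stage 0.
            timeline[s][fwd_steps + (n_stages - 1 - s) + m] = f"B{m}"
    return timeline


def bubble_fraction(timeline):
    """Fraction of (stage, time step) slots in which a stage sits idle."""
    slots = [slot for row in timeline for slot in row]
    return slots.count(".") / len(slots)


if __name__ == "__main__":
    for name, n_mb in [("naive (1 microbatch)", 1), ("gpipe (4 microbatches)", 4)]:
        tl = gpipe_timeline(n_stages=4, n_microbatches=n_mb)
        print(name, f"idle fraction = {bubble_fraction(tl):.2f}")
        for stage, row in enumerate(tl):
            print(f"  stage {stage}: " + " ".join(f"{slot:>2}" for slot in row))
```

Under these assumptions the idle fraction works out to (P - 1) / (M + P - 1) for P stages and M microbatches, which is why splitting a batch into more microbatches shrinks the pipeline bubble.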
Setup
conda env create
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"
python download_dataset.py
pytest
Usage
# Sequential training
python train.py
# Data parallel distributed training
mpirun -n 4 python train.py --dp 4
# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive
# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe
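
As a rough illustration of how the 8 processes in the last command could be arranged into 2 data parallel replicas of a 4-stage pipeline, the sketch below maps each global rank to a (replica, stage) pair. The layout (consecutive ranks form one pipeline) and the helper names `rank_to_coords` and `build_groups` are assumptions made for illustration; the actual rank assignment in `train.py` may differ.

```python
def rank_to_coords(rank, dp, pp):
    """Map a global rank to (dp_rank, pp_rank).

    Assumed layout: ranks 0..pp-1 form the first pipeline replica,
    ranks pp..2*pp-1 the second, and so on.
    """
    if not 0 <= rank < dp * pp:
        raise ValueError(f"rank {rank} outside world of size {dp * pp}")
    return rank // pp, rank % pp


def build_groups(dp, pp):
    """Return (pipelines, replicas) as lists of global ranks.

    pipelines[d] holds the ranks that pass activations and gradients
    between stages of replica d; replicas[s] holds the ranks that own
    the same stage s and therefore average their gradients.
    """
    pipelines = [[d * pp + s for s in range(pp)] for d in range(dp)]
    replicas = [[d * pp + s for d in range(dp)] for s in range(pp)]
    return pipelines, replicas


if __name__ == "__main__":
    dp, pp = 2, 4  # matches: mpirun -n 8 python train.py --dp 2 --pp 4
    pipelines, replicas = build_groups(dp, pp)
    print("pipeline groups:", pipelines)
    print("data parallel groups:", replicas)
    for rank in range(dp * pp):
        print(f"rank {rank} -> (dp_rank, pp_rank) = {rank_to_coords(rank, dp, pp)}")
```

Note that the process count passed to `mpirun -n` equals `dp * pp` in every command above. With mpi4py, groups like these are typically turned into sub-communicators using `Comm.Split(color, key)`.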
Internals

