ShallowSpeed

Small scale distributed training of sequential deep learning models, built on Numpy and MPI.

Generate Convert Improve

Install / Use

/learn @siboehm/ShallowSpeed

About this skill

Quality Score

0/100

README

Shallowspeed

A tiny POC implementation of distributed training for sequential deep learning models. Implemented using plain Numpy & mpi4py.

Currently implements:

Sequential models / deep MLPs, training using SGD.
Data parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel.
Pipeline parallel training:
- Naive schedule without interleaved stages.
- Gpipe schedule with interleaved FWD & interleaved BWD.
- (soon) PipeDream Flush schedule with additional inter-FWD & BWD interleaving.
Any combination of DP & PP algorithms.

Setup

conda env create
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"
python download_dataset.py
pytest

Usage

# Sequential training
python train.py
# Data parallel distributed training
mpirun -n 4 python train.py --dp 4
# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive
# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe

Internals

Related Skills

YC-Killer

2.7k

A library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.

groundhog

399

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

last30days-skill

18.8k

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

sec-edgar-agentkit

AI agent toolkit for accessing and analyzing SEC EDGAR filing data. Build intelligent agents with LangChain, MCP-use, Gradio, Dify, and smolagents to analyze financial statements, insider trading, and company filings.

siboehm

View profile

View on GitHub

GitHub Stars164

CategoryEducation

Updated16d ago

Forks9

siboehm/ShallowSpeed

Languages

Python

Security Score

85/100

Audited on Mar 20, 2026

No findings