Graphzero
graphzero: High performance C++ backed python library for graphs
GraphZero
High-Performance, Zero-Copy Graph Engine for Massive Datasets on Consumer Hardware.
GraphZero is a C++ graph processing engine with lightweight Python bindings, designed to get past the "Memory Wall" in Graph Neural Networks (GNNs). It lets you load and sample graphs with 100 million+ nodes (such as ogbn-papers100M), along with their massive feature matrices, on a standard 16GB RAM laptop. Libraries that load everything into RAM up front, such as PyTorch Geometric (PyG) or DGL, cannot do this on the same hardware.
The Problem
GNN datasets can be massive. ogbn-papers100M contains 111 Million nodes, 1.6 Billion edges, and gigabytes of node embeddings.
- Standard approach (PyG/NetworkX): Tries to load the entire graph structure and all node features into RAM before training begins.
- The Result: `MemoryError` (OOM) on consumer hardware. You need 64GB+ RAM servers just to load the data.
The Solution:
GraphZero abandons the "Load-to-RAM" model. Instead, it uses a custom Zero-Copy Architecture:
- Memory Mapping (`mmap`): The graph and its features stay on disk. The OS only loads the specific "hot" pages needed for computation into RAM via page faults.
- Compressed CSR (`.gl`): A custom binary format that compresses raw edges by ~60% (30GB CSV $\to$ 13GB Binary).
- Columnar Tensor Store (`.gd`): A raw, C-contiguous binary format for node features that instantly translates to PyTorch tensors without memory allocation.
- Parallel Sampling: OpenMP-accelerated random walks that saturate NVMe SSD throughput, using thread-local RNGs to eliminate lock contention.
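To make the memory-mapping idea concrete, here is a minimal sketch using only the Python standard library. It is not GraphZero's implementation: the file name `demo.bin`, the helper `read_u64`, and the flat array-of-uint64 layout are assumptions made purely for illustration.

```python
import mmap
import os
import struct
import tempfile

# Write a small binary file of little-endian uint64 values
# (a stand-in for a large on-disk array; the layout is hypothetical).
values = list(range(1000))
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}Q", *values))


def read_u64(path, index):
    """Read one uint64 through a memory map instead of reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only these 8 bytes; the OS pages in just the
            # region of the file that backs them.
            chunk = mm[index * 8 : index * 8 + 8]
            (value,) = struct.unpack("<Q", chunk)
            return value


print(read_u64(path, 123))
```

Mapping the file is cheap regardless of its size, because no data is read until a page is actually touched.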
🏆 Benchmarks: GraphZero vs. PyTorch Geometric
Task: Load ogbn-papers100M (56GB Raw) and perform random walks.
Hardware: Windows Laptop (16GB RAM, NVMe SSD).
| Metric | GraphZero (v0.2) | PyTorch Geometric |
| --- | --- | --- |
| Load Time | 0.000000 s ⚡ | FAILED (Crash) ❌ |
| Peak RAM Usage | ~5.1 GB (OS Cache) | >24.1 GB (Required) |
| Throughput | 1,264,000 steps/s | N/A |
| Status | ✅ Success | ❌ OOM Error |
Proof of Performance
<p float="left ">
  <img src="benchmark/images/gz_bench.png" width="45%" />
  <img src="benchmark/images/py_crash.png" width="45%" />
</p>

Left: GraphZero loading instantly and utilizing OS Page Cache. Right: PyG crashing with `Unable to allocate 24.1 GiB`.
📦 Installation
GraphZero is available on PyPI:
```bash
pip install graphzero
```
🚀 Quick Start
1. Convert Your Data (Topology & Features)
GraphZero uses high-efficiency binary formats. Convert your generic CSV lists once.

Example `edges.csv` (weights are optional):

```csv
src,dst,weight
0,1,0.5
1,2,1.0
```
```python
import graphzero as gz

# 1. Convert Topology (Edges & Weights) to .gl
gz.convert_csv_to_gl(
    input_csv="dataset/edges.csv",
    output_bin="graph.gl",
    directed=True
)

# 2. Convert Node Features to .gd (Float32, Int64, etc.)
gz.convert_csv_to_gd(
    csv_path="dataset/features.csv",
    out_path="features.gd",
    dtype=gz.DataType.FLOAT32
)
```
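For readers unfamiliar with CSR, the following is a conceptual sketch of turning an edge list into CSR arrays. It does not describe the actual `.gl` layout, which is not documented here; the function name `edges_to_csr` and the array names `indptr` and `indices` are assumptions chosen for illustration.

```python
def edges_to_csr(edges, num_nodes):
    """Build CSR arrays from a list of directed (src, dst) pairs.

    Returns (indptr, indices) such that the neighbors of node v are
    indices[indptr[v]:indptr[v + 1]].
    """
    # Count the out-degree of every node.
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1

    # Prefix sum of the degrees gives each node's offset into `indices`.
    indptr = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        indptr[v + 1] = indptr[v] + degree[v]

    # Scatter destinations into their slots, preserving input order per node.
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


indptr, indices = edges_to_csr([(0, 1), (1, 2), (0, 2), (2, 0)], num_nodes=3)
neighbors_of_0 = indices[indptr[0]:indptr[1]]
```

Because each node's neighbors sit in one contiguous slice, a neighbor lookup touches a small, predictable region of the file, which suits memory-mapped access.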
2. High-Speed Sampling & Zero-Copy Tensors
Once converted, the graph and its multi-gigabyte feature matrix are accessible immediately, without being copied into RAM up front.
```python
import graphzero as gz
import numpy as np

# TOPOLOGY
# Zero-Copy Load (Instant)
g = gz.Graph("graph.gl")

# Define Start Nodes
start_nodes = np.random.randint(0, g.num_nodes, 1000).astype(np.uint64)

# Parallel Biased Random Walk (Node2Vec style: p=1.0, q=0.5)
walks = g.batch_random_walk(
    start_nodes=start_nodes,
    walk_length=10,
    p=1.0,
    q=0.5
)

# FEATURES
# Zero-Copy Feature Load (Instant)
fs = gz.FeatureStore("features.gd")

# Get a perfect 2D Numpy/PyTorch Tensor mapping directly to the SSD
# RAM used: 0 Bytes!
node_features = fs.get_tensor()
print(f"Graph loaded. Feature Matrix Shape: {node_features.shape}")
```
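The `p` and `q` arguments above follow the Node2Vec convention. As a rough illustration of what they control, the sketch below computes the unnormalized transition weights for a single step using the standard Node2Vec rule. It is plain Python on a toy adjacency dict, not GraphZero's C++ code path, and the helper name `node2vec_weights` is hypothetical.

```python
def node2vec_weights(graph, prev, cur, p, q):
    """Unnormalized Node2Vec weights for leaving `cur`, having arrived from `prev`.

    `graph` maps each node to the set of its neighbors (unweighted here).
    """
    weights = {}
    for nxt in graph[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # step back to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0      # stay within one hop of the previous node
        else:
            weights[nxt] = 1.0 / q  # move further away from the previous node
    return weights


# Toy undirected graph: 0-1, 0-2, 1-2, 1-3
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
w = node2vec_weights(graph, prev=0, cur=1, p=1.0, q=0.5)
```

With `q=0.5`, the outward move to node 3 gets twice the weight of the other candidates, so walks tend to explore away from where they came from.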
⚙️ Under the Hood
GraphZero is built for Systems & GNN enthusiasts.
- Core: C++20 with `nanobind` for Python bindings.
- Parallelism: Uses `#pragma omp` with thread-local deterministic RNGs.
- IO: Direct `CreateFileMapping` (Windows) and `mmap` (Linux) calls with alignment optimization (4KB/2MB pages).
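The thread-local deterministic RNG pattern can be sketched in Python as follows. This is an analogy only: GraphZero does this in C++ with OpenMP, and the seed-derivation formula and the names `walk_chunk` and `run` are assumptions for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def walk_chunk(worker_id, base_seed, num_draws):
    # Each worker owns its generator, so no lock is shared between workers.
    # Deriving the seed from (base_seed, worker_id) keeps runs reproducible.
    rng = random.Random(base_seed * 1_000_003 + worker_id)
    return [rng.randrange(100) for _ in range(num_draws)]


def run(base_seed, num_workers=4, num_draws=5):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(walk_chunk, worker_id, base_seed, num_draws)
            for worker_id in range(num_workers)
        ]
        # Collect results in worker order, independent of scheduling.
        return [future.result() for future in futures]


first = run(base_seed=42)
second = run(base_seed=42)
```

The two runs produce identical output regardless of how the threads are scheduled, because each worker's stream depends only on its own seed.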
🌟 Current Features List (v0.2)
GraphZero currently supports the following high-performance ML capabilities:
Graph Structural Engine
- Instant Ingestion: Fast `mmap`-backed loading of directed, undirected, and weighted graphs.
- Zero-Copy CSR: Custom `.gl` binary format for dense, contiguous memory alignment and 64-byte CPU cache line optimization.
- Thread-Safe Sampling: OpenMP-accelerated `batch_random_walk_uniform` and `batch_random_fanout`.
- Biased Walks (Node2Vec): Hardware-optimized Alias Table generation for $O(1)$ weighted sampling (`batch_random_walk` with `p` and `q` parameters).
- Fault-Tolerant: Automatic handling of dead-ends (sinks) and out-of-bounds nodes.
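The alias-table technique mentioned in the list above can be sketched in a few lines. This is a textbook version (Vose's construction) in plain Python, shown only to illustrate why sampling is $O(1)$ after an $O(n)$ build; it is not GraphZero's implementation, and the function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build (prob, alias) arrays for O(1) sampling proportional to `weights`."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # mean of scaled weights is 1.0
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        sm = small.pop()
        lg = large.pop()
        prob[sm] = scaled[sm]
        alias[sm] = lg                          # lg fills the rest of sm's slot
        scaled[lg] = scaled[lg] + scaled[sm] - 1.0
        (small if scaled[lg] < 1.0 else large).append(lg)
    for i in large + small:                     # leftovers equal 1.0 up to rounding
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob, alias, rng):
    """Draw one index: pick a slot uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


prob, alias = build_alias_table([1.0, 1.0, 2.0])
rng = random.Random(0)
samples = [alias_sample(prob, alias, rng) for _ in range(1000)]
```

Each draw costs one uniform integer, one uniform float, and one comparison, no matter how many neighbors a node has.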
Graph Data Engine
- Columnar Tensor Store: Custom `.gd` binary format for storing $N \times F$ feature matrices.
- Strong Typing: Native C++ template dispatching supporting `FLOAT32`, `FLOAT64`, `INT32`, and `INT64`.
- Zero-Copy Bridge: Direct translation of `mmap` pointers to Numpy/PyTorch multidimensional arrays.
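As a rough picture of how a raw $N \times F$ matrix on disk can be viewed as a 2D array without copying, the sketch below uses the standard library's `mmap` and `memoryview` as a stand-in for the NumPy bridge. The 16-byte header (rows, cols) is a made-up layout for this example only; the real `.gd` format is not specified here, and `read_row` is a hypothetical helper.

```python
import mmap
import os
import struct
import tempfile

N, F = 4, 3                     # 4 nodes, 3 float32 features each
HEADER = struct.Struct("QQ")    # hypothetical header: rows, cols (native byte order)

# Write the header followed by the row-major (C-contiguous) float32 data.
path = os.path.join(tempfile.mkdtemp(), "features_demo.bin")
with open(path, "wb") as f:
    f.write(HEADER.pack(N, F))
    for node in range(N):
        f.write(struct.pack(f"{F}f", *[node + 0.5 * j for j in range(F)]))


def read_row(path, node):
    """Return one node's feature row via a typed 2D view over the mapped file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        rows, cols = HEADER.unpack_from(mm, 0)
        with memoryview(mm) as raw:
            # cast() reinterprets the mapped bytes as a rows x cols float32
            # matrix; no element is copied until it is indexed.
            with raw[HEADER.size:].cast("f", shape=[rows, cols]) as matrix:
                return [matrix[node, j] for j in range(cols)]


row = read_row(path, 2)
```

NumPy exposes the same idea through `np.memmap`, which maps a file region and presents it as an array with a given dtype and shape.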
🗺️ Roadmap
- v0.3 (The Algorithmic Core): High-performance analytics engine adding OpenMP-accelerated Parallel BFS/DFS, PageRank, and Connected Components.
- v0.4 (Dynamic Updates): Breaking the immutable CSR barrier via an LSM-Tree/Adjacency List memory overlay to allow real-time edge/node insertions.
- v0.5 (Production Hardening): ACID-compliant safety for multi-process PyTorch training using Reader-Writer Locks, Write-Ahead Logging (WAL), and graceful exception handling.
📄 License
MIT License. Created by Krish Singaria (IIT Mandi).
