GraphZero

High-Performance, Zero-Copy Graph Engine for Massive Datasets on Consumer Hardware.

GraphZero is a C++ graph processing engine with lightweight Python bindings, designed to get past the "Memory Wall" in Graph Neural Networks (GNNs). It lets you load and sample graphs with more than 100 million nodes (such as ogbn-papers100M), together with their massive feature matrices, on a standard 16GB RAM laptop. Libraries that load everything into RAM up front, such as PyTorch Geometric (PyG) or DGL, cannot do this on the same hardware.

The Problem

GNN datasets can be massive. ogbn-papers100M contains 111 Million nodes, 1.6 Billion edges, and gigabytes of node embeddings.

Standard approach (PyG/NetworkX): Tries to load the entire graph structure and all node features into RAM before training begins.
The Result: MemoryError (OOM) on consumer hardware. You need 64GB+ RAM servers just to load the data.
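
A rough back-of-the-envelope estimate shows why the load fails. The sketch below assumes the edge list is held as (src, dst) pairs of 64-bit integers and that each node carries a 128-dimensional float32 feature vector. Both are assumptions made for illustration, not figures taken from this README.

```python
# Rough memory estimate for holding ogbn-papers100M fully in RAM.
# Assumptions (illustrative only): int64 edge pairs, 128 float32 features per node.
NUM_NODES = 111_000_000       # ~111 million nodes
NUM_EDGES = 1_600_000_000     # ~1.6 billion edges
FEATURE_DIM = 128             # assumed feature width
GIB = 2**30

edge_bytes = NUM_EDGES * 2 * 8                 # (src, dst) stored as int64
feature_bytes = NUM_NODES * FEATURE_DIM * 4    # float32 feature matrix

edge_gib = edge_bytes / GIB
feature_gib = feature_bytes / GIB

print(f"Edge list:      {edge_gib:.1f} GiB")
print(f"Feature matrix: {feature_gib:.1f} GiB")
```

Under these assumptions the edge list alone needs roughly 24 GiB, which already exceeds a 16GB laptop before any features are loaded.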

The Solution

GraphZero abandons the "Load-to-RAM" model. Instead, it uses a custom Zero-Copy Architecture:

Memory Mapping (mmap): The graph and its features stay on disk. The OS only loads the specific "hot" pages needed for computation into RAM via page faults.
Compressed CSR (.gl): A custom binary format that compresses raw edges by ~60% (30GB CSV $\to$ 13GB Binary).
Columnar Tensor Store (.gd): A raw, C-contiguous binary format for node features that instantly translates to PyTorch tensors without memory allocation.
Parallel Sampling: OpenMP-accelerated random walks that saturate NVMe SSD throughput, using thread-local RNGs to eliminate lock contention.
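
To make the CSR and memory-mapping ideas concrete, here is a minimal sketch using NumPy. It is not the .gl format or GraphZero's implementation: the two raw files (`indptr.bin`, `indices.bin`) and the `neighbors` helper are hypothetical names chosen for illustration.

```python
import os
import tempfile

import numpy as np

# Toy directed edge list: one (src, dst) pair per row.
edges = np.array([[0, 1], [0, 2], [1, 2], [3, 0]], dtype=np.int64)
num_nodes = 4

# Build CSR: group destinations by source, then store per-node offsets.
order = np.argsort(edges[:, 0], kind="stable")
indices = edges[order, 1]                                  # neighbor ids, grouped by source
degrees = np.bincount(edges[:, 0], minlength=num_nodes)    # out-degree of each node
indptr = np.zeros(num_nodes + 1, dtype=np.int64)
np.cumsum(degrees, out=indptr[1:])                         # indptr[i]..indptr[i+1] spans node i

# Persist both arrays as raw bytes (hypothetical layout, not the .gl format).
tmpdir = tempfile.mkdtemp()
indptr_path = os.path.join(tmpdir, "indptr.bin")
indices_path = os.path.join(tmpdir, "indices.bin")
indptr.tofile(indptr_path)
indices.tofile(indices_path)

# Memory-map the files: the OS pages data in only when it is accessed.
indptr_mm = np.memmap(indptr_path, dtype=np.int64, mode="r")
indices_mm = np.memmap(indices_path, dtype=np.int64, mode="r")


def neighbors(node):
    """Return the out-neighbors of `node` as a slice of the mapped file."""
    start, end = indptr_mm[node], indptr_mm[node + 1]
    return indices_mm[start:end]


print(neighbors(0))  # out-neighbors of node 0
```

A neighbor lookup touches two offsets and one contiguous slice, so only the pages holding that slice need to be resident in RAM.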

🏆 Benchmarks: GraphZero vs. PyTorch Geometric

Task: Load ogbn-papers100M (56GB raw) and perform random walks.

Hardware: Windows laptop (16GB RAM, NVMe SSD).

| Metric | GraphZero (v0.2) | PyTorch Geometric |
| --- | --- | --- |
| Load Time | 0.000000 s ⚡ | FAILED (Crash) ❌ |
| Peak RAM Usage | ~5.1 GB (OS Cache) | >24.1 GB (Required) |
| Throughput | 1,264,000 steps/s | N/A |
| Status | ✅ Success | ❌ OOM Error |

Proof of Performance

Left: GraphZero loading instantly and utilizing OS Page Cache. Right: PyG crashing with Unable to allocate 24.1 GiB.

📦 Installation

GraphZero is available on PyPI:

pip install graphzero

🚀 Quick Start

1. Convert Your Data (Topology & Features)

GraphZero uses compact binary formats. Convert your CSV edge lists and feature tables once, then reuse the binary files.

Example edges.csv (the weight column is optional):

src,dst,weight
0,1,0.5
1,2,1.0

import graphzero as gz

# 1. Convert Topology (Edges & Weights) to .gl
gz.convert_csv_to_gl(
    input_csv="dataset/edges.csv", 
    output_bin="graph.gl", 
    directed=True
)

# 2. Convert Node Features to .gd (Float32, Int64, etc.)
gz.convert_csv_to_gd(
    csv_path="dataset/features.csv",
    out_path="features.gd",
    dtype=gz.DataType.FLOAT32
)

2. High-Speed Sampling & Zero-Copy Tensors

Once converted, the graph and its multi-gigabyte feature matrix are accessible immediately, with no upfront load into RAM.

import graphzero as gz
import numpy as np

# TOPOLOGY
# Zero-Copy Load (Instant)
g = gz.Graph("graph.gl")

# Define Start Nodes
start_nodes = np.random.randint(0, g.num_nodes, 1000).astype(np.uint64)

# Parallel Biased Random Walk (Node2Vec style: p=1.0, q=0.5)
walks = g.batch_random_walk(
    start_nodes=start_nodes, 
    walk_length=10,
    p=1.0, 
    q=0.5
)

# FEATURES
# Zero-Copy Feature Load (Instant)
fs = gz.FeatureStore("features.gd")

# Get a 2D NumPy/PyTorch tensor view backed directly by the memory-mapped file
# No upfront copy into RAM: pages are faulted in from the SSD on access
node_features = fs.get_tensor()

print(f"Graph loaded. Feature Matrix Shape: {node_features.shape}")
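
For readers unfamiliar with the p and q parameters, the following pure-Python sketch shows the usual Node2Vec convention: from the current node, a step back to the previous node gets weight 1/p, a step to a node that is also a neighbor of the previous node gets weight 1, and any other step gets weight 1/q. This is an illustration only, not GraphZero's C++ implementation; `transition_weights` and `biased_walk` are hypothetical helper names.

```python
import random


def transition_weights(adj, prev, curr, p, q):
    """Unnormalized Node2Vec weights for each neighbor of `curr`."""
    weights = []
    for nxt in adj[curr]:
        if nxt == prev:
            weights.append(1.0 / p)      # return to the previous node
        elif nxt in adj[prev]:
            weights.append(1.0)          # stay close to the previous node
        else:
            weights.append(1.0 / q)      # move further away
    return weights


def biased_walk(adj, start, walk_length, p, q, rng):
    """Second-order random walk; stops early at a dead end."""
    walk = [start]
    while len(walk) < walk_length:
        curr = walk[-1]
        if not adj[curr]:
            break                        # sink node: no outgoing edges
        if len(walk) == 1:
            nxt = rng.choice(adj[curr])  # first step has no previous node
        else:
            w = transition_weights(adj, walk[-2], curr, p, q)
            nxt = rng.choices(adj[curr], weights=w, k=1)[0]
        walk.append(nxt)
    return walk


# Small undirected toy graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = biased_walk(adj, start=0, walk_length=10, p=1.0, q=0.5, rng=random.Random(7))
print(walk)
```

With q below 1, steps that move away from the previous node are weighted more heavily, so walks tend to explore outward rather than stay local.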

⚙️ Under the Hood

GraphZero is built for Systems & GNN enthusiasts.

Core: C++20 with nanobind for Python bindings.
Parallelism: Uses #pragma omp with thread-local deterministic RNGs.
IO: Direct CreateFileMapping (Windows) and mmap (Linux) calls with alignment optimization (4KB/2MB pages).
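
The idea behind thread-local deterministic RNGs can be sketched in Python as a loose analogue of the C++/OpenMP design. Each worker receives its own generator derived from a single seed, so workers never share RNG state and results are reproducible. The function names here are hypothetical and the thread pool merely stands in for OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_worker_rngs(seed, num_workers):
    """Derive one independent generator per worker from a single seed."""
    children = np.random.SeedSequence(seed).spawn(num_workers)
    return [np.random.default_rng(child) for child in children]


def sample_chunk(rng, num_samples, upper):
    """Each worker draws from its own generator, so no locking is needed."""
    return rng.integers(0, upper, size=num_samples)


def parallel_sample(seed, num_workers=4, per_worker=5, upper=100):
    rngs = make_worker_rngs(seed, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order, so output order is deterministic.
        chunks = list(pool.map(lambda r: sample_chunk(r, per_worker, upper), rngs))
    return np.concatenate(chunks)


first = parallel_sample(seed=42)
second = parallel_sample(seed=42)
print(np.array_equal(first, second))
```

Because every worker owns its generator, the output depends only on the seed and the worker index, not on how the threads happen to be scheduled.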

🌟 Current Features List (v0.2)

GraphZero currently supports the following high-performance ML capabilities:

Graph Structural Engine

Instant Ingestion: Fast mmap-backed loading of directed, undirected, and weighted graphs.
Zero-Copy CSR: Custom .gl binary format with a dense, contiguous memory layout aligned to 64-byte CPU cache lines.
Thread-Safe Sampling: OpenMP-accelerated batch_random_walk_uniform and batch_random_fanout.
Biased Walks (Node2Vec): Hardware-optimized Alias Table generation for $O(1)$ weighted sampling (batch_random_walk with p and q parameters).
Fault-Tolerant: Automatic handling of dead-ends (sinks) and out-of-bounds nodes.
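
As background on the alias tables mentioned above, here is a minimal pure-Python sketch of the alias method (Vose's variant). Building the table takes linear time, and each draw then costs one uniform index plus one comparison. This illustrates the general technique and is not GraphZero's hardware-optimized code; all function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build probability and alias arrays for O(1) weighted sampling."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]   # mean of scaled weights is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]                     # keep s with this probability
        alias[s] = l                            # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                     # leftovers fill their whole slot
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob, alias, rng):
    """Draw one index in constant time."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_probabilities(prob, alias):
    """Recover the distribution the table encodes (for checking)."""
    n = len(prob)
    out = [prob[i] / n for i in range(n)]
    for j in range(n):
        out[alias[j]] += (1.0 - prob[j]) / n
    return out


weights = [0.5, 1.0, 2.5]
prob, alias = build_alias_table(weights)
rng = random.Random(0)
draws = [alias_sample(prob, alias, rng) for _ in range(5)]
print(implied_probabilities(prob, alias))
```

The `implied_probabilities` helper is only a sanity check that the table encodes the normalized input weights.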

Graph Data Engine

Columnar Tensor Store: Custom .gd binary format for storing $N \times F$ feature matrices.
Strong Typing: Native C++ template dispatching supporting FLOAT32, FLOAT64, INT32, and INT64.
Zero-Copy Bridge: Direct translation of mmap pointers to NumPy/PyTorch multidimensional arrays.
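
The general idea of a typed, memory-mapped $N \times F$ feature matrix can be sketched with `numpy.memmap`. The raw file written here is a hypothetical stand-in, not the .gd format, and the sizes are toy values. Gathering a mini-batch copies only the requested rows into RAM.

```python
import os
import tempfile

import numpy as np

num_nodes, feature_dim = 1000, 16

# Write a toy C-contiguous (row-major) float32 matrix as raw bytes.
features = np.arange(num_nodes * feature_dim, dtype=np.float32)
features = features.reshape(num_nodes, feature_dim)
path = os.path.join(tempfile.mkdtemp(), "features.bin")  # hypothetical file name
features.tofile(path)

# Map the file as a 2D array: dtype and shape give the bytes their meaning.
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(num_nodes, feature_dim))

# Gather a mini-batch: only the pages backing these rows are read from disk.
batch_ids = np.array([3, 500, 999])
batch = np.asarray(mapped[batch_ids])

print(batch.shape)
```

Row `i` starts at byte offset `i * feature_dim * 4` in the file, which is why a fixed dtype and a contiguous layout make direct mapping possible.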

🗺️ Roadmap

v0.3 (The Algorithmic Core): High-performance analytics engine adding OpenMP-accelerated Parallel BFS/DFS, PageRank, and Connected Components.
v0.4 (Dynamic Updates): Breaking the immutable CSR barrier via an LSM-Tree/Adjacency List memory overlay to allow real-time edge/node insertions.
v0.5 (Production Hardening): ACID-compliant safety for multi-process PyTorch training using Reader-Writer Locks, Write-Ahead Logging (WAL), and graceful exception handling.

📄 License

MIT License. Created by Krish Singaria (IIT Mandi).
