<div align="center"> <img src="imgs/___log2.png" alt="CppNet Logo" width="400"/> </div> <p align="center"> <b>CppNet</b> is a high-performance C++17 deep learning library for building and training neural networks from scratch.<br/> Built on <a href="https://eigen.tuxfamily.org">Eigen</a> for fast tensor operations, <a href="https://www.openmp.org/">OpenMP</a> for CPU parallelism, and <a href="https://developer.nvidia.com/cuda-zone">CUDA</a> for GPU acceleration. </p> <p align="center"> <a href="#installation"><img src="https://img.shields.io/badge/C%2B%2B-17-blue.svg" alt="C++17"/></a> <a href="#installation"><img src="https://img.shields.io/badge/CMake-%E2%89%A53.18-blue.svg" alt="CMake"/></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License"/></a> <a href="#gpu-acceleration"><img src="https://img.shields.io/badge/CUDA-optional-yellowgreen.svg" alt="CUDA"/></a> <a href="https://loqmansamani.github.io/CppNet/"><img src="https://img.shields.io/badge/Docs-Website-58a6ff.svg" alt="Website"/></a> </p>

Table of Contents
- Features
- Installation
- Quick Start
- API Overview
- Examples
- GPU Acceleration
- Benchmarks
- Testing
- Project Structure
- Roadmap
- Contributing
- License
Features
- High Performance — Vectorized tensor operations via Eigen, multi-threaded with OpenMP, full CUDA GPU backend for all layers, activations, losses, and optimizers.
- Rich Layer Library — Linear, Conv2D, MaxPool2D, RNN, LSTM, GRU, Multi-Head Attention, Dropout, BatchNorm, Embedding, Residual, GlobalPool, MeanPool1D, Flatten.
- Multiple Backends — Per-layer compute backend selection: `"cpu-eigen"` (Eigen contractions), `"cpu"` (OpenMP loops), `"gpu"` (CUDA kernels).
- Complete CUDA Coverage — 41 CUDA kernel files covering all layers, activations, losses, and optimizers for end-to-end GPU training.
- Modular Architecture — Clean separation of layers, activations, losses, optimizers, metrics, regularizations, and utilities.
- Training Utilities — DataLoader with batching & shuffling, learning rate schedulers, early stopping callbacks, gradient clipping, model serialization.
- Visualization — Built-in `TrainingLogger` for tracking metrics and exporting training history to CSV.
- Extensible — Abstract base classes for layers, losses, and optimizers make it straightforward to add custom components.
- Single-Header Access — `#include <CppNet/CppNet.hpp>` brings in the entire library.
Installation
Prerequisites
| Dependency | Version | Required |
|:-----------|:--------|:---------|
| C++ compiler (GCC, Clang, MSVC) | C++17 support | Yes |
| CMake | ≥ 3.18 | Yes |
| Eigen3 | ≥ 3.3 | Yes |
| OpenMP | any | Optional (CPU parallelism) |
| CUDA Toolkit | any | Optional (GPU acceleration) |
Build from Source
```bash
git clone https://github.com/LoqmanSamani/CppNet.git
cd CppNet
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```
Install System-Wide
```bash
sudo make install
```
This installs headers to /usr/local/include/CppNet/ and the static library to /usr/local/lib/.
Use in Your CMake Project
```cmake
find_package(CppNet REQUIRED)
target_link_libraries(your_target PRIVATE CppNet::CppNet)
```
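Put together, a minimal consuming project might look like the sketch below (the target name `my_app` is illustrative; the `CppNet::CppNet` imported target comes from the snippet above):

```cmake
cmake_minimum_required(VERSION 3.18)
project(my_app CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Locate the system-wide CppNet installation (see "Install System-Wide" above)
find_package(CppNet REQUIRED)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE CppNet::CppNet)
```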
Quick Start
A minimal binary classification example:
```cpp
#include <CppNet/CppNet.hpp>
#include <iostream>

int main() {
    // X_train, Y_train: your training tensors (loaded or generated elsewhere)

    // Define layers
    CppNet::Layers::Linear layer1(30, 64, "fc1", true, true, "cpu-eigen", "xavier");
    CppNet::Layers::Linear layer2(64, 1, "fc2", true, true, "cpu-eigen", "xavier");
    CppNet::Activations::ReLU relu("cpu-eigen");
    CppNet::Activations::Sigmoid sigmoid;

    // Loss & optimizer
    CppNet::Losses::BinaryCrossEntropy loss_fn("mean");
    CppNet::Optimizers::Adam optimizer;
    float lr = 0.001f;

    // Training loop
    for (int epoch = 0; epoch < 100; ++epoch) {
        auto h = relu.forward(layer1.forward(X_train));
        auto pred = sigmoid.forward(layer2.forward(h));
        float loss = loss_fn.forward(pred, Y_train);

        auto grad = loss_fn.backward(pred, Y_train);
        grad = layer2.backward(sigmoid.backward(grad));
        layer1.backward(relu.backward(grad));

        layer2.step(optimizer, lr);
        layer1.step(optimizer, lr);

        std::cout << "Epoch " << epoch << " — Loss: " << loss << std::endl;
    }
    return 0;
}
```
API Overview
Layers
All layers inherit from CppNet::Layers::Layer and implement forward(), backward(), step(), freeze(), unfreeze(), and print_layer_info().
| Layer | Description | Key Parameters |
|:------|:------------|:---------------|
| Linear | Fully connected layer | in_size, out_size, bias, device, weight_init |
| Conv2D | 2D convolution | in_channels, out_channels, kernel_size, stride, padding |
| MaxPool2D | 2D max pooling | kernel_size, stride |
| Flatten | Reshape to 2D | — |
| RNN | Vanilla recurrent layer | input_size, hidden_size |
| LSTM | Long Short-Term Memory | input_size, hidden_size |
| GRU | Gated Recurrent Unit | input_size, hidden_size |
| MultiHeadAttention | Scaled dot-product multi-head attention | embed_dim, num_heads |
| Dropout | Dropout regularization | drop_rate |
| BatchNorm | Batch normalization | num_features |
| Embedding | Embedding lookup table | vocab_size, embed_dim |
| Residual | Residual (skip) connection wrapper | — |
| GlobalPool | Global average/max pooling | — |
| MeanPool1D | Mean pooling over sequence dimension | — |
Activations
| Activation | Function |
|:-----------|:---------|
| ReLU | $\max(0, x)$ |
| LeakyReLU | $\max(\alpha x, x)$ |
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ |
| Tanh | $\tanh(x)$ |
| Softmax | $\frac{e^{x_i}}{\sum_j e^{x_j}}$ |
All activations support both 2D and 4D tensor inputs and run on all three backends (cpu-eigen, cpu, gpu).
Losses
| Loss | Typical Use |
|:-----|:------------|
| MSE | Regression |
| MAE | Regression |
| Huber | Robust regression |
| BinaryCrossEntropy | Binary classification |
| CategoricalCrossEntropy | Multi-class classification |
| SoftmaxCrossEntropy | Multi-class (fused softmax + CE) |
All support configurable reduction modes ("mean", "sum") and CUDA GPU acceleration.
Optimizers
| Optimizer | Description |
|:----------|:------------|
| SGD | Stochastic Gradient Descent |
| Adam | Adaptive Moment Estimation (default: $\beta_1=0.9$, $\beta_2=0.999$, $\epsilon=10^{-8}$) |
| Adagrad | Adaptive gradient accumulation |
| Momentum | SGD with momentum |
| RMSProp | Root Mean Square Propagation |
All optimizers have dedicated CUDA kernels for GPU-side weight updates.
Metrics
```cpp
CppNet::Metrics::accuracy(predictions, targets);
CppNet::Metrics::binary_accuracy(predictions, targets, 0.5);
CppNet::Metrics::precision(predictions, targets, 0.5);
CppNet::Metrics::recall(predictions, targets, 0.5);
CppNet::Metrics::f1_score(predictions, targets, 0.5);
```
Regularizations
```cpp
CppNet::Regularizations::l1_penalty(weights, lambda);
CppNet::Regularizations::l2_penalty(weights, lambda);
CppNet::Regularizations::elastic_net_penalty(weights, lambda, l1_ratio);
// Corresponding gradient functions: l1_gradient, l2_gradient, elastic_net_gradient
```
Utilities
| Utility | Description |
|:--------|:------------|
| DataLoader | Batched iteration with shuffling. Supports range-based for loops. |
| Weight Init | Xavier (uniform/normal), He (uniform/normal), constant, custom. |
| Gradient Clipping | clip_by_value() and clip_by_norm(). |
| Serialization | save_model() / load_model() for full model persistence; tensor-level binary I/O. |
| LR Schedulers | StepLR, ExponentialLR, CosineAnnealingLR. |
| Callbacks | EarlyStopping with configurable patience, delta, and mode. |
| Elapsed Time | Training duration measurement. |
DataLoader example:
```cpp
CppNet::Utils::DataLoader loader(X, Y, /*batch_size=*/32, /*shuffle=*/true);
for (auto& [x_batch, y_batch] : loader) {
    // forward / backward / step
}
loader.reset(); // re-shuffle for next epoch
```
Learning rate scheduler example:
```cpp
CppNet::Schedulers::CosineAnnealingLR scheduler(/*initial_lr=*/0.01, /*T_max=*/100);
for (int epoch = 0; epoch < 100; ++epoch) {
    float lr = scheduler.step();
    // ... train with lr
}
```
Visualization
```cpp
CppNet::Visualizations::TrainingLogger logger;

// Inside training loop:
logger.log("train_loss", loss);
logger.log("val_accuracy", val_acc);
logger.next_epoch();

// After training:
logger.print_epoch_summary();
logger.export_csv("training_history.csv");
```
Examples
The examples/ directory contains complete, self-contained deep learning programs that train on synthetic data — no downloads required. Each example generates its own dataset, trains a model, and reports final metrics.
| Example | Architecture | Dataset | Key Components | Result |
|:--------|:------------|:--------|:---------------|:-------|
| mlp_classification.cpp | Linear→ReLU→Linear→ReLU→Linear | 3-class spiral (600 samples, 2D) | ReLU, SoftmaxCrossEntropy, Adam | ~75% accuracy |
| cnn_image_classification.cpp | Conv2D→ReLU→MaxPool2D→Flatten→… | | | |