RustTensor
A learning-focused, high-performance tensor computation library built from scratch in Rust, featuring automatic differentiation and CPU/CUDA backends.
RustTensor Library
Vision & Goals
This library is primarily an educational exploration into building the core components of a modern deep learning framework. Key goals include:
- Deep Understanding: Gain insight into how Tensors, automatic differentiation (Autograd), backend abstractions (CPU/GPU), and optimizers function internally by implementing them directly.
- Performance with Rust & CUDA: Leverage Rust's safety and performance alongside custom CUDA kernels and cuBLAS integration for efficient GPU acceleration, complementing a solid `ndarray`-based CPU backend.
- Rust ML Foundation: Provide a growing set of building blocks (Tensors, a comprehensive suite of Ops, Autograd, multiple Optimizers, and foundational NN Layers) for defining, training, and experimenting with custom machine learning models, including CNNs and sequence models, entirely within the Rust ecosystem.
Documentation
- User Guide: Step-by-step guide to using the library, from installation to advanced features.
- Architecture Overview: Detailed explanation of the library's design and components.
- Performance Guide: Benchmarking, profiling, and optimization information.
Project Status
Status: This library is under active development. While core features like CPU/CUDA backends, autograd, and foundational operations are implemented and tested (sufficient for training MLPs like the MNIST example), it currently serves educational and experimental purposes best.
- Strengths: Clear backend abstraction, working CUDA integration with custom kernels, functional dynamic autograd, extensive set of mathematical and array manipulation operations with CPU/CUDA backends, support for foundational CNN layers (Conv2D, MaxPool2D, Conv2DTranspose), multiple standard optimizers (SGD, Adam, Adagrad, MomentumSGD), and demonstrated capability to build and train MLPs, CNNs, and even character-level LSTMs (from fundamental ops).
- Limitations: Foundational layers such as Conv2D, MaxPool2D, and Conv2DTranspose are implemented, but more advanced or specialized layers (e.g., optimized RNN/LSTM cells, attention mechanisms) are future work. The API is stabilizing but may still see minor changes.
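The backend abstraction listed among the strengths can be pictured roughly as follows. This is a minimal sketch, not the library's actual trait: `ToyBackend`, `ToyCpu`, and `ToyTensor` are hypothetical names, and only a host implementation is shown. The idea is that tensor code is written once against a trait, and each device supplies its own storage type and kernels.

```rust
/// Hypothetical backend trait: where data lives and how kernels run on it.
pub trait ToyBackend {
    /// Device-specific buffer type (a host Vec here; a GPU buffer on a CUDA backend).
    type Storage;
    fn from_host(data: &[f32]) -> Self::Storage;
    fn copy_to_host(storage: &Self::Storage) -> Vec<f32>;
    fn add(a: &Self::Storage, b: &Self::Storage) -> Self::Storage;
}

/// Host implementation backed by a plain Vec<f32>.
pub struct ToyCpu;

impl ToyBackend for ToyCpu {
    type Storage = Vec<f32>;

    fn from_host(data: &[f32]) -> Self::Storage {
        data.to_vec()
    }

    fn copy_to_host(storage: &Self::Storage) -> Vec<f32> {
        storage.clone()
    }

    fn add(a: &Self::Storage, b: &Self::Storage) -> Self::Storage {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// Tensor logic is generic over the backend and never touches raw buffers directly.
pub struct ToyTensor<B: ToyBackend> {
    storage: B::Storage,
    shape: Vec<usize>,
}

impl<B: ToyBackend> ToyTensor<B> {
    pub fn from_vec(data: Vec<f32>, shape: &[usize]) -> Self {
        assert_eq!(data.len(), shape.iter().product::<usize>(), "shape/data mismatch");
        ToyTensor { storage: B::from_host(&data), shape: shape.to_vec() }
    }

    /// Element-wise addition of same-shaped tensors, dispatched to the backend.
    pub fn add(&self, other: &Self) -> Self {
        assert_eq!(self.shape, other.shape, "this sketch does not broadcast");
        ToyTensor { storage: B::add(&self.storage, &other.storage), shape: self.shape.clone() }
    }

    pub fn to_vec(&self) -> Vec<f32> {
        B::copy_to_host(&self.storage)
    }
}
```

With this shape, `ToyTensor::<ToyCpu>::from_vec(vec![1.0, 2.0], &[2])` selects the host backend at compile time; a GPU backend would be a second `impl ToyBackend` with a device buffer as `Storage`.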
Contributions and feedback are highly welcome!
Features
- Operator Overloading & Ergonomic API:
  - Use standard Rust operators (`+`, `-`, `*`, `/`) for arithmetic on tensors.
  - Intuitive methods like `.mean()`, `.backward()`, `.matmul()`, and more for common operations.
  - Cleaner, more readable code for model building and experimentation.
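To illustrate how operator overloading on tensors can work in Rust, here is a minimal sketch built on `std::ops::Add`. `TinyTensor` is a hypothetical stand-in, not the library's `Tensor` type, and it skips broadcasting and gradient tracking. Implementing the trait on references is what allows `&a + &b` without consuming either operand.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a tensor: a flat buffer plus a shape.
#[derive(Debug, Clone, PartialEq)]
pub struct TinyTensor {
    pub data: Vec<f32>,
    pub shape: Vec<usize>,
}

impl TinyTensor {
    pub fn new(data: Vec<f32>, shape: &[usize]) -> Self {
        assert_eq!(data.len(), shape.iter().product::<usize>(), "shape/data mismatch");
        Self { data, shape: shape.to_vec() }
    }

    /// Global mean over all elements.
    pub fn mean(&self) -> f32 {
        self.data.iter().sum::<f32>() / self.data.len() as f32
    }
}

/// `Add` on references: both operands stay usable after the addition.
impl<'a, 'b> Add<&'b TinyTensor> for &'a TinyTensor {
    type Output = TinyTensor;

    fn add(self, rhs: &'b TinyTensor) -> TinyTensor {
        assert_eq!(self.shape, rhs.shape, "this sketch does not broadcast");
        let data = self.data.iter().zip(&rhs.data).map(|(x, y)| x + y).collect();
        TinyTensor { data, shape: self.shape.clone() }
    }
}
```

The same pattern extends to `Sub`, `Mul`, and `Div`; a real implementation would also record the operation for autograd.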
- Debugging and Introspection:
  - `.show("label")`: Prints the tensor's ID, shape, and a sample of its data.
  - `.show_shape("label")`: Prints the tensor's ID and shape.
- CPU & CUDA Backends:
  - CPU backend using `ndarray` for host computation.
    - Supports optional integration with system BLAS libraries (like OpenBLAS) for potentially accelerated `matmul` via feature flags (see below).
  - CUDA backend leveraging custom kernels and cuBLAS (via `cust` and `cublas-sys`) for GPU acceleration (requires `cuda` feature).
- Dynamic Autograd:
  - Constructs computation graphs on-the-fly.
  - Computes gradients automatically via reverse-mode differentiation.
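As a rough picture of reverse-mode differentiation over a dynamically built graph, the sketch below records scalar operations on a tape and then sweeps it backwards. It is an assumption-laden toy: `Tape`, `Node`, and their methods are hypothetical names, it handles scalars rather than tensors, and it says nothing about how the library stores its graph.

```rust
/// One recorded operation: the indices of its two parents and the local
/// partial derivatives of this node with respect to each parent.
struct Node {
    parents: [usize; 2],
    local_grads: [f64; 2],
}

/// Records nodes in creation order, so parents always precede children.
pub struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

impl Tape {
    pub fn new() -> Self {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: [usize; 2], local_grads: [f64; 2]) -> usize {
        self.nodes.push(Node { parents, local_grads });
        self.values.push(value);
        self.nodes.len() - 1
    }

    /// A leaf has no real parents; it points at itself with zero local gradients.
    pub fn leaf(&mut self, value: f64) -> usize {
        let id = self.nodes.len();
        self.push(value, [id, id], [0.0, 0.0])
    }

    pub fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.push(v, [a, b], [1.0, 1.0])
    }

    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        // d(a*b)/da = b and d(a*b)/db = a
        self.push(va * vb, [a, b], [vb, va])
    }

    pub fn value(&self, id: usize) -> f64 {
        self.values[id]
    }

    /// Reverse sweep: seed the output with 1, then pass each node's gradient
    /// to its parents, scaled by the recorded local derivative.
    pub fn backward(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[output] = 1.0;
        for i in (0..=output).rev() {
            let node = &self.nodes[i];
            let g = grads[i];
            for k in 0..2 {
                grads[node.parents[k]] += node.local_grads[k] * g;
            }
        }
        grads
    }
}
```

For `f = (x + y) * x`, the sweep accumulates two contributions into `x` (one through the sum, one through the product), which is why gradients are added rather than overwritten.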
- Comprehensive Operations Suite:
  - Arithmetic: Add, Subtract, Multiply, Divide (with broadcasting).
  - Matrix: Matmul (CPU/cuBLAS), Transpose.
  - Trigonometric: Sin, Cos, Tan.
  - Exponential & Power: Exp, Log (ln), Pow, Sqrt, Square.
  - Activation Functions: ReLU, Sigmoid, Tanh, Softplus, ELU, LogSoftmax.
  - Reduction Operations: Sum, Mean, Max, Min, Prod, LogSumExp (global or along axes).
  - Indexing & Manipulation: Slice, Concat, ExpandDims, Squeeze, View/Reshape.
  - Comparison & Clipping: Equal, Greater, Less (and variants), Clip.
  - Normalization-related: LogSoftmax.
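LogSumExp and LogSoftmax from the list above are closely related, and both are usually computed with the maximum factored out to avoid overflow. The sketch below shows that standard trick on a plain slice; the function names are illustrative and the code is not taken from the library.

```rust
/// log(sum_i exp(x_i)), computed as max + log(sum_i exp(x_i - max)).
pub fn log_sum_exp(xs: &[f64]) -> f64 {
    let max = xs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if max.is_infinite() {
        // Empty input, all -inf, or a +inf entry: return max directly
        // instead of producing NaN from inf - inf.
        return max;
    }
    let sum: f64 = xs.iter().map(|x| (x - max).exp()).sum();
    max + sum.ln()
}

/// log_softmax(x)_i = x_i - log_sum_exp(x).
pub fn log_softmax(xs: &[f64]) -> Vec<f64> {
    let lse = log_sum_exp(xs);
    xs.iter().map(|x| x - lse).collect()
}
```

A naive `exp(1000.0)` overflows to infinity, whereas the shifted form only ever exponentiates values that are zero or negative.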
- Optimizers:
  - Stochastic Gradient Descent (SGD)
  - Adam
  - Adagrad
  - MomentumSGD
  - (All optimizers support both CPU and CUDA backends with custom kernels for GPU acceleration where applicable).
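For readers new to these optimizers, the update rules for SGD, momentum SGD, and Adam can be sketched on flat slices as below. These are the textbook formulations with commonly used Adam defaults, written here as an assumption; the function and type names are hypothetical and the library's own optimizers may differ in detail.

```rust
/// Plain SGD: p <- p - lr * g.
pub fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grads) {
        *p -= lr * g;
    }
}

/// Momentum SGD in one common formulation: v <- mu * v + g, then p <- p - lr * v.
pub fn momentum_sgd_step(
    params: &mut [f32],
    grads: &[f32],
    velocity: &mut [f32],
    lr: f32,
    mu: f32,
) {
    for ((p, g), v) in params.iter_mut().zip(grads).zip(velocity.iter_mut()) {
        *v = mu * *v + g;
        *p -= lr * *v;
    }
}

/// Adam state: running first and second moment estimates plus a step counter.
pub struct AdamState {
    pub lr: f32,
    pub beta1: f32,
    pub beta2: f32,
    pub eps: f32,
    t: i32,
    m: Vec<f32>,
    v: Vec<f32>,
}

impl AdamState {
    /// Uses the commonly cited defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8.
    pub fn new(n: usize, lr: f32) -> Self {
        AdamState { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, t: 0, m: vec![0.0; n], v: vec![0.0; n] }
    }

    pub fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        self.t += 1;
        // Bias-correction factors for the zero-initialised moments.
        let c1 = 1.0 - self.beta1.powi(self.t);
        let c2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / c1;
            let v_hat = self.v[i] / c2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```

On the very first Adam step the bias correction makes the update approximately `lr * sign(g)`, independent of the gradient's magnitude.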
- Serialization:
  - Save and load tensors to/from files with the `serialization` feature.
  - Seamless cross-device serialization (save from GPU, load to CPU and vice versa).
  - Preserves tensor data, shape, gradient (if present), and metadata.
- Neural Network Layers:
  - Convolutional: `Conv2D` (with CPU and CUDA im2col/col2im + matmul implementations).
  - Pooling: `MaxPool2D` (with CPU and CUDA implementations, including index tracking for backward pass).
  - Transposed Convolution: `Conv2DTranspose` (implemented for CPU and CUDA).
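The "index tracking for backward pass" mentioned for max pooling can be illustrated with a small single-channel sketch. The forward pass remembers which input position won each window, and the backward pass routes the upstream gradient only to those positions. The function names and the flat row-major layout are assumptions for illustration, not the library's layer API.

```rust
/// Forward max pooling over a single-channel, row-major `h x w` image.
/// Returns the pooled values and, per output, the flat input index of the max.
pub fn max_pool2d(
    input: &[f32],
    h: usize,
    w: usize,
    k: usize,
    stride: usize,
) -> (Vec<f32>, Vec<usize>) {
    assert_eq!(input.len(), h * w, "input length must equal h * w");
    assert!(k >= 1 && stride >= 1 && k <= h && k <= w, "invalid window");
    let out_h = (h - k) / stride + 1;
    let out_w = (w - k) / stride + 1;
    let mut out = Vec::with_capacity(out_h * out_w);
    let mut idx = Vec::with_capacity(out_h * out_w);
    for oy in 0..out_h {
        for ox in 0..out_w {
            // Start from the window's top-left element and scan the window.
            let mut best = oy * stride * w + ox * stride;
            for dy in 0..k {
                for dx in 0..k {
                    let i = (oy * stride + dy) * w + (ox * stride + dx);
                    if input[i] > input[best] {
                        best = i;
                    }
                }
            }
            out.push(input[best]);
            idx.push(best);
        }
    }
    (out, idx)
}

/// Backward pass: each upstream gradient flows only to the input that won the max.
pub fn max_pool2d_backward(grad_out: &[f32], idx: &[usize], input_len: usize) -> Vec<f32> {
    let mut grad_in = vec![0.0; input_len];
    for (g, &i) in grad_out.iter().zip(idx) {
        grad_in[i] += g;
    }
    grad_in
}
```

Gradients are accumulated with `+=` because overlapping windows (stride smaller than the kernel) can select the same input element more than once.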
- Rich Examples Suite:
  - MLP for MNIST: Trains a Multi-Layer Perceptron on the MNIST dataset (CPU: `train_mnist_cpu.rs`, GPU: `train_mnist_gpu.rs`).
  - CNN for MNIST: Demonstrates Convolutional Neural Network training on MNIST, utilizing `Conv2D` and `MaxPool2D` layers (CPU: `train_mnist_cnn_cpu.rs`, GPU: `train_mnist_cnn_gpu.rs`).
  - Sine Wave Regression: A simple MLP model learns to fit a noisy sine wave, showcasing basic regression and optimization (CPU: `sine_regression_cpu.rs`).
  - Character-Level LSTM RNN: A more advanced example building an LSTM cell from fundamental tensor operations to perform character-level text generation, demonstrating the flexibility of the autograd system (CPU: `lstm_char_rnn_cpu.rs`).
- Built in Rust: Aims to provide a memory-safe and performant implementation.
Requirements
Basic Setup
- Rust 1.70 or later (check `Cargo.toml` for specific MSRV if set).
- Cargo (Rust's package manager).
Dataset Requirement (MNIST)
- Before running or testing any MNIST examples, you must obtain the dataset files:
  - `mnist_train.csv`
  - `mnist_test.csv`
- Place both files inside a `data/` directory at the project root (i.e., `./data/mnist_train.csv`).
- These files are commonly available online; search for "mnist_train.csv" and "mnist_test.csv" to find sources. (Direct links are not provided here.)
CUDA Support (Optional)
To enable and use the CUDA backend (--features cuda):
- NVIDIA CUDA Toolkit: Version 11.0 or later recommended. This includes the `nvcc` compiler, runtime libraries (like `cudart`), and development libraries (like `cublas`).
- NVIDIA GPU: A CUDA-capable GPU (Compute Capability 3.5+ recommended, check `cust` crate compatibility).
- NVIDIA Driver: An up-to-date driver compatible with your GPU and CUDA Toolkit version.
- Environment Variables: Crucial for both building and running:
  - `CUDA_PATH`: (Build & Runtime) Set to the root directory of your CUDA Toolkit installation (e.g., `/usr/local/cuda-11.8`, `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8`). Needed to find `nvcc` and headers/libs.
  - `CUBLAS_LIB_DIR`: (Build Time) Path to the cuBLAS library file (e.g., `$CUDA_PATH/lib64`, `%CUDA_PATH%\lib\x64`). Used by `build.rs` to link against cuBLAS.
  - `LD_LIBRARY_PATH` (Linux/macOS) or `PATH` (Windows): (Runtime) Must include the directory containing CUDA runtime libraries (`libcudart.so`, `libcublas.so`, `.dll` equivalents) so the executable can find them. Often this is `$CUDA_PATH/lib64` on Linux or `%CUDA_PATH%\bin` on Windows.
Example (Linux/macOS):
# Adjust version/path as needed
export CUDA_PATH=/usr/local/cuda-11.8
export CUBLAS_LIB_DIR=$CUDA_PATH/lib64
# Add CUDA libs to runtime linker path
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:${LD_LIBRARY_PATH:-}
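To show why these variables matter at build time, here is a hedged sketch of how a build script could turn them into Cargo link directives. It is not the project's actual `build.rs`: the helper names are hypothetical, and the fallback to `<CUDA_PATH>/lib64` when `CUBLAS_LIB_DIR` is unset is an assumption based on the Linux layout shown above. The `cargo:rustc-link-search`, `cargo:rustc-link-lib`, and `cargo:rerun-if-env-changed` directives themselves are standard Cargo build-script output.

```rust
use std::path::{Path, PathBuf};

/// Resolve the cuBLAS library directory: prefer an explicit CUBLAS_LIB_DIR,
/// otherwise fall back to `<CUDA_PATH>/lib64` (assumed Linux layout).
pub fn cublas_lib_dir(cublas_lib_dir: Option<&str>, cuda_path: Option<&str>) -> Option<PathBuf> {
    match (cublas_lib_dir, cuda_path) {
        (Some(dir), _) if !dir.is_empty() => Some(PathBuf::from(dir)),
        (_, Some(root)) if !root.is_empty() => Some(Path::new(root).join("lib64")),
        _ => None,
    }
}

/// Lines a build script would print to stdout for Cargo to pick up.
pub fn link_directives(lib_dir: &Path) -> Vec<String> {
    vec![
        // Tell the linker where to look for native libraries.
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        // Link dynamically against cuBLAS.
        "cargo:rustc-link-lib=dylib=cublas".to_string(),
        // Re-run the build script when either variable changes.
        "cargo:rerun-if-env-changed=CUDA_PATH".to_string(),
        "cargo:rerun-if-env-changed=CUBLAS_LIB_DIR".to_string(),
    ]
}
```

In a real build script the two arguments would come from `std::env::var("CUBLAS_LIB_DIR").ok()` and `std::env::var("CUDA_PATH").ok()`, and each returned line would be emitted with `println!`. A missing directory (`None`) is the point where the script should fail with a clear message, which is why the variables need to be set before building.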
Installation
Add this crate to your project's Cargo.toml:
[dependencies]
# CPU only:
rust_tensor_library = "0.1.0"
# --- OR ---
# With CUDA support (ensure environment variables are set *before* building!):
# rust_tensor_library = { version = "0.1.0", features = ["cuda"] }
# With serialization support:
# rust_tensor_library = { version = "0.1.0", features = ["serialization"] }
# With both CUDA and serialization support:
# rust_tensor_library = { version = "0.1.0", features = ["cuda", "serialization"] }
Quick Start
use rust_tensor_library::{Tensor, CpuBackend};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create tensors that require gradient tracking
    let a = Tensor::<CpuBackend>::from_vec(vec![1.0, 2.0, 3.0], &[3], true)?;
    let b = Tensor::<CpuBackend>::from_vec(vec![4.0, 5.0, 6.0], &[3], true)?;
    // Perform operations
    let c = &a + &b; // Element-wise addition
    let d = c.mean(None)?; // Global mean reduction
    // Print results
    println!("a: {:?}", a.to_vec()?);
    println!("b: {:?}", b.to_vec()?);
    println!("c = a + b: {:?}", c.to_vec()?);
    println!("d = mean(c): {:?}", d.to_vec()?);
    // Compute gradients
    d.backward()?;
    // Access and print gradients
    if let Some(grad_a_ref) = a.grad() {
        let grad_a_data = CpuBackend::copy_to_host(&*grad_a_ref)?;
        println!("Gradient of a: {:?}", grad_a_data);
        // For d = mean(a+b), and a = [a1, a2, a3], b = [b1, b2, b3]
        // d = ((a1+b1) + (a2+b2) + (a3+b3)) / 3
        // d(d)/da_i = 1/3. So grad_a should be [1/3, 1/3, 1/3]
        // Expected: [0.333..., 0.333..., 0.333...]
    }
    if let Some(grad_b_ref) = b.grad() {
        let grad_b_data = CpuBackend::copy_to_host(&*grad_b_ref)?;
        println!("Gradient of b: {:?}", grad_b_data);
// Similarly, d(
