TenfloweRS
A pure Rust implementation of TensorFlow, providing a full-featured machine learning framework with Rust's safety and performance.
v0.1.0 (2026-03-20)
TenfloweRS v0.1.0 is the first release, with 12,949 tests passing across 6 crates, zero clippy warnings, zero security vulnerabilities, and comprehensive documentation.
Overview
TenfloweRS is a native Rust machine learning framework inspired by TensorFlow, designed to bring the power of deep learning to the Rust ecosystem. It leverages Rust's memory safety, zero-cost abstractions, and excellent performance while maintaining compatibility with the broader ML ecosystem through ONNX support.
Design Principles
TenfloweRS adapts TensorFlow's proven architecture to Rust's strengths:
- Memory Safety First: All operations are memory-safe by design, eliminating segfaults and data races
- Zero-Cost Abstractions: High-level APIs compile down to efficient machine code
- Explicit over Implicit: Clear ownership and error handling following Rust conventions
- Modular Architecture: Organized as a workspace of focused, reusable crates
- Cross-Platform: Native support for Windows, macOS, and Linux with unified GPU abstraction
- Pure Rust: No C/Fortran dependencies in the default build -- the entire stack is 100% Rust
TensorFlow to TenfloweRS Mapping
| TensorFlow Concept | TenfloweRS Implementation |
|-------------------|---------------------------|
| tf.Tensor | Tensor<T> with static typing |
| tf.Operation | Op trait with registered kernels |
| tf.Graph | Graph struct with ownership semantics |
| tf.Session | Session trait for graph execution |
| tf.GradientTape | GradientTape for automatic differentiation |
| tf.keras.Layer | Layer trait with builder pattern |
| tf.data.Dataset | Iterator-based Dataset trait |
| tf.device | Device enum with placement control |
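The "Op trait with registered kernels" row in the table can be illustrated with a minimal standalone sketch in plain std Rust. The trait, kernel types, and registry below are hypothetical stand-ins, not the actual TenfloweRS types:

```rust
use std::collections::HashMap;

// A minimal stand-in for a tensor: flat f32 data.
type Tensor = Vec<f32>;

// Each operation implements a common trait, mirroring the
// "Op trait with registered kernels" idea from the table above.
trait Op {
    fn name(&self) -> &'static str;
    fn run(&self, inputs: &[&Tensor]) -> Tensor;
}

struct Add;
impl Op for Add {
    fn name(&self) -> &'static str { "add" }
    fn run(&self, inputs: &[&Tensor]) -> Tensor {
        inputs[0].iter().zip(inputs[1]).map(|(a, b)| a + b).collect()
    }
}

struct Mul;
impl Op for Mul {
    fn name(&self) -> &'static str { "mul" }
    fn run(&self, inputs: &[&Tensor]) -> Tensor {
        inputs[0].iter().zip(inputs[1]).map(|(a, b)| a * b).collect()
    }
}

// A registry maps op names to boxed kernels, so a graph executor
// can dispatch operations by name at runtime.
fn registry() -> HashMap<&'static str, Box<dyn Op>> {
    let mut ops: HashMap<&'static str, Box<dyn Op>> = HashMap::new();
    for op in vec![Box::new(Add) as Box<dyn Op>, Box::new(Mul)] {
        ops.insert(op.name(), op);
    }
    ops
}

fn main() {
    let ops = registry();
    let (a, b) = (vec![1.0, 2.0], vec![3.0, 4.0]);
    let sum = ops["add"].run(&[&a, &b]);
    assert_eq!(sum, vec![4.0, 6.0]);
}
```

Dispatching through `Box<dyn Op>` trades a virtual call for extensibility: new kernels can be registered without touching the executor.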
Key Features
- Dual Execution Modes: Both eager execution (PyTorch-style) and static computation graphs (TensorFlow-style)
- Pure Rust Implementation: No C/C++ dependencies in the core, ensuring memory safety
- GPU Support: Cross-platform GPU acceleration via WGPU (Metal, Vulkan, DirectX)
- Rust Scientific Stack: Built on NumRS2 and SciRS2 for numerical computing
- Python Bindings: PyO3-based FFI crate with 48 passing tests
- TensorBoard Integration: Pure Rust implementation with no protobuf dependency
- ONNX Support: Import and export models for cross-framework compatibility
- Performance: SIMD vectorization, optional BLAS integration, and parallel execution
- 150+ Research Domains: From transformers and diffusion models to quantum ML and protein structure prediction
- Production Ready: 12,949 tests passing, 0 security vulnerabilities, comprehensive docs
Project Status
Current Version: 0.1.0 (Released 2026-03-20)
First release with full-featured ML capabilities across all 6 crates.
v0.1.0 Quality Metrics
- Tests: 12,949 passing (100% pass rate)
- Code: 1,453 Rust files, ~641K lines of Rust code
- Security: 0 vulnerabilities
- Clippy: 0 warnings, 0 errors
- Rustdoc: Builds clean with -D warnings
- TODO markers: 0 remaining
Published Crates
| Crate | Tests | Status | Description |
|-------|-------|--------|-------------|
| tenflowers-core | 675 | Stable | Core tensor operations and GPU support |
| tenflowers-autograd | 334 | Stable | Automatic differentiation engine |
| tenflowers-neural | 11,407 | Stable | Neural network layers, models, and 150+ research domains |
| tenflowers-dataset | 472 | Stable | Data loading and preprocessing |
| tenflowers-ffi | 48 | Stable | Python bindings via PyO3 |
| tenflowers | 13 (doc) | Stable | Unified API and prelude |
What Is Included
- Core tensor operations fully tested and validated
- Automatic differentiation engine with comprehensive gradient support
- Neural network layers (Dense, Conv2D, BatchNorm, Dropout, Attention, RNN, GNN, Transformers, and many more)
- Training utilities (optimizers including SGD, Adam, AdamW, LAMB, Lion, Muon; loss functions; training loops; LR schedulers)
- Data loading pipeline with multi-format support
- GPU acceleration via WGPU (cross-platform)
- SciRS2/NumRS2 ecosystem integration
- Python bindings with PyO3 (48 tests passing)
- TensorBoard logging (pure Rust, no protobuf dependency)
- Security hardening (zero vulnerabilities)
- Comprehensive documentation
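As one concrete example of the training utilities listed above, a vanilla SGD update is just w ← w − lr·g per parameter. A minimal pure-Rust sketch of that step (not the TenfloweRS optimizer API):

```rust
// Minimal SGD: update each parameter in place with w -= lr * grad.
fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (w, g) in params.iter_mut().zip(grads) {
        *w -= lr * g;
    }
}

fn main() {
    // One step on w = [1.0, 2.0] with gradient [0.5, -0.5] and lr = 0.1.
    let mut w = vec![1.0_f32, 2.0];
    sgd_step(&mut w, &[0.5, -0.5], 0.1);
    assert!((w[0] - 0.95).abs() < 1e-6);
    assert!((w[1] - 2.05).abs() < 1e-6);
}
```

Adam, AdamW, and the other listed optimizers layer per-parameter state (momentum, second-moment estimates) on top of this same in-place update pattern.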
tenflowers-neural Feature Coverage
The neural crate alone has 11,407 tests covering:
Core architectures: attention mechanisms (multi-head, flash, ALiBi, RoPE), RNN (LSTM, GRU, bidirectional), transformers (encoder, decoder, efficient variants including RetNet, Mamba-2, GQA), CNN, graph neural networks (GCN, GAT, GraphSAGE, GIN, and advanced variants)
Generative models: normalizing flows, diffusion models, GANs, VAEs, energy-based models, neural rendering (3D Gaussian splatting, NeRF)
Reinforcement learning: policy gradient, actor-critic, PPO, SAC, multi-agent RL, safe RL, inverse RL, reward shaping, world models
Scientific ML: physics-informed neural networks (PINNs), neural ODEs/SDEs, operator learning (FNO, DeepONet, WNO, GNO), differentiable physics, simulation-based inference
Domain-specific: molecular GNN, protein structure prediction, drug discovery, medical imaging, audio models, speech recognition, video understanding, geospatial ML, climate ML, satellite ML, digital pathology, bio ML
Advanced methods: Bayesian deep learning, federated learning, meta-learning, NAS, knowledge distillation, quantum ML, geometric deep learning, causal inference, optimal transport, topological ML, continual learning, active learning, conformal prediction, and many more
Installation
Add TenfloweRS to your Cargo.toml:
```toml
[dependencies]
tenflowers-core = "0.1.0"
tenflowers-neural = "0.1.0"
```
For GPU support:
```toml
[dependencies]
tenflowers-core = { version = "0.1.0", features = ["gpu"] }
```
For the unified API:
```toml
[dependencies]
tenflowers = "0.1.0"
```
Quick Start
Basic Tensor Operations
```rust
use tenflowers_core::{Tensor, Device, Context};
use tenflowers_autograd::GradientTape;

// Create a context for eager execution
let ctx = Context::new()?;

// Create tensors
let a = Tensor::<f32>::ones(&[2, 3]);
let b = Tensor::<f32>::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3])?;

// Operations execute immediately in eager mode
let c = a.add(&b)?;
let d = c.matmul(&b.transpose()?)?;

// Move to GPU
let gpu_tensor = a.to(Device::Gpu(0))?;

// Automatic differentiation
let tape = GradientTape::new();
let x = Tensor::variable(vec![1.0, 2.0, 3.0], &[3]);
let y = tape.watch(x.clone());
let z = y.pow(2.0)?;
let grads = tape.gradient(&z, &[&x])?;
```
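The tape mechanics behind this flow can be shown with a toy, standalone reverse-mode tape in plain std Rust (not the TenfloweRS implementation): each operation records a closure mapping the upstream gradient to the gradient of its input, and gradient() replays the records in reverse.

```rust
// A toy gradient tape: each recorded op stores a closure that maps the
// upstream gradient to the gradient w.r.t. its input.
struct Tape {
    backward: Vec<Box<dyn Fn(&[f32]) -> Vec<f32>>>,
}

impl Tape {
    fn new() -> Self {
        Tape { backward: Vec::new() }
    }

    // y = x^2 elementwise; record d/dx = 2x for the backward pass.
    fn pow2(&mut self, x: &[f32]) -> Vec<f32> {
        let x_saved = x.to_vec();
        self.backward.push(Box::new(move |up: &[f32]| {
            up.iter().zip(&x_saved).map(|(u, xi)| u * 2.0 * xi).collect()
        }));
        x.iter().map(|xi| xi * xi).collect()
    }

    // Replay the recorded ops in reverse, starting from dz/dz = 1.
    fn gradient(&self, output_len: usize) -> Vec<f32> {
        let mut grad = vec![1.0; output_len];
        for back in self.backward.iter().rev() {
            grad = back(&grad);
        }
        grad
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = vec![1.0_f32, 2.0, 3.0];
    let z = tape.pow2(&x); // z = [1, 4, 9]
    let grads = tape.gradient(z.len()); // dz/dx = 2x
    assert_eq!(grads, vec![2.0, 4.0, 6.0]);
}
```

For z = x² at x = [1, 2, 3], the replayed gradient is 2x = [2, 4, 6], matching the example above.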
Graph Mode (TensorFlow 1.x style)
```rust
use tenflowers_core::{Graph, Session, Placeholder};

// Build a computation graph
let graph = Graph::new();
let a = graph.placeholder::<f32>("input_a", &[None, 784])?;
let w = graph.variable("weights", &[784, 10])?;
let b = graph.variable("bias", &[10])?;
let y = a.matmul(&w)?.add(&b)?;

// Create a session, feed the placeholder, and fetch results by node name
let session = Session::new(&graph)?;
let mut outputs = Vec::new();
session.run(
    &[("input_a", input_tensor)],
    &["output"],
    &mut outputs,
)?;
```
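Under the hood, graph-mode execution boils down to topologically ordering nodes and evaluating each one once its inputs are ready. A standalone sketch with a hypothetical node type (not the TenfloweRS Graph API):

```rust
// A tiny expression graph: each node is either a constant or an op over
// earlier nodes (by index), so evaluating in insertion order is already
// a valid topological order.
enum Node {
    Const(f32),
    Add(usize, usize),
    Mul(usize, usize),
}

// Evaluate every node once, reusing previously computed values.
fn run_graph(nodes: &[Node]) -> Vec<f32> {
    let mut values = Vec::with_capacity(nodes.len());
    for node in nodes {
        let v = match node {
            Node::Const(c) => *c,
            Node::Add(a, b) => values[*a] + values[*b],
            Node::Mul(a, b) => values[*a] * values[*b],
        };
        values.push(v);
    }
    values
}

fn main() {
    // y = (2 + 3) * 4
    let graph = vec![
        Node::Const(2.0),
        Node::Const(3.0),
        Node::Add(0, 1),
        Node::Const(4.0),
        Node::Mul(2, 3),
    ];
    let values = run_graph(&graph);
    assert_eq!(values[4], 20.0);
}
```

Building the whole graph before running it is what lets graph mode apply global optimizations (fusion, placement) that eager mode cannot see.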
Building a Neural Network
```rust
use tenflowers_neural::{Sequential, Dense, Conv2D, Model, layers, optimizer, loss, metrics};
use tenflowers_core::Tensor;

// Define a CNN for image classification
let mut model = Sequential::new(vec![
    Box::new(Conv2D::new(32, (3, 3)).with_activation("relu")),
    Box::new(Conv2D::new(64, (3, 3)).with_activation("relu")),
    Box::new(layers::GlobalAveragePooling2D::new()),
    Box::new(Dense::new(128, true).with_activation("relu")),
    Box::new(layers::Dropout::new(0.5)),
    Box::new(Dense::new(10, true).with_activation("softmax")),
]);

// Compile the model
model.compile(
    optimizer::Adam::new(0.001),
    loss::SparseCategoricalCrossentropy::new(),
    vec![metrics::Accuracy::new()],
)?;

// Train the model (Rust has no named arguments; values shown inline)
model.fit(
    &train_dataset,
    10,                 // epochs
    32,                 // batch size
    Some(&val_dataset), // validation data
)?;
```
Data Pipeline
```rust
use tenflowers_dataset::{Dataset, DataLoader};

// Create a dataset from tensors
let dataset = Dataset::from_tensor_slices((images, labels))?
    .shuffle(1000)
    .batch(32)
    .prefetch(2);

// Iterate through batches
for (batch_images, batch_labels) in dataset.iter() {
    // Training step
}
```
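This shuffle/batch/prefetch pipeline maps naturally onto Rust iterators, which is what "Iterator-based Dataset trait" in the mapping table refers to. A standalone sketch of the batching stage with plain std slices (not the tenflowers-dataset API):

```rust
// Group a stream of samples into fixed-size batches using chunks();
// the final batch may be smaller if the sample count is not divisible.
fn batch<T: Clone>(samples: &[T], batch_size: usize) -> Vec<Vec<T>> {
    samples.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let samples: Vec<u32> = (0..7).collect();
    let batches = batch(&samples, 3);
    // 7 samples with batch_size 3 -> batches of 3, 3, and 1.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0], vec![0, 1, 2]);
    assert_eq!(batches[2], vec![6]);
}
```

Because each stage consumes an iterator and yields one, stages like shuffle and prefetch compose lazily, so nothing is materialized until a training loop actually pulls a batch.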
Architecture
TenfloweRS follows a modular architecture inspired by TensorFlow:
tenflowers/
├── tenflowers-core/ # Core tensor operations and device management
│ ├── tensor/ # Tensor implementation with device support
│ ├── ops/ # Operation registry and implementations
│ ├── kernels/ # CPU and GPU kernel implementations
│ ├── graph/ # Computation graph representation
│ └── device/ # Device abstraction and management
├── tenflowers-autograd/ # Automatic differentiation engine
│ ├── tape/ # GradientTape for eager mode
│ ├── graph_grad/ # Graph-based backpropagation
│ └── ops/ # Gradient definitions for operations
├── tenflowers-neural/ # Neural network layers, models, and research domains
│ ├── layers/ # Layer implementations (attention, RNN, GNN, etc.)
│ ├── optimizers/ # Training optimizers (SGD, Adam, LAMB, Lion, Muon)
│ ├── rl/ # Rein
