TenfloweRS
A pure Rust implementation of TensorFlow, providing a full-featured machine learning framework with Rust's safety and performance.
v0.1.0 (2026-03-20)
TenfloweRS v0.1.0 is the first release, with 12,949 tests passing across 6 crates, zero clippy warnings, zero security vulnerabilities, and comprehensive documentation.
Overview
TenfloweRS is a native Rust machine learning framework inspired by TensorFlow, designed to bring the power of deep learning to the Rust ecosystem. It leverages Rust's memory safety, zero-cost abstractions, and excellent performance while maintaining compatibility with the broader ML ecosystem through ONNX support.
Design Principles
TenfloweRS adapts TensorFlow's proven architecture to Rust's strengths:
- Memory Safety First: All operations are memory-safe by design, eliminating segfaults and data races
- Zero-Cost Abstractions: High-level APIs compile down to efficient machine code
- Explicit over Implicit: Clear ownership and error handling following Rust conventions
- Modular Architecture: Organized as a workspace of focused, reusable crates
- Cross-Platform: Native support for Windows, macOS, and Linux with unified GPU abstraction
- Pure Rust: No C/Fortran dependencies in the default build -- the entire stack is 100% Rust
TensorFlow to TenfloweRS Mapping
| TensorFlow Concept | TenfloweRS Implementation |
|-------------------|---------------------------|
| tf.Tensor | Tensor<T> with static typing |
| tf.Operation | Op trait with registered kernels |
| tf.Graph | Graph struct with ownership semantics |
| tf.Session | Session trait for graph execution |
| tf.GradientTape | GradientTape for automatic differentiation |
| tf.keras.Layer | Layer trait with builder pattern |
| tf.data.Dataset | Iterator-based Dataset trait |
| tf.device | Device enum with placement control |
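The "Op trait with registered kernels" row in the table can be illustrated with a minimal standalone sketch in plain std Rust. The trait, kernel types, and registry below are hypothetical stand-ins, not the actual TenfloweRS types:

```rust
use std::collections::HashMap;

// A minimal stand-in for a tensor: flat f32 data.
type Tensor = Vec<f32>;

// Each operation implements a common trait, mirroring the
// "Op trait with registered kernels" idea from the table above.
trait Op {
    fn name(&self) -> &'static str;
    fn run(&self, inputs: &[&Tensor]) -> Tensor;
}

struct Add;
impl Op for Add {
    fn name(&self) -> &'static str { "add" }
    fn run(&self, inputs: &[&Tensor]) -> Tensor {
        inputs[0].iter().zip(inputs[1]).map(|(a, b)| a + b).collect()
    }
}

struct Mul;
impl Op for Mul {
    fn name(&self) -> &'static str { "mul" }
    fn run(&self, inputs: &[&Tensor]) -> Tensor {
        inputs[0].iter().zip(inputs[1]).map(|(a, b)| a * b).collect()
    }
}

// A registry maps op names to boxed kernels, so a graph executor
// can dispatch operations by name at runtime.
fn registry() -> HashMap<&'static str, Box<dyn Op>> {
    let mut ops: HashMap<&'static str, Box<dyn Op>> = HashMap::new();
    for op in vec![Box::new(Add) as Box<dyn Op>, Box::new(Mul)] {
        ops.insert(op.name(), op);
    }
    ops
}

fn main() {
    let ops = registry();
    let (a, b) = (vec![1.0, 2.0], vec![3.0, 4.0]);
    let sum = ops["add"].run(&[&a, &b]);
    assert_eq!(sum, vec![4.0, 6.0]);
}
```

Dispatching through `Box<dyn Op>` trades a virtual call for extensibility: new kernels can be registered without touching the executor.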
Key Features
- Dual Execution Modes: Both eager execution (PyTorch-style) and static computation graphs (TensorFlow-style)
- Pure Rust Implementation: No C/C++ dependencies in the core, ensuring memory safety
- GPU Support: Cross-platform GPU acceleration via WGPU (Metal, Vulkan, DirectX)
- Rust Scientific Stack: Built on NumRS2 and SciRS2 for numerical computing
- Python Bindings: PyO3-based FFI crate with 48 passing tests
- TensorBoard Integration: Pure Rust implementation with no protobuf dependency
- ONNX Support: Import and export models for cross-framework compatibility
- Performance: SIMD vectorization, optional BLAS integration, and parallel execution
- 150+ Research Domains: From transformers and diffusion models to quantum ML and protein structure prediction
- Production Ready: 12,949 tests passing, 0 security vulnerabilities, comprehensive docs
Project Status
Current Version: 0.1.0 (Released 2026-03-20)
First release with full-featured ML capabilities across all 6 crates.
v0.1.0 Quality Metrics
- Tests: 12,949 passing (100% pass rate)
- Code: 1,453 Rust files, ~641K lines of Rust code
- Security: 0 vulnerabilities
- Clippy: 0 warnings, 0 errors
- Rustdoc: Builds clean with -D warnings
- TODO markers: 0 remaining
Published Crates
| Crate | Tests | Status | Description |
|-------|-------|--------|-------------|
| tenflowers-core | 675 | Stable | Core tensor operations and GPU support |
| tenflowers-autograd | 334 | Stable | Automatic differentiation engine |
| tenflowers-neural | 11,407 | Stable | Neural network layers, models, and 150+ research domains |
| tenflowers-dataset | 472 | Stable | Data loading and preprocessing |
| tenflowers-ffi | 48 | Stable | Python bindings via PyO3 |
| tenflowers | 13 (doc) | Stable | Unified API and prelude |
What Is Included
- Core tensor operations fully tested and validated
- Automatic differentiation engine with comprehensive gradient support
- Neural network layers (Dense, Conv2D, BatchNorm, Dropout, Attention, RNN, GNN, Transformers, and many more)
- Training utilities (optimizers including SGD, Adam, AdamW, LAMB, Lion, Muon; loss functions; training loops; LR schedulers)
- Data loading pipeline with multi-format support
- GPU acceleration via WGPU (cross-platform)
- SciRS2/NumRS2 ecosystem integration
- Python bindings with PyO3 (48 tests passing)
- TensorBoard logging (pure Rust, no protobuf dependency)
- Security hardening (zero vulnerabilities)
- Comprehensive documentation
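As one concrete example of the training utilities listed above, a vanilla SGD update is just w ← w − lr·g per parameter. A minimal pure-Rust sketch of that step (not the TenfloweRS optimizer API):

```rust
// Minimal SGD: update each parameter in place with w -= lr * grad.
fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (w, g) in params.iter_mut().zip(grads) {
        *w -= lr * g;
    }
}

fn main() {
    // One step on w = [1.0, 2.0] with gradient [0.5, -0.5] and lr = 0.1.
    let mut w = vec![1.0_f32, 2.0];
    sgd_step(&mut w, &[0.5, -0.5], 0.1);
    assert!((w[0] - 0.95).abs() < 1e-6);
    assert!((w[1] - 2.05).abs() < 1e-6);
}
```

Adam, AdamW, and the other listed optimizers layer per-parameter state (momentum, second-moment estimates) on top of this same in-place update pattern.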
tenflowers-neural Feature Coverage
The neural crate alone has 11,407 tests covering:
Core architectures: attention mechanisms (multi-head, flash, ALiBi, RoPE), RNN (LSTM, GRU, bidirectional), transformers (encoder, decoder, efficient variants including RetNet, Mamba-2, GQA), CNN, graph neural networks (GCN, GAT, GraphSAGE, GIN, and advanced variants)
Generative models: normalizing flows, diffusion models, GANs, VAEs, energy-based models, neural rendering (3D Gaussian splatting, NeRF)
Reinforcement learning: policy gradient, actor-critic, PPO, SAC, multi-agent RL, safe RL, inverse RL, reward shaping, world models
Scientific ML: physics-informed neural networks (PINNs), neural ODEs/SDEs, operator learning (FNO, DeepONet, WNO, GNO), differentiable physics, simulation-based inference
Domain-specific: molecular GNN, protein structure prediction, drug discovery, medical imaging, audio models, speech recognition, video understanding, geospatial ML, climate ML, satellite ML, digital pathology, bio ML
Advanced methods: Bayesian deep learning, federated learning, meta-learning, NAS, knowledge distillation, quantum ML, geometric deep learning, causal inference, optimal transport, topological ML, continual learning, active learning, conformal prediction, and many more
Installation
Add TenfloweRS to your Cargo.toml:
```toml
[dependencies]
tenflowers-core = "0.1.0"
tenflowers-neural = "0.1.0"
```
For GPU support:
```toml
[dependencies]
tenflowers-core = { version = "0.1.0", features = ["gpu"] }
```
For the unified API:
```toml
[dependencies]
tenflowers = "0.1.0"
```
Quick Start
Basic Tensor Operations
```rust
use tenflowers_core::{Tensor, Device, Context};
use tenflowers_autograd::GradientTape;

// Create a context for eager execution
let ctx = Context::new()?;

// Create tensors
let a = Tensor::<f32>::ones(&[2, 3]);
let b = Tensor::<f32>::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3])?;

// Operations execute immediately in eager mode
let c = a.add(&b)?;
let d = c.matmul(&b.transpose()?)?;

// Move to GPU
let gpu_tensor = a.to(Device::Gpu(0))?;

// Automatic differentiation
let tape = GradientTape::new();
let x = Tensor::variable(vec![1.0, 2.0, 3.0], &[3]);
let y = tape.watch(x.clone());
let z = y.pow(2.0)?;
let grads = tape.gradient(&z, &[&x])?;
```
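The tape mechanics behind this flow can be shown with a toy, standalone reverse-mode tape in plain std Rust (not the TenfloweRS implementation): each operation records a closure mapping the upstream gradient to the gradient of its input, and gradient() replays the records in reverse.

```rust
// A toy gradient tape: each recorded op stores a closure that maps the
// upstream gradient to the gradient w.r.t. its input.
struct Tape {
    backward: Vec<Box<dyn Fn(&[f32]) -> Vec<f32>>>,
}

impl Tape {
    fn new() -> Self {
        Tape { backward: Vec::new() }
    }

    // y = x^2 elementwise; record d/dx = 2x for the backward pass.
    fn pow2(&mut self, x: &[f32]) -> Vec<f32> {
        let x_saved = x.to_vec();
        self.backward.push(Box::new(move |up: &[f32]| {
            up.iter().zip(&x_saved).map(|(u, xi)| u * 2.0 * xi).collect()
        }));
        x.iter().map(|xi| xi * xi).collect()
    }

    // Replay the recorded ops in reverse, starting from dz/dz = 1.
    fn gradient(&self, output_len: usize) -> Vec<f32> {
        let mut grad = vec![1.0; output_len];
        for back in self.backward.iter().rev() {
            grad = back(&grad);
        }
        grad
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = vec![1.0_f32, 2.0, 3.0];
    let z = tape.pow2(&x); // z = [1, 4, 9]
    let grads = tape.gradient(z.len()); // dz/dx = 2x
    assert_eq!(grads, vec![2.0, 4.0, 6.0]);
}
```

For z = x² at x = [1, 2, 3], the replayed gradient is 2x = [2, 4, 6], matching the example above.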
Graph Mode (TensorFlow 1.x style)
```rust
use tenflowers_core::{Graph, Session, Placeholder};

// Build a computation graph
let graph = Graph::new();
let a = graph.placeholder::<f32>("input_a", &[None, 784])?;
let w = graph.variable("weights", &[784, 10])?;
let b = graph.variable("bias", &[10])?;
let y = a.matmul(&w)?.add(&b)?;

// Create a session, feed the placeholder, and fetch results by node name
let session = Session::new(&graph)?;
let mut outputs = Vec::new();
session.run(
    &[("input_a", input_tensor)],
    &["output"],
    &mut outputs,
)?;
```
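Under the hood, graph-mode execution boils down to topologically ordering nodes and evaluating each one once its inputs are ready. A standalone sketch with a hypothetical node type (not the TenfloweRS Graph API):

```rust
// A tiny expression graph: each node is either a constant or an op over
// earlier nodes (by index), so evaluating in insertion order is already
// a valid topological order.
enum Node {
    Const(f32),
    Add(usize, usize),
    Mul(usize, usize),
}

// Evaluate every node once, reusing previously computed values.
fn run_graph(nodes: &[Node]) -> Vec<f32> {
    let mut values = Vec::with_capacity(nodes.len());
    for node in nodes {
        let v = match node {
            Node::Const(c) => *c,
            Node::Add(a, b) => values[*a] + values[*b],
            Node::Mul(a, b) => values[*a] * values[*b],
        };
        values.push(v);
    }
    values
}

fn main() {
    // y = (2 + 3) * 4
    let graph = vec![
        Node::Const(2.0),
        Node::Const(3.0),
        Node::Add(0, 1),
        Node::Const(4.0),
        Node::Mul(2, 3),
    ];
    let values = run_graph(&graph);
    assert_eq!(values[4], 20.0);
}
```

Building the whole graph before running it is what lets graph mode apply global optimizations (fusion, placement) that eager mode cannot see.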
Building a Neural Network
```rust
use tenflowers_neural::{Sequential, Dense, Conv2D, Model, layers, optimizer, loss, metrics};
use tenflowers_core::Tensor;

// Define a CNN for image classification
let mut model = Sequential::new(vec![
    Box::new(Conv2D::new(32, (3, 3)).with_activation("relu")),
    Box::new(Conv2D::new(64, (3, 3)).with_activation("relu")),
    Box::new(layers::GlobalAveragePooling2D::new()),
    Box::new(Dense::new(128, true).with_activation("relu")),
    Box::new(layers::Dropout::new(0.5)),
    Box::new(Dense::new(10, true).with_activation("softmax")),
]);

// Compile the model
model.compile(
    optimizer::Adam::new(0.001),
    loss::SparseCategoricalCrossentropy::new(),
    vec![metrics::Accuracy::new()],
)?;

// Train the model (Rust has no named arguments; values shown inline)
model.fit(
    &train_dataset,
    10,                 // epochs
    32,                 // batch size
    Some(&val_dataset), // validation data
)?;
```
Data Pipeline
```rust
use tenflowers_dataset::{Dataset, DataLoader};

// Create a dataset from tensors
let dataset = Dataset::from_tensor_slices((images, labels))?
    .shuffle(1000)
    .batch(32)
    .prefetch(2);

// Iterate through batches
for (batch_images, batch_labels) in dataset.iter() {
    // Training step
}
```
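This shuffle/batch/prefetch pipeline maps naturally onto Rust iterators, which is what "Iterator-based Dataset trait" in the mapping table refers to. A standalone sketch of the batching stage with plain std slices (not the tenflowers-dataset API):

```rust
// Group a stream of samples into fixed-size batches using chunks();
// the final batch may be smaller if the sample count is not divisible.
fn batch<T: Clone>(samples: &[T], batch_size: usize) -> Vec<Vec<T>> {
    samples.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let samples: Vec<u32> = (0..7).collect();
    let batches = batch(&samples, 3);
    // 7 samples with batch_size 3 -> batches of 3, 3, and 1.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0], vec![0, 1, 2]);
    assert_eq!(batches[2], vec![6]);
}
```

Because each stage consumes an iterator and yields one, stages like shuffle and prefetch compose lazily, so nothing is materialized until a training loop actually pulls a batch.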
Architecture
TenfloweRS follows a modular architecture inspired by TensorFlow:
tenflowers/
├── tenflowers-core/ # Core tensor operations and device management
│ ├── tensor/ # Tensor implementation with device support
│ ├── ops/ # Operation registry and implementations
│ ├── kernels/ # CPU and GPU kernel implementations
│ ├── graph/ # Computation graph representation
│ └── device/ # Device abstraction and management
├── tenflowers-autograd/ # Automatic differentiation engine
│ ├── tape/ # GradientTape for eager mode
│ ├── graph_grad/ # Graph-based backpropagation
│ └── ops/ # Gradient definitions for operations
├── tenflowers-neural/ # Neural network layers, models, and research domains
│ ├── layers/ # Layer implementations (attention, RNN, GNN, etc.)
│ ├── optimizers/ # Training optimizers (SGD, Adam, LAMB, Lion, Muon)
│ ├── rl/ # Rein
