# FloDl

A recursive deep learning framework for Rust.
## Install / Use

```
/learn @fab2s/FloDl README
```
## If You Know PyTorch, You Know floDl
<table>
<tr><th>PyTorch</th><th>floDl</th></tr>
<tr><td><pre>
model = nn.Sequential(
    nn.Linear(2, 16),
    nn.GELU(),
    nn.LayerNorm(16),
    nn.Linear(16, 2),
)
pred = model(x)
loss = F.mse_loss(pred, target)
loss.backward()
optimizer.step()
</pre></td><td><pre>
let model = FlowBuilder::from(Linear::new(2, 16)?)
    .through(GELU)
    .through(LayerNorm::new(16)?)
    .through(Linear::new(16, 2)?)
    .build()?;
let pred = model.forward(&x)?;
let loss = mse_loss(&pred, &target)?;
loss.backward()?;
optimizer.step()?;
</pre></td></tr>
</table>
Same concepts, same names, same GPU kernels underneath. The `?` operator replaces silent failures with explicit error propagation that the compiler forces you to handle. `Drop` replaces the garbage collector. The full migration guide covers every op, module, and pattern.

New to Rust? Read Rust for PyTorch Users: 10 patterns in 15 minutes.
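The error-handling point can be seen in plain Rust, independent of floDl. A minimal sketch with a hypothetical `parse_dims` helper (not part of the library): each `?` either yields the value or returns the error to the caller, and the compiler refuses to let a `Result` go unhandled.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse "RxC" dimensions. Each `?` propagates the
// parse error to the caller instead of failing silently.
fn parse_dims(s: &str) -> Result<(i64, i64), ParseIntError> {
    let mut parts = s.split('x');
    let rows: i64 = parts.next().unwrap_or("").trim().parse()?; // early return on error
    let cols: i64 = parts.next().unwrap_or("").trim().parse()?;
    Ok((rows, cols))
}

fn main() {
    assert_eq!(parse_dims("2x16"), Ok((2, 16)));
    assert!(parse_dims("oops").is_err()); // error surfaced, not swallowed
}
```

The same ownership rules give deterministic cleanup: values are freed the moment they go out of scope (`Drop`), so there are no garbage-collection pauses mid-training.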
## Getting Started
With Docker (no Rust or libtorch needed):
```sh
curl -sL https://flodl.dev/init.sh | sh -s my-project
cd my-project
make build   # first build (~5 min, downloads libtorch)
make run     # train the template model
```
Without Docker (Rust 1.85+ and libtorch):

```sh
# Auto-detects CPU or CUDA
curl -sL https://raw.githubusercontent.com/fab2s/floDl/main/download-libtorch.sh | sh
cargo add flodl && cargo build
```

For CUDA: `cargo add flodl --features cuda`, plus the CUDA toolkit.
Both paths generate an annotated training template. Edit src/main.rs to
build your model:
```rust
use flodl::*;

let model = FlowBuilder::from(Linear::new(2, 16)?)
    .through(GELU)
    .through(LayerNorm::new(16)?)
    .also(Linear::new(16, 16)?)   // residual connection
    .through(Linear::new(16, 2)?)
    .build()?;

let params = model.parameters();
let mut optimizer = Adam::new(&params, 0.01);

model.train();
for (input_t, target_t) in &batches {
    let input = Variable::new(input_t.clone(), true);
    let target = Variable::new(target_t.clone(), false);

    let pred = model.forward(&input)?;
    let loss = mse_loss(&pred, &target)?;

    optimizer.zero_grad();
    loss.backward()?;
    clip_grad_norm(&params, 1.0)?;
    optimizer.step()?;
}
```
## The Graph Builder
floDl's fluent graph builder lets you describe complex architectures as
readable data flow — no boilerplate, no nn.Module subclassing.
```rust
let model = FlowBuilder::from(Linear::new(2, 16)?)
    .through(GELU)                 // activation
    .through(LayerNorm::new(16)?)  // normalization
    .also(Linear::new(16, 16)?)    // residual connection
    .through(Linear::new(16, 2)?)  // output projection
    .build()?;
```
`build()` returns a `Graph` that implements `Module`, so you can nest it inside other graphs. Things get interesting when architectures get complex:
```rust
let g = FlowBuilder::from(encoder).tag("encoded")
    .split(modules![head_a, head_b, head_c]).merge(MergeOp::Mean)
    .loop_body(refinement_block).for_n(3).tag("refined")
    .gate(router, modules![expert_a, expert_b]).using(&["encoded"])
    .switch(selector, modules![light_path, heavy_path]).using(&["refined"])
    .through(StateAdd).using(&["memory"]).tag("memory")
    .loop_body(decoder).while_cond(halt_condition, 10)
    .through(output_head)
    .build()?;
```
Every construct (`split`/`merge`, `also`, `loop_body`, `gate`, `switch`, `map`, `tag`/`using`) composes cleanly. Forward references (`using` before `tag`) carry state across calls, enabling recurrent architectures without special-casing.
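As a plain-Rust analogy for how a `tag`/`using` pair carries state between forward calls (a hypothetical sketch, not the flodl API), a minimal stateful cell:

```rust
// Hypothetical stand-in for a tagged state slot: the cell adds last call's
// stored state to the input ("using"), then stores this call's output ("tag").
struct StatefulCell {
    memory: Vec<f32>,
}

impl StatefulCell {
    fn new(dim: usize) -> Self {
        Self { memory: vec![0.0; dim] }
    }

    fn forward(&mut self, input: &[f32]) -> Vec<f32> {
        // read state from the previous call, element-wise add
        let out: Vec<f32> = input
            .iter()
            .zip(&self.memory)
            .map(|(x, m)| x + m)
            .collect();
        // write state for the next call
        self.memory = out.clone();
        out
    }
}

fn main() {
    let mut cell = StatefulCell::new(2);
    assert_eq!(cell.forward(&[1.0, 2.0]), vec![1.0, 2.0]); // memory starts at zero
    assert_eq!(cell.forward(&[1.0, 2.0]), vec![2.0, 4.0]); // previous output folded in
}
```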
| Method | What it does |
|--------|-------------|
| `from(m).through(m)` | Linear chain |
| `also(m)` | Residual: input + m(input) |
| `fork(m)` | Side branch: capture output as tag, stream continues |
| `split(modules![...]).merge(op)` | Parallel branches, merged by Add or Mean |
| `tag(name)` / `using(refs)` | Named references, backward or forward (across calls) |
| `loop_body(body).for_n(n)` | Fixed iteration with BPTT |
| `loop_body(body).while_cond` / `until_cond` | Conditional loops |
| `gate(router, modules![...])` | Soft routing: weighted combination |
| `switch(selector, modules![...])` | Hard routing: only the selected branch runs |
| `map(body).each()` / `.over(tag)` / `.slices(n)` | Element-wise, tagged, or sliced iteration |
| `input(names)` | Auxiliary graph inputs for multi-input architectures |
See the Graph Builder Tutorial and the full showcase.
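Two of the table's rows can be sketched in plain Rust under the table's own definitions (hypothetical helpers, not flodl itself): `also` computes input + m(input), and `gate` a weighted combination of branch outputs.

```rust
// Hypothetical plain-Rust sketches of two table rows (not the flodl API).

// also(m): residual connection, input + m(input)
fn also(input: &[f32], m: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    m(input).iter().zip(input).map(|(y, x)| y + x).collect()
}

// gate(router, branches): soft routing, a weighted sum of branch outputs
fn gate(weights: &[f32], outputs: &[Vec<f32>]) -> Vec<f32> {
    let dim = outputs[0].len();
    (0..dim)
        .map(|i| weights.iter().zip(outputs).map(|(w, o)| w * o[i]).sum::<f32>())
        .collect()
}

fn double(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v * 2.0).collect()
}

fn main() {
    assert_eq!(also(&[1.0, 2.0], double), vec![3.0, 6.0]); // x + 2x
    assert_eq!(gate(&[0.25, 0.75], &[vec![4.0], vec![8.0]]), vec![7.0]); // 0.25*4 + 0.75*8
}
```

A `switch`, by contrast, would evaluate only the branch the selector picks, skipping the others entirely.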
## Graph Tree: Hierarchical Composition
This is where floDl goes beyond PyTorch. Graphs nest inside graphs with label-path addressing — dot-separated paths that let you reach into any subgraph from the root. Train components independently, compose them into larger architectures, and control training phases declaratively.
```rust
// Build components independently
let scan = FlowBuilder::from(scan_net).tag("hidden")
    .label("scan").build()?;
let read = FlowBuilder::from(read_net).tag("confidence")
    .label("read").build()?;
let encoder = FlowBuilder::from(scan)
    .through(read)
    .label("encoder").build()?;

// Compose into the full model
let model = FlowBuilder::from(encoder)
    .through(classifier)
    .build()?;
```
### Dotted paths reach anywhere
Every tag and subgraph is addressable through dotted paths from the root:
```rust
model.validate_path("encoder")?;                  // -> Subgraph
model.validate_path("encoder.scan.hidden")?;      // -> Tag (three levels deep)
model.validate_path("encoder.read.confidence")?;  // -> Tag
```
### Declarative training phases

Freeze and thaw entire subtrees by path, with no manual parameter iteration:
```rust
// Phase 1: train only the classifier; the encoder is frozen
model.freeze("encoder")?;
let fresh_params = model.parameters(); // only unfrozen params
let mut opt = Adam::new(&fresh_params, 1e-3);
// ... train ...

// Phase 2: thaw scan, keep read frozen (it's proven)
model.thaw("encoder.scan")?;
let mut opt = Adam::with_groups()
    .group(&model.parameters_at("encoder.scan")?, 1e-4) // low LR
    .group(&model.parameters_at("classifier")?, 1e-3)
    .build();
```
### Subgraph checkpoints
Train a component standalone, save it, load it into a larger model:
```rust
// Pre-trained encoder saved earlier
encoder.save_checkpoint("encoder_v1.fdl.gz")?;

// Load into the composed model; namespace and hash are validated
model.load_subgraph_checkpoint("encoder", "encoder_v1.fdl.gz")?;
model.freeze("encoder.read")?; // lock what's proven
```
### Cross-boundary observation
Metrics flow up through the tree automatically:
```rust
model.record_at("encoder.scan.loss", scan_loss)?;
model.record_at("encoder.read.accuracy", read_acc)?;
model.record_scalar("total_loss", total)?;
model.flush(&[]); // single call flushes the entire tree

// Trends across boundaries drive training decisions
if model.trend_at("encoder.scan.loss")?.stalled(10, 1e-4) {
    model.thaw("encoder.read")?; // scan stalled, unfreeze read
}

// The monitor sees all metrics with dotted names automatically
monitor.log(epoch, elapsed, &model);
// -> total_loss, encoder.scan.loss, encoder.read.accuracy
```
This is progressive model composition: each component is trained and validated independently before becoming a building block in a larger architecture. Checkpoints, metrics, and training phases compose just like the graphs themselves.
See the full Graph Tree Tutorial.
## The Training Experience

### Training Monitor
Drop-in monitor with adaptive ETA, resource tracking, and a live web dashboard — no external dependencies, no separate process.
```rust
use flodl::monitor::Monitor;

let mut monitor = Monitor::new(num_epochs);
monitor.serve(3000)?; // optional: live dashboard at http://localhost:3000

for epoch in 0..num_epochs {
    let t = std::time::Instant::now();
    // ... training ...
    monitor.log(epoch, t.elapsed(), &model); // sees the entire graph tree
}
monitor.finish();
```
```text
epoch  1/100  loss=1.5264  [49ms ETA 4.8s]
epoch 10/100  loss=0.3817  [25ms ETA 2.2s]  VRAM: 2.1/6.0 GB (82%)
epoch 50/100  loss=0.0023  [24ms ETA 1.2s]  VRAM: 2.1/6.0 GB
```
