# PMetal
Powdered Metal — An ML SDK, framework, and application suite for Apple Silicon, written in Rust.
PMetal is a complete machine learning platform for Apple Silicon — from low-level Metal GPU kernels and Apple Neural Engine integration to high-level training APIs, a terminal TUI, and a full desktop GUI. Ship fine-tuned models without leaving the Apple ecosystem.
## Use PMetal Your Way
### Desktop GUI
<img src="public/pmetal_gui.png" alt="pmetal screenshot showing GUI" style="width: 100%; max-width: 100%; margin: 20px 0;"/>

A full Tauri + Svelte desktop application for visual model management, training, and inference.
```bash
cd crates/pmetal-gui
bun install && bun tauri dev
```
10 pages: Dashboard, Models, Datasets, Training, Distillation, GRPO, Inference, Merging, Quantize, and Settings. Download models from HuggingFace, configure LoRA training with live loss metrics, chat with models, merge weights, and quantize — all from the GUI. Training runs in-process with real-time progress updates.
### Terminal TUI
<img src="public/pmetal_tui.png" alt="pmetal screenshot showing TUI" style="width: 100%; max-width: 100%; margin: 20px 0;"/>

A full-featured terminal control center with 9 tabs.
```bash
pmetal tui
```
| Tab | Description |
|-----|-------------|
| Dashboard | Live loss curves (braille), LR schedule, throughput sparklines, timing breakdown gauges |
| Device | GPU/ANE info, Metal feature detection, memory gauge, kernel tuning, UltraFusion topology |
| Models | Browse cached models, HuggingFace Hub search (S), memory fit estimation, download |
| Datasets | Scan and preview local datasets (JSONL, Parquet, CSV) with line counts |
| Training | Configure and launch SFT/LoRA/QLoRA training runs with sectioned parameter forms |
| Distillation | Configure knowledge distillation (online, offline, progressive, cross-vocab) |
| GRPO | Configure GRPO/DAPO reasoning training with reward functions and sampling params |
| Inference | Interactive chat interface with markdown rendering and generation settings sidebar |
| Jobs | Training run history with log viewer, status tracking, and metadata |
Keybindings: `Tab`/`Shift+Tab` to switch tabs, `Alt+1` through `Alt+9` for direct access, `L` to adjust the learning rate mid-run, `q` to quit.
### CLI
```bash
# LoRA fine-tuning with sequence packing (default)
pmetal train \
  --model Qwen/Qwen3-0.6B \
  --dataset train.jsonl \
  --output ./output \
  --lora-r 16 --batch-size 4 --learning-rate 2e-4
```
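Sequence packing (on by default above) concatenates multiple short training examples into a single `max-seq-len` window so compute is not wasted on padding tokens. A minimal first-fit sketch of the idea in Python, illustrative only rather than PMetal's actual packing algorithm:

```python
def pack_sequences(lengths, max_seq_len):
    """First-fit packing: place each example's token count into the
    first bin that still has room, opening a new bin otherwise."""
    bins, space = [], []
    for idx, n in enumerate(lengths):
        for b, free in enumerate(space):
            if n <= free:
                bins[b].append(idx)
                space[b] -= n
                break
        else:
            bins.append([idx])
            space.append(max_seq_len - n)
    return bins

# Five short examples fit in two 2048-token sequences instead of five
packed = pack_sequences([900, 700, 1500, 400, 300], max_seq_len=2048)
# packed == [[0, 1, 3], [2, 4]]
```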
```bash
# Inference with LoRA adapter
pmetal infer \
  --model Qwen/Qwen3-0.6B \
  --lora ./output/lora_weights.safetensors \
  --prompt "Explain quantum entanglement" \
  --chat --show-thinking
```
```bash
# Knowledge distillation
pmetal distill \
  --teacher Qwen/Qwen3-4B \
  --student Qwen/Qwen3.5-0.8B-Base \
  --dataset train.jsonl
```
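The classic soft-target distillation loss matches the student's temperature-softened output distribution to the teacher's. A small numpy sketch of that loss, shown for illustration rather than as PMetal's exact objective:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes consistent across T."""
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

teacher = np.array([[2.0, 0.0, -1.0]])
student = np.array([[1.5, 0.5, -1.0]])
loss = distill_loss(teacher, student)  # positive; zero when logits match
```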
```bash
# GRPO reasoning training
pmetal grpo \
  --model Qwen/Qwen3-0.6B \
  --dataset reasoning.jsonl \
  --reasoning-rewards
```
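GRPO dispenses with a learned value function: it samples a group of completions per prompt, scores each with the reward functions, and uses within-group standardized rewards as advantages. A sketch of that core computation, illustrative rather than PMetal's internals:

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards across the group of
    completions sampled for one prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one prompt, scored by a reward function:
# the best gets a positive advantage, the worst a negative one
adv = group_advantages([1.0, 0.0, 0.5, 0.5])
```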
```bash
# HuggingFace model search with memory fit
pmetal search "qwen 0.6b" --detailed
```
```bash
# Merge models with SLERP
pmetal merge \
  --models model-a model-b \
  --method slerp --t 0.5
```
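SLERP interpolates along the great-circle arc between two weight vectors instead of the straight chord, which preserves weight norms better than plain averaging. A numpy sketch of the interpolation formula (not PMetal's implementation):

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation between two flattened weight tensors;
    falls back to linear interpolation when they are nearly parallel."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(na, nb), -1.0, 1.0))
    if omega < 1e-6:
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * a + np.sin(t * omega) / so * b

mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)  # midpoint on the unit arc
```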
```bash
# Quantize to GGUF
pmetal quantize \
  --model ./output \
  --output model.gguf --type q4km
```
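GGUF's 4-bit formats share one core idea: weights are grouped into fixed-size blocks, and each block stores low-bit integers plus a shared scale. The sketch below shows that idea in its simplest form; real formats such as Q4_K_M add super-blocks with per-sub-block scales and minimums, so treat this as an illustration only:

```python
import numpy as np

def quantize_blocks(weights, block_size=32):
    """Toy 4-bit block quantization: per-block absmax scale plus int4 values."""
    w = np.asarray(weights, dtype=float).reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map absmax to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q * scale

w = np.linspace(-1.0, 1.0, 64)
q, s = quantize_blocks(w)
max_err = np.abs(dequantize(q, s) - w.reshape(-1, 32)).max()
# max_err is bounded by half the block scale
```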
```bash
# Fuse LoRA into base model
pmetal fuse \
  --model Qwen/Qwen3-0.6B \
  --lora ./output/lora_weights.safetensors
```
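Fusing is simple linear algebra: the adapter's low-rank update is folded into the base weight so inference needs no extra matmuls. The standard LoRA merge rule, sketched with numpy as an illustration of the math rather than PMetal's code:

```python
import numpy as np

def fuse_lora(W, A, B, r, alpha):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A.
    A has shape (r, in_features); B has shape (out_features, r)."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((4, 8))
B = np.zeros((8, 4))  # B starts at zero, so an untrained adapter is a no-op
fused = fuse_lora(W, A, B, r=4, alpha=32.0)
# fused equals W exactly while B is zero
```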
```bash
# Evaluate perplexity
pmetal eval \
  --model Qwen/Qwen3-0.6B \
  --dataset eval.jsonl
```
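Perplexity is the exponential of the mean per-token negative log-likelihood, so a model that assigned probability 1/4 to every token would score exactly 4. A two-line illustration:

```python
import numpy as np

def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood."""
    return float(np.exp(np.mean(token_nlls)))

ppl = perplexity([np.log(4.0)] * 100)  # -> 4.0
```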
```bash
# Start OpenAI-compatible server (requires --features serve)
pmetal serve --model Qwen/Qwen3-0.6B --port 8080
```
#### All CLI Commands
| Command | Description |
|---------|-------------|
| train | Fine-tune with LoRA/QLoRA/DoRA (SFT) |
| infer | Interactive inference with chat, tool use, and thinking mode |
| distill | Knowledge distillation (online, offline, progressive) |
| grpo | GRPO/DAPO reasoning training (VLM, speculative, async rewards) |
| rlkd | Reinforcement Learning with Knowledge Distillation |
| embed-train | Sentence-transformer fine-tuning (InfoNCE, Triplet, CoSENT) |
| search | Search HuggingFace Hub with memory fit estimation |
| download | Download a model from HuggingFace Hub |
| merge | Merge two or more models (12 strategies) |
| quantize | GGUF quantization (13 format options) |
| fuse | Fuse LoRA adapter weights into base model |
| eval | Evaluate model perplexity on a dataset |
| serve | OpenAI-compatible inference server (feature-gated) |
| tui | Full TUI control center (9 tabs) |
| dashboard | Real-time training metrics visualization |
| dataset | Dataset utilities: analyze, download, convert |
| ollama | Ollama integration: modelfile, create, templates |
| info | Show device info (GPU, ANE, bandwidth, NAX) |
| memory | Show memory usage and available capacity |
| init | Generate a sample configuration file |
| bench | Benchmark training performance |
| bench-gen | Benchmark generation loop timing |
| bench-ffi | Benchmark FFI overhead |
### SDK
PMetal is an embeddable SDK — integrate training, inference, and model operations into your own Rust applications. The `easy` module provides high-level builders, while the underlying crates (`pmetal-trainer`, `pmetal-models`, `pmetal-lora`, etc.) offer full control over every pipeline stage.
```rust
use pmetal::easy;

// Fine-tune with LoRA
let result = easy::finetune("Qwen/Qwen3-0.6B", "train.jsonl")
    .lora(16, 32.0)
    .learning_rate(2e-4)
    .epochs(3)
    .output("./output")
    .run()
    .await?;

// DPO preference optimization
let result = easy::dpo("Qwen/Qwen3-0.6B", "preferences.jsonl")
    .dpo_beta(0.1)
    .reference_model("Qwen/Qwen3-0.6B")
    .run()
    .await?;

// Inference
let output = easy::infer("Qwen/Qwen3-0.6B")
    .temperature(0.7)
    .lora("./output/lora_weights.safetensors")
    .generate("What is 2+2?")
    .await?;

// Streaming inference
easy::infer("Qwen/Qwen3-0.6B")
    .generate_streaming("Tell me a story", |delta| {
        print!("{delta}");
        true // return false to stop early
    })
    .await?;
```
Available builders: `easy::finetune()`, `easy::dpo()`, `easy::simpo()`, `easy::orpo()`, `easy::kto()`, `easy::infer()`.

For lower-level control, use the crates directly — `pmetal-trainer::TrainingLoop`, `pmetal-models::DynamicModel`, `pmetal-lora::DynamicLoraModel`, `pmetal-distill::Distiller`, etc. See the `examples/` directory for complete working examples, including manual training loop orchestration and ANE-specific workflows.
### Python SDK
PMetal exposes a Python extension module via PyO3. Install with `maturin develop` from `crates/pmetal-py`.
#### Quick Start (Easy API)
```python
import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}, Steps: {result['total_steps']}")

# Inference
text = pmetal.infer("Qwen/Qwen3-0.6B", "What is 2+2?")
print(text)

# Inference with LoRA adapter
text = pmetal.infer(
    "Qwen/Qwen3-0.6B",
    "Explain quantum entanglement",
    lora="./output/lora_weights.safetensors",
)
```
#### Full Control
```python
import pmetal

# Configure training components
lora_config = pmetal.LoraConfig(r=16, alpha=32.0)
training_config = pmetal.TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    batch_size=4,
    max_seq_len=2048,
)

# Create trainer
trainer = pmetal.Trainer(
    model_id="Qwen/Qwen3-0.6B",
    lora_config=lora_config,
    training_config=training_config,
    dataset_path="train.jsonl",
)
trainer.add_callback(pmetal.ProgressCallback())
result = trainer.train()

# Load model for inference
model = pmetal.Model.load("Qwen/Qwen3-0.6B")
print(model.generate("Hello world", temperature=0.7))
```
## Installation
Prebuilt signed binaries are available on the Releases page.
Crates are available on crates.io.
Build from source:
```bash
git clone https://github.com/epistates/pmetal.git && cd pmetal
cargo build --release                                  # CLI + TUI
cd crates/pmetal-gui && bun install && bun tauri build # GUI (optional)
```
## Hardware Support
PMetal automatically detects Apple Silicon capabilities at startup and tunes kernel parameters accordingly.
| Chip Family | GPU Family | NAX | ANE | UltraFusion | Status |
|-------------|-----------|-----|-----|-------------|--------|
| M1 / Pro / Max / Ultra | Apple7 | - | 16 cores | Ultra: 2-die | Fully supported |
| M2 / Pro / Max / Ultra | Apple8 | - | 16 cores | Ultra: 2-die | Fully supported |
| M3 / Pro / Max / Ultra | Apple9 | - | 16 cores | Ultra: 2-die | Fully supported |
| M4 / Pro / Max / Ultra | Apple9 | - | 16 cores | Ultra: 2-die | Fully supported |
| M5 / Pro / Max / Ultra | Apple10 | Yes | 16 cores | Ultra: 2-die | Fully supported |
Auto-detected features: GPU family, device tier, core counts, memory bandwidth, dynamic caching, mesh shaders, NAX (M5+), UltraFusion topology (via `sysctl hw.packages`), ANE availability.

Tier-based kernel tuning: matrix tile sizes, FlashAttention block sizes, fused kernel threadgroup sizes, and batch multipliers are automatically selected based on device tier (Base/Pro/Max/Ultra) and GPU family. See `docs/hardware-support.md` for the full tuning matrix.
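Conceptually, tier-based tuning is a lookup from (device tier, GPU family) to kernel launch parameters, with a conservative fallback for unknown hardware. The values below are purely hypothetical, invented for illustration; the real numbers live in the tuning matrix documented in `docs/hardware-support.md`:

```python
# Hypothetical illustration only: these tile sizes are made up, not PMetal's.
TILE_SIZES = {
    ("Base", "Apple9"): (32, 32),
    ("Pro", "Apple9"): (64, 32),
    ("Max", "Apple9"): (64, 64),
    ("Ultra", "Apple9"): (128, 64),
}

def matmul_tile(tier, gpu_family, default=(32, 32)):
    """Fall back to a conservative tile when the combination is unknown."""
    return TILE_SIZES.get((tier, gpu_family), default)
```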
## Architecture
PMetal is organized as a Rust workspace with 18 specialized crates:
pmetal/
├── pmetal-core # Foundation: configs, traits, types, error handling
├── pmetal-metal # Custom Metal GPU k
