# PMetal
Powdered Metal — An ML SDK, framework, and application suite for Apple Silicon, written in Rust.
PMetal is a complete machine learning platform for Apple Silicon — from low-level Metal GPU kernels and Apple Neural Engine integration to high-level training APIs, a terminal TUI, and a full desktop GUI. Ship fine-tuned models without leaving the Apple ecosystem.
## Use PMetal Your Way
### Desktop GUI
<img src="public/pmetal_gui.png" alt="pmetal screenshot showing GUI" style="width: 100%; max-width: 100%; margin: 20px 0;"/>

A full Tauri + Svelte desktop application for visual model management, training, and inference.
```bash
cd crates/pmetal-gui
bun install && bun tauri dev
```
10 pages: Dashboard, Models, Datasets, Training, Distillation, GRPO, Inference, Merging, Quantize, and Settings. Download models from HuggingFace, configure LoRA training with live loss metrics, chat with models, merge weights, and quantize — all from the GUI. Training runs in-process with real-time progress updates.
### Terminal TUI
<img src="public/pmetal_tui.png" alt="pmetal screenshot showing TUI" style="width: 100%; max-width: 100%; margin: 20px 0;"/>

A full-featured terminal control center with 9 tabs.
```bash
pmetal tui
```
| Tab | Description |
|-----|-------------|
| Dashboard | Live loss curves (braille), LR schedule, throughput sparklines, timing breakdown gauges |
| Device | GPU/ANE info, Metal feature detection, memory gauge, kernel tuning, UltraFusion topology |
| Models | Browse cached models, HuggingFace Hub search (S), memory fit estimation, download |
| Datasets | Scan and preview local datasets (JSONL, Parquet, CSV) with line counts |
| Training | Configure and launch SFT/LoRA/QLoRA training runs with sectioned parameter forms |
| Distillation | Configure knowledge distillation (online, offline, progressive, cross-vocab) |
| GRPO | Configure GRPO/DAPO reasoning training with reward functions and sampling params |
| Inference | Interactive chat interface with markdown rendering and generation settings sidebar |
| Jobs | Training run history with log viewer, status tracking, and metadata |
Keybindings: `Tab`/`Shift+Tab` to switch tabs, `Alt+1` through `Alt+9` for direct access, `L` to adjust the learning rate mid-run, `q` to quit.
### CLI
```bash
# LoRA fine-tuning with sequence packing (default)
pmetal train \
  --model Qwen/Qwen3-0.6B \
  --dataset train.jsonl \
  --output ./output \
  --lora-r 16 --batch-size 4 --learning-rate 2e-4
```
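Sequence packing (on by default above) concatenates multiple short training examples into a single `max-seq-len` window so compute is not wasted on padding tokens. A minimal first-fit sketch of the idea in Python, illustrative only rather than PMetal's actual packing algorithm:

```python
def pack_sequences(lengths, max_seq_len):
    """First-fit packing: place each example's token count into the
    first bin that still has room, opening a new bin otherwise."""
    bins, space = [], []
    for idx, n in enumerate(lengths):
        for b, free in enumerate(space):
            if n <= free:
                bins[b].append(idx)
                space[b] -= n
                break
        else:
            bins.append([idx])
            space.append(max_seq_len - n)
    return bins

# Five short examples fit in two 2048-token sequences instead of five
packed = pack_sequences([900, 700, 1500, 400, 300], max_seq_len=2048)
# packed == [[0, 1, 3], [2, 4]]
```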
```bash
# Inference with LoRA adapter
pmetal infer \
  --model Qwen/Qwen3-0.6B \
  --lora ./output/lora_weights.safetensors \
  --prompt "Explain quantum entanglement" \
  --chat --show-thinking
```
```bash
# Knowledge distillation
pmetal distill \
  --teacher Qwen/Qwen3-4B \
  --student Qwen/Qwen3.5-0.8B-Base \
  --dataset train.jsonl
```
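The classic soft-target distillation loss matches the student's temperature-softened output distribution to the teacher's. A small numpy sketch of that loss, shown for illustration rather than as PMetal's exact objective:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes consistent across T."""
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

teacher = np.array([[2.0, 0.0, -1.0]])
student = np.array([[1.5, 0.5, -1.0]])
loss = distill_loss(teacher, student)  # positive; zero when logits match
```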
```bash
# GRPO reasoning training
pmetal grpo \
  --model Qwen/Qwen3-0.6B \
  --dataset reasoning.jsonl \
  --reasoning-rewards
```
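GRPO dispenses with a learned value function: it samples a group of completions per prompt, scores each with the reward functions, and uses within-group standardized rewards as advantages. A sketch of that core computation, illustrative rather than PMetal's internals:

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards across the group of
    completions sampled for one prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one prompt, scored by a reward function:
# the best gets a positive advantage, the worst a negative one
adv = group_advantages([1.0, 0.0, 0.5, 0.5])
```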
```bash
# HuggingFace model search with memory fit
pmetal search "qwen 0.6b" --detailed
```
```bash
# Merge models with SLERP
pmetal merge \
  --models model-a model-b \
  --method slerp --t 0.5
```
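SLERP interpolates along the great-circle arc between two weight vectors instead of the straight chord, which preserves weight norms better than plain averaging. A numpy sketch of the interpolation formula (not PMetal's implementation):

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation between two flattened weight tensors;
    falls back to linear interpolation when they are nearly parallel."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(na, nb), -1.0, 1.0))
    if omega < 1e-6:
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * a + np.sin(t * omega) / so * b

mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)  # midpoint on the unit arc
```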
```bash
# Quantize to GGUF
pmetal quantize \
  --model ./output \
  --output model.gguf --type q4km
```
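GGUF's 4-bit formats share one core idea: weights are grouped into fixed-size blocks, and each block stores low-bit integers plus a shared scale. The sketch below shows that idea in its simplest form; real formats such as Q4_K_M add super-blocks with per-sub-block scales and minimums, so treat this as an illustration only:

```python
import numpy as np

def quantize_blocks(weights, block_size=32):
    """Toy 4-bit block quantization: per-block absmax scale plus int4 values."""
    w = np.asarray(weights, dtype=float).reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map absmax to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q * scale

w = np.linspace(-1.0, 1.0, 64)
q, s = quantize_blocks(w)
max_err = np.abs(dequantize(q, s) - w.reshape(-1, 32)).max()
# max_err is bounded by half the block scale
```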
```bash
# Fuse LoRA into base model
pmetal fuse \
  --model Qwen/Qwen3-0.6B \
  --lora ./output/lora_weights.safetensors
```
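Fusing is simple linear algebra: the adapter's low-rank update is folded into the base weight so inference needs no extra matmuls. The standard LoRA merge rule, sketched with numpy as an illustration of the math rather than PMetal's code:

```python
import numpy as np

def fuse_lora(W, A, B, r, alpha):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A.
    A has shape (r, in_features); B has shape (out_features, r)."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((4, 8))
B = np.zeros((8, 4))  # B starts at zero, so an untrained adapter is a no-op
fused = fuse_lora(W, A, B, r=4, alpha=32.0)
# fused equals W exactly while B is zero
```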
```bash
# Evaluate perplexity
pmetal eval \
  --model Qwen/Qwen3-0.6B \
  --dataset eval.jsonl
```
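Perplexity is the exponential of the mean per-token negative log-likelihood, so a model that assigned probability 1/4 to every token would score exactly 4. A two-line illustration:

```python
import numpy as np

def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood."""
    return float(np.exp(np.mean(token_nlls)))

ppl = perplexity([np.log(4.0)] * 100)  # -> 4.0
```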
```bash
# Start OpenAI-compatible server (requires --features serve)
pmetal serve --model Qwen/Qwen3-0.6B --port 8080
```
#### All CLI Commands
| Command | Description |
|---------|-------------|
| train | Fine-tune with LoRA/QLoRA/DoRA (SFT) |
| infer | Interactive inference with chat, tool use, and thinking mode |
| distill | Knowledge distillation (online, offline, progressive) |
| grpo | GRPO/DAPO reasoning training (VLM, speculative, async rewards) |
| rlkd | Reinforcement Learning with Knowledge Distillation |
| embed-train | Sentence-transformer fine-tuning (InfoNCE, Triplet, CoSENT) |
| search | Search HuggingFace Hub with memory fit estimation |
| download | Download a model from HuggingFace Hub |
| merge | Merge two or more models (12 strategies) |
| quantize | GGUF quantization (13 format options) |
| fuse | Fuse LoRA adapter weights into base model |
| eval | Evaluate model perplexity on a dataset |
| serve | OpenAI-compatible inference server (feature-gated) |
| tui | Full TUI control center (9 tabs) |
| dashboard | Real-time training metrics visualization |
| dataset | Dataset utilities: analyze, download, convert |
| ollama | Ollama integration: modelfile, create, templates |
| info | Show device info (GPU, ANE, bandwidth, NAX) |
| memory | Show memory usage and available capacity |
| init | Generate a sample configuration file |
| bench | Benchmark training performance |
| bench-gen | Benchmark generation loop timing |
| bench-ffi | Benchmark FFI overhead |
### SDK
PMetal is an embeddable SDK — integrate training, inference, and model operations into your own Rust applications. The `easy` module provides high-level builders, while the underlying crates (`pmetal-trainer`, `pmetal-models`, `pmetal-lora`, etc.) offer full control over every pipeline stage.
```rust
use pmetal::easy;

// Fine-tune with LoRA
let result = easy::finetune("Qwen/Qwen3-0.6B", "train.jsonl")
    .lora(16, 32.0)
    .learning_rate(2e-4)
    .epochs(3)
    .output("./output")
    .run()
    .await?;

// DPO preference optimization
let result = easy::dpo("Qwen/Qwen3-0.6B", "preferences.jsonl")
    .dpo_beta(0.1)
    .reference_model("Qwen/Qwen3-0.6B")
    .run()
    .await?;

// Inference
let output = easy::infer("Qwen/Qwen3-0.6B")
    .temperature(0.7)
    .lora("./output/lora_weights.safetensors")
    .generate("What is 2+2?")
    .await?;

// Streaming inference
easy::infer("Qwen/Qwen3-0.6B")
    .generate_streaming("Tell me a story", |delta| {
        print!("{delta}");
        true // return false to stop early
    })
    .await?;
```
Available builders: `easy::finetune()`, `easy::dpo()`, `easy::simpo()`, `easy::orpo()`, `easy::kto()`, `easy::infer()`.

For lower-level control, use the crates directly — `pmetal-trainer::TrainingLoop`, `pmetal-models::DynamicModel`, `pmetal-lora::DynamicLoraModel`, `pmetal-distill::Distiller`, etc. See the `examples/` directory for complete working examples, including manual training loop orchestration and ANE-specific workflows.
### Python SDK
PMetal exposes a Python extension module via PyO3. Install with `maturin develop` from `crates/pmetal-py`.
#### Quick Start (Easy API)
```python
import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}, Steps: {result['total_steps']}")

# Inference
text = pmetal.infer("Qwen/Qwen3-0.6B", "What is 2+2?")
print(text)

# Inference with LoRA adapter
text = pmetal.infer(
    "Qwen/Qwen3-0.6B",
    "Explain quantum entanglement",
    lora="./output/lora_weights.safetensors",
)
```
#### Full Control
```python
import pmetal

# Configure training components
lora_config = pmetal.LoraConfig(r=16, alpha=32.0)
training_config = pmetal.TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    batch_size=4,
    max_seq_len=2048,
)

# Create trainer
trainer = pmetal.Trainer(
    model_id="Qwen/Qwen3-0.6B",
    lora_config=lora_config,
    training_config=training_config,
    dataset_path="train.jsonl",
)
trainer.add_callback(pmetal.ProgressCallback())
result = trainer.train()

# Load model for inference
model = pmetal.Model.load("Qwen/Qwen3-0.6B")
print(model.generate("Hello world", temperature=0.7))
```
## Installation
Prebuilt signed binaries are available on the Releases page.
Crates are available on crates.io.
Build from source:
```bash
git clone https://github.com/epistates/pmetal.git && cd pmetal
cargo build --release                                  # CLI + TUI
cd crates/pmetal-gui && bun install && bun tauri build # GUI (optional)
```
## Hardware Support
PMetal automatically detects Apple Silicon capabilities at startup and tunes kernel parameters accordingly.
| Chip Family | GPU Family | NAX | ANE | UltraFusion | Status |
|-------------|-----------|-----|-----|-------------|--------|
| M1 / Pro / Max / Ultra | Apple7 | - | 16 cores | Ultra: 2-die | Fully supported |
| M2 / Pro / Max / Ultra | Apple8 | - | 16 cores | Ultra: 2-die | Fully supported |
| M3 / Pro / Max / Ultra | Apple9 | - | 16 cores | Ultra: 2-die | Fully supported |
| M4 / Pro / Max / Ultra | Apple9 | - | 16 cores | Ultra: 2-die | Fully supported |
| M5 / Pro / Max / Ultra | Apple10 | Yes | 16 cores | Ultra: 2-die | Fully supported |
Auto-detected features: GPU family, device tier, core counts, memory bandwidth, dynamic caching, mesh shaders, NAX (M5+), UltraFusion topology (via `sysctl hw.packages`), ANE availability.

Tier-based kernel tuning: matrix tile sizes, FlashAttention block sizes, fused kernel threadgroup sizes, and batch multipliers are automatically selected based on device tier (Base/Pro/Max/Ultra) and GPU family. See `docs/hardware-support.md` for the full tuning matrix.
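Conceptually, tier-based tuning is a lookup from (device tier, GPU family) to kernel launch parameters, with a conservative fallback for unknown hardware. The values below are purely hypothetical, invented for illustration; the real numbers live in the tuning matrix documented in `docs/hardware-support.md`:

```python
# Hypothetical illustration only: these tile sizes are made up, not PMetal's.
TILE_SIZES = {
    ("Base", "Apple9"): (32, 32),
    ("Pro", "Apple9"): (64, 32),
    ("Max", "Apple9"): (64, 64),
    ("Ultra", "Apple9"): (128, 64),
}

def matmul_tile(tier, gpu_family, default=(32, 32)):
    """Fall back to a conservative tile when the combination is unknown."""
    return TILE_SIZES.get((tier, gpu_family), default)
```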
## Architecture
PMetal is organized as a Rust workspace with 18 specialized crates:
pmetal/
├── pmetal-core # Foundation: configs, traits, types, error handling
├── pmetal-metal # Custom Metal GPU k
