# Psionic
Rust ML stack
Psionic is a Rust-native ML and inference stack.
It owns the machine-facing execution substrate behind local inference, serving, training, distributed execution, artifact truth, and clustered compute. The project is broader than one app or one benchmark lane. It is the crate family that OpenAgents uses for inference, training, cluster bring-up, and execution evidence.
Psionic should be read hardware-first. It owns the admitted hardware strategy
for each lane: backend family, residency mode, topology, serving or training
role, and the capability, refusal, and evidence surfaces that higher layers
consume. Upstream systems such as llama.cpp, vLLM, SGLang, MLX, and
other reference repos are inputs for specific layers or hardware classes, not
the identity of the shipped Psionic stack.
The training side now also carries one bounded `gemma4:e4b` CUDA adapter-SFT
trainer above the shared adapter substrate: LM-head-only final-hidden-state
supervision, frozen-base semantics, typed export, exact checkpoint resume,
served-base plus tokenizer compatibility checks, and explicit refusal truth for
wider Gemma regions that remain out of scope. The same bounded lane now also
closes the first trainer-to-serving refresh seam: typed Gemma checkpoints plus
exported adapter artifacts can be revalidated into the live CUDA mesh lane
without a process restart, the active served revision is surfaced in response
provenance, stale or mismatched revisions fail closed, and operators can roll
back to the last known-good promoted revision. The same lane is now also
eval-first: it binds one canonical held-out eval pack, one four-split dataset
contract, one short baseline sweep against the untuned base, one overlap and
decontam gate, one canned promoted-checkpoint vibe-review packet, and one
promotion decision that refuses held-out regressions or failed operator review.
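The eval-first promotion contract above can be sketched as a single fail-closed decision. This is an illustrative sketch only: the type and function names (`EvalReport`, `Promotion`, `decide`) are hypothetical and not the shipped psionic-train API; it only shows the shape of a gate that refuses held-out regressions or failed operator review.

```rust
// Hypothetical sketch of the bounded lane's promotion gate; names and
// fields are illustrative, not the real psionic-train types.
#[derive(Debug)]
pub struct EvalReport {
    pub held_out_loss: f64,      // loss on the canonical held-out eval pack
    pub baseline_loss: f64,      // short baseline sweep against the untuned base
    pub decontam_passed: bool,   // overlap / decontamination gate
    pub operator_approved: bool, // canned promoted-checkpoint vibe review
}

#[derive(Debug, PartialEq)]
pub enum Promotion {
    Promote,
    Refuse(&'static str),
}

pub fn decide(r: &EvalReport) -> Promotion {
    if !r.decontam_passed {
        Promotion::Refuse("overlap/decontam gate failed")
    } else if r.held_out_loss > r.baseline_loss {
        Promotion::Refuse("held-out regression against the untuned base")
    } else if !r.operator_approved {
        Promotion::Refuse("operator review failed")
    } else {
        Promotion::Promote
    }
}

fn main() {
    let report = EvalReport {
        held_out_loss: 1.8,
        baseline_loss: 2.4,
        decontam_passed: true,
        operator_approved: true,
    };
    println!("{:?}", decide(&report));
}
```

The point of the shape is that every refusal path carries an explicit reason, matching the lane's explicit-refusal-truth posture.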
## Start Here
- System architecture: docs/ARCHITECTURE.md
- Detailed workspace map: docs/WORKSPACE_MAP.md
- Inference and serving: docs/INFERENCE_ENGINE.md
- Inference mesh ownership: docs/INFERENCE_MESH_OWNERSHIP.md
- Mesh lane service mode: docs/MESH_LANE_SERVICE_MODE.md
- Optimizer substrate: docs/OPTIMIZER_SUBSTRATE.md
- Forge-facing eval pack publication: docs/PSION_FORGE_EVAL_PACK_MANIFESTS.md
- Hermes user guide: docs/hermes/README.md
- Training system: docs/TRAIN_SYSTEM.md
- Repo-local library roadmap: docs/ROADMAP.md
- Psion learned-model program: docs/PSION_PROGRAM_MAP.md
- Psion actual-pretraining operator runbook: docs/PSION_ACTUAL_PRETRAINING_RUNBOOK.md
- Psion bounded reference-lane smoke runbook: docs/PSION_LOCAL_FIRST_TRAIN_RUNBOOK.md
## Main Tracks
- Inference and local serving
  - local GPT-OSS server and benchmark harness
  - generic OpenAI-compatible server surfaces
  - hardware validation and backend truth
  - bounded non-`GptOss` lanes including `qwen35`, the published dense `gemma4:e4b` CUDA lane, the sparse `gemma4:26b` topology-publication and refusal lane, and the optional dense Gemma 4 31B validation repeat that keeps the same family contract without widening the first claim
  - the Gemma image or video path now publishes as a processor-owned refusal lane instead of pretending the dense text surface can consume media URLs
  - the dense Gemma 4 `e2b` and `e4b` rows now also publish a separate processor-owned audio lane with explicit `input_audio` refusal until the real audio processor lands, while 31B and 26B still fail closed
  - the generic server now also publishes one first-class Gemma 4 Metal lane contract with `backend = metal`, `execution_mode = native`, and `fallback_policy = refuse`, and it returns an explicit refusal instead of silently falling back to CPU or CUDA until a real Metal decoder lands
  - the generic server, routed inventory, and mesh management status now also publish family-agnostic clustered execution truth so downstream consumers can tell whether a model is remote-proxied, replicated, split across machines, or running as a sparse distributed expert row without `gpt_oss`-specific heuristics
  - start with docs/GPT_OSS_LOCAL_SERVING.md
  - supporting docs: docs/NON_GPT_OSS_QWEN35_PILOT.md, docs/NON_GPT_OSS_GEMMA4_PILOT.md
- Hermes agent backend
  - use Psionic as a real Hermes backend over the OpenAI-compatible `chat.completions` path
  - start with docs/hermes/README.md
  - supporting docs: docs/HERMES_QWEN35_COMPATIBILITY.md, docs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md, docs/HERMES_BACKEND_BENCHMARK.md
- Parameter Golf and distributed training
  - single-H100, distributed `8xH100`, submission, evidence, and score-path work
  - start with docs/ROADMAP_PARAMETERGOLF.md
  - supporting docs: docs/PARAMETER_GOLF_SINGLE_H100_TRAINER.md, docs/PARAMETER_GOLF_DISTRIBUTED_8XH100.md, docs/PARAMETER_GOLF_RUNPOD_8XH100_RUNBOOK.md
- Cluster, swarm, and cross-provider compute
  - local mixed-hardware swarm, Google dual-node swarm, cross-provider training contracts
  - optional mesh coordination adjunct under `/psionic/management/coordination/*` for typed status, finding, question, tip, and done packets with TTL, visibility, provenance, search, and redaction semantics outside the inference critical path
  - expert-family GGUF admission now stays explicit: `psionic-models` can inspect non-`gpt-oss` expert artifacts, carry artifact identity plus expert-topology requirements, and refuse native execution with a machine-checkable topology-contract error instead of collapsing them into a generic unsupported-family bucket
  - `psionic-cluster` now also owns one native sparse expert-placement contract over explicit expert-host inventory, stable placement digests, typed refusal codes, and reusable sharded execution receipts instead of a sidecar-only MoE control plane; the first specialized lane is `gemma4:26b` with `64` experts, `4` active experts, `family_specific_placement`, and a truthful two-host partitioned planning policy
  - start with docs/ROADMAP_CLUSTER.md
  - supporting docs: docs/INFERENCE_MESH_OWNERSHIP.md, docs/MESH_LANE_SERVICE_MODE.md, docs/FIRST_SWARM_TRUSTED_LAN_RUNBOOK.md, docs/PSION_GOOGLE_TWO_NODE_SWARM_RUNBOOK.md, docs/TRAIN_ARTIFACT_STORAGE_REFERENCE.md
- Psion learned-model program
  - corpus, tokenizer, pretrain, trusted-cluster, and decentralized contribution work
  - start with docs/PSION_PROGRAM_MAP.md
  - supporting docs: docs/PSION_ACTUAL_PRETRAINING_RUNBOOK.md, docs/PSION_LOCAL_FIRST_TRAIN_RUNBOOK.md, docs/PSION_PRETRAIN_STAGE.md, docs/PSION_TRUSTED_CLUSTER_RUN.md, docs/PSION_DECENTRALIZED_CONTRIBUTION.md
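The family-agnostic clustered execution truth mentioned under the cluster track can be pictured as a small tagged union over the four residency shapes the inventory distinguishes. This is a hypothetical sketch: the enum, variant, and function names here are illustrative and are not the shipped `psionic-cluster` types.

```rust
// Illustrative sketch of the clustered-execution-truth surface; all names
// are hypothetical, not the real psionic-cluster API.
#[derive(Debug, PartialEq)]
pub enum ExecutionTruth {
    RemoteProxied { upstream: String },
    Replicated { replicas: usize },
    SplitAcrossMachines { hosts: Vec<String> },
    SparseExpertRow { experts: usize, active: usize },
}

pub fn summarize(t: &ExecutionTruth) -> String {
    match t {
        ExecutionTruth::RemoteProxied { upstream } => {
            format!("remote-proxied via {upstream}")
        }
        ExecutionTruth::Replicated { replicas } => {
            format!("replicated x{replicas}")
        }
        ExecutionTruth::SplitAcrossMachines { hosts } => {
            format!("split across {} machines", hosts.len())
        }
        ExecutionTruth::SparseExpertRow { experts, active } => {
            format!("sparse expert row: {active}/{experts} experts active")
        }
    }
}

fn main() {
    // e.g. the sparse gemma4:26b lane: 64 experts, 4 active per token
    let lane = ExecutionTruth::SparseExpertRow { experts: 64, active: 4 };
    println!("{}", summarize(&lane));
}
```

A downstream consumer that matches on such a type needs no `gpt_oss`-specific heuristics to tell residency modes apart, which is the point of the published contract.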
## Psion Training Shortcut
If you want the current top Psion training lane instead of guessing among benchmark-adjacent lanes, run:
```bash
./TRAIN
```
That command now targets the actual Psion pretraining lane and materializes the
retained launch, status, preflight, checkpoint, dashboard, alert, and closeout
surfaces under `~/scratch/psion_actual_pretraining_runs/<run_id>`.
Use:

```bash
./TRAIN --dry-run
./TRAIN resume --run-root <path>
./TRAIN status --run-root <path>
```

for plan inspection and operator follow-up on the actual lane.
The older bounded reference pilot still exists as the smoke/reference lane:

```bash
./TRAIN --lane reference_pilot --dry-run
./TRAIN --lane reference_pilot --mode local_reference
```
## Tassadar Training Shortcut
If you want the current default Tassadar training lane instead of guessing among older bounded benchmark lanes, run:
```bash
./TRAIN_TASSADAR
```
That command now means the bounded trace-bound article-transformer
weight-production lane that produces the retained
`tassadar-article-transformer-trace-bound-trained-v0` family under
`fixtures/tassadar/runs/tassadar_article_transformer_weight_production_v1`.
The lane contract lives in docs/TASSADAR_DEFAULT_TRAIN_LANE.md.
The operator launcher lives in docs/TASSADAR_TRAIN_LAUNCHER.md.
The bounded default-lane rehearsal lives in docs/TASSADAR_DEFAULT_TRAIN_REHEARSAL.md.
## Tassadar Executor Lane
Executor-class research and runtime work for exact computation starts with docs/ROADMAP_TASSADAR.md.
## Local GPT-OSS Inference
Psionic ships a dedicated local GPT-OSS server in
`crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`. It exposes:

- `GET /health`
- `GET /v1/models`
- `POST /v1/chat/completions`
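Since the chat endpoint follows the standard OpenAI-compatible shape, a request body can be sketched without any Psionic-specific client. This is a minimal stdlib-only illustration: the hand-rolled JSON, the model name, and the helper name are assumptions, not a real client; the actual model ID comes from `GET /v1/models`.

```rust
// Illustrative builder for the OpenAI-compatible body accepted on
// POST /v1/chat/completions. Hand-rolled JSON for illustration only;
// a real client would use an HTTP + JSON library.
fn chat_completion_body(model: &str, prompt: &str) -> String {
    // NOTE: no JSON escaping of `prompt` here; fine for this sketch only.
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"{prompt}"}}]}}"#
    )
}

fn main() {
    // "gpt-oss" is a placeholder model ID; query GET /v1/models for real ones.
    let body = chat_completion_body("gpt-oss", "Say hello.");
    println!("{body}");
    // POST this body with `Content-Type: application/json` to
    // http://<host>:<port>/v1/chat/completions once the server is running.
}
```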
Build it:

```bash
cargo build -p psionic-serve --bin psionic-gpt-oss-server --release
```
Run it on a Linux NVIDIA host:

```bash
./target/release/psionic-gpt-oss-server \
  -m /path/to/
