# Psionic
Rust ML stack
Psionic is a Rust-native ML and inference stack.
It owns the machine-facing execution substrate behind local inference, serving, training, distributed execution, artifact truth, and clustered compute. The project is broader than one app or one benchmark lane. It is the crate family that OpenAgents uses for inference, training, cluster bring-up, and execution evidence.
Psionic should be read hardware-first. It owns the admitted hardware strategy
for each lane: backend family, residency mode, topology, serving or training
role, and the capability, refusal, and evidence surfaces that higher layers
consume. Upstream systems such as llama.cpp, vLLM, SGLang, MLX, and
other reference repos are inputs for specific layers or hardware classes, not
the identity of the shipped Psionic stack.
The training side now also carries one bounded `gemma4:e4b` CUDA adapter-SFT
trainer above the shared adapter substrate: LM-head-only final-hidden-state
supervision, frozen-base semantics, typed export, exact checkpoint resume,
served-base plus tokenizer compatibility checks, and explicit refusal truth for
wider Gemma regions that remain out of scope. The same bounded lane now also
closes the first trainer-to-serving refresh seam: typed Gemma checkpoints plus
exported adapter artifacts can be revalidated into the live CUDA mesh lane
without a process restart, the active served revision is surfaced in response
provenance, stale or mismatched revisions fail closed, and operators can roll
back to the last known-good promoted revision. The same lane is now also
eval-first: it binds one canonical held-out eval pack, one four-split dataset
contract, one short baseline sweep against the untuned base, one overlap and
decontam gate, one canned promoted-checkpoint vibe-review packet, and one
promotion decision that refuses held-out regressions or failed operator review.
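The eval-first promotion contract above can be sketched as a single fail-closed decision. This is an illustrative sketch only: the type and function names (`EvalReport`, `Promotion`, `decide`) are hypothetical and not the shipped psionic-train API; it only shows the shape of a gate that refuses held-out regressions or failed operator review.

```rust
// Hypothetical sketch of the bounded lane's promotion gate; names and
// fields are illustrative, not the real psionic-train types.
#[derive(Debug)]
pub struct EvalReport {
    pub held_out_loss: f64,      // loss on the canonical held-out eval pack
    pub baseline_loss: f64,      // short baseline sweep against the untuned base
    pub decontam_passed: bool,   // overlap / decontamination gate
    pub operator_approved: bool, // canned promoted-checkpoint vibe review
}

#[derive(Debug, PartialEq)]
pub enum Promotion {
    Promote,
    Refuse(&'static str),
}

pub fn decide(r: &EvalReport) -> Promotion {
    if !r.decontam_passed {
        Promotion::Refuse("overlap/decontam gate failed")
    } else if r.held_out_loss > r.baseline_loss {
        Promotion::Refuse("held-out regression against the untuned base")
    } else if !r.operator_approved {
        Promotion::Refuse("operator review failed")
    } else {
        Promotion::Promote
    }
}

fn main() {
    let report = EvalReport {
        held_out_loss: 1.8,
        baseline_loss: 2.4,
        decontam_passed: true,
        operator_approved: true,
    };
    println!("{:?}", decide(&report));
}
```

The point of the shape is that every refusal path carries an explicit reason, matching the lane's explicit-refusal-truth posture.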
## Start Here
- System architecture: docs/ARCHITECTURE.md
- Detailed workspace map: docs/WORKSPACE_MAP.md
- Inference and serving: docs/INFERENCE_ENGINE.md
- Inference mesh ownership: docs/INFERENCE_MESH_OWNERSHIP.md
- Mesh lane service mode: docs/MESH_LANE_SERVICE_MODE.md
- Optimizer substrate: docs/OPTIMIZER_SUBSTRATE.md
- Forge-facing eval pack publication: docs/PSION_FORGE_EVAL_PACK_MANIFESTS.md
- Hermes user guide: docs/hermes/README.md
- Training system: docs/TRAIN_SYSTEM.md
- Repo-local library roadmap: docs/ROADMAP.md
- Psion learned-model program: docs/PSION_PROGRAM_MAP.md
- Psion actual-pretraining operator runbook: docs/PSION_ACTUAL_PRETRAINING_RUNBOOK.md
- Psion bounded reference-lane smoke runbook: docs/PSION_LOCAL_FIRST_TRAIN_RUNBOOK.md
## Main Tracks
- Inference and local serving
  - local GPT-OSS server and benchmark harness
  - generic OpenAI-compatible server surfaces
  - hardware validation and backend truth
  - bounded non-`GptOss` lanes including `qwen35`, the published dense `gemma4:e4b` CUDA lane, the sparse `gemma4:26b` topology-publication and refusal lane, and the optional dense Gemma 4 31B validation repeat that keeps the same family contract without widening the first claim
  - the Gemma image or video path now publishes as a processor-owned refusal lane instead of pretending the dense text surface can consume media URLs
  - the dense Gemma 4 `e2b` and `e4b` rows now also publish a separate processor-owned audio lane with explicit `input_audio` refusal until the real audio processor lands, while 31B and 26B still fail closed
  - the generic server now also publishes one first-class Gemma 4 Metal lane contract with `backend = metal`, `execution_mode = native`, and `fallback_policy = refuse`, and it returns an explicit refusal instead of silently falling back to CPU or CUDA until a real Metal decoder lands
  - the generic server, routed inventory, and mesh management status now also publish family-agnostic clustered execution truth so downstream consumers can tell whether a model is remote-proxied, replicated, split across machines, or running as a sparse distributed expert row without `gpt_oss`-specific heuristics
  - start with docs/GPT_OSS_LOCAL_SERVING.md
  - supporting docs: docs/NON_GPT_OSS_QWEN35_PILOT.md, docs/NON_GPT_OSS_GEMMA4_PILOT.md
- Hermes agent backend
  - use Psionic as a real Hermes backend over the OpenAI-compatible `chat.completions` path
  - start with docs/hermes/README.md
  - supporting docs: docs/HERMES_QWEN35_COMPATIBILITY.md, docs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md, docs/HERMES_BACKEND_BENCHMARK.md
- Parameter Golf and distributed training
  - single-H100, distributed `8xH100`, submission, evidence, and score-path work
  - start with docs/ROADMAP_PARAMETERGOLF.md
  - supporting docs: docs/PARAMETER_GOLF_SINGLE_H100_TRAINER.md, docs/PARAMETER_GOLF_DISTRIBUTED_8XH100.md, docs/PARAMETER_GOLF_RUNPOD_8XH100_RUNBOOK.md
- Cluster, swarm, and cross-provider compute
  - local mixed-hardware swarm, Google dual-node swarm, cross-provider training contracts
  - optional mesh coordination adjunct under `/psionic/management/coordination/*` for typed status, finding, question, tip, and done packets with TTL, visibility, provenance, search, and redaction semantics outside the inference critical path
  - expert-family GGUF admission now stays explicit: `psionic-models` can inspect non-`gpt-oss` expert artifacts, carry artifact identity plus expert-topology requirements, and refuse native execution with a machine-checkable topology-contract error instead of collapsing them into a generic unsupported-family bucket
  - `psionic-cluster` now also owns one native sparse expert-placement contract over explicit expert-host inventory, stable placement digests, typed refusal codes, and reusable sharded execution receipts instead of a sidecar-only MoE control plane; the first specialized lane is `gemma4:26b` with `64` experts, `4` active experts, `family_specific_placement`, and a truthful two-host partitioned planning policy
  - start with docs/ROADMAP_CLUSTER.md
  - supporting docs: docs/INFERENCE_MESH_OWNERSHIP.md, docs/MESH_LANE_SERVICE_MODE.md, docs/FIRST_SWARM_TRUSTED_LAN_RUNBOOK.md, docs/PSION_GOOGLE_TWO_NODE_SWARM_RUNBOOK.md, docs/TRAIN_ARTIFACT_STORAGE_REFERENCE.md
- Psion learned-model program
  - corpus, tokenizer, pretrain, trusted-cluster, and decentralized contribution work
  - start with docs/PSION_PROGRAM_MAP.md
  - supporting docs: docs/PSION_ACTUAL_PRETRAINING_RUNBOOK.md, docs/PSION_LOCAL_FIRST_TRAIN_RUNBOOK.md, docs/PSION_PRETRAIN_STAGE.md, docs/PSION_TRUSTED_CLUSTER_RUN.md, docs/PSION_DECENTRALIZED_CONTRIBUTION.md
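The family-agnostic clustered execution truth mentioned under the cluster track can be pictured as a small tagged union over the four residency shapes the inventory distinguishes. This is a hypothetical sketch: the enum, variant, and function names here are illustrative and are not the shipped `psionic-cluster` types.

```rust
// Illustrative sketch of the clustered-execution-truth surface; all names
// are hypothetical, not the real psionic-cluster API.
#[derive(Debug, PartialEq)]
pub enum ExecutionTruth {
    RemoteProxied { upstream: String },
    Replicated { replicas: usize },
    SplitAcrossMachines { hosts: Vec<String> },
    SparseExpertRow { experts: usize, active: usize },
}

pub fn summarize(t: &ExecutionTruth) -> String {
    match t {
        ExecutionTruth::RemoteProxied { upstream } => {
            format!("remote-proxied via {upstream}")
        }
        ExecutionTruth::Replicated { replicas } => {
            format!("replicated x{replicas}")
        }
        ExecutionTruth::SplitAcrossMachines { hosts } => {
            format!("split across {} machines", hosts.len())
        }
        ExecutionTruth::SparseExpertRow { experts, active } => {
            format!("sparse expert row: {active}/{experts} experts active")
        }
    }
}

fn main() {
    // e.g. the sparse gemma4:26b lane: 64 experts, 4 active per token
    let lane = ExecutionTruth::SparseExpertRow { experts: 64, active: 4 };
    println!("{}", summarize(&lane));
}
```

A downstream consumer that matches on such a type needs no `gpt_oss`-specific heuristics to tell residency modes apart, which is the point of the published contract.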
## Psion Training Shortcut
If you want the current top Psion training lane instead of guessing among benchmark-adjacent lanes, run:
```bash
./TRAIN
```
That command now targets the actual Psion pretraining lane and materializes the
retained launch, status, preflight, checkpoint, dashboard, alert, and closeout
surfaces under `~/scratch/psion_actual_pretraining_runs/<run_id>`.
Use:

```bash
./TRAIN --dry-run
./TRAIN resume --run-root <path>
./TRAIN status --run-root <path>
```

for plan inspection and operator follow-up on the actual lane.
The older bounded reference pilot still exists as the smoke/reference lane:

```bash
./TRAIN --lane reference_pilot --dry-run
./TRAIN --lane reference_pilot --mode local_reference
```
## Tassadar Training Shortcut
If you want the current default Tassadar training lane instead of guessing among older bounded benchmark lanes, run:
```bash
./TRAIN_TASSADAR
```
That command now means the bounded trace-bound article-transformer
weight-production lane that produces the retained
`tassadar-article-transformer-trace-bound-trained-v0` family under
`fixtures/tassadar/runs/tassadar_article_transformer_weight_production_v1`.
The lane contract lives in docs/TASSADAR_DEFAULT_TRAIN_LANE.md.
The operator launcher lives in docs/TASSADAR_TRAIN_LAUNCHER.md.
The bounded default-lane rehearsal lives in docs/TASSADAR_DEFAULT_TRAIN_REHEARSAL.md.
## Tassadar Executor Lane
Executor-class research and runtime work for exact computation starts with docs/ROADMAP_TASSADAR.md.
## Local GPT-OSS Inference
Psionic ships a dedicated local GPT-OSS server in
`crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`. It exposes:

- `GET /health`
- `GET /v1/models`
- `POST /v1/chat/completions`
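Since the chat endpoint follows the standard OpenAI-compatible shape, a request body can be sketched without any Psionic-specific client. This is a minimal stdlib-only illustration: the hand-rolled JSON, the model name, and the helper name are assumptions, not a real client; the actual model ID comes from `GET /v1/models`.

```rust
// Illustrative builder for the OpenAI-compatible body accepted on
// POST /v1/chat/completions. Hand-rolled JSON for illustration only;
// a real client would use an HTTP + JSON library.
fn chat_completion_body(model: &str, prompt: &str) -> String {
    // NOTE: no JSON escaping of `prompt` here; fine for this sketch only.
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"{prompt}"}}]}}"#
    )
}

fn main() {
    // "gpt-oss" is a placeholder model ID; query GET /v1/models for real ones.
    let body = chat_completion_body("gpt-oss", "Say hello.");
    println!("{body}");
    // POST this body with `Content-Type: application/json` to
    // http://<host>:<port>/v1/chat/completions once the server is running.
}
```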
Build it:

```bash
cargo build -p psionic-serve --bin psionic-gpt-oss-server --release
```
Run it on a Linux NVIDIA host:

```bash
./target/release/psionic-gpt-oss-server \
  -m /path/to/
