OpenCLI

<p align="center"> <img src="assets/image/favicon.svg" alt="OpenCLI logo" width="128" height="128"> </p>

Pipe AI Models to your Terminal. Give Your Agents Hands and Eyes.

OpenCLI is the native Swift/MLX capability engine for the command line. Convert local models into modular Agent Skills. High performance, zero Python, 100% private. Optimized for OpenClaw and MCP.

An agent without sensors is just a chatbox. OpenCLI provides the physical layer for local AI. Built natively with Swift for Apple Silicon, it delivers the cold-start speed and modality support that server-side LLM runners lack.

  • Native OpenClaw & MCP Support
  • Unified Memory Hardware Sensing
  • Zero Python dependencies at runtime

Quick Install (macOS)

brew tap openclirun/opencli
brew install opencli

(Or build from source using Swift Package Manager)


Know Your Hardware, Run Right-Sized Models

OpenCLI features a built-in `fit` command to instantly evaluate your hardware (RAM/unified memory) and score models on fit, speed, and context limits.

$ opencli fit

Device: Apple M2 | total 16.0 GB | available 4.6 GB | model budget 3.9 GB
GPU: Apple M2 | backend: metal | unified_memory: true

Recommendations by task:
- [asr] Qwen3-ASR 1.7B 4bit | 🟡 Good | score 86.5 | GPU
- [chat] Qwen3 Chat 1.7B 4bit | 🟠 Marginal | score 82.5 | GPU
- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [i2i] Qwen Image Edit 2511 | 🔴 TooTight | score 58.9 | CPU+GPU
- [i2t] Qwen3 VL 4B Instruct 3bit | 🟠 Marginal | score 83.9 | GPU
- [i2v] LTX-2 Distilled (I2V) | 🔴 TooTight | score 59.7 | CPU+GPU
- [ocr] DeepSeek OCR | 🟠 Marginal | score 76.0 | GPU
- [rerank] Qwen3 Reranker 0.6B 4bit | 🟢 Perfect | score 71.9 | GPU
- [sr] SeedVR2 3B | 🟠 Marginal | score 77.9 | GPU
- [sts] LFM2.5 Audio 1.5B 6bit | 🟡 Good | score 84.7 | GPU
- [t2i] Qwen Image 2512 | 🔴 TooTight | score 58.7 | CPU+GPU
- [t2m] ACE-Step 1.5 | 🔴 TooTight | score 57.0 | CPU+GPU
- [t2v] LTX-2 Distilled (T2V) | 🔴 TooTight | score 60.3 | CPU+GPU
- [tts] Orpheus 3B 0.1 FT bf16 | 🟠 Marginal | score 85.3 | GPU
- [vad] Sortformer 4SPK v2.1 fp16 | 🟢 Perfect | score 68.3 | GPU
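The report is plain, line-oriented text, so it composes with standard Unix tools. A minimal sketch of filtering it for comfortable fits (a few sample lines from the report above stand in for a live `opencli fit` run):

```shell
# Sample lines from the fit report above stand in for a live
# `opencli fit` invocation.
report='- [asr] Qwen3-ASR 1.7B 4bit | 🟡 Good | score 86.5 | GPU
- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [rerank] Qwen3 Reranker 0.6B 4bit | 🟢 Perfect | score 71.9 | GPU'

# Keep only the tasks whose recommended model fits comfortably.
printf '%s\n' "$report" | grep 'Perfect'
```

In a live session the same filter is simply `opencli fit | grep 'Perfect'`.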

Capabilities & Local Models

OpenCLI focuses on running right-sized, hardware-optimized models that fit perfectly in your Mac's unified memory, bringing true multimodal capabilities directly to your terminal.

👁️ Vision (OCR, VLM, Embeddings)

See everything locally. From structured documents to real-time screen analysis for autonomous agents.

  • Qwen3-VL 4B (Instruct 3bit): A fast, highly capable small multimodal vision model.
  • DeepSeek OCR / GLM-OCR: Lightning-fast, accurate local text extraction.
  • Qwen3 Embedding & Reranker (0.6B 4bit): Ultra-efficient models, a perfect fit for local semantic search.
  • SeedVR2 3B: A spatial-understanding and super-resolution model.

🎙️ Audio (ASR, TTS, VAD, STS)

Hear and speak natively. Ultra-low latency voice perception and multi-speaker cloned synthesis.

  • Qwen3-ASR (1.7B 4bit) / Parakeet: Native speech-to-text with exceptional speed.
  • Orpheus (3B bf16) / Qwen3-TTS / Pocket TTS: Lightweight, low-latency text-to-speech perfect for instant agent responses.
  • LFM2.5 Audio (1.5B 6bit): Direct Speech-to-Speech (STS) handling.
  • Sortformer (4SPK v2.1 fp16): Perfect-fit Voice Activity Detection (VAD) and speaker diarization.

🪄 Generator (Image, Video, Audio)

Create across dimensions. High-performance local generation for visual assets and 3D meshes.

  • Flux.2 (Klein 4B): Pure Swift implementation of Flux.2 image generation. On-the-fly quantization (qint8/int4) ensures it runs efficiently on standard M-series Macs.
  • Qwen Image 2512 & Image Edit: Advanced Image-to-Image (I2I) and Text-to-Image (T2I) generation.
  • LTX-2 Distilled: Video generation bridging Text-to-Video (T2V) and Image-to-Video (I2V).
  • ACE-Step 1.5: Advanced Text-to-Music/Audio generation.

🧠 LLM (Chat & Coding)

Think and build locally. Private reasoning, instruction following, and coding capabilities optimized for MLX.

  • Qwen3-Instruct (1.7B 4bit): Highly capable reasoning and coding models optimized for Apple Silicon.
  • Llama-Series: Built-in support for standard instruct and chat architectures.

Workflow Examples

Combine OpenCLI commands to build instant multimodal workflows using standard Unix pipes:

# A complete Voice-to-Voice pipeline in one line
opencli asr | opencli chat | opencli tts
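Each stage in that pipeline reads stdin and writes stdout, which is why plain Unix tools can slot in anywhere along the chain. A toy sketch of the same pipe shape, with `tr` and `sed` standing in for the model stages:

```shell
# `tr` and `sed` stand in for the asr/chat stages here; only the
# stdin-to-stdout pipe shape is being illustrated.
echo 'turn on the lights' |
  tr '[:lower:]' '[:upper:]' |
  sed 's/^/AGENT: /'
# -> AGENT: TURN ON THE LIGHTS
```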

Community & Docs

  • Website: opencli.run
  • Documentation: See the docs/ folder for specific model usage (e.g., asr-qwen3.md, t2i-flux2.md).

License

This project is licensed under the MIT License.
