OpenCLI
<p align="center"> <img src="assets/image/favicon.svg" alt="OpenCLI logo" width="128" height="128"> </p>

Pipe AI Models to your Terminal. Give Your Agents Hands and Eyes.
OpenCLI is the native Swift/MLX capability engine for the command line. Convert local models into modular Agent Skills. High performance, zero Python, 100% private. Optimized for OpenClaw and MCP.
An agent without sensors is just a chatbox. OpenCLI provides the physical layer for local AI. Built natively with Swift for Apple Silicon, it delivers the cold-start speed and modality support that server-side LLM runners lack.
- Native OpenClaw & MCP Support
- Unified Memory Hardware Sensing
- Zero Python dependencies at runtime
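Because OpenCLI speaks MCP, it can be registered as a local tool server by any MCP client. The snippet below is only a sketch of what that registration might look like: the `mcpServers` schema is the convention used by common MCP clients, and the `opencli mcp` subcommand shown is an assumption for illustration, not a command documented in this README.

```json
{
  "mcpServers": {
    "opencli": {
      "command": "opencli",
      "args": ["mcp"]
    }
  }
}
```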
Quick Install (macOS)
brew tap openclirun/opencli
brew install opencli
(Or build from source using Swift Package Manager)
Know Your Hardware, Run Right-Sized Models
OpenCLI includes a built-in `fit` command that instantly evaluates your hardware (RAM/unified memory) and scores models on fit, speed, and context limits.
$ opencli fit
Device: Apple M2 | total 16.0 GB | available 4.6 GB | model budget 3.9 GB
GPU: Apple M2 | backend: metal | unified_memory: true
Recommendations by task:
- [asr] Qwen3-ASR 1.7B 4bit | 🟡 Good | score 86.5 | GPU
- [chat] Qwen3 Chat 1.7B 4bit | 🟠 Marginal | score 82.5 | GPU
- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [i2i] Qwen Image Edit 2511 | 🔴 TooTight | score 58.9 | CPU+GPU
- [i2t] Qwen3 VL 4B Instruct 3bit | 🟠 Marginal | score 83.9 | GPU
- [i2v] LTX-2 Distilled (I2V) | 🔴 TooTight | score 59.7 | CPU+GPU
- [ocr] DeepSeek OCR | 🟠 Marginal | score 76.0 | GPU
- [rerank] Qwen3 Reranker 0.6B 4bit | 🟢 Perfect | score 71.9 | GPU
- [sr] SeedVR2 3B | 🟠 Marginal | score 77.9 | GPU
- [sts] LFM2.5 Audio 1.5B 6bit | 🟡 Good | score 84.7 | GPU
- [t2i] Qwen Image 2512 | 🔴 TooTight | score 58.7 | CPU+GPU
- [t2m] ACE-Step 1.5 | 🔴 TooTight | score 57.0 | CPU+GPU
- [t2v] LTX-2 Distilled (T2V) | 🔴 TooTight | score 60.3 | CPU+GPU
- [tts] Orpheus 3B 0.1 FT bf16 | 🟠 Marginal | score 85.3 | GPU
- [vad] Sortformer 4SPK v2.1 fp16 | 🟢 Perfect | score 68.3 | GPU
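Because the report is plain text, it composes with standard Unix tools. For instance, you could filter for models rated Perfect with `opencli fit | grep 'Perfect'`; in the sketch below, a few report lines are inlined so the snippet is self-contained and runs without OpenCLI installed.

```shell
# Filter a fit report for models rated "Perfect".
# On a real machine you would pipe the live report:  opencli fit | grep 'Perfect'
# Three sample report lines are inlined here so the snippet runs anywhere.
report='- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [i2t] Qwen3 VL 4B Instruct 3bit | 🟠 Marginal | score 83.9 | GPU
- [vad] Sortformer 4SPK v2.1 fp16 | 🟢 Perfect | score 68.3 | GPU'

# Keep only the Perfect-fit lines.
perfect=$(printf '%s\n' "$report" | grep 'Perfect')
printf '%s\n' "$perfect"
```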
Capabilities & Local Models
OpenCLI focuses on running right-sized, hardware-optimized models that fit perfectly in your Mac's unified memory, bringing true multimodal capabilities directly to your terminal.
👁️ Vision (OCR, VLM, Embeddings)
See everything locally. From structured documents to real-time screen analysis for autonomous agents.
- Qwen3-VL 4B (Instruct 3bit): A fast, highly capable small vision-language model.
- DeepSeek OCR / GLM-OCR: Lightning-fast, accurate local text extraction.
- Qwen3 Embedding & Reranker (0.6B 4bit): Ultra-efficient models, a perfect fit for local semantic search.
- SeedVR2 3B: Spatial understanding and super-resolution models.
🎙️ Audio (ASR, TTS, VAD, STS)
Hear and speak natively. Ultra-low latency voice perception and multi-speaker cloned synthesis.
- Qwen3-ASR (1.7B 4bit) / Parakeet: Native speech-to-text with exceptional speed.
- Orpheus (3B bf16) / Qwen3-TTS / Pocket TTS: Lightweight, low-latency text-to-speech perfect for instant agent responses.
- LFM2.5 Audio (1.5B 6bit): Direct Speech-to-Speech (STS) handling.
- Sortformer (4SPK v2.1 fp16): Perfect-fit Voice Activity Detection (VAD) and speaker diarization.
🪄 Generator (Image, Video, Audio)
Create across dimensions. High-performance local generation for visual assets and 3D meshes.
- Flux.2 (Klein 4B): Pure Swift implementation of Flux.2 image generation. On-the-fly quantization (qint8/int4) ensures it runs efficiently on standard M-series Macs.
- Qwen Image 2512 & Image Edit: Advanced Text-to-Image (T2I) and Image-to-Image (I2I) generation.
- LTX-2 Distilled: Video generation bridging Text-to-Video (T2V) and Image-to-Video (I2V).
- ACE-Step 1.5: Advanced Text-to-Music/Audio generation.
🧠 LLM (Chat & Coding)
Think and build locally. Private reasoning, instruction following, and coding capabilities optimized for MLX.
- Qwen3-Instruct (1.7B 4bit): Highly capable reasoning and coding models optimized for Apple Silicon.
- Llama-Series: Built-in support for standard instruct and chat architectures.
Workflow Examples
Combine OpenCLI commands to build instant multimodal workflows using standard Unix pipes:
# A complete Voice-to-Voice pipeline in one line
opencli asr | opencli chat | opencli tts
Community & Docs
- Website: opencli.run
- Documentation: See the `docs/` folder for model-specific usage (e.g., `asr-qwen3.md`, `t2i-flux2.md`).
License
This project is licensed under the MIT License.