OpenCLI

<p align="center"> <img src="assets/image/favicon.svg" alt="OpenCLI logo" width="128" height="128"> </p>

Pipe AI Models to your Terminal. Give Your Agents Hands and Eyes.

OpenCLI is the native Swift/MLX capability engine for the command line. Convert local models into modular Agent Skills. High performance, zero Python, 100% private. Optimized for OpenClaw and MCP.

An agent without sensors is just a chatbox. OpenCLI provides the physical layer for local AI. Built natively with Swift for Apple Silicon, it delivers the cold-start speed and modality support that server-side LLM runners lack.

  • Native OpenClaw & MCP Support
  • Unified Memory Hardware Sensing
  • Zero Python dependencies at runtime

Quick Install (macOS)

brew tap openclirun/opencli
brew install opencli

(Or build from source using Swift Package Manager)


Know Your Hardware, Run Right-Sized Models

OpenCLI features a built-in `fit` command to instantly evaluate your hardware (RAM/unified memory) and score models on fit, speed, and context limits.

$ opencli fit

Device: Apple M2 | total 16.0 GB | available 4.6 GB | model budget 3.9 GB
GPU: Apple M2 | backend: metal | unified_memory: true

Recommendations by task:
- [asr] Qwen3-ASR 1.7B 4bit | 🟡 Good | score 86.5 | GPU
- [chat] Qwen3 Chat 1.7B 4bit | 🟠 Marginal | score 82.5 | GPU
- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [i2i] Qwen Image Edit 2511 | 🔴 TooTight | score 58.9 | CPU+GPU
- [i2t] Qwen3 VL 4B Instruct 3bit | 🟠 Marginal | score 83.9 | GPU
- [i2v] LTX-2 Distilled (I2V) | 🔴 TooTight | score 59.7 | CPU+GPU
- [ocr] DeepSeek OCR | 🟠 Marginal | score 76.0 | GPU
- [rerank] Qwen3 Reranker 0.6B 4bit | 🟢 Perfect | score 71.9 | GPU
- [sr] SeedVR2 3B | 🟠 Marginal | score 77.9 | GPU
- [sts] LFM2.5 Audio 1.5B 6bit | 🟡 Good | score 84.7 | GPU
- [t2i] Qwen Image 2512 | 🔴 TooTight | score 58.7 | CPU+GPU
- [t2m] ACE-Step 1.5 | 🔴 TooTight | score 57.0 | CPU+GPU
- [t2v] LTX-2 Distilled (T2V) | 🔴 TooTight | score 60.3 | CPU+GPU
- [tts] Orpheus 3B 0.1 FT bf16 | 🟠 Marginal | score 85.3 | GPU
- [vad] Sortformer 4SPK v2.1 fp16 | 🟢 Perfect | score 68.3 | GPU
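The report is plain, line-oriented text, so it composes with standard Unix tools. A minimal sketch of filtering it for comfortable fits (a few sample lines from the report above stand in for a live `opencli fit` run):

```shell
# Sample lines from the fit report above stand in for a live
# `opencli fit` invocation.
report='- [asr] Qwen3-ASR 1.7B 4bit | 🟡 Good | score 86.5 | GPU
- [embedding] Qwen3 Embedding 0.6B 4bit DWQ | 🟢 Perfect | score 74.3 | GPU
- [rerank] Qwen3 Reranker 0.6B 4bit | 🟢 Perfect | score 71.9 | GPU'

# Keep only the tasks whose recommended model fits comfortably.
printf '%s\n' "$report" | grep 'Perfect'
```

In a live session the same filter is simply `opencli fit | grep 'Perfect'`.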

Capabilities & Local Models

OpenCLI focuses on running right-sized, hardware-optimized models that fit perfectly in your Mac's unified memory, bringing true multimodal capabilities directly to your terminal.

👁️ Vision (OCR, VLM, Embeddings)

See everything locally. From structured documents to real-time screen analysis for autonomous agents.

  • Qwen3-VL 4B (Instruct 3bit): A fast, highly capable small multimodal vision model.
  • DeepSeek OCR / GLM-OCR: Lightning-fast, accurate local text extraction.
  • Qwen3 Embedding & Reranker (0.6B 4bit): Ultra-efficient models, a perfect fit for local semantic search.
  • SeedVR2 3B: A spatial-understanding and super-resolution model.

🎙️ Audio (ASR, TTS, VAD, STS)

Hear and speak natively. Ultra-low latency voice perception and multi-speaker cloned synthesis.

  • Qwen3-ASR (1.7B 4bit) / Parakeet: Native speech-to-text with exceptional speed.
  • Orpheus (3B bf16) / Qwen3-TTS / Pocket TTS: Lightweight, low-latency text-to-speech perfect for instant agent responses.
  • LFM2.5 Audio (1.5B 6bit): Direct Speech-to-Speech (STS) handling.
  • Sortformer (4SPK v2.1 fp16): Perfect-fit Voice Activity Detection (VAD) and speaker diarization.

🪄 Generator (Image, Video, Audio)

Create across dimensions. High-performance local generation for visual assets and 3D meshes.

  • Flux.2 (Klein 4B): Pure Swift implementation of Flux.2 image generation. On-the-fly quantization (qint8/int4) ensures it runs efficiently on standard M-series Macs.
  • Qwen Image 2512 & Image Edit: Advanced Image-to-Image (I2I) and Text-to-Image (T2I) generation.
  • LTX-2 Distilled: Video generation bridging Text-to-Video (T2V) and Image-to-Video (I2V).
  • ACE-Step 1.5: Advanced Text-to-Music/Audio generation.

🧠 LLM (Chat & Coding)

Think and build locally. Private reasoning, instruction following, and coding capabilities optimized for MLX.

  • Qwen3-Instruct (1.7B 4bit): Highly capable reasoning and coding models optimized for Apple Silicon.
  • Llama-Series: Built-in support for standard instruct and chat architectures.

Workflow Examples

Combine OpenCLI commands to build instant multimodal workflows using standard Unix pipes:

# A complete Voice-to-Voice pipeline in one line
opencli asr | opencli chat | opencli tts
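Each stage in that pipeline reads stdin and writes stdout, which is why plain Unix tools can slot in anywhere along the chain. A toy sketch of the same pipe shape, with `tr` and `sed` standing in for the model stages:

```shell
# `tr` and `sed` stand in for the asr/chat stages here; only the
# stdin-to-stdout pipe shape is being illustrated.
echo 'turn on the lights' |
  tr '[:lower:]' '[:upper:]' |
  sed 's/^/AGENT: /'
# -> AGENT: TURN ON THE LIGHTS
```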

Community & Docs

  • Website: opencli.run
  • Documentation: See the docs/ folder for specific model usage (e.g., asr-qwen3.md, t2i-flux2.md).

License

This project is licensed under the MIT License.
