Parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
<a href="https://www.producthunt.com/products/parallax-by-gradient?embed=true&utm_source=badge-top-post-badge&utm_medium=badge&utm_source=badge-parallax-by-gradient" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/top-post-badge.svg?post_id=1030922&theme=light&period=daily&t=1761986433128" alt="Parallax by Gradient - Host LLMs across devices sharing GPU to make your AI go brrr | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>
Gradient | Blog | X (Gradient) | X (Parallax) | Discord | arXiv
News
- [2026/2] 🦞 Parallax now supports OpenClaw integration! See Docs
- [2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
- [2025/10] 🔥 Parallax version 0.0.1 has been released!
About
A fully decentralized inference engine developed by Gradient. Parallax lets you build your own AI cluster by sharding model inference across a set of distributed nodes, regardless of their hardware configuration and physical location. Its core features include:
- Host local LLMs on personal devices
- Cross-platform support
- Pipeline parallel model sharding (see the sketch after this list)
- Paged KV cache management & continuous batching for Mac
- Dynamic request scheduling and routing for high performance
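To make pipeline sharding concrete, here is a minimal illustrative sketch, not Parallax's actual scheduler (`Node` and `shard_layers` are hypothetical names), that assigns each node a contiguous block of a model's layers weighted by its available memory:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    memory_gb: float  # available accelerator / unified memory

def shard_layers(num_layers: int, nodes: list[Node]) -> dict[str, range]:
    """Assign each node a contiguous block of layers, proportional
    to its available memory (larger nodes host more layers)."""
    total_mem = sum(n.memory_gb for n in nodes)
    assignment, start = {}, 0
    for i, node in enumerate(nodes):
        if i == len(nodes) - 1:          # last node takes the remainder
            count = num_layers - start
        else:
            count = round(num_layers * node.memory_gb / total_mem)
        assignment[node.name] = range(start, start + count)
        start += count
    return assignment

# Example: a 32-layer model across a 24 GB desktop GPU and a 16 GB MacBook.
print(shard_layers(32, [Node("rtx4090", 24), Node("macbook", 16)]))
# -> {'rtx4090': range(0, 19), 'macbook': range(19, 32)}
```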
The backend architecture (a toy request-flow sketch follows the list):
- P2P communication powered by Lattica
- GPU backend powered by SGLang and vLLM
- Mac backend powered by MLX LM
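As a rough mental model of how these pieces fit together, the toy sketch below stubs out a two-stage pipeline: each stage computes its own layer range and hands activations to the next peer. In Parallax each hop is a Lattica P2P transfer and the per-stage compute runs on SGLang/vLLM (GPU) or MLX LM (Mac); everything here is simplified stand-in Python, not the real API:

```python
from typing import Callable

Stage = Callable[[list[float]], list[float]]

def make_stage(name: str, scale: float) -> Stage:
    def run(activations: list[float]) -> list[float]:
        print(f"{name}: running its local layer range")
        return [a * scale for a in activations]  # stand-in for transformer layers
    return run

def pipeline_infer(prompt_embedding: list[float], stages: list[Stage]) -> list[float]:
    x = prompt_embedding
    for stage in stages:   # each hand-off would be a P2P send in practice
        x = stage(x)
    return x               # final hidden state -> logits on the last node

stages = [make_stage("gpu-node", 2.0), make_stage("mac-node", 0.5)]
print(pipeline_infer([1.0, 2.0, 3.0], stages))
```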
User Guide
Contributing
We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.
Supported Models
| Model | Provider | HuggingFace Collection | Blog | Description |
|:------|:---------|:----------------------:|:----:|:------------|
| DeepSeek | DeepSeek | DeepSeek-V3.2<br>DeepSeek-R1 | Deep Seek AI Launches Revolutionary Language Model | Deep Seek AI is proud to announce the launch of our latest language model, setting new standards in natural language processing and understanding. This breakthrough represents a significant step forward in AI technology, offering unprecedented capabilities in text generation, comprehension, and analysis. |
| MiniMax-M2 | MiniMax AI | MiniMax-M2<br>MiniMax-M2.1 | MiniMax M2.1: Significantly Enhanced Multi-Language Programming | MiniMax-M2.1 is an enhanced sparse MoE model (230B parameters, 10B active) built for advanced coding and agentic workflows. It offers state-of-the-art intelligence, delivering efficient, reliable tool use and strong multi-step reasoning. |
| GLM | Z AI | GLM-4.7<br>GLM-4.7-Flash | GLM-4.7: Advancing the Coding Capability | GLM is an advanced large language model series from Z AI, including GLM-4.6 and GLM-4.7. These models feature long-context support, strong coding and reasoning performance, enhanced tool-use and agent integration, and competitive results across leading open-source benchmarks. |
| Kimi-K2 | Moonshot AI | Kimi-K2 | Kimi K2: Open Agentic Intelligence | Kimi-K2 is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Base, Kimi-K2-Instruct, and Kimi-K2-Thinking. Kimi K2 Thinking is a state-of-the-art open-source agentic model designed for deep, step-by-step reasoning and dynamic tool use. It features native INT4 quantization and a 256k context window for fast, memory-efficient inference. Uniquely stable in long-horizon tasks, Kimi K2 enables reliable autonomous workflows with consistent performance across hundreds of tool calls. |
| Qwen | Qwen | Qwen3-Next<br>Qwen3<br>Qwen2.5 | Qwen3-Next: Towards Ultimate Training & Inference Efficiency | The Qwen series is a family of large language models developed by Alibaba's Qwen team. It includes multiple generations such as Qwen2.5, Qwen3, and Qwen3-Next, which improve upon model architecture, efficiency, and capabilities. The models are available in various sizes and instruction-tuned versions, with support for cutting-edge features like long context and quantization. Suitable for a wide range of language tasks and open-source use cases. |
| gpt-oss | OpenAI | gpt-oss<br>gpt-oss-safeguard | Introducing gpt-oss-safeguard | gpt-oss are OpenAI's open-weight GPT models (20B & 120B). The gpt-oss-safeguard variants are reasoning-based safety classification models: developers provide their own policy at inference, and the model uses chain-of-thought to classify content and explain its reasoning. This allows flexible, policy-driven moderation in complex or evolving domains, with open weights under Apache 2.0. |
| Meta Llama 3 | Meta | Meta Llama 3<br>Llama 3.1<br>Llama 3.2<br>Llama 3.3 | Introducing Meta Llama 3: The most capable openly available LLM to date | Meta Llama 3 is Meta's third-generation Llama model, available in sizes such as 8B and 70B parameters. Includes instruction-tuned and quantized (e.g., FP8) variants. |
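To query a running cluster that serves one of the models above, something like the following should work, assuming the deployment exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the host and port below are placeholder assumptions, so substitute the address from your own setup (see the User Guide):

```python
import json
import urllib.request

# Assumptions: the cluster exposes an OpenAI-compatible
# /v1/chat/completions endpoint, and localhost:3001 matches your
# deployment -- replace both with the values from your own setup.
payload = {
    "model": "Qwen/Qwen3-0.6B",  # any model ID from the table above
    "messages": [{"role": "user", "content": "Hello from my Parallax cluster!"}],
}
req = urllib.request.Request(
    "http://localhost:3001/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```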
