38 skills found · Page 1 of 2
open-compress / Claw Compactor: 14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
diegosouzapw / OmniRoute: An OpenAI-compatible AI gateway for multi-provider LLMs with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for reliable, cost-aware inference.
thushan / Olla: High-performance, lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover, and unified model discovery across local and remote inference backends.
greynewell / Infermux: Route inference across LLM providers. Track cost per request.
T-Sunm / Rag Ops: This project applies the core knowledge from the LLMOps module, including the design and implementation of the API Layer, Inference Layer, Observability Layer, Cache Layer, Guardrails Layer, Routing Layer, and the Data Ingestion Pipeline.
ZhenweiAn / Dynamic MoE: Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models".
JakeFenley / Koa Zod Router: Build typesafe routes for Koa with ease. Uses TypeScript, Zod, and Koa-Router to provide an easy solution for I/O validation and type inference.
jrf0110 / 8track: A service-worker router with async middleware and neato type inference, inspired by Koa.
nhevers / Mica Plugin: Claude Code plugin that routes compute through MVM nodes on cheap renewable energy. Save tokens, cut inference costs.
shahghasiadil / Laravel Bruno Generator: Generate Bruno API collections from Laravel routes with automatic request body inference and environment support.
pmh / Funkyweb: A Clojure web framework with route inference.
pmerolla / Fomoe: Fast Opportunistic Mixture-of-Experts. From-scratch C/HIP MoE inference with multi-tier caching and cache-aware routing. First ever example of running Qwen3.5-397B at 5–9 tok/s on a $2,100 desktop.
bug-ops / Zeph: Rust AI agent where every context token earns its place. Self-learning skills, temporal graph memory, cascade quality routing, OWASP AI security. Hybrid inference: Ollama · Claude · Gemini · OpenAI · GGUF. MCP + ACP. One binary.
lingticio / Llmg: 🧘 Extensive LLM endpoints, extended capabilities through your favorite protocols: 🕸️ GraphQL, ↔️ gRPC, ♾️ WebSocket. Extended SOTA support for structured data, function calling, instruction mapping, load balancing, grouping, intelli-routing. Advanced tracing and inference tracking.
aiming-lab / CITER: [COLM'25] CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing.
huanyuhello / Awesome Dynamic Inference: A list of dynamic inference research, including dynamic routing, anytime inference, and conditional computation.
ialacol / Text Inference Batcher: A high-performance batching router that optimises maximum throughput for text inference workloads.
www-norma-dev / IONOS Simple Chatbot: IONOS AI Chatbot is a starter pack built around a core ReAct agent, with a FastAPI backend and Streamlit frontend for building intelligent conversational AI. It connects to IONOS Hub for inference models and IONOS Studio for fine-tuned models, and supports real-time web search, model routing, and tool calling. It’s your European go-to solution for Infra
olwal / Scope AI Language: Generative AI plugins for language-driven, real-time video inference and generation. Ollama VLM/LLM pipelines and UDP prompt routing, built on shared libraries for communication and AI services (scope-bus, scope-language).
expresso / Router: Express router with automatic type inference, validation, and OpenAPI documentation generation.