vllm-project / vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
ai-dynamo / Dynamo: A datacenter-scale distributed inference serving framework
skyzh / Tiny LLM: A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
ModelTC / LightLLM: LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
containers / RamaLama: RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
awslabs / Multi Model Server: Multi Model Server is a tool for serving neural net models for inference
SeldonIO / MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving, and more
EricLBuehler / Candle vLLM: An efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server (see the client sketch after this list).
NLPOptimize / Flash Tokenizer: An efficient and optimized tokenizer engine for LLM inference serving
hpcaitech / SwiftInfer: Efficient AI inference & serving
cuckoo-network / Cuckoo: Cuckoo is a decentralized AI model-serving platform, starting with GPU sharing for text-to-image generation and LLM inference.
aws / SageMaker Containers: WARNING: This package has been deprecated. Please use the SageMaker Training Toolkit for model training and the SageMaker Inference Toolkit for model serving.
aws / SageMaker PyTorch Inference Toolkit: Toolkit for inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
psmarter / Mini Infer: An LLM inference engine built from scratch: paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graphs, tensor parallelism, MoE expert parallelism, and OpenAI-compatible serving (a toy paged-KV-cache sketch follows this list).
anyscale / E2E LLM Workflows: Fine-tune an LLM to perform batch inference and online serving.
anyscale / Multimodal AI: Multimodal AI workloads: batch inference, model training, and online serving.
SJTU-IPADS / REEF: REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
yuanmu97 / Secure Transformer Inference: [NDSS 2026] Secure Transformer Inference is a protocol for serving Transformer-based models securely.
stanford-mast / INFaaS: Model-less inference serving
aerlabsAI / AI Inference Resources: A curated collection of AI inference engineering resources covering LLM serving, GPU kernels, quantization, distributed inference, and production deployment. Compiled from the AER Labs community.
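Several entries above (vLLM, Candle vLLM, Mini Infer) advertise an OpenAI-compatible API, which means the stock `openai` Python client can drive them by pointing its base URL at the local server. A minimal sketch, assuming a server already running on localhost port 8000 (vLLM's default) and a placeholder model id; both values are assumptions to adjust for your setup:

```python
from openai import OpenAI

# Point the official OpenAI client at a local OpenAI-compatible server.
# base_url and model below are assumptions: match them to the host, port,
# and model your serving engine actually loaded.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g. `vllm serve` listens on 8000 by default
    api_key="EMPTY",  # local servers usually ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # hypothetical model id
    messages=[{"role": "user", "content": "In one sentence, what is continuous batching?"}],
)
print(response.choices[0].message.content)
```

Because the request shape is the standard Chat Completions schema, the same snippet works unchanged against any of the OpenAI-compatible servers in this list.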
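Mini Infer's description names the core vLLM-style serving techniques. As a rough illustration of the first one, a paged KV cache splits the cache into fixed-size blocks and maps each sequence's logical token positions onto whatever physical blocks happen to be free, so memory is claimed on demand instead of reserved for a worst-case length. The toy block table below is illustrative only; names and sizes are invented, not taken from any repo above.

```python
# Toy paged-KV-cache block table (illustrative; not the implementation of
# vLLM, Mini Infer, or any other repo listed above).

BLOCK_SIZE = 16  # tokens per physical block (hypothetical choice)


class BlockTable:
    """Maps each sequence's logical token positions to physical cache blocks."""

    def __init__(self, num_physical_blocks: int) -> None:
        self.free_blocks = list(range(num_physical_blocks))
        self.tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def append_token(self, seq_id: int, position: int) -> tuple[int, int]:
        """Return (physical_block, offset) for a new token, allocating lazily."""
        table = self.tables.setdefault(seq_id, [])
        block_idx, offset = divmod(position, BLOCK_SIZE)
        if block_idx == len(table):  # crossed into a new logical block
            if not self.free_blocks:
                raise MemoryError("pool exhausted; a real engine would preempt or swap")
            table.append(self.free_blocks.pop())
        return table[block_idx], offset

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))


# Two sequences share one physical pool; blocks are handed out only as needed.
bt = BlockTable(num_physical_blocks=4)
for pos in range(20):      # sequence 0 grows past one block (20 > BLOCK_SIZE)
    bt.append_token(0, pos)
bt.append_token(1, 0)      # sequence 1 claims its own first block
bt.free(0)                 # finishing sequence 0 recycles its two blocks
```

The point of the indirection is block-granular allocation: short and long requests can be batched together without fragmenting GPU memory, which is what makes the continuous batching named in the same entry practical.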