15 skills found
npuichigo / Openai Trtllm: OpenAI-compatible API for the TensorRT-LLM Triton backend
NetEase-Media / Grps: Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is compatible with both Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.
NetEase-Media / Grps Trtllm: A higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI LLM service implemented with GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
lightseekorg / Smg: Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
SqueezeBits / Torch TRTLLM: Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
TRT2022 / Trtllm Llama: ☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM
EdVince / Whisper Trtllm: Whisper in TensorRT-LLM
bentoml / BentoTRTLLM: No description available
splinter21 / F5 Tts Trtllm: No description available
JohnTDI-cpu / Trtllm Nvfp4 Blackwell Fix: Running 30B MoE models in NVFP4 on RTX 5090 (32GB) - C++ runtime patches for TensorRT-LLM v1.2.0rc4
sriharshapy / NVIDIA Triton Trtllm Prometheus K8s: No description available
Jackch-NV / TRTLLM W4afp8 Fp8 Mix Inference: No description available
kshitizgupta21 / Triton Trtllm Guide: Installation and usage guide for Triton TRT-LLM
Wenhan-Tan / EKS Multinode Triton TRTLLM: No description available
chrjxj / Triton Trtllm Tools: Deployment and test tools for NVIDIA TensorRT-LLM and its Triton backend