English | 简体中文

<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://github.com/user-attachments/assets/42b0039f-39e3-4279-afda-6d1865dfbffb" width="500"></a> <a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a> <a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a> <a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a> <a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a> <a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a> <a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a> <a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> <a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/installation/nvidia_gpu/"> 安装指导 </a> | <a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/quick_start"> 快速入门 </a> | <a href="https://paddlepaddle.github.io/FastDeploy/zh/supported_models/"> 支持模型列表 </a>

FastDeploy 飞桨大模型高效部署套件

关于

FastDeploy 是基于飞桨（PaddlePaddle）的大语言模型（LLM）与视觉语言模型（VLM）推理部署工具包，提供开箱即用的生产级部署方案，核心技术特性包括：

🚀 负载均衡式PD分解：工业级解决方案，支持上下文缓存与动态实例角色切换，在保障SLO达标和吞吐量的同时优化资源利用率
🔄 统一KV缓存传输：轻量级高性能传输库，支持智能NVLink/RDMA选择
🤝 OpenAI API服务与vLLM兼容：单命令部署，兼容vLLM接口
🧮 全量化格式支持：W8A16、W8A8、W4A16、W4A8、W2A16、FP8等
⏩ 高级加速技术：推测解码、多令牌预测（MTP）及分块预填充
🖥️ 多硬件支持：NVIDIA GPU、昆仑芯XPU、海光DCU、天数智芯GPU、燧原GCU、沐曦GPU、英特尔Gaudi等

要求

操作系统: Linux
Python: 3.10 ~ 3.12

安装

FastDeploy 支持在英伟达（NVIDIA）GPU、昆仑芯（Kunlunxin）XPU、天数（Iluvatar）GPU、燧原（Enflame）GCU、海光（Hygon）DCU 以及其他硬件上进行推理部署。详细安装说明如下：

入门指南

通过我们的文档了解如何使用 FastDeploy：

支持模型列表

通过我们的文档了解如何下载模型，如何支持torch格式等：

模型支持列表

进阶用法

致谢

FastDeploy 依据 Apache-2.0 开源许可证. 进行授权。在开发过程中，我们参考并借鉴了 vLLM 的部分代码，以保持接口兼容性，在此表示衷心感谢。

FastDeploy

Install / Use

README

FastDeploy 飞桨大模型高效部署套件

最新活动

关于

要求

安装

入门指南

支持模型列表

进阶用法

致谢

Related Skills