LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.
English Docs | Chinese Docs | Blogs
Tech Blogs
- [2025/11] 🚀 Prefix KV Cache transfer between DP ranks is now supported! Check out the technical deep dive in our blog post.
News
- [2025/09] 🔥 LightLLM v1.1.0 release!
- [2025/08] Pre$^3$ received the Outstanding Paper Award at ACL 2025.
- [2025/05] LightLLM paper on constrained decoding accepted by ACL 2025 (Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research with key insights and examples, check out our blog post: LightLLM Blog.
- [2025/04] LightLLM paper on the request scheduler published in ASPLOS’25 (Past-Future Scheduler for LLM Serving under SLA Guarantees).
- [2025/02] 🔥 LightLLM v1.0.0 release, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.
Get started
Performance
Learn more in the release blogs: v1.1.0 blog.
FAQ
Please refer to the FAQ for more information.
Projects using LightLLM
We welcome any cooperation and contribution. If your project requires LightLLM's support, please contact us via email or create a pull request.
Projects based on LightLLM or using LightLLM components:
- LoongServe, Peking University
- vLLM (uses some LightLLM kernels)
- SGLang (uses some LightLLM kernels)
- ParrotServe, Microsoft
- Aphrodite (uses some LightLLM kernels)
- S-LoRA
- OmniKV, Ant Group
- Lab4AI LightLLM+LlamaIndex, Lab4AI LightLLM+Qwen3-8B
- LazyLLM
Also, LightLLM's pure-Python design and token-level KV Cache management make it easy to use as the basis for research projects.
Academic works based on or using parts of LightLLM:
- ParrotServe (OSDI’24)
- S-LoRA (MLSys’24)
- LoongServe (SOSP’24)
- ByteDance’s CXL (EuroSys’24)
- VTC (OSDI’24)
- OmniKV (ICLR’25)
- CaraServe, LoRATEE, FastSwitch ...
Community
For further information and discussion, join our Discord server. We welcome new members and look forward to your contributions!
License
This repository is released under the Apache-2.0 license.
Acknowledgement
We learned a lot from the following projects when developing LightLLM.
- FasterTransformer
- Text Generation Inference
- vLLM
- SGLang
- FlashInfer
- FlashAttention 1 & 2
- OpenAI Triton
Citation
We have published several papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.
Constrained decoding: accepted by ACL 2025, where it received the Outstanding Paper Award:
@inproceedings{
anonymous2025pre,
title={Pre\${\textasciicircum}3\$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
author={Anonymous},
booktitle={Submitted to ACL Rolling Review - February 2025},
year={2025},
url={https://openreview.net/forum?id=g1aBeiyZEi},
note={under review}
}
Request scheduler: accepted by ASPLOS’25:
@inproceedings{gong2025past,
title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
pages={798--813},
year={2025}
}
