<div align="center">
<img src="./docs/assets/images/logo.png" alt="Pruna AI Logo" width=400>

<img src="./docs/assets/images/element.png" alt="Element" width=10> Simply make AI models faster, cheaper, smaller, greener! <img src="./docs/assets/images/element.png" alt="Element" width=10>

<br>

<img src="./docs/assets/images/triple_line.png" alt="Pruna AI Logo" width=600 height=30>
</div>

## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20> Introduction
Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead. It provides a comprehensive suite of compression algorithms including caching, quantization, pruning, distillation and compilation techniques to make your models:
- Faster: Accelerate inference times through advanced optimization techniques
- Smaller: Reduce model size while maintaining quality
- Cheaper: Lower computational costs and resource requirements
- Greener: Decrease energy consumption and environmental impact
The toolkit is designed with simplicity in mind: optimizing a model requires just a few lines of code. It supports a wide range of model types, including LLMs, diffusion and flow matching models, vision transformers, speech recognition models, and more.
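To give a feel for one of these ideas, caching stores intermediate results so that repeated computations are served from memory instead of being recomputed. A plain-Python analogy (not Pruna's API) using `functools.lru_cache`:

```python
from functools import lru_cache

call_count = 0  # how many times the expensive work actually runs

@lru_cache(maxsize=None)
def expensive_step(x: int) -> int:
    """Stand-in for a costly computation, e.g. a repeated model sub-step."""
    global call_count
    call_count += 1
    return x * x

# 100 requests, but only 3 distinct inputs: the cache absorbs the rest.
results = [expensive_step(n % 3) for n in range(100)]
print(call_count)  # -> 3
```

Pruna's cachers apply the same trade (memory for speed) inside model inference.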
## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20> Installation
Pruna is currently available for installation on Linux, macOS, and Windows. However, some algorithms are restricted to certain operating systems and might not be available on all platforms.
Before installing, ensure you have:
- Python 3.9 or higher
- Optional: CUDA toolkit for GPU support
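As a quick sanity check (a minimal sketch, not part of Pruna itself), you can verify the interpreter version programmatically before installing:

```python
import sys

def meets_python_requirement(version=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter satisfies Pruna's minimum Python version."""
    return tuple(version[:2]) >= minimum

print(meets_python_requirement())
```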
### Option 1: Install Pruna using pip
Pruna is available on PyPI, so you can install it using pip:
```bash
pip install pruna
```
### Option 2: Install Pruna from source
You can also install Pruna directly from source by cloning the repository and installing the package in editable mode:
```bash
git clone https://github.com/PrunaAI/pruna.git
cd pruna
pip install -e .
```
## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20> Quick Start
Getting started with Pruna is easy-peasy pruna-squeezy!
First, load any pre-trained model. Here's an example using Stable Diffusion:
```python
from diffusers import StableDiffusionPipeline

base_model = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
```
Then, use Pruna's `smash` function to optimize your model. Pruna provides a variety of optimization algorithms that can be combined to get the best possible results, and you can customize the optimization process using `SmashConfig`:
```python
from pruna import smash, SmashConfig

# Create and smash your model
smash_config = SmashConfig(["deepcache", "stable_fast"])
smashed_model = smash(model=base_model, smash_config=smash_config)
```
Your model is now optimized and you can use it as you would use the original model:
```python
smashed_model("An image of a cute prune.").images[0]
```
<br>
You can then use our evaluation interface to measure the performance of your model:
```python
from pruna.evaluation.task import Task
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.data.pruna_datamodule import PrunaDataModule

datamodule = PrunaDataModule.from_string("LAION256")
datamodule.limit_datasets(10)
task = Task("image_generation_quality", datamodule=datamodule)
eval_agent = EvaluationAgent(task)
eval_agent.evaluate(smashed_model)
```
That was a minimal example; looking for more? Check out our documentation for an overview of all supported algorithms, as well as our tutorials for more use cases and examples.
## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20> Algorithm Overview
Since Pruna offers a broad range of optimization algorithms, the following table provides a high-level overview of all methods available in Pruna. For a detailed description of each algorithm, have a look at our documentation.
| Technique | Description | Speed | Memory | Quality |
|--------------|-----------------------------------------------------------------------------------------------|:-----:|:------:|:-------:|
| batcher | Groups multiple inputs together to be processed simultaneously, improving computational efficiency and reducing processing time. | ✅ | ❌ | ➖ |
| cacher | Stores intermediate results of computations to speed up subsequent operations. | ✅ | ➖ | ➖ |
| compiler | Optimizes the model with instructions for specific hardware. | ✅ | ➖ | ➖ |
| distiller | Trains a smaller, simpler model to mimic a larger, more complex model. | ✅ | ✅ | ❌ |
| quantizer | Reduces the precision of weights and activations, lowering memory requirements. | ✅ | ✅ | ❌ |
| pruner | Removes less important or redundant connections and neurons, resulting in a sparser, more efficient network. | ✅ | ✅ | ❌ |
| recoverer | Restores the performance of a model after compression. | ➖ | ➖ | ✅ |
| factorizer | Batches several small matrix multiplications into one large fused operation. | ✅ | ➖ | ➖ |
| enhancer | Enhances the model output by applying post-processing algorithms such as denoising or upscaling. | ❌ | ➖ | ✅ |
| distributer | Distributes the inference, the model or certain calculations across multiple devices. | ✅ | ❌ | ➖ |
| kernel | Uses specialized GPU routines to speed up parts of the computation. | ✅ | ➖ | ➖ |
✅ (improves), ➖ (approx. the same), ❌ (worsens)
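As a back-of-the-envelope illustration of the memory column, a quantizer maps values to a lower-precision type. The sketch below is plain Python with made-up weights, not Pruna's implementation; it quantizes float32 values to int8 with a simple symmetric scale:

```python
from array import array

# Toy weights at float32 precision (4 bytes per value).
weights = array("f", [0.42, -1.37, 0.05, 2.11])

# Map the value range onto the int8 range [-127, 127].
scale = max(abs(w) for w in weights) / 127
quantized = array("b", [round(w / scale) for w in weights])  # int8, 1 byte each

memory_saving = weights.itemsize / quantized.itemsize  # 4x less memory
dequantized = [q * scale for q in quantized]           # approximate originals
print(memory_saving)  # -> 4.0
```

The dequantized values differ from the originals by at most half a quantization step, which is why the quality column for quantizers shows ❌: precision is traded for memory.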
<br><br>
<p align="center"><img src="./docs/assets/images/single_line.png" alt="Pruna AI Logo" width=600 height=30></p>

## <img src="./docs/assets/images/pruna_sad.png" alt="Pruna Sad" width=20> FAQ and Troubleshooting
If you cannot find an answer to your question or problem in our documentation, our FAQs, or an existing issue, we are happy to help! You can get help from the Pruna community on Discord, join our Office Hours, or open an issue on GitHub.
## <img src="./docs/assets/images/pruna_heart.png" alt="Pruna Heart" width=20> Contributors
The Pruna package was made with 💜 by the Pruna AI team and our amazing contributors. Contribute to the repository to become part of the Pruna family!
## <img src="./docs/assets/images/pruna_emotional.png" alt="Pruna Emotional" width=20> Citation
If you use Pruna in your research, feel free to cite the project! 💜
```bibtex
@misc{pruna,
  title = {Efficient Machine Learning with Pruna},
  year = {2023},
  note = {Software available from pruna.ai},
  url = {https://www.pruna.ai/}
}
```
<br>
<p align="center"><img src="./docs/assets/images/triple_line.png" alt="Pruna AI Logo" width=600 height=30></p>