4 skills found
intel / Neural Compressor: SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques for PyTorch, TensorFlow, and ONNX Runtime
mit-han-lab / SmoothQuant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
ModelTC / LightCompress: [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generative models
AniZpZ / AutoSmoothQuant: An easy-to-use package for implementing SmoothQuant for LLMs
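Two of the repositories above implement SmoothQuant, whose core idea is to migrate activation outlier magnitude into the weights via a per-channel scale before quantization. The toy sketch below (with made-up tensor sizes and a synthetic outlier channel; not code from either repository) shows the mathematically equivalent rescaling: dividing activations by a scale `s` and folding `s` into the weights leaves the matmul output unchanged while shrinking the activation range.

```python
import numpy as np

# Toy illustration of SmoothQuant's scale migration (hypothetical sizes).
# Per input channel j: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))    # activations: tokens x channels
X[:, 3] *= 50.0                # inject an outlier activation channel
W = rng.normal(size=(8, 16))   # weights: channels x output features
alpha = 0.5                    # migration strength (0.5 is the paper's default)

s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)
X_smooth = X / s               # scale divided out of activations...
W_smooth = W * s[:, None]      # ...and folded into the weights

# The linear layer's output is preserved exactly,
# but the activation range is now much easier to quantize.
assert np.allclose(X @ W, X_smooth @ W_smooth)
print(np.abs(X).max(), np.abs(X_smooth).max())
```

The equivalence holds because `(X / s) @ (diag(s) @ W) = X @ W`; quantization error then drops because per-tensor activation scales no longer have to cover the outlier channel.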