Results for "model-quantization"

427 skills found · Page 1 of 15

bitsandbytes-foundation / Bitsandbytes

8.1k

Accessible large language models via k-bit quantization for PyTorch.

universal

llm · machine-learning · pytorch · +2

Updated 5m ago

Lightning-AI / Lit Llama

6.1k

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

universal

Updated 1d ago

city96 / ComfyUI GGUF

3.4k

GGUF Quantization support for native ComfyUI models

universal

Updated 4h ago

thu-ml / SageAttention

3.3k

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

zed

attention · cuda · efficient-attention · +9

Updated 16h ago

intel / Neural Compressor

2.6k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

universal

auto-tuning · awq · fp4 · +14

Updated 2h ago

quic / Aimet

2.6k

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

universal

auto-ml · compression · deep-learning · +8

Updated 3h ago

Efficient-ML / Awesome Model Quantization

2.3k

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) that the repo has missed are welcome.

zed

awesome · binarized-neural-networks · binary-network · +7

Updated 17h ago

microsoft / Olive

2.3k

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

universal

Updated 1d ago

666DZY666 / Micronet

2.3k

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fuse for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.

universal

batch-normalization-fuse · bnn · convolutional-networks · +17

Updated 10d ago

NVIDIA / Model Optimizer

2.3k

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

universal

Updated 6h ago

horseee / Awesome Efficient LLM

2.0k

A curated list for Efficient Large Language Models

universal

compression · efficient-llm · knowledge-distillation · +5

Updated 3h ago

mit-han-lab / Smoothquant

1.6k

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

universal

Updated 22h ago

tensorflow / Model Optimization

1.6k

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

zed

compression · deep-learning · keras · +11

Updated 7h ago

Vahe1994 / AQLM

1.3k

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression (https://arxiv.org/abs/2405.14852)

universal

Updated 1d ago

jakerdliu / OpenTrit CHN

1.3k

OpenTrit, an open-source cross-framework mixed ternary toolkit, supports one-click conversion of mixed ternary models between PyTorch and TensorFlow. It encapsulates heterogeneous computing power scheduling and quantization optimization, addressing the issues of "framework dependency and poor usability" present in existing ternary tools.

universal

Updated 5m ago

ModelCloud / GPTQModel

1.1k

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

universal

gptq · optimum · peft · +4

Updated 12h ago

HarryR / Z80ai

1.1k

Z80-μLM is a 2-bit quantized language model small enough to run on an 8-bit Z80 processor. Train conversational models in Python, export them as CP/M .COM binaries, and chat with your vintage computer.

zed

chatbot · code-golf · cpm · +8

Updated 1d ago

Geekgineer / YOLOs CPP

920

Cross-Platform Production-ready C++ inference engine for YOLO models (v5-v12, YOLO26). Unified API for detection, segmentation, pose estimation, OBB, and classification. Built on ONNX Runtime and OpenCV. Optimized for CPU/GPU with quantization support.

zed

Updated 2h ago

ModelTC / MQBench

862

Model Quantization Benchmark

universal

Updated 7d ago

guan-yuan / Awesome AutoML And Lightweight Models

856

A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.

zed

architecture-search · automated-feature-engineering · automl · +12

Updated 4d ago