1,691 skills found · Page 4 of 57
google / Qkeras · QKeras: a quantization deep learning library for TensorFlow Keras
yahoo / Lopq · Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high-dimensional data in Python and Spark.
DerryHub / BEVFormer Tensorrt · BEVFormer inference on TensorRT, including INT8 quantization and custom TensorRT plugins (float/half/half2/int8).
slavabarkov / Tidy · Offline semantic text-to-image and image-to-image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine
csyhhu / Awesome Deep Neural Network Compression · Summary and code for deep neural network quantization
ilyakurdyukov / Jpeg Quantsmooth · JPEG artifact removal based on quantization coefficients.
cedrickchee / Awesome Ml Model Compression · Awesome machine learning model compression research papers, quantization, tools, and learning material.
Jermmy / Pytorch Quantization Demo · A simple network quantization demo using PyTorch from scratch.
xlite-dev / Awesome DiT Inference · 📚 A curated list of Awesome Diffusion Inference Papers with Code: Sampling, Cache, Quantization, Parallelism, etc. 🎉
BUG1989 / Caffe Int8 Convert Tools · Generates a quantization parameter file for ncnn framework int8 inference
HaloTrouvaille / YOLO Multi Backbones Attention · Model compression: YOLOv3 with multiple lightweight backbones (ShuffleNetV2, Huawei GhostNet), attention, pruning, and quantization
leeoniya / RgbQuant.js · Color quantization library
Zhen-Dong / HAWQ · Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
okuvshynov / Slowllama · Fine-tune llama2-70b and codellama on a MacBook Air without quantization
modelscope / FunCodec · FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis, music generation, etc.
Model Compression Toolkit (MCT) is an open-source project for neural network model optimization under efficient, constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
mightydeveloper / Deep Compression PyTorch · PyTorch implementation of "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" by Song Han, Huizi Mao, and William J. Dally
SqueezeAILab / KVQuant · [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
iuliaturc / Gguf Docs · Docs for GGUF quantization (unofficial)
PotatoSpudowski / FastLLaMa · An experimental high-performance framework for running decoder-only LLMs with 4-bit quantization in Python, using a C/C++ backend.
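Most of the projects above build on the same core operation. As a generic illustration (a minimal sketch, not code from any listed repository), symmetric per-tensor int8 quantization maps floats onto the integer range [-127, 127] with a single scale factor, and dequantization recovers an approximation:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = max(abs(v) for v in values) / 127.0
    # Round each value to the nearest integer code and clamp to int8 range
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate the original floats from the integer codes."""
    return [qi * scale for qi in q]

weights = [-1.5, -0.2, 0.0, 0.7, 1.5]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
```

The round-trip error per element is bounded by half the scale, which is why the higher-precision schemes in the listed libraries (per-channel scales, mixed precision, 4-bit group quantization) all focus on keeping that scale small.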