46 skills found · Page 1 of 2
Lightning-AI / Lit Llama: Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
intel / Neural Compressor: SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques for PyTorch, TensorFlow, and ONNX Runtime.
666DZY666 / Micronet: A model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
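The QAT and PTQ schemes named in the entry above share one core operation: mapping float values onto a small integer grid and back. A minimal sketch of symmetric INT8 fake quantization with PTQ-style max calibration (NumPy, hypothetical data; not Micronet's actual code):

```python
import numpy as np

def int8_fake_quantize(x, scale):
    """Quantize to int8 and immediately dequantize — the 'fake quant'
    op inserted during QAT or applied after PTQ calibration."""
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

np.random.seed(0)

# PTQ-style calibration: derive the scale from the max absolute value
# observed in a calibration batch (synthetic data, for illustration).
calib = np.random.randn(1024).astype(np.float32)
scale = np.abs(calib).max() / 127.0

x = np.array([0.5, -0.25, 0.125], dtype=np.float32)
xq = int8_fake_quantize(x, scale)
print(np.abs(x - xq).max())  # for in-range values, error is at most scale/2
```

In QAT the same op is applied inside the forward pass (with a straight-through estimator for the backward pass); in PTQ it is applied once after calibration.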
DerryHub / BEVFormer Tensorrt: BEVFormer inference on TensorRT, including INT8 quantization and custom TensorRT plugins (float/half/half2/int8).
BUG1989 / Caffe Int8 Convert Tools: Generates a quantization parameter file for ncnn-framework INT8 inference.
PINTO0309 / Tflite2tensorflow: Generates saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite, ONNX, OpenVINO, Myriad Inference Engine blob, and .pb files from .tflite. Supports building environments with Docker, with direct access to the host PC's GUI and camera to verify operation. NVIDIA GPU (dGPU) and Intel iHD GPU (iGPU) support. Also supports dequantizing INT8-quantized models.
TNTWEN / OpenVINO YOLOV4: An implementation of YOLOv4, YOLOv4-relu, YOLOv4-tiny, YOLOv4-tiny-3l, Scaled-YOLOv4, and INT8 quantization in OpenVINO 2021.3.
willard-yuan / Cvt: CVT, a computer vision toolkit.
jundaf2 / INT8 Flash Attention FMHA Quantization: No description available.
NJU-Jet / SR Mobile Quantization: Winning solution of the Mobile AI challenge (CVPRW 2021).
GiorgosXou / NeuralNetworks: A header-only neural network library for microcontrollers, with partial bare-metal and native-OS support.
clovaai / Frostnet: "FrostNet: Towards Quantization-Aware Network Architecture Search".
jahongir7174 / YOLOv8 Qat: Quantization-aware training for YOLOv8.
TianzhongSong / Tensorflow Quantization Test: TensorFlow quantization (float32 → int8) inference test.
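A float32 → int8 inference test of this kind boils down to running the integer matmul and comparing it against the float reference. A self-contained sketch assuming symmetric per-tensor scales (synthetic data; not tied to the repo's code):

```python
import numpy as np

def quantize(x, scale):
    """Symmetric per-tensor quantization to int8."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

np.random.seed(0)
a = np.random.randn(4, 8).astype(np.float32)   # activations
w = np.random.randn(8, 3).astype(np.float32)   # weights

sa = np.abs(a).max() / 127.0
sw = np.abs(w).max() / 127.0

# Integer matmul accumulates in int32; the combined scale sa*sw
# maps the int32 accumulator back to float.
acc = quantize(a, sa).astype(np.int32) @ quantize(w, sw).astype(np.int32)
approx = acc.astype(np.float32) * (sa * sw)

ref = a @ w
print(np.abs(ref - approx).max())  # quantization error of the int8 path
```

Per-channel weight scales (one `sw` per output column) would tighten the error further; the per-tensor version above is the simplest form of the test.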
xuanandsix / Tensorrt Int8 Quantization Pipline: A simple pipeline for INT8 quantization based on TensorRT.
caslabai / Yolov3tiny Tensorflow Int8 Quantized: A yolov3_tiny implementation in TensorFlow for INT8 quantization (TFLite).
BoumedineBillal / Esp32 P4 Vehicle Classifier: Production-ready vehicle classification on the ESP32-P4 with MobileNetV2 INT8 quantization. Three optimized variants spanning 70-459 ms latency. Hardware-validated, ready-to-flash projects included.
Howell-Yang / Onnx2trt: A record and summary of common problems, and their solutions, encountered when deploying on-device models, in the hope of helping others.
GiorgosXou / ATTiny85 MNIST RNN EEPROM: An ATtiny85 Arduino example running an RNN MNIST model from the (internal) 512-byte EEPROM with ~95% accuracy.
xigh / Herbert Rs: A local LLM inference engine written from scratch in Rust, with hand-written AVX-512 assembly kernels and Metal and Vulkan compute shaders. Supports Qwen3, Mistral3, ... with Q4/INT8/BF16 quantization.