138 repositories found · Page 1 of 5
RangiLyu / Nanodet: NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16), runs at 97 FPS on a cellphone 🔥
Lightning-AI / Lit Llama: Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, INT8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
PINTO0309 / PINTO Model Zoo: A repository of models inter-converted between various frameworks. Supported frameworks: TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TF-TRT, TensorFlow Lite (Float32/16/INT8), EdgeTPU, CoreML.
intel / Neural Compressor: SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model-compression techniques for PyTorch, TensorFlow, and ONNX Runtime.
ppogg / YOLOv5 Lite: 🍅 YOLOv5-Lite: evolved from YOLOv5; the model is only 900+ KB (int8) and 1.7 MB (fp16). Reaches 15 FPS on a Raspberry Pi 4B.
666DZY666 / Micronet: micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group-convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
RWKV / Rwkv.cpp: INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model.
CaoWGG / TensorRT CenterNet: TensorRT 5, CenterNet, CenterFace, deformable convolution, INT8.
grimoire / Mmdetection To Tensorrt: Convert MMDetection models to TensorRT; supports FP16, INT8, batched input, dynamic shapes, etc.
DerryHub / BEVFormer Tensorrt: BEVFormer inference on TensorRT, including INT8 quantization and custom TensorRT plugins (float/half/half2/int8).
BUG1989 / Caffe Int8 Convert Tools: Generates a quantization-parameter file for INT8 inference with the ncnn framework.
intel / Neural Speed: An innovative library for efficient LLM inference via low-bit quantization.
AlexeyAB / Yolo2 Light: Light version of the YOLO v3 & v2 convolutional neural networks for object detection, with minimal dependencies (INT8 inference, BIT1-XNOR inference).
yaof20 / Flash RL: Implementation of FP8/INT8 rollout for RL training without a performance drop.
clancylian / Retinaface: Reimplementation of RetinaFace in C++ and TensorRT.
PINTO0309 / Tflite2tensorflow: Generates saved_model, TFJS, TF-TRT, EdgeTPU, CoreML, quantized TFLite, ONNX, OpenVINO, Myriad Inference Engine blob, and .pb files from a .tflite file. Supports building environments with Docker, and can directly access the host PC's GUI and camera to verify operation. NVIDIA GPU (dGPU) and Intel iHD GPU (iGPU) support. Supports inverse quantization of INT8-quantized models.
dseditor / QwenASRMiniTool: A lightweight QwenASR tool based on OpenVINO INT8 weights, for real-time speech recognition and subtitle conversion.
TNTWEN / OpenVINO YOLOV4: Implementation of YOLOv4, YOLOv4-relu, YOLOv4-tiny, YOLOv4-tiny-3l, Scaled-YOLOv4, and INT8 quantization in OpenVINO 2021.3.
Wulingtian / Yolov5 Tensorrt Int8 Tools: TensorRT INT8 quantization for YOLOv5 ONNX models.
maggiez0138 / Swin Transformer TensorRT: Explores deployment of Swin Transformer on TensorRT, including FP16 and INT8 test results.
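The common thread in the entries above is INT8 quantization. As a minimal sketch of the core idea (not taken from any listed repository), symmetric per-tensor quantization maps float values to 8-bit integers through a single scale factor, trading a small, bounded reconstruction error for a ~4x size reduction versus fp32:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.0, -1.9], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize_int8(q, s)
# Round-to-nearest bounds the per-element error by scale / 2.
```

Real toolchains (TensorRT PTQ calibration, OpenVINO, ncnn, Neural Compressor) refine this with per-channel scales, calibration datasets, and asymmetric zero-points, but the scale-and-round mapping is the same primitive.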