lyogavin / Airllm: AirLLM 70B inference with a single 4GB GPU
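
A minimal usage sketch following the pattern in AirLLM's README, which streams a 70B model through the GPU one layer at a time so it fits in ~4GB of VRAM; the model ID and generate() arguments are taken as assumptions from that README, not verified against the current API:

    from airllm import AutoModel

    # AirLLM loads and runs one transformer layer at a time on the GPU,
    # trading speed for a ~4 GB VRAM footprint.
    model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")  # example model ID

    inputs = model.tokenizer(["What is the capital of the United States?"],
                             return_tensors="pt", return_attention_mask=False,
                             truncation=True, max_length=128)
    out = model.generate(inputs["input_ids"].cuda(), max_new_tokens=20,
                         use_cache=True, return_dict_in_generate=True)
    print(model.tokenizer.decode(out.sequences[0]))
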
NVIDIA / TensorRT LLM: TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
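
As a taste of that Python API, a minimal sketch modeled on the project's quickstart using the high-level LLM entry point; the model ID is a placeholder and the parameter values are illustrative:

    from tensorrt_llm import LLM, SamplingParams

    # The LLM class builds (or loads) a TensorRT engine for the model,
    # then serves generation requests through it.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example model
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    for output in llm.generate(["Explain GPU inference in one sentence."], params):
        print(output.outputs[0].text)
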
NVIDIA / TensorRT: NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open-source components of TensorRT.
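
The typical TensorRT workflow parses an ONNX model, builds an optimized engine, and serializes it for deployment. A minimal sketch (API details vary by TensorRT version, so treat this as illustrative):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The explicit-batch flag is required on TensorRT 8.x; TensorRT 10
    # networks are explicit-batch only and take no flag.
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:  # placeholder path
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where profitable
    engine = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(engine)
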
intel / Ipex Llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or a discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
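
ipex-llm exposes drop-in replacements for Hugging Face transformers classes that quantize on load and target Intel devices; a rough sketch of the documented pattern (the model ID is an example only):

    import torch
    from transformers import AutoTokenizer
    from ipex_llm.transformers import AutoModelForCausalLM  # drop-in HF replacement

    model_id = "meta-llama/Llama-2-7b-chat-hf"  # example model
    model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
    model = model.to("xpu")  # "xpu" targets Intel GPUs (iGPU or Arc/Flex/Max)
    tok = AutoTokenizer.from_pretrained(model_id)

    inputs = tok("Why run LLMs on an iGPU?", return_tensors="pt").to("xpu")
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))
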
aidlearning / AidLearning FrameWork: 🔥🔥🔥AidLearning is a powerful AIOT development platform; it builds a Linux environment supporting a GUI, deep learning, and a visual IDE on Android... Aid now supports CPU+GPU+NPU inference with high-performance acceleration... Linux on Android or HarmonyOS
NVIDIA / DALI: A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
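
DALI pipelines are declared with a decorator and composed from fn.* operators; a minimal image-loading sketch (paths, batch size, and normalization constants are placeholders):

    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def image_pipeline(data_dir):
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")  # hybrid CPU/GPU JPEG decode
        images = fn.resize(images, resize_x=224, resize_y=224)
        images = fn.crop_mirror_normalize(images, dtype=types.FLOAT,
                                          mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                          std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
        return images, labels

    pipe = image_pipeline("/path/to/images")  # placeholder path
    pipe.build()
    images, labels = pipe.run()  # one GPU-resident batch per call
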
gpustack / Gpustack: A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
facebookincubator / AITemplate: AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
turboderp-org / Exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
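
A rough sketch of the loading-and-generation flow, with class names following the project's example scripts; check the repository for the current API before relying on any of this:

    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    config = ExLlamaV2Config("/path/to/exl2-model")  # directory with EXL2-quantized weights
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)
    model.load_autosplit(cache)  # split layers across the available consumer GPUs

    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.8
    print(generator.generate_simple("Local LLM inference is", settings, num_tokens=32))
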
NVIDIA / TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, for better performance with lower memory utilization in both training and inference.
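
In PyTorch, the library is used by swapping in te.* modules and wrapping the forward pass in an FP8 autocast; a minimal sketch (the recipe parameters are illustrative, not tuned):

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # te.Linear is a drop-in for nn.Linear whose matmuls can execute in FP8.
    layer = te.Linear(4096, 4096, bias=True).cuda()
    x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = layer(x)  # runs in FP8 on supported GPUs (Hopper/Ada/Blackwell)
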
dstackai / Dstack: Control plane for agents and engineers to provision compute and run training and inference across NVIDIA, AMD, and Tenstorrent GPUs as well as TPUs, on clouds, Kubernetes, and bare-metal clusters.
xororz / Local Dream: Run Stable Diffusion on Android devices with Snapdragon NPU acceleration. Also supports CPU/GPU inference.
XiongjieDai / GPU Benchmarks On LLM Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
ELS-RD / Transformer Deploy: Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
beam-cloud / Beta9: Ultrafast serverless GPU inference, sandboxes, and background jobs
Tencent / TurboTransformers: A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
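
The runtime converts a trained Hugging Face model into its own optimized representation; a sketch following the project's README pattern (treat the exact function names and return shape as assumptions):

    import transformers
    import turbo_transformers

    hf_model = transformers.BertModel.from_pretrained("bert-base-uncased")
    hf_model.eval()

    # Convert the PyTorch weights into TurboTransformers' optimized runtime.
    tt_model = turbo_transformers.BertModel.from_torch(hf_model)

    tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")
    inputs = tokenizer("GPU inference is fast.", return_tensors="pt")
    outputs = tt_model(inputs["input_ids"])  # mirrors the HF model's outputs
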
mit-han-lab / Torchsparse: [MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
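
A small sketch of building a sparse convolution block; the coordinate layout and constructor arguments differ between TorchSparse versions, so treat this as a shape-level illustration only:

    import torch
    from torchsparse import SparseTensor
    import torchsparse.nn as spnn

    # 1000 active voxels: integer spatial coordinates plus a batch-index
    # column (the column order depends on the TorchSparse version).
    coords = torch.randint(0, 32, (1000, 4), dtype=torch.int32, device="cuda")
    feats = torch.randn(1000, 16, device="cuda")

    net = torch.nn.Sequential(
        spnn.Conv3d(16, 32, kernel_size=3),  # convolution over active voxels only
        spnn.BatchNorm(32),
        spnn.ReLU(True),
    ).cuda()

    out = net(SparseTensor(feats=feats, coords=coords))
    print(out.feats.shape)  # per-voxel features of the output sparse tensor
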
chengzeyi / Stable Fast: Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs (https://wavespeed.ai/).
BabitMF / Bmf: Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, a heterogeneous design, multi-language support, ease of use, multi-framework compatibility, and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.
Geekgineer / YOLOs CPP: Cross-platform, production-ready C++ inference engine for YOLO models (v5-v12, YOLO26). Unified API for detection, segmentation, pose estimation, OBB, and classification. Built on ONNX Runtime and OpenCV. Optimized for CPU/GPU with quantization support.