Results for "gpu-kernel"

254 skills found

NVIDIA / Open Gpu Kernel Modules

16.8k

NVIDIA Linux open GPU kernel module source

universal

Updated 8h ago

raspberrypi / Firmware

5.5k

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

universal

Updated 1d ago

tile-ai / Tilelang

5.4k

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

universal

Updated 19h ago

NVIDIA / Cutile Python

2.0k

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

universal

cutile, gpu, kernel +4

Updated 4h ago

Liu-xiandong / How To Optimize In GPU

1.3k

This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

universal

elementwise, gpu-acceleration, high-performance-computing +4

Updated 1d ago

getkeops / Keops

1.2k

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

universal

Updated 5d ago

openai / Blocksparse

1.1k

Efficient GPU kernels for block-sparse matrix multiplication and convolution

universal

Updated 23d ago

ScalingIntelligence / KernelBench

889

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

universal

benchmark, codegen, evaluation +3

Updated 10h ago

RightNow-AI / Autokernel

861

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

zed

autoresearch, cuda, gpu +3

Updated 2h ago

NVIDIA / TileGym

685

Helpful kernel tutorials and examples for tile-based GPU programming

universal

Updated 1d ago

a-hamdi / GPU

581

100 days of building GPU kernels!

universal

Updated 2d ago

perplexityai / Pplx Kernels

564

Perplexity GPU Kernels

universal

Updated 1d ago

NVIDIA / Nvshmem

493

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to perform one-sided communication from within CUDA kernels and on CUDA streams.

universal

communciations, cpp, cuda +3

Updated 2h ago

microsoft / Antares

465

Antares: an automatic engine for multi-platform kernel generation and optimization, supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, and Android CPU/GPU backends.

universal

Updated 8d ago

ekondis / Mixbench

452

A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)

universal

benchmark, cuda, gpu +4

Updated 10d ago

aSurgingRiver / WebView

429

An efficient UE browser built on the open-source CEF kernel. At 60 frames per second and 4K resolution, rendered on a single GPU, neither UE nor the browser drops frames; at 8K, the frame rate does not decrease under multi-GPU binding.

universal

ue, webbrowser, webui

Updated 2d ago

yzhaiustc / Optimizing SGEMM On NVIDIA Turing GPUs

409

Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.

universal

cuda, gemm, nvidia +1

Updated 4d ago

facebookincubator / Dynolog

367

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

universal

Updated 3d ago

meta-pytorch / KernelAgent

334

Autonomous GPU Kernel Generation & Optimization via Deep Agents

universal

Updated 1d ago

BrutPitt / GlslSmartDeNoise

309

Fast GPU deNoise spatial filter with a circular Gaussian kernel, fully configurable

universal

denoise, denoise-images, denoiser +8

Updated 17h ago