NVIDIA / Open Gpu Kernel Modules · NVIDIA Linux open GPU kernel module source
raspberrypi / Firmware · This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.
tile-ai / Tilelang · Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
NVIDIA / Cutile Python · cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Liu-xiandong / How To Optimize In GPU · A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
getkeops / Keops · KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
openai / Blocksparse · Efficient GPU kernels for block-sparse matrix multiplication and convolution
ScalingIntelligence / KernelBench · KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
RightNow-AI / Autokernel · Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
NVIDIA / TileGym · Helpful kernel tutorials and examples for tile-based GPU programming
a-hamdi / GPU · 100 days of building GPU kernels!
perplexityai / Pplx Kernels · Perplexity GPU Kernels
NVIDIA / Nvshmem · NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to perform one-sided communication from within CUDA kernels and on CUDA streams.
microsoft / Antares · Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
ekondis / Mixbench · A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
aSurgingRiver / WebView · An efficient UE browser built on the open-source CEF kernel. At 4K resolution and 60 frames per second, rendering on a single GPU, neither UE nor the browser drops frames. At 8K, the frame rate does not decrease under multi-GPU binding.
yzhaiustc / Optimizing SGEMM On NVIDIA Turing GPUs · Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
facebookincubator / Dynolog · Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
meta-pytorch / KernelAgent · Autonomous GPU Kernel Generation & Optimization via Deep Agents
BrutPitt / GlslSmartDeNoise · Fast GPU deNoise spatial filter with a circular Gaussian kernel, fully configurable