25 skills found
NERSC / Timemory: Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
yao-jz / Intra Kernel Profiler: Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, and an interactive Explorer.
psmarter / CUDA Practice: CUDA programming practice project. Hands-on CUDA kernels and performance optimization, covering GEMM, FlashAttention, Tensor Cores, CUTLASS, quantization, KV cache, NCCL, and profiling.
flashinfer-ai / Cubloaty: A size profiler for CUDA binaries.
NVIDIA / Cuda Profiler: Tools and extensions for CUDA profiling.
ProjectPhysX / PTXprofiler: A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
harrism / Nsys Easy: Easier, quicker command-line CUDA profiling.
HAWAIILAB / Cuda Flux: CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels.
cea-hpc / HARP: Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA.
loveSunning / FastCuda: FastCuda is a handwritten CUDA operator library featuring progressive GEMM and Reduce kernels, cuBLAS benchmarking, and C/C++/Python interfaces for learning, profiling, and performance optimization.
cwpearson / Cupti: Profile how CUDA applications create and modify data in memory.
enp1s0 / CULiP: Library for profiling the execution time of functions in the official CUDA libraries.
RightNow-AI / Gpu Profiler: Open-source web-based GPU performance visualization tool that transforms NVIDIA profiling data into interactive insights for CUDA engineers. Features timeline views, flame graphs, heatmaps, and AI-powered bottleneck detection.
bekli23 / GpuCracker: High-performance modular tool for BIP39 mnemonic recovery and custom AKM (Advanced Key Mapping) profile verification using CUDA, OpenCL, and Vulkan.
Jorgedavyd / Nsight.nvim: A developer-oriented Neovim framework for CUDA performance profiling and analysis.
Enigmatisms / Tachyon: AI-empowered CUDA kernel profiler and self-evolving tool with end-to-end or post-profiling (NCU-report) analysis. Metrics, PTX/SASS, and source code backtracing supported.
Kobzol / Cuda Profile: Instrumentation-based profiler for CUDA (master's thesis).
JamesTheZ / CudaProf: A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.
Kirrito-k423 / Putils: putils is a utility library for debugging and profiling distributed AI model training workloads, with specialized support for NPU (Neural Processing Unit) and CUDA environments.
cwpearson / Openvprof: Some open CUDA profiling/tracing tools.