Liu-xiandong / How To Optimize In GPU: This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. I introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or close to the theoretical limit.
loveSunning / FastCuda: FastCuda is a handwritten CUDA operator library featuring progressive GEMM and Reduce kernels, cuBLAS benchmarking, and C/C++/Python interfaces for learning, profiling, and performance optimization.
yzhaiustc / Optimizing SGEMV On NVIDIA GPUs: An implementation of SGEMV with performance comparable to cuBLAS.
efocht / Sgemv Intrinsics: Optimized matrix-vector multiply for dense fp32 and bf16 matrices on the SX-Aurora Vector Engine (VE1 and VE2).