TheGEMMCoreProject
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
(Deprecated) SystemVerilog implementation of the GEMM operations performed by NVIDIA's SIMT CUDA Cores and hybrid-precision Tensor Cores, and by the systolic-array MXU in Google's TPU.
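All three styles compute D = A×B + C. They differ mainly in operand precision and in how the multiply-accumulates are scheduled. The Python sketch below is a behavioral reference model only. It is not the repo's SystemVerilog and does not reproduce NVIDIA's or Google's actual hardware behavior. The function names (to_fp16, to_fp32, fp32_mac, gemm_simt, gemm_tensor_core, gemm_systolic) are hypothetical, and the model rests on three assumptions:

- The tensor-core path uses FP16 operands with FP32 accumulation.
- The FP32 multiply-add is unfused.
- The systolic array uses an output-stationary dataflow, chosen for brevity, which may differ from the MXU's real dataflow.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fp32_mac(acc: float, a: float, b: float) -> float:
    """acc + a*b in FP32, rounding the product and the sum separately (unfused, an assumption)."""
    return to_fp32(acc + to_fp32(a * b))

def gemm_simt(A, B, C):
    """SIMT style: one scalar "thread" per output element, FP32 end to end."""
    n, kdim, m = len(A), len(B), len(B[0])
    D = [[to_fp32(c) for c in row] for row in C]
    for i in range(n):                      # on a GPU, each (i, j) would be its own thread
        for j in range(m):
            for k in range(kdim):
                D[i][j] = fp32_mac(D[i][j], to_fp32(A[i][k]), to_fp32(B[k][j]))
    return D

def gemm_tensor_core(A, B, C):
    """Hybrid precision (assumed): FP16 operands, FP32 accumulator, D = A*B + C."""
    A16 = [[to_fp16(x) for x in row] for row in A]
    B16 = [[to_fp16(x) for x in row] for row in B]
    # An FP16 x FP16 product is exact in FP32, so only the accumulation rounds.
    return gemm_simt(A16, B16, C)

def gemm_systolic(A, B, C):
    """Output-stationary systolic array: PE (i, j) holds the accumulator for D[i][j].

    A values enter row i from the left delayed by i cycles and move one PE right per
    cycle; B values enter column j from the top delayed by j cycles and move one PE
    down per cycle, so A[i][k] meets B[k][j] at PE (i, j) on cycle k + i + j.
    """
    n, kdim, m = len(A), len(B), len(B[0])
    acc = [[to_fp32(c) for c in row] for row in C]
    a_reg = [[None] * m for _ in range(n)]  # A operand register in each PE
    b_reg = [[None] * m for _ in range(n)]  # B operand register in each PE
    for t in range(kdim + n + m - 2):       # the last pair meets on cycle kdim + n + m - 3
        # Every register takes its neighbour's old value; edge PEs take the skewed inputs.
        a_reg = [[(A[i][t - i] if 0 <= t - i < kdim else None) if j == 0 else a_reg[i][j - 1]
                  for j in range(m)] for i in range(n)]
        b_reg = [[(B[t - j][j] if 0 <= t - j < kdim else None) if i == 0 else b_reg[i - 1][j]
                  for j in range(m)] for i in range(n)]
        for i in range(n):
            for j in range(m):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] = fp32_mac(acc[i][j], to_fp32(a_reg[i][j]), to_fp32(b_reg[i][j]))
    return acc
```

gemm_systolic accumulates each output in the same k order as gemm_simt, so the two return identical results. gemm_tensor_core matches them for small integer inputs. It diverges for inputs such as 0.1, which FP16 cannot represent exactly.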
Note: Although these modules perform the same "operations", they by no means emulate the actual microarchitectures that execute CUDA Core, Tensor Core, or MXU instructions. Think of this as an introductory educational repo for floating-point (FP) arithmetic digital design. That said, you could use these modules as a quick stand-in for, say, a prototype FPU in your FPGA design.
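The sketch below shows the kind of FP arithmetic datapath that such modules implement. It is a minimal Python binary16 (FP16) multiplier working directly on the bit fields, in five steps:

1. XOR the signs.
2. Multiply the significands with the hidden 1 restored.
3. Add the exponents and subtract the bias.
4. Normalize.
5. Round to nearest, ties to even.

It is not derived from the repo's RTL. fp16_bits, bits_to_float, and fp16_mul are hypothetical names. The sketch also assumes normal operands and normal results only, with no handling of zeros, subnormals, Inf, or NaN.

```python
import struct

def fp16_bits(x: float) -> int:
    """Raw 16-bit encoding of x after rounding to IEEE 754 binary16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_to_float(h: int) -> float:
    """Decode a raw 16-bit binary16 pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", h))[0]

def fp16_mul(a: int, b: int) -> int:
    """Multiply two normal binary16 operands given as raw bit patterns.

    Sketch limitations (assumptions): zero, subnormal, Inf and NaN inputs are
    rejected, and results outside the normal range raise instead of producing
    Inf, zero or a subnormal.
    """
    sa, ea, ma = a >> 15, (a >> 10) & 0x1F, a & 0x3FF   # unpack sign / exponent / mantissa
    sb, eb, mb = b >> 15, (b >> 10) & 0x1F, b & 0x3FF
    if ea in (0, 31) or eb in (0, 31):
        raise ValueError("only normal operands are handled in this sketch")

    sign = sa ^ sb                          # sign of the product
    prod = (0x400 | ma) * (0x400 | mb)      # 11x11-bit significand multiply (hidden 1 restored)
    exp = ea + eb - 15                      # add biased exponents, remove one bias (15)

    shift = 10                              # significand product in [1, 2): drop 10 fraction bits
    if prod >> 21:                          # significand product in [2, 4): normalize by one more bit
        shift = 11
        exp += 1

    mant = prod >> shift                    # 11-bit significand including the hidden 1
    rem = prod & ((1 << shift) - 1)         # bits that were shifted out
    half = 1 << (shift - 1)
    if rem > half or (rem == half and mant & 1):   # round to nearest, ties to even
        mant += 1
        if mant == 0x800:                   # rounding carried out of the significand
            mant >>= 1
            exp += 1

    if not 0 < exp < 31:
        raise ArithmeticError("result outside the binary16 normal range")
    return (sign << 15) | (exp << 10) | (mant & 0x3FF)
```

Comparing fp16_mul against Python's own half-precision rounding (struct format "e") over many operand pairs is a quick way to validate the rounding logic before writing the equivalent RTL.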
If you'd like to go deeper, I highly recommend checking out my work on the DRL Floating Point RTL backend for the Tensor Core Unit (TCU) extension of the Vortex GPGPU, a significantly more thoroughly researched, optimized, and realistic microarchitecture implementation.
