24 skills found
facebookincubator / AITemplate: AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
gpgpu-sim / Gpgpu Sim Distribution: GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
enp1s0 / OzIMMU: FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
wmmae / Wmma Extension: An extension library for the WMMA API (Tensor Core API)
stillwater-sc / RISC V TensorCore: Transactional Verilog design and Verilator testbench for a RISC-V TensorCore vector co-processor for reproducible linear algebra
YukeWang96 / TC GNN ATC23: Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
nox-410 / Tvm.tl: An extension of TVMScript for writing simple, high-performance GPU kernels with tensorcore.
c3sr / Tcu Scope: No description available
gty111 / GEMM MMA: Optimize GEMM with tensorcore step by step
natu4u / GSOC TensorCore: TensorCore Vector Processor for Deep Learning - Google Summer of Code Project
nikhiledm97 / TheGEMMCoreProject: SystemVerilog implementations of CUDA/TensorCore/TPU GEMM operations
ahennequ / Cuda Tensorcores Register Mapping: No description available
HicrestLaboratory / SPARTA: SParse AcceleRation on Tensor Architecture
enp1s0 / CuMpSGEMM: Fast SGEMM emulation on Tensor Cores
khcs / Fp16 Demo Tf: Examples of mixed-precision training that utilizes TensorCores in NVIDIA Volta GPUs
zartbot / Tensorcore Gemm: TensorCore GEMM Optimization
vishalmehta1991 / Pictc: Particle in Cell using TensorCore
robbwu / Tensorsvm: Fast Kernel SVM on TensorCore-enabled GPUs
enp1s0 / Tsqr Gpu: Implementation of TSQR, an efficient QR factorization algorithm for tall skinny matrices, on TensorCores
enp1s0 / Tsqr Tc: TSQR on TensorCores