28 skills found
facebookincubator / AITemplate: AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
gpgpu-sim / Gpgpu Sim Distribution: GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Zhen-Dong / HAWQ: Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
nicolaswilde / Cuda Tensorcore Hgemm: No description available
wzsh / Wmma Tensorcore Sample: Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
enp1s0 / OzIMMU: FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
wmmae / Wmma Extension: An extension library of WMMA API (Tensor Core API)
ai-bond / Flash Attention V100: Implementation of FlashAttention-2 for Nvidia Tesla V100
stillwater-sc / RISC V TensorCore: Transactional Verilog design and Verilator Testbench for a RISC-V TensorCore Vector co-processor for reproducible linear algebra
nox-410 / Tvm.tl: An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.
JuliaMath / TensorCore.jl: Lightweight package for sharing tensor-algebra definitions
gty111 / GEMM MMA: Optimize GEMM with tensorcore step by step
YukeWang96 / QGTC PPoPP22: Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
natu4u / GSOC TensorCore: TensorCore Vector Processor for Deep Learning - Google Summer of Code Project
nikhiledm97 / TheGEMMCoreProject: SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
ahennequ / Cuda Tensorcores Register Mapping: No description available
enp1s0 / CuMpSGEMM: Fast SGEMM emulation on Tensor Cores
khcs / Fp16 Demo Tf: Examples of mixed-precision training utilizing TensorCores in NVIDIA Volta GPUs
zartbot / Tensorcore Gemm: TensorCore GEMM Optimization
vishalmehta1991 / Pictc: Particle in Cell using TensorCore