84 skills found
cupy / Cupy: NumPy & SciPy for GPU
NVIDIA / Nccl: Optimized primitives for collective multi-GPU communication
NVIDIA / Nccl Tests: NCCL Tests
chelsea0x3b / Cudarc: Safe Rust wrapper around the CUDA toolkit
ncclient / Ncclient: Python library for NETCONF clients
LambdaLabsML / Distributed Training Guide: Best practices & guides on how to write distributed PyTorch training code
huggingface / Llm Training Handbook: An open collection of methodologies to help with successful training of large language models.
limingth / NCCL: New Concept C Language
huggingface / Large Language Model Training Playbook: An open collection of implementation tips, tricks and resources for training large language models
jinbooooom / AI Infra Hpc: HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
FZJ-JSC / Tutorial Multi Gpu: Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Bluefog-Lib / Bluefog: Distributed and decentralized training framework for PyTorch over graph
Mellanox / Nccl Rdma Sharp Plugins: RDMA and SHARP plugins for the NCCL library
aws / Aws Ofi Nccl: This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
microsoft / Msrflute: Federated Learning Utilities and Tools for Experimentation
lucasdelimanogueira / PyNorch: Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
microsoft / NPKit: NCCL Profiling Kit
coreweave / Nccl Tests: NVIDIA NCCL Tests for Distributed Training
google / Nccl Fastsocket: NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
autoscriptlabs / Nccl Mesh Plugin: No description available