rayon-rs / Rayon · Rayon: A data parallelism library for Rust
crossbeam-rs / Crossbeam · Tools for concurrent programming in Rust
puma / Puma · A Ruby/Rack web server built for parallelism
uxlfoundation / OneTBB · oneAPI Threading Building Blocks (oneTBB)
xlite-dev / Awesome LLM Inference · 📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
sshnet / SSH.NET · SSH.NET is a Secure Shell (SSH) library for .NET, optimized for parallelism.
anthonynsimon / Bild · Image processing algorithms in pure Go
deepseek-ai / DualPipe · A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
STEllAR-GROUP / Hpx · The C++ Standard Library for Parallelism and Concurrency
huggingface / Nanotron · Minimalistic large language model 3D-parallelism training
xdit-project / XDiT · xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
jofpin / Turbit · Build applications, scripts, and automations powered by high-performance multicore computing using Node.js
jOOQ / JOOL · jOOλ: The Missing Parts in Java 8. jOOλ improves the JDK libraries in areas where the Expert Group's focus was elsewhere. It adds tuple support, function support, and a lot of additional functionality around sequential Streams. JDK 8's main efforts (default methods, lambdas, and the Stream API) were focused on maintaining backwards compatibility and implementing a functional API for parallelism.
huggingface / Picotron · Minimalistic 4D-parallelism distributed training framework for educational purposes
SciML / SciMLBook · Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications (MIT 18.337J/6.338J)
AdaptiveCpp / AdaptiveCpp · Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
zwang4 / Awesome Machine Learning In Compilers · Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
tensorflow / Mesh · Mesh TensorFlow: Model Parallelism Made Easier
nerevu / Riko · A Python stream processing engine modeled after Yahoo! Pipes
gpgpu-sim / Gpgpu Sim Distribution · GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.