ace-step / ACE Step 1.5: The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
mratsim / Arraymancer: A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
eyalroz / Cuda Api Wrappers: Thin, unified, C++-flavored wrappers for the CUDA APIs
hughperkins / Coriander: Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices
mp3guy / ICPCUDA: Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher
shinpei0208 / Gdev: First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
harrism / Hemi: Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
OpenBMB / CPM.cu: CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and quantization.
helmut-hoffer-von-ankershoffen / Jetson: Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
GaohaoZhou-ops / JetsonYoloROS: This repository implements Yolo functionality using TensorRT and CUDA acceleration on Nvidia Jetson devices and the ROS framework.
PatWie / Cuda Design Patterns: Some CUDA design patterns and a bit of template magic for CUDA
cupbop / CuPBoP: A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.
salykova / Sgemm.cu: High-Performance FP32 GEMM on CUDA devices
bamos / SetGPU: Small Python library to automatically set CUDA_VISIBLE_DEVICES to the least loaded device on multi-GPU systems.
cihangirtezcan / CUDA AES: Breakthrough AES Performance on CUDA Devices
CPFL / Gdev: First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
EmbeddedLLM / Embeddedllm: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
al42and / Cuda Smi: Simple utility to show NVIDIA GPU memory usage with respect to CUDA device IDs.
HPSCIL / CuFSDAF: cuFSDAF is an enhanced FSDAF algorithm parallelized using GPUs. In cuFSDAF, the TPS interpolator is replaced by a modified Inverse Distance Weighted (IDW) interpolator. In addition, computationally intensive procedures are parallelized using the Compute Unified Device Architecture (CUDA), a parallel computing framework for GPUs. An adaptive domain-decomposition method is also developed to adjust the size of sub-domains according to hardware properties and to ensure accuracy at the edges of sub-domains.
Duffy12150557 / Security Research Toolkit: Video and image analysis tool for neural inpainting and AI-generated content detection with SORA signature extraction, temporal consistency analysis, CNN artifact detection, CPU/CUDA device selection, multi-format support, and colorama-styled terminal interface