27 skills found
google / Highway: Performance-portable, length-agnostic SIMD with runtime dispatch
jfalcou / Eve: Expressive Vector Engine - SIMD in C++ Goes Brrrr
lilohuang / PyTurboJPEG: PyTurboJPEG is a high-performance Python wrapper for libjpeg-turbo, offering native support for both x86 and ARM architectures.
zeam-vm / Pelemay: Pelemay is a native compiler for Elixir that generates SIMD instructions. GPU code generation is planned.
mohammad-ghaderi / Cat Dog Asm Cnn: A Convolutional Neural Network implemented entirely from scratch in x86-64 assembly using AVX-512, performing cat vs dog image classification without any ML frameworks or libraries.
dnbaker / Sketch: C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings
gyrdym / Ml Linalg: SIMD-based linear algebra and statistics for data science with Dart
Applied-Scientific-Research / Omega2D: Two-dimensional flow solver with GUI using vortex particle and boundary element methods
mattkretz / Vir Simd: improve the usage experience of std::experimental::simd (Parallelism TS 2)
PatwinchIR / Ultra Sort: DSL for SIMD Sorting on AVX2 & AVX512
fzqneo / ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Tugbars / Savitzky Golay Filter: High-performance Savitzky-Golay filter in C: batch, streaming, and 2D image processing. Embedded-friendly with coefficient export for MCUs. MATLAB-validated.
MarioSieg / Corium: Corium is a modern scripting language which combines simple, safe and efficient programming.
Applied-Scientific-Research / Omega3D: GPU-accelerated 3D vortex methods solver with easy GUI
pleiszenburg / Gravitation: n-body-simulation performance test suite
satishphd / Teaching Intel Intrinsics For SIMD Parallelism: Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class
ZL-Su / Matrice: A portable modern C++ primitive performance library for 3D Vision & Photo-Mechanics.
ATTron / Oma: Runtime SIMD dispatch for Zig. Compile once per CPU level, pick the best at startup
igmonk / Clj Vapi: Vectorised array operations leveraging Java Vector API for SIMD parallelism
mapleyustat / Fast Detection Of Overlapping Communities Via Online Tensor Methods: We present a fast tensor-based approach for detecting hidden overlapping communities under the Mixed Membership Stochastic Blockmodel (MMSB). We present two implementations, viz., a GPU-based implementation which exploits the parallelism of SIMD architectures and a CPU-based implementation for larger datasets, wherein the GPU memory does not suffice.