Highway

Performance-portable, length-agnostic SIMD with runtime dispatch

Efficient and performance-portable vector software

Highway is a C++ library that provides portable SIMD/vector intrinsics.

Documentation

Previously licensed under Apache 2, now dual-licensed as Apache 2 / BSD-3.

Why

We are passionate about high-performance software. We see major untapped potential in CPUs (servers, mobile, desktops). Highway is for engineers who want to reliably and economically push the boundaries of what is possible in software.

How

CPUs provide SIMD/vector instructions that apply the same operation to multiple data items. This can reduce energy usage, e.g. fivefold, because fewer instructions are executed. We also often see 5-10x speedups.

Highway makes SIMD/vector programming practical and workable according to these guiding principles:

Does what you expect: Highway is a C++ library with carefully-chosen functions that map well to CPU instructions without extensive compiler transformations. The resulting code is more predictable and robust to code changes/compiler updates than autovectorization.

Works on widely-used platforms: Highway supports seven architectures; the same application code can target various instruction sets, including those with 'scalable' vectors (size unknown at compile time). Highway only requires C++17 (language features, not necessarily the library) and supports four families of compilers. If you want to use Highway on other platforms, please raise an issue.

Flexible to deploy: Applications using Highway can run on heterogeneous clouds or client devices, choosing the best available instruction set at runtime. Alternatively, developers may choose to target a single instruction set without any runtime overhead. In both cases, the application code is the same except for swapping HWY_STATIC_DISPATCH with HWY_DYNAMIC_DISPATCH plus one line of code. See also @kfjahnke's introduction to dispatching.

Suitable for a variety of domains: Highway provides an extensive set of operations, used for image processing (floating-point), compression, video analysis, linear algebra, cryptography, sorting and random generation. We recognise that new use-cases may require additional ops and are happy to add them where it makes sense (e.g. no performance cliffs on some architectures). If you would like to discuss, please file an issue.

Rewards data-parallel design: Highway provides tools such as Gather, MaskedLoad, and FixedTag to enable speedups for legacy data structures. However, the biggest gains are unlocked by designing algorithms and data structures for scalable vectors. Helpful techniques include batching, structure-of-array layouts, and aligned/padded allocations.
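As an illustration of the structure-of-array and padded-allocation techniques mentioned above, here is a minimal sketch in plain C++ that does not use Highway itself. The names `PointsSoA`, `RoundUpTo`, `MakePaddedPoints` and `Scale` are hypothetical, and `lanes` stands in for the vector length that a scalable-vector target would report at runtime. Alignment is not covered here, only padding.

```cpp
#include <cstddef>
#include <vector>

// Rounds `count` up to the next multiple of `lanes`; `lanes` must be nonzero.
constexpr std::size_t RoundUpTo(std::size_t count, std::size_t lanes) {
  return (count + lanes - 1) / lanes * lanes;
}

// Structure-of-arrays: each coordinate is stored contiguously, so one vector
// load would fetch `lanes` consecutive x (or y) values.
struct PointsSoA {
  std::vector<float> x;
  std::vector<float> y;
  std::size_t count = 0;  // number of valid points; the arrays may be longer
  std::size_t lanes = 1;  // assumed vector length, in elements
};

// Allocates zero-initialized storage padded to a whole number of vectors.
PointsSoA MakePaddedPoints(std::size_t count, std::size_t lanes) {
  PointsSoA points;
  const std::size_t padded = RoundUpTo(count, lanes);
  points.x.assign(padded, 0.0f);
  points.y.assign(padded, 0.0f);
  points.count = count;
  points.lanes = lanes;
  return points;
}

// Processes whole batches of `lanes` elements. Thanks to the padding, no
// remainder loop is required; the inner loop is where a single vector
// operation would go in SIMD code.
void Scale(PointsSoA& points, float factor) {
  for (std::size_t i = 0; i < points.x.size(); i += points.lanes) {
    for (std::size_t lane = 0; lane < points.lanes; ++lane) {
      points.x[i + lane] *= factor;
      points.y[i + lane] *= factor;
    }
  }
}
```

The padding elements are processed along with the valid ones, which is harmless for an operation like scaling and removes the need for special-case code at the end of the array.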

We recommend these resources for getting started:

Examples

Online demos using Compiler Explorer:

We observe that Highway is referenced in the following open source projects, found via sourcegraph.com. Most are GitHub repositories. If you would like to add your project or link to it directly, feel free to raise an issue or contact us via the below email.

Other

  • Evaluation of C++ SIMD Libraries: "Highway excelled with a strong performance across multiple SIMD extensions [..]. Thus, Highway may currently be the most suitable SIMD library for many software projects."
  • zimt: C++11 template library to process n-dimensional arrays with multi-threaded SIMD code
  • vectorized Quicksort (paper)

If you'd like to get Highway, in addition to cloning from this GitHub repository or using it as a Git submodule, you can also find it in the following package managers or repositories:

  • alpinelinux
  • conan-io
  • conda-forge
  • DragonFlyBSD
  • fd00/yacp
  • freebsd
  • getsolus/packages
  • ghostbsd
  • microsoft/vcpkg
  • MidnightBSD
  • MSYS2
  • NetBSD
  • openSUSE
  • opnsense
  • Xilinx/Vitis_Libraries
  • xmake-io/xmake-repo

See also the list at https://repology.org/project/highway-simd-library/versions .

Current status

Targets

Highway supports 27 targets, listed in alphabetical order of platform:

  • Any: EMU128, SCALAR;
  • Armv7+: NEON_WITHOUT_AES, NEON, NEON_BF16, SVE, SVE2, SVE_256, SVE2_128;
  • IBM Z: Z14, Z15;
  • LoongArch: LSX, LASX;
  • POWER: PPC8 (v2.07), PPC9 (v3.0), PPC10 (v3.1B, not yet supported due to compiler bugs, see #1207; also requires QEMU 7.2);
  • RISC-V: RVV (1.0);
  • WebAssembly: WASM, WASM_EMU256 (a 2x unrolled version of wasm128, enabled if HWY_WANT_WASM2 is defined; this will remain supported until it is potentially superseded by a future version of WASM);
  • x86:
    • SSE2
    • SSSE3 (~Intel Core)
    • SSE4 (~Nehalem, also includes AES + CLMUL)
    • AVX2 (~Haswell, also includes BMI2 + F16 + FMA)
    • AVX3 (~Skylake, AVX-512F/BW/CD/DQ/VL)
    • AVX3_DL (~Icelake, includes BitAlg + CLMUL + GFNI + VAES + VBMI + VBMI2 + VNNI + VPOPCNT)
    • AVX3_ZEN4 (AVX3_DL plus BF16, optimized for AMD Zen4; requires opt-in by defining HWY_WANT_AVX3_ZEN4 if compiling for static dispatch, but enabled by default for runtime dispatch)
    • AVX3_SPR (~Sapphire Rapids, includes AVX-512FP16)
    • AVX10_2 (~Diamond Rapids)
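With runtime dispatch, an application picks the best of its compiled targets that the current CPU supports. The general idea can be sketched in plain C++ as follows. This is a conceptual illustration only, not Highway's actual mechanism: the target bits `kTargetScalar` and `kTargetWide`, the functions `SumScalar`, `SumWide` and `ChooseSum`, and the `supported` bitmask are all hypothetical, and `SumWide` merely imitates a vectorized variant using four scalar accumulators.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical target bits for this sketch; these are not Highway's constants.
constexpr std::uint32_t kTargetScalar = 1u << 0;  // always available
constexpr std::uint32_t kTargetWide = 1u << 1;    // stand-in for a SIMD target

using SumFunc = float (*)(const float* data, std::size_t count);

// Baseline implementation: one element per iteration.
float SumScalar(const float* data, std::size_t count) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < count; ++i) sum += data[i];
  return sum;
}

// Stand-in for a vectorized variant: four independent accumulators (one per
// imagined lane), followed by a scalar loop for the leftover elements.
float SumWide(const float* data, std::size_t count) {
  float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t i = 0;
  for (; i + 4 <= count; i += 4) {
    for (std::size_t lane = 0; lane < 4; ++lane) acc[lane] += data[i + lane];
  }
  float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
  for (; i < count; ++i) sum += data[i];
  return sum;
}

// Returns the best implementation whose target bit is set in `supported`.
// A real program would obtain `supported` from CPU feature detection.
SumFunc ChooseSum(std::uint32_t supported) {
  if (supported & kTargetWide) return &SumWide;
  return &SumScalar;
}
```

The caller resolves the function pointer once, for example at startup, and then calls through it. Targeting a single instruction set statically corresponds to calling one implementation directly, with no pointer lookup.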

Our policy is that unless otherwise specified, targets will remain supported as long as they can be (cross-)compiled with currently supported Clang or GCC, and tested using QEMU. If the target can be compiled with LLVM trunk and tested using our version of QEMU without extra flags, then it is eligible for inclusion in our continuous testing infrastructure. Otherwise, the target will be manually tested before releases with selected versions/configurations of Clang and GCC.

SVE was initially tested using farm_sve (see acknowledgments).

Versioning

Highway releases aim to follow the semver.org system (MAJOR.MINOR.PATCH), incrementing MINOR after backward-compatible additions and PATCH after backward-compatible fixes. We recommend using releases (rather than the Git tip) because they are tested more extensively, see below.

The current version 1.0 signals an increased focus on backwards compatibility. Applications using documented functionality will remain compatible with future updates that have the same major version number.
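The compatibility rule described above (same major version, with equal or newer minor and patch) can be written as a small check. This is a sketch under the usual semver interpretation; the `Version` struct and `IsCompatible` function are hypothetical names for illustration and are not part of Highway.

```cpp
// A MAJOR.MINOR.PATCH version triple.
struct Version {
  int major_num;
  int minor_num;
  int patch_num;
};

// Returns true if `available` can stand in for `required`: the major numbers
// must match, and (minor, patch) of `available` must be at least as new.
constexpr bool IsCompatible(const Version& required, const Version& available) {
  return available.major_num == required.major_num &&
         (available.minor_num > required.minor_num ||
          (available.minor_num == required.minor_num &&
           available.patch_num >= required.patch_num));
}
```

For example, code written against 1.0.0 would accept 1.2.0 under this rule, but not 2.0.0.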

Testing

Continuous integration tests build with a recent version of Clang (running on native x86, or QEMU for RISC-V and Arm) and MSVC 2019 (v19.28, running on native x86).

Before releases, we also test on x86 with Clang and GCC, and Armv7/8 via GCC cross-compile. See the testing process for details.

Related modules

The contrib directory contains SIMD-related utilities: an image class with aligned rows, a math library (16 functions already implemented, mostly trigonomet
