oneAPI Deep Neural Network Library (oneDNN)
oneAPI Deep Neural Network Library (oneDNN) is an open-source, cross-platform performance library of basic building blocks for deep learning applications. The oneDNN project is part of the UXL Foundation and is an implementation of the oneAPI specification for the oneDNN component.
The library is optimized for Intel 64/AMD64 architecture based processors, Arm(R) 64-bit Architecture (AArch64)-based processors, and Intel Graphics. oneDNN has experimental support for the following architectures: NVIDIA* GPU, AMD* GPU, OpenPOWER* Power ISA (PPC64), IBMz* (s390x), and RISC-V.
oneDNN is intended for deep learning applications and framework developers interested in improving application performance on CPUs and GPUs.
Deep learning practitioners should use one of the applications enabled with oneDNN:
- Apache SINGA
- DeepLearning4J*
- Flashlight*
- llama.cpp
- ONNX Runtime
- OpenNMT CTranslate2
- OpenVINO(TM) toolkit
- PaddlePaddle*
- PyTorch*
- Tensorflow*
Table of Contents
- Documentation
- System Requirements
- Installation
- Validated Configurations
- Governance
- Support
- Contributing
- License
- Security
- Trademark Information
Documentation
- oneDNN Developer Guide and Reference explains the programming model, supported functionality, implementation details, and includes annotated examples.
- API Reference provides a comprehensive reference of the library API.
- Release Notes explain the new features, performance optimizations, and improvements implemented in each version of oneDNN.
System Requirements
oneDNN supports platforms based on the following architectures:
- Intel 64 or AMD64
- Arm 64-bit Architecture (AArch64)
- OpenPOWER / IBM Power ISA
- IBMz z/Architecture (s390x)
- RISC-V 64-bit (RV64)
WARNING
Power ISA (PPC64), IBMz (s390x), and RISC-V (RV64) support is experimental with limited testing validation.
The library is optimized for the following CPUs:
- Intel 64/AMD64 architecture
- Intel Atom(R) processor (at least Intel SSE4.1 support is required)
- Intel Core(TM) processor (at least Intel SSE4.1 support is required)
- Intel Xeon(R) processor E3, E5, and E7 family (formerly Sandy Bridge, Ivy Bridge, Haswell, and Broadwell)
- Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, Ice Lake, Sapphire Rapids, and Emerald Rapids)
- Intel Xeon CPU Max Series (formerly Sapphire Rapids HBM)
- Intel Core Ultra processors (formerly Meteor Lake, Arrow Lake, Lunar Lake, and Panther Lake)
- Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)
- future Intel Core processor with Intel AVX10.2 instruction set support (code name Nova Lake)
- future Intel Xeon processor with Intel AVX10.2 instruction set support (code name Diamond Rapids)
- AArch64 architecture
- Arm Neoverse(TM) N1 and V1 processors
On a CPU based on Intel 64 or on AMD64 architecture, oneDNN detects the instruction set architecture (ISA) at runtime and uses just-in-time (JIT) code generation to deploy the code optimized for the latest supported ISA. Future ISAs may have initial support in the library disabled by default and require the use of run-time controls to enable them. See CPU dispatcher control for more details.
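As a concrete example of the run-time controls mentioned above, the highest ISA the JIT generator may target can typically be capped through the ONEDNN_MAX_CPU_ISA environment variable; a sketch, where ./my_app is a hypothetical application binary:

```shell
# Cap JIT code generation at AVX2, even on machines that support newer ISAs.
# Useful for reproducibility checks or for opting in to preview ISA support.
ONEDNN_MAX_CPU_ISA=AVX2 ./my_app
```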
WARNING
On macOS, applications that use oneDNN may need to request special entitlements if they use the hardened runtime. See the Linking Guide for more details.
The library is optimized for the following GPUs:
- Intel discrete GPUs:
- Intel Iris Xe MAX Graphics (formerly DG1)
- Intel Arc(TM) A-Series Graphics (formerly Alchemist)
- Intel Data Center GPU Flex Series (formerly Arctic Sound)
- Intel Data Center GPU Max Series (formerly Ponte Vecchio)
- Intel Arc B-Series Graphics and Intel Arc Pro B-Series Graphics (formerly Battlemage)
- Intel Graphics integrated with:
- 11th-14th Generation Intel Core Processors
- Intel Graphics for Intel Core Ultra Series 1 processors (formerly Meteor Lake)
- Intel Graphics for Intel Core Ultra Series 2 processors (formerly Arrow Lake and Lunar Lake)
- Intel Graphics for Intel Core Ultra Series 3 processors (formerly Panther Lake)
Requirements for Building from Source
oneDNN supports systems meeting the following requirements:
- Operating system with Intel 64/AMD64, AArch64, PPC64, or s390x architecture support
- C++ compiler with C++11 standard support
- CMake 3.13 or later
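Given those requirements, a typical out-of-source CMake build might look like the following sketch (repository URL and default generator assumed):

```shell
# Fetch the sources and configure an out-of-source Release build
git clone https://github.com/uxlfoundation/oneDNN.git
cd oneDNN
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
# Compile with all available cores
cmake --build build --parallel
```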
The following tools are required to build oneDNN documentation:
- Doxygen 1.8.5 or later
- Doxyrest 2.1.2 or later
- Sphinx 6.2.1 or later
- sphinx-book-theme 1.1.4 or later
- sphinx-copybutton 0.5.2 or later
- graphviz 2.40.1
Configurations of CPU and GPU engines may introduce additional build-time dependencies.
CPU Engine
The oneDNN CPU engine is used to execute primitives on Intel 64/AMD64 processors, 64-bit Arm Architecture (AArch64) processors, 64-bit Power ISA (PPC64) processors, IBMz (s390x) processors, and compatible devices.
The CPU engine is built by default but can be disabled at build time by setting
ONEDNN_CPU_RUNTIME to NONE. In this case, the GPU engine must be enabled.
The CPU engine can be configured to use the OpenMP, TBB, or SYCL runtime.
The following additional requirements apply:
- OpenMP runtime requires C++ compiler with OpenMP 2.0 or later standard support
- TBB runtime requires Threading Building Blocks (TBB) 2017 or later.
- SYCL runtime requires Intel oneAPI DPC++/C++ Compiler
Some implementations rely on OpenMP 4.0 SIMD extensions. For the best performance results on Intel Architecture Processors, we recommend using the Intel C++ Compiler.
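Selecting among these runtimes happens at configure time, typically via the ONEDNN_CPU_RUNTIME CMake option; a sketch assuming the option values named above:

```shell
# OpenMP runtime (the default)
cmake -S . -B build-omp -DONEDNN_CPU_RUNTIME=OMP

# TBB runtime; the build must be able to locate the TBB installation
cmake -S . -B build-tbb -DONEDNN_CPU_RUNTIME=TBB
```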
On CPUs based on the Arm AArch64 architecture, the oneDNN CPU engine can be built with Arm Compute Library (ACL) integration. ACL is an open-source library for machine learning applications that provides AArch64-optimized implementations of core functions. This functionality currently requires that ACL is downloaded and built separately. See the [Build from Source] section of the Developer Guide for details. The minimum supported version of ACL is 52.4.0.
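With a separately built ACL, the integration is typically enabled by pointing the oneDNN build at the ACL installation; a sketch, assuming the ACL_ROOT_DIR variable and the ONEDNN_AARCH64_USE_ACL option recognized by the build system:

```shell
# ACL_ROOT_DIR points at a separately downloaded and built Arm Compute Library
export ACL_ROOT_DIR=/path/to/ComputeLibrary
cmake -S . -B build -DONEDNN_AARCH64_USE_ACL=ON
cmake --build build --parallel
```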
GPU Engine
The oneDNN GPU engine is used to execute primitives on various accelerators, including Intel integrated and discrete GPUs, NVIDIA GPUs, AMD GPUs, and other devices supporting the SYCL programming language. The GPU engine is disabled in the default build configuration and can be enabled by setting the ONEDNN_GPU_RUNTIME build option to a value other than NONE. The target accelerator vendor must be selected at build time using the ONEDNN_GPU_VENDOR build option.
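A sketch of enabling the GPU engine at configure time, assuming the runtime and vendor values documented for these build options:

```shell
# Intel GPU via the OpenCL runtime
cmake -S . -B build-ocl -DONEDNN_GPU_RUNTIME=OCL

# NVIDIA GPU via the SYCL runtime (requires a SYCL-capable compiler such as icpx)
cmake -S . -B build-sycl -DONEDNN_GPU_RUNTIME=SYCL -DONEDNN_GPU_VENDOR=NVIDIA \
      -DCMAKE_CXX_COMPILER=icpx
```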
WARNING
Linux will reset the GPU when a kernel's runtime exceeds several seconds. You can prevent this behavior by [disabling hangcheck] for the Intel GPU driver. Windows has a built-in [timeout detection and recovery] mechanism that results in similar behavior; you can prevent it by increasing the [TdrDelay] value.
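For example, with the Linux i915 driver, hangcheck can usually be toggled through a kernel module parameter; a sketch (the sysfs path is an assumption that may differ by driver and kernel version, and the change does not persist across reboots):

```shell
# Disable GPU hangcheck for the i915 driver (requires root)
echo N | sudo tee /sys/module/i915/parameters/enable_hangcheck
```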
The following additional requirements apply for Intel integrated and discrete GPUs:
- With OpenCL(TM) runtime:
- OpenCL SDK (with OpenCL 1.2 support)
- Intel Graphics Driver with support for OpenCL C 2.0, Intel subgroups support, and USM extensions support
- With SYCL runtime:
- Intel oneAPI DPC++/C++ Compiler
- OpenCL SDK (with OpenCL 3.0 support)
- [oneAPI Level Zero]
- Intel Graphics Driver with support for OpenCL C 2.0, Intel subgroups support, and USM extensions support