CHaiDNN
HLS based Deep Neural Network Accelerator Library for Xilinx Ultrascale+ MPSoCs
Introduction
CHaiDNN is a Xilinx Deep Neural Network library for the acceleration of deep neural networks on Xilinx UltraScale+ MPSoCs. It is designed for maximum compute efficiency with the 6-bit integer data type, and it also supports the 8-bit integer data type.
The design goal of CHaiDNN is to achieve best accuracy with maximum performance. The inference on CHaiDNN works in fixed point domain for better performance. All the feature maps and trained parameters are converted from single precision to fixed point based on the precision parameters specified by the user. The precision parameters can vary a lot depending upon the network, datasets, or even across layers in the same network. Accuracy of a network depends on the precision parameters used to represent the feature maps and trained parameters. Well-crafted precision parameters are expected to give accuracy similar to accuracy obtained from a single precision model.
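The float-to-fixed-point conversion described above can be sketched as follows. This is a minimal illustration of dynamic fixed point, assuming a single precision parameter (the number of fractional bits) per tensor; the function names and the example values are illustrative, not CHaiDNN's actual API.

```python
import numpy as np

def quantize(x, bit_width, frac_bits):
    """Quantize float values to signed fixed point (illustrative sketch).

    bit_width : total bits (e.g. 6 or 8, as supported by CHaiDNN)
    frac_bits : precision parameter -- number of fractional bits
    """
    scale = 2.0 ** frac_bits
    lo = -(2 ** (bit_width - 1))       # e.g. -32 for 6-bit
    hi = 2 ** (bit_width - 1) - 1      # e.g. +31 for 6-bit
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def dequantize(q, frac_bits):
    """Map fixed-point values back to floats for comparison."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: 6-bit quantization with 4 fractional bits (step size 1/16).
weights = np.array([0.50, -0.26, 0.742], dtype=np.float32)
q = quantize(weights, bit_width=6, frac_bits=4)
print(q, dequantize(q, 4))
```

Note how accuracy hinges on `frac_bits`: too few fractional bits lose resolution, too many saturate large values, which is why well-crafted precision parameters matter per layer.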
What's new in CHaiDNN-v2
- 4x GOPS compared to CHaiDNN-v1 (2017.4) (Performance numbers)
- 2x MAC on DSPs at int6
- Double-pumped DSPs, allowing the DSPs to be clocked at twice the core clock (some configs can go up to 350/700 MHz)
- Introducing DietChai, a miniature version of CHai for smaller MPSoC/Zynq devices
- 128, 256, 512, and 1024 DSP design configs verified for ZU9
- Support for URAM
- 128, 256, and 512 DSP configs verified for ZU7
- ModelZoo of six networks at int8 and int6 precision
- Support for two quantization modes: dynamic fixed point and Xilinx Quantizer
- Enhanced API to enable better hardware/software partitioning for users
- Support for software custom layer plug-ins
- Fully connected layers on CPU
- More documentation
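The double-pumped DSPs and the 2x MAC packing at int6 combine to give the headline 4x throughput. A back-of-envelope sketch, assuming a 250/500 MHz config with 1024 DSPs (figures from this README) and a v1-style baseline of one MAC per DSP per cycle at the core clock (an assumption for illustration):

```python
def peak_gops(num_dsps, dsp_clock_mhz, macs_per_dsp_per_cycle):
    """Rough peak throughput estimate; 1 MAC counts as 2 ops. Illustrative only."""
    return num_dsps * macs_per_dsp_per_cycle * 2 * dsp_clock_mhz * 1e6 / 1e9

# Double-pumped DSPs run at twice the 250 MHz core clock (500 MHz),
# and int6 packs 2 MACs per DSP per cycle.
v2 = peak_gops(1024, 500, 2)   # CHaiDNN-v2-style config
v1 = peak_gops(1024, 250, 1)   # assumed v1 baseline: single-pumped, 1 MAC/DSP
print(v2, v1, v2 / v1)         # 2x from double pumping times 2x from MAC packing
```

Actual sustained throughput depends on memory bandwidth and layer shapes, so measured speedups will differ from this peak estimate.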
Performance Benchmarks (fps)
<table> <tr> <th>Network</th> <th>Xilinx CHai w/ 1024DSP @ 250/500MHz (Measured on <a href="https://www.xilinx.com/products/boards-and-kits/zcu104.html">ZU9</a>)</th> <th>Nvidia Jetson TX2 @ 1.3GHz*</th> </tr> <tr> <td>GoogleNet-6bit w/o FC</td> <td width="40%" align="center">220</td> <td rowspan="4" align="center">Googlenet-16FP: 201</td> </tr> <tr> <td>GoogleNet-6bit w/ FC</td> <td width="40%" align="center">207</td> </tr> <tr> <td>GoogleNet-8bit w/o FC </td> <td width="40%" align="center">151</td> </tr> <tr> <td>GoogleNet-8bit w/ FC</td> <td width="40%" align="center">145</td> </tr> <tr> <td>Alexnet-6bit w/o FC</td> <td width="40%" align="center">606</td> <td rowspan="4" align="center">Alexnet-16FP: 250</td> </tr> <tr> <td>Alexnet-6bit w/ FC</td> <td width="40%" align="center">10</td> </tr> <tr> <td>Alexnet-8bit w/o FC</td> <td width="40%" align="center">390</td> </tr> <tr> <td>Alexnet-8bit w/ FC</td> <td width="40%" align="center">10</td> </tr> </table><sup>* Source: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/</sup>
Hardware and Software Requirements
The CHaiDNN library is designed to work with Zynq UltraScale+ MPSoCs. The library has been verified on zcu102 and zcu104 boards. Xilinx SDSoC 2018.2 Development Environment is required to work with the library.
How to Download the Repository
To get a local copy of the CHaiDNN repository, configure git-lfs and then clone this repository to the local system with the following command:

```sh
git clone https://github.com/Xilinx/CHaiDNN.git CHaiDNN
```

where `CHaiDNN` is the name of the directory in which the repository will be stored on the local system. This command needs to be executed only once to retrieve the latest version of the CHaiDNN library.
<details> <summary><strong>Repository Structure</strong></summary>

CHaiDNN/
|
|-- CONTRIBUTING.md
|-- LICENSE
|-- README.md
|-- SD_Card
| |-- lib
| |-- cblas
| |-- images
| |-- opencv
| |-- protobuf
| |-- zcu102
| `-- zcu104
|-- design
| |-- build
| |-- conv
| |-- deconv
| |-- pool
| `-- wrapper
|-- docs
| |-- API.md
| |-- BUILD_USING_SDX_GUI.md
| |-- CONFIGURABLE_PARAMS.md
| |-- CUSTOM_PLATFORM_GEN.md
| |-- HW_SW_PARTITIONING.md
| |-- MODELZOO.md
| |-- PERFORMANCE_SNAPSHOT.md
| |-- QUANTIZATION.md
| |-- RUN_NEW_NETWORK.md
| |-- SOFTWARE_LAYER_PLUGIN.md
| |-- SUPPORTED_LAYERS.md
| `-- images
|-- software
| |-- bufmgmt
| |-- checkers
| |-- common
| |-- custom
| |-- example
| |-- imageread
| |-- include
| |-- init
| |-- interface
| |-- scheduler
| |-- scripts
| |-- swkernels
| `-- xtract
`-- tools
|-- SETUP_TOOLS.md
`-- tools.zip
</details>
Run Inference
<details> <summary><strong>Using Pre-built binaries</strong></summary> <a name="Pre-built"></a>To run inference on example networks, follow these steps:

- Download the example network: 6-bit GoogleNet with the Xilinx Quantization scheme. More networks are available as part of the ModelZoo.

- Place the downloaded and unzipped contents in the `SD_Card/models` directory. Create the `SD_Card/models` directory if not present already.

- Copy the required contents of the `SD_Card` folder onto an SD card:
    - opencv
    - protobuf
    - cblas
    - images
    - bit-stream, boot loader, lib & executables (either from `SD_Card/zcu102` or `SD_Card/zcu104`)

- Insert the SD card and power ON the board.

    :pushpin: NOTE: A serial port emulator (TeraTerm/Minicom) is required to interface the user commands to the board.

- Attach a USB-UART cable from the board to the host PC. Set the UART serial port to Baud rate: 115200, Data: 8 bit, Parity: none, Stop: 1 bit, Flow control: none.

- After the boot sequence, set the `LD_LIBRARY_PATH` environment variable:

    ```sh
    export OPENBLAS_NUM_THREADS=2
    export LD_LIBRARY_PATH=lib/:opencv/arm64/lib/:protobuf/arm64/lib:cblas/arm64/lib
    ```

- Create a folder `out` inside the network directory to save the outputs:

    ```sh
    cd /mnt
    mkdir models/<network>/out
    ```

- Execute the `*.elf` file to run inference. The format for running these example networks is described below:

    ```sh
    ./<example network>.elf <quantization scheme> <bit width> <img1_path> <img2_path>
    ```

    For GoogleNet 6-bit inference with the Xilinx quantization scheme, execute the following:

    ```sh
    ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
    ```

- Sync after execution:

    ```sh
    cd /
    sync
    umount /mnt
    ```

- Output will be written into a text file inside the respective output folder, e.g. `models/<network>/out`.

    :pushpin: NOTE: Failing to run `sync` might corrupt the file system and cause a crash on subsequent runs.
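The example-network invocation format above can be captured in a small helper that composes the command line. This is a sketch for illustration: the helper name is made up, and only the `Xilinx 6` invocation is taken verbatim from this README; the `"Dynamic"` scheme literal is an assumption based on the two quantization modes listed earlier.

```python
def chai_cmd(network, scheme, bit_width, *images):
    """Compose an example-network command line of the form:
    ./<network>.elf <quantization scheme> <bit width> <img paths...>
    (Hypothetical helper; argument spellings other than "Xilinx" are assumptions.)
    """
    if scheme not in ("Xilinx", "Dynamic"):
        raise ValueError("unknown quantization scheme: " + scheme)
    if bit_width not in (6, 8):
        raise ValueError("CHaiDNN supports 6- and 8-bit precision")
    return " ".join(["./%s.elf" % network, scheme, str(bit_width)] + list(images))

# The GoogleNet example from the steps above:
cmd = chai_cmd("googlenet", "Xilinx", 6, "images/camel.jpg", "images/goldfish.JPEG")
print(cmd)  # ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
```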
</details> <details> <summary><strong>Build from Source</strong></summary>:pushpin: NOTE: For running inference on a new network, please follow the instructions in Run new Network using CHaiDNN.
CHaiDNN can be built using Makefiles or using the SDx IDE. The steps below describe how to build CHaiDNN using Makefiles. For steps to build using the SDx IDE, see the instructions in Build using SDx IDE.
<details> <summary><strong>Build CHaiDNN Hardware</strong></summary>Follow these steps to build the design for zcu102 (a ZU9-device-based board):
- Generate a custom platform with 1x and 2x clocks using the steps described here. With Chai-v2, the DSPs operate at twice the frequency of the rest of the core.
- Go