CHaiDNN
HLS based Deep Neural Network Accelerator Library for Xilinx Ultrascale+ MPSoCs
Introduction
CHaiDNN is a Xilinx Deep Neural Network library for the acceleration of deep neural networks on Xilinx UltraScale+ MPSoCs. It is designed for maximum compute efficiency with the 6-bit integer data type, and it also supports the 8-bit integer data type.
The design goal of CHaiDNN is to achieve best accuracy with maximum performance. The inference on CHaiDNN works in fixed point domain for better performance. All the feature maps and trained parameters are converted from single precision to fixed point based on the precision parameters specified by the user. The precision parameters can vary a lot depending upon the network, datasets, or even across layers in the same network. Accuracy of a network depends on the precision parameters used to represent the feature maps and trained parameters. Well-crafted precision parameters are expected to give accuracy similar to accuracy obtained from a single precision model.
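The float-to-fixed-point conversion described above can be sketched as follows. This is a minimal illustration of dynamic fixed point, assuming a single precision parameter (the number of fractional bits) per tensor; the function names and the example values are illustrative, not CHaiDNN's actual API.

```python
import numpy as np

def quantize(x, bit_width, frac_bits):
    """Quantize float values to signed fixed point (illustrative sketch).

    bit_width : total bits (e.g. 6 or 8, as supported by CHaiDNN)
    frac_bits : precision parameter -- number of fractional bits
    """
    scale = 2.0 ** frac_bits
    lo = -(2 ** (bit_width - 1))       # e.g. -32 for 6-bit
    hi = 2 ** (bit_width - 1) - 1      # e.g. +31 for 6-bit
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def dequantize(q, frac_bits):
    """Map fixed-point values back to floats for comparison."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: 6-bit quantization with 4 fractional bits (step size 1/16).
weights = np.array([0.50, -0.26, 0.742], dtype=np.float32)
q = quantize(weights, bit_width=6, frac_bits=4)
print(q, dequantize(q, 4))
```

Note how accuracy hinges on `frac_bits`: too few fractional bits lose resolution, too many saturate large values, which is why well-crafted precision parameters matter per layer.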
What's new in CHaiDNN-v2
- 4x GOPS compared to CHaiDNN-v1 (2017.4) (Performance numbers)
- 2x MAC on DSPs at int6
- Double-pumped DSPs, allowing the DSPs to be clocked at twice the core clock (some configs can go up to 350/700 MHz)
- Introducing DietChai, a miniature version of CHai for smaller MPSoC/Zynq devices
- 128, 256, 512, and 1024 DSP design configs verified for ZU9
- Support for URAM
- 128, 256, and 512 DSP configs verified for ZU7
- ModelZoo of six networks at int8 and int6 precision
- Support for two quantization modes: dynamic fixed point and Xilinx Quantizer
- Enhanced API to enable better hardware/software partitioning for users
- Support for software custom layer plug-ins
- Fully connected layers on CPU
- More documentation
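The double-pumped DSPs and the 2x MAC packing at int6 combine to give the headline 4x throughput. A back-of-envelope sketch, assuming a 250/500 MHz config with 1024 DSPs (figures from this README) and a v1-style baseline of one MAC per DSP per cycle at the core clock (an assumption for illustration):

```python
def peak_gops(num_dsps, dsp_clock_mhz, macs_per_dsp_per_cycle):
    """Rough peak throughput estimate; 1 MAC counts as 2 ops. Illustrative only."""
    return num_dsps * macs_per_dsp_per_cycle * 2 * dsp_clock_mhz * 1e6 / 1e9

# Double-pumped DSPs run at twice the 250 MHz core clock (500 MHz),
# and int6 packs 2 MACs per DSP per cycle.
v2 = peak_gops(1024, 500, 2)   # CHaiDNN-v2-style config
v1 = peak_gops(1024, 250, 1)   # assumed v1 baseline: single-pumped, 1 MAC/DSP
print(v2, v1, v2 / v1)         # 2x from double pumping times 2x from MAC packing
```

Actual sustained throughput depends on memory bandwidth and layer shapes, so measured speedups will differ from this peak estimate.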
Performance Benchmarks (fps)
<table> <tr> <th>Network</th> <th>Xilinx CHai w/ 1024DSP @ 250/500MHz (Measured on <a href="https://www.xilinx.com/products/boards-and-kits/zcu104.html">ZU9</a>)</th> <th>Nvidia Jetson TX2 @ 1.3GHz*</th> </tr> <tr> <td>GoogleNet-6bit w/o FC</td> <td width="40%" align="center">220</td> <td rowspan="4" align="center">Googlenet-16FP: 201</td> </tr> <tr> <td>GoogleNet-6bit w/ FC</td> <td width="40%" align="center">207</td> </tr> <tr> <td>GoogleNet-8bit w/o FC </td> <td width="40%" align="center">151</td> </tr> <tr> <td>GoogleNet-8bit w/ FC</td> <td width="40%" align="center">145</td> </tr> <tr> <td>Alexnet-6bit w/o FC</td> <td width="40%" align="center">606</td> <td rowspan="4" align="center">Alexnet-16FP: 250</td> </tr> <tr> <td>Alexnet-6bit w/ FC</td> <td width="40%" align="center">10</td> </tr> <tr> <td>Alexnet-8bit w/o FC</td> <td width="40%" align="center">390</td> </tr> <tr> <td>Alexnet-8bit w/ FC</td> <td width="40%" align="center">10</td> </tr> </table><sup>* Source: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/</sup>
Hardware and Software Requirements
The CHaiDNN library is designed to work with Zynq UltraScale+ MPSoCs. The library has been verified on zcu102 and zcu104 boards. Xilinx SDSoC 2018.2 Development Environment is required to work with the library.
How to Download the Repository
To get a local copy of the CHaiDNN repository, configure git-lfs and then clone this repository to the local system with the following command:

```sh
git clone https://github.com/Xilinx/CHaiDNN.git CHaiDNN
```

where `CHaiDNN` is the name of the directory in which the repository will be stored on the local system. This command needs to be executed only once to retrieve the latest version of the CHaiDNN library.
<details> <summary><strong>Repository Structure</strong></summary>

CHaiDNN/
|
|-- CONTRIBUTING.md
|-- LICENSE
|-- README.md
|-- SD_Card
| |-- lib
| |-- cblas
| |-- images
| |-- opencv
| |-- protobuf
| |-- zcu102
| `-- zcu104
|-- design
| |-- build
| |-- conv
| |-- deconv
| |-- pool
| `-- wrapper
|-- docs
| |-- API.md
| |-- BUILD_USING_SDX_GUI.md
| |-- CONFIGURABLE_PARAMS.md
| |-- CUSTOM_PLATFORM_GEN.md
| |-- HW_SW_PARTITIONING.md
| |-- MODELZOO.md
| |-- PERFORMANCE_SNAPSHOT.md
| |-- QUANTIZATION.md
| |-- RUN_NEW_NETWORK.md
| |-- SOFTWARE_LAYER_PLUGIN.md
| |-- SUPPORTED_LAYERS.md
| `-- images
|-- software
| |-- bufmgmt
| |-- checkers
| |-- common
| |-- custom
| |-- example
| |-- imageread
| |-- include
| |-- init
| |-- interface
| |-- scheduler
| |-- scripts
| |-- swkernels
| `-- xtract
`-- tools
|-- SETUP_TOOLS.md
`-- tools.zip
</details>
Run Inference
<details> <summary><strong>Using Pre-built binaries</strong></summary> <a name="Pre-built"></a>To run inference on example networks, follow these steps:

- Download the example network: 6-bit GoogleNet with the Xilinx Quantization scheme. More networks are available as part of the ModelZoo.

- Place the downloaded and unzipped contents in the `SD_Card/models` directory. Create the `SD_Card/models` directory if not present already.

- Copy the required contents of the `SD_Card` folder onto an SD card:
    - opencv
    - protobuf
    - cblas
    - images
    - bit-stream, boot loader, lib & executables (either from `SD_Card/zcu102` or `SD_Card/zcu104`)

- Insert the SD card and power ON the board.

    :pushpin: NOTE: A serial port emulator (TeraTerm/Minicom) is required to interface the user commands to the board.

- Attach a USB-UART cable from the board to the host PC. Set the UART serial port to Baud rate: 115200, Data: 8 bit, Parity: none, Stop: 1 bit, Flow control: none.

- After the boot sequence, set the `LD_LIBRARY_PATH` environment variable:

    ```sh
    export OPENBLAS_NUM_THREADS=2
    export LD_LIBRARY_PATH=lib/:opencv/arm64/lib/:protobuf/arm64/lib:cblas/arm64/lib
    ```

- Create a folder `out` inside the network directory to save the outputs:

    ```sh
    cd /mnt
    mkdir models/<network>/out
    ```

- Execute the `*.elf` file to run inference. The format for running these example networks is described below:

    ```sh
    ./<example network>.elf <quantization scheme> <bit width> <img1_path> <img2_path>
    ```

    For GoogleNet 6-bit inference with the Xilinx quantization scheme, execute the following:

    ```sh
    ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
    ```

- Sync after execution:

    ```sh
    cd /
    sync
    umount /mnt
    ```

- Output will be written into a text file inside the respective output folder, e.g. `models/<network>/out`.

    :pushpin: NOTE: Failing to run `sync` might corrupt the file system and cause a crash on subsequent runs.
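The example-network invocation format above can be captured in a small helper that composes the command line. This is a sketch for illustration: the helper name is made up, and only the `Xilinx 6` invocation is taken verbatim from this README; the `"Dynamic"` scheme literal is an assumption based on the two quantization modes listed earlier.

```python
def chai_cmd(network, scheme, bit_width, *images):
    """Compose an example-network command line of the form:
    ./<network>.elf <quantization scheme> <bit width> <img paths...>
    (Hypothetical helper; argument spellings other than "Xilinx" are assumptions.)
    """
    if scheme not in ("Xilinx", "Dynamic"):
        raise ValueError("unknown quantization scheme: " + scheme)
    if bit_width not in (6, 8):
        raise ValueError("CHaiDNN supports 6- and 8-bit precision")
    return " ".join(["./%s.elf" % network, scheme, str(bit_width)] + list(images))

# The GoogleNet example from the steps above:
cmd = chai_cmd("googlenet", "Xilinx", 6, "images/camel.jpg", "images/goldfish.JPEG")
print(cmd)  # ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
```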
</details> <details> <summary><strong>Build from Source</strong></summary>:pushpin: NOTE: For running inference on a new network, please follow the instructions in Run new Network using CHaiDNN.
CHaiDNN can be built using Makefiles or using the SDx IDE. The steps below describe how to build CHaiDNN using Makefiles. For steps to build using the SDx IDE, see the instructions in Build using SDx IDE.
<details> <summary><strong>Build CHaiDNN Hardware</strong></summary>Follow these steps to build the design for zcu102 (a ZU9-device-based board):
- Generate a custom platform with 1x and 2x clocks using the steps described here. With Chai-v2, the DSPs operate at twice the frequency of the rest of the core.
- Go