CHaiDNN

HLS-based Deep Neural Network Accelerator Library for Xilinx UltraScale+ MPSoCs

<table style="width:100%"> <tr> <th width="100%" colspan="6"><img src="https://www.xilinx.com/content/dam/xilinx/imgs/press/media-kits/corporate/xilinx-logo.png" width="30%"/><h1>CHaiDNN-v2</h1> </th> </tr> <tr> <th rowspan="6" width="17%">Analysis and Eval</th> </tr> <tr> <td align="center" colspan="2"><a href="./docs/SUPPORTED_LAYERS.md">Supported Layers</a></td> <td align="center" colspan="2"><a href="./docs/PERFORMANCE_SNAPSHOT.md">Performance/Resource Utilization</a></td> </tr> <tr></tr> <tr> <td align="center" colspan="4"><a href="./docs/PERFORMANCE_EVAL.md">Performance Eval</a></td> </tr> <tr></tr> <tr></tr> <tr><th colspan="6"></th></tr> <tr></tr> <tr> <th rowspan="7" width="17%">Design and Development</th> </tr> <tr> <td align="center"><a href="./docs/API.md">API Reference</a></td> <td align="center"><a href="./docs/QUANTIZATION.md">Quantization User Guide for CHaiDNN</a></td> <td align="center"><a href="./docs/MODELZOO.md">Model Zoo</a></td> <td align="center"><a href="./docs/RUN_NEW_NETWORK.md">Running Inference on new Network</a></td> </tr> <tr></tr> <tr> <td align="center"><a href="./docs/BUILD_USING_SDX_GUI.md">Creating SDx GUI Project</a></td> <td align="center"><a href="./docs/CONFIGURABLE_PARAMS.md">Configurable Parameters</a></td> <td align="center"><a href="./docs/CUSTOM_PLATFORM_GEN.md">Custom Platform Generation</a></td> <td align="center"><a href="./docs/SOFTWARE_LAYER_PLUGIN.md">Software Layer Plugin</a></td> </tr> <tr></tr> <tr> <td align="center" colspan="2"><a href="https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_4/ug1027-sdsoc-user-guide.pdf">SDSoC Environment User Guide</a></td> <td align="center" colspan="2"><a href="./docs/HW_SW_PARTITIONING.md">Hardware-Software Partitioning for Performance</a></td> </tr> </table>

Introduction

CHaiDNN is a Xilinx Deep Neural Network library for accelerating deep neural networks on Xilinx UltraScale+ MPSoCs. It is designed for maximum compute efficiency with the 6-bit integer data type and also supports the 8-bit integer data type.

The design goal of CHaiDNN is to achieve the best accuracy with maximum performance. Inference on CHaiDNN runs in the fixed-point domain for better performance. All feature maps and trained parameters are converted from single precision to fixed point based on precision parameters specified by the user. These precision parameters can vary considerably depending on the network, the dataset, or even across layers in the same network. The accuracy of a network depends on the precision parameters used to represent its feature maps and trained parameters. Well-crafted precision parameters are expected to give accuracy similar to that of the single-precision model.
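As a minimal sketch of the float-to-fixed-point conversion described above, the example below quantizes a few values with round-to-nearest and saturation, then measures the worst-case error. The parameter names `bit_width` and `frac_bits`, the sample values, and the rounding scheme are assumptions made for illustration; they are not CHaiDNN's actual precision parameters or quantizer, which are documented in the Quantization User Guide.

```python
def to_fixed_point(values, bit_width, frac_bits):
    """Quantize floats to signed fixed-point integer codes with saturation.

    bit_width and frac_bits are hypothetical precision parameters:
    total signed bits and the number of fractional bits.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (bit_width - 1))
    q_max = (1 << (bit_width - 1)) - 1
    # Scale, round to the nearest integer, then clamp to the representable range.
    return [max(q_min, min(q_max, round(v * scale))) for v in values]


def from_fixed_point(codes, frac_bits):
    """Map integer codes back to floats to inspect the quantization error."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.26, 1.9, -3.0]  # illustrative values, not from a real model
for bit_width, frac_bits in [(8, 6), (6, 4)]:
    codes = to_fixed_point(weights, bit_width, frac_bits)
    restored = from_fixed_point(codes, frac_bits)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"int{bit_width}, {frac_bits} fractional bits: codes={codes}, worst error={worst:.4f}")
```

Values outside the representable range (here `-3.0`) saturate, which is one reason the choice of fractional bits per layer affects accuracy.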

What's new in CHaiDNN-v2

  • 4x GOPS compared to CHaiDNN-v1 (2017.4) (Performance numbers)

  • 2x MAC on DSPs at int6

  • Double-pumped DSPs, allowing the DSPs to be clocked at twice the core clock (some configs can go up to 350/700 MHz)

  • Introducing DietChai, a miniature version of CHai for smaller MPSoC/Zynq devices

  • 128, 256, 512, 1024 DSP design configs verified for ZU9

  • Support for URAM

  • 128, 256, 512 DSP configs verified for ZU7

  • ModelZoo of 6 networks at int8 and int6 precision

  • Support for two quantization modes - Dynamic fixed point and Xilinx Quantizer

  • Enhanced API to enable better hardware-software partitioning for users

  • Support for software custom layer plug-ins

  • Fully Connected layers on CPU

  • More documentation
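The DSP-related items above (two MACs per DSP at int6, and DSPs clocked at twice the core clock) can be related through a back-of-envelope peak-throughput estimate. This is a sketch under stated assumptions: every DSP is busy on every DSP clock cycle, one MAC counts as two operations, and int8 uses one MAC per DSP per cycle. The function name and the single-pumped baseline are hypothetical, the result is a theoretical upper bound rather than a measured number, and the source does not say how its 4x GOPS figure was derived.

```python
def peak_gops(num_dsps, dsp_clock_mhz, macs_per_dsp):
    """Theoretical peak throughput in GOPS (assumption: full DSP utilization)."""
    ops_per_mac = 2  # one multiply plus one accumulate
    return num_dsps * macs_per_dsp * ops_per_mac * dsp_clock_mhz / 1000.0


# 1024-DSP config with a 250 MHz core clock and double-pumped 500 MHz DSP clock.
int8_peak = peak_gops(1024, 500, macs_per_dsp=1)
int6_peak = peak_gops(1024, 500, macs_per_dsp=2)

# Hypothetical baseline: same DSP count, single-pumped at 250 MHz, one MAC per DSP.
baseline = peak_gops(1024, 250, macs_per_dsp=1)

print(f"int8 peak: {int8_peak:.0f} GOPS")
print(f"int6 peak: {int6_peak:.0f} GOPS")
print(f"int6 double-pumped vs. baseline: {int6_peak / baseline:.1f}x")
```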

Performance Benchmarks (fps)

<table> <tr> <th>Network</th> <th>Xilinx CHai w/ 1024DSP @ 250/500MHz (Measured on <a href="https://www.xilinx.com/products/boards-and-kits/zcu104.html">ZU9</a>)</th> <th>Nvidia Jetson TX2 @ 1.3GHz*</th> </tr> <tr> <td>GoogleNet-6bit w/o FC</td> <td width="40%" align="center">220</td> <td rowspan="4" align="center">Googlenet-16FP: 201</td> </tr> <tr> <td>GoogleNet-6bit w/ FC</td> <td width="40%" align="center">207</td> </tr> <tr> <td>GoogleNet-8bit w/o FC </td> <td width="40%" align="center">151</td> </tr> <tr> <td>GoogleNet-8bit w/ FC</td> <td width="40%" align="center">145</td> </tr> <tr> <td>Alexnet-6bit w/o FC</td> <td width="40%" align="center">606</td> <td rowspan="4" align="center">Alexnet-16FP: 250</td> </tr> <tr> <td>Alexnet-6bit w/ FC</td> <td width="40%" align="center">10</td> </tr> <tr> <td>Alexnet-8bit w/o FC</td> <td width="40%" align="center">390</td> </tr> <tr> <td>Alexnet-8bit w/ FC</td> <td width="40%" align="center">10</td> </tr> </table>

<sup>* Source: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/</sup>
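To read the table as relative throughput, the sketch below divides each CHaiDNN frame rate by the Jetson TX2 figure for the same network. The numbers are copied from the table; the variable and function names are illustrative. Keep in mind that the comparison is between 6-bit or 8-bit integer inference and FP16 inference, so it is not a like-for-like precision comparison.

```python
# Frames per second copied from the benchmark table above.
chai_fps = {
    "GoogleNet-6bit w/o FC": 220,
    "GoogleNet-6bit w/ FC": 207,
    "GoogleNet-8bit w/o FC": 151,
    "GoogleNet-8bit w/ FC": 145,
    "Alexnet-6bit w/o FC": 606,
    "Alexnet-6bit w/ FC": 10,
    "Alexnet-8bit w/o FC": 390,
    "Alexnet-8bit w/ FC": 10,
}
tx2_fps = {"GoogleNet": 201, "Alexnet": 250}  # FP16 results on Jetson TX2


def relative_throughput(config):
    """Ratio of CHaiDNN fps to the TX2 fps for the same network."""
    network = config.split("-")[0]
    return chai_fps[config] / tx2_fps[network]


for config in chai_fps:
    print(f"{config}: {relative_throughput(config):.2f}x")
```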

Hardware and Software Requirements

The CHaiDNN library is designed to work with Zynq UltraScale+ MPSoCs. The library has been verified on zcu102 and zcu104 boards. Xilinx SDSoC 2018.2 Development Environment is required to work with the library.

How to Download the Repository

To get a local copy of the CHaiDNN repository, configure git-lfs and then clone this repository to the local system with the following command:

git clone https://github.com/Xilinx/CHaiDNN.git CHaiDNN

Where CHaiDNN is the name of the directory where the repository will be stored on the local system. This command needs to be executed only once to retrieve the latest version of the CHaiDNN library.
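The two steps (configure git-lfs, then clone) could be scripted roughly as follows. This is a sketch, assuming the `git-lfs` package is already installed and that "configure git-lfs" means running the standard `git lfs install` setup command. The helper names `clone_steps` and `run_steps` are hypothetical. The script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/Xilinx/CHaiDNN.git"


def clone_steps(dest="CHaiDNN"):
    """Return the commands to set up Git LFS and clone the repository into dest."""
    return [
        ["git", "lfs", "install"],         # one-time Git LFS setup for the current user
        ["git", "clone", REPO_URL, dest],  # clone command from the instructions above
    ]


def run_steps(steps, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_steps(clone_steps())  # dry run: prints the commands without executing them
```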

<details> <summary><big><strong>GitHub Repository Structure</strong></big></summary>
CHaiDNN/
|
|-- CONTRIBUTING.md
|-- LICENSE
|-- README.md
|-- SD_Card
|   |-- lib
|   |-- cblas
|   |-- images
|   |-- opencv
|   |-- protobuf
|   |-- zcu102
|   `-- zcu104
|-- design
|   |-- build
|   |-- conv
|   |-- deconv
|   |-- pool
|   `-- wrapper
|-- docs
|   |-- API.md
|   |-- BUILD_USING_SDX_GUI.md
|   |-- CONFIGURABLE_PARAMS.md
|   |-- CUSTOM_PLATFORM_GEN.md
|   |-- HW_SW_PARTITIONING.md
|   |-- MODELZOO.md
|   |-- PERFORMANCE_SNAPSHOT.md
|   |-- QUANTIZATION.md
|   |-- RUN_NEW_NETWORK.md
|   |-- SOFTWARE_LAYER_PLUGIN.md
|   |-- SUPPORTED_LAYERS.md
|   `-- images
|-- software
|   |-- bufmgmt
|   |-- checkers
|   |-- common
|   |-- custom
|   |-- example
|   |-- imageread
|   |-- include
|   |-- init
|   |-- interface
|   |-- scheduler
|   |-- scripts
|   |-- swkernels
|   `-- xtract
`-- tools
    |-- SETUP_TOOLS.md
    `-- tools.zip

</details>

Run Inference

<details> <summary><strong>Using Pre-built binaries</strong></summary> <a name="Pre-built"></a>

To run inference on example networks, follow these steps:

  1. Download the example network, 6-bit GoogleNet with the Xilinx quantization scheme. More networks are available as part of the ModelZoo.

  2. Place the downloaded and unzipped contents in the "SD_Card/models" directory. Create the SD_Card/models directory if it does not already exist.

  3. Copy the required contents of the "SD_Card" folder onto an SD card:

    • opencv
    • protobuf
    • cblas
    • images
    • bit-stream, boot loader, lib & executables (either from SD_Card/zcu102 or SD_Card/zcu104)
  4. Insert the SD-Card and power ON the board.

    :pushpin: NOTE: A serial port emulator (Teraterm/Minicom) is required to send user commands to the board.

  5. Attach a USB-UART cable from the board to the host PC. Set the UART serial port to:

    Baud rate: 115200
    Data: 8 bit
    Parity: none
    Stop: 1 bit
    Flow control: none
    
  6. After the boot sequence, set the LD_LIBRARY_PATH environment variable.

    export OPENBLAS_NUM_THREADS=2
    export LD_LIBRARY_PATH=lib/:opencv/arm64/lib/:protobuf/arm64/lib:cblas/arm64/lib
    
  7. Create a folder "out" inside the network directory to save the outputs.

    cd /mnt
    mkdir models/<network>/out

  8. Execute the "*.elf" file to run inference.

    • The format for running these example networks is described below:
      ./<example network>.elf <quantization scheme> <bit width> <img1_path> <img2_path>
      
    • For GoogleNet 6-bit inference with the Xilinx quantization scheme, execute the following:
      ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
      
  9. Sync after execution

    cd /
    sync
    umount /mnt
    
  10. The output is written to a text file inside the respective output folder.

    Example: models/<network>/out
    

:pushpin: NOTE: Failing to run sync might corrupt the file system and cause a crash on subsequent runs.

:pushpin: NOTE: For running inference on a new network, please follow the instructions in Run new Network using CHaiDNN.
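The command format from step 8 can be captured in a small helper that assembles the argument list in the documented order. This is only an illustrative sketch: `build_inference_command` is a hypothetical name, and the check that the bit width is 6 or 8 is an assumption based on the data types named in the Introduction, not on the behavior of the executables.

```python
def build_inference_command(network, scheme, bit_width, img1_path, img2_path):
    """Assemble ./<network>.elf <scheme> <bit width> <img1> <img2> as an argv list."""
    if bit_width not in (6, 8):  # assumption: only the two documented integer widths
        raise ValueError(f"unsupported bit width: {bit_width}")
    return [f"./{network}.elf", scheme, str(bit_width), img1_path, img2_path]


# Reproduces the GoogleNet 6-bit example from step 8.
cmd = build_inference_command(
    "googlenet", "Xilinx", 6, "images/camel.jpg", "images/goldfish.JPEG"
)
print(" ".join(cmd))
```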

</details> <details> <summary><strong>Build from Source</strong></summary>

CHaiDNN can be built using Makefiles or using the SDx IDE. The steps below describe how to build CHaiDNN using Makefiles. For steps to build using the SDx IDE, see the instructions in Build using SDx IDE.

<details> <summary><strong>Build CHaiDNN Hardware</strong></summary>

Follow these steps to build the design for zcu102 (a ZU9 device-based board):

  1. Generate a custom platform with 1x and 2x clocks using the steps described here. In CHaiDNN-v2, the DSPs operate at twice the frequency of the rest of the core.

  2. Go
