Ztachip
Open-source software/hardware platform for building edge AI solutions deployed on FPGA or custom ASIC hardware.
Introduction
Ztachip is a Multicore, Data-Aware, Embedded RISC-V AI Accelerator for Edge Inferencing running on low-end FPGA devices or custom ASIC.
On many vision/AI tasks, ztachip can be 20-50x faster than a non-accelerated RISC-V implementation. ztachip also performs better than a RISC-V core equipped with the vector extension.
An innovative tensor processor is implemented in hardware to accelerate a wide range of tasks, from common vision tasks such as edge detection, optical flow, motion detection, and color conversion to executing TensorFlow AI models. This is one key difference between ztachip and other accelerators, which tend to accelerate only a narrow range of applications (for example, convolutional neural networks only).
A new tensor programming paradigm is introduced to allow programmers to leverage the massive processing/data parallelism enabled by the ztachip tensor processor.

Features
Hardware
Ztachip consists of the following functional units, tied via an AXI bus to a VexRiscv CPU, DRAM, and other peripherals:
- The Mcore, a Scheduling Processor
- A Dataplane, to stream the next data and instructions to the Tensor Engine
- A Scratch-Pad Memory to temporarily hold data
- A Stream Processor to manage data IO
- A Tensor Engine with 28 Pcores that can be configured to act like a systolic array to perform in-memory compute. Each Pcore contains a Scalar and a Vector ALU, with 16 threads of execution on private memory.
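As a quick sanity check on the parallelism these figures imply, the sketch below multiplies the Pcore count by the thread count. It assumes the 16 threads are per Pcore, which is how the feature list reads; the variable names are illustrative only.

```shell
# Figures taken from the hardware feature list above.
PCORES=28
THREADS_PER_PCORE=16   # assumption: the 16 threads are per Pcore

# Total hardware threads available to the Tensor Engine under that assumption.
TOTAL_THREADS=$((PCORES * THREADS_PER_PCORE))
echo "hardware threads: ${TOTAL_THREADS}"
```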
Software
The software provided consists of
- Ztachip DSL C-like compiler
- AI vision libraries
- Application examples
- Micropython port and examples
Demo
Documentation
Code structure
.
├── Documentation Overview on HW/SW and programmer's guide for ztachip, pcore, visionai and tensor
├── HW Hardware
│ ├── examples Reference Design: Integration of Vexriscv, Ztachip, DDR3, VGA, Camera, LEDs & Buttons
│ ├── platform Memory IP dependencies for different FPGA synthesis (e.g. Xilinx, Altera) or ASIC
│ ├── simulation RTL Simulation
│ └── src RTL of Ztachip's top design, Scalar/Vector ALU, Dataplane, Pcore, SoC integration etc
├── LICENSE.md
├── micropython Micropython Support
│ ├── examples edge_detection, image_classification, motion_detect, object_detect, point_of_interest etc
│ ├── micropython micropython
│ └── ztachip_port ztachip micropython port
├── README.md
├── SW Software
│ ├── apps AI kernel libraries of canny edge detector, harris corner, neural nets, optical flow etc
│ ├── base C runtime zero, Ztachip application libraries and other utilities
│ ├── compiler Ztachip C-like DSL compiler that generates instructions for the tensor processor
│ ├── fs File for data inference to be downloaded together with the build image
│ ├── linker.ld linker script for Ztachip
│ ├── makefile Main project makefile
│ ├── makefile.kernels Kernel makefile
│ ├── makefile.sim Makefile to test Kernels
│ ├── sim C source to test kernels
│ └── src SW Main (visionai and unit test entry points), SoC drivers and Zta's micropython API
│ This is a good place to learn how to use ztachip's prebuilt vision and AI stack.
└── tools openocd and vexriscv interface descriptions
In HW/platform, a generic implementation is also provided for the simulation environment. Any FPGA/ASIC can be supported with an appropriate implementation of this wrapper layer. Choose the sub-folder that corresponds to your FPGA target.
In SW/apps, many prebuilt acceleration functions give programmers a fast path to leveraging ztachip acceleration. This folder is also a good place to learn how to program your own custom acceleration functions.
SW build procedure
Several demos are available that demonstrate various capabilities of ztachip. Choose one of the 3 demos described below to build.
Prerequisites (Ubuntu)
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev python3-pip
pip3 install numpy
Download and build RISCV tool chain
The build below takes a long time.
export PATH=/opt/riscv/bin:$PATH
git clone https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=/opt/riscv --with-arch=rv32im --with-abi=ilp32
sudo make
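Once the build finishes, it can help to confirm that the cross tools are reachable before moving on. The sketch below is a minimal check, not part of the ztachip repository. It assumes the toolchain installs its binaries with the `riscv32-unknown-elf-` prefix, which is the usual result of configuring riscv-gnu-toolchain for an rv32 target; adjust the names if your install differs. The helper `have_tool` is a hypothetical name.

```shell
# Succeeds if the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

export PATH=/opt/riscv/bin:$PATH

# Tool names are an assumption based on the rv32 configure options above.
for tool in riscv32-unknown-elf-gcc riscv32-unknown-elf-objcopy; do
  if have_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool (check that /opt/riscv/bin exists and the build completed)"
  fi
done
```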
Download ztachip
git clone https://github.com/ztachip/ztachip.git
Build procedure for demo #1 - AI+Vision
This demo shows many vision and AI capabilities using a native C/C++ library interface.
This demo is shown in this video
export PATH=/opt/riscv/bin:$PATH
cd ztachip
cd SW/compiler
make clean all
cd ../fs
python3 bin2c.py
cd ..
make clean all -f makefile.kernels
make clean all
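The steps above can be wrapped in one shell function so that a failure in any step stops the build instead of continuing with stale outputs. This is a convenience sketch, not part of the ztachip repository: the function name `build_demo1` and its optional directory argument are hypothetical, while the commands inside are the ones listed above.

```shell
# Run the demo #1 build steps in order, stopping at the first failure.
# $1: path to the ztachip checkout (defaults to ./ztachip).
build_demo1() {
  ztachip_dir="${1:-ztachip}"
  (
    # The subshell keeps the PATH change and directory changes local.
    export PATH=/opt/riscv/bin:$PATH
    cd "$ztachip_dir/SW/compiler" &&
    make clean all &&
    cd ../fs &&
    python3 bin2c.py &&
    cd .. &&
    make clean all -f makefile.kernels &&
    make clean all
  )
}

# Example use:
#   build_demo1 "$HOME/ztachip" || echo "demo #1 build failed"
```

Chaining the steps with `&&` means the function's exit status reflects the first step that failed, which makes it usable in scripts and CI jobs.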
Build procedure for demo #2 - AI+Vision+Micropython
This demo is similar to demo #1 except that the program uses a Python programming interface.
This demo is shown in this video
You must first complete the build procedure for demo #1 above, then follow the micropython build below.
git clone https://github.com/micropython/micropython.git
cd micropython/ports
cp -avr <ztachip installation folder>/micropython/ztachip_port .
cd ztachip_port
export PATH=/opt/riscv/bin:$PATH
export ZTACHIP=<ztachip installation folder>
make clean
make
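Since the port build depends on `ZTACHIP` pointing at a real checkout, a small check before running `make` can catch a mistyped path early. The sketch below is an illustration only: `check_ztachip_dir` is a hypothetical helper, and the two folders it looks for (`SW` and `micropython/ztachip_port`) are taken from the code structure listed earlier.

```shell
# Succeeds if $1 looks like a ztachip checkout, judged by two folders
# from the repository layout described in this README.
check_ztachip_dir() {
  [ -n "$1" ] && [ -d "$1/SW" ] && [ -d "$1/micropython/ztachip_port" ]
}

if check_ztachip_dir "${ZTACHIP:-}"; then
  echo "ZTACHIP looks valid: $ZTACHIP"
else
  echo "ZTACHIP is unset or does not point at a ztachip checkout"
fi
```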
Build procedure for demo #3 - LLM chatbot
This demo runs an LLM chatbot using the SmolLM2 model. SmolLM2 is based on the Llama architecture but was trained by the HuggingFace team.
Update the following variable in SW/makefile
LLM_TEST=yes
Then follow the same build procedure as demo #1.
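If you prefer to script the makefile change rather than edit it by hand, the following sketch rewrites the `LLM_TEST` assignment in place. It assumes the variable is assigned on a line of its own beginning with `LLM_TEST=` (optionally with spaces before the `=`); if SW/makefile uses a different form, edit the file manually. The function name `set_llm_test` is hypothetical.

```shell
# Set LLM_TEST in the given makefile.
# $1: path to the makefile, $2: new value (defaults to "yes").
set_llm_test() {
  makefile="$1"
  value="${2:-yes}"
  tmp="$(mktemp)" || return 1
  # Rewrite only lines that assign LLM_TEST; other lines pass through unchanged.
  sed "s/^LLM_TEST[[:space:]]*=.*/LLM_TEST=${value}/" "$makefile" > "$tmp" &&
    cat "$tmp" > "$makefile"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example use from the ztachip checkout:
#   set_llm_test SW/makefile yes
#   grep '^LLM_TEST' SW/makefile
```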
Quantizing LLM model required by demo #3
Demo #3 requires a quantized LLM model to be prepared. Follow the steps below.
- Download SmolLM2-135M-Instruct from HuggingFace
git clone git@hf.co:HuggingFaceTB/SmolLM2-135M-Instruct
- Install llama.cpp
- From the llama.cpp installation, convert the downloaded model to GGUF format (FP32). GGUF is the model format used by the popular Ollama inferencing engine.
cd <llama_cpp-install-folder>
python convert_hf_to_gguf.py <model-download-folder>/SmolLM2-135M-Instruct --outfile SmolLM2-135M-Instruct.gguf --outtype f32
- Quantize the model to ztachip ZUF format.
export PATH=/opt/riscv/bin:$PATH
cd ztachip/SW
make clean all -f makefile.quant
./build/quant ZTA Q4 SmolLM2-135M-Instruct.gguf SMOLLM2.ZUF
- SMOLLM2.ZUF is transferred from the PC to the FPGA board over Ethernet. A TFTP server must be running on a PC connected to the Arty board by Ethernet. The PC's Ethernet interface is expected to be configured with the IP address 10.10.10.10.
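Before powering up the board, you can confirm that the PC actually has the expected address. The sketch below is a minimal check for Linux hosts that have the `ip` utility from iproute2; `has_addr` is a hypothetical helper that reads the output of `ip -o -4 addr show` on standard input. It does not configure the interface or the TFTP server.

```shell
# Reads `ip -o -4 addr show` output on stdin.
# Succeeds if the IPv4 address in $1 is assigned to some interface.
has_addr() {
  # -F matches the text literally, so the dots are not regex wildcards;
  # the trailing slash stops 10.10.10.10 from matching 10.10.10.100.
  grep -qF "inet $1/"
}

if ip -o -4 addr show 2>/dev/null | has_addr 10.10.10.10; then
  echo "10.10.10.10 is configured on this PC"
else
  echo "10.10.10.10 not found; configure the Ethernet interface first"
fi
```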
FPGA build procedure
- Download the free Xilinx Vivado WebPACK edition.
- Create the project file, build the FPGA image, and program it to flash as described in the FPGA build procedure.
Running the demos.
The following demos are demonstrated on the ArtyA7-100T FPGA development board.
- Image classification with TensorFlow's MobileNet
- Object detection with TensorFlow's SSD-MobileNet
- Edge detection using the Canny algorithm
- Point-of-interest detection using the Harris-Corner algorithm
- Motion detection
- Multi-tasking with object detection, edge detection, Harris-Corner, and motion detection running at the same time
To run the demo, press button0 to switch between different AI/vision applications.
Preparing hardware
The reference design example requires the hardware components below...
If the camera module shown above is not available, you may substitute any other OV7670 module. This is a popular low-end camera, so it should be widely available.
Attach the VGA and camera modules to the Arty-A7 board according to the picture below

Connect the camera module to the Arty board according to the picture below

Open serial port
If you are runni