[DATE'2025, TCAD'2025] TeraFly: A Multi-Node FPGA-Based Accelerator Design for Efficient Cooperative Inference in LLMs


🚀 Terafly: A Multi-Node FPGA-Based Accelerator for Efficient Cooperative LLM Inference

Terafly enables high-throughput, low-latency inference of Large Language Models (LLMs) by leveraging a multi-node FPGA architecture optimized for cooperative execution.

Demo


💡 Highlight

We provide HLS kernels that can be rapidly customized for research purposes, enabling efficient experimentation and algorithm validation on FPGAs.


🔍 Overview

Terafly is designed to maximize memory bandwidth and computational efficiency on FPGA platforms—specifically targeting embedded and datacenter FPGAs like the Xilinx Alveo U50lv. It supports end-to-end LLM inference with minimal host intervention, and includes tooling for weight packing, hardware generation, and interactive demo deployment.


📚 Related Work

If you're exploring FPGA-based LLM acceleration, you might also be interested in:


⚙️ Prerequisites

To ensure compatibility, we recommend replicating our experimental environment:

| Component | Version / Configuration |
| :--- | :--- |
| OS | Ubuntu 18.04 |
| Shell | xilinx-u50lv-gen3x4-xdma-base_2 |
| XRT | 2023.2 |
| Vitis HLS & Vivado | 2023.2 |

💡 Ensure your Alveo U50lv card is properly flashed with the matching shell.


📂 Code Structure

| File/Directory | Description |
| :--- | :--- |
| `template/` | Template HLS code used by the generation framework. |
| `OPT-1.3b_optimize/` | Generated code tailored for the Vitis development flow. |
| `LLM-demo-gui/` | Files for WebUI interaction. |
| `OPT-1.3b_optimize/connectivity.cfg` | Configuration file specifying the multi-node accelerator topology. |
| `codegen.py` | Python script that specializes the template based on the configuration. |
| `OPT-1.3b.json` | Configuration file specifying performance and model parameters. |
| `weight_packer.py` | Python script that packs model weights into the Terafly memory layout. |
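For readers unfamiliar with the Vitis flow, `connectivity.cfg` files generally follow the `v++` linker config format, using `nk` to set kernel instance counts and `sp` to map kernel memory ports to HBM banks. The kernel and port names below are purely illustrative, not Terafly's actual topology:

```ini
; Illustrative Vitis-style connectivity config (names are assumptions,
; not Terafly's real kernels or bank assignments).
[connectivity]
; Instantiate two copies of a hypothetical attention kernel.
nk=attention_kernel:2:attn_0.attn_1
; Map each instance's AXI master port to a range of HBM banks.
sp=attn_0.m_axi_gmem:HBM[0:7]
sp=attn_1.m_axi_gmem:HBM[8:15]
```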


⚡ Quick Start

Follow these steps to quickly set up and run the Terafly accelerator.

1. Download Model Weights

Download the pre-packed model weights (OPT-1.3B) from the provided link: Model Weights Download (Password: bcbf).
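If you want to re-pack weights yourself rather than use the pre-packed download, the general idea behind a channel-interleaved memory layout can be sketched as follows. This is a minimal illustration under assumed names and an assumed round-robin layout, not the actual format produced by `weight_packer.py`:

```python
# Hypothetical sketch of a weight-packing step: distribute the rows of a
# weight matrix round-robin across N memory channels so each channel can
# stream a contiguous chunk. The layout is an assumption for illustration,
# not Terafly's real memory format.
import struct

def pack_weights(weights, num_channels=2):
    """Split a 2-D weight matrix (list of rows of floats) round-robin
    across channels, serializing each channel as little-endian float32."""
    channels = [[] for _ in range(num_channels)]
    for i, row in enumerate(weights):
        channels[i % num_channels].extend(row)
    # One binary blob per channel, ready to be written to per-bank files.
    return [struct.pack(f"<{len(ch)}f", *ch) for ch in channels]

if __name__ == "__main__":
    w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    blobs = pack_weights(w, num_channels=2)
    print([len(b) for b in blobs])
```

The real packer additionally has to match the burst widths and bank assignments fixed in the generated hardware, which is why the README recommends the pre-packed download.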

2. Compile and Program the FPGA

Navigate to the optimized code directory and run the compilation command. This will automatically generate the xclbin file and program your Alveo card.

```bash
cd OPT-1.3b_optimize/
make run
```

3. Run the Benchmark (lambada)

Compile and execute the host-side application to run the lambada benchmark.

  • Note: Check tokenizer_predict_eigen.cpp to verify that the code correctly loads the packed data.
```bash
cd tokenizer/
sh ./command.sh
```

4. Run the Web Demo

You can also interact with the LLM via a WebUI interface:

  1. Start the Python server (requires python==3.6):

```bash
cd LLM-demo-gui/alveo
python client-v3.py  # run under Python 3.6
```

  2. Open the web interface in your browser: LLM-demo-gui/llm-gui/web/index.html. (Open the HTML file directly in your browser to chat with the LLM.)
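Conceptually, the bridge server exposes an HTTP endpoint that the WebUI posts prompts to and that returns generated text. The sketch below stubs out the accelerator call and uses an assumed endpoint shape and port; it is not the actual `client-v3.py`:

```python
# Minimal sketch of a WebUI bridge server. The real client-v3.py talks to
# the Alveo card; here run_accelerator() is a stub, and the JSON schema,
# handler, and port are assumptions for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_accelerator(prompt):
    # Stub: the real server would dispatch the prompt to the FPGA
    # kernels and return the model's generated text.
    return "echo: " + prompt

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps({"text": run_accelerator(prompt)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo console quiet

# To serve the WebUI, run:
#   HTTPServer(("127.0.0.1", 8000), ChatHandler).serve_forever()
```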

📝 Citation

If you find Terafly or LoopLynx useful in your research or project, please cite our papers. We appreciate your interest in our work!

@ARTICLE{Terafly,
  author={Zheng, Jianing and Chen, Gang and Huang, Libo and Lou, Xin and Zheng, Wei-shi},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  title={Terafly: A Multi-Node FPGA-Based Accelerator Design for Efficient Cooperative Inference in LLMs},
  year={2025},
  volume={},
  number={},
  pages={1-1}}

@inproceedings{LoopLynx,
  author         = {Jianing Zheng and Gang Chen},
  title          = {LoopLynx: {A} Scalable Dataflow Architecture for Efficient {LLM} Inference},
  booktitle      = {Design, Automation {\&} Test in Europe Conference, {DATE} 2025, Lyon, France, March 31 - April 2, 2025},
  pages          = {1--7},
  publisher      = {{IEEE}},
  year           = {2025}}
