
FlagTree

FlagTree is a unified compiler for custom deep learning operations that supports multiple AI chip backends. It is forked from triton-lang/triton.


<img width="2182" height="602" alt="github+banner-20260130" src=".github/assets/banner-20260130.png" /> [中文版|English]

<div align="right"> <a href="https://www.linkedin.com/company/flagos-community" target="_blank"> <img src="./docs/assets/Linkedin.png" alt="LinkedIn" width="32" height="32" /> </a> <a href="https://www.youtube.com/@FlagOS_Official" target="_blank"> <img src="./docs/assets/youtube.png" alt="YouTube" width="32" height="32" /> </a> <a href="https://x.com/FlagOS_Official" target="_blank"> <img src="./docs/assets/x.png" alt="X" width="32" height="32" /> </a> <a href="https://www.facebook.com/flagosglobalcommunity" target="_blank"> <img src="./docs/assets/Facebook.png" alt="Facebook" width="32" height="32" /> </a> <a href="https://discord.com/invite/ubqGuFMTNE" target="_blank"> <img src="./docs/assets/discord.png" alt="Discord" width="32" height="32" /> </a> </div>

FlagTree is part of FlagOS, a fully open-source system software stack designed to unify the model–system–chip layers and foster an open and collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among AI chipset-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads.

FlagTree is an open-source project building a unified compiler for multiple AI chips. It is dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby strengthening the Triton ecosystem both upstream and downstream. Currently in its initial phase, the project aims to stay compatible with existing adaptation solutions while unifying the codebase to quickly deliver single-repository, multi-backend support. For upstream model users, it provides unified compilation across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration.

Each backend is based on a different version of Triton and therefore resides in its own protected branch. All protected branches have equal status, and CI/CD runners are provisioned for every backend listed in the table.
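The branch-per-Triton-version layout can be sketched as a small shell helper; `flagtree_branch` is a name we introduce for illustration, while the branch names themselves come from the table below (backends for Triton 3.0/3.1 live on `main`):

```shell
# Hypothetical helper: map a Triton version to the FlagTree branch name.
flagtree_branch() {
  case "$1" in
    3.0|3.1) echo "main" ;;            # main hosts the 3.0/3.1 backends
    *)       echo "triton_v$1.x" ;;    # e.g. 3.3 -> triton_v3.3.x
  esac
}

# Typical usage: clone and check out the protected branch that matches
# your target Triton version (commands shown for reference):
# git clone https://github.com/flagos-ai/FlagTree.git && cd FlagTree
# git checkout "$(flagtree_branch 3.3)"
```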

|Branch|Vendor|Backend|Triton<br>version|Build<br>from source|Source-free<br>Installation|
|:-----|:-----|:------|:----------------|:-------------------|:--------------------------|
|main|NVIDIA<br>AMD<br>x86_64 cpu<br>ILUVATAR(天数智芯)<br>Moore Threads(摩尔线程)<br>KLX<br>MetaX(沐曦股份)<br>HYGON(海光信息)|nvidia<br>amd<br>triton-shared<br>iluvatar<br>mthreads<br>xpu<br>metax<br>hcu|3.1<br>3.1<br>3.1<br>3.1<br>3.1<br>3.0<br>3.1<br>3.0|nvidia<br>amd<br>-<br>iluvatar<br>mthreads<br>xpu<br>-<br>hcu|Installation|
|triton_v3.2.x|NVIDIA<br>AMD<br>Huawei Ascend(华为昇腾)<br>Cambricon(寒武纪)|nvidia<br>amd<br>ascend<br>cambricon|3.2|nvidia<br>amd<br>ascend<br>-|Installation|
|triton_v3.3.x|NVIDIA<br>AMD<br>x86_64 cpu<br>ARM China(安谋科技)<br>Tsingmicro(清微智能)<br>Enflame(燧原)|nvidia<br>amd<br>triton-shared<br>aipu<br>tsingmicro<br>enflame|3.3|nvidia<br>amd<br>-<br>aipu<br>tsingmicro<br>enflame|Installation|
|triton_v3.4.x|NVIDIA<br>AMD<br>Sunrise(曦望芯科)|nvidia<br>amd<br>sunrise|3.4|nvidia<br>amd<br>sunrise|Installation|
|triton_v3.5.x|NVIDIA<br>AMD<br>Enflame(燧原)|nvidia<br>amd<br>enflame|3.5|nvidia<br>amd<br>enflame|Installation|
|triton_v3.6.x|NVIDIA<br>AMD|nvidia<br>amd|3.6|nvidia<br>amd|Installation|

FlagTree's extension components are currently available on selected backends:

|Branch|Backend|Triton version|Extension components|
|:-----|:------|:-------------|:-------------------|
|triton_v3.6.x|nvidia|3.6|TLE-Lite<br>TLE-Struct GPU<br>TLE-Raw<br>HINTS|
|triton_v3.2.x|ascend|3.2|TLE-Struct DSA<br>FLIR<br>HINTS|
|triton_v3.3.x|tsingmicro|3.3|TLE-Lite<br>TLE-Struct DSA<br>FLIR|
|triton_v3.3.x|aipu|3.3|FLIR<br>HINTS|

TLE (Triton Language Extensions)

Triton provides strong productivity for kernel development, but heterogeneous AI chips and deeper performance-tuning scenarios require more explicit control over distributed execution, memory access patterns, and hardware-specific primitives. TLE extends Triton in a layered way to bridge this gap while remaining compatible with existing Triton workflows.
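For readers unfamiliar with the baseline Triton model that TLE extends: Triton partitions work into fixed-size blocks, each handled by one "program" indexed by `tl.program_id`, with a mask guarding the ragged tail. A pure-Python sketch of that blocked execution model (the function name and block size are ours, for illustration only):

```python
# Sketch of Triton's blocked execution model in plain Python:
# each "program" (one grid index, analogous to tl.program_id(0))
# processes one block, and the final block is masked to stay in bounds.
def vector_add_blocked(x, y, block_size=4):
    n = len(x)
    out = [0] * n
    num_programs = (n + block_size - 1) // block_size  # ceil-divide the grid
    for pid in range(num_programs):
        start = pid * block_size
        # Equivalent of a masked load/store: only touch valid indices.
        for i in range(start, min(start + block_size, n)):
            out[i] = x[i] + y[i]
    return out
```

In a real Triton kernel the inner loop becomes a single vectorized block operation; TLE's Lite/Struct/Raw layers then expose progressively more control over how such blocks map onto a specific chip.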

Key advantages of TLE:

  • Progressive abstraction from portable usage to hardware-oriented tuning (Lite / Struct / Raw).
  • Better coverage for multi-device, architecture-specific, and backend lowering scenarios.
  • Lower migration cost from existing Triton kernels while preserving optimization headroom.

For detailed design, APIs, and examples, please refer to the TLE Wiki and TLE-Raw Wiki.

Latest News

  • 2026/03/13 Added enflame GCU400 backend integration (based on Triton 3.5), and added CI/CD.
  • 2026/01/23 Added sunrise backend integration (based on Triton 3.4), and added CI/CD.
  • 2026/01/08 Added wiki pages for the new features HINTS, TLE, and TLE-Raw.
  • 2025/12/24 Added support for pulling and installing wheels.
  • 2025/12/08 Added enflame GCU300 backend integration (based on Triton 3.3).
