# FlagTree
FlagTree is a unified compiler for custom deep learning operations that supports multiple AI chip backends. It is forked from triton-lang/triton.
<img width="2182" height="602" alt="github+banner-20260130" src=".github/assets/banner-20260130.png" />

[中文版 | English]
<div align="right">
  <a href="https://www.linkedin.com/company/flagos-community" target="_blank"> <img src="./docs/assets/Linkedin.png" alt="LinkedIn" width="32" height="32" /> </a>
  <a href="https://www.youtube.com/@FlagOS_Official" target="_blank"> <img src="./docs/assets/youtube.png" alt="YouTube" width="32" height="32" /> </a>
  <a href="https://x.com/FlagOS_Official" target="_blank"> <img src="./docs/assets/x.png" alt="X" width="32" height="32" /> </a>
  <a href="https://www.facebook.com/flagosglobalcommunity" target="_blank"> <img src="./docs/assets/Facebook.png" alt="Facebook" width="32" height="32" /> </a>
  <a href="https://discord.com/invite/ubqGuFMTNE" target="_blank"> <img src="./docs/assets/discord.png" alt="Discord" width="32" height="32" /> </a>
</div>

FlagTree is part of FlagOS, a fully open-source system software stack designed to unify the model–system–chip layers and foster an open and collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among AI chipset-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads.
FlagTree is an open-source project building a unified compiler for multiple AI chips, dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby strengthening the Triton ecosystem both upstream and downstream. Currently in its initial phase, the project aims to stay compatible with existing adaptation solutions while unifying the codebase to rapidly deliver single-repository, multi-backend support. For upstream model users, it provides unified compilation across multiple backends; for downstream chip manufacturers, it offers reference examples of Triton ecosystem integration.
Each backend is based on a different version of Triton and therefore resides in its own protected branch. All protected branches have equal status, and CI/CD runners are provisioned for every backend listed in the table below.
|Branch|Vendor|Backend|Triton<br>version|Build<br>from source|Source-free<br>installation|
|:-----|:-----|:------|:----------------|:-------------------|:--------------------------|
|main|NVIDIA<br>AMD<br>x86_64 cpu<br>ILUVATAR(天数智芯)<br>Moore Threads(摩尔线程)<br>KLX<br>MetaX(沐曦股份)<br>HYGON(海光信息)|nvidia<br>amd<br>triton-shared<br>iluvatar<br>mthreads<br>xpu<br>metax<br>hcu|3.1<br>3.1<br>3.1<br>3.1<br>3.1<br>3.0<br>3.1<br>3.0|nvidia<br>amd<br>-<br>iluvatar<br>mthreads<br>xpu<br>-<br>hcu|Installation|
|triton_v3.2.x|NVIDIA<br>AMD<br>Huawei Ascend(华为昇腾)<br>Cambricon(寒武纪)|nvidia<br>amd<br>ascend<br>cambricon|3.2|nvidia<br>amd<br>ascend<br>-|Installation|
|triton_v3.3.x|NVIDIA<br>AMD<br>x86_64 cpu<br>ARM China(安谋科技)<br>Tsingmicro(清微智能)<br>Enflame(燧原)|nvidia<br>amd<br>triton-shared<br>aipu<br>tsingmicro<br>enflame|3.3|nvidia<br>amd<br>-<br>aipu<br>tsingmicro<br>enflame|Installation|
|triton_v3.4.x|NVIDIA<br>AMD<br>Sunrise(曦望芯科)|nvidia<br>amd<br>sunrise|3.4|nvidia<br>amd<br>sunrise|Installation|
|triton_v3.5.x|NVIDIA<br>AMD<br>Enflame(燧原)|nvidia<br>amd<br>enflame|3.5|nvidia<br>amd<br>enflame|Installation|
|triton_v3.6.x|NVIDIA<br>AMD|nvidia<br>amd|3.6|nvidia<br>amd|Installation|
FlagTree's extension components are currently available on the following backends:
|Branch|Backend|Triton version|Extension components|
|:-----|:------|:-------------|:-------------------|
|triton_v3.6.x|nvidia|3.6|TLE-Lite<br>TLE-Struct GPU<br>TLE-Raw<br>HINTS|
|triton_v3.2.x|ascend|3.2|TLE-Struct DSA<br>FLIR<br>HINTS|
|triton_v3.3.x|tsingmicro|3.3|TLE-Lite<br>TLE-Struct DSA<br>FLIR|
|triton_v3.3.x|aipu|3.3|FLIR<br>HINTS|
## TLE (Triton Language Extensions)
Triton provides strong productivity for kernel development, but heterogeneous AI chips and deeper performance tuning scenarios need more explicit control over distributed execution, memory access patterns, and hardware-specific primitives. TLE extends Triton in a layered way to bridge this gap while keeping compatibility with existing Triton workflows.
Key advantages of TLE:
- Progressive abstraction from portable usage to hardware-oriented tuning (Lite/Struct/Raw).
- Better coverage for multi-device, architecture-specific, and backend lowering scenarios.
- Lower migration cost from existing Triton kernels while preserving optimization headroom.
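TLE's own APIs vary by layer and backend; for context on the programming model it extends, the sketch below is a plain-Python emulation of the blocked, masked execution pattern used by Triton kernels (one "program" per tile, a mask guarding the tail). The `vector_add` function, `block_size` parameter, and loop structure are illustrative assumptions for this README, not TLE or Triton API.

```python
# Plain-Python emulation of Triton's blocked, masked execution model.
# Each "program" (cf. tl.program_id) handles one BLOCK_SIZE tile of the
# input; the mask disables out-of-bounds lanes in the final partial tile.

def vector_add(x, y, block_size=4):
    n = len(x)
    out = [0.0] * n
    # Grid size: ceil(n / block_size), analogous to triton.cdiv.
    num_programs = (n + block_size - 1) // block_size
    for pid in range(num_programs):
        # Per-program element offsets, analogous to tl.arange + pid * BLOCK_SIZE.
        offsets = [pid * block_size + i for i in range(block_size)]
        # Boundary mask, analogous to the mask argument of tl.load / tl.store.
        mask = [o < n for o in offsets]
        for o, valid in zip(offsets, mask):
            if valid:
                out[o] = x[o] + y[o]
    return out
```

In real Triton each tile is processed as a vectorized block on the accelerator rather than a scalar loop; TLE's layers then expose progressively more control over how such tiles map onto distributed execution and memory.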
For detailed design, APIs, and examples, please refer to the TLE Wiki and TLE-Raw Wiki.
## Latest News
- 2026/03/13 Added enflame GCU400 backend integration (based on Triton 3.5), and added CI/CD.
- 2026/01/23 Added sunrise backend integration (based on Triton 3.4), and added CI/CD.
- 2026/01/08 Added wiki pages for the new features HINTS, TLE, and TLE-Raw.
- 2025/12/24 Added support for pulling and installing prebuilt wheels.
- 2025/12/08 Added enflame GCU300 backend integration (based on Triton 3.3), and added CI/CD.