LiteRT
LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via efficient conversion, runtime, and optimization.
<p align="center"> <img src="./g3doc/sources/litert_logo.png" alt="LiteRT Logo"/> </p>
📖 Get Started | 🤝 Contributing | 📜 License | 🛡 Security Policy | 📄 Documentation
Build Status
| Nightly Builds | Continuous Builds | Other Builds |
| :------------: | :---------------: | :----------: |
| (status badges) | (status badges) | (status badges) |
Description
LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI.
With advanced GPU and NPU acceleration, LiteRT delivers superior ML & GenAI performance, making on-device inference easier than ever.
🌟 What's New
- 🆕 New LiteRT Compiled Model API: Streamline development with automated accelerator selection, true async execution, and efficient I/O buffer handling.
  - Automated accelerator selection instead of explicit delegate creation
  - Async execution for faster overall execution time
  - Easy NPU runtime and model distribution
  - Efficient I/O buffer handling
- 🤖 Unified NPU Acceleration: Seamless access to NPUs from major chipset providers with a consistent developer experience. LiteRT NPU, previously under an early access program, is now available to all users: https://ai.google.dev/edge/litert/next/npu
- ⚡ Best-in-class GPU Performance: State-of-the-art GPU acceleration for on-device ML. The new buffer interoperability enables zero-copy and minimizes latency across various GPU buffer types.
- 🧠 Superior Generative AI inference: The simplest integration with the best performance for GenAI models.
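The automated accelerator selection above can be pictured as a preference-ordered fallback: try the most capable accelerator the device offers, and fall back to CPU. The sketch below is purely illustrative — `select_accelerator` and the preference order are hypothetical names, not the actual LiteRT API or implementation:

```python
# Illustrative sketch only: the Compiled Model API performs this kind of
# selection internally; the function and names here are hypothetical.
def select_accelerator(available, preference=("NPU", "GPU", "CPU")):
    """Pick the most preferred accelerator the device actually offers."""
    for accel in preference:
        if accel in available:
            return accel
    return "CPU"  # CPU is always the safe fallback


print(select_accelerator({"GPU", "CPU"}))  # -> GPU
```

The contrast with the classic path is that the runtime, not the application, owns this decision — no explicit delegate creation per backend.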
💻 Platforms Supported
LiteRT is designed for cross-platform deployment on a wide range of hardware.
| Platform | CPU Support | GPU Support | NPU Support |
| ---------- | ----------- | --------------------- | ----------------------------------------------------------------- |
| 🤖 Android | ✅ | ✅ OpenCL<br>✅ OpenGL | Google Tensor*<br>✅ Qualcomm<br>✅ MediaTek<br>S.LSI*<br>Intel* |
| 🍎 iOS | ✅ | ✅ Metal | ANE* |
| 🐧 Linux | ✅ | ✅ WebGPU | N/A |
| 🍎 macOS | ✅ | ✅ WebGPU<br>✅ Metal | ANE* |
| 💻 Windows | ✅ | ✅ WebGPU | Intel* |
| 🌐 Web | ✅ | ✅ WebGPU | Coming soon |
| 🧩 IoT | ✅ | ✅ WebGPU | Broadcom*<br>Raspberry Pi* |
*Coming soon
Model Coverage and Performance
Coming soon...
🏁 Installation
For a comprehensive guide to setting up your application with LiteRT, see the Get Started guide.
You can build LiteRT from source:
- Start a Docker daemon.
- Run `build_with_docker.sh` under `docker_build/`.

The script automatically creates a Linux Docker image, which allows you to build artifacts for Linux and Android (via cross-compilation). See the CMake build instructions and Bazel build instructions for more information on how to build runtime libraries with the Docker container.
For more information about using docker interactive shell or building different
targets, please refer to docker_build/README.md.
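Concretely, the two build steps above look like this, assuming the repository root as the working directory and a Docker daemon already running:

```shell
# Build LiteRT artifacts inside the provided Docker container.
# Requires a running Docker daemon; paths follow the repo layout described above.
cd docker_build
./build_with_docker.sh
```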
🗺 Choose Your Adventure
Every developer's path is different. Here are a few common journeys to help you get started based on your goals:
1. 🔄 I have a PyTorch model...
- Goal: Convert a model from PyTorch to run on LiteRT.
- Path 1 (classic models): Use the LiteRT Torch Converter to transform your PyTorch model into the `.tflite` format, and use the AI Edge Quantizer to optimize the model for performance under resource constraints. From there, deploy it using the standard LiteRT runtime.
- Path 2 (LLMs): Use the LiteRT Generative Torch API to reauthor and convert your PyTorch LLMs into the `.tflite` format, and deploy them using LiteRT LM.
2. 🌱 I'm new to on-device ML...
- Goal: Run a pre-trained model (like image segmentation) in a mobile app for the first time.
- Path 1 (beginner dev): Follow the step-by-step instructions in Android Studio to create a real-time segmentation app with CPU/GPU/NPU inference. Source code link.
- Path 2 (experienced dev): Start with the Get Started guide, find a pre-trained `.tflite` model on Kaggle Models, and use the standard LiteRT runtime to integrate it into your Android or iOS app.
3. ⚡ I need to maximize performance...
- Goal: Accelerate an existing model to run faster and more efficiently on-device.
- Path:
- Explore the LiteRT API to easily leverage hardware acceleration.
- For working with Generative AI: Dive into LiteRT LM, our specialized solution for running GenAI models.
4. 🧠 I'm working with Generative AI...
- Goal: Deploy a large language model (LLM) or diffusion model on a mobile device.
- Path: Dive into LiteRT LM, our specialized solution for running GenAI models. You'll focus on model quantization and optimizations specific to large model architectures.
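The classic-model conversion path in journey 1 can be sketched in Python. This assumes the converter is the `ai_edge_torch` pip package and follows its published `convert`/`export` pattern; the model choice and file name are placeholders — verify the exact API against the current converter docs:

```python
import torch
import torchvision

import ai_edge_torch  # LiteRT Torch converter package (assumed installed)

# Any eval-mode PyTorch model; sample inputs fix the traced input shapes.
model = torchvision.models.mobilenet_v2(weights=None).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to a LiteRT model and serialize it in the .tflite format.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("mobilenet_v2.tflite")  # placeholder output path
```

From here, the exported `.tflite` file can optionally go through the AI Edge Quantizer before deployment.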
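For journey 2, running a pre-trained `.tflite` model from Python is a quick way to sanity-check a model before mobile integration. This sketch assumes the `ai_edge_litert` pip package, whose `Interpreter` mirrors the familiar TF Lite interpreter API; `model.tflite` is a placeholder path:

```python
import numpy as np

from ai_edge_litert.interpreter import Interpreter  # LiteRT runtime (assumed installed)

interpreter = Interpreter(model_path="model.tflite")  # placeholder model path
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input of the right shape/dtype, run inference, read the result.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result.shape)
```

On Android or iOS, the same model file is loaded through the platform runtime APIs instead.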
🗺 Roadmap
Our commitment is to make LiteRT the best runtime for any on-device ML deployment. Our product strategies are:
- Expanding Hardware Acceleration: Broadening our support for NPUs and improving performance across all major hardware accelerators.
- Generative AI Optimizations: Introducing new optimizations and features specifically for the next wave of on-device generative AI models.
- Improving Developer Tools: Building better tools for debugging, profiling, and op