Torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
Chat with LLMs Everywhere
torchchat is a small codebase showcasing the ability to run large language models (LLMs) seamlessly. With torchchat, you can run LLMs using Python, within your own (C/C++) application (desktop or server) and on iOS and Android.
[!IMPORTANT] torchchat is no longer under active development. Please see this post for more details.
Updates
February 3, 2025: torchchat has support for DeepSeek R1 Distill: 8B!
September 25, 2024: torchchat has multimodal support for Llama3.2 11B!
To try it out, finish the Installation section below, then hop over to our multimodal guide to learn more.
What can you do with torchchat?
- Run models via PyTorch / Python
- Run models on desktop/server without Python
- Run models on mobile
- Evaluate a model
Highlights
- [New!!] Multimodal Support for Llama 3.2 11B
- Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
- PyTorch-native execution with performance
- Supports popular hardware and OS
- Linux (x86)
- macOS (M1/M2/M3)
- Android (Devices that support XNNPACK)
- iOS 17+ and 8+ GB of RAM (iPhone 15 Pro+ or iPad with Apple Silicon)
- Multiple data types including: float32, float16, bfloat16
- Multiple quantization schemes
- Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)
Models
The following models are supported by torchchat and have associated aliases.
| Model | Mobile Friendly | Notes |
|------------------|---|---------------------|
|meta-llama/Meta-Llama-3.2-3B-Instruct|✅|Tuned for chat. Alias to llama3.2-3b.|
|meta-llama/Meta-Llama-3.2-3B|✅|Best for generate. Alias to llama3.2-3b-base.|
|meta-llama/Llama-Guard-3-1B|✅|Tuned for classification. Alias to llama3-1b-guard.|
|meta-llama/Meta-Llama-3.2-1B-Instruct|✅|Tuned for chat. Alias to llama3.2-1b.|
|meta-llama/Meta-Llama-3.2-1B|✅|Best for generate. Alias to llama3.2-1b-base.|
|meta-llama/Llama-3.2-11B-Vision-Instruct||Multimodal (Image + Text). Tuned for chat. Alias to llama3.2-11B.|
|meta-llama/Llama-3.2-11B-Vision||Multimodal (Image + Text). Tuned for generate. Alias to llama3.2-11B-base.|
|meta-llama/Meta-Llama-3.1-8B-Instruct|✅|Tuned for chat. Alias to llama3.1.|
|meta-llama/Meta-Llama-3.1-8B|✅|Best for generate. Alias to llama3.1-base.|
|meta-llama/Meta-Llama-3-8B-Instruct|✅|Tuned for chat. Alias to llama3.|
|meta-llama/Meta-Llama-3-8B|✅|Best for generate. Alias to llama3-base.|
|meta-llama/Llama-2-7b-chat-hf|✅|Tuned for chat. Alias to llama2.|
|meta-llama/Llama-2-13b-chat-hf||Tuned for chat. Alias to llama2-13b-chat.|
|meta-llama/Llama-2-70b-chat-hf||Tuned for chat. Alias to llama2-70b-chat.|
|meta-llama/Llama-2-7b-hf|✅|Best for generate. Alias to llama2-base.|
|meta-llama/CodeLlama-7b-Python-hf|✅|Tuned for Python and generate. Alias to codellama.|
|meta-llama/CodeLlama-34b-Python-hf|✅|Tuned for Python and generate. Alias to codellama-34b.|
|mistralai/Mistral-7B-v0.1|✅|Best for generate. Alias to mistral-7b-v01-base.|
|mistralai/Mistral-7B-Instruct-v0.1|✅|Tuned for chat. Alias to mistral-7b-v01-instruct.|
|mistralai/Mistral-7B-Instruct-v0.2|✅|Tuned for chat. Alias to mistral.|
|tinyllamas/stories15M|✅|Toy model for generate. Alias to stories15M.|
|tinyllamas/stories42M|✅|Toy model for generate. Alias to stories42M.|
|tinyllamas/stories110M|✅|Toy model for generate. Alias to stories110M.|
|openlm-research/open_llama_7b|✅|Best for generate. Alias to open-llama.|
|ibm-granite/granite-3b-code-instruct-128k|✅|Alias to granite-code and granite-code-3b.|
|ibm-granite/granite-8b-code-instruct-128k|✅|Alias to granite-code-8b.|
|ibm-granite/granite-3.0-2b-instruct|✅|Alias to granite3-2b and granite3.|
|ibm-granite/granite-3.0-8b-instruct|✅|Alias to granite3-8b.|
|ibm-granite/granite-3.1-2b-instruct|✅|Alias to granite3.1-2b and granite3.1.|
|ibm-granite/granite-3.1-8b-instruct|✅|Alias to granite3.1-8b.|
|deepseek-ai/DeepSeek-R1-Distill-Llama-8B|✅|Alias to deepseek-r1:8b.|
Installation
The following steps require that you have Python 3.10 installed.
[!TIP] torchchat uses the latest changes from various PyTorch projects, so it's highly recommended that you use a venv (via the commands below) or conda.
git clone https://github.com/pytorch/torchchat.git
cd torchchat
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
mkdir exportedModels
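As a quick smoke test of the installation, you can pull one of the tiny stories models (which needs no Hugging Face credentials) and generate from it; the individual commands are covered in detail in the sections below.

```shell
# Download a toy model and run a short generation to verify the install.
python3 torchchat.py download stories15M
python3 torchchat.py generate stories15M --prompt "Once upon a time"
```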
Commands
The interfaces of torchchat are exposed through Python Commands and Native Runners. The Python Commands are enumerated in the --help menu below, while the Native Runners are covered in their respective sections.
python3 torchchat.py --help
# Output
usage: torchchat [-h] {chat,browser,generate,export,eval,download,list,remove,where,server} ...
positional arguments:
{chat,browser,generate,export,eval,download,list,remove,where,server}
The specific command to run
chat Chat interactively with a model via the CLI
generate Generate responses from a model given a prompt
browser Chat interactively with a model in a locally hosted browser
export Export a model artifact to AOT Inductor or ExecuTorch
download Download model artifacts
list List all supported models
remove Remove downloaded model artifacts
where Return directory containing downloaded model artifacts
server [WIP] Starts a locally hosted REST server for model interaction
eval Evaluate a model via lm-eval
options:
-h, --help show this help message and exit
Python Inference (chat, generate, browser, server)
- These commands represent different flavors of performing model inference in a Python environment.
- Models are constructed either from CLI args or from loading exported artifacts.
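For example, a minimal interactive chat and a one-shot generation (assuming the llama3.1 alias has already been downloaded) look like:

```shell
# Interactive chat in the terminal
python3 torchchat.py chat llama3.1

# Single prompt/response generation
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```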
Exporting (export)
- This command generates model artifacts that are consumed by Python Inference or Native Runners.
- More information is provided in the AOT Inductor and ExecuTorch sections.
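As a sketch, export targets one of the two native backends. The output flags have changed across torchchat versions, so confirm them with python3 torchchat.py export --help before running:

```shell
# Desktop/server: AOT Inductor artifact (flag name may vary by version)
python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so

# Mobile: ExecuTorch .pte artifact
python3 torchchat.py export llama3.1 --output-pte-path exportedModels/llama3.1.pte
```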
Inventory Management (download, list, remove, where)
- These commands are used to manage and download models.
- More information is provided in the Download Weights section.
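A typical inventory workflow, sketched here with the llama3.1 alias:

```shell
python3 torchchat.py list              # list all supported models and aliases
python3 torchchat.py download llama3.1 # fetch weights (Llama models require a Hugging Face login)
python3 torchchat.py where llama3.1    # print the directory containing the downloaded artifacts
python3 torchchat.py remove llama3.1   # delete the downloaded artifacts
```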
Evaluation (eval)
- This command tests model fidelity via EleutherAI's lm_evaluation_harness.
- More information is provided in the Evaluation section.
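A minimal evaluation run might look like the following; task names follow lm_evaluation_harness, and the exact flags can be confirmed via python3 torchchat.py eval --help:

```shell
# Evaluate on a few wikitext samples (assumes llama3.1 is downloaded)
python3 torchchat.py eval llama3.1 --tasks wikitext --limit 10
```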
Download Weights
Most models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account.
Create a Hugging Face user access token as documented here with the write role.
Log into Hugging Face:
huggingface-cli login
Take a look at the available models and download one to get started.
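Once logged in, browsing and fetching a model is a two-step sketch (llama3.2-1b is one of the smaller, mobile-friendly aliases from the table above):

```shell
python3 torchchat.py list
python3 torchchat.py download llama3.2-1b
```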