Torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
Chat with LLMs Everywhere
torchchat is a small codebase showcasing the ability to run large language models (LLMs) seamlessly. With torchchat, you can run LLMs using Python, within your own (C/C++) application (desktop or server) and on iOS and Android.
[!IMPORTANT] torchchat is no longer under active development. Please see this post for more details.
Updates
February 3, 2025: torchchat has support for DeepSeek R1 Distill: 8B!
September 25, 2024: torchchat has multimodal support for Llama3.2 11B!
To try it out, finish the Installation section below, then hop over to our multimodal guide to learn more.
What can you do with torchchat?
- Run models via PyTorch / Python
- Run models on desktop/server without Python
- Run models on mobile
- Evaluate a model
Highlights
- [New!!] Multimodal Support for Llama 3.2 11B
- Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
- PyTorch-native execution with performance
- Supports popular hardware and OS
- Linux (x86)
- macOS (M1/M2/M3)
- Android (Devices that support XNNPACK)
- iOS 17+ and 8+ GB of RAM (iPhone 15 Pro+ or iPad with Apple Silicon)
- Multiple data types including: float32, float16, bfloat16
- Multiple quantization schemes
- Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)
Models
The following models are supported by torchchat and have associated aliases.
| Model | Mobile Friendly | Notes |
|------------------|---|---------------------|
|meta-llama/Meta-Llama-3.2-3B-Instruct|✅|Tuned for chat. Alias to llama3.2-3b.|
|meta-llama/Meta-Llama-3.2-3B|✅|Best for generate. Alias to llama3.2-3b-base.|
|meta-llama/Llama-Guard-3-1B|✅|Tuned for classification. Alias to llama3-1b-guard.|
|meta-llama/Meta-Llama-3.2-1B-Instruct|✅|Tuned for chat. Alias to llama3.2-1b.|
|meta-llama/Meta-Llama-3.2-1B|✅|Best for generate. Alias to llama3.2-1b-base.|
|meta-llama/Llama-3.2-11B-Vision-Instruct||Multimodal (Image + Text). Tuned for chat. Alias to llama3.2-11B.|
|meta-llama/Llama-3.2-11B-Vision||Multimodal (Image + Text). Tuned for generate. Alias to llama3.2-11B-base.|
|meta-llama/Meta-Llama-3.1-8B-Instruct|✅|Tuned for chat. Alias to llama3.1.|
|meta-llama/Meta-Llama-3.1-8B|✅|Best for generate. Alias to llama3.1-base.|
|meta-llama/Meta-Llama-3-8B-Instruct|✅|Tuned for chat. Alias to llama3.|
|meta-llama/Meta-Llama-3-8B|✅|Best for generate. Alias to llama3-base.|
|meta-llama/Llama-2-7b-chat-hf|✅|Tuned for chat. Alias to llama2.|
|meta-llama/Llama-2-13b-chat-hf||Tuned for chat. Alias to llama2-13b-chat.|
|meta-llama/Llama-2-70b-chat-hf||Tuned for chat. Alias to llama2-70b-chat.|
|meta-llama/Llama-2-7b-hf|✅|Best for generate. Alias to llama2-base.|
|meta-llama/CodeLlama-7b-Python-hf|✅|Tuned for Python and generate. Alias to codellama.|
|meta-llama/CodeLlama-34b-Python-hf|✅|Tuned for Python and generate. Alias to codellama-34b.|
|mistralai/Mistral-7B-v0.1|✅|Best for generate. Alias to mistral-7b-v01-base.|
|mistralai/Mistral-7B-Instruct-v0.1|✅|Tuned for chat. Alias to mistral-7b-v01-instruct.|
|mistralai/Mistral-7B-Instruct-v0.2|✅|Tuned for chat. Alias to mistral.|
|tinyllamas/stories15M|✅|Toy model for generate. Alias to stories15M.|
|tinyllamas/stories42M|✅|Toy model for generate. Alias to stories42M.|
|tinyllamas/stories110M|✅|Toy model for generate. Alias to stories110M.|
|openlm-research/open_llama_7b|✅|Best for generate. Alias to open-llama.|
|ibm-granite/granite-3b-code-instruct-128k|✅|Alias to granite-code and granite-code-3b.|
|ibm-granite/granite-8b-code-instruct-128k|✅|Alias to granite-code-8b.|
|ibm-granite/granite-3.0-2b-instruct|✅|Alias to granite3-2b and granite3.|
|ibm-granite/granite-3.0-8b-instruct|✅|Alias to granite3-8b.|
|ibm-granite/granite-3.1-2b-instruct|✅|Alias to granite3.1-2b and granite3.1.|
|ibm-granite/granite-3.1-8b-instruct|✅|Alias to granite3.1-8b.|
|deepseek-ai/DeepSeek-R1-Distill-Llama-8B|✅|Alias to deepseek-r1:8b.|
Installation
The following steps require that you have Python 3.10 installed.
[!TIP] torchchat uses the latest changes from various PyTorch projects, so it's highly recommended that you use a venv (via the commands below) or conda.
git clone https://github.com/pytorch/torchchat.git
cd torchchat
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
mkdir exportedModels
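As a quick smoke test of the installation, you can pull one of the tiny stories models (which needs no Hugging Face credentials) and generate from it; the individual commands are covered in detail in the sections below.

```shell
# Download a toy model and run a short generation to verify the install.
python3 torchchat.py download stories15M
python3 torchchat.py generate stories15M --prompt "Once upon a time"
```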
Commands
The interfaces of torchchat are exposed through Python Commands and Native Runners. The Python Commands are enumerated in the --help menu below, while the Native Runners are covered in their respective sections.
python3 torchchat.py --help
# Output
usage: torchchat [-h] {chat,browser,generate,export,eval,download,list,remove,where,server} ...
positional arguments:
{chat,browser,generate,export,eval,download,list,remove,where,server}
The specific command to run
chat Chat interactively with a model via the CLI
generate Generate responses from a model given a prompt
browser Chat interactively with a model in a locally hosted browser
export Export a model artifact to AOT Inductor or ExecuTorch
download Download model artifacts
list List all supported models
remove Remove downloaded model artifacts
where Return directory containing downloaded model artifacts
server [WIP] Starts a locally hosted REST server for model interaction
eval Evaluate a model via lm-eval
options:
-h, --help show this help message and exit
Python Inference (chat, generate, browser, server)
- These commands represent different flavors of performing model inference in a Python environment.
- Models are constructed either from CLI args or from loading exported artifacts.
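For example, a minimal interactive chat and a one-shot generation (assuming the llama3.1 alias has already been downloaded) look like:

```shell
# Interactive chat in the terminal
python3 torchchat.py chat llama3.1

# Single prompt/response generation
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```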
Exporting (export)
- This command generates model artifacts that are consumed by Python Inference or Native Runners.
- More information is provided in the AOT Inductor and ExecuTorch sections.
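As a sketch, export targets one of the two native backends. The output flags have changed across torchchat versions, so confirm them with python3 torchchat.py export --help before running:

```shell
# Desktop/server: AOT Inductor artifact (flag name may vary by version)
python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so

# Mobile: ExecuTorch .pte artifact
python3 torchchat.py export llama3.1 --output-pte-path exportedModels/llama3.1.pte
```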
Inventory Management (download, list, remove, where)
- These commands are used to manage and download models.
- More information is provided in the Download Weights section.
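A typical inventory workflow, sketched here with the llama3.1 alias:

```shell
python3 torchchat.py list              # list all supported models and aliases
python3 torchchat.py download llama3.1 # fetch weights (Llama models require a Hugging Face login)
python3 torchchat.py where llama3.1    # print the directory containing the downloaded artifacts
python3 torchchat.py remove llama3.1   # delete the downloaded artifacts
```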
Evaluation (eval)
- This command tests model fidelity via EleutherAI's lm_evaluation_harness.
- More information is provided in the Evaluation section.
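A minimal evaluation run might look like the following; task names follow lm_evaluation_harness, and the exact flags can be confirmed via python3 torchchat.py eval --help:

```shell
# Evaluate on a few wikitext samples (assumes llama3.1 is downloaded)
python3 torchchat.py eval llama3.1 --tasks wikitext --limit 10
```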
Download Weights
Most models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account.
Create a Hugging Face user access token as documented here with the write role.
Log into Hugging Face:
huggingface-cli login
Take a look at the available models and download one to get started.
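Once logged in, browsing and fetching a model is a two-step sketch (llama3.2-1b is one of the smaller, mobile-friendly aliases from the table above):

```shell
python3 torchchat.py list
python3 torchchat.py download llama3.2-1b
```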