
Torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Chat with LLMs Everywhere

torchchat is a small codebase showcasing the ability to run large language models (LLMs) seamlessly. With torchchat, you can run LLMs using Python, within your own (C/C++) application (desktop or server) and on iOS and Android.

[!IMPORTANT] torchchat is no longer under active development. Please see this post for more details.

Updates

February 3, 2025: torchchat has support for DeepSeek R1 Distill: 8B!

September 25, 2024: torchchat has multimodal support for Llama3.2 11B!

To try it out, finish the Installation section below, then hop over to our multimodal guide to learn more.

What can you do with torchchat?

Highlights

  • [New!!] Multimodal Support for Llama 3.2 11B
  • Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
  • PyTorch-native execution with performance
  • Supports popular hardware and OS
    • Linux (x86)
    • Mac OS (M1/M2/M3)
    • Android (Devices that support XNNPACK)
    • iOS 17+ and 8+ GB of RAM (iPhone 15 Pro+ or iPad with Apple Silicon)
  • Multiple data types including: float32, float16, bfloat16
  • Multiple quantization schemes
  • Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)
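The execution modes above can be selected from the command line. A rough sketch, using the llama3.2-1b alias from the Models table below; the --compile flag is an assumption here, so verify the exact flag names with `python3 torchchat.py generate --help`:

```shell
# Sketch of switching Python execution modes; --compile is an assumed flag.
MODEL=llama3.2-1b
PROMPT="Hello, my name is"

# Eager Python execution (default)
python3 torchchat.py generate "$MODEL" --prompt "$PROMPT"

# Compiled Python execution via torch.compile
python3 torchchat.py generate "$MODEL" --prompt "$PROMPT" --compile
```

The native modes (AOTI, ExecuTorch) consume exported artifacts instead and are covered in the Exporting section.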

Models

The following models are supported by torchchat and have associated aliases.

| Model | Mobile Friendly | Notes |
|-------|-----------------|-------|
| meta-llama/Meta-Llama-3.2-3B-Instruct | ✅ | Tuned for chat. Alias to llama3.2-3b. |
| meta-llama/Meta-Llama-3.2-3B | ✅ | Best for generate. Alias to llama3.2-3b-base. |
| meta-llama/Llama-Guard-3-1B | ✅ | Tuned for classification. Alias to llama3-1b-guard. |
| meta-llama/Meta-Llama-3.2-1B-Instruct | ✅ | Tuned for chat. Alias to llama3.2-1b. |
| meta-llama/Meta-Llama-3.2-1B | ✅ | Best for generate. Alias to llama3.2-1b-base. |
| meta-llama/Llama-3.2-11B-Vision-Instruct | | Multimodal (Image + Text). Tuned for chat. Alias to llama3.2-11B. |
| meta-llama/Llama-3.2-11B-Vision | | Multimodal (Image + Text). Tuned for generate. Alias to llama3.2-11B-base. |
| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | Tuned for chat. Alias to llama3.1. |
| meta-llama/Meta-Llama-3.1-8B | ✅ | Best for generate. Alias to llama3.1-base. |
| meta-llama/Meta-Llama-3-8B-Instruct | ✅ | Tuned for chat. Alias to llama3. |
| meta-llama/Meta-Llama-3-8B | ✅ | Best for generate. Alias to llama3-base. |
| meta-llama/Llama-2-7b-chat-hf | ✅ | Tuned for chat. Alias to llama2. |
| meta-llama/Llama-2-13b-chat-hf | | Tuned for chat. Alias to llama2-13b-chat. |
| meta-llama/Llama-2-70b-chat-hf | | Tuned for chat. Alias to llama2-70b-chat. |
| meta-llama/Llama-2-7b-hf | ✅ | Best for generate. Alias to llama2-base. |
| meta-llama/CodeLlama-7b-Python-hf | ✅ | Tuned for Python and generate. Alias to codellama. |
| meta-llama/CodeLlama-34b-Python-hf | ✅ | Tuned for Python and generate. Alias to codellama-34b. |
| mistralai/Mistral-7B-v0.1 | ✅ | Best for generate. Alias to mistral-7b-v01-base. |
| mistralai/Mistral-7B-Instruct-v0.1 | ✅ | Tuned for chat. Alias to mistral-7b-v01-instruct. |
| mistralai/Mistral-7B-Instruct-v0.2 | ✅ | Tuned for chat. Alias to mistral. |
| tinyllamas/stories15M | ✅ | Toy model for generate. Alias to stories15M. |
| tinyllamas/stories42M | ✅ | Toy model for generate. Alias to stories42M. |
| tinyllamas/stories110M | ✅ | Toy model for generate. Alias to stories110M. |
| openlm-research/open_llama_7b | ✅ | Best for generate. Alias to open-llama. |
| ibm-granite/granite-3b-code-instruct-128k | ✅ | Alias to granite-code and granite-code-3b. |
| ibm-granite/granite-8b-code-instruct-128k | ✅ | Alias to granite-code-8b. |
| ibm-granite/granite-3.0-2b-instruct | ✅ | Alias to granite3-2b and granite3. |
| ibm-granite/granite-3.0-8b-instruct | ✅ | Alias to granite3-8b. |
| ibm-granite/granite-3.1-2b-instruct | ✅ | Alias to granite3.1-2b and granite3.1. |
| ibm-granite/granite-3.1-8b-instruct | ✅ | Alias to granite3.1-8b. |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | ✅ | Alias to deepseek-r1:8b. |
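An alias from the table can stand in for the full model name anywhere torchchat expects one. A minimal sketch, assuming the download and generate subcommands described below:

```shell
# Aliases are interchangeable with full Hugging Face model names.
ALIAS=llama3.2-1b   # alias for meta-llama/Meta-Llama-3.2-1B-Instruct
python3 torchchat.py download "$ALIAS"
python3 torchchat.py generate "$ALIAS" --prompt "Once upon a time"
```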

Installation

The following steps require that you have Python 3.10 installed.

[!TIP] torchchat uses the latest changes from various PyTorch projects, so it's highly recommended that you use a venv (via the commands below) or conda.

```shell
git clone https://github.com/pytorch/torchchat.git
cd torchchat
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
mkdir exportedModels
```


Commands

torchchat is driven through Python commands and native runners. The Python commands are enumerated in the --help menu below; the native runners are explored in their respective sections.

```shell
python3 torchchat.py --help
# Output
usage: torchchat [-h] {chat,browser,generate,export,eval,download,list,remove,where,server} ...

positional arguments:
  {chat,browser,generate,export,eval,download,list,remove,where,server}
                        The specific command to run
    chat                Chat interactively with a model via the CLI
    generate            Generate responses from a model given a prompt
    browser             Chat interactively with a model in a locally hosted browser
    export              Export a model artifact to AOT Inductor or ExecuTorch
    download            Download model artifacts
    list                List all supported models
    remove              Remove downloaded model artifacts
    where               Return directory containing downloaded model artifacts
    server              [WIP] Starts a locally hosted REST server for model interaction
    eval                Evaluate a model via lm-eval

options:
  -h, --help            show this help message and exit
```

Python Inference (chat, generate, browser, server)

  • These commands represent different flavors of performing model inference in a Python environment.
  • Models are constructed either from CLI args or from loading exported artifacts.
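A sketch of the four inference flavors, assuming the llama3.1 alias has been downloaded; flag names such as --prompt should be confirmed against each subcommand's --help:

```shell
# One invocation per Python inference flavor (illustrative, not exhaustive).
MODEL=llama3.1

python3 torchchat.py chat "$MODEL"                    # interactive CLI chat
python3 torchchat.py generate "$MODEL" --prompt "Hi"  # one-shot generation
python3 torchchat.py browser "$MODEL"                 # locally hosted browser UI
python3 torchchat.py server "$MODEL"                  # [WIP] REST server
```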

Exporting (export)

  • This command generates model artifacts that are consumed by Python Inference or Native Runners.
  • More information is provided in the AOT Inductor and ExecuTorch sections.
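A sketch of producing one artifact per native backend; the output-path flags below are assumptions, so confirm them with `python3 torchchat.py export --help`:

```shell
MODEL=llama3.1

# AOT Inductor artifact for desktop/server native execution
# (--output-aoti-package-path is an assumed flag name)
python3 torchchat.py export "$MODEL" --output-aoti-package-path exportedModels/llama3_1.pt2

# ExecuTorch artifact for mobile (--output-pte-path is an assumed flag name)
python3 torchchat.py export "$MODEL" --output-pte-path exportedModels/llama3_1.pte
```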

Inventory Management (download, list, remove, where)

  • These commands are used to manage and download models.
  • More information is provided in the Download Weights section.

Evaluation (eval)

  • This command evaluates model accuracy via lm-eval (EleutherAI's lm-evaluation-harness).
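A minimal evaluation sketch; the --tasks flag and the wikitext task name are assumptions based on lm-eval conventions, so check `python3 torchchat.py eval --help` for the exact interface:

```shell
MODEL=llama3.1
python3 torchchat.py eval "$MODEL" --tasks wikitext
```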

Download Weights

Most models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account. Create a Hugging Face user access token as documented here with the write role.

Log into Hugging Face:

```shell
huggingface-cli login
```

Take a look at the available models by running the list command.
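For example, the inventory commands can be chained after logging in; llama3.2-1b here is just one of the aliases from the Models table:

```shell
MODEL=llama3.2-1b
python3 torchchat.py list               # all supported models and aliases
python3 torchchat.py download "$MODEL"  # fetch weights from Hugging Face
python3 torchchat.py where "$MODEL"     # directory holding the downloaded artifacts
```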
