llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
ARCHIVAL NOTICE
This repository has been archived due to a lack of time and resources for continued development. If you are interested in continuing the development of this project, or obtaining the crate name, please contact @philpax.
There are several high-quality alternatives for inference of LLMs and other models in Rust. We recommend that you consider using one of these libraries instead of llm; they have been kept up-to-date and are more likely to be actively maintained.
A selection is presented below. Note that this is not an exhaustive list, and the best solution for you may have changed since this list was compiled:
- Ratchet: a wgpu-based ML inference library with a focus on web support and efficient inference
- Candle-based libraries (i.e. pure Rust outside of platform support libraries):
- mistral.rs: supports quantized models for popular LLM architectures, Apple Silicon + CPU + CUDA support, and is designed to be easy to use
- kalosm: simple interface for language, audio and image models
- candle-transformers: first-party Candle library for inference of a wide variety of transformer-based models, similar to Hugging Face Transformers. Relatively low-level, so some knowledge of ML will be required.
- callm: supports Llama, Mistral, Phi 3 and Qwen 2
- llama.cpp wrappers (i.e. not pure Rust, but at the frontier of open-source compiled LLM inference):
- drama_llama: high-level Rust-idiomatic wrapper around llama.cpp
- llm_client: also supports other external LLM APIs
- llama_cpp: safe, high-level Rust bindings
- llama-cpp-2: lightly-wrapped raw bindings that follow the C++ API closely
- Aggregators of external LLM APIs:
The original README follows.
llm - Large Language Models for Everyone, in Rust
llm is an ecosystem of Rust libraries for working with large language models -
it's built on top of the fast, efficient GGML library for
machine learning.

Image by @darthdeus, using Stable Diffusion
Current State
This library is no longer actively maintained. For reference, the following is the state of the project as of the last update.
There are currently four available versions of llm (the crate and the CLI):
- The released version 0.1.1 on crates.io. This version is very out of date and does not include support for the most recent models.
- The main branch of this repository. This version can reliably infer GGMLv3 models, but does not support GGUF, and uses an old version of GGML.
- The gguf branch of this repository; this is a version of main that supports inferencing with GGUF, but does not support any models other than Llama, requires the use of a Hugging Face tokenizer, and does not support quantization. It also uses an old version of GGML.
- The develop branch of this repository. This is a from-scratch re-port of llama.cpp to synchronize with the latest version of GGML, and to support all models and GGUF. This will not be completed due to the archival of the project.
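To make the GGMLv3/GGUF distinction above concrete: GGUF files start with a small fixed header that a loader checks before reading any tensors. The sketch below follows the publicly documented GGUF layout (4-byte magic "GGUF", then a little-endian u32 version); it is an illustrative stand-alone parser, not llm's own loading code.

```rust
/// Minimal GGUF header per the public GGUF spec: magic "GGUF" + LE u32 version.
struct GgufHeader {
    version: u32,
}

fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 8 {
        return Err("buffer too short for a GGUF header".into());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("missing GGUF magic".into());
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    Ok(GgufHeader { version })
}

fn main() {
    // A fabricated 8-byte header: magic plus version 3, little-endian.
    let buf = [b'G', b'G', b'U', b'F', 3, 0, 0, 0];
    let header = parse_gguf_header(&buf).expect("valid header");
    println!("GGUF version {}", header.version);
}
```

A real loader would continue with the tensor count and metadata key/value table that follow the version field; a GGMLv3 file fails the magic check immediately, which is why the two formats need separate code paths.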
Overview
The primary entrypoint for developers is
the llm crate, which wraps llm-base and
the supported model crates.
Documentation for the released version is available on
Docs.rs.
For end-users, there is a CLI application,
llm-cli, which provides a convenient interface for
interacting with supported models. Text generation can be done as a
one-off based on a prompt, or interactively, through
REPL or chat modes. The CLI can also be
used to serialize (print) decoded models,
quantize GGML files, or compute the
perplexity of a model. It
can be downloaded from
the latest GitHub release or by
installing it from crates.io.
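The perplexity the CLI reports is a standard quantity: the exponential of the mean negative log-likelihood of the evaluated tokens. A self-contained sketch of the arithmetic (not the CLI's actual implementation, which works on model logits):

```rust
/// Perplexity = exp(-(1/N) * sum(ln p_i)) over per-token probabilities.
fn perplexity(token_probs: &[f64]) -> f64 {
    let n = token_probs.len() as f64;
    let sum_ln: f64 = token_probs.iter().map(|p| p.ln()).sum();
    (-sum_ln / n).exp()
}

fn main() {
    // A model that assigns every token probability 0.25 has perplexity 4:
    // it is as "surprised" as a uniform choice among 4 options.
    let ppl = perplexity(&[0.25, 0.25, 0.25, 0.25]);
    println!("perplexity = {ppl}");
}
```

Lower is better; a perplexity of 1 would mean the model predicted every token with certainty.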
llm is powered by the ggml tensor
library, and aims to bring the robustness and ease of use of Rust to the world
of large language models. At present, inference is only on the CPU, but we hope
to support GPU inference in the future through alternate backends.
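The quantized formats ggml uses store weights in fixed-size blocks, each holding one scale factor plus small integers. The toy scheme below is in the spirit of ggml's Q8_0 (one f32 scale, one i8 per value); block size and storage layout are simplified for illustration and do not match ggml's packed structs.

```rust
/// Toy 8-bit block quantization: one f32 scale plus one i8 per value.
struct QBlock {
    scale: f32,
    quants: Vec<i8>,
}

fn quantize(values: &[f32]) -> QBlock {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Map the largest magnitude to 127; avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quants = values.iter().map(|v| (v / scale).round() as i8).collect();
    QBlock { scale, quants }
}

fn dequantize(block: &QBlock) -> Vec<f32> {
    block.quants.iter().map(|&q| q as f32 * block.scale).collect()
}

fn main() {
    let weights = [0.7f32, -1.3, 0.02, 1.27];
    let block = quantize(&weights);
    for (w, r) in weights.iter().zip(dequantize(&block)) {
        println!("{w:>6.3} -> {r:>6.3}");
    }
}
```

The roundtrip error is bounded by half the scale, which is why quantized models trade a small accuracy loss for a roughly 4x reduction in memory versus f32 weights.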
Currently, the following models are supported:
- BLOOM
- GPT-2
- GPT-J
- GPT-NeoX (includes StableLM, RedPajama, and Dolly 2.0)
- LLaMA (includes Alpaca, Vicuna, Koala, GPT4All, and Wizard)
- MPT
See getting models for more information on how to download supported models.
Using llm in a Rust Project
This project depends on Rust v1.65.0 or above and a modern C toolchain.
The llm crate exports llm-base and the model crates (e.g. bloom, gpt2,
llama).
Add llm to your project by listing it as a dependency in Cargo.toml. To use
the version of llm you see in the main branch of this repository, add it
from GitHub (although keep in mind this is pre-release software):
[dependencies]
llm = { git = "https://github.com/rustformers/llm" , branch = "main" }
To use a released version, add it from crates.io by specifying the desired version:
[dependencies]
llm = "0.1"
By default, llm builds with support for remotely fetching the tokenizer from Hugging Face's model hub.
To disable this, disable the default features for the crate, and turn on the models feature to get llm
without the tokenizer:
[dependencies]
llm = { version = "0.1", default-features = false, features = ["models"] }
NOTE: To improve debug performance, exclude the transitive ggml-sys
dependency from being built in debug mode:
[profile.dev.package.ggml-sys]
opt-level = 3
Leverage Accelerators with llm
The llm library is engineered to take advantage of hardware accelerators such as cuda and metal for optimized performance.
To enable llm to harness these accelerators, some preliminary configuration steps are necessary, which vary based on your operating system. For comprehensive guidance, please refer to Acceleration Support in our documentation.
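Accelerator backends are selected through Cargo features at build time. The feature names below (cublas for CUDA, metal for Apple Silicon) follow the project's documented acceleration features, but verify them against the crate's Cargo.toml for your checkout:

```toml
[dependencies]
# CUDA build (Linux/Windows with the CUDA toolkit installed):
llm = { git = "https://github.com/rustformers/llm", branch = "main", features = ["cublas"] }

# Or, on macOS with Apple Silicon:
# llm = { git = "https://github.com/rustformers/llm", branch = "main", features = ["metal"] }
```

Without one of these features enabled, inference falls back to the CPU path.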
Using llm from Other Languages
Bindings for this library are available in the following languages:
- Python: LLukas22/llm-rs-python
- Node: Atome-FE/llama-node
Using the llm CLI
The easiest way to get started with llm-cli is to download a pre-built
executable from a released
version of llm, but the releases are currently out of date and we recommend
you install from source instead.
Installing from Source
To install the main branch of llm with the most recent features to your Cargo bin
directory, which rustup is likely to have added to your PATH, run:
cargo install --git https://github.com/rustformers/llm llm-cli
The CLI application can then be run through llm. See also features and
acceleration support to turn features on as required.
Note that GPU support (CUDA, OpenCL, Metal) will not work unless you build with the relevant feature.
Installing with cargo
Note that the currently published version is out of date and does not include support for the most recent models. We currently recommend that you install from source.
To install the most recently released version of llm to your Cargo bin
directory, which rustup is likely to have added to your PATH, run:
cargo install llm-cli
The CLI application can then be run through llm. See also features
to turn features on as required.
Features
By default, llm builds with support for remotely fetching the tokenizer from Hugging Face's model hub.
This adds a dependency on your system's native SSL stack, which may not
be available on all systems.