Candle
Minimalist ML framework for Rust
Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use. Try our online demos: whisper, LLaMA2, T5, yolo, Segment Anything.
Get started
Make sure that you have candle-core correctly installed as described in Installation.
Let's see how to run a simple matrix multiplication.
Write the following to your myapp/src/main.rs file:

    use candle_core::{Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let device = Device::Cpu;

        let a = Tensor::randn(0f32, 1., (2, 3), &device)?;
        let b = Tensor::randn(0f32, 1., (3, 4), &device)?;

        let c = a.matmul(&b)?;
        println!("{c}");
        Ok(())
    }

cargo run should display a tensor of shape Tensor[[2, 4], f32].
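The [2, 4] result follows from the usual matrix-multiplication shape rule: an m×k matrix times a k×n matrix gives an m×n matrix, and the inner dimensions must agree. The dependency-free sketch below illustrates only that rule. The `matmul` helper here is hypothetical and written for this illustration; it is not Candle's implementation and does not use `candle_core`.

```rust
/// Hypothetical helper: multiply an m×k matrix by a k×n matrix (row-major rows).
/// Returns None when the inner dimensions disagree or a matrix is ragged.
fn matmul(a: &[Vec<f32>], b: &[Vec<f32>]) -> Option<Vec<Vec<f32>>> {
    let k = b.len();
    // Every row of `a` must have exactly k entries.
    if a.iter().any(|row| row.len() != k) {
        return None;
    }
    // Every row of `b` must have the same length n.
    let n = b.first().map_or(0, |row| row.len());
    if b.iter().any(|row| row.len() != n) {
        return None;
    }
    let mut c = vec![vec![0.0f32; n]; a.len()];
    for (i, row) in a.iter().enumerate() {
        for j in 0..n {
            // Dot product of row i of `a` with column j of `b`.
            c[i][j] = (0..k).map(|p| row[p] * b[p][j]).sum::<f32>();
        }
    }
    Some(c)
}

fn main() {
    // 2×3 matrix
    let a: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    // 3×4 matrix
    let b: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0, 0.0, 1.0],
        vec![0.0, 1.0, 0.0, 1.0],
        vec![0.0, 0.0, 1.0, 1.0],
    ];
    let c = matmul(&a, &b).expect("inner dimensions must match");
    println!("shape: [{}, {}]", c.len(), c[0].len());
}
```

Swapping the operands (3×4 times 2×3) makes the helper return `None`, since the inner dimensions 4 and 2 do not match.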
If you have installed candle with CUDA support, simply define the device to be on the GPU:

    - let device = Device::Cpu;
    + let device = Device::new_cuda(0)?;

For more advanced examples, please have a look at the following section.
Check out our examples
These online demos run entirely in your browser:
- yolo: pose estimation and object recognition.
- whisper: speech recognition.
- LLaMA2: text generation.
- T5: text generation.
- Phi-1.5, and Phi-2: text generation.
- Segment Anything Model: Image segmentation.
- BLIP: image captioning.
We also provide some command line based examples using state of the art models:
- LLaMA v1, v2, and v3: general LLM, includes the SOLAR-10.7B variant.
- Falcon: general LLM.
- Codegeex4: Code completion, code interpreter, web search, function calling, repository-level
- GLM4: Open Multilingual Multimodal Chat LMs by THUDM
- Gemma v1 and v2: 2b and 7b+/9b general LLMs from Google Deepmind.
- RecurrentGemma: 2b and 7b Griffin-based models from Google that mix attention with an RNN-like state.
- Phi-1, Phi-1.5, Phi-2, and Phi-3: 1.3b, 2.7b, and 3.8b general LLMs with performance on par with 7b models.
- StableLM-3B-4E1T: a 3b general LLM pre-trained on 1T tokens of English and code datasets. Also supports StableLM-2, a 1.6b LLM trained on 2T tokens, as well as the code variants.
- Mamba: an inference only implementation of the Mamba state space model.
- Mistral7b-v0.1: a 7b general LLM with better performance than all publicly available 13b models as of 2023-09-28.
- Mixtral8x7b-v0.1: a sparse mixture of experts 8x7b general LLM with better performance than a Llama 2 70B model with much faster inference.
- StarCoder and StarCoder2: LLM specialized to code generation.
- Qwen1.5: Bilingual (English/Chinese) LLMs.
- RWKV v5 and v6: An RNN with transformer level LLM performance.
- Replit-code-v1.5: a 3.3b LLM specialized for code completion.
- Yi-6B / Yi-34B: two bilingual (English/Chinese) general LLMs with 6b and 34b parameters.
- Quantized LLaMA: quantized version of the LLaMA model using the same quantization techniques as llama.cpp.
- Quantized Qwen3 MoE: supports GGUF-quantized Qwen3 MoE models.
- Stable Diffusion: text to image generative model, support for the 1.5, 2.1, SDXL 1.0 and Turbo versions.
- Wuerstchen: another text to image generative model.
<img src="https://github.com/huggingface/candle/raw/main/candle-examples/examples/yolo-v8/assets/bike.od.jpg" width="200"><img src="https://github.com/huggingface/candle/raw/main/candle-examples/examples/yolo-v8/assets/bike.pose.jpg" width="200">
- segment-anything: image segmentation model with prompt.
- SegFormer: transformer based semantic segmentation model.
- Whisper: speech recognition model.
- EnCodec: high-quality audio compression model using residual vector quantization.
- MetaVoice: foundational model for text-to-speech.
- Parler-TTS: large text-to-speech model.
- T5, Bert, JinaBert: useful for sentence embeddings.
- DINOv2: computer vision model trained using self-supervision (can be used for imagenet classification, depth evaluation, segmentation).
- VGG, RepVGG: computer vision models.
- BLIP: image to text model, can be used to generate captions for an image.
- CLIP: multi-modal vision and language model.
- TrOCR: a transformer OCR model, with dedicated submodels for hand-writing and printed recognition.
- Marian-MT: neural machine translation model, generates the translated text from the input text.
- Moondream: tiny computer-vision model that can answer real-world questions about images.
Run them using commands like:

    cargo run --example quantized --release

In order to use CUDA add --features cuda to the example command line. If
you have cuDNN installed, use --features cudnn for even more speedups.
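Cargo feature flags such as these are resolved at compile time. The sketch below is a generic illustration of that mechanism and is not taken from Candle's source: `compiled_backend` is a hypothetical helper, and the `cuda` and `cudnn` names only have an effect in a crate whose Cargo.toml declares features with those names.

```rust
/// Hypothetical helper: report which backend this binary was compiled for.
/// `cfg!` evaluates to a compile-time boolean, so the choice is fixed at build time.
fn compiled_backend() -> &'static str {
    if cfg!(feature = "cudnn") {
        "cuda+cudnn"
    } else if cfg!(feature = "cuda") {
        "cuda"
    } else {
        // No GPU feature was enabled at build time.
        "cpu"
    }
}

fn main() {
    println!("compiled backend: {}", compiled_backend());
}
```

Built without any `--features` argument, the helper takes the last branch; recent toolchains may warn about undeclared feature names if the crate does not define them.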
There are also some wasm examples for whisper and llama2.c. You can either build them with trunk or try them online: whisper, llama2, T5, Phi-1.5 and Phi-2, and Segment Anything Model.
For LLaMA2, run the following command to retrieve the weight files and start a test server:

    cd candle-wasm-examples/llama2-c
    wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/model.bin
    wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/tokenizer.json
    trunk serve --release --port 8081

And then head over to http://localhost:8081/.
<!--- ANCHOR: useful_libraries --->
Useful External Resources
- candle-tutorial: A very detailed tutorial showing how to convert a PyTorch model to Candle.
- candle-lora: Efficient and ergonomic LoRA implementation for Candle. candle-lora has out-of-the-box LoRA support for many models from Candle, which can be found here.
- candle-video: Rust library for text-to-video generation (LTX-Video and related models) built on Candle, focused on fast, Python-free inferen
