# RamaLama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
RamaLama strives to make working with AI simple, straightforward, and familiar by using OCI containers.
## Description
RamaLama is an open-source tool that simplifies the local use and serving of AI models for inference from any source through the familiar approach of containers. It lets engineers apply container-centric development patterns, extending the benefits of containers to AI use cases.

RamaLama eliminates the need to configure the host system: instead, it pulls a container image matched to the GPUs discovered on the host, letting you work with various models and platforms.
- Eliminates the complexity for users to configure the host system for AI.
- Detects and pulls an accelerated container image specific to the GPUs on the host system, handling dependencies and hardware optimization.
- RamaLama supports multiple AI model registries, including OCI Container Registries.
- Models are treated similarly to how Podman and Docker treat container images.
- Use common container commands to work with AI models.
- Run AI models securely in rootless containers, isolating the model from the underlying host.
- Keep data secure by defaulting to no network access and removing all temporary data on application exit.
- Interact with models via REST API or as a chatbot.
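The container-style workflow described above can be sketched with a few commands (a sketch: `pull`, `list`, `run`, and `serve` are RamaLama subcommands; the model reference is illustrative):

```shell
# Pull a model from a registry, much as you would pull a container image
ramalama pull ollama://tinyllama

# List locally stored models (similar to `podman images`)
ramalama list

# Chat with the model interactively
ramalama run ollama://tinyllama

# Or serve it over a REST API
ramalama serve ollama://tinyllama
```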
## Install

### Install on macOS (Self-Contained Installer)
Download the self-contained macOS installer that includes Python and all dependencies:

- Download the latest `.pkg` installer from Releases
- Double-click to install, or run:

```shell
sudo installer -pkg RamaLama-*-macOS-Installer.pkg -target /
```
See macOS Installation Guide for detailed instructions.
### Install on Fedora

RamaLama is available in the official Fedora repositories. To install it, run:

```shell
sudo dnf install ramalama
```
### Install via PyPI

RamaLama is available via PyPI at https://pypi.org/project/ramalama:

```shell
pip install ramalama
```
### Install script (Linux and macOS)

Install RamaLama by running:

```shell
curl -fsSL https://ramalama.ai/install.sh | bash
```
### Install on Windows

RamaLama supports Windows with Docker Desktop or Podman Desktop:

```shell
pip install ramalama
```
Requirements:
- Python 3.10 or later
- Docker Desktop or Podman Desktop with WSL2 backend
- For GPU support, see NVIDIA GPU Setup for WSL2
Note: Windows support requires running containers via Docker/Podman. The model store uses hardlinks (no admin required) or falls back to file copies if hardlinks are unavailable.
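Before installing, you can confirm that a container engine is on your PATH (a minimal sketch; RamaLama performs its own detection at runtime):

```shell
# Look for Podman first, then Docker; RamaLama needs one of them on Windows/WSL2
if command -v podman >/dev/null 2>&1; then
  engine=podman
elif command -v docker >/dev/null 2>&1; then
  engine=docker
else
  engine=""
  echo "No container engine found: install Podman Desktop or Docker Desktop" >&2
fi
echo "Detected engine: ${engine:-none}"
```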
## Uninstall

### Uninstall via pip

If you installed RamaLama using pip, you can uninstall it with:

```shell
pip uninstall ramalama
```
### Uninstall on Fedora

If you installed RamaLama using DNF:

```shell
sudo dnf remove ramalama
```
### Uninstall on macOS (Self-Contained Installer)

To remove RamaLama installed via the `.pkg` installer:

```shell
# Remove the executable
sudo rm /usr/local/bin/ramalama

# Remove configuration and data files
sudo rm -rf /usr/local/share/ramalama

# Remove man pages (optional)
sudo rm /usr/local/share/man/man1/ramalama*.1
sudo rm /usr/local/share/man/man5/ramalama*.5
sudo rm /usr/local/share/man/man7/ramalama*.7

# Remove shell completions (optional)
sudo rm /usr/local/share/bash-completion/completions/ramalama
sudo rm /usr/local/share/fish/vendor_completions.d/ramalama.fish
sudo rm /usr/local/share/zsh/site-functions/_ramalama
```
See the macOS Installation Guide for more details.
### Remove User Data and Configuration

After uninstalling RamaLama using any method above, you may want to remove downloaded models and configuration files:

```shell
# Remove downloaded models and data (can be large)
rm -rf -- "${XDG_DATA_HOME:-$HOME/.local/share}/ramalama"

# Remove configuration files
rm -rf -- "${XDG_CONFIG_HOME:-$HOME/.config}/ramalama"

# If you ran RamaLama as root, also remove:
sudo rm -rf /var/lib/ramalama
```
Note: The model data directory (by default `~/.local/share/ramalama`) can be quite large depending on how many models you've downloaded. Make sure you want to remove these files before running the commands above.
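Before deleting anything, you can check how much space the model store is using (a sketch, assuming the default XDG locations mentioned above):

```shell
# Resolve the model store path the same way the removal commands above do
data_dir="${XDG_DATA_HOME:-$HOME/.local/share}/ramalama"

# Report its size if it exists, so you know what you are about to delete
if [ -d "$data_dir" ]; then
  du -sh "$data_dir"
else
  echo "No model store found at $data_dir"
fi
```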
## Accelerated images

| Accelerator | Image |
| :--------------------------------- | :--------------------------- |
| `GGML_VK_VISIBLE_DEVICES` (or CPU) | `quay.io/ramalama/ramalama` |
| `HIP_VISIBLE_DEVICES` | `quay.io/ramalama/rocm` |
| `CUDA_VISIBLE_DEVICES` | `quay.io/ramalama/cuda` |
| `ASAHI_VISIBLE_DEVICES` | `quay.io/ramalama/asahi` |
| `INTEL_VISIBLE_DEVICES` | `quay.io/ramalama/intel-gpu` |
| `ASCEND_VISIBLE_DEVICES` | `quay.io/ramalama/cann` |
| `MUSA_VISIBLE_DEVICES` | `quay.io/ramalama/musa` |
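These are ordinary OCI images, so you can pre-pull or inspect one with your container engine if you prefer (a sketch; assumes Podman, and RamaLama normally pulls the right image for you automatically):

```shell
# Pre-pull the default CPU/Vulkan image so the first run starts faster
podman pull quay.io/ramalama/ramalama

# See which RamaLama images are already cached locally
podman images | grep quay.io/ramalama
```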
### GPU support inspection
On first run, RamaLama inspects your system for GPU support, falling back to CPU if none are present. RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all necessary software to run an AI Model for your system setup.
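To see what RamaLama detected on your machine, the `info` subcommand prints the selected engine and accelerator details (assuming RamaLama is already installed):

```shell
# Print detected container engine, accelerator, and image selection
ramalama info
```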
<details> <summary> How does RamaLama select the right image? </summary> <br>After initialization, RamaLama runs AI Models within a container based on the OCI image. RamaLama pulls container images specific to the GPUs discovered on your system. These images are tied to the minor version of RamaLama.

- For example, RamaLama version 1.2.3 on an NVIDIA system pulls `quay.io/ramalama/cuda:1.2`. To override the default image, use the `--image` option.

RamaLama then pulls AI Models from model registries, starting a chatbot or REST API service from a simple single command. Models are treated similarly to how Podman and Docker treat container images.
</details>

## Hardware Support

| Hardware | Enabled |
| :--------------------------------- | :-------------------------: |
| CPU | ✓ |
| Apple Silicon GPU (Linux / Asahi) | ✓ |
| Apple Silicon GPU (macOS) | ✓ llama.cpp or MLX |
| Apple Silicon GPU (podman-machine) | ✓ |
| Nvidia GPU (cuda) | ✓ See note below |
| AMD GPU (rocm, vulkan) | ✓ |
| Ascend NPU (Linux) | ✓ |
| Intel ARC GPUs (Linux) | ✓ See note below |
| Intel GPUs (vulkan / Linux) | ✓ |
| Moore Threads GPU (musa / Linux) | ✓ See note below |
| Windows (with Docker/Podman) | ✓ Requires WSL2 |
### Nvidia GPUs
On systems with NVIDIA GPUs, see ramalama-cuda documentation for the correct host system configuration.
### Intel GPUs
The following Intel GPUs are auto-detected by RamaLama:
| GPU ID | Description |
| :------ | :--------------------------------- |
| `0xe20b` | Intel® Arc™ B580 Graphics |
| `0xe20c` | Intel® Arc™ B570 Graphics |
| `0x7d51` | Intel® Graphics - Arrow Lake-H |
| `0x7dd5` | Intel® Graphics - Meteor Lake |
| `0x7d55` | Intel® Arc™ Graphics - Meteor Lake |
See the Intel hardware table for more information.

### Moore Threads GPUs
On systems with Moore Threads GPUs, see ramalama-musa documentation for the correct host system configuration.
## MLX Runtime (macOS only)
The MLX runtime provides optimized inference for Apple Silicon Macs. MLX requires:

- macOS operating system
- Apple Silicon hardware (M1, M2, M3, or later)
- Usage with the `--nocontainer` option (containers are not supported)
- The `mlx-lm` package installed on the host system as a uv tool
To install and run Phi-4 on MLX, use uv. If uv is not installed, you can install it with `curl -LsSf https://astral.sh/uv/install.sh | sh`:

```shell
uv tool install mlx-lm
# or upgrade to the latest version:
uv tool upgrade mlx-lm
```

Then run RamaLama with the MLX runtime:

```shell
ramalama --runtime=mlx
```
