# RamaLama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
RamaLama strives to make working with AI simple, straightforward, and familiar by using OCI containers.
## Description
RamaLama is an open-source tool that simplifies the local use and serving of AI models for inference from any source through the familiar approach of containers. It lets engineers apply container-centric development patterns, extending the benefits of containers to AI use cases.

RamaLama eliminates the need to configure the host system: instead, it pulls a container image matched to the GPUs discovered on the host, letting you work with various models and platforms.
- Eliminates the complexity for users to configure the host system for AI.
- Detects and pulls an accelerated container image specific to the GPUs on the host system, handling dependencies and hardware optimization.
- RamaLama supports multiple AI model registries, including OCI Container Registries.
- Models are treated similarly to how Podman and Docker treat container images.
- Use common container commands to work with AI models.
- Run AI models securely in rootless containers, isolating the model from the underlying host.
- Keep data secure by defaulting to no network access and removing all temporary data on application exit.
- Interact with models via REST API or as a chatbot.
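The container-style workflow described above can be sketched with a few commands (a sketch: `pull`, `list`, `run`, and `serve` are RamaLama subcommands; the model reference is illustrative):

```shell
# Pull a model from a registry, much as you would pull a container image
ramalama pull ollama://tinyllama

# List locally stored models (similar to `podman images`)
ramalama list

# Chat with the model interactively
ramalama run ollama://tinyllama

# Or serve it over a REST API
ramalama serve ollama://tinyllama
```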
## Install

### Install on macOS (Self-Contained Installer)
Download the self-contained macOS installer that includes Python and all dependencies:

- Download the latest `.pkg` installer from Releases
- Double-click to install, or run:

```shell
sudo installer -pkg RamaLama-*-macOS-Installer.pkg -target /
```
See macOS Installation Guide for detailed instructions.
### Install on Fedora

RamaLama is available in the official Fedora repositories. To install it, run:

```shell
sudo dnf install ramalama
```
### Install via PyPI

RamaLama is available via PyPI at https://pypi.org/project/ramalama:

```shell
pip install ramalama
```
### Install script (Linux and macOS)

Install RamaLama by running:

```shell
curl -fsSL https://ramalama.ai/install.sh | bash
```
### Install on Windows

RamaLama supports Windows with Docker Desktop or Podman Desktop:

```shell
pip install ramalama
```
Requirements:
- Python 3.10 or later
- Docker Desktop or Podman Desktop with WSL2 backend
- For GPU support, see NVIDIA GPU Setup for WSL2
Note: Windows support requires running containers via Docker/Podman. The model store uses hardlinks (no admin required) or falls back to file copies if hardlinks are unavailable.
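Before installing, you can confirm that a container engine is on your PATH (a minimal sketch; RamaLama performs its own detection at runtime):

```shell
# Look for Podman first, then Docker; RamaLama needs one of them on Windows/WSL2
if command -v podman >/dev/null 2>&1; then
  engine=podman
elif command -v docker >/dev/null 2>&1; then
  engine=docker
else
  engine=""
  echo "No container engine found: install Podman Desktop or Docker Desktop" >&2
fi
echo "Detected engine: ${engine:-none}"
```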
## Uninstall

### Uninstall via pip

If you installed RamaLama using pip, you can uninstall it with:

```shell
pip uninstall ramalama
```
### Uninstall on Fedora

If you installed RamaLama using DNF:

```shell
sudo dnf remove ramalama
```
### Uninstall on macOS (Self-Contained Installer)

To remove RamaLama installed via the `.pkg` installer:

```shell
# Remove the executable
sudo rm /usr/local/bin/ramalama

# Remove configuration and data files
sudo rm -rf /usr/local/share/ramalama

# Remove man pages (optional)
sudo rm /usr/local/share/man/man1/ramalama*.1
sudo rm /usr/local/share/man/man5/ramalama*.5
sudo rm /usr/local/share/man/man7/ramalama*.7

# Remove shell completions (optional)
sudo rm /usr/local/share/bash-completion/completions/ramalama
sudo rm /usr/local/share/fish/vendor_completions.d/ramalama.fish
sudo rm /usr/local/share/zsh/site-functions/_ramalama
```
See the macOS Installation Guide for more details.
### Remove User Data and Configuration

After uninstalling RamaLama using any method above, you may want to remove downloaded models and configuration files:

```shell
# Remove downloaded models and data (can be large)
rm -rf -- "${XDG_DATA_HOME:-$HOME/.local/share}/ramalama"

# Remove configuration files
rm -rf -- "${XDG_CONFIG_HOME:-$HOME/.config}/ramalama"

# If you ran RamaLama as root, also remove:
sudo rm -rf /var/lib/ramalama
```
Note: The model data directory (by default `~/.local/share/ramalama`) can be quite large depending on how many models you've downloaded. Make sure you want to remove these files before running the commands above.
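Before deleting anything, you can check how much space the model store is using (a sketch, assuming the default XDG locations mentioned above):

```shell
# Resolve the model store path the same way the removal commands above do
data_dir="${XDG_DATA_HOME:-$HOME/.local/share}/ramalama"

# Report its size if it exists, so you know what you are about to delete
if [ -d "$data_dir" ]; then
  du -sh "$data_dir"
else
  echo "No model store found at $data_dir"
fi
```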
## Accelerated images

| Accelerator | Image |
| :--------------------------------- | :--------------------------- |
| `GGML_VK_VISIBLE_DEVICES` (or CPU) | `quay.io/ramalama/ramalama` |
| `HIP_VISIBLE_DEVICES` | `quay.io/ramalama/rocm` |
| `CUDA_VISIBLE_DEVICES` | `quay.io/ramalama/cuda` |
| `ASAHI_VISIBLE_DEVICES` | `quay.io/ramalama/asahi` |
| `INTEL_VISIBLE_DEVICES` | `quay.io/ramalama/intel-gpu` |
| `ASCEND_VISIBLE_DEVICES` | `quay.io/ramalama/cann` |
| `MUSA_VISIBLE_DEVICES` | `quay.io/ramalama/musa` |
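These are ordinary OCI images, so you can pre-pull or inspect one with your container engine if you prefer (a sketch; assumes Podman, and RamaLama normally pulls the right image for you automatically):

```shell
# Pre-pull the default CPU/Vulkan image so the first run starts faster
podman pull quay.io/ramalama/ramalama

# See which RamaLama images are already cached locally
podman images | grep quay.io/ramalama
```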
### GPU support inspection
On first run, RamaLama inspects your system for GPU support, falling back to CPU if none are present. RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all necessary software to run an AI Model for your system setup.
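To see what RamaLama detected on your machine, the `info` subcommand prints the selected engine and accelerator details (assuming RamaLama is already installed):

```shell
# Print detected container engine, accelerator, and image selection
ramalama info
```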
<details> <summary> How does RamaLama select the right image? </summary> <br>After initialization, RamaLama runs AI Models within a container based on the OCI image. RamaLama pulls container images specific to the GPUs discovered on your system. These images are tied to the minor version of RamaLama.

- For example, RamaLama version 1.2.3 on an NVIDIA system pulls `quay.io/ramalama/cuda:1.2`. To override the default image, use the `--image` option.

RamaLama then pulls AI Models from model registries, starting a chatbot or REST API service from a simple single command. Models are treated similarly to how Podman and Docker treat container images.
</details>

## Hardware Support

| Hardware | Enabled |
| :--------------------------------- | :-------------------------: |
| CPU | ✓ |
| Apple Silicon GPU (Linux / Asahi) | ✓ |
| Apple Silicon GPU (macOS) | ✓ llama.cpp or MLX |
| Apple Silicon GPU (podman-machine) | ✓ |
| Nvidia GPU (cuda) | ✓ See note below |
| AMD GPU (rocm, vulkan) | ✓ |
| Ascend NPU (Linux) | ✓ |
| Intel ARC GPUs (Linux) | ✓ See note below |
| Intel GPUs (vulkan / Linux) | ✓ |
| Moore Threads GPU (musa / Linux) | ✓ See note below |
| Windows (with Docker/Podman) | ✓ Requires WSL2 |
### Nvidia GPUs
On systems with NVIDIA GPUs, see ramalama-cuda documentation for the correct host system configuration.
### Intel GPUs
The following Intel GPUs are auto-detected by RamaLama:
| GPU ID | Description |
| :------ | :--------------------------------- |
| `0xe20b` | Intel® Arc™ B580 Graphics |
| `0xe20c` | Intel® Arc™ B570 Graphics |
| `0x7d51` | Intel® Graphics - Arrow Lake-H |
| `0x7dd5` | Intel® Graphics - Meteor Lake |
| `0x7d55` | Intel® Arc™ Graphics - Meteor Lake |
See the Intel hardware table for more information.

### Moore Threads GPUs
On systems with Moore Threads GPUs, see ramalama-musa documentation for the correct host system configuration.
## MLX Runtime (macOS only)
The MLX runtime provides optimized inference for Apple Silicon Macs. MLX requires:

- macOS operating system
- Apple Silicon hardware (M1, M2, M3, or later)
- Usage with the `--nocontainer` option (containers are not supported)
- The `mlx-lm` package installed on the host system as a uv tool
To install and run Phi-4 on MLX, use uv. If uv is not installed, you can install it with `curl -LsSf https://astral.sh/uv/install.sh | sh`:

```shell
uv tool install mlx-lm
# or upgrade to the latest version:
uv tool upgrade mlx-lm
```

Then run RamaLama with the MLX runtime:

```shell
ramalama --runtime=mlx
```
