Alloy
Minimal, OTP-native agent loop for Elixir.
Alloy is the completion-tool-call loop and nothing else. Send messages to any LLM, execute tool calls, loop until done. Swap providers with one line. Run agents as supervised GenServers. No opinions on sessions, persistence, memory, scheduling, or UI — those belong in your application.
{:ok, result} = Alloy.run("Read mix.exs and tell me the version",
provider: {Alloy.Provider.OpenAI, api_key: System.get_env("OPENAI_API_KEY"), model: "gpt-5.4"},
tools: [Alloy.Tool.Core.Read]
)
result.text #=> "The version is 0.9.0"
Why Alloy?
Most agent frameworks try to be everything — sessions, memory, RAG, multi-agent orchestration, scheduling, UI. Alloy does one thing well: the agent loop. Inspired by Pi Agent's minimalism, Alloy brings the same philosophy to the BEAM with OTP's natural advantages: supervision, fault isolation, parallel tool execution, and real concurrency.
- 3 providers — Anthropic, OpenAI, and OpenAICompat (works with any OpenAI-compatible API: Ollama, OpenRouter, xAI, DeepSeek, Mistral, Groq, Together, etc.)
- 4 built-in tools — read, write, edit, bash
- GenServer agents — supervised, stateful, message-passing
- Streaming — token-by-token from any provider, unified interface
- Async dispatch — send_message/2 fires non-blocking, result arrives via PubSub
- Middleware — custom hooks, tool blocking
- Context compaction — summary-based compaction when approaching token limits, with configurable reserve and fallback to truncation
- Prompt caching — Anthropic cache: true adds cache breakpoints for 60-90% input token savings
- Reasoning blocks — DeepSeek/xAI reasoning_content parsed as first-class thinking blocks
- Provider passthrough — extra_body injects arbitrary provider-specific params (response_format, temperature, reasoning_effort)
- Telemetry — run, turn, provider, and compaction lifecycle events for OTEL/logging/metrics
- Cost guard — max_budget_cents halts the loop before overspending
- OTP-native — supervision trees, hot code reloading, real parallel tool execution
- ~5,000 lines — small enough to read, understand, and extend
Design Boundary
Alloy stays minimal by owning protocol and loop concerns, not application workflows.
What belongs in Alloy:
- Provider wire-format translation
- Tool-call / completion loop mechanics
- Normalized message blocks
- Opaque provider-owned state such as stored response IDs
- Provider response metadata such as citations or server-side tool telemetry
What does not belong in Alloy:
- Sessions and persistence policy
- File storage, indexing, or retrieval workflows
- UI rendering for citations, search, or artifacts
- Scheduling, background job orchestration, or dashboards
- Tenant plans, quotas, billing, or hosted infrastructure policy
Rule of thumb: if the feature is required to speak a provider API correctly, and could help any Alloy consumer, it likely belongs here. If it needs a database table, product defaults, UI decisions, or tenancy logic, it belongs in your application layer.
Installation
Add alloy to your dependencies in mix.exs:
def deps do
[
{:alloy, "~> 0.9"}
]
end
Quick Start
Simple completion
{:ok, result} = Alloy.run("What is 2+2?",
provider: {Alloy.Provider.Anthropic, api_key: "sk-ant-...", model: "claude-sonnet-4-6"}
)
result.text #=> "4"
Agent with tools
{:ok, result} = Alloy.run("Read mix.exs and summarize the dependencies",
provider: {Alloy.Provider.OpenAICompat,
api_url: "https://generativelanguage.googleapis.com",
chat_path: "/v1beta/openai/chat/completions",
api_key: "...", model: "gemini-2.5-flash-lite"},
tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash],
max_turns: 10
)
Alloy's model catalog includes context-window budgets for Gemini IDs such as
gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3-pro-preview,
and gemini-3-flash-preview.
Swap providers in one line
# The same tools and conversation work with any provider
opts = [tools: [Alloy.Tool.Core.Read], max_turns: 10]
# Anthropic
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.Anthropic, api_key: "...", model: "claude-sonnet-4-6"}} | opts])
# OpenAI
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"}} | opts])
# xAI via Responses-compatible API
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAI, api_key: "...", api_url: "https://api.x.ai", model: "grok-4"}} | opts])
# xAI via chat completions (reasoning models, extra_body)
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAICompat, api_key: "...", api_url: "https://api.x.ai", model: "grok-4.1-fast-reasoning"}} | opts])
# Any OpenAI-compatible API (Ollama, OpenRouter, DeepSeek, Mistral, Groq, etc.)
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAICompat, api_url: "http://localhost:11434", model: "llama4"}} | opts])
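Because a provider is just a {module, opts} tuple, one convenient pattern is to centralize provider selection in a small app-layer helper. A sketch (the helper module is hypothetical; only the provider modules and options shown above are from Alloy):

```elixir
# Hypothetical app-layer helper: the rest of your code swaps providers
# by name instead of repeating connection details everywhere.
defmodule MyApp.Providers do
  def provider(:anthropic),
    do: {Alloy.Provider.Anthropic,
         api_key: System.get_env("ANTHROPIC_API_KEY"), model: "claude-sonnet-4-6"}

  def provider(:openai),
    do: {Alloy.Provider.OpenAI,
         api_key: System.get_env("OPENAI_API_KEY"), model: "gpt-5.4"}

  def provider(:ollama),
    do: {Alloy.Provider.OpenAICompat,
         api_url: "http://localhost:11434", model: "llama4"}
end

# Usage:
#   Alloy.run("Read mix.exs", provider: MyApp.Providers.provider(:openai), tools: [...])
```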
Streaming
For a one-shot run, use Alloy.stream/3:
{:ok, result} =
Alloy.stream("Explain OTP", fn chunk ->
IO.write(chunk)
end,
provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"}
)
For a persistent agent process with conversation state, use Alloy.Agent.Server.stream_chat/4:
{:ok, agent} = Alloy.Agent.Server.start_link(
provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"},
tools: [Alloy.Tool.Core.Read]
)
{:ok, result} = Alloy.Agent.Server.stream_chat(agent, "Explain OTP", fn chunk ->
IO.write(chunk) # Print each token as it arrives
end)
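Because Alloy.Agent.Server is a GenServer, it can also run under your application's supervision tree. A sketch, assuming the default GenServer child spec and a :name option (check Alloy.Agent.Server's docs for the exact accepted options):

```elixir
# In lib/my_app/application.ex. The :name option is an assumption;
# the provider/tools options match the examples above.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Alloy.Agent.Server,
       name: MyApp.Agent,
       provider: {Alloy.Provider.OpenAI,
                  api_key: System.get_env("OPENAI_API_KEY"), model: "gpt-5.4"},
       tools: [Alloy.Tool.Core.Read]}
    ]

    # If the agent crashes, the supervisor restarts it with a fresh state.
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```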
All providers support streaming. If a custom provider doesn't implement
stream/4, the turn loop falls back to complete/3 automatically.
Alloy.run/2 remains the buffered convenience API. Use Alloy.stream/3
when you want the same one-shot flow with token streaming.
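If you want the chunks themselves in addition to printing them, the callback can accumulate into a stdlib Elixir Agent (unrelated to Alloy agents). A sketch, feeding placeholder chunks in place of a real Alloy.stream/3 call:

```elixir
# Collect streamed tokens while also printing them.
{:ok, acc} = Agent.start_link(fn -> [] end)

on_chunk = fn chunk ->
  IO.write(chunk)
  Agent.update(acc, &[chunk | &1])
end

# In a real run you would pass the callback to Alloy.stream/3:
#   Alloy.stream("Explain OTP", on_chunk, provider: {...})
# Here we feed placeholder chunks to show the accumulation:
Enum.each(["Hello, ", "world"], on_chunk)

streamed_text = acc |> Agent.get(&Enum.reverse/1) |> Enum.join()
# streamed_text == "Hello, world"
```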
Provider-owned state
Some provider APIs expose server-side state such as stored response IDs. That transport concern lives in Alloy; your app decides whether and how to persist it.
Results expose provider-owned state in result.metadata.provider_state:
{:ok, result} =
Alloy.run("Read the repo",
provider: {Alloy.Provider.OpenAI,
api_key: System.get_env("XAI_API_KEY"),
api_url: "https://api.x.ai",
model: "grok-4",
store: true
}
)
provider_state = result.metadata.provider_state
Pass that state back to the same provider on the next turn to continue a provider-native conversation:
{:ok, next_result} =
Alloy.run("Keep going",
messages: result.messages,
provider: {Alloy.Provider.OpenAI,
api_key: System.get_env("XAI_API_KEY"),
api_url: "https://api.x.ai",
model: "grok-4",
provider_state: provider_state
}
)
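Since Alloy leaves persistence to your app, you decide where provider_state lives between turns. A minimal sketch keyed by conversation id in ETS (the table name, keys, and placeholder state value are all illustrative):

```elixir
# App-layer storage for provider-owned state; swap ETS for your database.
:ets.new(:provider_state, [:named_table, :set, :public])

save = fn convo_id, provider_state ->
  :ets.insert(:provider_state, {convo_id, provider_state})
end

load = fn convo_id ->
  case :ets.lookup(:provider_state, convo_id) do
    [{^convo_id, state}] -> state
    [] -> nil
  end
end

save.("convo-1", %{response_id: "resp_placeholder"})
load.("convo-1") #=> %{response_id: "resp_placeholder"}
```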
Provider-native tools and citations
Responses-compatible providers can expose built-in server-side tools without leaking those wire details into your app layer.
For xAI search tools:
{:ok, result} =
Alloy.run("Summarise the latest xAI docs updates",
provider: {Alloy.Provider.OpenAI,
api_key: System.get_env("XAI_API_KEY"),
api_url: "https://api.x.ai",
model: "grok-4",
web_search: %{allowed_domains: ["docs.x.ai"]},
include: ["inline_citations"]
}
)
Citation metadata is exposed in two places:
- result.metadata.provider_response.citations for provider-level citation data
- assistant text blocks may include :annotations for inline citation spans
Overriding model metadata
Alloy derives the compaction budget from the configured provider model when it knows that model's context window. If you need to support a just-released model before Alloy ships a catalog update, override it in config:
{:ok, result} = Alloy.run("Summarise this repository",
provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4-2026-03-05"},
model_metadata_overrides: %{
"gpt-5.4" => 900_000,
"acme-reasoner" => %{limit: 640_000, suffix_patterns: ["", ~r/^-\d{4}\.\d{2}$/]}
}
)
Set max_tokens explicitly when you want a fixed compaction budget. Otherwise
Alloy derives it from the current model, including after
Alloy.Agent.Server.set_model/2 switches to a different provider model.
Use compaction: when you want to tune how much room Alloy reserves before it
summarizes older context:
{:ok, result} = Alloy.run("Summarise this repository",
provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"},
compaction: [
reserve_tokens: 12_000,
keep_recent_tokens: 8_000,
fallback: :truncate
]
)
Cost guard
Cap how much an agent run can spend:
{:ok, result} = Alloy.run("Research this codebase thoroughly",
provider: {Alloy.Provider.Anthropic, api_key: "...", model: "claude-sonnet-4-6"},
tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash],
max_budget_cents: 50
)
case result.status do
:completed -> IO.puts(result.text)
:budget_exceeded -> IO.puts("Stopped: spent #{result.usage.estimated_cost_cents}¢")
end
Set max_budget_cents: nil (default) for no limit.
Anthropic prompt caching
Enable prompt caching to save 60-90% on input tokens. Alloy automatically adds
cache_control breakpoints to the system prompt and last tool definition:
{:ok, result} = Alloy.run("Explain this codebase",
provider: {Alloy.Provider.Anthropic,
api_key: "...", model: "claude-sonnet-4-6",
cache: true
},
tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash]
)
