Alloy


Minimal, OTP-native agent loop for Elixir.

Alloy is the completion-tool-call loop and nothing else. Send messages to any LLM, execute tool calls, loop until done. Swap providers with one line. Run agents as supervised GenServers. No opinions on sessions, persistence, memory, scheduling, or UI — those belong in your application.

{:ok, result} = Alloy.run("Read mix.exs and tell me the version",
  provider: {Alloy.Provider.OpenAI, api_key: System.get_env("OPENAI_API_KEY"), model: "gpt-5.4"},
  tools: [Alloy.Tool.Core.Read]
)

result.text #=> "The version is 0.9.0"

Why Alloy?

Most agent frameworks try to be everything — sessions, memory, RAG, multi-agent orchestration, scheduling, UI. Alloy does one thing well: the agent loop. Inspired by Pi Agent's minimalism, Alloy brings the same philosophy to the BEAM with OTP's natural advantages: supervision, fault isolation, parallel tool execution, and real concurrency.

  • 3 providers — Anthropic, OpenAI, and OpenAICompat (works with any OpenAI-compatible API: Ollama, OpenRouter, xAI, DeepSeek, Mistral, Groq, Together, etc.)
  • 4 built-in tools — read, write, edit, bash
  • GenServer agents — supervised, stateful, message-passing
  • Streaming — token-by-token from any provider, unified interface
  • Async dispatch — send_message/2 fires without blocking; the result arrives via PubSub
  • Middleware — custom hooks, tool blocking
  • Context compaction — summary-based compaction when approaching token limits, with configurable reserve and fallback to truncation
  • Prompt caching — Anthropic cache: true adds cache breakpoints for 60-90% input token savings
  • Reasoning blocks — DeepSeek/xAI reasoning_content parsed as first-class thinking blocks
  • Provider passthrough — extra_body injects arbitrary provider-specific params (response_format, temperature, reasoning_effort)
  • Telemetry — run, turn, provider, and compaction lifecycle events for OTEL/logging/metrics
  • Cost guard — max_budget_cents halts the loop before overspending
  • OTP-native — supervision trees, hot code reloading, real parallel tool execution
  • ~5,000 lines — small enough to read, understand, and extend
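
The telemetry events above can be consumed with the standard :telemetry handler API. A minimal sketch — the event names here are illustrative assumptions, not Alloy's documented prefixes; check Alloy's telemetry docs for the real ones:

```elixir
# Hypothetical event names: Alloy's actual prefixes may differ.
:telemetry.attach_many(
  "alloy-logger",
  [[:alloy, :run, :stop], [:alloy, :turn, :stop], [:alloy, :compaction, :stop]],
  fn event, measurements, metadata, _config ->
    # Forward to Logger, OTEL, or your metrics pipeline here.
    IO.inspect({event, measurements, metadata}, label: "alloy telemetry")
  end,
  nil
)
```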

Design Boundary

Alloy stays minimal by owning protocol and loop concerns, not application workflows.

What belongs in Alloy:

  • Provider wire-format translation
  • Tool-call / completion loop mechanics
  • Normalized message blocks
  • Opaque provider-owned state such as stored response IDs
  • Provider response metadata such as citations or server-side tool telemetry

What does not belong in Alloy:

  • Sessions and persistence policy
  • File storage, indexing, or retrieval workflows
  • UI rendering for citations, search, or artifacts
  • Scheduling, background job orchestration, or dashboards
  • Tenant plans, quotas, billing, or hosted infrastructure policy

Rule of thumb: if the feature is required to speak a provider API correctly, and could help any Alloy consumer, it likely belongs here. If it needs a database table, product defaults, UI decisions, or tenancy logic, it belongs in your application layer.

Installation

Add alloy to your dependencies in mix.exs:

def deps do
  [
    {:alloy, "~> 0.9"}
  ]
end
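
Then fetch the dependency:

```shell
mix deps.get
```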

Quick Start

Simple completion

{:ok, result} = Alloy.run("What is 2+2?",
  provider: {Alloy.Provider.Anthropic, api_key: "sk-ant-...", model: "claude-sonnet-4-6"}
)

result.text #=> "4"

Agent with tools

{:ok, result} = Alloy.run("Read mix.exs and summarize the dependencies",
  provider: {Alloy.Provider.OpenAICompat,
    api_url: "https://generativelanguage.googleapis.com",
    chat_path: "/v1beta/openai/chat/completions",
    api_key: "...", model: "gemini-2.5-flash-lite"},
  tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash],
  max_turns: 10
)

Gemini model IDs that Alloy's model catalog budgets for include gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3-pro-preview, and gemini-3-flash-preview.

Swap providers in one line

# The same tools and conversation work with any provider
opts = [tools: [Alloy.Tool.Core.Read], max_turns: 10]

# Anthropic
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.Anthropic, api_key: "...", model: "claude-sonnet-4-6"}} | opts])

# OpenAI
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"}} | opts])

# xAI via Responses-compatible API
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAI, api_key: "...", api_url: "https://api.x.ai", model: "grok-4"}} | opts])

# xAI via chat completions (reasoning models, extra_body)
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAICompat, api_key: "...", api_url: "https://api.x.ai", model: "grok-4.1-fast-reasoning"}} | opts])

# Any OpenAI-compatible API (Ollama, OpenRouter, DeepSeek, Mistral, Groq, etc.)
Alloy.run("Read mix.exs", [{:provider, {Alloy.Provider.OpenAICompat, api_url: "http://localhost:11434", model: "llama4"}} | opts])

Streaming

For a one-shot run, use Alloy.stream/3:

{:ok, result} =
  Alloy.stream("Explain OTP", fn chunk ->
    IO.write(chunk)
  end,
    provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"}
  )

For a persistent agent process with conversation state, use Alloy.Agent.Server.stream_chat/4:

{:ok, agent} = Alloy.Agent.Server.start_link(
  provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"},
  tools: [Alloy.Tool.Core.Read]
)

{:ok, result} = Alloy.Agent.Server.stream_chat(agent, "Explain OTP", fn chunk ->
  IO.write(chunk)  # Print each token as it arrives
end)

All providers support streaming. If a custom provider doesn't implement stream/4, the turn loop falls back to complete/3 automatically.

Alloy.run/2 remains the buffered convenience API. Use Alloy.stream/3 when you want the same one-shot flow with token streaming.
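
As a sketch of that fallback: a custom provider that only defines complete/3 still works in streaming runs — the loop buffers the completion instead. The behaviour module name, callback signature, and return shape below are assumptions for illustration; match them to Alloy's actual provider contract:

```elixir
defmodule MyApp.Provider.Echo do
  # Assumed behaviour name and callback shape — adjust to the real
  # Alloy provider contract. Only complete/3 is implemented here.
  @behaviour Alloy.Provider

  @impl true
  def complete(messages, _tools, _opts) do
    {:ok, %{text: "echo: " <> inspect(List.last(messages))}}
  end

  # No stream/4 defined: the turn loop falls back to complete/3.
end
```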

Provider-owned state

Some provider APIs expose server-side state such as stored response IDs. That transport concern lives in Alloy; your app decides whether and how to persist it.

Results expose provider-owned state in result.metadata.provider_state:

{:ok, result} =
  Alloy.run("Read the repo",
    provider: {Alloy.Provider.OpenAI,
      api_key: System.get_env("XAI_API_KEY"),
      api_url: "https://api.x.ai",
      model: "grok-4",
      store: true
    }
  )

provider_state = result.metadata.provider_state

Pass that state back to the same provider on the next turn to continue a provider-native conversation:

{:ok, next_result} =
  Alloy.run("Keep going",
    messages: result.messages,
    provider: {Alloy.Provider.OpenAI,
      api_key: System.get_env("XAI_API_KEY"),
      api_url: "https://api.x.ai",
      model: "grok-4",
      provider_state: provider_state
    }
  )

Provider-native tools and citations

Responses-compatible providers can expose built-in server-side tools without leaking those wire details into your app layer.

For xAI search tools:

{:ok, result} =
  Alloy.run("Summarise the latest xAI docs updates",
    provider: {Alloy.Provider.OpenAI,
      api_key: System.get_env("XAI_API_KEY"),
      api_url: "https://api.x.ai",
      model: "grok-4",
      web_search: %{allowed_domains: ["docs.x.ai"]},
      include: ["inline_citations"]
    }
  )

Citation metadata is exposed in two places:

  • result.metadata.provider_response.citations for provider-level citation data
  • assistant text blocks may include :annotations for inline citation spans
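
Both surfaces can be read straight off the result. The exact shape of each citation entry is provider-specific, so treat this as a sketch:

```elixir
# Provider-level citation data (field documented above); default to []
# when the provider returned none.
citations = get_in(result.metadata, [:provider_response, :citations]) || []

Enum.each(citations, fn citation ->
  IO.inspect(citation, label: "citation")
end)
```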

Overriding model metadata

Alloy derives the compaction budget from the configured provider model when it knows that model's context window. If you need to support a just-released model before Alloy ships a catalog update, override it in config:

{:ok, result} = Alloy.run("Summarise this repository",
  provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4-2026-03-05"},
  model_metadata_overrides: %{
    "gpt-5.4" => 900_000,
    "acme-reasoner" => %{limit: 640_000, suffix_patterns: ["", ~r/^-\d{4}\.\d{2}$/]}
  }
)

Set max_tokens explicitly when you want a fixed compaction budget. Otherwise Alloy derives it from the current model, including after Alloy.Agent.Server.set_model/2 switches to a different provider model.
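
For example, with a running agent (set_model/2 comes from the docs above; the exact argument and return shapes are assumptions):

```elixir
{:ok, agent} = Alloy.Agent.Server.start_link(
  provider: {Alloy.Provider.OpenAI, api_key: System.get_env("OPENAI_API_KEY"), model: "gpt-5.4"}
)

# Switch models mid-session; Alloy re-derives the compaction budget
# from the new model's context window.
Alloy.Agent.Server.set_model(agent, "gpt-5.4-2026-03-05")
```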

Use compaction: when you want to tune how much room Alloy reserves before it summarizes older context:

{:ok, result} = Alloy.run("Summarise this repository",
  provider: {Alloy.Provider.OpenAI, api_key: "...", model: "gpt-5.4"},
  compaction: [
    reserve_tokens: 12_000,
    keep_recent_tokens: 8_000,
    fallback: :truncate
  ]
)

Cost guard

Cap how much an agent run can spend:

{:ok, result} = Alloy.run("Research this codebase thoroughly",
  provider: {Alloy.Provider.Anthropic, api_key: "...", model: "claude-sonnet-4-6"},
  tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash],
  max_budget_cents: 50
)

case result.status do
  :completed -> IO.puts(result.text)
  :budget_exceeded -> IO.puts("Stopped: spent #{result.usage.estimated_cost_cents}¢")
end

Set max_budget_cents: nil (default) for no limit.

Anthropic prompt caching

Enable prompt caching to save 60-90% on input tokens. Alloy automatically adds cache_control breakpoints to the system prompt and last tool definition:

{:ok, result} = Alloy.run("Explain this codebase",
  provider: {Alloy.Provider.Anthropic,
    api_key: "...", model: "claude-sonnet-4-6",
    cache: true
  },
  tools: [Alloy.Tool.Core.Read, Alloy.Tool.Core.Bash]
)
