# Bolna

Conversational voice AI agents
> [!NOTE]
> We are actively looking for maintainers.
## Introduction

Bolna is an end-to-end, open-source, production-ready framework for quickly building LLM-based, voice-driven conversational applications.
## Demo
https://github.com/bolna-ai/bolna/assets/1313096/2237f64f-1c5b-4723-b7e7-d11466e9b226
## What is this repository?

This repository contains the entire orchestration platform for building voice AI applications. It orchestrates voice conversations over WebSockets using a combination of different ASR, LLM, and TTS providers and models.
## Components

Bolna consists of three components:
- Orchestration platform (this open source repository)
- Hosted APIs (https://docs.bolna.ai/api-reference/introduction) built on top of this orchestration platform [currently closed source]
- No-code UI playground at https://platform.bolna.ai/ using the hosted APIs + tailwind CSS [currently closed source]
## Development philosophy

- Any integration, enhancement, or feature lands first in this open-source package, since it forms the backbone of our hosted APIs and dashboard
- Next, we expose new APIs or update existing ones as required
- Finally, we surface the changes in the UI dashboard
```mermaid
graph LR;
    A[Bolna open source] --> B[Hosted APIs];
    B[Hosted APIs] --> C[Hosted Playground]
```
## Supported providers and models

Bolna helps you create AI voice agents which can be instructed to do tasks beginning with:

- Initiating a phone call using telephony providers like Twilio, Plivo, Exotel (coming soon), Vonage (coming soon), etc.
- Transcribing the conversations using Deepgram, Azure, etc.
- Using LLMs like OpenAI, DeepSeek, Llama, Cohere, Mistral, etc. to handle conversations
- Synthesizing LLM responses back to telephony using AWS Polly, ElevenLabs, Deepgram, OpenAI, Azure, Cartesia, Smallest, etc.
Refer to the docs for a deep dive into all supported providers.
## Local example setup [will be moved to a different repository]

A basic local setup uses Twilio or Plivo for telephony. The setup is dockerized in local_setup/. You will need to populate a .env file using .env.sample as a reference.
The setup consists of four containers:

- Telephony web server:
  - Twilio: for initiating calls, you will need to set up a Twilio account
  - Plivo: for initiating calls, you will need to set up a Plivo account
- Bolna server: for creating and handling agents
- ngrok: for tunneling; you will need to add the authtoken to ngrok-config.yml
- redis: for persisting agent and prompt data
### Quick Start

The easiest way to get started is to use the provided script:

```sh
cd local_setup
chmod +x start.sh
./start.sh
```
This script will check for Docker dependencies, build all services with BuildKit enabled, and start them in detached mode.
### Manual Setup

Alternatively, you can manually build and run the services:

1. Make sure you have Docker with Docker Compose V2 installed
2. Enable BuildKit for faster builds:

   ```sh
   export DOCKER_BUILDKIT=1
   export COMPOSE_DOCKER_CLI_BUILD=1
   ```

3. Build the images:

   ```sh
   docker compose build
   ```

4. Run the services:

   ```sh
   docker compose up -d
   ```
To run specific services only:

```sh
docker compose up -d bolna-app twilio-app
# or
docker compose up -d bolna-app plivo-app
```
Once the Docker containers are up, you can start creating agents and instructing them to initiate calls.

## Example agents to create, use and start making calls
You may try out different agents from example.bolna.dev.
## Programmatic usage (minimal example)
You can also build and run an agent directly in Python without the local telephony setup.
Example script: `examples/simple_assistant.py`

```python
import asyncio

from bolna.assistant import Assistant
from bolna.models import (
    Transcriber,
    Synthesizer,
    ElevenLabsConfig,
    LlmAgent,
    SimpleLlmAgent,
)


async def main():
    assistant = Assistant(name="demo_agent")

    # Configure audio input (ASR)
    transcriber = Transcriber(provider="deepgram", model="nova-2", stream=True, language="en")

    # Configure LLM
    llm_agent = LlmAgent(
        agent_type="simple_llm_agent",
        agent_flow_type="streaming",
        llm_config=SimpleLlmAgent(
            provider="openai",
            model="gpt-4o-mini",
            temperature=0.3,
        ),
    )

    # Configure audio output (TTS)
    synthesizer = Synthesizer(
        provider="elevenlabs",
        provider_config=ElevenLabsConfig(
            voice="George", voice_id="JBFqnCBsd6RMkjVDRZzb", model="eleven_turbo_v2_5"
        ),
        stream=True,
        audio_format="wav",
    )

    # Build a single coherent pipeline: transcriber -> llm -> synthesizer
    assistant.add_task(
        task_type="conversation",
        llm_agent=llm_agent,
        transcriber=transcriber,
        synthesizer=synthesizer,
        enable_textual_input=False,
    )

    # Stream results
    async for chunk in assistant.execute():
        print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```
How to run:

```sh
export OPENAI_API_KEY=...
export DEEPGRAM_AUTH_TOKEN=...
export ELEVENLABS_API_KEY=...

python examples/simple_assistant.py
```
This demonstrates orchestration and streaming output. For telephony, use the services in local_setup/.
Note: For REST-based usage (Agent CRUD over HTTP), see API.md in the repo root.
Expected output shape: assistant.execute() is an async generator yielding per-task result dicts (event-like chunks). The exact keys depend on configured tools/providers; treat it as a stream and process incrementally.
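Because `assistant.execute()` behaves as an async generator, chunks should be handled as they arrive rather than collected into one buffer. A minimal sketch of that consumption pattern, using a hypothetical stand-in generator (`fake_execute` and its chunk keys are illustrative, not part of Bolna's API):

```python
import asyncio


# Hypothetical stand-in for assistant.execute(): any async generator of
# event-like dicts can be consumed with the same pattern.
async def fake_execute():
    for i in range(3):
        yield {"type": "llm_response", "data": f"chunk-{i}"}


async def consume():
    texts = []
    async for chunk in fake_execute():
        # Process each chunk incrementally instead of buffering the stream
        if chunk.get("type") == "llm_response":
            texts.append(chunk["data"])
    return texts


print(asyncio.run(consume()))  # ['chunk-0', 'chunk-1', 'chunk-2']
```

The same `async for` loop works unchanged against a real `assistant.execute()` stream; only the chunk-key checks would need to match the events your configuration actually emits.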
## Text-only pipeline example
If you want a text-only flow (no transcriber/synthesizer), you can enable a text-only pipeline:
Example script: `examples/text_only_assistant.py`

```python
import asyncio

from bolna.assistant import Assistant
from bolna.models import LlmAgent, SimpleLlmAgent


async def main():
    assistant = Assistant(name="text_only_agent")

    llm_agent = LlmAgent(
        agent_type="simple_llm_agent",
        agent_flow_type="streaming",
        llm_config=SimpleLlmAgent(
            provider="openai",
            model="gpt-4o-mini",
            temperature=0.2,
        ),
    )

    # No transcriber/synthesizer; enable a text-only pipeline
    assistant.add_task(
        task_type="conversation",
        llm_agent=llm_agent,
        enable_textual_input=True,
    )

    async for chunk in assistant.execute():
        print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```
How to run (text-only):

```sh
export OPENAI_API_KEY=...

python examples/text_only_assistant.py
```
Expected output shape: assistant.execute() yields streaming dicts per task step; fields vary by configuration. Handle chunk-by-chunk.
## Using your own providers

You can populate the .env file with your own provider keys.

Transcription (ASR) providers:

| Provider | Environment variable to be added in .env file |
|----------|-----------------------------------------------|
| Deepgram | DEEPGRAM_AUTH_TOKEN                           |
These are the currently supported LLM provider families: https://github.com/bolna-ai/bolna/blob/10fa26e5985d342eedb5a8985642f12f1cf92a4b/bolna/providers.py#L30-L47
For LiteLLM-based LLMs, add whichever of the following applies to the .env file, depending on your use case:

- LITELLM_MODEL_API_KEY: API key of the LLM
- LITELLM_MODEL_API_BASE: URL of the hosted LLM
- LITELLM_MODEL_API_VERSION: API version for LLMs like Azure

For LLMs hosted via vLLM, add the following to the .env file:

- VLLM_SERVER_BASE_URL: URL of the LLM hosted using vLLM
Synthesis (TTS) providers:

| Provider   | Environment variable to be added in .env file    |
|------------|--------------------------------------------------|
| AWS Polly  | Accessed from system-wide credentials via ~/.aws |
| ElevenLabs | ELEVENLABS_API_KEY                               |
| OpenAI     | OPENAI_API_KEY                                   |
| Deepgram   | DEEPGRAM_AUTH_TOKEN                              |
| Cartesia   | CARTESIA_API_KEY                                 |
| Smallest   | SMALLEST_API_KEY                                 |
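Before launching agents, it can help to verify that the keys your chosen providers need are actually set. A small sketch of such a check (the `REQUIRED_KEYS` list and `missing_keys` helper below are illustrative, not part of Bolna; adjust the list to the providers you use):

```python
import os

# Illustrative list: adjust to the providers you actually use
REQUIRED_KEYS = ["OPENAI_API_KEY", "DEEPGRAM_AUTH_TOKEN", "ELEVENLABS_API_KEY"]


def missing_keys(env=None):
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with a partial environment: only the OpenAI key is set
print(missing_keys({"OPENAI_API_KEY": "sk-test"}))
# ['DEEPGRAM_AUTH_TOKEN', 'ELEVENLABS_API_KEY']
```

Running a check like this before `docker compose up` gives a clearer failure message than a provider error surfacing mid-call.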