Bolna
End-to-end platform for building voice first multimodal agents
Introduction
Bolna is an end-to-end, open-source, production-ready framework for quickly building LLM-based, voice-driven conversational applications.
Demo
https://github.com/bolna-ai/bolna/assets/1313096/2237f64f-1c5b-4723-b7e7-d11466e9b226
Components
Bolna helps you create AI Voice Agents which can be instructed to do tasks beginning with:
- Initiating a phone call using telephony providers like Twilio, Plivo, Exotel, etc.
- Transcribing the conversations using Deepgram, etc.
- Using LLMs like OpenAI, Llama, Cohere, Mistral, etc. to handle conversations
- Synthesizing LLM responses back to telephony using AWS Polly, XTTS, ElevenLabs, Deepgram, etc.
- Instructing the Agent to perform tasks like sending emails, sending text messages, or booking calendar slots after the conversation has ended
Refer to the docs for a deep dive into all supported providers.
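To make the flow of the components above concrete, here is a minimal conceptual sketch of a streaming transcriber, LLM, and synthesizer chain built on asyncio queues. It is not Bolna's implementation: the three stage functions are hypothetical stubs that only tag strings, standing in for real provider calls.

```python
import asyncio

# Conceptual sketch only: each stage is a stub, not a real provider client.

async def transcriber(audio_chunks, out_queue):
    """Turn incoming audio chunks into text (stubbed)."""
    for chunk in audio_chunks:
        await out_queue.put(f"text({chunk})")
    await out_queue.put(None)  # signal end of stream


async def llm(in_queue, out_queue):
    """Produce a reply for each transcript (stubbed)."""
    while (text := await in_queue.get()) is not None:
        await out_queue.put(f"reply to {text}")
    await out_queue.put(None)  # pass the end-of-stream signal on


async def synthesizer(in_queue, sink):
    """Turn each reply into audio for the telephony side (stubbed)."""
    while (reply := await in_queue.get()) is not None:
        sink.append(f"audio<{reply}>")


async def run_pipeline(audio_chunks):
    transcripts, replies = asyncio.Queue(), asyncio.Queue()
    sink = []
    # All three stages run concurrently, so later stages can start
    # before earlier ones have finished.
    await asyncio.gather(
        transcriber(audio_chunks, transcripts),
        llm(transcripts, replies),
        synthesizer(replies, sink),
    )
    return sink
```

Running `asyncio.run(run_pipeline(["a", "b"]))` returns one synthesized item per input chunk, in order.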
Local example setup
A basic local setup uses Twilio or Plivo for telephony. The setup is dockerized in local_setup/. You will need to populate a .env file from .env.sample.
The setup consists of four containers:
- Telephony web server:
  - Choosing Twilio: for initiating the calls one will need to set up a Twilio account
  - Choosing Plivo: for initiating the calls one will need to set up a Plivo account
- Bolna server: for creating and handling agents
- ngrok: for tunneling. One will need to add the authtoken to ngrok-config.yml
- redis: for persisting agents & prompt data
Use Docker to build the images, using the .env file as the environment file, and run them locally:
- docker-compose build --no-cache <twilio-app | plivo-app>: rebuild images
- docker-compose up <twilio-app | plivo-app>: run the built images
Once the docker containers are up, you can now start to create your agents and instruct them to initiate calls.
Creating your agent and invoking calls
Once the above Docker setup is up and running, you can create agents and initiate calls.
- Use the below payload to create an Agent via http://localhost:5001/agent
{
  "agent_config": {
    "agent_name": "Alfred",
    "agent_type": "other",
    "agent_welcome_message": "Welcome",
    "tasks": [
      {
        "task_type": "conversation",
        "toolchain": {
          "execution": "parallel",
          "pipelines": [
            [
              "transcriber",
              "llm",
              "synthesizer"
            ]
          ]
        },
        "tools_config": {
          "input": {
            "format": "pcm",
            "provider": "twilio"
          },
          "llm_agent": {
            "agent_flow_type": "streaming",
            "provider": "openai",
            "request_json": true,
            "model": "gpt-3.5-turbo-16k",
            "use_fallback": true
          },
          "output": {
            "format": "pcm",
            "provider": "twilio"
          },
          "synthesizer": {
            "audio_format": "wav",
            "provider": "elevenlabs",
            "stream": true,
            "provider_config": {
              "voice": "Meera - high quality, emotive",
              "model": "eleven_turbo_v2_5",
              "voice_id": "TTa58Hl9lmhnQEvhp1WM"
            },
            "buffer_size": 100.0
          },
          "transcriber": {
            "encoding": "linear16",
            "language": "en",
            "provider": "deepgram",
            "stream": true
          }
        },
        "task_config": {
          "hangup_after_silence": 30.0
        }
      }
    ]
  },
  "agent_prompts": {
    "task_1": {
      "system_prompt": "Ask if they are coming for party tonight"
    }
  }
}
- The response of the previous API will return a UUID as the agent_id. Use this agent_id to initiate a call via the telephony server running on port 8001 (for Twilio) or port 8002 (for Plivo), for example at http://localhost:8001/call
{
  "agent_id": "4c19700b-227c-4c2d-8bgf-42dfe4b240fc",
  "recipient_phone_number": "+19876543210"
}
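The two steps above can be scripted. The sketch below uses only the Python standard library and rests on a few assumptions that the text does not state outright: both endpoints accept a JSON POST, the create-agent response is a JSON object with an agent_id key, and the Plivo server exposes the same /call path on port 8002. The helper names are made up for this example.

```python
import json
import urllib.request

BOLNA_SERVER = "http://localhost:5001"              # Bolna server from the local setup
TELEPHONY_PORTS = {"twilio": 8001, "plivo": 8002}   # telephony servers from the local setup


def json_post(url, payload):
    """Build a JSON POST request; nothing is sent until send() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: both endpoints expect POST
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def create_agent_request(agent_payload):
    """Request for step 1: create an agent from the payload shown above."""
    return json_post(f"{BOLNA_SERVER}/agent", agent_payload)


def call_request(agent_id, recipient_phone_number, provider="twilio"):
    """Request for step 2: ask the telephony server to place a call."""
    port = TELEPHONY_PORTS[provider]
    return json_post(
        f"http://localhost:{port}/call",  # assumption: same path for Plivo
        {"agent_id": agent_id, "recipient_phone_number": recipient_phone_number},
    )


# Example usage (needs the containers from the local setup to be running):
#   created = send(create_agent_request(agent_payload))
#   send(call_request(created["agent_id"], "+19876543210"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any call is placed.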
Using your own providers
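You can populate the .env file to use your own keys for providers. The variables are listed below for transcription (ASR), LLM, speech synthesis (TTS), and telephony providers, in that order.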
| Provider | Environment variable to be added in .env file |
|--------------|-------------------------------------------------|
| Deepgram | DEEPGRAM_AUTH_TOKEN |
These are the currently supported LLM provider families: https://github.com/bolna-ai/bolna/blob/477e08d6800dbf02931abeeea883d78451b7d7e2/bolna/providers.py#L29-L44
For LiteLLM-based LLMs, add any of the following to the .env file depending on your use case:
- LITELLM_MODEL_API_KEY: API key of the LLM
- LITELLM_MODEL_API_BASE: URL of the hosted LLM
- LITELLM_MODEL_API_VERSION: API version for LLMs like Azure
For LLMs hosted via vLLM, add the following to the .env file:
- VLLM_SERVER_BASE_URL: URL of the hosted LLM using vLLM
| Provider | Environment variable to be added in .env file |
|------------|--------------------------------------------------|
| AWS Polly | Accessed from system wide credentials via ~/.aws |
| ElevenLabs | ELEVENLABS_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Deepgram | DEEPGRAM_AUTH_TOKEN |
| Provider | Environment variable to be added in .env file |
|----------|------------------------------------------------|
| Twilio | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER |
| Plivo | PLIVO_AUTH_ID, PLIVO_AUTH_TOKEN, PLIVO_PHONE_NUMBER |
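Before starting the containers, it can help to check that the .env file has the variables your chosen providers need. The sketch below is one way to do that; the variable names come from the tables above, while the provider grouping, the function names, and the simple KEY=VALUE parser are assumptions made for this example and are not part of Bolna.

```python
from pathlib import Path

# Variable names are taken from the tables above; the grouping is illustrative.
REQUIRED_ENV = {
    "deepgram": ["DEEPGRAM_AUTH_TOKEN"],
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "twilio": ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER"],
    "plivo": ["PLIVO_AUTH_ID", "PLIVO_AUTH_TOKEN", "PLIVO_PHONE_NUMBER"],
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env, providers):
    """Return the required variables that are absent or empty."""
    return [
        name
        for provider in providers
        for name in REQUIRED_ENV[provider]
        if not env.get(name)
    ]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    # Providers used by the example agent payload in this README.
    print(missing_vars(env, ["deepgram", "openai", "elevenlabs", "twilio"]))
```

An empty list means every variable for the selected providers is set. AWS Polly is left out because it reads system-wide credentials from ~/.aws instead of the .env file.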
Extending with other Telephony Providers
To add another telephony provider such as Vonage or Telnyx, follow the guidelines below:
- Make sure bi-directional streaming is supported by the telephony provider
- Add a telephony-specific input handler file in input_handlers/telephony_providers, with custom functions that extend the class in telephony.py
  - This file mainly defines how the different types of event packets from the telephony provider are ingested
- Add a telephony-specific output handler file in output_handlers/telephony_providers, with custom functions that extend the class in telephony.py
  - This mainly concerns converting audio from the synthesizer class to a supported audio format and streaming it over the websocket provided by the telephony provider
- Lastly, you'll have to write a dedicated server like the example twilio_api_server.py provided in local_setup to initiate calls over websockets.
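As a rough illustration of the input-handler step above, the sketch below shows how event packets from a provider might be ingested. Everything in it is hypothetical: the base class is a stand-in written for this example (the real class in telephony.py may look quite different), and the packet shape (a JSON object with an event field and base64 audio under media.payload) is an assumption about an imaginary provider.

```python
import base64
import json


class TelephonyInputHandler:
    """Hypothetical stand-in for the base class in telephony.py."""

    def __init__(self):
        self.audio_chunks = []  # decoded audio waiting for the transcriber
        self.stream_id = None
        self.closed = False

    def handle_packet(self, raw_message):
        raise NotImplementedError


class ExampleProviderInputHandler(TelephonyInputHandler):
    """Input handler for an imaginary provider that sends JSON event packets."""

    def handle_packet(self, raw_message):
        packet = json.loads(raw_message)
        event = packet.get("event")
        if event == "start":
            # Remember which stream this websocket connection belongs to.
            self.stream_id = packet["stream_id"]
        elif event == "media":
            # Audio arrives base64-encoded; decode it before transcription.
            payload = packet["media"]["payload"]
            self.audio_chunks.append(base64.b64decode(payload))
        elif event == "stop":
            self.closed = True
        # Unknown events are ignored so new packet types do not break a call.
        return event
```

An output handler would do the reverse: encode synthesizer audio into whatever format the provider expects and write it to the provider's websocket.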
Open-source v/s Paid
Though the repository is completely open source, you can connect with us if interested in managed hosted offerings or more customized
