🎤 Voice Summary
Open Source AI Database for Voice Agent Transcripts
A comprehensive AI-powered database and analytics platform for storing, analyzing, and extracting insights from voice agent call transcripts. Built with FastAPI, React/Next.js, and PostgreSQL, featuring advanced AI analysis, transcript enhancement, and intelligent data extraction.
🎯 What is Voice Summary?
Voice Summary is an open-source AI database specifically designed for voice agent transcripts and call analytics. It provides:
- 🤖 AI-Powered Transcript Analysis - Advanced machine learning for call outcome analysis
- 📊 Intelligent Data Extraction - Automatic extraction of customer information and business insights
- 🏷️ Smart Classification & Labeling - AI-driven call categorization and sentiment analysis
- 🎵 Advanced Audio Processing - Voice analysis with pause detection and conversation health scoring
- ☁️ Cloud-Ready Architecture - Built with FastAPI, React, PostgreSQL, and AWS S3 integration
Perfect for call centers, voice bot developers, customer service teams, and AI researchers who need comprehensive voice analytics and transcript management.
📋 Table of Contents
- ✨ Features
- 🖼️ What you will get
- 🚀 Quick Start
- 📥 Data Ingestion
- 🔍 AI-Powered Data Extraction Pipeline
- 🔌 API Endpoints
- 🎯 Use Cases
- 🏗️ Project Structure
- 🛠️ Development
- 🚀 Deployment
- 🤝 Contributing
- 📚 Documentation
- 🐛 Troubleshooting
- 📄 License
✨ Features
- 🤖 AI-Powered Transcript Analysis: Advanced AI models for call outcome analysis, quality assessment, and performance evaluation
- 📊 Intelligent Data Extraction: Automatic extraction of customer information, call reasons, and business insights from transcripts
- 🏷️ Smart Classification & Labeling: AI-driven call categorization, sentiment analysis, and business action labeling
- 📝 Enhanced Transcript Processing: Automatic timestamp alignment, turn-by-turn conversation analysis, and transcript normalization
- 🎵 Advanced Audio Analysis: AI-powered voice analysis with pause detection, speech segmentation, and conversation health scoring
- 🔄 Multi-Agent Comparison: Scenario-based testing to compare multiple voice agents with AI-powered metrics
- ☁️ S3 Integration: Secure audio file storage with automatic format detection
- 🌐 Modern Web UI: Beautiful React/Next.js frontend with real-time timeline visualization
- 🔌 Flexible Data Ingestion: Support for both direct API calls and Bolna platform integration
- 🚀 FastAPI Backend: High-performance async API with automatic documentation
- 🗄️ PostgreSQL Database: Robust data storage with Alembic migrations
- ⚡ Asynchronous Processing: Real-time API responses with background AI processing
🖼️ What you will get
Samples for an appointment booking bot (screenshots):
- Calls List
- Call Insights
- Transcript
- Transcript Analysis
- Audio Analysis
- Extracted Data
- Labelling & Classification
🚀 Quick Start
Prerequisites
- Python 3.9+
- Node.js 18+
- PostgreSQL 12+
- AWS S3 bucket (for audio storage)
- OpenAI API key (for AI-powered analysis)
One-Command Setup
# Clone the repository
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# Run the complete setup script
./setup.sh
The setup script will:
- ✅ Check all prerequisites
- ✅ Create Python virtual environment
- ✅ Install Python dependencies
- ✅ Install Node.js dependencies
- ✅ Set up database and run migrations
- ✅ Create convenient start scripts
Manual Setup
If you prefer manual setup:
# 1. Clone and navigate
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# 2. Setup Python backend
uv sync
# 3. Setup frontend
cd frontend
npm install
cd ..
# 4. Configure environment
cp env.example .env
# Edit .env with your credentials
# 5. Setup database
alembic upgrade head
🏃‍♂️ Running the Application
Start Backend Server
# Option 1: Use the generated script
./start_backend.sh
# Option 2: Manual start
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Start Frontend Server
# Option 1: Use the generated script (in new terminal)
./start_frontend.sh
# Option 2: Manual start
cd frontend
npm run dev
Access Your Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Interactive API Docs (Swagger UI): http://localhost:8000/docs
- API Reference (ReDoc): http://localhost:8000/redoc
🔧 Configuration
Environment Variables
Create a .env file in the project root:
# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/voicesummary
# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-audio-bucket
# OpenAI API (required for AI-powered analysis)
OPENAI_API_KEY=your_openai_api_key
# Optional: Bolna API (if using Bolna platform)
BOLNA_API_KEY=your_bolna_api_key
Database Setup
# Create PostgreSQL database
createdb voicesummary
# Run migrations
alembic upgrade head
📥 Data Ingestion
Voice Summary supports two main data ingestion methods for voice agent transcripts:
⚠️ Important: OpenAI API Key Required
For full AI functionality, you need to add your OpenAI API key to the environment variables:
OPENAI_API_KEY=your_openai_api_key
What happens with an OpenAI API key:
- ✅ AI Transcript Analysis: Intelligent call outcome analysis, quality assessment, and improvement areas
- ✅ Agent Performance Evaluation: AI-powered goal achievement analysis and script adherence evaluation
- ✅ Executive Summaries: Intelligent call summaries with key insights and recommendations
- ✅ Data Extraction Pipeline: Automatic extraction, classification, and labeling of call data using AI
What happens without an OpenAI API key:
- ✅ Audio Analysis: Pause detection, speech segmentation, conversation health scoring
- ✅ Basic Processing: Audio file processing and S3 storage
- ❌ No AI Transcript Analysis: Call outcome, quality metrics, and improvement areas won't be generated
- ❌ No Agent Evaluation: Performance analysis and script adherence won't be available
- ❌ No Data Extraction: Structured data extraction, classification, and labeling won't be available
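The capability matrix above amounts to a feature gate on the OPENAI_API_KEY environment variable. A minimal illustrative sketch (not the project's actual code; the function name is hypothetical):

```python
import os

def analysis_capabilities():
    """Return which analysis features are active, mirroring the matrix above:
    audio analysis and basic processing always run, while transcript analysis,
    agent evaluation, and data extraction require OPENAI_API_KEY."""
    has_key = bool(os.environ.get("OPENAI_API_KEY"))
    return {
        "audio_analysis": True,        # pause detection, segmentation, health scoring
        "basic_processing": True,      # audio file processing and S3 storage
        "transcript_analysis": has_key,
        "agent_evaluation": has_key,
        "data_extraction": has_key,
    }
```

In other words, the app degrades gracefully: it never fails outright without the key, it just skips the AI-backed stages.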
Method 1: Direct API Calls (Recommended for Custom Integrations)
Use the REST API to directly ingest voice agent call data with your own S3 storage:
# Create a new call record
curl -X POST "http://localhost:8000/api/calls/" \
  -H "Content-Type: application/json" \
  -d '{
    "call_id": "call_123",
    "transcript": {
      "turns": [
        {
          "role": "AGENT",
          "content": "Hello, how can I help you?",
          "timestamp": "2025-01-01T10:00:00Z"
        },
        {
          "role": "USER",
          "content": "I need help with my order",
          "timestamp": "2025-01-01T10:00:01Z"
        }
      ]
    },
    "audio_file_url": "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "timestamp": "2025-01-01T10:00:00Z"
  }'
Benefits:
- ✅ Full control over S3 storage
- ✅ Custom audio processing pipelines
- ✅ Integration with any voice agent platform
- ✅ Real-time data ingestion
- ✅ AI-powered analysis and insights
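The same POST /api/calls/ request can be issued from Python. A minimal sketch using only the standard library; the endpoint and payload shape follow the curl example above, while the helper names and the assumption of a JSON response body are ours:

```python
import json
from urllib import request

API_URL = "http://localhost:8000/api/calls/"  # endpoint from the curl example

def build_call_payload(call_id, turns, audio_url, timestamp):
    """Assemble the JSON body expected by POST /api/calls/."""
    return {
        "call_id": call_id,
        "transcript": {"turns": turns},
        "audio_file_url": audio_url,
        "timestamp": timestamp,
    }

def ingest_call(payload):
    """POST the payload to a running backend (requires the server to be up)."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_call_payload(
    "call_123",
    [
        {"role": "AGENT", "content": "Hello, how can I help you?",
         "timestamp": "2025-01-01T10:00:00Z"},
        {"role": "USER", "content": "I need help with my order",
         "timestamp": "2025-01-01T10:00:01Z"},
    ],
    "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "2025-01-01T10:00:00Z",
)
```

Swap urllib for your preferred HTTP client (requests, httpx) in production code; the payload structure is what matters.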
Method 2: Bolna Platform Integration
Use the built-in Bolna integration for automatic voice agent call processing:
# Run the Bolna fetcher
python app/integrations/fetch_bolna_calls_simple.py
Benefits:
- ✅ Automatic call discovery and processing
- ✅ Built-in audio analysis and enhancement
- ✅ Transcript normalization and timestamp alignment
- ✅ Seamless S3 upload and storage
- ✅ AI-powered insights and analysis
🔍 AI-Powered Data Extraction Pipeline
Voice Summary includes a sophisticated AI-driven data extraction pipeline that automatically processes voice agent call transcripts to extract structured information, classify calls, and apply relevant business labels.
🎯 Pipeline Features
AI Data Extraction
- Customer Information: Name, email, phone, account number, customer ID
- Product Mentions: Products and services discussed during the call
- Call Reasons
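The extraction categories above might map to a structured record like the following. This is an illustrative sketch with made-up values; the project's actual output schema may use different field names:

```python
import json

# Hypothetical shape of one extracted-call record, following the
# categories listed in this README (customer information, product
# mentions, call reasons). All values here are invented examples.
extracted = {
    "customer_information": {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "+1-555-0100",
        "account_number": None,   # fields the AI could not find stay null
        "customer_id": None,
    },
    "product_mentions": ["appointment booking"],
    "call_reasons": ["reschedule appointment"],
}

print(json.dumps(extracted, indent=2))
```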
