🎤 Voice Summary
Open Source AI Database for Voice Agent Transcripts
A comprehensive AI-powered database and analytics platform for storing, analyzing, and extracting insights from voice agent call transcripts. Built with FastAPI, React/Next.js, and PostgreSQL, featuring advanced AI analysis, transcript enhancement, and intelligent data extraction.
🎯 What is Voice Summary?
Voice Summary is an open-source AI database specifically designed for voice agent transcripts and call analytics. It provides:
- 🤖 AI-Powered Transcript Analysis - Advanced machine learning for call outcome analysis
- 📊 Intelligent Data Extraction - Automatic extraction of customer information and business insights
- 🏷️ Smart Classification & Labeling - AI-driven call categorization and sentiment analysis
- 🎵 Advanced Audio Processing - Voice analysis with pause detection and conversation health scoring
- ☁️ Cloud-Ready Architecture - Built with FastAPI, React, PostgreSQL, and AWS S3 integration
Perfect for call centers, voice bot developers, customer service teams, and AI researchers who need comprehensive voice analytics and transcript management.
📋 Table of Contents
- ✨ Features
- 🖼️ What you will get
- 🚀 Quick Start
- 📥 Data Ingestion
- 🔍 AI-Powered Data Extraction Pipeline
- 🔌 API Endpoints
- 🎯 Use Cases
- 🏗️ Project Structure
- 🛠️ Development
- 🚀 Deployment
- 🤝 Contributing
- 📚 Documentation
- 🐛 Troubleshooting
- 📄 License
✨ Features
- 🤖 AI-Powered Transcript Analysis: Advanced AI models for call outcome analysis, quality assessment, and performance evaluation
- 📊 Intelligent Data Extraction: Automatic extraction of customer information, call reasons, and business insights from transcripts
- 🏷️ Smart Classification & Labeling: AI-driven call categorization, sentiment analysis, and business action labeling
- 📝 Enhanced Transcript Processing: Automatic timestamp alignment, turn-by-turn conversation analysis, and transcript normalization
- 🎵 Advanced Audio Analysis: AI-powered voice analysis with pause detection, speech segmentation, and conversation health scoring
- 🔄 Multi-Agent Comparison: Scenario-based testing to compare multiple voice agents with AI-powered metrics
- ☁️ S3 Integration: Secure audio file storage with automatic format detection
- 🌐 Modern Web UI: Beautiful React/Next.js frontend with real-time timeline visualization
- 🔌 Flexible Data Ingestion: Support for both direct API calls and Bolna platform integration
- 🚀 FastAPI Backend: High-performance async API with automatic documentation
- 🗄️ PostgreSQL Database: Robust data storage with Alembic migrations
- ⚡ Asynchronous Processing: Real-time API responses with background AI processing
🖼️ What you will get
Samples for an appointment booking bot (screenshots):
- Calls List
- Call Insights
- Transcript
- Transcript Analysis
- Audio Analysis
- Extracted Data
- Labelling & Classification
🚀 Quick Start
Prerequisites
- Python 3.9+
- Node.js 18+
- PostgreSQL 12+
- AWS S3 bucket (for audio storage)
- OpenAI API key (for AI-powered analysis)
One-Command Setup
# Clone the repository
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# Run the complete setup script
./setup.sh
The setup script will:
- ✅ Check all prerequisites
- ✅ Create Python virtual environment
- ✅ Install Python dependencies
- ✅ Install Node.js dependencies
- ✅ Set up database and run migrations
- ✅ Create convenient start scripts
Manual Setup
If you prefer manual setup:
# 1. Clone and navigate
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary
# 2. Setup Python backend
uv sync
# 3. Setup frontend
cd frontend
npm install
cd ..
# 4. Configure environment
cp env.example .env
# Edit .env with your credentials
# 5. Setup database
alembic upgrade head
🏃‍♂️ Running the Application
Start Backend Server
# Option 1: Use the generated script
./start_backend.sh
# Option 2: Manual start
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Start Frontend Server
# Option 1: Use the generated script (in new terminal)
./start_frontend.sh
# Option 2: Manual start
cd frontend
npm run dev
Access Your Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Interactive API Docs (Swagger UI): http://localhost:8000/docs
- API Reference (ReDoc): http://localhost:8000/redoc
🔧 Configuration
Environment Variables
Create a .env file in the project root:
# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/voicesummary
# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-audio-bucket
# OpenAI API (required for AI-powered analysis)
OPENAI_API_KEY=your_openai_api_key
# Optional: Bolna API (if using Bolna platform)
BOLNA_API_KEY=your_bolna_api_key
Database Setup
# Create PostgreSQL database
createdb voicesummary
# Run migrations
alembic upgrade head
📥 Data Ingestion
Voice Summary supports two main data ingestion methods for voice agent transcripts:
⚠️ Important: OpenAI API Key Required
For full AI functionality, you need to add your OpenAI API key to the environment variables:
OPENAI_API_KEY=your_openai_api_key
What happens with an OpenAI API key:
- ✅ AI Transcript Analysis: Intelligent call outcome analysis, quality assessment, and improvement areas
- ✅ Agent Performance Evaluation: AI-powered goal achievement analysis and script adherence evaluation
- ✅ Executive Summaries: Intelligent call summaries with key insights and recommendations
- ✅ Data Extraction Pipeline: Automatic extraction, classification, and labeling of call data using AI
What happens without an OpenAI API key:
- ✅ Audio Analysis: Pause detection, speech segmentation, conversation health scoring
- ✅ Basic Processing: Audio file processing and S3 storage
- ❌ No AI Transcript Analysis: Call outcome, quality metrics, and improvement areas won't be generated
- ❌ No Agent Evaluation: Performance analysis and script adherence won't be available
- ❌ No Data Extraction: Structured data extraction, classification, and labeling won't be available
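The capability matrix above amounts to a feature gate on the OPENAI_API_KEY environment variable. A minimal illustrative sketch (not the project's actual code; the function name is hypothetical):

```python
import os

def analysis_capabilities():
    """Return which analysis features are active, mirroring the matrix above:
    audio analysis and basic processing always run, while transcript analysis,
    agent evaluation, and data extraction require OPENAI_API_KEY."""
    has_key = bool(os.environ.get("OPENAI_API_KEY"))
    return {
        "audio_analysis": True,        # pause detection, segmentation, health scoring
        "basic_processing": True,      # audio file processing and S3 storage
        "transcript_analysis": has_key,
        "agent_evaluation": has_key,
        "data_extraction": has_key,
    }
```

In other words, the app degrades gracefully: it never fails outright without the key, it just skips the AI-backed stages.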
Method 1: Direct API Calls (Recommended for Custom Integrations)
Use the REST API to directly ingest voice agent call data with your own S3 storage:
# Create a new call record
curl -X POST "http://localhost:8000/api/calls/" \
  -H "Content-Type: application/json" \
  -d '{
    "call_id": "call_123",
    "transcript": {
      "turns": [
        {
          "role": "AGENT",
          "content": "Hello, how can I help you?",
          "timestamp": "2025-01-01T10:00:00Z"
        },
        {
          "role": "USER",
          "content": "I need help with my order",
          "timestamp": "2025-01-01T10:00:01Z"
        }
      ]
    },
    "audio_file_url": "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "timestamp": "2025-01-01T10:00:00Z"
  }'
Benefits:
- ✅ Full control over S3 storage
- ✅ Custom audio processing pipelines
- ✅ Integration with any voice agent platform
- ✅ Real-time data ingestion
- ✅ AI-powered analysis and insights
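The same POST /api/calls/ request can be issued from Python. A minimal sketch using only the standard library; the endpoint and payload shape follow the curl example above, while the helper names and the assumption of a JSON response body are ours:

```python
import json
from urllib import request

API_URL = "http://localhost:8000/api/calls/"  # endpoint from the curl example

def build_call_payload(call_id, turns, audio_url, timestamp):
    """Assemble the JSON body expected by POST /api/calls/."""
    return {
        "call_id": call_id,
        "transcript": {"turns": turns},
        "audio_file_url": audio_url,
        "timestamp": timestamp,
    }

def ingest_call(payload):
    """POST the payload to a running backend (requires the server to be up)."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_call_payload(
    "call_123",
    [
        {"role": "AGENT", "content": "Hello, how can I help you?",
         "timestamp": "2025-01-01T10:00:00Z"},
        {"role": "USER", "content": "I need help with my order",
         "timestamp": "2025-01-01T10:00:01Z"},
    ],
    "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "2025-01-01T10:00:00Z",
)
```

Swap urllib for your preferred HTTP client (requests, httpx) in production code; the payload structure is what matters.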
Method 2: Bolna Platform Integration
Use the built-in Bolna integration for automatic voice agent call processing:
# Run the Bolna fetcher
python app/integrations/fetch_bolna_calls_simple.py
Benefits:
- ✅ Automatic call discovery and processing
- ✅ Built-in audio analysis and enhancement
- ✅ Transcript normalization and timestamp alignment
- ✅ Seamless S3 upload and storage
- ✅ AI-powered insights and analysis
🔍 AI-Powered Data Extraction Pipeline
Voice Summary includes a sophisticated AI-driven data extraction pipeline that automatically processes voice agent call transcripts to extract structured information, classify calls, and apply relevant business labels.
🎯 Pipeline Features
AI Data Extraction
- Customer Information: Name, email, phone, account number, customer ID
- Product Mentions: Products and services discussed during the call
- Call Reasons
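The extraction categories above might map to a structured record like the following. This is an illustrative sketch with made-up values; the project's actual output schema may use different field names:

```python
import json

# Hypothetical shape of one extracted-call record, following the
# categories listed in this README (customer information, product
# mentions, call reasons). All values here are invented examples.
extracted = {
    "customer_information": {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "+1-555-0100",
        "account_number": None,   # fields the AI could not find stay null
        "customer_id": None,
    },
    "product_mentions": ["appointment booking"],
    "call_reasons": ["reschedule appointment"],
}

print(json.dumps(extracted, indent=2))
```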
