🎤 Voice Summary

Open Source AI Database for Voice Agent Transcripts

A comprehensive AI-powered database and analytics platform for storing, analyzing, and extracting insights from voice agent call transcripts. Built with FastAPI, React/Next.js, and PostgreSQL, featuring advanced AI analysis, transcript enhancement, and intelligent data extraction.


🎯 What is Voice Summary?

Voice Summary is an open-source AI database specifically designed for voice agent transcripts and call analytics. It provides:

  • 🤖 AI-Powered Transcript Analysis - Advanced machine learning for call outcome analysis
  • 📊 Intelligent Data Extraction - Automatic extraction of customer information and business insights
  • 🏷️ Smart Classification & Labeling - AI-driven call categorization and sentiment analysis
  • 🎵 Advanced Audio Processing - Voice analysis with pause detection and conversation health scoring
  • ☁️ Cloud-Ready Architecture - Built with FastAPI, React, PostgreSQL, and AWS S3 integration

Perfect for call centers, voice bot developers, customer service teams, and AI researchers who need comprehensive voice analytics and transcript management.

✨ Features

  • 🤖 AI-Powered Transcript Analysis: Advanced AI models for call outcome analysis, quality assessment, and performance evaluation
  • 📊 Intelligent Data Extraction: Automatic extraction of customer information, call reasons, and business insights from transcripts
  • 🏷️ Smart Classification & Labeling: AI-driven call categorization, sentiment analysis, and business action labeling
  • 📝 Enhanced Transcript Processing: Automatic timestamp alignment, turn-by-turn conversation analysis, and transcript normalization
  • 🎵 Advanced Audio Analysis: AI-powered voice analysis with pause detection, speech segmentation, and conversation health scoring
  • 🔄 Multi-Agent Comparison: Scenario-based testing to compare multiple voice agents with AI-powered metrics
  • ☁️ S3 Integration: Secure audio file storage with automatic format detection
  • 🌐 Modern Web UI: Beautiful React/Next.js frontend with real-time timeline visualization
  • 🔌 Flexible Data Ingestion: Support for both direct API calls and Bolna platform integration
  • 🚀 FastAPI Backend: High-performance async API with automatic documentation
  • 🗄️ PostgreSQL Database: Robust data storage with Alembic migrations
  • ⚡ Asynchronous Processing: Real-time API responses with background AI processing

🖼️ What you will get

Sample screenshots from an appointment booking bot:

  • Calls List (main dashboard)
  • Call Insights
  • Transcript
  • Transcript Analysis
  • Audio Analysis
  • Extracted Data
  • Labelling & Classification

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 18+
  • PostgreSQL 12+
  • AWS S3 bucket (for audio storage)
  • OpenAI API key (for AI-powered analysis)

One-Command Setup

# Clone the repository
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary

# Run the complete setup script
./setup.sh

The setup script will:

  • ✅ Check all prerequisites
  • ✅ Create Python virtual environment
  • ✅ Install Python dependencies
  • ✅ Install Node.js dependencies
  • ✅ Set up database and run migrations
  • ✅ Create convenient start scripts

Manual Setup

If you prefer manual setup:

# 1. Clone and navigate
git clone https://github.com/DrDroidLab/voicesummary.git
cd voicesummary

# 2. Setup Python backend
uv sync

# 3. Setup frontend
cd frontend
npm install
cd ..

# 4. Configure environment
cp env.example .env
# Edit .env with your credentials

# 5. Setup database
alembic upgrade head

🏃‍♂️ Running the Application

Start Backend Server

# Option 1: Use the generated script
./start_backend.sh

# Option 2: Manual start
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Start Frontend Server

# Option 1: Use the generated script (in new terminal)
./start_frontend.sh

# Option 2: Manual start
cd frontend
npm run dev

Access Your Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • Interactive API docs (Swagger UI): http://localhost:8000/docs
  • Alternative API docs (ReDoc): http://localhost:8000/redoc
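Once both servers are up, you can sanity-check that the backend is reachable before opening the frontend. This is a minimal sketch assuming the default host and port above; it is not part of the project itself:

```python
import urllib.request


def backend_is_up(base="http://localhost:8000"):
    """Return True if the FastAPI docs endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base}/docs", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused or timed out: the server is not reachable.
        return False
```

If this returns False, confirm the backend was started on port 8000 and that no other process is bound to it.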

🔧 Configuration

Environment Variables

Create a .env file in the project root:

# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/voicesummary

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-audio-bucket

# OpenAI API (required for AI-powered analysis)
OPENAI_API_KEY=your_openai_api_key

# Optional: Bolna API (if using Bolna platform)
BOLNA_API_KEY=your_bolna_api_key

Database Setup

# Create PostgreSQL database
createdb voicesummary

# Run migrations
alembic upgrade head

📥 Data Ingestion

Voice Summary supports two main data ingestion methods for voice agent transcripts:

⚠️ Important: OpenAI API Key Required

For full AI functionality, you need to add your OpenAI API key to the environment variables:

OPENAI_API_KEY=your_openai_api_key

What happens with OpenAI API key:

  • AI Transcript Analysis: Intelligent call outcome analysis, quality assessment, and improvement areas
  • Agent Performance Evaluation: AI-powered goal achievement analysis and script adherence evaluation
  • Executive Summaries: Intelligent call summaries with key insights and recommendations
  • Data Extraction Pipeline: Automatic extraction, classification, and labeling of call data using AI

What happens without OpenAI API key:

  • Audio Analysis: Pause detection, speech segmentation, conversation health scoring
  • Basic Processing: Audio file processing and S3 storage
  • No AI Transcript Analysis: Call outcome, quality metrics, and improvement areas won't be generated
  • No Agent Evaluation: Performance analysis and script adherence won't be available
  • No Data Extraction: Structured data extraction, classification, and labeling won't be available
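The degradation described above can be pictured as a simple capability check. The function below is an illustration of that behavior, not code from the project:

```python
import os


def analysis_capabilities(env=os.environ):
    """Sketch of which pipeline stages are available depending on whether
    OPENAI_API_KEY is set (illustrative, not the project's actual code)."""
    has_key = bool(env.get("OPENAI_API_KEY"))
    return {
        "audio_analysis": True,          # always runs: pause detection, segmentation, health scoring
        "basic_processing": True,        # always runs: audio file processing and S3 storage
        "transcript_analysis": has_key,  # requires OpenAI: outcome, quality, improvement areas
        "agent_evaluation": has_key,     # requires OpenAI: goal achievement, script adherence
        "data_extraction": has_key,      # requires OpenAI: extraction, classification, labeling
    }
```

For example, `analysis_capabilities({})["data_extraction"]` is False, while audio analysis remains available either way.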

Method 1: Direct API Calls (Recommended for Custom Integrations)

Use the REST API to directly ingest voice agent call data with your own S3 storage:

# Create a new call record
curl -X POST "http://localhost:8000/api/calls/" \
  -H "Content-Type: application/json" \
  -d '{
    "call_id": "call_123",
    "transcript": {
      "turns": [
        {
          "role": "AGENT",
          "content": "Hello, how can I help you?",
          "timestamp": "2025-01-01T10:00:00Z"
        },
        {
          "role": "USER", 
          "content": "I need help with my order",
          "timestamp": "2025-01-01T10:00:01Z"
        }
      ]
    },
    "audio_file_url": "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "timestamp": "2025-01-01T10:00:00Z"
  }'
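The same request can be made from Python. This sketch only assembles and prints the payload matching the curl example above; the commented-out lines show how it could be sent with the third-party `requests` package once the backend is running:

```python
import json


def build_call_payload(call_id, turns, audio_url, timestamp):
    """Assemble the JSON body expected by POST /api/calls/ (fields as in the curl example)."""
    return {
        "call_id": call_id,
        "transcript": {"turns": turns},
        "audio_file_url": audio_url,
        "timestamp": timestamp,
    }


payload = build_call_payload(
    "call_123",
    [
        {"role": "AGENT", "content": "Hello, how can I help you?", "timestamp": "2025-01-01T10:00:00Z"},
        {"role": "USER", "content": "I need help with my order", "timestamp": "2025-01-01T10:00:01Z"},
    ],
    "https://your-s3-bucket.s3.amazonaws.com/audio/call_123.mp3",
    "2025-01-01T10:00:00Z",
)
print(json.dumps(payload, indent=2))

# To send it (requires the backend running on localhost:8000):
# import requests
# resp = requests.post("http://localhost:8000/api/calls/", json=payload)
# resp.raise_for_status()
```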

Benefits:

  • ✅ Full control over S3 storage
  • ✅ Custom audio processing pipelines
  • ✅ Integration with any voice agent platform
  • ✅ Real-time data ingestion
  • ✅ AI-powered analysis and insights

Method 2: Bolna Platform Integration

Use the built-in Bolna integration for automatic voice agent call processing:

# Run the Bolna fetcher
python app/integrations/fetch_bolna_calls_simple.py

Benefits:

  • ✅ Automatic call discovery and processing
  • ✅ Built-in audio analysis and enhancement
  • ✅ Transcript normalization and timestamp alignment
  • ✅ Seamless S3 upload and storage
  • ✅ AI-powered insights and analysis

🔍 AI-Powered Data Extraction Pipeline

Voice Summary includes a sophisticated AI-driven data extraction pipeline that automatically processes voice agent call transcripts to extract structured information, classify calls, and apply relevant business labels.
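To make the pipeline concrete, here is a sketch of how an extraction step might turn transcript turns into an LLM prompt. The field list mirrors the pipeline features below, but the exact prompt, schema, and helper names are assumptions, not the project's actual implementation:

```python
# Fields the extraction step asks for (illustrative, not the project's schema).
EXTRACTION_FIELDS = ["customer_name", "customer_email", "products_mentioned", "call_reason"]


def build_extraction_prompt(transcript_turns):
    """Flatten transcript turns into a prompt asking an LLM for structured JSON output."""
    lines = [f"{turn['role']}: {turn['content']}" for turn in transcript_turns]
    return (
        "Extract the following fields from the call transcript and reply as JSON: "
        + ", ".join(EXTRACTION_FIELDS)
        + "\n\nTranscript:\n"
        + "\n".join(lines)
    )


prompt = build_extraction_prompt([
    {"role": "AGENT", "content": "Hello, how can I help you?"},
    {"role": "USER", "content": "I need help with my order"},
])
```

The resulting prompt would then be sent to the configured OpenAI model, and the JSON reply stored alongside the call record.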

🎯 Pipeline Features

AI Data Extraction

  • Customer Information: Name, email, phone, account number, customer ID
  • Product Mentions: Products and services discussed during the call
  • Call Reasons
No findings