<div align="center"> <img src="https://cdn.prod.website-files.com/66a1237564b8afdc9767dd3d/66df7b326efdddf8c1af9dbb_Momentum%20Logo.svg" height="80"> <h1>Notetaker AI</h1> <p><strong>Intelligent Transcription & Summarization for Professionals</strong></p>

</div>

📋 Table of Contents

🔍 About
🚀 Getting Started
📝 Usage
🖥️ Demo
🐳 Docker Setup
🗺️ Roadmap
👥 Contributors
📄 License

🔍 About The Project

Notetaker AI transforms how professionals handle meetings, interviews, and consultations with advanced audio-to-text capabilities. It combines precise transcription with intelligent summarization to create concise, structured notes that save time and enhance documentation accuracy.

Notetaker workflow

✨ Key Features

🎙️ Smart Transcription: Convert audio to text with Whisper models, including optional speaker diarization and time alignment
📊 Multiple Summary Formats: Generate summaries in various formats to fit different professional needs:
- 📝 Text – Simple, readable plain-text format
- 📋 SOAP – Structured clinical format (Subjective, Objective, Assessment, Plan)
- 🏥 PKI HL7 CDA – Standards-compliant summary for healthcare interoperability
- 🩺 Therapy Assessment – Custom format for structured evaluation of therapist performance across key professional competencies
⏳ Long-form Audio Support: Designed to handle recordings longer than one hour
⚙️ Flexible Deployment: Deploy fully locally with local AI models for complete data control, or use available external integrations
⚙️ Multiple Access Points: Run as an API-only service or with an interactive Gradio UI
🚄 GPU Acceleration: Leverage GPU hardware for faster processing of large audio files
🔧 Customizable: Configure the service to your requirements through environment variables

🚀 Getting Started

Follow these steps to set up Notetaker AI in your environment.

Prerequisites

Python: 3.12 or higher
Poetry: For dependency management (Installation Guide)
FFmpeg: Required for audio processing
CUDA Toolkit: 12.2+ recommended (only if using GPU acceleration)
Hugging Face Access: You'll need access to these gated models:
- Speaker Diarization
- Segmentation
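Before installing, you can sanity-check the local tools with a short script. This is a minimal sketch assuming Poetry and FFmpeg are installed on your `PATH`; the `check_prerequisites` function is hypothetical and not part of the repository. It cannot verify Hugging Face gated-model access, which is requested on each model's page, and `nvidia-smi` is only a rough hint that GPU acceleration may be usable.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 12)) -> dict[str, bool]:
    """Report whether the tools listed under Prerequisites appear to be available."""
    return {
        # Python 3.12 or higher is required.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which finds executables only if they are on PATH.
        "poetry": shutil.which("poetry") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Optional: a present NVIDIA driver hints that GPU acceleration may work.
        "nvidia-smi (optional)": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:<8}{name}")
```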

Installation

Clone the repository:

```
git clone https://github.com/the-momentum/notetaker
cd notetaker
```

Install dependencies:

```
# For API only (recommended for production)
poetry install --without demo --without dev

# With demo interface (for testing and demonstration)
poetry install --with demo --without dev
```

📝 Usage

Configuration

Set up environment variables:
```
cp .env.example .env
```
Edit the .env file with your specific configuration.
Start the application:
```
./run.sh
```
The API will be available at http://localhost:8001 by default.
Access the API documentation:
- Swagger UI: http://localhost:8001/docs
- ReDoc: http://localhost:8001/redoc
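Once `./run.sh` is running, a quick way to confirm that the server is up is to request the documentation pages listed above. The sketch below uses only the Python standard library and assumes the default address `http://localhost:8001`; the helper names `docs_urls` and `is_reachable` are hypothetical.

```python
import urllib.request

# Default address from this README; adjust it if you changed HOST or PORT.
DEFAULT_BASE_URL = "http://localhost:8001"


def docs_urls(base_url: str = DEFAULT_BASE_URL) -> dict[str, str]:
    """Build the Swagger UI and ReDoc URLs from a base URL."""
    base = base_url.rstrip("/")
    return {"swagger": f"{base}/docs", "redoc": f"{base}/redoc"}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    for name, url in docs_urls().items():
        print(f"{name}: {url} -> {'up' if is_reachable(url) else 'down'}")
```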

Environment Variables

| Variable | Description | Example Value |
|----------|-------------|---------------|
| PROJECT_NAME | Name used for logging and display | Notetaker AI |
| BACKEND_CORS_ORIGINS | Allowed CORS origins | ["http://localhost:8000"] |
| HOST | Host address for API availability | 0.0.0.0 |
| PORT | Port for the API server | 8001 |
| OLLAMA_URL | Base URL for Ollama server | http://localhost:11434 |
| LLM_MODEL | LLM model name | llama3.2 |
| USE_LOCAL_MODELS | Whether to use local models | True |
| WHISPER_MODEL | Whisper model type | turbo |
| WHISPER_DEVICE | Device for running Whisper | cpu or cuda |
| WHISPER_COMPUTE_TYPE | Compute type for Whisper | int8 |
| WHISPER_BATCH_SIZE | Batch size for processing | 16 |
| HF_API_KEY | Hugging Face API key | hf_... |
| OPENAI_API_KEY | OpenAI API key | sk-proj-... |

⚠️ Note: How much transcription can be summarized depends on the selected LLM's token limit (context window). If the transcription exceeds that limit, it may be truncated or cause errors. Choose a model whose context window fits the expected transcription length to ensure complete results.
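To judge ahead of time whether a transcript is likely to fit, a rough estimate is often enough. The sketch below assumes about four characters per token, a common rule of thumb for English text; the real count depends on the model's tokenizer, and the context-window size must come from the model's documentation. The helper names and the reserves for the prompt and the generated summary are illustrative assumptions.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from a characters-per-token ratio (about 4 for English text)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(
    transcript: str,
    context_window: int,
    reserved_for_output: int = 1024,  # assumed room for the generated summary
    prompt_overhead: int = 500,       # assumed size of the instructions around the transcript
) -> bool:
    """Check whether the transcript, prompt and expected summary fit in the context window."""
    budget = context_window - reserved_for_output - prompt_overhead
    return estimate_tokens(transcript) <= budget
```

For example, a 40,000-character transcript is estimated at 10,000 tokens, which would not fit an 8,192-token window once space is reserved for the prompt and the summary.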

🖥️ Demo

The interactive Gradio demo provides a user-friendly interface to experience Notetaker AI's capabilities without writing code.

Running the Demo

Install demo dependencies (if not already done):
```
poetry install --with demo --without dev
```
Configure the demo: Update demo/.env.demo with your API base URL.
Launch the integrated demo:
```
./run.sh --demo
```
This starts both the API and Gradio interface.
Or run the demo separately (if API is already running):
```
poetry run python demo/ui.py
```

The demo will be available at http://localhost:7860.

Demo Features

📁 Upload or Record: Submit audio files or record directly in your browser
⚙️ Configure Options: Set parameters for transcription and summarization
📊 Format Selection: Choose between different summary formats
⏱️ Real-time Processing: Watch as your audio is transcribed and summarized
💾 Download Results: Save output as JSON for further use

🐳 Docker Setup

For consistent deployment across environments, use our Docker setup. The commands below use the just command runner.

Quick Commands

```
# Build the Docker images
just docker-build

# Rebuild without using cache
just docker-rebuild

# Run the API only
just docker-up

# Run API with Gradio demo
just docker-demo
```

Access Points

API: http://localhost:8001
API Documentation:
- Swagger UI: http://localhost:8001/docs
- ReDoc: http://localhost:8001/redoc
Gradio Demo (if enabled): http://localhost:7860

🗺️ Roadmap

We're continuously enhancing Notetaker AI with new capabilities. Here's what's on the horizon:

[ ] OpenAI API Integration: Direct connection to Whisper via OpenAI API
[ ] Expanded LLM Support: Integration with additional LLM providers
[ ] Enhanced Note Formats: More specialized formats and improved customization options
[ ] Performance Optimizations: Faster processing for large audio files

Have a suggestion? We'd love to hear from you! Contact us or contribute directly.

👥 Contributors

📄 License

Distributed under the MIT License. See LICENSE for more information.

<div align="center"> <p><em>Built with ❤️ by <a href="https://themomentum.ai">Momentum</a> • Turning conversations into structured knowledge</em></p> </div>
