Notetaker
🧑‍⚕️🤖 AI-powered audio transcription and smart summarization tool that transforms spoken conversations into structured notes for healthcare professionals.
<a name="readme-top"></a>
<div align="center"> <img src="https://cdn.prod.website-files.com/66a1237564b8afdc9767dd3d/66df7b326efdddf8c1af9dbb_Momentum%20Logo.svg" height="80"> <h1>Notetaker AI</h1> <p><strong>Intelligent Transcription & Summarization for Professionals</strong></p> </div>

Table of Contents
- About
- Getting Started
- Usage
- Demo
- Docker Setup
- Roadmap
- Contributors
- License
About The Project
Notetaker AI transforms how professionals handle meetings, interviews, and consultations with advanced audio-to-text capabilities. It combines precise transcription with intelligent summarization to create concise, structured notes that save time and enhance documentation accuracy.
Key Features
- Smart Transcription: Convert audio to text, with optional speaker diarization and time alignment
- Multiple Summary Formats: Generate summaries in formats suited to different professional needs:
  - Text: Simple, readable plain-text format
  - SOAP: Structured clinical format (Subjective, Objective, Assessment, Plan)
  - PKI HL7 CDA: Standards-compliant summary for healthcare interoperability
  - Therapy Assessment: Custom format for structured evaluation of therapist performance across key professional competencies
- Long-form Audio Support: Designed to handle recordings longer than one hour
- Flexible Deployment: Run fully locally with local AI models for complete data control, or use the available external integrations
- Multiple Access Points: Run as an API-only service or with an interactive Gradio UI
- GPU Acceleration: Use GPU hardware for faster processing of large audio files
- Customizable: Configure models, devices, and server settings through environment variables
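To make the SOAP format concrete, here is a minimal sketch of a SOAP note as a Python data structure that renders to plain text and JSON. The `SoapNote` class, its field names, and its `to_text`/`to_json` methods are hypothetical illustrations, not Notetaker's actual output schema, and the sample note is invented for demonstration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SoapNote:
    """Hypothetical container for the four SOAP sections (not Notetaker's schema)."""
    subjective: str  # what the patient reports in their own words
    objective: str   # measurable findings such as vitals or exam results
    assessment: str  # the clinician's interpretation or working diagnosis
    plan: str        # next steps: treatment, tests, follow-up

    def to_text(self) -> str:
        # Render the sections in fixed S-O-A-P order, one per line.
        sections = [
            ("Subjective", self.subjective),
            ("Objective", self.objective),
            ("Assessment", self.assessment),
            ("Plan", self.plan),
        ]
        return "\n".join(f"{title}: {body}" for title, body in sections)

    def to_json(self) -> str:
        # Serialize all fields to a JSON object, e.g. for saving results.
        return json.dumps(asdict(self), indent=2)


note = SoapNote(
    subjective="Patient reports a dry cough for three days.",
    objective="Temperature 37.8 C; lungs clear on auscultation.",
    assessment="Likely viral upper respiratory infection.",
    plan="Rest and fluids; follow up if symptoms persist beyond a week.",
)
print(note.to_text())
```

A dataclass keeps the four sections explicit and ordered, which makes it easy to validate that an LLM-generated summary filled every section before it is stored.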
<video src="https://github.com/user-attachments/assets/52b74add-9733-442c-a083-830bfba9d900" controls="controls"></video>
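Speaker diarization and transcription typically produce two separate timelines: transcript segments and speaker turns. One common way to combine them, shown here as a sketch and not necessarily how Notetaker does it, is to label each transcript segment with the speaker whose turns overlap it the most. The `Turn` and `Segment` types, the `assign_speakers` helper, and the `SPEAKER_00`-style labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcribed span of text with timestamps (seconds)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    # Length of the intersection of two intervals; 0.0 if they do not meet.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several turns.
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Leave the speaker unset when no turn overlaps this segment.
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


turns = [Turn("SPEAKER_00", 0.0, 4.0), Turn("SPEAKER_01", 4.0, 9.0)]
segments = [
    Segment("How have you been sleeping?", 0.2, 3.1),
    Segment("Not well, maybe four hours a night.", 3.8, 8.5),
]
for seg in assign_speakers(segments, turns):
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

Summing overlap per speaker (rather than taking the single longest turn) keeps the result stable when diarization splits one speaker's speech into several short turns.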
<p align="right">(<a href="#readme-top">back to top</a>)</p>

Getting Started
Follow these steps to set up Notetaker AI in your environment.
Prerequisites
- Python: 3.12 or higher
- Poetry: For dependency management (Installation Guide)
- FFmpeg: Required for audio processing
- CUDA Toolkit: 12.2+ recommended (only if using GPU acceleration)
- Hugging Face Access: You'll need access to these gated models:
Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/the-momentum/notetaker
   cd notetaker
   ```

2. Install dependencies:

   ```bash
   # For API only (recommended for production)
   poetry install --without demo --without dev

   # With demo interface (for testing and demonstration)
   poetry install --with demo --without dev
   ```
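Once the dependencies are installed, you can quickly confirm some of the prerequisites listed above. This is a minimal sketch using only the Python standard library: it checks the Python version and whether `ffmpeg` and `poetry` are on `PATH`, but it does not verify CUDA or Hugging Face access. The `check_prerequisites` function is a hypothetical helper, not part of the repository.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 12)) -> dict[str, bool]:
    """Report which basic prerequisites are satisfied (hypothetical helper)."""
    return {
        # Python 3.12 or higher is required.
        "python": sys.version_info[:2] >= min_python,
        # FFmpeg must be installed for audio processing.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Poetry manages the project's dependencies.
        "poetry": shutil.which("poetry") is not None,
    }


results = check_prerequisites()
for name, ok in results.items():
    print(f"{name:8} {'OK' if ok else 'MISSING'}")
```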
Usage
Configuration
1. Set up environment variables:

   ```bash
   cp .env.example .env
   ```

   Edit the `.env` file with your specific configuration.

2. Start the application:

   ```bash
   ./run.sh
   ```

   The API will be available at http://localhost:8001 by default.

3. Access the API documentation:
   - Swagger UI: http://localhost:8001/docs
   - ReDoc: http://localhost:8001/redoc
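When scripting against the service, it can help to wait until the server started by `./run.sh` is reachable before sending requests. The following minimal standard-library sketch polls the Swagger UI URL listed above. Using `/docs` as a readiness probe and the `wait_for_api` helper itself are assumptions, not a documented health endpoint.

```python
import time
import urllib.request


def wait_for_api(url: str = "http://localhost:8001/docs",
                 timeout_s: float = 30.0,
                 interval_s: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, a test script might run `if not wait_for_api(): raise SystemExit("API did not start")` right after launching `./run.sh` in the background.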
Environment Variables
| Variable | Description | Example Value |
|----------|-------------|---------------|
| PROJECT_NAME | Name used for logging and display | Notetaker AI |
| BACKEND_CORS_ORIGINS | Allowed CORS origins | ["http://localhost:8000"] |
| HOST | Host address for API availability | 0.0.0.0 |
| PORT | Port for the API server | 8001 |
| OLLAMA_URL | Base URL for Ollama server | http://localhost:11434 |
| LLM_MODEL | LLM model name | llama3.2 |
| USE_LOCAL_MODELS | Whether to use local models | True |
| WHISPER_MODEL | Whisper model type | turbo |
| WHISPER_DEVICE | Device for running Whisper | cpu or cuda |
| WHISPER_COMPUTE_TYPE | Compute type for Whisper | int8 |
| WHISPER_BATCH_SIZE | Batch size for processing | 16 |
| HF_API_KEY | Hugging Face API key | hf_... |
| OPENAI_API_KEY | OpenAI API key | sk-proj-... |
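The variables above can be read with only the Python standard library. The sketch below is a hypothetical `load_settings` helper, not the project's actual configuration code. It assumes the defaults shown in the Example Value column, that `BACKEND_CORS_ORIGINS` holds a JSON array (as its example suggests), and that `WHISPER_DEVICE` is either `cpu` or `cuda`. The API keys are deliberately left out so they are not printed by accident.

```python
import json
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    # Accept common truthy spellings such as "True", "true", "1", "yes".
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    project_name: str
    cors_origins: list[str]
    host: str
    port: int
    ollama_url: str
    llm_model: str
    use_local_models: bool
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    whisper_batch_size: int


def load_settings() -> Settings:
    """Read non-secret settings from the environment (defaults are the examples)."""
    env = os.environ.get
    device = env("WHISPER_DEVICE", "cpu")
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"WHISPER_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return Settings(
        project_name=env("PROJECT_NAME", "Notetaker AI"),
        # Assumed to be a JSON array of origin strings.
        cors_origins=json.loads(env("BACKEND_CORS_ORIGINS", '["http://localhost:8000"]')),
        host=env("HOST", "0.0.0.0"),
        port=int(env("PORT", "8001")),
        ollama_url=env("OLLAMA_URL", "http://localhost:11434"),
        llm_model=env("LLM_MODEL", "llama3.2"),
        use_local_models=_env_bool("USE_LOCAL_MODELS", True),
        whisper_model=env("WHISPER_MODEL", "turbo"),
        whisper_device=device,
        whisper_compute_type=env("WHISPER_COMPUTE_TYPE", "int8"),
        whisper_batch_size=int(env("WHISPER_BATCH_SIZE", "16")),
    )
```

Parsing and validating everything up front means a typo in `.env` fails at startup rather than in the middle of transcribing a long recording.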
⚠️ Note: The transcription output length depends on the selected model's token limit. If the transcription is too long, it may be truncated or cause errors. Choose a model appropriate for the expected transcription length to ensure complete results.
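One way to act on this note is to estimate a transcript's token count before sending it to the LLM, and to split it into chunks that fit when it is too long. The sketch below assumes a rough heuristic of about four characters per token for English text. Real tokenizers differ, so leave headroom. The helpers are hypothetical and do not describe how Notetaker itself handles long transcripts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers give different counts."""
    return math.ceil(len(text) / chars_per_token)


def chunk_transcript(text: str, max_tokens: int, chars_per_token: float = 4.0) -> list[str]:
    """Split on whitespace into chunks whose estimated size stays under max_tokens.

    A single word longer than the budget still becomes its own chunk.
    """
    max_chars = int(max_tokens * chars_per_token)
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # length of " ".join(current)
    for word in text.split():
        added = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and length + added > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            added = len(word)
        current.append(word)
        length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, with a hypothetical 8,000-token context you might call `chunk_transcript(transcript, max_tokens=6000)` to leave room for the prompt and the generated summary, then summarize each chunk separately.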
<p align="right">(<a href="#readme-top">back to top</a>)</p>

Demo
The interactive Gradio demo provides a user-friendly interface to experience Notetaker AI's capabilities without writing code.
Running the Demo
1. Install demo dependencies (if not already done):

   ```bash
   poetry install --with demo --without dev
   ```

2. Configure the demo: update `demo/.env.demo` with your API base URL.

3. Launch the integrated demo:

   ```bash
   ./run.sh --demo
   ```

   This starts both the API and the Gradio interface.

4. Or run the demo separately (if the API is already running):

   ```bash
   poetry run python demo/ui.py
   ```
The demo will be available at http://localhost:7860.
Demo Features
- Upload or Record: Submit audio files or record directly in your browser
- Configure Options: Set parameters for transcription and summarization
- Format Selection: Choose between different summary formats
- Real-time Processing: Watch as your audio is transcribed and summarized
- Download Results: Save output as JSON for further use
Docker Setup
For consistent deployment across environments, use our Docker setup.
Quick Commands
```bash
# Build the Docker images
just docker-build

# Rebuild without using cache
just docker-rebuild

# Run the API only
just docker-up

# Run API with Gradio demo
just docker-demo
```
Access Points
- API: http://localhost:8001
- API Documentation:
  - Swagger UI: http://localhost:8001/docs
  - ReDoc: http://localhost:8001/redoc
- Gradio Demo (if enabled): http://localhost:7860
Roadmap
We're continuously enhancing Notetaker AI with new capabilities. Here's what's on the horizon:
- [ ] OpenAI API Integration: Direct connection to Whisper via OpenAI API
- [ ] Expanded LLM Support: Integration with additional LLM providers
- [ ] Enhanced Note Formats: More specialized formats and improved customization options
- [ ] Performance Optimizations: Faster processing for large audio files
Have a suggestion? We'd love to hear from you! Contact us or contribute directly.
Contributors
<a href="https://github.com/the-momentum/notetaker/graphs/contributors"> <img src="https://contrib.rocks/image?repo=the-momentum/notetaker" /> </a> <p align="right">(<a href="#readme-top">back to top</a>)</p>

License
Distributed under the MIT License. See LICENSE for more information.
<div align="center"> <p><em>Built with ❤️ by <a href="https://themomentum.ai">Momentum</a> • Turning conversations into structured knowledge</em></p> </div>