SkillAgentSearch skills...

Notetaker

πŸ§‘β€βš•οΈπŸ€– AI-powered audio transcription and smart summarization tool that transforms spoken conversations into structured notes for healthcare professionals.

Install / Use

/learn @the-momentum/Notetaker
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

<a name="readme-top"></a>

<div align="center"> <img src="https://cdn.prod.website-files.com/66a1237564b8afdc9767dd3d/66df7b326efdddf8c1af9dbb_Momentum%20Logo.svg" height="80"> <h1>Notetaker AI</h1> <p><strong>Intelligent Transcription & Summarization for Professionals</strong></p>

Contact us Visit Momentum MIT License

</div>

πŸ“‹ Table of Contents

πŸ” About The Project

Notetaker AI transforms how professionals handle meetings, interviews, and consultations with advanced audio-to-text capabilities. It combines precise transcription with intelligent summarization to create concise, structured notes that save time and enhance documentation accuracy.

Notetaker workflow

✨ Key Features

  • πŸŽ™οΈ Smart Transcription: Convert audio to text with exceptional accuracy, including optional speaker diarization and time alignment
  • πŸ“Š Multiple Summary Formats: Generate summaries in various formats to fit different professional needs:
    • πŸ“ Text – Simple, readable plain-text format
    • πŸ“‹ SOAP – Structured clinical format (Subjective, Objective, Assessment, Plan)
    • πŸ₯ PKI HL7 CDA – Standards-compliant summary for healthcare interoperability
    • 🩺 Therapy Assessment – Custom format for structured evaluation of therapist performance across key professional competencies
  • ⏳ Long-form Audio Support: Designed to handle recordings of over 1 hour
  • βš™οΈ Flexible Deployment: Can be deployed fully locally, using local AI models for full data control, or using wavaliable external integrations
  • βš™οΈ Multiple access points: Run as an API-only service or with an intuitive Gradio UI for interactive use
  • πŸš„ GPU Acceleration: Leverage GPU hardware for faster processing of large audio files
  • πŸ”§ Customizable: Configure to your specific requirements with extensive environment variables

<video src="https://github.com/user-attachments/assets/52b74add-9733-442c-a083-830bfba9d900" controls="controls"></video>

<p align="right">(<a href="#readme-top">back to top</a>)</p>

πŸš€ Getting Started

Follow these steps to set up Notetaker AI in your environment.

Prerequisites

  • Python: 3.12 or higher
  • Poetry: For dependency management (Installation Guide)
  • FFmpeg: Required for audio processing
  • CUDA Toolkit: 12.2+ recommended (only if using GPU acceleration)
  • Hugging Face Access: You'll need access to these gated models:

Installation

  1. Clone the repository:

    git clone https://github.com/the-momentum/notetaker
    cd notetaker
    
  2. Install dependencies:

    # For API only (recommended for production)
    poetry install --without demo --without dev
    
    # With demo interface (for testing and demonstration)
    poetry install --with demo --without dev
    
<p align="right">(<a href="#readme-top">back to top</a>)</p>

πŸ“ Usage

Configuration

  1. Set up environment variables:

    cp .env.example .env
    

    Edit the .env file with your specific configuration.

  2. Start the application:

    ./run.sh
    

    The API will be available at http://localhost:8001 by default.

  3. Access the API documentation:

    • Swagger UI: http://localhost:8001/docs
    • ReDoc: http://localhost:8001/redoc

Environment Variables

| Variable | Description | Example Value | |----------|-------------|---------------| | PROJECT_NAME | Name used for logging and display | Notetaker AI | | BACKEND_CORS_ORIGINS | Allowed CORS origins | ["http://localhost:8000"] | | HOST | Host address for API availability | 0.0.0.0 | | PORT | Port for the API server | 8001 | | OLLAMA_URL | Base URL for Ollama server | http://localhost:11434 | | LLM_MODEL | LLM model name | llama3.2 | | USE_LOCAL_MODELS | Whether to use local models | True | | WHISPER_MODEL | Whisper model type | turbo | | WHISPER_DEVICE | Device for running Whisper | cpu or cuda | | WHISPER_COMPUTE_TYPE | Compute type for Whisper | int8 | | WHISPER_BATCH_SIZE | Batch size for processing | 16 | | HF_API_KEY | Hugging Face API key | hf_... | | OPENAI_API_KEY | OpenAI API key | sk-proj-... |

⚠️ Note: The transcription output length depends on the selected model's token limit. If the transcription is too long, it may be truncated or cause errors. Choose a model appropriate for the expected transcription length to ensure complete results.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

πŸ–₯️ Demo

The interactive Gradio demo provides a user-friendly interface to experience Notetaker AI's capabilities without writing code.

Running the Demo

  1. Install demo dependencies (if not already done):

    poetry install --with demo --without dev
    
  2. Configure the demo: Update demo/.env.demo with your API base URL.

  3. Launch the integrated demo:

    ./run.sh --demo
    

    This starts both the API and Gradio interface.

  4. Or run the demo separately (if API is already running):

    poetry run python demo/ui.py
    

The demo will be available at http://localhost:7860.

Demo Features

  • πŸ“ Upload or Record: Submit audio files or record directly in your browser
  • βš™οΈ Configure Options: Set parameters for transcription and summarization
  • πŸ“Š Format Selection: Choose between different summary formats
  • ⏱️ Real-time Processing: Watch as your audio is transcribed and summarized
  • πŸ’Ύ Download Results: Save output as JSON for further use
<p align="right">(<a href="#readme-top">back to top</a>)</p>

🐳 Docker Setup

For consistent deployment across environments, use our Docker setup.

Quick Commands

# Build the Docker images
just docker-build

# Rebuild without using cache
just docker-rebuild

# Run the API only
just docker-up

# Run API with Gradio demo
just docker-demo

Access Points

  • API: http://localhost:8001
  • API Documentation:
    • Swagger UI: http://localhost:8001/docs
    • ReDoc: http://localhost:8001/redoc
  • Gradio Demo (if enabled): http://localhost:7860
<p align="right">(<a href="#readme-top">back to top</a>)</p>

πŸ—ΊοΈ Roadmap

We're continuously enhancing Notetaker AI with new capabilities. Here's what's on the horizon:

  • [ ] OpenAI API Integration: Direct connection to Whisper via OpenAI API
  • [ ] Expanded LLM Support: Integration with additional LLM providers
  • [ ] Enhanced Note Formats: More specialized formats and improved customization options
  • [ ] Performance Optimizations: Faster processing for large audio files

Have a suggestion? We'd love to hear from you! Contact us or contribute directly.

πŸ‘₯ Contributors

<a href="https://github.com/the-momentum/notetaker/graphs/contributors"> <img src="https://contrib.rocks/image?repo=the-momentum/notetaker" /> </a> <p align="right">(<a href="#readme-top">back to top</a>)</p>

πŸ“„ License

Distributed under the MIT License. See LICENSE for more information.


<div align="center"> <p><em>Built with ❀️ by <a href="https://themomentum.ai">Momentum</a> β€’ Turning conversations into structured knowledge</em></p> </div>

Related Skills

View on GitHub
GitHub Stars57
CategoryHealthcare
Updated6d ago
Forks3

Languages

Python

Security Score

80/100

Audited on Mar 20, 2026

No findings