RewindDB

A Python library for interfacing with the Rewind.ai SQLite database.

Changelog

2025-07-04 - Voice Export & Training Data Features

NEW: --export-own-voice CLI option for exporting user's voice transcripts organized by day
NEW: --speech-source filter to separate user voice (me) from other speakers (others)
NEW: Multi-format export support: text, JSON, and audio file export
NEW: --export-format audio with --audio-export-dir for exporting actual M4A audio files
NEW: my-words.sh script for generating word clouds from your voice data
ENHANCED: RewindDB core library now supports speech source filtering
USE CASE: Perfect for collecting clean voice training data for LLM fine-tuning
FILTER: Text exports contain only the user's voice (no other speakers); audio exports contain full conversations

Project Overview

RewindDB is a Python library that provides a convenient interface to the Rewind.ai SQLite database. Rewind.ai is a personal memory assistant that captures audio transcripts and screen OCR data in real time. This project lets you access and search that data programmatically: you can retrieve past conversations, find specific information mentioned in meetings, or analyze screen content from previous work sessions.

The project consists of three main components:

A core Python library (rewinddb) for direct database access
Command-line tools for transcript retrieval, keyword searching, screen OCR data retrieval, and activity tracking
An MCP STDIO server that exposes these capabilities to GenAI models through the standardized Model Context Protocol

The main purpose of this project, for me, was to connect Rewind to Raycast.

Installation

Prerequisites

Python 3.6+

Install from Source

# clone the repository
git clone https://github.com/pedramamini/RewindMCP.git
cd RewindMCP

# install the package and dependencies
pip install .

Manual Installation

# install the package in editable (development) mode, so source changes take effect without reinstalling
pip install -e .

Configuration

RewindDB uses a .env file to store database connection parameters. This approach avoids hardcoding sensitive information like database paths and passwords in the source code.

Setting Up the .env File

Create a .env file in your project directory or in your home directory as ~/.rewinddb.env
Add the following configuration parameters:

DB_PATH=/path/to/your/rewind/database.sqlite3
DB_PASSWORD=your_database_password

For example:

DB_PATH=/Users/username/Library/Application Support/com.memoryvault.MemoryVault/db-enc.sqlite3
DB_PASSWORD=your_database_password_here
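
As a rough illustration of how these two parameters could be read, the sketch below parses a .env-style file using only the standard library. It is not RewindDB's actual loader: the function names parse_env_file and find_env_file, and the lookup order (explicit path first, then ./.env, then ~/.rewinddb.env), are assumptions made for this example.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # split on the first "=" only, so values may contain "=" or spaces
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_env_file(explicit=None, cwd=None, home=None):
    """Return the first existing candidate file, or None (assumed lookup order)."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    candidates = [Path(explicit)] if explicit else []
    candidates += [cwd / ".env", home / ".rewinddb.env"]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Splitting on the first "=" only means a path containing spaces, such as the Application Support example above, survives intact without quoting.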

Custom .env File Location

You can also specify a custom location for your .env file when using the library or CLI tools:

# in python code
db = rewinddb.RewindDB(env_file="/path/to/custom/.env")

# with cli tools
python transcript_cli.py --relative "1 hour" --env-file /path/to/custom/.env
python search_cli.py "meeting" --env-file /path/to/custom/.env
python ocr_cli.py --relative "1 hour" --env-file /path/to/custom/.env
python activity_cli.py --relative "1 day" --env-file /path/to/custom/.env

# with mcp server
python mcp_stdio.py --env-file /path/to/custom/.env

CLI Tools

transcript_cli.py

Retrieve audio transcripts from the Rewind.ai database with advanced voice filtering and export capabilities.

Basic Transcript Retrieval

# get transcripts from the last hour
python transcript_cli.py --relative "1 hour"

# get transcripts from the last 5 hours
python transcript_cli.py --relative "5 hours"

# get transcripts from a specific time range
python transcript_cli.py --from "2023-05-11 13:00:00" --to "2023-05-11 17:00:00"

# enable debug output
python transcript_cli.py --relative "7 days" --debug

# use a custom .env file
python transcript_cli.py --relative "1 hour" --env-file /path/to/custom/.env

Voice Source Filtering

# filter for only your own voice
python transcript_cli.py --relative "1 hour" --speech-source me

# filter for other speakers only
python transcript_cli.py --relative "1 day" --speech-source others

# filter works with any time range
python transcript_cli.py --from "2025-07-01" --to "2025-07-02" --speech-source me

Voice Export for Training Data 🎙️

Perfect for collecting clean voice training data for LLM fine-tuning

# export your voice transcripts organized by day (text format)
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04"

# export as JSON with metadata
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04" --export-format json --save-to my_voice.json

# export actual audio files organized by day
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04" --export-format audio --audio-export-dir ./my_voice_audio

# generate word cloud from your voice data (requires wordcloud library)
pip install wordcloud matplotlib  # install dependencies
./my-words.sh  # automatically uses last 6 months of your voice data

Key Features:

Clean Training Data: Text exports contain only YOUR voice, with other speakers filtered out
Audio Export: M4A files organized by day with transcript summaries
Multiple Formats: Text (readable), JSON (structured), Audio (original files)
Day Organization: Perfect for chronological training data or analysis
Word Cloud: Quick visualization of your most-used words with my-words.sh
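
To make the "own voice, organized by day" idea concrete, here is a minimal sketch of that filtering and grouping step. The record shape (dicts with timestamp, speaker, and text keys) and the function name group_own_voice_by_day are hypothetical; they do not reflect the Rewind database schema or the library's internals.

```python
from collections import defaultdict
from datetime import datetime


def group_own_voice_by_day(records):
    """Keep only records spoken by the user and join their text per calendar day."""
    by_day = defaultdict(list)
    for record in sorted(records, key=lambda r: r["timestamp"]):
        if record["speaker"] != "me":
            continue  # drop other speakers, similar in spirit to --speech-source me
        by_day[record["timestamp"].date().isoformat()].append(record["text"])
    return {day: " ".join(texts) for day, texts in by_day.items()}


# hypothetical sample data, not real Rewind output
sample = [
    {"timestamp": datetime(2025, 7, 1, 9, 0), "speaker": "me", "text": "good morning"},
    {"timestamp": datetime(2025, 7, 1, 9, 1), "speaker": "others", "text": "hi there"},
    {"timestamp": datetime(2025, 7, 2, 14, 30), "speaker": "me", "text": "let's ship it"},
]
for day, text in group_own_voice_by_day(sample).items():
    print(day, text)
```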

search_cli.py

Search for keywords across both audio transcripts and screen OCR data.

# search for a keyword with default time range (7 days)
python search_cli.py "meeting"

# search with a specific time range
python search_cli.py "project" --from "2023-05-11 13:00:00" --to "2023-05-11 17:00:00"

# search with a relative time period
python search_cli.py "presentation" --relative "1 day"

# adjust context size and enable debug output
python search_cli.py "python" --context 5 --debug

# use a custom .env file
python search_cli.py "meeting" --env-file /path/to/custom/.env
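
The --context option suggests that each hit is shown with some surrounding text. The sketch below shows one way such a context window could work, assuming the context is counted in words on each side of a match; the real unit used by search_cli.py is not stated here, and find_with_context is a hypothetical helper.

```python
def find_with_context(words, keyword, context=3):
    """Yield snippets of up to `context` words on each side of each match."""
    needle = keyword.lower()
    for index, word in enumerate(words):
        if needle in word.lower():
            start = max(0, index - context)  # clamp at the start of the text
            end = index + context + 1        # slicing clamps at the end
            yield " ".join(words[start:end])


text = "we should schedule the project meeting for next week after lunch"
for snippet in find_with_context(text.split(), "meeting", context=2):
    print(snippet)
```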

ocr_cli.py

Retrieve screen OCR (Optical Character Recognition) data from the Rewind.ai database. This tool allows you to see what text was visible on your screen during specific time periods, providing complete OCR text content rather than just metadata about frames and nodes.

# get OCR data from the last hour
python ocr_cli.py --relative "1 hour"

# get OCR data from the last 5 hours (supports short form)
python ocr_cli.py --relative "5h"

# get OCR data from a specific time range
python ocr_cli.py --from "2023-05-11 13:00:00" --to "2023-05-11 17:00:00"

# get OCR data for a single day
python ocr_cli.py --from "2023-05-11" --to "2023-05-11"

# get OCR data for specific hours today
python ocr_cli.py --from "13:00" --to "17:00"

# list all applications that have OCR data
python ocr_cli.py --list-apps

# filter OCR data by specific application
python ocr_cli.py --relative "1 day" --app "com.apple.Safari"

# enable debug output and use custom .env file
python ocr_cli.py --relative "7 days" --debug --env-file /path/to/custom/.env

# display times in UTC instead of local time
python ocr_cli.py --relative "1 day" --utc

Key Features:

Time formats: Supports relative time ("1 hour", "5h", "30m", "2d", "1w") and absolute time ranges
Application filtering: Use --list-apps to see available applications, then --app to filter by specific app
Flexible time input: Accepts various formats including date-only, time-only, and full datetime strings
Text extraction: Shows actual text content that was visible on screen, organized by timestamp and application

activity_cli.py

Display comprehensive activity tracking data from the Rewind.ai database, including computer usage patterns, application usage statistics, and calendar meetings.

# get activity data for the last day
python activity_cli.py --relative "1 day"

# get activity data for the last 5 hours (supports short form)
python activity_cli.py --relative "5h"

# get activity data from a specific time range
python activity_cli.py --from "2023-05-11 13:00:00" --to "2023-05-11 17:00:00"

# get activity data for a single day
python activity_cli.py --from "2023-05-11" --to "2023-05-11"

# get activity data for specific hours today
python activity_cli.py --from "13:00" --to "17:00"

# enable debug output and use custom .env file
python activity_cli.py --relative "1 week" --debug --env-file /path/to/custom/.env

# display times in UTC instead of local time
python activity_cli.py --relative "1 day" --utc

Key Features:

Active Hours: Shows when your computer was actively being used, with hourly and daily breakdowns
Application Usage: Displays top applications by usage time with visual charts
Calendar Meetings: Shows meeting statistics, duration, and distribution by time of day
Visual Charts: Includes simple ASCII bar charts for easy data visualization
Time Zone Support: Displays times in local timezone by default, with UTC option available
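
For readers curious how a simple ASCII bar chart like the ones mentioned above can be produced, here is a minimal sketch. The function ascii_bar_chart, the "#" bar character, and the sample figures are illustrative assumptions and may not match the layout activity_cli.py actually prints.

```python
def ascii_bar_chart(usage, width=30):
    """Return one line per item, scaled so the largest value fills `width`."""
    if not usage:
        return []
    peak = max(usage.values()) or 1  # avoid dividing by zero when all values are 0
    label_width = max(len(name) for name in usage)
    lines = []
    for name, value in sorted(usage.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / peak)
        lines.append(f"{name.ljust(label_width)} | {bar} {value}")
    return lines


# hypothetical minutes of use per application
for line in ascii_bar_chart({"Safari": 120, "Terminal": 60, "Mail": 30}, width=20):
    print(line)
```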

MCP STDIO Server

The Model Context Protocol (MCP) server exposes RewindDB functionality to GenAI models over the standardized MCP STDIO transport. This implementation is fully MCP-compliant and works with MCP clients such as Claude, Raycast, and other AI assistants.

Quick Start

# start the STDIO MCP server
python mcp_stdio.py

# enable debug logging
python mcp_stdio.py --debug

# use a custom .env file
python mcp_stdio.py --env-file /path/to/custom/.env

Available Tools

The MCP server provides the following tools:

get_transcripts_relative: Get audio transcripts from a relative time period
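
MCP tools are invoked with JSON-RPC 2.0 messages, and over the STDIO transport each message is sent as a single line of JSON. The sketch below builds such a tools/call request for get_transcripts_relative. The helper build_tool_call and the argument names time_unit and time_value are hypothetical; the tool's real parameters are not shown here, so check the schema the server reports before relying on them.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, which the stdio framing requires
    return json.dumps(message) + "\n"


# "time_unit" and "time_value" are hypothetical argument names, for illustration only
line = build_tool_call(1, "get_transcripts_relative", {"time_unit": "hour", "time_value": 1})
print(line, end="")
```

A real client would complete the MCP initialize handshake before sending this request, then write the line to the server's standard input and read the response from its standard output.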
