FileRAG
An advanced file-based Multimodal Retrieval-Augmented Generation system that preserves file context and improves information retrieval precision. Supports multiple AI models (Claude, GPT-4) and file formats (text, image, audio and video).
FileRAG: A File-based Multimodal Retrieval-Augmented Generation System
Demo
https://github.com/user-attachments/assets/3202e8a7-4240-4fd5-a2e0-cc23041b6558
Motivation
Traditional Retrieval-Augmented Generation (RAG) systems often struggle to maintain context and coherence when dealing with large documents or complex information structures. FileRAG was born out of the need for a more efficient, context-aware document retrieval system.
The primary motivations for developing FileRAG are:
- Preserving Document Context: Unlike traditional RAG systems that often retrieve fragmented text snippets, FileRAG maintains the integrity of entire documents, ensuring that the context and coherence of information are preserved.
- Multimodal Indexing and Retrieval: By using frontier models with vision and audio capabilities, FileRAG is able to index and retrieve text, images, audio files, and video files, providing a comprehensive file retrieval system.
- Improved Precision: By summarizing and indexing entire documents and media files, FileRAG achieves higher precision in retrieving relevant information, especially crucial in fields such as academia, legal research, technical documentation, and multimedia content management.
- Scalability: The file-based approach allows for easier management and updating of the knowledge base, making it more scalable for growing document and media collections.
By addressing these challenges, FileRAG aims to provide a more robust and effective solution for information retrieval and extraction tasks across various media types.
Features
- Dual Model Support: Choose between Anthropic's Claude and OpenAI's GPT-4 for document summarization and retrieval.
- Multiple File Format Support: Handles PDF, DOCX, TXT, MD, various image files (JPEG, PNG, GIF, WEBP), audio files (MP3, WAV, OGG, FLAC, AAC, OPUS, M4A), and video files (MP4, AVI, MOV, MKV).
- Intelligent Summarization: Generates concise summaries of files for efficient indexing, including specialized summarization for audio transcripts and video content.
- Video Processing: Extracts key frames from videos and summarizes both visual and audio content for comprehensive video understanding.
- Context-Aware Retrieval: Retrieves relevant files based on user queries using advanced language models, including video content.
- Flexible API Integration: Easily switch between AI providers for summarization (Anthropic or OpenAI) and for audio transcription (OpenAI or Lemonfox.ai).
- Organized Results: Stores retrieval results in a structured folder system, separating text, image, audio, and video results for easy access and review.
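The supported formats listed above suggest a simple routing step: decide from the file extension whether a file is handled as text, image, audio, or video. The following is a minimal sketch of that idea, not FileRAG's actual code. The function name classify_file, the category labels, and the inclusion of .jpg alongside .jpeg are assumptions made for illustration.

```python
from pathlib import Path

# Extension groups taken from the feature list above. The category names
# and the extra ".jpg" alias are illustrative assumptions, not FileRAG's
# internal labels.
EXTENSIONS = {
    "text": {".pdf", ".docx", ".txt", ".md"},
    "image": {".jpeg", ".jpg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".ogg", ".flac", ".aac", ".opus", ".m4a"},
    "video": {".mp4", ".avi", ".mov", ".mkv"},
}


def classify_file(path):
    """Return the media category for a path, or None if unsupported."""
    suffix = Path(path).suffix.lower()
    for category, suffixes in EXTENSIONS.items():
        if suffix in suffixes:
            return category
    return None
```

Matching on the lowercased suffix keeps files such as REPORT.PDF in the text group, and returning None lets a caller skip unsupported files instead of failing.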
Structure
Components
- File Indexer (indexer.py): Indexes and summarizes files in a specified folder, including text, images, audio, and video files.
- File Retriever (retriever.py): Retrieves relevant files based on user queries, handling text, images, audio files, and video files.
Prerequisites
- Python 3.6+
- anthropic library
- openai library
- PyPDF2 library
- python-docx library
- Pillow library
- opencv-python library
Installation
- Clone the repository:
    git clone https://github.com/richards199999/FileRAG.git
    cd FileRAG
- Install the required dependencies:
    pip install anthropic openai PyPDF2 python-docx Pillow opencv-python
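After installing, it can help to confirm that every dependency is importable before running the scripts. The check below is a small sketch and not part of FileRAG. The helper name missing_packages is hypothetical. The module names reflect how these packages are imported: python-docx as docx, Pillow as PIL, and opencv-python as cv2.

```python
import importlib.util

# Maps each pip package from the install command to the module name it
# provides; the names differ for python-docx, Pillow, and opencv-python.
REQUIRED_MODULES = {
    "anthropic": "anthropic",
    "openai": "openai",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "Pillow": "PIL",
    "opencv-python": "cv2",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All FileRAG dependencies are importable.")
```

Using importlib.util.find_spec only locates the modules without importing them, so the check stays fast even for heavy packages such as OpenCV.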
Usage
File Indexer
- Run the indexer:
    python indexer.py
- Choose the AI model (Anthropic or OpenAI) for summarization when prompted.
- Choose the API (OpenAI or Lemonfox.ai) for audio transcription when prompted.
- Enter your API key(s) for the chosen provider(s).
- Specify the folder path containing the documents, images, audio files, and video files you want to index.
- The script will generate a folder_overview.json file in the specified folder.
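The indexing step can be pictured as "summarize each file, then write one JSON overview into the same folder". The sketch below illustrates that shape only. The README does not document the real schema of folder_overview.json, so the flat mapping of file name to summary is an assumption, as are the helper names build_overview and write_overview. The summarize argument stands in for the model call.

```python
import json
from pathlib import Path


def build_overview(folder, summarize):
    """Map each file name in `folder` to a summary produced by `summarize`.

    The {file name: summary} layout is a hypothetical schema chosen for
    illustration; FileRAG's actual folder_overview.json may differ.
    """
    overview = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and any overview file left by a previous run.
        if path.is_file() and path.name != "folder_overview.json":
            overview[path.name] = summarize(path)
    return overview


def write_overview(folder, summarize):
    """Write folder_overview.json into `folder` and return its path."""
    out_path = Path(folder) / "folder_overview.json"
    overview = build_overview(folder, summarize)
    out_path.write_text(json.dumps(overview, indent=2), encoding="utf-8")
    return out_path
```

For a dry run without any API key, a stub such as write_overview("docs", lambda p: p.name) exercises the file handling alone.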
File Retriever
- Run the retriever:
    python retriever.py
- Choose the AI model (Anthropic or OpenAI) when prompted.
- Enter your API key for the chosen provider.
- Specify the path to the folder_overview.json file created by the indexer.
- Enter your queries when prompted. The script will retrieve relevant documents, images, audio files, and video files, saving them in the filerag_results folder with the following structure:
    filerag_results/
    ├── api_response_log.txt
    ├── YYYYMMDD_HHMMSS/
    │   ├── image_results/
    │   ├── text_results/
    │   │   └── retrieved_text_results.txt
    │   ├── audio_results/
    │   └── video_results/
    └── ...
  Each query session creates a new timestamped folder (YYYYMMDD_HHMMSS) containing the results for that session.
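The timestamped layout described above is straightforward to reproduce. The following sketch creates one session folder with the four result subfolders. It is an illustration under assumptions rather than the retriever's actual code. The function name create_session_folder and its now parameter are hypothetical.

```python
from datetime import datetime
from pathlib import Path

# Subfolder names as shown in the results structure above.
RESULT_SUBFOLDERS = (
    "image_results",
    "text_results",
    "audio_results",
    "video_results",
)


def create_session_folder(root="filerag_results", now=None):
    """Create root/YYYYMMDD_HHMMSS/ with one subfolder per media type."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    session = Path(root) / timestamp
    for name in RESULT_SUBFOLDERS:
        # parents=True also creates the root and session folders on first use.
        (session / name).mkdir(parents=True, exist_ok=True)
    return session
```

Accepting an optional now value makes the folder name deterministic in tests, while normal calls fall back to the current time.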
Configuration
- API keys can be set as environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, LEMONFOX_API_KEY) or entered when prompted.
- Adjust the max_tokens and temperature parameters in the API calls to fine-tune the model outputs.
- For video processing, you can modify the number of key frames extracted by changing the num_frames parameter in the extract_key_frames() function.
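The "environment variable or prompt" behavior for API keys can be sketched as below. This is one plausible way to implement it, not necessarily how indexer.py or retriever.py does it. The function name resolve_api_key, the provider labels, and the prompt and environ parameters are assumptions added to keep the example testable. Only the three environment variable names come from the list above.

```python
import os
from getpass import getpass

# Environment variable names from the configuration list above; the
# provider keys ("anthropic", "openai", "lemonfox") are illustrative.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "lemonfox": "LEMONFOX_API_KEY",
}


def resolve_api_key(provider, prompt=getpass, environ=None):
    """Return the key from the environment, or ask the user for it."""
    env = os.environ if environ is None else environ
    var = ENV_VARS[provider]
    key = env.get(var, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt; getpass hides the typed key.
    return prompt(f"Enter your {var}: ").strip()
```

Setting the variable once in the shell (for example, export OPENAI_API_KEY=... on Linux or macOS) avoids retyping the key on every run.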
Limitations
- The system currently processes only the first 5 pages of PDF documents to manage processing time and API usage.
- Large files may be truncated to fit within API token limits.
- Audio transcription may be inaccurate because of model limitations, low-quality recordings, or noisy environments.
- Video processing may be time-consuming for large video files or when processing many videos at once.
- The quality of video summarization depends on the extracted key frames and may not capture all nuances of the video content.
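To see why a fixed number of key frames can miss detail, consider one common sampling strategy: pick frames at evenly spaced positions. The sketch below computes such positions. It assumes uniform sampling, which the README does not confirm as the method used by extract_key_frames(). The helper name key_frame_indices is hypothetical.

```python
def key_frame_indices(total_frames, num_frames):
    """Return up to `num_frames` evenly spaced frame indices.

    Assumes uniform sampling: the video is split into equal segments and
    the middle frame of each segment is chosen.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    count = min(num_frames, total_frames)
    # Integer arithmetic avoids floating-point rounding at segment midpoints.
    return [(2 * i + 1) * total_frames // (2 * count) for i in range(count)]


print(key_frame_indices(100, 5))  # [10, 30, 50, 70, 90]
```

With 5 frames taken from a 100-frame clip, anything that happens only between the sampled positions is never seen by the model, which is the trade-off behind the limitation above. Raising num_frames narrows those gaps at the cost of more processing time and API usage.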
Contributing
Contributions to FileRAG are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgement
Many thanks to Claude 3.5 Sonnet from Anthropic for helping bring the idea to life! 🤗
