Qagent

Simple AI Agent to answer questions for specific domains based on website docs

Domain-Specific Q&A Agent: The RAG Killer?

This project showcases a simpler, more practical alternative to traditional RAG systems - demonstrating how modern search APIs combined with large context windows can eliminate the complexity of Retrieval-Augmented Generation for many documentation Q&A use cases.

As we enter 2025, there's growing evidence that search-first approaches are becoming more cost-effective and simpler than traditional RAG. With models like Gemini 2.5 Pro offering 5M-token context windows at competitive prices, many developers are asking: "Why build complex RAG pipelines when you can just search and load relevant content into context?"

This project provides a hands-on example of this approach - showcasing intelligent search with domain restrictions and organizational guardrails.

Perfect for organizations wanting to create internal knowledge assistants that stay within approved documentation boundaries without the overhead of traditional RAG infrastructure.

🚀 Key Features

  • 🎯 Smart Tool Selection: Automatically chooses between fast search and comprehensive scraping based on query needs
  • 🔍 Domain-Restricted Search: Only searches approved organizational documentation websites
  • 🧠 Web Scraping Fallback: Comprehensive page scraping when search results are insufficient
  • 📝 Intelligent Summarization: Optional AI-powered result summarization reduces token usage by 60-80%
  • 💰 Cost-Competitive: At $0.005-$0.075 per query, often cheaper than traditional RAG systems
  • ⚡ Performance Optimized: Fast search for 90% of queries, deep scraping only when needed
  • 🛡️ Data Security: No sensitive data sent to vector databases or training systems
  • 📊 Transparent Sources: Every answer includes clear source attribution from official documentation
  • 🔧 Easy Configuration: Simple CSV file controls which knowledge sources are accessible
  • 💬 Conversation Memory: Maintains context across multiple questions in a session
  • 🎮 Production Ready: FastAPI backend with proper error handling and logging

🚀 Quick Start

Setting Up Your Knowledge Sources

To configure which websites your agent can search, edit the sites_data.csv file. This CSV defines your agent's knowledge boundaries and domains:

domain,site,description
AI Agent Frameworks,github.com/openai/swarm,OpenAI Swarm documentation for lightweight multi-agent orchestration
AI Operations,docs.agentops.ai,AgentOps documentation for testing debugging and deploying AI agents and LLM apps
AI Data Frameworks,docs.llamaindex.ai,LlamaIndex documentation for building LLM-powered agents over your data

CSV Structure:

  • domain: The subject area or topic (e.g., "AI Agents", "Web Development", "Machine Learning")
  • site: The actual website domain to search (e.g., "docs.langchain.com", "docs.python.org")
  • description: A clear explanation of what the site contains and when to use it

Pro Tip: The description is crucial - it's what the agent uses to decide whether a particular site will be helpful for answering a user's question. Be specific about what topics and types of information each site covers.
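As a rough illustration of how this CSV maps onto data the agent can reason over (`load_sites` is a hypothetical helper for this sketch; the repo's actual loader may differ), each row becomes a record with a domain, a site, and the all-important description:

```python
import csv
import io

# Hypothetical helper: the repo's actual loader may differ.
def load_sites(csv_text: str) -> list[dict]:
    """Parse the sites_data.csv format into a list of site records."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# The example rows from above
SITES_CSV = """\
domain,site,description
AI Agent Frameworks,github.com/openai/swarm,OpenAI Swarm documentation for lightweight multi-agent orchestration
AI Operations,docs.agentops.ai,AgentOps documentation for testing debugging and deploying AI agents and LLM apps
AI Data Frameworks,docs.llamaindex.ai,LlamaIndex documentation for building LLM-powered agents over your data
"""

sites = load_sites(SITES_CSV)
print([s["site"] for s in sites])
# ['github.com/openai/swarm', 'docs.agentops.ai', 'docs.llamaindex.ai']
```

The `description` field is what a tool-selecting agent would surface to the LLM when deciding which site to search, which is why vague descriptions degrade routing quality.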

Obtaining API Keys

Getting a Tavily API Key:

  1. Go to tavily.com and sign up for a free account
  2. Navigate to your dashboard or API section
  3. Find your API key in the dashboard
  4. Tavily offers a generous free tier with thousands of searches per month

Getting a Google API Key:

  1. Visit ai.google.dev (Google AI Studio)
  2. Sign in with your Google account
  3. Click "Get API Key" or navigate to the API keys section
  4. Create a new project if needed
  5. Generate your API key
  6. Google's Gemini API includes a substantial free tier

After obtaining both keys, add them to your .env file:

TAVILY_API_KEY=your_tavily_key_here
GOOGLE_API_KEY=your_google_key_here

Security Note: Keep these keys secure and never commit them to public repositories. Both services offer excellent free tiers suitable for development and small-scale production use.
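Conceptually, a `.env` file is just `KEY=value` lines. Projects like this typically load it with the python-dotenv package; as a minimal stdlib sketch of what that loading does (`parse_env` and `load_env` are illustrative, not the repo's code):

```python
import os

# Illustrative sketch of dotenv-style loading; real projects typically
# use the python-dotenv package instead.
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def load_env(text: str) -> None:
    """Apply parsed values without overriding variables already set in the shell."""
    for key, value in parse_env(text).items():
        os.environ.setdefault(key, value)

example = """
# API credentials
TAVILY_API_KEY=your_tavily_key_here
GOOGLE_API_KEY=your_google_key_here
"""
print(parse_env(example)["TAVILY_API_KEY"])  # your_tavily_key_here
```

Using `setdefault` means values exported in your shell or CI environment win over the file, which is the usual dotenv convention.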

Option 1: Using Make (Recommended)

# Clone the repository
git clone https://github.com/javiramos1/qagent.git
cd qagent

# Setup environment and install dependencies
make install

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your API keys

# Run the application
make run

Option 2: Using Docker Compose

# Clone the repository
git clone https://github.com/javiramos1/qagent.git
cd qagent

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your API keys

# Run with Docker Compose
make docker-run

🔧 Configuration

Required Environment Variables

GOOGLE_API_KEY=your_google_api_key_here    # Get from Google AI Studio (ai.google.dev)
TAVILY_API_KEY=your_tavily_api_key_here    # Get from Tavily.com

Optional Environment Variables

# Search Configuration
MAX_RESULTS=10                    # Maximum search results per query
SEARCH_DEPTH=basic              # Search depth: basic or advanced
MAX_CONTENT_SIZE=100000         # Maximum content size per result
MAX_SCRAPE_LENGTH=10000          # Maximum content length for web scraping (characters)
ENABLE_SEARCH_SUMMARIZATION=false  # Enable AI summarization of search results (reduces tokens 60-80%)

# LLM Configuration
LLM_TEMPERATURE=0.1             # Response creativity (0.0-1.0)
LLM_MAX_TOKENS=10000           # Maximum response length

# Timeout Configuration
REQUEST_TIMEOUT=30              # Request timeout in seconds
LLM_TIMEOUT=60                 # LLM response timeout in seconds

# Web Scraping Configuration
USER_AGENT=QAgent/1.0 (Educational Search-First Q&A Agent)  # Identifies your requests (prevents warnings)
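One common way to consume such optionals is a small config object with the documented defaults baked in. A hedged sketch (the names mirror the variables above, but `Config` and `load_config` are not the repo's actual code):

```python
from dataclasses import dataclass

# Illustrative config loader with the defaults documented above;
# the repo's actual implementation may differ.
@dataclass
class Config:
    max_results: int = 10
    search_depth: str = "basic"
    max_content_size: int = 100_000
    max_scrape_length: int = 10_000
    enable_search_summarization: bool = False
    llm_temperature: float = 0.1
    llm_max_tokens: int = 10_000
    request_timeout: int = 30
    llm_timeout: int = 60

def load_config(env: dict[str, str]) -> Config:
    """Build a Config from an environment mapping, falling back to defaults."""
    return Config(
        max_results=int(env.get("MAX_RESULTS", 10)),
        search_depth=env.get("SEARCH_DEPTH", "basic"),
        max_content_size=int(env.get("MAX_CONTENT_SIZE", 100_000)),
        max_scrape_length=int(env.get("MAX_SCRAPE_LENGTH", 10_000)),
        enable_search_summarization=env.get("ENABLE_SEARCH_SUMMARIZATION", "false").lower() == "true",
        llm_temperature=float(env.get("LLM_TEMPERATURE", 0.1)),
        llm_max_tokens=int(env.get("LLM_MAX_TOKENS", 10_000)),
        request_timeout=int(env.get("REQUEST_TIMEOUT", 30)),
        llm_timeout=int(env.get("LLM_TIMEOUT", 60)),
    )

print(load_config({}).max_results)  # 10
```

Passing the environment in as a plain mapping (rather than reading `os.environ` inside the function) keeps the loader easy to test.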

📊 Why Search-First Beats RAG in 2025

Cost Reality Check

Our analysis reveals that search-first approaches are now cost-competitive or even cheaper than traditional RAG systems:

# Fair comparison: same model (Gemini 2.0 Flash), same token usage

# Search-first approach (this project)
search_cost = 0.075         # USD: 1M input tokens + 1K output tokens
# No additional infrastructure needed

# Traditional RAG approach
rag_llm_cost = 0.075        # Same LLM cost as search-first
rag_overhead = 0.002        # Embeddings + vector DB queries
rag_infrastructure = 0.001  # Hosting, maintenance, pipelines
total_rag_cost = rag_llm_cost + rag_overhead + rag_infrastructure  # 0.078 -- 4% MORE expensive than search-first!

# Ultra-affordable option
gemini_lite_cost = 0.005    # 128K context with Gemini 2.0 Flash-Lite

Key Findings

  • Gemini 2.0 Flash-Lite: $0.005 per query - 15x cheaper than RAG
  • Gemini 2.0 Flash: $0.075 per query - same cost as RAG but no infrastructure
  • Search-first eliminates: Vector databases, embeddings, chunking, maintenance overhead
  • Always fresh: No stale embeddings or index updates needed
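The per-query figures quoted above follow directly from the per-token prices; a quick sanity check (this is back-of-the-envelope arithmetic, not the repo's code, and it ignores the comparatively tiny output-token cost):

```python
# Back-of-the-envelope check of the per-query figures quoted above.
FLASH_PRICE_PER_1M = 0.075        # USD per 1M input tokens (Gemini 2.0 Flash)
FLASH_LITE_PRICE_PER_1M = 0.0375  # USD per 1M input tokens (Gemini 2.0 Flash-Lite)

def query_cost(input_tokens: int, price_per_1m: float) -> float:
    """LLM input cost for one query, ignoring the (tiny) output-token cost."""
    return input_tokens / 1_000_000 * price_per_1m

search_first = query_cost(1_000_000, FLASH_PRICE_PER_1M)   # full 1M-token context
flash_lite = query_cost(128_000, FLASH_LITE_PRICE_PER_1M)  # 128K context, ~$0.005
rag_total = search_first + 0.002 + 0.001                   # LLM + embeddings/vector DB + infra

print(f"search-first: ${search_first:.3f}, flash-lite: ${flash_lite:.4f}, RAG: ${rag_total:.3f}")
# search-first: $0.075, flash-lite: $0.0048, RAG: $0.078
```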

Latest Model Context Windows (2025)

| Model | Context Window | Token Pricing | Best For |
|-------|----------------|---------------|----------|
| Gemini 2.0 Flash-Lite | 128K tokens | $0.0375/1M input | Most Q&A scenarios |
| Gemini 2.0 Flash | 1M tokens | $0.075/1M input | Complex documentation |
| Gemini 2.5 Flash Preview | 1M tokens | $0.15/1M input | Reasoning-heavy tasks |
| Gemini 2.5 Pro | 5M tokens | $1.25/1M input | Enterprise analysis |
| Traditional RAG | Variable | $0.077/query | Legacy systems only |

Architecture Comparison

Search-First Architecture (This Project):

graph TD
    A[User Query] --> B[Search API]
    B --> C[Relevant Results]
    C --> D[LLM with Context]
    D --> E[Response]
    
    style B fill:#ccffcc
    style D fill:#cceeff
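The Search API node is where the domain guardrail lives: results are constrained to the approved sites from sites_data.csv. A minimal client-side sketch of that filter (`restrict_to_domains` is a hypothetical helper; Tavily's search API can also restrict domains server-side):

```python
from urllib.parse import urlparse

# Illustrative guardrail: keep only results whose host matches an approved site.
def restrict_to_domains(results: list[dict], allowed_sites: list[str]) -> list[dict]:
    """Drop search results whose URL is not on an approved documentation site."""
    # "github.com/openai/swarm" -> "github.com": compare on hostname only
    allowed_hosts = {site.split("/")[0] for site in allowed_sites}
    return [r for r in results if urlparse(r["url"]).hostname in allowed_hosts]

results = [
    {"url": "https://docs.llamaindex.ai/en/stable/", "title": "LlamaIndex docs"},
    {"url": "https://random-blog.example.com/post", "title": "Unvetted blog"},
]
kept = restrict_to_domains(results, ["docs.llamaindex.ai", "docs.agentops.ai"])
print([r["title"] for r in kept])  # ['LlamaIndex docs']
```

Keeping a client-side filter even when the search API restricts domains gives defense in depth: no unvetted URL reaches the LLM context.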

Traditional RAG Architecture:

graph TD
    A[User Query] --> B[Embedding Model]
    B --> C[Vector Database]
    C --> D[Similarity Search]
    D --> E[Chunk Retrieval]
    E --> F[Context Assembly]
    F --> G[LLM Processing]
    G --> H[Response]
    
    I[Document Ingestion] --> J[Chunking]
    J --> K[Embedding Generation]
    K --> L[Vector Storage]
    L --> C
    
    style C fill:#ffcccc
    style J fill:#ffcccc
    style K fill:#ffcccc

Performance Advantages

Recent research (2024-2025) shows that search-first approaches often outperform RAG:

  • No "lost in the middle" issues - Search returns most relevant content first
  • Better context relevance - Search algorithms optimize for query relevance
  • Faster iteration - No embedding regeneration when documents change
  • Simpler debugging - Easy to see what content was retrieved and why

2025 Strategy Recommendations

🥇 Primary Approach: Search-First (This Project)

  • Public documentation - Use search APIs with large context windows
  • Internal wikis - Search across approved domains with guardrails
  • Cost optimization - 15x cheaper with Gemini 2.0 Flash-Lite
  • Simplicity - No vector databases or embedding maintenance
  • Always current - Real-time search results

🥈 Fallback: Hybrid RAG-Search

  • 🔄 Private enterprise data with strict access controls
  • 🔄 Fine-grained permissions on document chunks
  • 🔄 Offline scenarios where search APIs aren't available

🥉 Legacy: Traditional RAG

  • ⚠️ Specialized use cases requiring complex document relationships
  • ⚠️ Ultra-high volume (>100K queries/day) where infrastructure costs amortize

The Verdict: Search-first approaches have fundamentally changed the game in 2025. This project demonstrates: Search + Large Context > RAG for most organizational knowledge systems. 🚀

🏗️ System Architecture

The system uses a search-first approach with intelligent fallback to web scraping for comprehensive information retrieval:

graph TD
    A[User Query] --> B[LangChain Agent]
    B --> C{Analyze Query}
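In code, the search-first-with-scrape-fallback decision looks roughly like the following (a sketch only, with the search and scrape tools injected as callables; the real agent delegates this choice to LangChain tool selection):

```python
from typing import Callable

# Illustrative control flow only; the real agent delegates this decision
# to a LangChain agent choosing between search and scrape tools.
def answer(query: str,
           search: Callable[[str], list[str]],
           scrape: Callable[[str], str],
           min_snippets: int = 3) -> str:
    """Search first; scrape the top result only if search came back thin."""
    snippets = search(query)
    if len(snippets) >= min_snippets:
        return " ".join(snippets)                   # enough context from fast search
    return scrape(snippets[0]) if snippets else ""  # fallback: deep scrape

# Toy stand-ins for the real search and scraper tools
rich_search = lambda q: ["snippet one", "snippet two", "snippet three"]
thin_search = lambda q: ["https://docs.example.com/page"]
scraper = lambda url: f"full page content of {url}"

print(answer("q", rich_search, scraper))  # joined snippets, no scraping needed
print(answer("q", thin_search, scraper))  # falls back to scraping the top hit
```

This matches the performance claim above: the common case stays on the fast search path, and the expensive scrape runs only when search results are insufficient.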
   