# Reflyx
Reflyx — a free, open-source AI coding assistant for VS Code. Works like Augment, using only free resources. Runs fully offline with local LLMs (Ollama, LM Studio, DeepSeek, Qwen) or online with GPT-4o/Claude. Indexes your entire codebase with Tree-sitter + embeddings + vector DB. Private, fast, and auto-syncs as you code.
## 🤖 AI Coding Assistant - Enhanced with Dual-Mode AI Processing

A comprehensive, production-ready AI coding assistant designed to match the functionality of Augment Code, Cursor, and Windsurf. It features dual-mode operation with seamless switching between local privacy and cloud-powered performance, advanced semantic search, intelligent code generation, and contextual explanations.
## 🌟 NEW: Augment Code-Level Features
- 🔄 Dual-Mode AI Processing: Seamlessly switch between local (Ollama) and online (GPT-4o, Claude-3.5-Sonnet, Gemini Pro) AI models
- 🔐 Secure API Key Management: Built-in secure storage using VS Code's SecretStorage API
- ⚡ Ultra-Fast Inference: Groq integration with 500+ tokens/second processing
- 🎯 Smart Provider Selection: Automatic fallback and intelligent routing
- 📱 Enhanced UI: Context-aware chat, inline suggestions, and real-time streaming
- 🔧 Advanced Configuration: Comprehensive settings panel with provider management
## 🚀 Core Features

### 🤖 Dual-Mode AI Processing
- Local Mode: Complete privacy with Ollama (CodeLlama, DeepSeek-Coder, Qwen2.5-Coder)
- Online Mode: Access to latest models (GPT-4o, Claude-3.5-Sonnet, Gemini Pro, Groq)
- Hybrid Mode: Intelligent routing between local and cloud for optimal performance
- Smart Fallback: Automatic provider switching if primary fails
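The smart-fallback behavior above can be sketched as an ordered provider chain. This is a minimal illustration, not the extension's actual routing code; the provider names and `ask` callables are hypothetical stand-ins for real API clients.

```python
# Minimal sketch of smart-fallback routing: try the preferred provider
# first, then fall through the rest of the chain on failure.
class ProviderError(Exception):
    pass

def ask_with_fallback(prompt, providers):
    """providers: ordered list of (name, ask_fn) pairs."""
    errors = []
    for name, ask in providers:
        try:
            return name, ask(prompt)
        except ProviderError as e:
            errors.append(f"{name}: {e}")  # record failure, try next provider
    raise ProviderError("all providers failed: " + "; ".join(errors))

# Example: local Ollama is down, so the chain falls back to Groq.
def ollama(prompt):
    raise ProviderError("connection refused")

def groq(prompt):
    return f"answer to: {prompt}"

name, answer = ask_with_fallback("explain this function", [("ollama", ollama), ("groq", groq)])
```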
### 🔍 Advanced Code Intelligence
- Semantic Code Search: Query your entire codebase using natural language
- Context-Aware Explanations: Detailed code explanations with surrounding context
- Intelligent Code Generation: Generate production-ready code from prompts
- Smart Refactoring: AI-powered refactoring suggestions with examples
- Pattern Detection: Find similar code patterns and potential duplications
- Real-time Indexing: Automatic re-indexing when files change
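Conceptually, semantic search embeds every code chunk and ranks chunks by cosine similarity to the embedded query. The toy sketch below uses a bag-of-words "embedding" purely for illustration; the real pipeline uses Sentence-Transformers vectors stored in Qdrant.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (stand-in for all-MiniLM-L6-v2 vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, chunks, k=2):
    """Rank indexed chunks by similarity to a natural-language query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "def authenticate_user(username, password): ...",
    "def connect_database(url): ...",
    "def render_template(name): ...",
]
print(search("user authentication", chunks, k=1))
```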
### 🎯 Enhanced User Experience
- Inline Code Suggestions: Real-time AI suggestions as you type
- Streaming Responses: See AI responses as they're generated
- Context-Aware Chat: Persistent chat with full codebase context
- Quick Actions: Right-click context menu for instant AI help
- Status Indicators: Real-time status of AI providers and indexing progress
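Streaming responses mean the chat panel appends each token as it arrives rather than waiting for the full completion. A minimal sketch, where `fake_stream` stands in for the backend's real token stream (e.g. SSE or a websocket):

```python
# UI-side streaming sketch: append tokens to the panel as they arrive.
def fake_stream(text):
    """Hypothetical stand-in for a server token stream."""
    for token in text.split():
        yield token + " "

def render_streaming(stream, on_token):
    buffer = []
    for token in stream:
        buffer.append(token)
        on_token(token)  # update the chat panel incrementally
    return "".join(buffer).rstrip()

shown = []
final = render_streaming(fake_stream("Hello from the assistant"), shown.append)
```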
## 🤖 Supported AI Providers

### 🏠 Local Providers (Free & Private)

| Provider | Models | Context | Speed | Privacy |
|----------|--------|---------|-------|---------|
| Ollama | CodeLlama 7B/13B/34B<br>DeepSeek-Coder 6.7B<br>Qwen2.5-Coder 7B | 16K-32K | Hardware-dependent | 🟢 Complete |
### ☁️ Online Providers (Cloud APIs)

| Provider | Models | Context | Speed | Free Tier |
|----------|--------|---------|-------|-----------|
| OpenAI | GPT-4o, GPT-4 Turbo | 128K | Fast | $5 credit |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus | 200K | Fast | Limited |
| Google AI | Gemini 1.5 Pro, Gemini 1.5 Flash | 2M | Medium | Generous |
| Groq | Llama 3.1 70B, Mixtral 8x7B | 131K | Ultra-fast | 14.4K req/day |
| Together AI | Llama 3 70B, CodeLlama 34B | 8K-16K | Fast | $25 credit |
## 🎯 Quick Setup Recommendations
- Privacy-First: Use Local mode with Ollama
- Speed-First: Use Groq (500+ tokens/second, generous free tier)
- Quality-First: Use GPT-4o or Claude 3.5 Sonnet
- Budget-First: Use Hybrid mode (local + Groq fallback)
- Enterprise: Use Local mode with larger models (13B/34B)
## 🏗️ Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     VS Code     │    │     FastAPI      │    │     Qdrant      │
│    Extension    │◄──►│     Backend      │◄──►│    Vector DB    │
│  (TypeScript)   │    │     (Python)     │    │  (Self-hosted)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌────────▼────────┐
                       │     Ollama      │
                       │   Local LLMs    │
                       │  (Free Models)  │
                       └─────────────────┘
```
## 🛠️ Tech Stack (100% Free & Open Source)

### Core Components
- Frontend: VS Code Extension (TypeScript)
- Backend: Python FastAPI
- Database: Qdrant Vector Database (self-hosted)
- AI Models: Ollama (CodeLlama, DeepSeek-Coder, Qwen2.5-Coder)
- Code Parsing: Tree-sitter
- Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
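Before embedding, the indexer splits each file into chunks (capped by the `maxChunkSize` setting). The sketch below is a simple line-aware chunker assuming a character-based limit; the real splitter cuts at Tree-sitter syntax-node boundaries instead.

```python
def chunk_lines(source, max_chars=500):
    """Greedy line-aware chunking: never splits a line, and starts a new
    chunk once adding the next line would exceed max_chars. A stand-in
    for the real Tree-sitter-based splitter."""
    chunks, current, size = [], [], 0
    for line in source.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

code = "\n".join(f"line_{i} = {i}" for i in range(100))
pieces = chunk_lines(code, max_chars=120)
```

Joining the chunks reproduces the original source exactly, so no code is lost during indexing.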
### Free Cloud Alternatives (Optional)
- Vector DB: Pinecone (free tier), Weaviate Cloud
- LLM APIs: Groq (free tier), Together AI, Google Gemini Pro
- Hosting: Railway, Render, Fly.io (all have free tiers)
## 📋 Prerequisites

### Minimum Requirements
- RAM: 8GB (16GB recommended)
- CPU: 4-core (8-core recommended)
- Storage: 5GB free space (10GB recommended)
- OS: Windows 10+, macOS 10.15+, or Linux
### Required Software

- Git
- Python 3.x (backend and setup script)
- Node.js with npm (extension build)
- Docker with Docker Compose (Qdrant and services)
- VS Code
- Ollama (optional, for local mode)
## 🚀 Quick Start (3 Steps)

### 1. Automated Setup

```bash
# Clone and set up everything automatically
git clone https://github.com/njrgourav11/Reflyx.git
cd Reflyx
python setup.py  # Installs everything: Docker, models, dependencies

# Start all services
docker-compose up -d
```
### 2. Install VS Code Extension

```bash
# Build and package the extension
cd extension
npm install && npm run compile && npm run package
```

Then install the generated `.vsix` file in VS Code:

1. Open VS Code
2. `Ctrl+Shift+P` → "Extensions: Install from VSIX"
3. Select the generated `.vsix` file
4. Restart VS Code
### 3. Configure AI Providers

**Option A: Local Only (Complete Privacy)**

1. Install Ollama: https://ollama.ai
2. Pull models: `ollama pull codellama:7b-code`
3. In VS Code: `Ctrl+Shift+,` → Set mode to "Local"

**Option B: Online + Local (Best Performance)**

1. Get free API keys (see API Keys Guide)
2. In VS Code: `Ctrl+Shift+,` → Configure providers
3. Set mode to "Hybrid" for the best of both worlds
## 🎯 First Steps After Installation

### Immediate Setup (2 minutes)

1. Open VS Code in your project directory
2. Index your codebase: `Ctrl+Shift+P` → "AI Coding Assistant: Index Workspace"
3. Open settings: `Ctrl+Shift+,` to configure AI providers
4. Start chatting: `Ctrl+Shift+C` to open the AI chat panel
### Test Your Installation

```bash
# Quick health check
make health

# Or manually test each service:
curl http://localhost:8000/api/v1/health  # Backend
curl http://localhost:6333/health         # Vector DB
curl http://localhost:11434/api/tags      # Ollama (if using local)
```
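The same checks can be scripted with nothing but the Python standard library. The endpoints match the curl commands above; the `is_up` helper is just an illustrative sketch.

```python
import urllib.request

def is_up(url, timeout=2.0):
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False

services = {
    "backend": "http://localhost:8000/api/v1/health",
    "vector-db": "http://localhost:6333/health",
    "ollama": "http://localhost:11434/api/tags",
}
for name, url in services.items():
    print(f"{name}: {'ok' if is_up(url) else 'DOWN'}")
```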
### Try These Example Queries

- "Where is user authentication handled in this codebase?"
- "Show me all database connection functions"
- "Find error handling patterns"
- Select code → Right-click → "Explain Selection"
- `Ctrl+Shift+G` → "Create a REST API endpoint for user login"
### Manual Setup (Alternative)

Or start each service individually:

```bash
# Backend server
cd server && python -m uvicorn app.main:app --reload

# Qdrant vector database
docker run -p 6333:6333 qdrant/qdrant

# Ollama (install from https://ollama.ai)
ollama pull codellama:7b-code
```

Then build and install the extension (see step 2 of Quick Start), open VS Code in your project directory, run `AI Coding Assistant: Index Workspace`, and start chatting with your codebase!
## 📖 Usage Guide

### Basic Commands

- `Ctrl+Shift+P` → `AI Coding Assistant: Ask Codebase`
- `Ctrl+Shift+P` → `AI Coding Assistant: Explain Selection`
- `Ctrl+Shift+P` → `AI Coding Assistant: Generate Code`
- `Ctrl+Shift+P` → `AI Coding Assistant: Find Similar`
### Chat Interface
- Open the AI Assistant sidebar panel
- Type natural language queries about your code
- Get contextual responses with code references
### Example Queries

- "Where is user authentication handled?"
- "Explain this function and its dependencies"
- "Generate a REST API endpoint for user login"
- "Find all database connection functions"
- "Suggest refactoring for this class"
## ⚙️ Configuration

### VS Code Settings

```json
{
  "aiCodingAssistant.modelProvider": "ollama",
  "aiCodingAssistant.embeddingModel": "all-MiniLM-L6-v2",
  "aiCodingAssistant.maxChunkSize": 500,
  "aiCodingAssistant.retrievalCount": 10,
  "aiCodingAssistant.ignorePatterns": [
    "node_modules/**",
    ".git/**",
    "*.min.js",
    "dist/**"
  ]
}
```
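The ignore patterns are glob-style. A sketch of how they can be matched against workspace-relative paths: `fnmatch`'s `*` is not path-aware (it also crosses `/`), which conveniently makes `node_modules/**` match any nested file and `*.min.js` match at any depth. The extension's actual matcher may differ.

```python
from fnmatch import fnmatch

def is_ignored(path, patterns):
    """Match a workspace-relative path against glob-style ignore patterns."""
    p = path.replace("\\", "/")  # normalize Windows separators
    return any(fnmatch(p, pat) for pat in patterns)

patterns = ["node_modules/**", ".git/**", "*.min.js", "dist/**"]
```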
### Environment Variables

```bash
# Backend configuration
QDRANT_URL=http://localhost:6333
OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Optional cloud APIs (free tiers)
OPENAI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
```
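A backend reading these variables can fall back to the documented defaults when they are unset. This is an illustrative sketch (the key names match the list above; the `load_config` helper itself is hypothetical):

```python
import os

def load_config(env=os.environ):
    """Read backend settings, falling back to the documented defaults.
    Cloud API keys stay optional: None simply disables that provider."""
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434"),
        "embedding_model": env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        "openai_api_key": env.get("OPENAI_API_KEY"),  # optional
        "groq_api_key": env.get("GROQ_API_KEY"),      # optional
    }

cfg = load_config({})  # empty environment → all defaults apply
```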
## 🧪 Supported Languages
- Primary: TypeScript, JavaScript, Python, Java
- Additional: C++, Rust, Go, C#, PHP, Ruby
- Extensible: Easy to add new languages via Tree-sitter grammars
## 📊 Performance Metrics

### Indexing Performance
- Speed: ~1000 files/minute (4-core CPU)
- Memory: <2GB during indexing
- Storage: ~100MB per 10K files
### Query Performance
- Simple Queries: <3 seconds
- Code Generation: <8 seconds
- Context Window: Up to 32K tokens
## 🔧 Development

### Project Structure

```
ai-coding-assistant/
├── extension/          # VS Code extension (TypeScript)
├── server/             # FastAPI backend (Python)
├── indexer/            # Code parsing & embedding
├── docker-compose.yml  # Local development setup
└── docs/               # Documentation
```
### Running in Development Mode

```bash
# Backend with hot reload
cd server && python -m uvicorn app.main:app --reload

# Extension development
cd extension && npm run watch

# Vector database
docker run -p 6333:6333 qdrant/qdrant
```
## 🐛 Troubleshooting

### Common Issues

**1. Ollama Connection Failed**

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
```