# Reflyx
Reflyx — a free, open-source AI coding assistant for VS Code. Works like Augment, using only free resources. Runs fully offline with local LLMs (Ollama, LM Studio, DeepSeek, Qwen) or online with GPT-4o/Claude. Indexes your entire codebase with Tree-sitter + embeddings + vector DB. Private, fast, and auto-syncs as you code.
## 🤖 AI Coding Assistant - Enhanced with Dual-Mode AI Processing

A comprehensive, production-ready AI coding assistant designed to match the functionality of Augment Code, Cursor, and Windsurf. It features dual-mode operation with seamless switching between local privacy and cloud-powered performance, advanced semantic search, intelligent code generation, and contextual explanations.
## 🌟 NEW: Augment Code-Level Features
- 🔄 Dual-Mode AI Processing: Seamlessly switch between local (Ollama) and online (GPT-4o, Claude-3.5-Sonnet, Gemini Pro) AI models
- 🔐 Secure API Key Management: Built-in secure storage using VS Code's SecretStorage API
- ⚡ Ultra-Fast Inference: Groq integration with 500+ tokens/second processing
- 🎯 Smart Provider Selection: Automatic fallback and intelligent routing
- 📱 Enhanced UI: Context-aware chat, inline suggestions, and real-time streaming
- 🔧 Advanced Configuration: Comprehensive settings panel with provider management
## 🚀 Core Features

### 🤖 Dual-Mode AI Processing
- Local Mode: Complete privacy with Ollama (CodeLlama, DeepSeek-Coder, Qwen2.5-Coder)
- Online Mode: Access to latest models (GPT-4o, Claude-3.5-Sonnet, Gemini Pro, Groq)
- Hybrid Mode: Intelligent routing between local and cloud for optimal performance
- Smart Fallback: Automatic provider switching if primary fails
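The smart-fallback behavior above can be sketched as an ordered provider chain. This is a minimal illustration, not the extension's actual routing code; the provider names and `ask` callables are hypothetical stand-ins for real API clients.

```python
# Minimal sketch of smart-fallback routing: try the preferred provider
# first, then fall through the rest of the chain on failure.
class ProviderError(Exception):
    pass

def ask_with_fallback(prompt, providers):
    """providers: ordered list of (name, ask_fn) pairs."""
    errors = []
    for name, ask in providers:
        try:
            return name, ask(prompt)
        except ProviderError as e:
            errors.append(f"{name}: {e}")  # record failure, try next provider
    raise ProviderError("all providers failed: " + "; ".join(errors))

# Example: local Ollama is down, so the chain falls back to Groq.
def ollama(prompt):
    raise ProviderError("connection refused")

def groq(prompt):
    return f"answer to: {prompt}"

name, answer = ask_with_fallback("explain this function", [("ollama", ollama), ("groq", groq)])
```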
### 🔍 Advanced Code Intelligence
- Semantic Code Search: Query your entire codebase using natural language
- Context-Aware Explanations: Detailed code explanations with surrounding context
- Intelligent Code Generation: Generate production-ready code from prompts
- Smart Refactoring: AI-powered refactoring suggestions with examples
- Pattern Detection: Find similar code patterns and potential duplications
- Real-time Indexing: Automatic re-indexing when files change
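Conceptually, semantic search embeds every code chunk and ranks chunks by cosine similarity to the embedded query. The toy sketch below uses a bag-of-words "embedding" purely for illustration; the real pipeline uses Sentence-Transformers vectors stored in Qdrant.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (stand-in for all-MiniLM-L6-v2 vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, chunks, k=2):
    """Rank indexed chunks by similarity to a natural-language query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "def authenticate_user(username, password): ...",
    "def connect_database(url): ...",
    "def render_template(name): ...",
]
print(search("user authentication", chunks, k=1))
```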
### 🎯 Enhanced User Experience
- Inline Code Suggestions: Real-time AI suggestions as you type
- Streaming Responses: See AI responses as they're generated
- Context-Aware Chat: Persistent chat with full codebase context
- Quick Actions: Right-click context menu for instant AI help
- Status Indicators: Real-time status of AI providers and indexing progress
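Streaming responses mean the chat panel appends each token as it arrives rather than waiting for the full completion. A minimal sketch, where `fake_stream` stands in for the backend's real token stream (e.g. SSE or a websocket):

```python
# UI-side streaming sketch: append tokens to the panel as they arrive.
def fake_stream(text):
    """Hypothetical stand-in for a server token stream."""
    for token in text.split():
        yield token + " "

def render_streaming(stream, on_token):
    buffer = []
    for token in stream:
        buffer.append(token)
        on_token(token)  # update the chat panel incrementally
    return "".join(buffer).rstrip()

shown = []
final = render_streaming(fake_stream("Hello from the assistant"), shown.append)
```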
## 🤖 Supported AI Providers

### 🏠 Local Providers (Free & Private)

| Provider | Models | Context | Speed | Privacy |
|----------|--------|---------|-------|---------|
| Ollama | CodeLlama 7B/13B/34B<br>DeepSeek-Coder 6.7B<br>Qwen2.5-Coder 7B | 16K-32K | Hardware-dependent | 🟢 Complete |
### ☁️ Online Providers (Cloud APIs)

| Provider | Models | Context | Speed | Free Tier |
|----------|--------|---------|-------|-----------|
| OpenAI | GPT-4o, GPT-4 Turbo | 128K | Fast | $5 credit |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus | 200K | Fast | Limited |
| Google AI | Gemini 1.5 Pro, Gemini 1.5 Flash | 2M | Medium | Generous |
| Groq | Llama 3.1 70B, Mixtral 8x7B | 131K | Ultra-fast | 14.4K req/day |
| Together AI | Llama 3 70B, CodeLlama 34B | 8K-16K | Fast | $25 credit |
## 🎯 Quick Setup Recommendations
- Privacy-First: Use Local mode with Ollama
- Speed-First: Use Groq (500+ tokens/second, generous free tier)
- Quality-First: Use GPT-4o or Claude 3.5 Sonnet
- Budget-First: Use Hybrid mode (local + Groq fallback)
- Enterprise: Use Local mode with larger models (13B/34B)
## 🏗️ Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     VS Code     │    │     FastAPI      │    │     Qdrant      │
│    Extension    │◄──►│     Backend      │◄──►│    Vector DB    │
│  (TypeScript)   │    │     (Python)     │    │  (Self-hosted)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌────────▼────────┐
                       │     Ollama      │
                       │   Local LLMs    │
                       │  (Free Models)  │
                       └─────────────────┘
```
## 🛠️ Tech Stack (100% Free & Open Source)

### Core Components
- Frontend: VS Code Extension (TypeScript)
- Backend: Python FastAPI
- Database: Qdrant Vector Database (self-hosted)
- AI Models: Ollama (CodeLlama, DeepSeek-Coder, Qwen2.5-Coder)
- Code Parsing: Tree-sitter
- Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
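Before embedding, the indexer splits each file into chunks (capped by the `maxChunkSize` setting). The sketch below is a simple line-aware chunker assuming a character-based limit; the real splitter cuts at Tree-sitter syntax-node boundaries instead.

```python
def chunk_lines(source, max_chars=500):
    """Greedy line-aware chunking: never splits a line, and starts a new
    chunk once adding the next line would exceed max_chars. A stand-in
    for the real Tree-sitter-based splitter."""
    chunks, current, size = [], [], 0
    for line in source.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

code = "\n".join(f"line_{i} = {i}" for i in range(100))
pieces = chunk_lines(code, max_chars=120)
```

Joining the chunks reproduces the original source exactly, so no code is lost during indexing.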
### Free Cloud Alternatives (Optional)
- Vector DB: Pinecone (free tier), Weaviate Cloud
- LLM APIs: Groq (free tier), Together AI, Google Gemini Pro
- Hosting: Railway, Render, Fly.io (all have free tiers)
## 📋 Prerequisites

### Minimum Requirements
- RAM: 8GB (16GB recommended)
- CPU: 4-core (8-core recommended)
- Storage: 5GB free space (10GB recommended)
- OS: Windows 10+, macOS 10.15+, or Linux
### Required Software

- Git
- Python 3.x (backend and setup script)
- Node.js with npm (extension build)
- Docker with Docker Compose (Qdrant and services)
- VS Code
- Ollama (optional, for local mode)
## 🚀 Quick Start (3 Steps)

### 1. Automated Setup

```bash
# Clone and set up everything automatically
git clone https://github.com/njrgourav11/Reflyx.git
cd Reflyx
python setup.py  # Installs everything: Docker, models, dependencies

# Start all services
docker-compose up -d
```
### 2. Install VS Code Extension

```bash
# Build and package the extension
cd extension
npm install && npm run compile && npm run package
```

Then install the generated `.vsix` file in VS Code:

1. Open VS Code
2. `Ctrl+Shift+P` → "Extensions: Install from VSIX"
3. Select the generated `.vsix` file
4. Restart VS Code
### 3. Configure AI Providers

**Option A: Local Only (Complete Privacy)**

1. Install Ollama: https://ollama.ai
2. Pull models: `ollama pull codellama:7b-code`
3. In VS Code: `Ctrl+Shift+,` → Set mode to "Local"

**Option B: Online + Local (Best Performance)**

1. Get free API keys (see API Keys Guide)
2. In VS Code: `Ctrl+Shift+,` → Configure providers
3. Set mode to "Hybrid" for the best of both worlds
## 🎯 First Steps After Installation

### Immediate Setup (2 minutes)

1. Open VS Code in your project directory
2. Index your codebase: `Ctrl+Shift+P` → "AI Coding Assistant: Index Workspace"
3. Open settings: `Ctrl+Shift+,` to configure AI providers
4. Start chatting: `Ctrl+Shift+C` to open the AI chat panel
### Test Your Installation

```bash
# Quick health check
make health

# Or manually test each service:
curl http://localhost:8000/api/v1/health  # Backend
curl http://localhost:6333/health         # Vector DB
curl http://localhost:11434/api/tags      # Ollama (if using local)
```
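The same checks can be scripted with nothing but the Python standard library. The endpoints match the curl commands above; the `is_up` helper is just an illustrative sketch.

```python
import urllib.request

def is_up(url, timeout=2.0):
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False

services = {
    "backend": "http://localhost:8000/api/v1/health",
    "vector-db": "http://localhost:6333/health",
    "ollama": "http://localhost:11434/api/tags",
}
for name, url in services.items():
    print(f"{name}: {'ok' if is_up(url) else 'DOWN'}")
```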
### Try These Example Queries

- "Where is user authentication handled in this codebase?"
- "Show me all database connection functions"
- "Find error handling patterns"
- Select code → Right-click → "Explain Selection"
- `Ctrl+Shift+G` → "Create a REST API endpoint for user login"
### Manual Setup (Alternative)

Or start each service individually:

```bash
# Backend server
cd server && python -m uvicorn app.main:app --reload

# Qdrant vector database
docker run -p 6333:6333 qdrant/qdrant

# Ollama (install from https://ollama.ai)
ollama pull codellama:7b-code
```

Then build and install the extension (see step 2 of Quick Start), open VS Code in your project directory, run `AI Coding Assistant: Index Workspace`, and start chatting with your codebase!
## 📖 Usage Guide

### Basic Commands

- `Ctrl+Shift+P` → `AI Coding Assistant: Ask Codebase`
- `Ctrl+Shift+P` → `AI Coding Assistant: Explain Selection`
- `Ctrl+Shift+P` → `AI Coding Assistant: Generate Code`
- `Ctrl+Shift+P` → `AI Coding Assistant: Find Similar`
### Chat Interface
- Open the AI Assistant sidebar panel
- Type natural language queries about your code
- Get contextual responses with code references
### Example Queries

- "Where is user authentication handled?"
- "Explain this function and its dependencies"
- "Generate a REST API endpoint for user login"
- "Find all database connection functions"
- "Suggest refactoring for this class"
## ⚙️ Configuration

### VS Code Settings

```json
{
  "aiCodingAssistant.modelProvider": "ollama",
  "aiCodingAssistant.embeddingModel": "all-MiniLM-L6-v2",
  "aiCodingAssistant.maxChunkSize": 500,
  "aiCodingAssistant.retrievalCount": 10,
  "aiCodingAssistant.ignorePatterns": [
    "node_modules/**",
    ".git/**",
    "*.min.js",
    "dist/**"
  ]
}
```
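The ignore patterns are glob-style. A sketch of how they can be matched against workspace-relative paths: `fnmatch`'s `*` is not path-aware (it also crosses `/`), which conveniently makes `node_modules/**` match any nested file and `*.min.js` match at any depth. The extension's actual matcher may differ.

```python
from fnmatch import fnmatch

def is_ignored(path, patterns):
    """Match a workspace-relative path against glob-style ignore patterns."""
    p = path.replace("\\", "/")  # normalize Windows separators
    return any(fnmatch(p, pat) for pat in patterns)

patterns = ["node_modules/**", ".git/**", "*.min.js", "dist/**"]
```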
### Environment Variables

```bash
# Backend configuration
QDRANT_URL=http://localhost:6333
OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Optional cloud APIs (free tiers)
OPENAI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
```
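A backend reading these variables can fall back to the documented defaults when they are unset. This is an illustrative sketch (the key names match the list above; the `load_config` helper itself is hypothetical):

```python
import os

def load_config(env=os.environ):
    """Read backend settings, falling back to the documented defaults.
    Cloud API keys stay optional: None simply disables that provider."""
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434"),
        "embedding_model": env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        "openai_api_key": env.get("OPENAI_API_KEY"),  # optional
        "groq_api_key": env.get("GROQ_API_KEY"),      # optional
    }

cfg = load_config({})  # empty environment → all defaults apply
```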
## 🧪 Supported Languages
- Primary: TypeScript, JavaScript, Python, Java
- Additional: C++, Rust, Go, C#, PHP, Ruby
- Extensible: Easy to add new languages via Tree-sitter grammars
## 📊 Performance Metrics

### Indexing Performance
- Speed: ~1000 files/minute (4-core CPU)
- Memory: <2GB during indexing
- Storage: ~100MB per 10K files
### Query Performance
- Simple Queries: <3 seconds
- Code Generation: <8 seconds
- Context Window: Up to 32K tokens
## 🔧 Development

### Project Structure

```
ai-coding-assistant/
├── extension/          # VS Code extension (TypeScript)
├── server/             # FastAPI backend (Python)
├── indexer/            # Code parsing & embedding
├── docker-compose.yml  # Local development setup
└── docs/               # Documentation
```
### Running in Development Mode

```bash
# Backend with hot reload
cd server && python -m uvicorn app.main:app --reload

# Extension development
cd extension && npm run watch

# Vector database
docker run -p 6333:6333 qdrant/qdrant
```
## 🐛 Troubleshooting

### Common Issues

**1. Ollama Connection Failed**

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
```