FalconEYE

AI-powered security code analyzer using local LLMs for semantic vulnerability detection. Unlike traditional SAST tools, FalconEYE reasons about code contextually instead of matching patterns. Supports Python, JavaScript, TypeScript, Go, Rust, C/C++, Java, and more.

Install / Use

/learn @FalconEYE-ai/FalconEYE
FalconEYE

███████╗ █████╗ ██╗      ██████╗ ██████╗ ███╗   ██╗███████╗██╗   ██╗███████╗
██╔════╝██╔══██╗██║     ██╔════╝██╔═══██╗████╗  ██║██╔════╝╚██╗ ██╔╝██╔════╝
█████╗  ███████║██║     ██║     ██║   ██║██╔██╗ ██║█████╗   ╚████╔╝ █████╗
██╔══╝  ██╔══██║██║     ██║     ██║   ██║██║╚██╗██║██╔══╝    ╚██╔╝  ██╔══╝
██║     ██║  ██║███████╗╚██████╗╚██████╔╝██║ ╚████║███████╗   ██║   ███████╗
╚═╝     ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═════╝ ╚═╝  ╚═══╝╚══════╝   ╚═╝   ╚══════╝

Next-Generation Security Code Analysis Powered by Local LLMs

by hardw00t & h4ckologic

FalconEYE represents a paradigm shift in static code analysis. Instead of relying on predefined vulnerability patterns, it leverages large language models to reason about your code the same way a security expert would — understanding context, intent, and subtle security implications that traditional tools miss.


Why FalconEYE?

Traditional security scanners are limited by their pattern databases. They can only find what they've been programmed to look for. FalconEYE is different:

  • No Pattern Matching — Uses pure AI reasoning to understand your code semantically
  • Context-Aware Analysis — Retrieval-Augmented Generation (RAG) provides relevant code context for deeper insights
  • Novel Vulnerability Detection — Identifies security issues that don't match known patterns
  • LLM-Powered Enrichment — Every finding gets AI-generated descriptions, mitigations, code snippets, and line numbers
  • Reduced False Positives — Optional AI validation pass filters noise from false alarms
  • Rich Reporting — Console, JSON, HTML, and SARIF output with interactive dashboards
  • Smart Re-indexing — Incremental analysis means re-scans only process changed files
  • Privacy-First — Runs entirely locally with Ollama or MLX — your code never leaves your machine
  • Apple Silicon Accelerated — Native MLX backend delivers 20-40% faster inference on M-series chips
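To make the pattern-matching contrast concrete, here is an illustrative (hypothetical, not from the FalconEYE docs) snippet of the kind of flaw semantic analysis is suited to: the injection sink is separated from the tainted input by a helper function, so there is no single suspicious line for a signature to match.

```python
# Hypothetical example: an SQL injection that signature-based scanners
# tend to miss, because the tainted value passes through a helper
# before it reaches the query string.

def build_filter(column: str, value: str) -> str:
    # Looks harmless in isolation: just string formatting.
    return f"{column} = '{value}'"

def find_user(user_input: str) -> str:
    # The query line contains no obvious user input, but the clause
    # returned by build_filter embeds it unescaped.
    clause = build_filter("username", user_input)
    return f"SELECT * FROM users WHERE {clause}"

# A rule keyed on "execute(... + input)" sees nothing suspicious here;
# following the data flow from user_input into the SQL string does.
query = find_user("alice' OR '1'='1")
```

Tracing `user_input` through `build_filter` into the final query string is exactly the kind of cross-function reasoning the feature list above describes.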

How It Works

FalconEYE follows a multi-stage analysis pipeline:

┌─────────────────────────────────────────────────────────────────┐
│                     1. CODE INGESTION                          │
│  Scans repository -> Detects languages -> Parses AST structure │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                    2. INTELLIGENT INDEXING                      │
│  Chunks code semantically -> Generates embeddings -> Stores in │
│  vector database for fast semantic search                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                   3. CONTEXT ASSEMBLY (RAG)                     │
│  For each code segment -> Retrieves similar code -> Gathers    │
│  relevant context from your entire codebase                    │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                    4. AI SECURITY ANALYSIS                      │
│  LLM analyzes code with context -> Reasons about vulns ->      │
│  Understands data flow -> Identifies security implications     │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                   5. LLM-POWERED ENRICHMENT                    │
│  Incomplete findings sent back to the LLM for detailed         │
│  reasoning, specific mitigations, code snippets & line numbers │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                     6. VALIDATION & REPORTING                  │
│  Optional AI validation pass -> Formats findings -> Outputs in │
│  Console/JSON/HTML/SARIF format with actionable remediation    │
└─────────────────────────────────────────────────────────────────┘

Semantic Understanding: FalconEYE reads your code like a security engineer, understanding business logic, data flows, and architectural patterns to identify real vulnerabilities.

RAG-Enhanced Analysis: By retrieving similar code patterns from your entire codebase, the AI gets crucial context about how functions are used, what data they handle, and potential security implications across your application.

LLM-Powered Enrichment: When the initial analysis produces incomplete findings (missing line numbers, generic descriptions, or vague mitigations), FalconEYE sends them back to the LLM with the full source code for enrichment. Every displayed finding includes specific line numbers, the vulnerable code snippet, a detailed description of the exploit, and actionable remediation referencing actual identifiers from your code.
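The retrieval step at the heart of stage 3 can be sketched in a few lines. This is a minimal illustration of the idea (toy embeddings, invented function names, not FalconEYE's actual API): chunks are embedded at indexing time, and the most similar ones are pulled in as context for the segment under analysis.

```python
# Minimal sketch of RAG context assembly: rank indexed code chunks by
# cosine similarity to the query embedding and keep the top k.
# Embeddings here are tiny made-up vectors for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend index: (chunk, embedding) pairs produced at indexing time.
index = [
    ("def sanitize(s): ...", [0.9, 0.1, 0.0]),
    ("def render_template(t): ...", [0.1, 0.8, 0.2]),
    ("def run_query(q): ...", [0.8, 0.3, 0.1]),
]

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# A segment whose embedding is close to the sanitize/run_query chunks
# pulls those in as analysis context.
context = retrieve([0.9, 0.15, 0.05])
```

In the real pipeline the embeddings come from the embedding model (`embeddinggemma:300m`) and the index lives in a vector database, but the ranking principle is the same.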


Getting Started

Prerequisites

  • Python 3.12 or newer
  • Ollama running locally (Install Ollama)

Installation

# Pull required AI models
ollama pull qwen3-coder:30b
ollama pull embeddinggemma:300m

# Install FalconEYE
pip install -e .

Apple Silicon Installation (MLX)

For native Apple Silicon acceleration:

# Pull embedding model (still required -- MLX uses Ollama for embeddings)
ollama pull embeddinggemma:300m

# Install FalconEYE with MLX support
pip install -e ".[mlx]"

Note: MLX requires an Apple Silicon Mac (M1/M2/M3/M4). The MLX backend uses a hybrid approach — MLX handles LLM inference while Ollama handles embeddings. Ollama must still be running for the indexing step. The MLX analysis model is downloaded automatically from HuggingFace on first use.

Keeping Up to Date

Once installed from a git clone, you can upgrade FalconEYE to the latest version with a single command:

falconeye upgrade

This will:

  1. git pull origin main — fetch and merge the latest changes
  2. Reinstall dependencies from pyproject.toml
  3. Show what changed (commits pulled) and the new version

Note: falconeye upgrade requires FalconEYE to be installed from a git clone (pip install -e .). If installed from a package archive, reinstall from the repository instead.

Your First Scan

# Full scan (index + review in one step)
falconeye scan /path/to/your/project

# Or run the steps separately:
falconeye index /path/to/your/project     # Index codebase (one-time)
falconeye review /path/to/your/project    # Analyze for vulnerabilities

# Scan with MLX backend on Apple Silicon
falconeye scan /path/to/your/project --backend mlx

# Generate an HTML report
falconeye scan /path/to/your/project --format html --output report.html

# Verbose mode -- see LLM streaming and full logs
falconeye scan /path/to/your/project -v
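Since the scan can emit JSON, a natural follow-up is gating a CI job on the results. The sketch below shows the idea; the field names (`findings`, `severity`) are assumptions for illustration, so check them against the actual output of `--format json` in your version.

```python
# Sketch of failing a build on high-severity findings from a JSON
# report. The report schema here is assumed, not documented.
import json

def count_high_severity(report_text: str) -> int:
    report = json.loads(report_text)
    return sum(1 for f in report.get("findings", [])
               if f.get("severity", "").lower() in ("high", "critical"))

# Stand-in for the contents of a report file produced by
# `falconeye scan . --format json --output report.json`.
sample = json.dumps({
    "findings": [
        {"severity": "high", "title": "SQL injection"},
        {"severity": "low", "title": "Verbose error message"},
    ]
})

# In CI: exit nonzero if any high/critical finding is present.
high = count_high_severity(sample)
```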

MLX Backend (Apple Silicon)

FalconEYE supports native Apple Silicon inference via MLX, Apple's machine learning framework optimized for the unified memory architecture and Neural Engine of M-series chips.

Performance Benefits

MLX delivers significant performance improvements over Ollama (which uses llama.cpp internally) on Apple Silicon hardware:

| Metric | Improvement | Details |
|--------|-------------|---------|
| Token Generation | 20-40% faster | MLX runs inference directly on the Apple GPU/Neural Engine. On MoE models like Qwen3-30B-A3B, benchmarks show 17-43% higher tok/s vs llama.cpp. Smaller models see up to 87% gains. |
| Memory Usage | ~30% lower RAM | Zero-copy unified memory eliminates data duplication between CPU and GPU. Lazy evaluation fuses operations and reduces allocation overhead. |
| First-Token Latency | ~50% lower | In-process inference removes the HTTP round-trip overhead of Ollama's REST API on localhost:11434. |
| Prompt Processing | ~25% faster prefill | Native Metal compute path vs llama.cpp's Metal abstraction layer. |
| Model Availability | Broader selection | Access thousands of quantized models from HuggingFace's mlx-community, compared to Ollama's curated library. |

Figures based on published benchmarks of MLX vs llama.cpp on M-series chips (Barrios et al., arXiv:2601.19139; arXiv:2511.05502; Google Cloud Community Gemma 3 benchmarks). Actual results vary by model size, quantization level, and hardware generation.

When to use MLX:

  • You have an Apple Silicon Mac (M1/M2/M3/M4)
  • You want faster scan times and lower memory consumption
  • You want access to the latest quantized models from HuggingFace

When to stick with Ollama:

  • Cross-platform deployment (Linux, Intel Mac, Windows via WSL)
  • You prefer Ollama's curated model library and CLI tooling
  • Running on non-Apple hardware

Hybrid Architecture

The MLX backend uses a hybrid approach because MLX does not natively support embedding models:

                 ┌──────────────────────────┐
                 │      FalconEYE CLI       │
                 └────────────┬─────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
    ┌─────────▼──────────┐        ┌──────────▼──────────┐
    │   MLX (Analysis)   │        │ Ollama (Embeddings) │
    │                    │        │                     │
    │  Qwen3-Coder-30B   │        │ embeddinggemma:300m │
    │  (HuggingFace)     │        │ (localhost:11434)   │
    └────────────────────┘        └─────────────────────┘