ArkhamMirror
Local-first AI-powered document intelligence platform for investigative journalism
SHATTERED
<div align="center">
A modular, local-first platform for document analysis and investigative research
Philosophy | Architecture | Features | Quick Start | Security | Production | Shards | Documentation
</div>
<details>
<summary><strong>Screenshots</strong> (click to expand)</summary>
<br>
Dashboard & LLM Configuration
Configure local or cloud LLM providers with one-click switching between LM Studio, Ollama, OpenAI, Groq, and custom endpoints.

ACH Analysis with AI Assistant
Full Analysis of Competing Hypotheses implementation with AI-powered observations, recommendations, and devil's advocate mode.

Graph Visualization
Interactive network analysis with 10+ visualization modes including force-directed layouts and geospatial mapping.
(Screenshots: Force-Directed layout, Geospatial layout)
Timeline Analysis
Temporal event extraction with AI-powered analysis, conflict detection, and phase management.

Pattern Detection
Automated pattern recognition across documents with statistical analysis and AI interpretation.

Credibility Assessment
Source reliability scoring with deception detection checklists (MOM, POP, MOSES, EVE).

Search with Regex Presets
Hybrid semantic/keyword search with built-in regex patterns for PII, financial data, and technical indicators.

Media Forensics
Image authenticity analysis with EXIF extraction, Error Level Analysis (ELA), perceptual hashing, and reverse image search integration.
</details>

Philosophy
SHATTERED isn't a product; it's a platform. The products are the shards, or more precisely, bundles of shards configured for specific use cases.
Core Principles:
- Build domain-agnostic infrastructure that supports domain-specific applications
- Lower the bar for contribution so non-coders can build custom shards
- Provide utility to people in need, not just those who can pay
- Local-first: Your data never leaves your machine unless you want it to
- Privacy-preserving: No telemetry, no cloud dependencies, full data sovereignty
The Meta-Pattern
Every investigative workflow follows the same fundamental pattern:

    INGEST --> EXTRACT --> ORGANIZE --> ANALYZE --> ACT
      |           |           |            |         |
      |           |           |            |         +-- Export, Generate, Notify
      |           |           |            +-- ACH, Contradictions, Patterns, Anomalies
      |           |           +-- Timeline, Graph, Matrix, Provenance
      |           +-- Entities, Claims, Events, Relationships
      +-- Documents, Data, Communications, Records

- Core shards handle INGEST and EXTRACT
- Domain shards handle ORGANIZE and ANALYZE
- Output shards handle ACT
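
The division of labor above can be written down as a small lookup. This is only an illustrative sketch: the `Stage` enum, the `STAGE_OWNER` mapping, and `stages_for` are hypothetical names, not part of the SHATTERED codebase.

```python
from enum import Enum


class Stage(Enum):
    """The five steps of the meta-pattern, in pipeline order."""
    INGEST = "ingest"
    EXTRACT = "extract"
    ORGANIZE = "organize"
    ANALYZE = "analyze"
    ACT = "act"


# Which category of shard handles each stage (hypothetical mapping
# that mirrors the three bullets above).
STAGE_OWNER = {
    Stage.INGEST: "core",
    Stage.EXTRACT: "core",
    Stage.ORGANIZE: "domain",
    Stage.ANALYZE: "domain",
    Stage.ACT: "output",
}


def stages_for(category: str) -> list[Stage]:
    """Return the stages a shard category handles, in pipeline order."""
    return [stage for stage in Stage if STAGE_OWNER[stage] == category]
```

For example, `stages_for("domain")` yields the ORGANIZE and ANALYZE stages.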
Architecture
SHATTERED uses the Voltron architectural philosophy: a modular, plug-and-play system where self-contained shards combine into a unified application.

                     +------------------+
                     |   ArkhamFrame    |  <-- THE FRAME (immutable core)
                     |  (17 Services)   |
                     +--------+---------+
                              |
                     +--------+---------+
                     |   arkham-shell   |  <-- THE SHELL (UI renderer)
                     |(React/TypeScript)|
                     +--------+---------+
                              |
         +--------------------+--------------------+
         |         |          |          |         |
    +----v----+ +--v--+ +-----v-----+ +--v--+  +---v----+
    |Dashboard| | ACH | |  Search   | |Graph|  |Timeline|  <-- SHARDS (26)
    +---------+ +-----+ +-----------+ +-----+  +--------+

Core Design Principles
- Frame is Immutable: Shards depend on the Frame, never the reverse
- No Shard Dependencies: Shards communicate via events, not imports
- Schema Isolation: Each shard gets its own PostgreSQL schema
- Graceful Degradation: Works with or without AI/GPU capabilities
- Event-Driven Architecture: Loose coupling through pub/sub messaging
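
To make the "events, not imports" principle concrete, here is a minimal in-process pub/sub sketch. It is an assumption-laden illustration, not the Frame's actual event API: the `EventBus` class and the `document.ingested` topic name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Tiny in-process pub/sub bus (hypothetical, not the Frame's real API)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on `topic`."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict[str, Any]) -> int:
        """Deliver payload to every subscriber; return how many were called."""
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# One shard reacts to another shard's event without importing it.
bus = EventBus()
seen: list[str] = []
bus.subscribe("document.ingested", lambda event: seen.append(event["doc_id"]))
delivered = bus.publish("document.ingested", {"doc_id": "doc-1"})
```

The publisher only knows the topic name, so either shard can be removed without breaking the other, which is the loose coupling the principles describe.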
Features
AI-Powered Analysis
| Feature | Description |
|---------|-------------|
| AI Junior Analyst | LLM-powered analysis across all shards - anomaly detection, contradiction finding, pattern recognition, credibility assessment, and insight synthesis |
| LLM Summarization | Automatic document and corpus summarization with multiple formats (brief, standard, detailed, executive, key points) |
| Deception Detection | AI-assisted credibility assessment using MOM, POP, MOSES, and EVE checklists |
| Query Expansion | Semantic search enhancement via LLM |
| Devil's Advocate | AI-generated counter-arguments for ACH analysis |
Structured Analytic Techniques
| Technique | Capabilities |
|-----------|-------------|
| ACH (Analysis of Competing Hypotheses) | Full matrix analysis, evidence scoring, premortem analysis, cone of plausibility, corpus search integration, scenario planning, devil's advocate mode |
| Contradiction Detection | Automated identification of conflicting claims across documents with severity scoring and resolution tracking |
| Pattern Recognition | Recurring patterns, behavioral patterns, temporal patterns, correlation analysis with statistical significance |
| Anomaly Detection | Statistical anomalies, contextual anomalies, collective anomalies with LLM-powered analysis |
| Credibility Assessment | Source reliability scoring, bias indicators, deception detection checklists |
| Provenance Tracking | Evidence chains, data lineage, audit trails, artifact verification |
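
As a rough illustration of the matrix analysis behind ACH, the sketch below ranks hypotheses by how much evidence is inconsistent with them, which is the usual ACH heuristic (the least-contradicted hypothesis ranks first). The rating codes, weights, function name, and sample matrix are all invented for this example and do not reflect the ACH shard's real scoring model.

```python
# Ratings: "C" consistent, "I" inconsistent, "N" neutral / not applicable.
# The weights are an assumption; real ACH tools often use finer scales.
INCONSISTENCY_WEIGHT = {"C": 0.0, "N": 0.0, "I": 1.0}


def rank_hypotheses(matrix: dict[str, dict[str, str]]) -> list[tuple[str, float]]:
    """Rank hypotheses by total inconsistency, least inconsistent first.

    `matrix` maps an evidence label to {hypothesis label: rating}.
    Ties are broken alphabetically by hypothesis label.
    """
    totals: dict[str, float] = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            totals[hypothesis] = totals.get(hypothesis, 0.0) + INCONSISTENCY_WEIGHT[rating]
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Made-up matrix: three pieces of evidence rated against two hypotheses.
matrix = {
    "E1": {"H1": "C", "H2": "I"},
    "E2": {"H1": "C", "H2": "I"},
    "E3": {"H1": "I", "H2": "N"},
}
ranking = rank_hypotheses(matrix)
```

Here `H1` is contradicted by one item and `H2` by two, so `H1` ranks first.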
Advanced Visualization
Graph Analysis - 10+ visualization modes:
| Mode | Description |
|------|-------------|
| Force-Directed | Interactive network layout with physics simulation |
| Hierarchical | Tree-based layouts (top-down, bottom-up, radial) |
| Circular | Entities arranged in circular patterns |
| Sankey | Flow diagrams showing relationships and quantities |
| Matrix | Adjacency matrix for dense relationship analysis |
| Geographic | Map overlays with Leaflet integration |
| Causal | Cause-and-effect relationship visualization |
| Argumentation | ACH integration showing evidence-hypothesis relationships |
| Link Analysis | i2 Analyst's Notebook-style investigation graphs |
| Temporal | Time-based graph evolution |
Graph Analytics:
- Centrality measures (degree, betweenness, closeness, eigenvector, PageRank)
- Community detection algorithms
- Path finding (shortest path, all paths, critical paths)
- Cycle detection
- Component analysis
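
Two of the analytics listed above, degree centrality and shortest-path finding, are simple enough to sketch with the standard library. This is a generic illustration on an undirected adjacency-set graph; the README does not say which graph library or data structure the Graph shard uses, so the `Graph` alias and function names are assumptions.

```python
from collections import deque
from typing import Optional

Graph = dict[str, set[str]]  # node -> set of neighbouring nodes (undirected)


def degree_centrality(graph: Graph) -> dict[str, float]:
    """Degree divided by (n - 1), the usual normalization for simple graphs."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up entity graph: E is an isolated node.
graph: Graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B"},
    "E": set(),
}
centrality = degree_centrality(graph)
path = shortest_path(graph, "A", "D")
```

Breadth-first search is enough here because the edges are unweighted; weighted relationships would call for something like Dijkstra's algorithm instead.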
Timeline Analysis:
- Temporal event extraction and visualization
- Date normalization across formats
- Conflict detection for overlapping events
- Phase/period management
- Gap analysis
- Event clustering
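
Date normalization and overlap-based conflict detection, both listed above, can be sketched as follows. The accepted formats, the function names, and the sample events are assumptions chosen for illustration; the Timeline shard's actual parsing rules are not described in this README.

```python
from datetime import date, datetime
from itertools import combinations

# Formats tried in order; an assumed, deliberately small list.
# Note that "%d/%m/%Y" treats 05/03/2021 as 5 March, not May 3.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def normalize_date(text: str) -> date:
    """Parse a date string in any of DATE_FORMATS into a `date`."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def overlapping(events: dict[str, tuple[date, date]]) -> list[tuple[str, str]]:
    """Return pairs of event names whose inclusive date ranges overlap."""
    return [
        (a, b)
        for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(events.items(), 2)
        if a_start <= b_end and b_start <= a_end
    ]


# Made-up events, written in mixed formats and normalized on the way in.
events = {
    "meeting": (normalize_date("2021-03-01"), normalize_date("05/03/2021")),
    "trip": (normalize_date("20210304"), normalize_date("2021-03-10")),
    "audit": (normalize_date("2021-04-01"), normalize_date("2021-04-02")),
}
conflicts = overlapping(events)
```

The pairwise comparison is quadratic in the number of events, which is fine for a sketch; a sort-and-sweep pass would scale better on large timelines.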
Document Processing Pipeline
| Stage | Capabilities |
|-------|-------------|
| Ingest | Multi-format support (PDF, DOCX, images, HTML, TXT), batch processing, duplicate detection, job queue management |
| OCR | PaddleOCR for standard OCR, Vision LLM for complex documents (supports local Qwen-VL or cloud APIs like GPT-4o), language detection, confidence scoring |
| Parse | 8 chunking strategies, metadata extraction, relations extraction, table detection |
| Embed | Multiple embedding models, batch processing, incremental updates |
| Entity Extraction | spaCy-powered NER (PERSON, ORG, GPE, DATE, etc.), relationship detection, duplicate merging |
| Claim Extraction | Factual claim identification, source attribution, verification status tracking |
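
The README does not name the eight chunking strategies used in the Parse stage. As one common example of what such a strategy looks like, here is a hedged sketch of fixed-size chunking with overlap; the function name and default sizes are hypothetical, and nothing here is taken from the actual pipeline.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters.

    The final chunk may be shorter than `size`. Overlap keeps sentences that
    straddle a boundary visible in both neighbouring chunks.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = chunk_fixed("abcdefghij", size=4, overlap=2)
```

Character windows are the simplest variant; sentence-aware or token-based splitting follows the same sliding-window idea with a different unit.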
Search Capabilities
| Type | Description |
|------|-------------|
| Semantic Search | Vector similarity using pgvector embeddings |
| Keyword Search | PostgreSQL full-text search with BM25 ranking |
| Hybrid Search | Combined semantic + keyword with configurable weights |
| Similarity Search | Find documents similar to a reference document |
| Faceted Search | Filter by project, document type, date range, entities |
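
One plausible way to combine semantic and keyword results with a configurable weight is a weighted sum of normalized scores. The sketch below assumes min-max normalization and a single `semantic_weight` parameter; both are illustrative choices, and the Search shard may blend scores differently (for example with rank fusion).

```python
def _min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def hybrid_rank(
    semantic: dict[str, float],
    keyword: dict[str, float],
    semantic_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend two score maps and return (doc_id, score) pairs, best first."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    sem, key = _min_max(semantic), _min_max(keyword)
    combined = {
        doc: semantic_weight * sem.get(doc, 0.0)
        + (1.0 - semantic_weight) * key.get(doc, 0.0)
        for doc in set(sem) | set(key)
    }
    # Sort by descending score, then by doc id for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: cosine-like similarities and keyword relevance scores.
semantic_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"b": 12.0, "c": 4.0}
ranking = hybrid_rank(semantic_scores, keyword_scores, semantic_weight=0.5)
```

Normalizing first matters because vector similarities and keyword relevance scores live on different scales; without it, the weight would not mean what it appears to mean.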
Export & Reporting
| Feature | Formats |
|---------|---------|
| Data Export | JSON, CSV, PDF, DOCX |
| Analytical Reports | Investigation summaries, entity profiles, timeline reports, ACH reports |
| Letters | FOIA requests, complaints, legal correspondence with templates |
| Packets | Complete investigation bundles with versioning and sharing |
| Templates | Jinja2-based template system with placeholder validation |
Frame Services
The Frame provides 17 core services available to all shards:
| Service | Description |
|---------|-------------|
| ConfigService | Environment + YAML configuration management |
| ResourceService | Hardware detection, GPU/CPU management, tier assignment |
| StorageService | File/blob storage with categories and lifecycle |
| DatabaseService | PostgreSQL with per-shard schema isolation |
| VectorService | pgvector-based vector storage for em
