Discogsography
🎶 Using the Discogs database export for local graph exploration. 🎶
🎵 Discogsography
<div align="center">A modern Python 3.13+ microservices platform for transforming the complete Discogs music database into powerful, queryable knowledge graphs and analytics engines.
🚀 Quick Start | 📖 Documentation | 🎯 Features | 💬 Community
</div>
🎯 What is Discogsography?
Discogsography transforms monthly Discogs data dumps (~11.3GB compressed XML) into:
- 🔗 Neo4j Graph Database: Navigate complex music industry relationships
- 🐘 PostgreSQL Database: High-performance queries and full-text search
- 🔍 Interactive Explorer: Graph visualisation, trends, and path discovery
- 📊 Real-time Dashboard: Monitor system health and processing metrics
- 🎵 MusicBrainz Enrichment: Cross-reference with MusicBrainz for metadata, relationships, and external links
Perfect for music researchers, data scientists, developers, and music enthusiasts who want to explore the world's largest music database.
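The Extractor itself is written in Rust (tokio, quick-xml), so the following is only a Python sketch of the streaming idea behind it, not its implementation. It assumes each dump is a gzipped XML document whose root element holds one child per record (for example `<artist>` elements under `<artists>`). The helper name `iter_records` is hypothetical. Parsing incrementally and clearing finished elements keeps memory use bounded however large the file is, which matters for multi-gigabyte dumps.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections.abc import Iterator
from typing import BinaryIO


# Hypothetical helper for illustration; the real Extractor is a Rust service.
def iter_records(stream: BinaryIO, tag: str) -> Iterator[dict[str, str]]:
    """Yield one flat dict per <tag> element, streaming instead of loading the whole file."""
    events = ET.iterparse(stream, events=("start", "end"))
    _, root = next(events)  # the first event is the root element's "start"
    for event, elem in events:
        if event != "end" or elem.tag != tag:
            continue
        # Attributes such as id="..." become keys, as do text-bearing children like <name>.
        record = dict(elem.attrib)
        for child in elem:
            if child.text and child.text.strip():
                record[child.tag] = child.text.strip()
        yield record
        root.clear()  # discard processed records so memory use stays bounded


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real dump file.
    sample = b"<artists><artist><id>1</id><name>Example A</name></artist></artists>"
    with gzip.open(io.BytesIO(gzip.compress(sample)), "rb") as fh:
        for record in iter_records(fh, "artist"):
            print(record)
```

To read a downloaded dump, pass the path of the `.xml.gz` file to `gzip.open` instead of the in-memory sample.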
🏛️ Architecture Overview
⚙️ Core Services
| Service | Purpose | Key Technologies |
| ------------------------------------------------------------- | ------------------------------------------------ | ------------------------------------------------------------ |
| 🔐 API | User accounts, JWT auth, and collection sync | FastAPI, psycopg3, redis, Discogs OAuth 1.0 |
| 📊 Dashboard | Real-time monitoring and admin panel | FastAPI, WebSocket, reactive UI |
| 🔍 Explore | Serves graph exploration frontend (static files) | FastAPI, Tailwind CSS, Alpine.js, D3.js, Plotly.js |
| ⚡ Extractor | High-performance Rust-based extractor | tokio, quick-xml, lapin |
| 🔗 Graphinator | Builds Neo4j knowledge graphs | neo4j-driver, graph algorithms |
| 🔧 Schema-Init | One-shot database schema initializer | neo4j-driver, psycopg3 |
| 🐘 Tableinator | Creates PostgreSQL analytics tables | psycopg3, JSONB, full-text search |
| 📈 Insights | Precomputed analytics and music trends | FastAPI, psycopg3, httpx |
| 🤖 MCP Server | Exposes knowledge graph to AI assistants | FastMCP, httpx |
🎵 MusicBrainz Enrichment Services
| Service | Purpose | Key Technologies |
| -------------------------------------------------------------------- | ---------------------------------------------------------- | ----------------------------------- |
| 🧠 Brainzgraphinator | Enriches Neo4j graph with MusicBrainz metadata and relationships | neo4j-driver, pika |
| 🧬 Brainztableinator | Populates PostgreSQL with MusicBrainz data and external links | psycopg3, pika |
📐 System Architecture
```mermaid
graph TD
    S3[("🌐 Discogs S3<br/>Data Dumps")]
    MB[("🎵 MusicBrainz<br/>JSONL Dumps")]

    subgraph Pipeline ["Data Pipeline"]
        EXT[["⚡ Extractor"]]
        RMQ{{"🐰 RabbitMQ"}}
        GRAPH[["🔗 Graphinator"]]
        TABLE[["🐘 Tableinator"]]
    end

    subgraph MBPipeline ["MusicBrainz Enrichment"]
        BGRAPH[["🧠 Brainzgraphinator"]]
        BTABLE[["🧬 Brainztableinator"]]
    end

    subgraph Storage ["Storage"]
        NEO4J[("🔗 Neo4j")]
        PG[("🐘 PostgreSQL")]
        REDIS[("🔴 Redis")]
    end

    subgraph Services ["User-Facing Services"]
        API[["🔐 API"]]
        EXPLORE[["🔍 Explore"]]
        DASH[["📊 Dashboard"]]
        INSIGHTS[["📈 Insights"]]
    end

    S3 --> EXT --> RMQ
    MB --> EXT
    RMQ --> GRAPH --> NEO4J
    RMQ --> TABLE --> PG
    RMQ --> BGRAPH --> NEO4J
    RMQ --> BTABLE --> PG

    API --- NEO4J & PG & REDIS
    EXPLORE --- API
    INSIGHTS --- PG & REDIS
    DASH -.- RMQ & NEO4J & PG

    style S3 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style MB fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style EXT fill:#ffccbc,stroke:#d84315,stroke-width:2px
    style RMQ fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style NEO4J fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style PG fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style REDIS fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    style GRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
    style TABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style BGRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
    style BTABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style API fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
    style EXPLORE fill:#e8eaf6,stroke:#283593,stroke-width:2px
    style DASH fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style INSIGHTS fill:#fff9c4,stroke:#f57f17,stroke-width:2px
```
See the Architecture Overview for detailed diagrams covering the data pipeline, service communication, and message queue structure.
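The diagram shows one producer publishing into RabbitMQ and several independent consumers each building their own store. A minimal in-memory sketch of that fan-out pattern follows, assuming every bound consumer should receive its own copy of every record. `FanoutExchange` and `drain` are hypothetical stand-ins for a real RabbitMQ exchange and consumer; the project's actual exchange and queue layout is documented in the Architecture Overview and may differ.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class FanoutExchange:
    """Hypothetical in-memory stand-in for a RabbitMQ fanout exchange (illustration only)."""

    queues: dict[str, queue.Queue[dict]] = field(default_factory=dict)

    def bind(self, consumer: str) -> queue.Queue[dict]:
        """Create a dedicated queue for one consumer service."""
        q: queue.Queue[dict] = queue.Queue()
        self.queues[consumer] = q
        return q

    def publish(self, message: dict) -> None:
        """Deliver a separate copy of the message to every bound queue."""
        for q in self.queues.values():
            q.put(dict(message))


def drain(q: queue.Queue[dict]) -> list[dict]:
    """Return everything currently waiting in a queue without blocking."""
    items: list[dict] = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


if __name__ == "__main__":
    exchange = FanoutExchange()
    graph_queue = exchange.bind("graphinator")
    table_queue = exchange.bind("tableinator")
    exchange.publish({"data_type": "artists", "id": "1"})
    print(drain(graph_queue), drain(table_queue))
```

Because each consumer owns its queue, a slow consumer does not block the others; its messages simply wait in its own queue.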
🌟 Key Features
- ⚡ High-Speed Processing: ~130–480 records/second end-to-end throughput per data type with the Rust-based extractor
- 🔄 Smart Deduplication: SHA-256 hash-based change detection avoids reprocessing unchanged records
- 📈 Handles Big Data: Processes 19M+ releases and 10M+ artists across ~11.3GB of compressed XML
- 🔁 Auto-Recovery: Automatic retries with exponential backoff and dead letter queues
- 🐋 Container Security: Non-root users, read-only filesystems, dropped capabilities
- 📝 Type Safety: Full type hints with strict mypy validation and Bandit security scanning
- ✅ Comprehensive Testing: Unit, integration, and E2E tests with Playwright
- 🚀 Query Performance: 249x overall query performance improvement across 88 endpoints (PRs #175–#184), plus configurable data quality rules for extraction validation (#187); see Recent Improvements
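The README does not specify what is hashed or where digests are kept, so the sketch below assumes a SHA-256 digest over each record's canonical JSON form, compared against the last digest stored for the same record id. `record_hash` and `ChangeDetector` are hypothetical names, and the in-memory dict stands in for whatever persistent store the services actually use.

```python
import hashlib
import json
from typing import Any


# Hypothetical helpers illustrating hash-based change detection.
def record_hash(record: dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order cannot change the digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Tracks the last digest per record id (hypothetical in-memory store)."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    def has_changed(self, record_id: str, record: dict[str, Any]) -> bool:
        """Return True (and remember the new digest) only if the content differs."""
        digest = record_hash(record)
        if self._digests.get(record_id) == digest:
            return False  # identical to last time: skip reprocessing
        self._digests[record_id] = digest
        return True


if __name__ == "__main__":
    detector = ChangeDetector()
    print(detector.has_changed("1", {"id": "1", "name": "Example A"}))  # True: first sighting
    print(detector.has_changed("1", {"name": "Example A", "id": "1"}))  # False: same content
```

Sorting keys before hashing makes the digest depend only on the record's content, not on dictionary insertion order.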
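A minimal sketch of retrying with exponential backoff, as listed under Auto-Recovery. The attempt limit, base delay, cap, and the use of random "full jitter" are illustrative assumptions rather than the project's settings, and `retry_with_backoff` is a hypothetical helper. When the final attempt fails, the exception propagates; in a RabbitMQ consumer, that is the point at which the message would be rejected without requeueing so that a configured dead letter exchange can capture it.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


# Hypothetical helper; parameters are illustrative, not the project's configuration.
def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on any exception with capped, jittered exponential delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up: caller can reject the message to a dead letter queue
            # Ceiling doubles per failure (0.5 s, 1 s, 2 s, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads out retries.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0.0, ceiling))
```

Injecting `sleep` keeps the function easy to test without real waiting.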
🚀 Quick Start
```bash
# Clone and start all services
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
docker-compose up -d

# Access the dashboard (open is macOS; on Linux use xdg-open)
open http://localhost:8003
```
| Service           | URL                    | Default Credentials (user / password) |
| ----------------- | ---------------------- | ----------------------------------- |
| 🔐 API | http://localhost:8004 | Register via /api/auth/register |
| 📊 Dashboard | http://localhost:8003 | None |
| 🔗 Neo4j | http://localhost:7474 | neo4j / discogsography |
| 🐘 PostgreSQL | localhost:5433 | discogsography / discogsography |
| 🐰 RabbitMQ | http://localhost:15672 | discogsography / discogsography |
See the Quick Start Guide for prerequisites, local development setup, and environment configuration.
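To confirm the containers are listening on the ports in the service table, a standard-library check such as the following can help. It is a hypothetical helper, not part of the repository, and it only tests that a TCP connection succeeds; it does not verify credentials or service health. The `tcp://` scheme for PostgreSQL is just a convenient way to reuse URL parsing.

```python
import socket
from urllib.parse import urlsplit

# Endpoints from the Quick Start table; adjust if you changed the port mappings.
SERVICES = {
    "API": "http://localhost:8004",
    "Dashboard": "http://localhost:8003",
    "Neo4j": "http://localhost:7474",
    "PostgreSQL": "tcp://localhost:5433",
    "RabbitMQ": "http://localhost:15672",
}


# Hypothetical helper: checks TCP reachability only, not application health.
def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection at the URL's host:port."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port
    if port is None:
        raise ValueError(f"no port in {url!r}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_listening(url) else "down"
        print(f"{name:<12} {url:<26} {status}")
```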
