SteamLensAI
Game analytics platform that converts Steam review data into actionable development insights, reducing feedback analysis time by 95% for game studios.
╔═══════════════════════════════════════════════════════════════════════════════════════════╗
║ ███████╗████████╗███████╗ █████╗ ███╗ ███╗ ██╗ ███████╗███╗ ██╗███████╗ █████╗ ██╗ ║
║ ██╔════╝╚══██╔══╝██╔════╝██╔══██╗████╗ ████║██║ ██╔════╝████╗ ██║██╔════╝██╔══██╗██║ ║
║ ███████╗ ██║ █████╗ ███████║██╔████╔██║██║ █████╗ ██╔██╗ ██║███████╗███████║██║ ║
║ ╚════██║ ██║ ██╔══╝ ██╔══██║██║╚██╔╝██║██║ ██╔══╝ ██║╚██╗██║╚════██║██╔══██║██║ ║
║ ███████║ ██║ ███████╗██║ ██║██║ ╚═╝ ██║███████╗███████╗██║ ╚████║███████║██║ ██║██║ ║
║ ╚══════╝ ╚═╝ ╚══════╝╚═╝ ╚═╝ ╚═╝ ╚═╝╚══════╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚═╝ ╚═╝╚═╝ ║
╚═══════════════════════════════════════════════════════════════════════════════════════════╝
The full dataset (~40 GB) is available on Kaggle: https://www.kaggle.com/datasets/rishikeshgharat/steam-games-data-40-gb
steamLensAI Architecture Documentation
This document explains the technical architecture, distributed processing design, and engineering decisions behind steamLensAI's high-performance Steam review analysis system.
Architecture Overview
steamLensAI is built as a distributed processing pipeline that leverages parallel computing and GPU acceleration to analyze large volumes of Steam reviews efficiently. The pipeline runs in two stages:

1. Topic assignment using seed values (theme-based categorization) with sentence-transformers: 1.2M reviews in 2 minutes, 30 seconds
2. Summarization (hierarchical, topic-based) of the data categorized in the previous stage: 1.2M reviews in 8 minutes

The core innovation lies in its distributed computing approach where multiple worker processes share a single GPU through intelligent model distribution, achieving maximum hardware utilization while maintaining processing efficiency.
┌──────────────────┐ ┌─────────────────────┐
│ Distributed │───▶│ Summarization │
│ Processing │ │ Pipeline │
│ (process_files) │ │ (summarization) │
└──────────────────┘ └─────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ Dask LocalCluster │
│ │
│ Multiple Workers → Single GPU Sharing │
│ • Model Distribution via publish_dataset │
│ • Coordinated GPU Memory Management │
│ • Parallel Processing with Shared Models │
└─────────────────────────────────────────────┘
Core Components
0. Models Used
Sentence Transformer Model
- Model: all-MiniLM-L6-v2
- Purpose: Converting review text to numerical embeddings for semantic similarity matching
- Task: Topic assignment and theme categorization
- Provider: Sentence Transformers library
Summarization Model
- Model: sshleifer/distilbart-cnn-12-6
- Purpose: Generating concise summaries of positive and negative reviews
- Task: Hierarchical text summarization with sentiment separation
- Provider: Hugging Face Transformers (DistilBART variant)
Both models support GPU acceleration and are distributed across multiple workers using Dask's publish_dataset() mechanism for efficient parallel processing.
1. Distributed Processing Engine (processing/process_files.py)
- Role: Multi-worker data processing coordination
- Key Technologies: Dask + LocalCluster + Sentence Transformers
- Distributed Computing Features:
- Creates a LocalCluster with multiple worker processes (lighter-weight than running a minikube/Kubernetes cluster)
- Distributes the transformer model (all-MiniLM-L6-v2) across workers using publish_dataset()
- Coordinates parallel processing of review chunks
- Manages shared GPU resources across workers
- Handles worker-to-worker communication and synchronization
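The fan-out/fan-in pattern described above can be sketched without Dask: split the combined reviews into chunks, hand each chunk to a worker, and collect the results. This is a minimal pure-Python stand-in (a thread pool replaces the Dask LocalCluster, and a trivial placeholder replaces the model); the function and variable names are illustrative, not the project's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(reviews, n_chunks):
    """Partition the review list into roughly equal chunks, one per worker."""
    size = max(1, -(-len(reviews) // n_chunks))  # ceiling division
    return [reviews[i:i + size] for i in range(0, len(reviews), size)]

def process_chunk(chunk):
    """Stand-in for worker-side processing (topic assignment on the GPU)."""
    return [review.lower() for review in chunk]  # placeholder transformation

def run_pipeline(reviews, n_workers=4):
    chunks = split_into_chunks(reviews, n_workers)
    # steamLensAI uses a Dask LocalCluster here; the thread pool merely
    # mimics the scatter/gather shape of the real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_chunk, chunks))
    return [item for chunk_result in results for item in chunk_result]
```

In the real engine, each worker additionally retrieves the published model and moves it to the GPU before processing its chunk.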
2. Topic Assignment Module (processing/topic_assignment.py)
- Role: ML-powered review categorization on distributed workers
- Key Technologies: Sentence Transformers + Semantic Similarity
- Worker-Level Responsibilities:
- Retrieves published models from worker dataset storage
- Converts review text to numerical embeddings using shared GPU
- Performs semantic similarity matching against game themes
- Processes data chunks independently across multiple workers
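At its core, topic assignment picks the theme whose embedding is closest (by cosine similarity) to the review's embedding. A minimal sketch with hand-rolled cosine similarity; in the real system the vectors come from all-MiniLM-L6-v2, while here the embeddings are toy placeholders:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def assign_topic(review_embedding, theme_embeddings):
    """Return the theme whose embedding is most similar to the review's."""
    return max(theme_embeddings,
               key=lambda t: cosine_similarity(review_embedding, theme_embeddings[t]))

# Toy 2-D embeddings standing in for 384-dimensional MiniLM vectors
themes = {"performance": [0.9, 0.1], "story": [0.1, 0.9]}
```

Because each worker holds its own model copy and its own chunk, this matching step needs no cross-worker communication.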
3. Summarization Pipeline (processing/summarization.py + summarize_processor.py)
- Role: Distributed text summarization across workers
- Key Technologies: Transformers (DistilBART) + Multi-Worker GPU Processing
- Distributed Features:
- Sets up dedicated Dask cluster for summarization tasks
- Distributes summarization models to all workers via dataset publishing
- Coordinates hierarchical summarization across worker processes
- Manages GPU memory sharing for multiple model instances
- Aggregates results from parallel summarization workers
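The hierarchical structure above boils down to: bucket reviews by theme and sentiment, then summarize each bucket. A minimal sketch with a stub summarizer (the real pipeline runs DistilBART at that point; the tuple layout and names are illustrative):

```python
from collections import defaultdict

def summarize(texts):
    """Stub summarizer: the real pipeline runs DistilBART here."""
    return " ".join(texts)[:60]  # placeholder: truncated concatenation

def hierarchical_summaries(reviews):
    """reviews: iterable of (theme, is_positive, text) tuples.

    Groups reviews by theme and sentiment, then summarizes each bucket,
    mirroring the positive/negative separation described above."""
    buckets = defaultdict(list)
    for theme, is_positive, text in reviews:
        sentiment = "positive" if is_positive else "negative"
        buckets[(theme, sentiment)].append(text)
    return {key: summarize(texts) for key, texts in buckets.items()}
```

In the distributed version, the buckets are spread across workers and each worker summarizes its share before results are gathered.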
Data Flow Architecture
Phase 1: File Processing & Validation
Uploaded Files → Temporary Storage → App ID Extraction → Theme Validation
│ │ │ │
▼ ▼ ▼ ▼
[file1.parquet] [/tmp/uuid/] [extract_appid()] [themes.json]
[file2.parquet] ... [12345, 67890] [lookup]
[file3.parquet] ... [✓/✗]
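The extract_appid() step can be sketched as a simple filename parse. The naming convention assumed here (app ID as the first run of digits, e.g. `12345_reviews.parquet`) is hypothetical; the project's actual convention may differ.

```python
import re

def extract_appid(filename):
    """Pull the Steam app ID out of an uploaded parquet filename.

    Assumes (hypothetically) the app ID is the first run of digits in
    the name; returns None when no ID is found, so the caller can mark
    the file as failing theme validation."""
    match = re.search(r"\d+", filename)
    return int(match.group()) if match else None
```

The extracted IDs are then looked up in themes.json to decide the ✓/✗ validation result shown above.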
Phase 2: Distributed Processing Architecture
┌─────────────────┐
│ Main Process │
│ │
│ 1. Load Files │──┐
│ 2. Combine Data │ │
│ 3. Create Chunks│ │
│ 4. Setup Dask │ │
└─────────────────┘ │
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ Dask LocalCluster │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Worker 1 │ │ Worker 2 │ │ Worker 3 │ │ Worker 4 │ │
│ │ │ │ │ │ │ │ │ │
│ │ • Get Model │ │ • Get Model │ │ • Get Model │ │ • Get Model │ │
│ │ • Process │ │ • Process │ │ • Process │ │ • Process │ │
│ │ Chunk A │ │ Chunk B │ │ Chunk C │ │ Chunk D │ │
│ │ • Save │ │ • Save │ │ • Save │ │ • Save │ │
│ │ Results │ │ Results │ │ Results │ │ Results │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
└──────────────────────────────────┼────────────────────────────────┘
│
▼
┌─────────────┐
│ RTX 4080 │
│ 16GB GPU │
│ │
│ • 4 Model │
│ Copies │
│ • Shared │
│ Compute │
└─────────────┘
Phase 3: Result Aggregation
Temporary Files → Aggregation → Final Report → CSV Export
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌───────────┐ ┌──────────┐
│ /tmp/ │ │ Combine │ │ Structured│ │ Download │
│ • pos_revs/ │→│ • Count │→│ DataFrame │→│ CSV File │
│ • neg_revs/ │ │ • Merge │ │ • Metrics │ │ │
│ • agg_data/ │ │ • Calculate │ │ • Reviews │ │ │
└─────────────┘ └─────────────┘ └───────────┘ └──────────┘
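The Combine/Count/Merge step can be sketched as a merge of per-worker partial results into one report row per (app_id, theme). The dictionary layout and the derived positive-ratio metric are illustrative, not the project's exact schema:

```python
def aggregate_partials(partials):
    """Merge per-worker partial counts into final report rows.

    Each partial maps (app_id, theme) -> {'pos': n, 'neg': m}; the merged
    rows then get a simple derived metric for the final DataFrame/CSV."""
    totals = {}
    for partial in partials:
        for key, counts in partial.items():
            row = totals.setdefault(key, {"pos": 0, "neg": 0})
            row["pos"] += counts["pos"]
            row["neg"] += counts["neg"]
    for row in totals.values():
        total = row["pos"] + row["neg"]
        row["positive_ratio"] = row["pos"] / total if total else 0.0
    return totals
```

In the real pipeline the partials live in the temporary pos_revs/, neg_revs/, and agg_data/ directories before this merge runs.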
Performance Optimizations
1. GPU Model Distribution Strategy
Challenge: Distribute ML models to multiple workers on single GPU
The Serialization Problem: Imagine you have a recipe book (ML model) and you try to tear out individual pages to give different pages to different chefs (workers). The problem is that recipes are interconnected - Chef A gets page 5 (which says "add the mixture from step 3"), but Chef B has page 3 (which explains what "the mixture" is). Neither chef can cook properly because they only have fragments of the complete recipe.
Technical Details:
- Dask.scatter() Fragmentation: Traditional scatter tries to break models into pieces and distribute fragments
- Model Interconnectedness: ML model layers reference each other, weights depend on other weights
- Incomplete Models Fail: Workers receiving fragmented models cannot perform inference
- Models Need Integrity: Each worker requires the complete, intact model to function
# This FAILS - Cannot serialize CUDA tensors
model = SentenceTransformer('model', device='cuda') # Model on GPU
client.scatter(model) # ERROR: Cannot serialize CUDA tensors!
Solution: CPU-Serialize-GPU pattern
# Step 1: "Translate" the recipe book to a common language (move to CPU)
model.to('cpu')                       # Move model to CPU memory
for param in model.parameters():
    param.data = param.data.cpu()     # Ensure ALL components are on CPU

# Step 2: "Photocopy and distribute" (serialize and send to workers)
client.publish_dataset(model, name='model')  # Successfully distribute CPU model

# Step 3: Each kitchen "translates back" (workers move to GPU)
# In each worker:
worker_model = get_dataset('model')   # Get CPU copy
worker_model.to('cuda')               # Move to shared GPU

# Result: 4 model instances on 1 GPU (efficient memory sharing)
Why This Works:
- CPU models are just arrays of numbers (easily serializable)
- Each worker gets its own independent copy of the model
- Workers can move their copies to GPU without conflicts
- Multiple model instances can coexist on the same GPU efficiently
2. Optimal Chunk Sizing
Earlier versions (pre commit 90) processed the whole dataset in one pass, which was inefficient; the data is now split into worker-sized chunks.
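A minimal sketch of that chunking: size chunks so every worker gets work, capped so a single chunk cannot exhaust GPU memory. The cap value and function names here are illustrative, not the project's exact formula.

```python
def choose_chunk_size(total_rows, n_workers, max_chunk=50_000):
    """Pick a chunk size: an even share per worker, capped at max_chunk
    so one chunk's embeddings can't exhaust GPU memory (cap is illustrative)."""
    per_worker = -(-total_rows // n_workers)  # ceiling division
    return min(per_worker, max_chunk)

def make_chunks(total_rows, chunk_size):
    """Return (start, stop) row ranges covering the whole dataset."""
    return [(start, min(start + chunk_size, total_rows))
            for start in range(0, total_rows, chunk_size)]
```

With 1.2M rows and 4 workers, the cap keeps chunks small enough that all four model instances fit alongside their batches on the 16 GB GPU.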
