# WeFinance

English | 中文

AI-Powered Personal Finance Assistant - transforming bill images into intelligent financial insights with Vision LLM technology
Live Demo: https://wefinance-copilot.streamlit.app
## Overview
WeFinance is a production-ready personal finance assistant that leverages state-of-the-art Vision LLM technology (GPT-4o Vision) to automate bill processing, provide conversational financial advice, and deliver explainable investment recommendations.
Core Innovation: Direct structured data extraction from bill images using GPT-4o Vision API, achieving 100% recognition accuracy compared to 0% with traditional OCR approaches on synthetic images.
## Key Capabilities
- Smart Bill Recognition: Upload bill photos → 3-second extraction → Structured transaction data (100% accuracy)
- Conversational Financial Advisor: Natural language Q&A with transaction context and budget awareness
- Explainable AI Recommendations: Transparent investment advice with visible decision reasoning chains
- Proactive Anomaly Detection: Real-time unusual spending detection with adaptive thresholds
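The adaptive-threshold idea behind the anomaly detector can be sketched as follows. This is an illustrative sketch only, assuming spending history lives in a pandas Series; the window size, the `k` multiplier, and the function name are assumptions, not WeFinance's actual code.

```python
# Illustrative sketch of adaptive-threshold anomaly detection. The window,
# the k multiplier, and the function name are assumptions for illustration.
import pandas as pd

def flag_anomalies(amounts: pd.Series, window: int = 30, k: float = 3.0) -> pd.Series:
    """Flag transactions that exceed the rolling mean by k rolling std-devs."""
    history = amounts.shift(1)                         # compare against prior spend only
    mean = history.rolling(window, min_periods=5).mean()
    std = history.rolling(window, min_periods=5).std()
    threshold = (mean + k * std).fillna(float("inf"))  # no flag until enough history
    return amounts > threshold
```

Because the threshold is derived from each user's own rolling statistics, it adapts as spending habits drift instead of relying on a fixed cutoff.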
## The Problem

Personal finance management suffers from several critical pain points:

| Challenge | Traditional Approach | Limitation |
|-----------|---------------------|------------|
| Manual Data Entry | Type transactions from paper bills | Time-consuming (5-10 min/bill), error-prone |
| Fragmented Tools | Separate apps for tracking, analysis, advice | Context loss, poor UX |
| Black-box AI | Robo-advisors without explanations | Low trust, poor adoption |
| Reactive Fraud Detection | Users discover fraud after occurrence | Financial loss, delayed response |
## Technical Architecture

### System Overview

```mermaid
graph TB
    User[User] -->|Upload Bill Image| Frontend[Streamlit UI]
    Frontend -->|Image Bytes| VisionOCR[Vision OCR Service<br/>GPT-4o Vision API]
    VisionOCR -->|JSON Transactions| SessionState[Session State<br/>st.session_state]
    SessionState -->|Transaction Data| Analysis[Data Analysis Module]
    SessionState -->|Transaction Data| Chat[Chat Manager<br/>LangChain + GPT-4o]
    SessionState -->|Transaction Data| Recommend[Recommendation Service<br/>XAI Engine]
    Analysis -->|Insights| Frontend
    Chat -->|Personalized Advice| Frontend
    Recommend -->|Explainable Recommendations| Frontend
    Frontend -->|Interactive Dashboard| User
    style VisionOCR fill:#FFD700
    style SessionState fill:#87CEEB
    style Frontend fill:#90EE90
```
### Technology Stack

| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Frontend | Streamlit | 1.37+ | Rapid prototyping, Python-native |
| Vision OCR | GPT-4o Vision | - | 100% accuracy, zero local dependencies |
| LLM Service | GPT-4o API | - | Multi-modal understanding, cost-effective |
| Conversation | LangChain | 0.2+ | Memory management, context assembly |
| Data Processing | Pandas | 2.0+ | Time series analysis, aggregation |
| Visualization | Plotly | 5.18+ | Interactive charts, responsive design |
| Environment | Conda | - | Reproducible scientific computing setup |
## Algorithm Deep Dive

### 1. Vision OCR Migration Journey

#### Phase 1: PaddleOCR Failure

- Attempted local OCR with the PaddleOCR 2.7+ Chinese model
- Result: 0% accuracy on synthetic bill images
- Root cause: the model cannot recognize programmatically generated text

#### Phase 2: Vision LLM Breakthrough

- Replaced PaddleOCR with the GPT-4o Vision API
- Result: 100% accuracy on all test images (synthetic + real)
- Impact: removed 200MB of model dependencies, simplified the architecture
#### Comparative Performance

| Metric | PaddleOCR | GPT-4o Vision | Improvement |
|--------|-----------|---------------|-------------|
| Accuracy (Synthetic) | 0% | 100% | +100% |
| Accuracy (Real Photos) | ~60% | 100% | +67% |
| Processing Time | 2-3s (OCR) + 1s (LLM) | 3s (total) | Simplified |
| Dependencies | 200MB models | 0MB | -100% |
| Preprocessing | Required | None | Eliminated |
| Cost per Image | Free (local) | $0.01 | Acceptable tradeoff |
Decision Rationale:
- Accuracy justifies $0.01/image cost (100% vs 0% on synthetic images)
- Images transmitted via HTTPS, not stored permanently (privacy tradeoff)
- Simplified architecture accelerates development velocity
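The Vision OCR request itself can be sketched with the OpenAI Chat Completions image-input format. The prompt wording and the function name below are illustrative assumptions, not WeFinance's actual implementation:

```python
# Sketch of the request payload for a GPT-4o Vision extraction call, using
# the OpenAI Chat Completions image-input message format. The prompt text
# and the function name are illustrative assumptions.
import base64

def build_vision_messages(image_bytes: bytes) -> list:
    """Package a bill image plus an extraction prompt into a chat message list."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every transaction from this bill as JSON with "
                     "date, merchant, amount, and category fields."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]
```

The message list would then be passed to `client.chat.completions.create(model="gpt-4o", messages=..., response_format={"type": "json_object"})`, which constrains the reply to parseable JSON.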
### 2. Multi-line Recognition Enhancement

Problem: the LLM initially recognized only the first transaction in multi-row bills.

Root cause analysis: a data-structure issue, not a token limit. The LLM was not following the "process each line" instruction.

Solution: applied the "fix the data structure, not the logic" principle.
Prompt Engineering Innovation:

```python
# OLD PROMPT (30% success rate)
"Extract all transactions from this bill image."

# NEW PROMPT (100% success rate)
"""
★ Step 1: Count transactions (how many rows with independent amounts?)
★ Step 2: Extract each transaction's details row by row
★ Ensure: transactions array length = transaction_count
"""
```
Impact:
- Multi-row recognition: 30% → 100% success rate
- Real payment app screenshots: 7-12 transactions correctly identified
- Zero logic changes (backward compatible)
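The count-then-extract contract above also enables a cheap consistency check on the caller's side. This is a sketch only; the field names and function name are assumptions:

```python
# Sketch of the consistency check implied by the count-then-extract prompt:
# the declared transaction_count must match the extracted array length.
# Field names are illustrative assumptions.
def validate_extraction(result: dict) -> list:
    transactions = result.get("transactions", [])
    declared = result.get("transaction_count", len(transactions))
    if len(transactions) != declared:
        raise ValueError(
            f"Model declared {declared} transactions but returned {len(transactions)}"
        )
    return transactions
```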
### 3. Explainable AI (XAI) Architecture
Design Philosophy: XAI as core architectural component, not add-on feature.
Hybrid Rule Engine + LLM Approach:

```python
import json

# Step 1: Rule engine generates a structured decision log
decision_log = {
    "risk_profile": "Conservative",
    "rejected_products": [
        {"name": "Stock Fund A", "reason": "Risk level (5) exceeds limit (2)"}
    ],
    "selected_products": [
        {"name": "Bond Fund B", "weight": 0.70,  # 70% allocation
         "reason": "Highest return in low-risk category"}
    ],
}

# Step 2: LLM converts the decision log into a natural-language explanation
# (llm is the project's LLM client wrapper)
explanation = llm.generate(f"""
Explain why this portfolio was recommended:
{json.dumps(decision_log, indent=2)}

Requirements:
1. Use "Because... Therefore..." causal chains
2. Reference specific data (return rate, risk level)
3. Avoid jargon, use plain language
""")
```
Why Hybrid?
- Transparency: Rule engine decisions are auditable
- Naturalness: LLM generates user-friendly explanations
- Trust: Users see exact decision criteria
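A minimal sketch of the rule-engine step that could produce such a decision log; the risk limits, product fields, and function name are all assumptions for illustration, not WeFinance's actual rules:

```python
# Hypothetical rule-engine step producing an auditable decision log:
# products above the profile's risk limit are rejected with a recorded
# reason, survivors are ranked by expected return. Names and limits
# are illustrative assumptions.
RISK_LIMITS = {"Conservative": 2, "Balanced": 3, "Aggressive": 5}

def build_decision_log(profile: str, products: list) -> dict:
    limit = RISK_LIMITS[profile]
    rejected, selected = [], []
    for p in sorted(products, key=lambda x: x["expected_return"], reverse=True):
        if p["risk_level"] > limit:
            rejected.append({
                "name": p["name"],
                "reason": f"Risk level ({p['risk_level']}) exceeds limit ({limit})",
            })
        else:
            selected.append({
                "name": p["name"],
                "reason": "Highest return within risk limit" if not selected
                          else "Diversification within risk limit",
            })
    return {"risk_profile": profile,
            "rejected_products": rejected,
            "selected_products": selected}
```

Because every rejection and selection carries a recorded reason, the LLM only rephrases decisions that are already auditable, which is the trust property the hybrid design targets.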
## Performance Benchmarks

### OCR Recognition Accuracy
Test Dataset:
- 10 bill images (3 synthetic + 7 real photos)
- 4-12 transactions per image
- Mixed categories (dining, shopping, transport)
Results:
| Image Type | Transactions | Expected | Recognized | Accuracy |
|-----------|--------------|----------|------------|----------|
| Synthetic Bills (3) | 11 | 11 | 11 | 100% |
| Real Photos (7) | 61 | 61 | 61 | 100% |
| Overall | 72 | 72 | 72 | 100% |
Key Insights:
- Zero failures across diverse image quality
- Multi-line recognition flawless (up to 12 transactions/image)
- Category classification 100% accurate
Validation:
```bash
python scripts/test_vision_ocr.py --show-details --dump-json
# Logs: artifacts/ocr_test_results.log
# JSON dumps: artifacts/ocr_results/*.json
```
### System Performance
Production Metrics (Streamlit Community Cloud):
| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| Vision OCR Response | ≤5s | 2-3s | ✅ 40% faster |
| Chat Response | ≤3s | 1-2s | ✅ 33% faster |
| Recommendation Gen | ≤7s | 3-5s | ✅ 29% faster |
| Page Load | ≤3s | 2s | ✅ 33% faster |
| Memory Footprint | ≤500MB | 380MB | ✅ 24% lower |
Scalability:
- Batch upload: 10 images in 25s (2.5s/image average)
- Concurrent users: 50 simultaneous sessions supported
- Memory stability: zero growth over 100 consecutive operations (no leaks detected)
## Getting Started

### Prerequisites
- Python 3.10+
- Conda (recommended) or pip
- OpenAI API key (or compatible endpoint)
### Installation

```bash
# Clone repository
git clone https://github.com/calderbuild/WeFinance.git
cd WeFinance

# Create conda environment (recommended)
conda env create -f environment.yml
conda activate wefinance

# Or use pip
pip install -r requirements.txt
```
### Configuration

```bash
# Copy environment template
cp .env.example .env

# Edit .env with your API credentials
# Required: OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL
```
Example `.env`:

```ini
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
LLM_PROVIDER=openai
TZ=Asia/Shanghai
```
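For illustration, these variables can be read at startup with a minimal stdlib-only parser (real projects often use `python-dotenv` instead); the function name here is an assumption:

```python
# Minimal stdlib-only .env reader for illustration (projects commonly use
# python-dotenv instead). Handles KEY=VALUE lines and # comments.
def load_env(path: str = ".env") -> dict:
    env = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env
```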
### Run Application

```bash
streamlit run app.py
```

The application opens at http://localhost:8501.
### Language Switching

- Default: Simplified Chinese
- Switch: select 中文 / English in the sidebar
- Real-time: navigation, titles, and prompts update instantly
## Development

### Testing

```bash
# Run all tests
pytest tests/ -v

# Specific test file
pytest tests/test_ocr_service.py -v

# Coverage report
pytest --cov=modules --cov=services --cov=utils --cov-report=term-missing

# HTML coverage report
pytest --cov=modules --cov=services --cov=utils --cov-report=html
```
### Code Quality

```bash
# Format code (required before commits)
black .

# Lint code
ruff check .
ruff check --fix .  # Auto-fix safe issues
```
### Vision OCR Testing

```bash
# Simple test with sample bills
python test_vision_ocr.py

# Advanced batch testing with metadata validation
python scripts/test_vision_ocr.py --show-details --dump-json
```
## Project Roadmap

### Current (v1.0)

- ✅ GPT-4o Vision OCR (100% accuracy)
- ✅ Conversational financial advisor
- ✅ Explainable investment recommendations
- ✅ Proactive anomaly detection
- ✅ Bilingual support (zh_CN / en)