FoundationModelsOCR

iOS demo app using Apple’s FoundationModels to extract data from scanned invoices. Combines the Vision framework for text recognition with LLM-powered field extraction. Runs fully on-device. Ideal for expense tracking, finance apps, or smart document parsing.

Install / Use

/learn @AviTsadok/FoundationModelsOCR

🧾 Invoice Extraction Demo with Vision & Foundation Models

This is a lightweight iOS demo that shows how to extract structured data from an invoice image using the power of:

  • 🔍 Apple's Vision framework for text recognition
  • 🧠 Foundation Models for parsing structured data using on-device LLMs
  • ✅ Safe and strongly typed output using @Generable and @Guide

📸 What It Does

This app demonstrates the end-to-end pipeline:

  1. You provide or capture an image of an invoice
  2. The app uses Vision to extract the printed text from the image
  3. That raw text is sent into Apple’s on-device Foundation Model
  4. The model returns structured data representing the invoice, using Swift types
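
Step 2 of the pipeline can be sketched with Vision's `VNRecognizeTextRequest`. This is a minimal illustration, not code from the demo app; the helper name `recognizeText` is hypothetical:

```swift
import Vision
import UIKit

// Hypothetical helper: runs Vision's text recognizer on an invoice image
// and returns the recognized lines joined into a single string.
func recognizeText(in image: UIImage) throws -> String {
    guard let cgImage = image.cgImage else { return "" }

    // Use the accurate (slower, higher-quality) recognition path,
    // which suits printed invoice text.
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Take the top candidate string for each detected text region.
    let observations = request.results ?? []
    return observations
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: "\n")
}
```

The joined string is what gets handed to the Foundation Model in step 3.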

📦 Output Model

The structured output is defined using the @Generable macro and @Guide descriptions to guide the LLM:

import Foundation
import FoundationModels

@Generable
struct InvoiceItem {
    var name: String
    var price: Decimal
    var quantity: Int
}

@Generable
struct MyInvoice {
    @Guide(description: "The name of the vendor")
    var vendor: String

    @Guide(description: "List of the invoice items")
    var items: [InvoiceItem]

    @Guide(description: "The total invoice amount")
    var totalAmount: Decimal

    var toString: String {
        "Vendor: \(vendor)\n" +
        "Items:\n" +
        items.map(\.name).joined(separator: "\n") +
        "\n------\n" +
        "Total: \(totalAmount)"
    }
}
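
Given these types, steps 3–4 reduce to a single guided-generation call against an on-device session. A minimal sketch, assuming the `MyInvoice` type above; the function name `extractInvoice` and the prompt wording are illustrative:

```swift
import FoundationModels

// Hypothetical helper: sends the recognized invoice text to the
// on-device Foundation Model and asks for a structured MyInvoice.
func extractInvoice(from recognizedText: String) async throws -> MyInvoice {
    let session = LanguageModelSession()

    // `generating:` constrains the model's output to the @Generable
    // type, so no manual JSON parsing is needed.
    let response = try await session.respond(
        to: "Extract the invoice fields from this text:\n\(recognizedText)",
        generating: MyInvoice.self
    )
    return response.content
}
```

Because the output is constrained by `@Generable`, the result is a strongly typed `MyInvoice` rather than free-form text, which is what makes the pipeline safe to feed into downstream finance logic.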
