Theblop
The Big List of Protests - An AI-assisted Protest Flyer parser and event aggregator
Install / Use
/learn @sayhiben/TheblopREADME
Event Flyer Parser
Note: This is mostly a README I've generated by guiding AI to examine the project. My intent is to create enough docs here to be able to redeploy from scratch if I don't come back to this in a while.
I doubt anyone other than myself will actually use this. It's intended to be "good enough to get started" - please ping me in some manner if you actually intend to deploy this, and I will help you get it running.
Good luck. Godspeed.
- Ben, Feb. 2025
Project Overview
Event Flyer Parser is an automation tool that extracts structured event details from images (flyers) and logs them into Google Sheets. It monitors a Gmail inbox for incoming emails with event flyer attachments, uses an AI model to parse each flyer’s content, and outputs key information like event title, date, time, location, etc., as a JSON entry in a spreadsheet (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). The project integrates Google Apps Script (for email handling and Google Sheets interaction) with a serverless GPU service on Runpod (for AI-powered image OCR and text extraction). This allows for end-to-end processing: from receiving an email to having a new row in a Google Sheet with all the event details. By automating flyer data entry, the project saves time and reduces errors in compiling event information.
Architecture & Workflow
Components:
- Gmail & Google Apps Script: The solution uses a Google Apps Script to automatically retrieve emails from Gmail. The script (
googleAppsScript.gs) runs within Google’s environment and has access to Gmail, Google Drive, and Google Sheets APIs. It searches for unread emails in a specified Gmail inbox and processes those containing image attachments (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). - Google Drive: A Google Drive folder is used to store flyer images from the emails. When an email with image attachments is processed, each image is saved to the designated Drive folder (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). The Drive folder ID is stored in the script configuration so the script knows where to save and retrieve images.
- Google Sheets: Two Google Spreadsheets are used:
- Inbox Spreadsheet – holds a RawData sheet (for unprocessed email data) and a Jobs sheet (to track AI processing jobs) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub).
- Processed Spreadsheet – holds a Processed sheet where final extracted event details are stored after successful processing (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub).
- Runpod API (AI Service): The heavy lifting of image analysis is done by a Python application (
app.py) deployed on Runpod (a serverless GPU cloud). This application accepts requests via the Runpod API, downloads the images from Google Drive, runs a Transformer-based OCR/analysis model on them, and returns extracted text and event fields. The Runpod service provides an asynchronous endpoint URL for submitting jobs and checking their status. - AI Model & Prompt: The Python app uses a multimodal AI model (loaded via Hugging Face Transformers) capable of processing images with text. The model (by default
openbmb/MiniCPM-o-2_6) is configured with vision support to perform OCR and interpret the flyer content (event-flyer-parser/app.py at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/app.py at main · sayhiben/event-flyer-parser · GitHub). A custom system prompt (seeprompt.txt) instructs the model to output event details in a JSON format with specific fields (event-flyer-parser/prompt.txt at main · sayhiben/event-flyer-parser · GitHub). Additionally, few-shot examples (sample images and expected outputs inexamples/) are provided to guide the model (event-flyer-parser/app.py at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/app.py at main · sayhiben/event-flyer-parser · GitHub), improving accuracy.
Workflow:
-
Email Intake (Google Apps Script): A time-driven trigger invokes
processEmails()periodically (e.g., every 5 minutes). This function searches for any unread emails in the inbox (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). For each unread email, it generates a unique UUID for tracking (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub), then saves any image attachments to the specified Google Drive folder (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). The script collects the email’s date, subject, body text, any URLs found in the body, and the Drive URLs/IDs of saved images. It appends a new row to the RawData sheet with these details and a “processed” flag set to"false"(indicating this email’s flyer still needs processing) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). After logging the data, the email is marked as read and the email thread is moved to trash to prevent duplicate processing in the future (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). -
Job Creation (Google Apps Script -> Runpod): Another trigger invokes
launchRunPodJobs()(ideally shortly afterprocessEmails()runs). This function scans the RawData sheet for any entries withProcessed = false(event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). Each such row represents a flyer that has not been parsed yet. The script compiles all these pending entries into a submissions list (each submission has the UUID assubmissionIdand an array of image file IDs) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub) (event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub). If there are one or more submissions, the script calls the Runpod endpoint via HTTP POST, sending a JSON payload containing the submissions array ([event-flyer-parser/googleAppsScript.gs at main · sayhiben/event-flyer-parser · GitHub](https://githu
