Scrapitor
Proxy and capture JanitorAI traffic to OpenRouter with automatic logging. Features rule-driven parser for character sheet extraction, versioned exports, web dashboard, and one-click Cloudflare tunnel deployment.
Install / Use
/learn @daksh-7/ScrapitorREADME
A local proxy that intercepts JanitorAI traffic, captures request payloads as JSON logs, and provides a rule-driven parser to extract clean character sheets. Exports to SillyTavern-compatible JSON format.
Table of Contents
- Quick Start
- Architecture
- Installation
- Configuring JanitorAI
- Web Dashboard
- Parser System
- CLI Usage
- API Reference
- Configuration
- Docker
- Troubleshooting
- Development
- Notes
Quick Start
Quick Start (Windows)
- Download: https://github.com/daksh-7/scrapitor → Code → Download ZIP → Unzip
- Double-click
run.bat - Copy the Cloudflare Proxy URL from the terminal
- In JanitorAI: Enable "Using proxy" → paste the URL → add your OpenRouter API key
- Send a message — your request appears in the dashboard Activity tab
Requirements: Python 3.10+ and PowerShell 7. The launcher auto-installs everything else.
Quick Start (Linux/macOS)
- Clone and run:
git clone https://github.com/daksh-7/scrapitor && cd scrapitor && ./run.sh
- Copy the Cloudflare Proxy URL from the terminal
- In JanitorAI: Enable "Using proxy" → paste the URL → add your OpenRouter API key
Requirements: Python 3.10+, curl, and bash. The launcher auto-installs cloudflared and Python dependencies.
Quick Start (Termux/Android)
-
Install Termux from F-Droid (Play Store version is outdated)
-
Install dependencies:
pkg update && pkg upgrade -y && pkg install python git curl cloudflared -y
- Clone and run:
git clone https://github.com/daksh-7/scrapitor && cd scrapitor && ./run.sh
- In another Termux session, run
termux-wake-lockto prevent Android from killing the process - Copy the Cloudflare Proxy URL and use it in JanitorAI
Requirements: Termux with python, curl, git, and cloudflared packages. ARM64 device required.
Architecture
graph LR
%% --- NODES & DATA ---
J([JanitorAI<br/>Browser Client])
S[scrapitor<br/>Flask Proxy]
OR(OpenRouter<br/>API)
subgraph Data_Processing [Data Processing & UI]
direction TB
L[(JSON Log<br/>Files)]
P[[Parser<br/>Engine]]
D(Dashboard<br/>Svelte 5)
E[/Parsed TXT /<br/>SillyTavern Export/]
end
%% --- CONNECTIONS ---
%% Bi-directional traffic flow
J <==>|HTTP Request<br/>& Response| S
S <==>|Forward &<br/>Inference| OR
%% Internal Data flow
S -.->|Live State| D
S -- Capture<br/>Completion --> L
L -.->|Read| P
P -->|Generate| E
%% --- STYLING ---
classDef base fill:#fff,stroke:#333,stroke-width:1px,color:#333;
classDef client fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1;
classDef proxy fill:#e8eaf6,stroke:#3949ab,stroke-width:3px,color:#1a237e;
classDef external fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,stroke-dasharray: 5 5,color:#4a148c;
classDef storage fill:#e0f2f1,stroke:#00695c,stroke-width:2px,color:#004d40;
classDef ui fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#880e4f;
%% Apply Styles
class J client;
class S proxy;
class OR external;
class L,P,E storage;
class D ui;
%% Style Subgraph
style Data_Processing fill:#ffffff,stroke:#e0e0e0,stroke-width:2px,stroke-dasharray: 5 5,color:#9e9e9e
Data Flow:
- JanitorAI sends chat requests to the scrapitor proxy (via Cloudflare tunnel)
- scrapitor logs the full request payload as JSON, then forwards to OpenRouter
- The parser extracts character data using tag-aware rules
- Parsed content is saved as versioned
.txtfiles or exported to SillyTavern JSON
Installation
Windows (Recommended)
Prerequisites:
- Python 3.10+ (Download — check "Add python.exe to PATH")
- PowerShell 7:
winget install Microsoft.PowerShell
Option A: Download ZIP from GitHub → Code → Download ZIP → Unzip
Option B: Clone with Git:
git clone https://github.com/daksh-7/scrapitor && cd scrapitor
Then: Double-click run.bat
The launcher will:
- Create a virtual environment and install dependencies
- Build the Svelte frontend (if Node.js is available and sources changed)
- Start Flask on port 5000
- Establish a Cloudflare tunnel and display the public URL
- Show live status (press Q to quit)
███████╗ ██████╗██████╗ █████╗ ██████╗ ██╗████████╗ ██████╗ ██████╗
██╔════╝██╔════╝██╔══██╗██╔══██╗██╔══██╗██║╚══██╔══╝██╔═══██╗██╔══██╗
███████╗██║ ██████╔╝███████║██████╔╝██║ ██║ ██║ ██║██████╔╝
╚════██║██║ ██╔══██╗██╔══██║██╔═══╝ ██║ ██║ ██║ ██║██╔══██╗
███████║╚██████╗██║ ██║██║ ██║██║ ██║ ██║ ╚██████╔╝██║ ██║
╚══════╝ ╚═════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝
[✓] Python 3.14.0 found
[✓] Dependencies up to date
[✓] Cloudflared ready
[✓] Flask healthy on :5000
[✓] Tunnel ready
┌────────────────────────────────────────────────────────────────┐
│ Dashboard: http://localhost:5000 │
│ LAN: http://192.168.0.101:5000 │
│ Proxy URL: https://example.trycloudflare.com/openrouter-cc │
└────────────────────────────────────────────────────────────────┘
macOS / Linux
Prerequisites:
- Python 3.10+ (most systems have this pre-installed)
- Bash 3.0+ (macOS ships with 3.2, Linux typically has 4.0+)
- curl (for cloudflared download)
Supported Architectures: | Platform | Architecture | Notes | |----------|--------------|-------| | macOS | Apple Silicon (M1/M2/M3/M4) | arm64 binary auto-downloaded | | macOS | Intel | amd64 binary auto-downloaded | | Linux | x86_64/amd64 | Standard servers and desktops | | Linux | aarch64/arm64 | ARM servers, Raspberry Pi 4+ (64-bit) | | Linux | armv7l/armhf | Raspberry Pi 3 and older (32-bit) |
Option A: Download ZIP from GitHub → Code → Download ZIP → Unzip
Option B: Clone with Git:
git clone https://github.com/daksh-7/scrapitor && cd scrapitor && ./run.sh
The launcher will:
- Create a virtual environment at
app/.venvand install dependencies - Auto-download cloudflared from GitHub releases (if not found in PATH)
- Build the Svelte frontend (if Node.js is available and sources changed)
- Start Flask on port 5000
- Establish a Cloudflare tunnel and display the public URL
- Show live status with uptime (press Q to quit gracefully)
macOS Notes:
- If you prefer Homebrew:
brew install cloudflared(then the launcher uses the system binary) - On Apple Silicon, Rosetta is not required — native arm64 binary is used
Manual Setup (alternative):
python3 -m venv app/.venv && source app/.venv/bin/activate && pip install -r app/requirements.txt && python -m app.server
In another terminal (optional):
cloudflared tunnel --no-autoupdate --url http://127.0.0.1:5000
Termux (Android)
Run scrapitor directly on your Android device using Termux.
Prerequisites:
- Install Termux from F-Droid (the Play Store version is outdated and will not work)
- ARM64 device required (most Android phones from 2017+ are ARM64)
- Grant storage permissions:
termux-setup-storage
Device Compatibility: | Architecture | Supported | Notes | |--------------|-----------|-------| | ARM64 (aarch64) | Yes | Most modern Android phones and tablets | | ARM32 (armv7l) | No | Older devices; cloudflared binary not available | | x86/x86_64 | Untested | Some Android emulators and Chromebooks |
Install:
pkg update && pkg upgrade -y && pkg install python git curl cloudflared -y
git clone https://github.com/daksh-7/scrapitor && cd scrapitor && ./run.sh
The launcher will:
- Create a virtual environment at
app/.venvand install dependencies - Detect Termux environment and show helpful tips
- Start Flask on port 5000
- Establish a Cloudflare tunnel and display the public URL
- Show live status with uptime (press Q to quit gracefully)
Preventing Android from Killing Termux:
Android aggressively kills background apps to save battery. To keep scrapitor running:
termux-wake-lock # Option 1 (recommended): Run in separate session
pkg install termux-services # Option 2: Install termux-services
# Option 3: Disable battery optimization for Termux in Android settings
Optional packages:
pkg install nodejs # For frontend building (~200MB)
pkg install net-tools iproute2 # For better LAN IP detection
Tips for Termux:
- Access the dashboard from your device's browser at
http://localhost:5000 - Use a split-screen or floating window to keep Termux visible
- The LAN URL (e.g.,
http://192.168.x.x:5000) works for other devices on your WiFi - If you see "Termux killed in background," the wa
