OpenCaptchaWorld
[NeurIPS 2025] The first web-based benchmark and platform to evaluate the visual reasoning and interaction capabilities of MLLM-powered agents through diverse and dynamic CAPTCHA puzzles.
A comprehensive web-based platform for testing and benchmarking Multimodal LLM Web Agents on CAPTCHA-style puzzles. This project provides an environment to evaluate how artificial intelligence systems perform on a variety of visual puzzles resembling CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart).
Based on our research paper: "Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents". Below are some examples from our Open CaptchaWorld.
<div align="center"> <img src="./assets/captcha_example.png" alt="CAPTCHA Demo" width="800px"> </div>
📰 News
- [2026-02-19] 🚀 We release NextGen-CAPTCHAs: A defense framework against MLLM-based web GUI agents, with an accompanying benchmark snapshot of 519 puzzles across 27 CAPTCHA families. This repository provides both the generative CAPTCHA system and tools for evaluating agent resistance. https://github.com/MetaAgentX/NextGen-CAPTCHAs
- [2025-10-20] ✅ We implemented and uploaded the testing CLI for the browser-use framework. It is easy to use, and you can test any MLLM on Open CaptchaWorld just by switching the backbone. (See guidance below)
- [2025-09-27] ✅ We doubled the size of the CAPTCHA dataset; you can download it here: https://huggingface.co/datasets/OpenCaptchaWorld/Open_CaptchaWorld
- [2025-09-18] ✅ Open CaptchaWorld has been accepted by the NeurIPS 2025 Datasets and Benchmarks Track. Many thanks to all the authors for their contributions!
- [2025-07-28] ✅ The number of CAPTCHAs in the Open CaptchaWorld Benchmark has been doubled; there are now a total of 463 modern CAPTCHAs for agents!
- [2025-05-29] ✅ We have released the first version of <span style="color:#00ffff; font-weight:bold;">Open CaptchaWorld</span> Benchmark and Dataset.
📋 Table of Contents
- 🌟 Overview
- 🎬 Demo
- 🎯 Motivation & Contributions
- ✨ Features
- 🏗 Project Structure
- 🧩 CAPTCHA Types
- 📊 Benchmark Results
- 🚀 Getting Started
- 📝 Usage
- 🗺️ Future Plan
- 👥 Contributing
- 📄 License
🌟 Overview
Open CaptchaWorld enables systematic evaluation of multimodal AI capabilities through CAPTCHA-style puzzles. It provides a controlled environment for testing how well LLM Web Agents can:
- Perceive and understand visual elements
- Extract relevant information from images
- Generate appropriate responses to visual puzzles
- Interact with web interfaces to solve tasks
The system includes a variety of CAPTCHA types ranging from basic (count dice) to complex (rotate objects to match reference direction), providing a comprehensive assessment of AI visual reasoning capabilities.
🎬 Demo
Watch these demonstration videos to see Open CaptchaWorld in action:
Demo: Human vs. Agent Solving
https://github.com/user-attachments/assets/c1f2edb1-ba9a-403d-9076-706014c0c750
🎯 Motivation & Contributions
Why We Built Open CaptchaWorld
Modern web interfaces increasingly rely on CAPTCHA systems to differentiate between human users and automated systems. This presents a significant challenge for LLM Web Agents attempting to navigate and interact with the real world:
- Real-World Deployment Barrier: Web Agents frequently get stuck on websites that include CAPTCHA tests, significantly slowing down their deployment for everyday real-world usage. Without the ability to solve these challenges, LLM Web Agents cannot fully realize their potential as digital assistants.
- Outdated Evaluation Methods: Many traditional CAPTCHAs can now be easily solved by specialized detection and classification models, making them poor benchmarks for evaluating the complete reasoning, visual understanding, and interaction capabilities of modern Web Agents.
Our Contributions
Open CaptchaWorld addresses these challenges through several key contributions:
- Comprehensive CAPTCHA Collection: We have collected and implemented an extensive set of modern CAPTCHA types specifically designed to test the multi-modal reasoning capabilities required by Web Agents.
- First Open-Source Benchmark: To our knowledge, this is the first open-sourced CAPTCHA benchmark and dataset specifically tailored for Web Agents, providing a standardized environment for researchers and developers.
- Training Data Generation: Beyond evaluation, Open CaptchaWorld serves as a platform for generating high-quality training data that can improve Web Agents' ability to handle CAPTCHA challenges.
- Real-World Simulation: Our platform closely emulates actual web interfaces, enabling more realistic testing of Web Agents' capabilities to navigate websites protected by CAPTCHA mechanisms.
By making Open CaptchaWorld available to the research community, we aim to accelerate progress in developing more capable, adaptable, and useful Web Agents that can seamlessly interact with today's web interfaces.
✨ Features
- 20 CAPTCHA Types: Diverse set of visual puzzles to test different capabilities
- Web Interface: Clean, intuitive interface for human or AI interaction
- API Endpoints: Programmatic access to puzzles and verification
- Benchmark Tracking: Automatic recording of performance metrics
- CLI Management: Tools for managing CAPTCHA puzzles and types
- Extensible Architecture: Easy addition of new puzzle types
🏗 Project Structure
Open CaptchaWorld/
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── captcha_data/ # Directory containing CAPTCHA types and puzzles
│ ├── Dice_Count/
│ ├── Geometry_Click/
│ ├── Rotation_Match/
│ └── ... (17 more types)
├── static/ # Static assets
│ ├── css/
│ │ └── style.css # CSS styling
│ └── js/
│ └── script.js # Frontend JavaScript code
└── templates/ # HTML templates
    └── index.html # Main application page
🧩 CAPTCHA Types
Open CaptchaWorld includes 20 distinct CAPTCHA types, each testing different visual reasoning capabilities:
- Dice_Count: Count and sum numbers on dice
- Geometry_Click: Click on a specific geometric shape
- Rotation_Match: Rotate an object to match a reference orientation
- Slide_Puzzle: Drag a component to a target position
- Unusual_Detection: Identify unusual items in a grid
- Image_Recognition: Select images matching a description
- Bingo: Swap positions to create a line of matching images
- Image_Matching: Match similar images
- Patch_Select: Select grid squares containing specific objects
- Dart_Count: Select an image where darts sum to a target number
- Object_Match: Match the number of objects to a reference
- Select_Animal: Identify a specific animal in a grid
- Coordinates: Move an object to specified coordinates
- Path_Finder: Navigate to a target position
- Place_Dot: Place a dot at a specific location
- Connect_icon: Connect matching icons
- Click_Order: Click items in a specific sequence
- Hold_Button: Hold a button for a specified duration
- Misleading_Click: Click in the correct area, avoiding distractions
- Pick_Area: Select a specific area in an image
Each type has its own directory in captcha_data/ containing puzzle images and a ground_truth.json file with solutions.
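As a rough illustration of this layout, the sketch below walks captcha_data/ and counts the entries in each type's ground_truth.json. The internal schema of ground_truth.json is not described here, so the code only assumes that the top level is a JSON object or array with one entry per puzzle; the helper name `count_puzzles` is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def count_puzzles(data_root):
    """Map each CAPTCHA type directory to the number of entries in its ground_truth.json."""
    counts = {}
    for truth_file in sorted(Path(data_root).glob("*/ground_truth.json")):
        with truth_file.open(encoding="utf-8") as fh:
            solutions = json.load(fh)
        # Assumption: the top level is a dict or list with one entry per puzzle.
        counts[truth_file.parent.name] = len(solutions)
    return counts


if __name__ == "__main__":
    # Run from the repository root, where captcha_data/ lives.
    for captcha_type, n in count_puzzles("captcha_data").items():
        print(f"{captcha_type}: {n} puzzles")
```

Directories without a ground_truth.json are simply skipped, which makes the script a quick sanity check after adding a new puzzle type.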
📊 Benchmark Results
The system records benchmark results in benchmark_results.json with each entry containing:
- Puzzle type
- Puzzle ID
- User's answer
- Correct answer
- Boolean indicating correctness
- Timestamp
This data can be used to analyze performance across different puzzle types and track improvement over time.
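A per-type breakdown could be computed along the following lines. This is a minimal sketch, not the project's own tooling: the key names `puzzle_type` and `correct` are assumptions standing in for the "puzzle type" and "boolean indicating correctness" fields listed above, and the loader accepts either a single JSON array or one JSON object per line because the exact on-disk layout is not specified here.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_entries(text):
    """Parse either a single JSON array or one JSON object per line."""
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def accuracy_by_type(entries, type_key="puzzle_type", correct_key="correct"):
    """Return {puzzle type: (solved, attempted, accuracy)}.

    The default key names are assumptions; adjust them to the actual file.
    """
    totals = defaultdict(lambda: [0, 0])  # type -> [solved, attempted]
    for entry in entries:
        counts = totals[entry[type_key]]
        if entry[correct_key]:
            counts[0] += 1
        counts[1] += 1
    return {t: (s, n, s / n) for t, (s, n) in totals.items()}


if __name__ == "__main__":
    results_path = Path("benchmark_results.json")
    if results_path.exists():
        entries = load_entries(results_path.read_text(encoding="utf-8"))
        for puzzle_type, (solved, attempted, acc) in sorted(accuracy_by_type(entries).items()):
            print(f"{puzzle_type}: {solved}/{attempted} ({acc:.0%})")
```

If the real keys differ, pass them through `type_key` and `correct_key` rather than editing the function body.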
🚀 Getting Started
Prerequisites
- Python 3.10 or higher
Installation
- Clone the repository:
    git clone https://github.com/username/Open-CaptchaWorld.git
    cd Open-CaptchaWorld
- Create a virtual environment (optional but recommended):
    uv venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
    uv pip install -r requirements.txt
    uv run playwright install-deps # Only needed if your machine is missing Playwright's system dependencies
    uv run playwright install
- Get the data: cloning the repository is enough, because the puzzles are already in captcha_data/. Alternatively, download them from https://huggingface.co/datasets/OpenCaptchaWorld/Open_CaptchaWorld and place them in a captcha_data/ folder.
Running the Application
Start the Flask application:
uv run app.py
The application will be available at: http://127.0.0.1:7860
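Before pointing an agent at the platform, it can help to confirm that the server is actually answering. The snippet below is a small standard-library sketch of such a check; the function name `is_server_up` is hypothetical, and the only detail taken from this README is the default address http://127.0.0.1:7860.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://127.0.0.1:7860/", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False if nothing is reachable."""
    # Ignore any system proxy settings, since this targets a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("server reachable" if is_server_up() else "server not reachable")
```

An HTTP error status still counts as "up" here, because the goal is only to tell a running server apart from a closed port.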
