Dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Archived!!! I can no longer maintain this repository.
<table> <tr> <td>License</td> <td><img src='https://img.shields.io/pypi/l/pydude.svg?style=for-the-badge' alt="License"></td> <td>Version</td> <td><img src='https://img.shields.io/pypi/v/pydude.svg?logo=pypi&style=for-the-badge' alt="Version"></td> </tr> <tr> <td>Github Actions</td> <td><img src='https://img.shields.io/github/actions/workflow/status/roniemartinez/dude/python.yml?branch=master&label=actions&logo=github%20actions&style=for-the-badge' alt="Github Actions"></td> <td>Coverage</td> <td><img src='https://img.shields.io/codecov/c/github/roniemartinez/dude/master?label=codecov&logo=codecov&style=for-the-badge' alt="CodeCov"></td> </tr> <tr> <td>Supported versions</td> <td><img src='https://img.shields.io/pypi/pyversions/pydude.svg?logo=python&style=for-the-badge' alt="Python Versions"></td> <td>Wheel</td> <td><img src='https://img.shields.io/pypi/wheel/pydude.svg?style=for-the-badge' alt="Wheel"></td> </tr> <tr> <td>Status</td> <td><img src='https://img.shields.io/pypi/status/pydude.svg?style=for-the-badge' alt="Status"></td> <td>Downloads</td> <td><img src='https://img.shields.io/pypi/dm/pydude.svg?style=for-the-badge' alt="Downloads"></td> </tr> <tr> <td>All Contributors</td> <td><a href="#contributors-"><img src='https://img.shields.io/github/all-contributors/roniemartinez/dude?style=for-the-badge' alt="All Contributors"></a></td> </tr> </table>
Dude is a very simple framework for writing web scrapers using Python decorators. Its Flask-inspired design aims to let you build a web scraper in just a few lines of code, with a syntax that is easy to learn.
🚨 Dude is currently in Pre-Alpha. Please expect breaking changes.
Installation
To install, run the following from a terminal.
pip install pydude
playwright install  # Install Playwright browser binaries (Chromium, Firefox and WebKit).
Minimal web scraper
The simplest web scraper will look like this:
from dude import select


@select(css="a")
def get_link(element):
    # Called once for every element matched by the CSS selector "a".
    return {"url": element.get_attribute("href")}
The example above gets all the hyperlink elements on a page and calls the handler function get_link() for each element.
How to run the scraper
You can run your scraper from a terminal by passing the URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
dude scrape --url "<url>" --output data.json path/to/script.py
The output in data.json should contain the scraped URLs along with metadata fields, whose names are prefixed with an underscore.
[
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 0,
    "url": "/url-1.html"
  },
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 1,
    "url": "/url-2.html"
  },
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 2,
    "url": "/url-3.html"
  }
]
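Because the metadata keys all start with an underscore, they are easy to separate from the scraped fields after the fact. The sketch below uses only the Python standard library on a shortened copy of the output above; `split_record` and `absolute_url` are hypothetical helper names, not part of Dude.

```python
import json
from urllib.parse import urljoin

# Two records copied from the sample output above.
sample = """
[
  {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
   "_group_index": 0, "_element_index": 0, "url": "/url-1.html"},
  {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
   "_group_index": 0, "_element_index": 1, "url": "/url-2.html"}
]
"""


def split_record(record: dict):
    """Return (metadata, data): keys starting with "_" are treated as metadata."""
    meta = {k: v for k, v in record.items() if k.startswith("_")}
    data = {k: v for k, v in record.items() if not k.startswith("_")}
    return meta, data


def absolute_url(record: dict) -> str:
    """Resolve the scraped relative URL against the page it was found on."""
    return urljoin(record["_page_url"], record["url"])


records = json.loads(sample)
scraped = [split_record(r)[1] for r in records]
links = [absolute_url(r) for r in records]
```

Here `scraped` keeps only the fields returned by the handler, while `links` uses the `_page_url` metadata to turn each relative link into a full URL.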
Changing the output to --output data.csv should write the same records in CSV format instead of JSON.
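As a rough sketch of what such a file might contain, assuming one column per JSON key in the same order (an assumption, not verified against Dude's CSV writer), the JSON records can be converted with Python's standard csv module. The `to_csv` helper is a hypothetical name.

```python
import csv
import io

# Records copied from the JSON sample output.
records = [
    {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
     "_group_index": 0, "_element_index": 0, "url": "/url-1.html"},
    {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
     "_group_index": 0, "_element_index": 1, "url": "/url-2.html"},
    {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
     "_group_index": 0, "_element_index": 2, "url": "/url-3.html"},
]


def to_csv(rows: list) -> str:
    """Serialize a list of flat dicts to CSV text, one column per key."""
    buffer = io.StringIO()
    # Assumption: column order follows the key order of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

The header row carries the same underscore-prefixed metadata names as the JSON keys, so the metadata columns can be filtered the same way in either format.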

Features
- Simple Flask-inspired design - build a scraper with decorators.
- Uses Playwright API - run your scraper in Chrome, Firefox and WebKit and leverage Playwright's powerful selector engine supporting CSS, XPath, text, regex, etc.
- Data grouping - group related results.
- URL pattern matching - run functions on matched URLs.
- Priority - reorder functions based on priority.
- Setup function - enable setup steps (clicking dialogs or login).
- Navigate function - enable navigation steps to move to other pages.
- Custom storage - option to save data to other formats or to a database.
- Async support - write async handlers.
- Option to use other parser backends aside from Playwright.
  - BeautifulSoup4 - pip install pydude[bs4]
  - Parsel - pip install pydude[parsel]
  - lxml - pip install pydude[lxml]
  - Selenium - pip install pydude[selenium]
- Option to follow all links indefinitely (Crawler/Spider).
- Events - attach functions to startup, pre-setup, post-setup and shutdown events.
- Option to save data on every page.
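Two of the features above, URL pattern matching and priority, can be pictured with a small standalone sketch. It does not use Dude's API: the `Rule` dataclass, the `handlers_for` function and the convention that lower priority numbers run first are all assumptions made for illustration, so check the documentation for Dude's actual parameters and ordering.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    pattern: str  # regular expression tested against the page URL
    priority: int  # assumption: lower numbers run first
    handler: Callable[[str], dict]


def handlers_for(url: str, rules: List[Rule]) -> List[Callable[[str], dict]]:
    """Keep only the rules whose pattern matches `url`, ordered by priority."""
    matched = [rule for rule in rules if re.search(rule.pattern, url)]
    return [rule.handler for rule in sorted(matched, key=lambda rule: rule.priority)]


rules = [
    Rule(r"/blog/", 10, lambda url: {"kind": "blog"}),
    Rule(r"example\.com", 0, lambda url: {"kind": "any-example-page"}),
    Rule(r"/shop/", 5, lambda url: {"kind": "shop"}),
]

page_url = "https://example.com/blog/post-1"
results = [handler(page_url) for handler in handlers_for(page_url, rules)]
```

For the blog URL, the shop rule is skipped because its pattern does not match, and the two remaining handlers run in priority order rather than in the order they were declared.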
Supported Parser Backends
By default, Dude uses Playwright, but you can switch to a parser backend you are already familiar with: BeautifulSoup4, Parsel, lxml, or Selenium.
Here is a summary of the features supported by each parser backend.
<table> <thead> <tr> <td rowspan="2" style='text-align:center;'>Parser Backend</td> <td rowspan="2" style='text-align:center;'>Supports<br>Sync?</td> <td rowspan="2" style='text-align:center;'>Supports<br>Async?</td> <td colspan="4" style='text-align:center;'>Selectors</td> <td rowspan="2" style='text-align:center;'><a href="https://roniemartinez.github.io/dude/advanced/01_setup.html">Setup<br>Handler</a></td> <td rowspan="2" style='text-align:center;'><a href="https://roniemartinez.github.io/dude/advanced/02_navigate.html">Navigate<br>Handler</a></td> <td rowspan="2" style='text-align:center;'>Comments</td> </tr> <tr> <td>CSS</td> <td>XPath</td> <td>Text</td> <td>Regex</td> </tr> </thead> <tbody> <tr> <td>Playwright</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td></td> </tr> <tr> <td>BeautifulSoup4</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>🚫</td> <td>🚫</td> <td>🚫</td> <td>🚫</td> <td>🚫</td> <td></td> </tr> <tr> <td>Parsel</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>🚫</td> <td>🚫</td> <td></td> </tr> <tr> <td>lxml</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>🚫</td> <td>🚫</td> <td></td> </tr> <tr> <td>Pyppeteer</td> <td>🚫</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>🚫</td> <td>✅</td> <td>✅</td> <td>Not supported from 0.23.0</td> </tr> <tr> <td>Selenium</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>✅</td> <td>🚫</td> <td>✅</td> <td>✅</td> <td></td> </tr> </tbody> </table>
Using the Docker image
Pull the Docker image using the following command.
docker pull roniemartinez/dude
Assuming that script.py exists in the current directory, run Dude using the following command.
docker run -it --rm -v "$PWD":/code roniemartinez/dude dude scrape --url <url> script.py
Documentation
Read the complete documentation at https://roniemartinez.github.io/dude/. All the advanced and useful features are documented there.
Requirements
- ✅ Any dude should know how to work with selectors (CSS or XPath).
- ✅ Familiarity with any backends that you love (see Supported Parser Backends)
- ✅ Python decorators... you'll live, dude!
Why name this project "dude"?
- ✅ A recursive acronym looks nice.
- ✅ Adding "uncomplicated" (like ufw) into the name says it is a very simple framework.
- ✅ Puns! I also think that if you want to do web scraping, there's probably some random dude around the corner who can make it very easy for you to start with it. 😊
Author
Contributors ✨
Thanks goes to these wonderful people (emoji key):
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --> <!-- prettier-ignore-start --> <!-- markdownlint-disable --> <table> <tr> <td align="center"><a href="https://ron.sh"><img src="https://avatars.githubusercontent.com/u/2573537?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Ronie Martinez</b></sub></a><br /><a href="#maintenance-roniemartinez" title="Maintenance">🚧</a> <a href="https://github.com/roniemartinez/dude/commits?author=roniemartinez" title="Code">💻</a> <a href="https://github.com/roniemartinez/dude/commits?author=roniemartinez" title="Documentation">📖</a> <a href="#infra-roniemartinez" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a></td> </tr> </table> <!-- markdownlint-restore --> <!-- prettier-ignore-end --> <!-- ALL-CONTRIBUTORS-LIST:END -->
This project follows the all-contributors specification. Contributions of any kind welcome!
