
Pydoll

Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.

Install / Use

/learn @autoscrape-labs/Pydoll

README

<p align="center"> <img src="https://github.com/user-attachments/assets/2c380638-b04a-4b04-b1c8-2958e4237a94" alt="Pydoll Logo" /> <br> </p> <p align="center">Async-native, fully typed, built for evasion and performance.</p> <p align="center"> <a href="https://github.com/autoscrape-labs/pydoll/stargazers"><img src="https://img.shields.io/github/stars/autoscrape-labs/pydoll?style=social"></a> <a href="https://codecov.io/gh/autoscrape-labs/pydoll" > <img src="https://codecov.io/gh/autoscrape-labs/pydoll/graph/badge.svg?token=40I938OGM9"/> </a> <img src="https://github.com/autoscrape-labs/pydoll/actions/workflows/tests.yml/badge.svg" alt="Tests"> <img src="https://github.com/autoscrape-labs/pydoll/actions/workflows/ruff-ci.yml/badge.svg" alt="Ruff CI"> <img src="https://github.com/autoscrape-labs/pydoll/actions/workflows/mypy.yml/badge.svg" alt="MyPy CI"> <img src="https://img.shields.io/badge/python-%3E%3D3.10-blue" alt="Python >= 3.10"> <a href="https://deepwiki.com/autoscrape-labs/pydoll"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a> </p> <p align="center"> <a href="https://pydoll.tech/">Documentation</a> &middot; <a href="#getting-started">Getting Started</a> &middot; <a href="#features">Features</a> &middot; <a href="#support">Support</a> </p>

Pydoll automates Chromium-based browsers (Chrome, Edge) by connecting directly to the Chrome DevTools Protocol over WebSocket. No WebDriver binary, no navigator.webdriver flag, no compatibility issues.
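
For context, CDP commands are plain JSON frames sent over that WebSocket: an integer `id`, a `Domain.method` name, and a `params` object. Pydoll speaks this protocol for you; the sketch below only shows what travels on the wire (`Page.navigate` is a standard CDP method):

```python
import json

# A CDP command frame: integer id (for matching the response),
# a "Domain.method" name, and a params object.
command = {
    "id": 1,
    "method": "Page.navigate",
    "params": {"url": "https://example.com"},
}

# This JSON string is what actually goes over the WebSocket.
frame = json.dumps(command)
```

The browser replies with a JSON frame carrying the same `id`, which is how responses are matched to commands without a WebDriver in between.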

It combines a high-level API for stealthy automation with low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. And with its new Pydantic-powered extraction engine, it maps the DOM directly to structured Python objects, delivering an unmatched Developer Experience (DX).

Top Sponsors

<a href="https://substack.thewebscraping.club/p/pydoll-webdriver-scraping?utm_source=github&utm_medium=repo&utm_campaign=pydoll"> <img src="public/images/banner-the-webscraping-club.png" alt="The Web Scraping Club" /> </a>

<sub>Read a full review of Pydoll on <b><a href="https://substack.thewebscraping.club/p/pydoll-webdriver-scraping?utm_source=github&utm_medium=repo&utm_campaign=pydoll">The Web Scraping Club</a></b>, the #1 newsletter dedicated to web scraping.</sub>

Sponsors

<table> <tr> <td><a href="https://www.thordata.com/?ls=github&lk=pydoll"><img src="public/images/Thordata-logo.png" height="30" alt="Thordata" /></a></td> <td><a href="https://dashboard.capsolver.com/passport/register?inviteCode=WPhTbOsbXEpc"><img src="public/images/capsolver-logo.png" height="40" alt="CapSolver" /></a></td> <td><a href="https://www.testmuai.com/?utm_medium=sponsor&utm_source=pydoll"><img src="public/images/logo-lamda-test.svg" height="30" width="130" alt="LambdaTest" /></a></td> </tr> </table>

<sub>Learn more about our sponsors · Become a sponsor</sub>

Why Pydoll

  • Structured extraction: Define a Pydantic model, call tab.extract(), get typed and validated data back. No manual element-by-element querying.
  • Async and typed: Built on asyncio from the ground up, 100% type-checked with mypy. Full IDE autocompletion and static error checking.
  • Stealth built in: Human-like mouse movement, realistic typing, and granular browser preference control for fingerprint management.
  • Network control: Intercept requests to block ads/trackers, monitor traffic for API discovery, and make authenticated HTTP requests that inherit the browser session.
  • Shadow DOM and iframes: Full support for shadow roots (including closed) and cross-origin iframes. Discover, query, and interact with elements inside them using the same API.

Installation

pip install pydoll-python

No WebDriver binaries or external dependencies required.

Getting Started

1. Stateful Automation & Evasion

When you need to navigate, bypass challenges, or interact with dynamic UI, Pydoll's imperative API handles it with humanized timing by default.

import asyncio
from pydoll.browser import Chrome
from pydoll.constants import Key

async def google_search(query: str):
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        # Find elements and interact with human-like timing
        search_box = await tab.find(tag_name='textarea', name='q')
        await search_box.insert_text(query)
        await tab.keyboard.press(Key.ENTER)

        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll',
            timeout=10,
        )
        await first_result.click()
        print(f"Page loaded: {await tab.title}")

asyncio.run(google_search('pydoll site:github.com'))

2. Structured Data Extraction

Once you reach the target page, switch to the declarative engine. Define what you want with a model, and Pydoll extracts it — typed, validated, and ready to use.

from pydoll.browser.chromium import Chrome
from pydoll.extractor import ExtractionModel, Field

class Quote(ExtractionModel):
    text: str = Field(selector='.text', description='The quote text')
    author: str = Field(selector='.author', description='Who said it')
    tags: list[str] = Field(selector='.tag', description='Tags')
    year: int | None = Field(selector='.year', description='Year', default=None)

async def extract_quotes():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://quotes.toscrape.com')

        quotes = await tab.extract_all(Quote, scope='.quote', timeout=5)

        for q in quotes:
            print(f'{q.author}: {q.text}')  # fully typed, IDE autocomplete works
            print(q.tags)                    # list[str], not a raw element
            print(q.model_dump_json())       # pydantic serialization built-in

asyncio.run(extract_quotes())

Models support CSS/XPath auto-detection, HTML attribute targeting, custom transforms, and nested models.

<details> <summary><b>Nested models, transforms, and attribute extraction</b></summary> <br>
from datetime import datetime
from pydoll.extractor import ExtractionModel, Field

def parse_date(raw: str) -> datetime:
    return datetime.strptime(raw.strip(), '%B %d, %Y')

class Author(ExtractionModel):
    name: str = Field(selector='.author-title')
    born: datetime = Field(
        selector='.author-born-date',
        transform=parse_date,
    )

class Article(ExtractionModel):
    title: str = Field(selector='h1')
    url: str = Field(selector='.source-link', attribute='href')
    author: Author = Field(selector='.author-card', description='Nested model')

article = await tab.extract(Article, timeout=5)
article.author.born.year  # int — types are preserved all the way down
</details>

Features

<details> <summary><b>Humanized Mouse Movement</b></summary> <br>

Mouse operations produce human-like cursor movement by default:

  • Bezier curve paths with asymmetric control points
  • Fitts's Law timing: duration scales with distance
  • Minimum-jerk velocity: bell-shaped speed profile
  • Physiological tremor: Gaussian noise scaled with velocity
  • Overshoot correction: ~70% chance on fast movements, then corrects back
await tab.mouse.move(500, 300)
await tab.mouse.click(500, 300)
await tab.mouse.drag(100, 200, 500, 400)

button = await tab.find(id='submit')
await button.click()

# Opt out when speed matters
await tab.mouse.click(500, 300, humanize=False)
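
These techniques are general, not Pydoll-specific. As an illustrative sketch (not Pydoll's actual implementation), a cubic Bezier path driven by a minimum-jerk profile, with Fitts's-Law timing, fits in a few lines of plain Python:

```python
import math
import random

def minimum_jerk(t: float) -> float:
    # Minimum-jerk position profile: s(0)=0, s(1)=1, with zero
    # velocity and acceleration at both endpoints (bell-shaped speed).
    return 10 * t**3 - 15 * t**4 + 6 * t**5

def fitts_duration(dist: float, width: float = 20.0) -> float:
    # Fitts's Law: movement time grows with the index of difficulty,
    # so longer moves take proportionally longer. Constants are made up.
    return 0.1 + 0.12 * math.log2(dist / width + 1)

def bezier(p0, p1, p2, p3, s: float):
    # Cubic Bezier curve evaluated at parameter s in [0, 1].
    u = 1 - s
    return tuple(
        u**3 * a + 3 * u**2 * s * b + 3 * u * s**2 * c + s**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def human_path(start, end, steps: int = 50, seed: int = 0):
    # Asymmetric control points bow the path off the straight line;
    # the minimum-jerk profile spaces the samples so speed ramps
    # up and down smoothly instead of being constant.
    rng = random.Random(seed)
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    c1 = (start[0] + 0.3 * dx + rng.uniform(-0.1, 0.1) * dist,
          start[1] + 0.3 * dy + rng.uniform(-0.1, 0.1) * dist)
    c2 = (start[0] + 0.8 * dx + rng.uniform(-0.05, 0.05) * dist,
          start[1] + 0.8 * dy + rng.uniform(-0.05, 0.05) * dist)
    return [bezier(start, c1, c2, end, minimum_jerk(i / (steps - 1)))
            for i in range(steps)]

path = human_path((100, 200), (500, 400))
```

Tremor noise and overshoot correction would layer on top of this; the point is that the cursor never moves in a straight line at constant speed, which is the classic bot signature.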

Mouse Control Docs

</details> <details> <summary><b>Shadow DOM Support</b></summary> <br>

Full Shadow DOM support, including closed shadow roots. Because Pydoll operates at the CDP level (below JavaScript), the closed mode restriction doesn't apply.

shadow = await element.get_shadow_root()
button = await shadow.query('.internal-btn')
await button.click()

# Discover all shadow roots on the page
shadow_roots = await tab.find_shadow_roots()
for sr in shadow_roots:
    checkbox = await sr.query('input[type="checkbox"]', raise_exc=False)
    if checkbox:
        await checkbox.click()

Highlights:

  • Closed shadow roots work without workarounds
  • find_shadow_roots() discovers every shadow root on the page
  • timeout parameter for polling until shadow roots appear
  • deep=True traverses cross-origin iframes (OOPIFs)
  • Standard find(), query(), click() API inside shadow roots

Shadow DOM Docs

</details> <details> <summary><b>HAR Network Recording</b></summary> <br>

Record network activity during a browser session and export as HAR 1.2. Replay recorded requests to reproduce exact API sequences.

from pydoll.browser.chromium import Chrome

async with Chrome() as browser:
    tab = await browser.start()

    async with tab.request.record() as capture:
        await tab.go_to('https://example.com')

    capture.save('flow.har')
    print(f'Captured {len(capture.entries)} requests')

    responses = await tab.request.replay('flow.har')

HAR Recording Docs

</details> <details> <summary><b>Page Bundles</b></summary> <br>

Save the current page and all its assets (CSS, JS, images, fonts) as a .zip bundle for offline viewing. Optionally inline everything into a single HTML file.

await tab.save_b
</details>