Scrapling

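An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl. Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. Its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation, all in a few lines of Python. One library, zero compromises.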


Effortless Web Scraping for the Modern Web

Documentation: Selection methods (https://scrapling.readthedocs.io/en/latest/parsing/selection/) · Fetchers (https://scrapling.readthedocs.io/en/latest/fetching/choosing/) · Spiders (https://scrapling.readthedocs.io/en/latest/spiders/architecture.html) · Proxy Rotation (https://scrapling.readthedocs.io/en/latest/spiders/proxy-blocking.html) · CLI (https://scrapling.readthedocs.io/en/latest/cli/overview/) · MCP (https://scrapling.readthedocs.io/en/latest/ai/mcp-server/)

Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.

Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation — all in a few lines of Python. One library, zero compromises.

Blazing-fast crawls with real-time stats and streaming. Built by web scrapers for web scrapers and regular users alike, it has something for everyone.

from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
StealthyFetcher.adaptive = True
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)  # Fetch website under the radar!
products = p.css('.product', auto_save=True)                                        # Scrape data that survives website design changes!
products = p.css('.product', adaptive=True)                                         # Later, if the website structure changes, pass `adaptive=True` to find them!
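
How the parser relocates elements is not spelled out here. Purely as an illustration of the idea, the sketch below assumes relocation means saving a signature of the element (tag, attributes, text) and later picking the most similar candidate on the changed page. The `signature` and `relocate` helpers and the dict-based elements are hypothetical; this is not Scrapling's API or its actual algorithm.

```python
from difflib import SequenceMatcher


def signature(el):
    """Flatten an element's tag, attributes, and text into one comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(el.get("attrs", {}).items()))
    return f"{el['tag']} {attrs} {el.get('text', '')}"


def relocate(saved, candidates):
    """Return the candidate whose signature is most similar to the saved one."""
    target = signature(saved)
    return max(
        candidates,
        key=lambda el: SequenceMatcher(None, target, signature(el)).ratio(),
    )


# Signature stored on the first run (hypothetical data).
saved = {"tag": "div", "attrs": {"class": "product"}, "text": "Blue Widget $10"}

# The same page after a redesign: the class name changed, the content did not.
new_page = [
    {"tag": "nav", "attrs": {"class": "menu"}, "text": "Home About"},
    {"tag": "div", "attrs": {"class": "item-card"}, "text": "Blue Widget $10"},
    {"tag": "footer", "attrs": {"class": "site-footer"}, "text": "Contact us"},
]

match = relocate(saved, new_page)
```

Here the old `.product` selector would no longer match anything, but the similarity search still lands on the renamed `item-card` element because its tag and text are unchanged.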

Or scale up to full crawls

from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"                            # Identifier for this spider
    start_urls = ["https://example.com/"]    # Where the crawl begins

    async def parse(self, response: Response):
        # Yield one item per product found on the page
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

MySpider().start()


Key Features

Spiders — A Full Crawling Framework

🕷️ Scrapy-like Spider API: Define spiders with start_urls, async parse callbacks, and Request/Response objects.
⚡ Concurrent Crawling: Configurable concurrency limits, per-domain throttling, and download delays.
🔄 Multi-Session Support: Unified interface for HTTP requests and stealthy headless browsers in a single spider, with requests routed to different sessions by ID.
💾 Pause & Resume: Checkpoint-based crawl persistence. Press Ctrl+C for a graceful shutdown; restart to resume from where you left off.
📡 Streaming Mode: Stream scraped items as they arrive via async for item in spider.stream() with real-time stats — ideal for UI, pipelines, and long-running crawls.
🛡️ Blocked Request Detection: Automatic detection and retry of blocked requests with customizable logic.
📦 Built-in Export: Export results through hooks and your own pipeline, or use the built-in JSON/JSONL export with result.items.to_json() and result.items.to_jsonl(), respectively.
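
The concurrency limit and streaming mode listed above can be pictured with plain asyncio. The sketch below is a minimal standard-library illustration, not Scrapling's implementation: `fetch` is a hypothetical stand-in that fakes a download, and `stream` is a hypothetical async generator that yields items as they finish while keeping at most `concurrency` requests in flight.

```python
import asyncio


async def fetch(url):
    """Stand-in for a real download; returns a fake scraped item."""
    await asyncio.sleep(0.01)
    return {"url": url, "title": f"Title for {url}"}


async def stream(urls, concurrency=2):
    """Yield items as each fetch finishes, with at most `concurrency` in flight."""
    limit = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # The semaphore caps how many fetches run at the same time.
        async with limit:
            return await fetch(url)

    tasks = [asyncio.create_task(bounded(u)) for u in urls]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main():
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    # Consume items one by one, in completion order rather than input order.
    return [item async for item in stream(urls)]


items = asyncio.run(main())
```

A consumer such as a UI or pipeline would process each item inside the `async for` loop instead of collecting them into a list.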

Advanced Website Fetching with Session Support

HTTP Requests: Fast and stealthy HTTP requests with the Fetcher class. It can impersonate browsers' TLS fingerprints and headers, and use HTTP/3.
Dynamic Loading: Fetch dynamic websites with full browser automation through the DynamicFetcher class, which supports Playwright's Chromium and Google's Chrome.
Anti-bot Bypass: Advanced stealth capabilities with StealthyFetcher and fingerprint spoofing. It can bypass all types of Cloudflare's Turnstile/Interstitial challenges automatically.
Session Management: Persistent session support with FetcherSession, StealthySession, and DynamicSession classes for cookie and state management across requests.
Proxy Rotation: Built-in ProxyRotator with cyclic or custom rotation strategies across all session types, plus per-request proxy overrides.
Domain Blocking: Block requests to specific domains (and their
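
To make the cyclic rotation strategy and per-request override from the list above concrete, here is a minimal standard-library sketch. `CyclicRotator`, its `next_proxy` method, and the proxy URLs are hypothetical names chosen for illustration; they are not the interface of Scrapling's ProxyRotator.

```python
from itertools import cycle


class CyclicRotator:
    """Hypothetical helper that hands out proxies in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self, override=None):
        """Return the per-request override if given, else the next proxy in the cycle."""
        return override if override is not None else next(self._pool)


rotator = CyclicRotator(["http://proxy-a:8080", "http://proxy-b:8080"])

# Three ordinary requests walk the pool and wrap around.
picked = [rotator.next_proxy() for _ in range(3)]

# A single request pins its own proxy without advancing the rotation.
forced = rotator.next_proxy(override="http://proxy-c:8080")
after_override = rotator.next_proxy()
```

A custom strategy would replace `cycle` with any other iterator over the pool, for example one that skips proxies that were recently blocked.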
