ZenScraper
ZenScraper is an asynchronous scraper built with Python and Playwright, designed to efficiently retrieve tweets from X.com (formerly Twitter). It supports scraping original tweets and retweets, and filtering tweets by date.
Key Features
- Flexible Scraping: Choose to scrape original tweets, retweets, or both.
- Date Filtering: Filter tweets based on specific date ranges (--since-after, --before).
- Session Authentication: Uses cookies for authenticated scraping sessions.
- Configurable Output: Writes scraped data either to JSON with structured metadata or to a cleaned plain-text format.
- Headless or Visible Mode: Operate in headless mode for automation or visible mode for debugging.
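The Configurable Output feature picks a format from the output file's extension (.json or .txt, per the --output option). A minimal sketch of that dispatch follows. It assumes each scraped tweet is a dict with a hypothetical `text` field; `write_output` and the record layout are illustrative, not ZenScraper's actual schema.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_output(tweets: List[Dict], path: str) -> None:
    """Write tweets as structured JSON or cleaned plain text, chosen by file extension."""
    out = Path(path)
    if out.suffix == ".json":
        # Structured output: keep every field of every record.
        out.write_text(json.dumps(tweets, indent=2, ensure_ascii=False), encoding="utf-8")
    elif out.suffix == ".txt":
        # Cleaned text output: collapse whitespace, one tweet per paragraph.
        blocks = [" ".join(t.get("text", "").split()) for t in tweets]
        out.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    else:
        raise ValueError(f"unsupported output extension: {out.suffix!r}")
```

Dispatching on the extension keeps a single --output flag instead of a separate format option.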
Requirements
- Python 3.8 or newer
- Playwright
Installation
Clone the repository and install the required dependencies:
git clone https://github.com/0Day3xpl0it/zenscraper.git
cd zenscraper
chmod +x *.py
pip install -r requirements.txt
playwright install
Next, generate an authenticated session cookie:
python3 grab_x_cookies.py
This script will create the x_cookies.json file necessary for authenticated scraping.
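As a rough illustration of what a cookie-capture step can look like with Playwright's async API, here is a minimal sketch. It opens a visible browser, waits for you to log in by hand, and saves the context's cookies. This is not the contents of grab_x_cookies.py. The assumption that a successful login redirects to a URL ending in /home is unverified, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

COOKIE_FILE = "x_cookies.json"


def save_cookies(cookies: List[Dict], path: str = COOKIE_FILE) -> None:
    """Persist Playwright cookie dicts to disk as JSON."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path: str = COOKIE_FILE) -> List[Dict]:
    """Read cookie dicts previously written by save_cookies()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


async def capture_cookies(path: str = COOKIE_FILE) -> None:
    """Open a visible browser, wait for a manual login, then save the session cookies."""
    # Imported here so the JSON helpers above work without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto("https://x.com/login")
        # Assumption: a successful login lands on /home. timeout=0 disables the timeout.
        await page.wait_for_url("**/home", timeout=0)
        save_cookies(await context.cookies(), path)
        await browser.close()
```

Run it with `asyncio.run(capture_cookies())`. A later scraping session can restore the login with `await context.add_cookies(load_cookies())` before navigating to X.com.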
Usage
Basic command structure:
python3 zenscraper.py --username <username> [options]
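The command structure above could be parsed with Python's standard argparse module. The sketch below takes its option names and defaults from this README's Command-Line Options table. It is an illustration, not ZenScraper's actual parser. Treating --delay as a float is an assumption, because the table gives no unit.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scrape tweets from an X.com profile.")
    parser.add_argument("--username", required=True, help="X.com username to scrape")
    parser.add_argument("--type", choices=["tweets", "retweets", "bio", "all"], default="all")
    parser.add_argument("--output", help="output file (.json or .txt); defaults to <username>.json")
    parser.add_argument("--since-after", help="include tweets after this ISO 8601 date")
    parser.add_argument("--before", help="include tweets before this ISO 8601 date")
    parser.add_argument("--scrolls", type=int, default=30, help="number of scroll actions")
    parser.add_argument("--max", type=int, default=50, help="maximum tweets to retrieve")
    parser.add_argument("--no-headless", action="store_true", help="show the browser while scraping")
    parser.add_argument("--delay", type=float, default=2, help="throttling delay (unit assumed)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and fill in the documented default output file name."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = f"{args.username}.json"
    return args
```

argparse turns `--since-after` into the attribute `args.since_after` and `--no-headless` into `args.no_headless`.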
Example with Time Filters
Scrape tweets from the @elonmusk account within a specific date range:
python3 zenscraper.py --username elonmusk --since-after 2025-01-01T00:00:00 --before 2025-02-01T00:00:00 --type tweets --output elonmusk_jan.json --scrolls 40 --max 200
This command collects up to 200 original tweets posted in January 2025, using 40 scroll actions, and saves the output to elonmusk_jan.json.
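The Important Notes state that date options depend on X's search function, which is why they do not apply to retweets. Below is a minimal sketch of how the example's date window could be turned into an X search URL using the `from:`, `since:` and `until:` search operators. `build_search_url` is hypothetical. The `f=live` parameter (selecting the "Latest" results tab) is an assumption about X's web URLs, not something ZenScraper documents.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode


def build_search_url(username: str, since_after: Optional[str] = None,
                     before: Optional[str] = None) -> str:
    """Build an X search URL restricted to one account and an optional date window."""
    terms = [f"from:{username}"]
    # since:/until: take calendar dates, so the time part of the ISO 8601 values is dropped.
    if since_after:
        terms.append(f"since:{datetime.fromisoformat(since_after).date().isoformat()}")
    if before:
        terms.append(f"until:{datetime.fromisoformat(before).date().isoformat()}")
    # Assumption: f=live requests chronological ("Latest") results instead of "Top".
    return "https://x.com/search?" + urlencode({"q": " ".join(terms), "f": "live"})
```

For the example above, `build_search_url("elonmusk", "2025-01-01T00:00:00", "2025-02-01T00:00:00")` produces a query of `from:elonmusk since:2025-01-01 until:2025-02-01`.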
Command-Line Options
| Option | Description | Default Value |
| --------------- | -------------------------------------------- | ------------------- |
| --username | (Required) X.com username to scrape | - |
| --type | Content type: tweets, retweets, bio, or all | all |
| --output | Output file (.json or .txt) | <username>.json |
| --since-after | Include tweets after this date (ISO 8601) | None |
| --before | Include tweets before this date (ISO 8601) | None |
| --scrolls | Number of scroll actions | 30 |
| --max | Maximum tweets to retrieve | 50 |
| --no-headless | Display browser during scraping | Headless by default |
| --delay | Add delay for throttling | 2 |
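The --scrolls, --max and --delay options plausibly interact as in the sketch below: scroll the timeline up to a fixed number of times and stop early once enough unique tweets are collected. Two points are assumptions. First, --delay is treated as seconds between scrolls, because the table does not state a unit. Second, `extract` is a hypothetical callback that returns the tweets currently rendered on the page, each as a dict with an `id` key. `page.evaluate` is Playwright's method for running JavaScript in the page.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def collect_tweets(
    page,
    extract: Callable[[object], Awaitable[List[Dict]]],
    scrolls: int = 30,
    max_tweets: int = 50,
    delay: float = 2,
) -> List[Dict]:
    """Scroll up to `scrolls` times, stopping once `max_tweets` unique tweets are seen."""
    seen: Dict[str, Dict] = {}

    async def harvest() -> bool:
        # Deduplicate by tweet id, since a tweet can still be visible after a scroll.
        for tweet in await extract(page):
            seen.setdefault(tweet["id"], tweet)
            if len(seen) >= max_tweets:
                return True
        return False

    if await harvest():
        return list(seen.values())
    for _ in range(scrolls):
        # Scroll one viewport down, then pause so new tweets can load (throttling).
        await page.evaluate("window.scrollBy(0, window.innerHeight)")
        await asyncio.sleep(delay)
        if await harvest():
            break
    return list(seen.values())
```

Deduplicating by id matters because consecutive viewports overlap. Without it, --max would count the same tweet more than once.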
TODO
- Add functionality to expand full text for tweets and retweets (complete - 5/8/25)
- Add functionality to retrieve additional tweet data types (complete - 5/8/25)
- Add functionality to grab all user bio data (complete - 5/9/25)
- Add functionality to retrieve replies and thread them under their parent conversations
Important Notes
- A valid x_cookies.json file is required for authenticated scraping.
- Include multiple user-agent strings in user_agents.txt for request rotation.
- Date options do not currently work with retweets, because X's search function does not return retweets.
- The scraper is built on Playwright's asynchronous API, so browser operations are awaited without blocking.
- Use a backup X account for scraping rather than your main account, to avoid problems such as restrictions being placed on your primary account.
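User-agent rotation from user_agents.txt can be sketched as reading one string per line and picking one at random for each new browser context. The file format (one user agent per line, blank lines ignored) and the helper names are assumptions, not ZenScraper's documented behavior.

```python
import random
from pathlib import Path
from typing import List, Optional


def load_user_agents(path: str = "user_agents.txt") -> List[str]:
    """Read one user-agent string per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip()]


def pick_user_agent(agents: List[str], rng: Optional[random.Random] = None) -> str:
    """Choose a user agent at random; pass a seeded Random for reproducible picks."""
    if not agents:
        raise ValueError("no user agents available")
    return (rng or random).choice(agents)
```

With Playwright, the chosen string can be applied per context, e.g. `context = await browser.new_context(user_agent=pick_user_agent(agents))`.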
Contributing
Contributions are welcome! Open an issue or submit a pull request for improvements.
License
ZenScraper is licensed under the MIT License. See LICENSE for details.
