# BAScraper

An asynchronous Python Reddit API wrapper for fetching posts and comments from Reddit for data analytics. Utilizes PullPush and Arctic-Shift.
## Table of Contents
- Introduction
- Features
- Installation and Basic Usage
- Parameters
- Rate Limits and Performance
- Returned JSON Object Structure
> [!WARNING]
> Usage (classes and functions) has changed drastically, as has this README; the old docs are in `./BAScraper_old/README_old.md`. The new v0.2.x-a is only tested to the extent that I personally use it, so full-coverage testing has not been done. It also hasn't been published to PyPI (PyPI is on v0.1.2), so download manually for the newest v0.2-a, and please report any unexpected issues.
An API wrapper for PullPush.io and Arctic Shift, the third-party replacement APIs for Reddit. Nothing special.

After the 2023 Reddit API controversy, PushShift.io (and wrappers such as PSAW and PMAW) became available only to Reddit admins, and Reddit's PRAW is honestly impractical when trying to get a lot of data, or data from a specific timeframe. This project aims to help with that, since these third-party services didn't have any official or unofficial Python wrappers.
## Features
- Asynchronous operations for better performance (updated from the old multithreaded approach).
- Support for PullPush.io and Arctic Shift APIs.
- Parameter customization for subreddit, comment, and submission searches.
- Integrated rate-limit management.
- Parameter schemes for data selection.
Also, please respect cool-down times and refrain from requesting very large amounts of data; it stresses the server and can cause inconvenience for everyone.

For large amounts of data, head to Arctic Shift's academic torrent zst dumps.
Links to the services:
## Installation and Basic Usage

You can install the package via pip:

```
pip install BAScraper
```

Python 3.12+ is required.
### Usage Example

```python
from BAScraper.BAScraper_async import PullPushAsync, ArcticShiftAsync
import asyncio

ppa = PullPushAsync(log_stream_level="DEBUG", task_num=2)
asa = ArcticShiftAsync(log_stream_level="DEBUG", task_num=10)

async def test1():
    print('TEST 1-1 - PullPushAsync basic fetching')
    result1 = await ppa.fetch(
        mode='submissions',
        subreddit='cars',
        get_comments=True,
        after='2024-07-01',
        before='2024-07-01T06:00:00',
        file_name='test1-1'
    )
    print('test 1 len:', len(result1))

    print('\nTEST 1-2 - PullPushAsync basic comment fetching')
    result2 = await ppa.fetch(
        mode='comments',
        subreddit='cars',
        after='2024-07-01',
        before='2024-07-01T06:00:00',
        file_name='test1-2'
    )
    print('test 2 len:', len(result2))

async def test2():
    print('TEST 2-1 - ArcticShiftAsync basic fetching')
    result1 = await asa.fetch(
        mode='submissions_search',
        subreddit='cars',
        # get_comments=True,  # can be uncommented to fetch comments
        after='2024-07-01',
        before='2024-07-05T03:00:00',
        file_name='test2-1',
        fields=['created_utc', 'title', 'url', 'id'],
        limit=0  # auto
    )
    print('test 1 len:', len(result1))

    print('\nTEST 2-2 - ArcticShiftAsync basic comment fetching')
    result2 = await asa.fetch(
        mode='comments_search',
        subreddit='cars',
        body='bmw honda benz',
        after='2024-07-01',
        before='2024-07-01T12:00:00',
        file_name='test2-2',
        limit=100,
        fields=['created_utc', 'body', 'id'],
    )
    print('test 2 len:', len(result2))

    print('\nTEST 2-3 - ArcticShiftAsync subreddits_search')
    result3 = await asa.fetch(
        mode='subreddits_search',
        subreddit_prefix='what',
        file_name='test2-3',
        limit=1000
    )
    print('test 3 len:', len(result3))

if __name__ == '__main__':
    if input('test pullpush?: ') == 'y':
        asyncio.run(test1())
    if input('test arcticshift?: ') == 'y':
        asyncio.run(test2())

# all results are saved as JSON files named after the `file_name` arguments.
# they are saved in the current directory since `save_dir` wasn't specified.
```
> [!NOTE]
> When making multiple requests (as with multiple calls on `PullPushAsync`), it is highly recommended to make them through the same instance, because all the request-pool-related variables are shared in that case. Also, the pools recording request status are reset every time a script is re-run, so unexpected soft/hard rate limits may occur when frequently (re-)running scripts. Consider waiting a few seconds or minutes before re-running scripts if needed.
## Parameters

For more info on each of the parameters, as well as additional info (TOS, extra tools, etc.), visit the following links:
### Initialization Parameters

For `PullPushAsync.__init__` & `ArcticShiftAsync.__init__`:
| Parameter | Type | Restrictions | Required | Default Value | Notes |
|--------------------|-------|------------------------------------------------------------------------------------------|----------|-----------------------------------|-----------------------------------------------------------|
| sleep_sec | int | Positive int | No | 1 | Cooldown time between each request. |
| backoff_sec | int | Positive int | No | 3 | Backoff time for each failed request. |
| max_retries | int | Positive int | No | 5 | Number of retries for failed requests before it gives up. |
| timeout | int | Positive int | No | 10 | Time until it's considered as timeout error. |
| pace_mode | str | One of 'auto-soft', 'auto-hard', 'manual' | No | 'auto-hard' | Sets the pace to mitigate rate-limiting. |
| save_dir | str | Valid path | No | os.getcwd() (current directory) | Directory to save the results. |
| task_num | int | Positive int | No | 3 | Number of async tasks to be made. |
| log_stream_level | str | One of ['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | No | 'INFO' | Sets the log level for logs streamed on the terminal. |
| log_level | str | Same as log_stream_level | No | 'DEBUG' | Sets the log level for logging (file). |
| duplicate_action | str | One of 'keep_newest', 'keep_oldest', 'remove', 'keep_original', 'keep_removed' | No | 'keep_newest' | Decides handling of duplicates. |
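As a sketch of how these initialization parameters might be combined (the parameter names and allowed values are taken from the table above; the commented-out constructor call assumes BAScraper is installed):

```python
# Hypothetical configuration for PullPushAsync.__init__ / ArcticShiftAsync.__init__;
# every key below corresponds to a row in the table above.
init_kwargs = dict(
    sleep_sec=2,                     # 2 s cooldown between requests (default: 1)
    backoff_sec=5,                   # wait 5 s after a failed request (default: 3)
    max_retries=3,                   # give up after 3 failed retries (default: 5)
    pace_mode='auto-soft',           # one of 'auto-soft', 'auto-hard', 'manual'
    save_dir='./results',            # directory where result JSON files are written
    task_num=5,                      # number of concurrent async tasks
    log_stream_level='INFO',         # log level for terminal output
    duplicate_action='keep_oldest',  # keep the oldest copy of duplicate entries
)

# with BAScraper installed, this would construct the client:
# from BAScraper.BAScraper_async import PullPushAsync
# ppa = PullPushAsync(**init_kwargs)
```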
### Fetch Parameters (`fetch`)

`PullPushAsync.fetch` common parameters:
| Parameter | Type | Restrictions | Required | Notes |
|-------------|--------|-------------------------------------------------------------|----------|-------------------------------------------|
| q | str | Quoted string for phrases | No | Search query for comments or submissions. |
| ids | list | Maximum length: 100 | No | List of IDs to fetch. |
| size | int | Must be <= 100 | No | Number of results to return. |
| sort | str | Must be one of "asc", "desc" | No | Sorting order. |
| sort_type | str | Must be one of "score", "num_comments", "created_utc" | No | Sorting criteria. |
| author | str | None | No | Filter by author. |
| subreddit   | str    | None                                                        | No       | Filter by subreddit.                      |
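Assuming the constraints in the table above, a call combining the common fetch parameters might look like this (a sketch only; the author name is hypothetical, and the call itself is commented out since it needs a live client and network access):

```python
# Hypothetical keyword arguments for PullPushAsync.fetch, built from the
# common-parameter table above.
fetch_kwargs = dict(
    mode='comments',
    subreddit='cars',
    q='"electric vehicle"',  # quoted string searches for the exact phrase
    author='example_user',   # hypothetical author name, for illustration
    size=100,                # must be <= 100
    sort='desc',             # 'asc' or 'desc'
    sort_type='score',       # 'score', 'num_comments', or 'created_utc'
)

# with a live client this would run as:
# result = await ppa.fetch(**fetch_kwargs)
```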