Discrawl
CLI for Discord with a SQLite backend
discrawl 🛰️ — Mirror Discord into SQLite; search server history locally
discrawl mirrors Discord guild data into local SQLite so you can search, inspect, and query server history without depending on Discord search.
It is a bot-token crawler. No user-token hacks. Data stays local.
What It Does
- discovers every guild the configured bot can access
- syncs channels, threads, members, and message history into SQLite
- maintains FTS5 search indexes for fast local text search
- builds an offline member directory from archived profile payloads
- extracts small text-like attachments into the local search index
- records structured user and role mentions for direct querying
- tails Gateway events for live updates, with periodic repair syncs
- exposes read-only SQL for ad hoc analysis
- keeps schema multi-guild ready while preserving a simple single-guild default UX
Search defaults to all guilds. sync and tail default to the configured default guild when one exists, otherwise they fan out to all discovered guilds.
Requirements
- Go 1.26+
- a Discord bot token with access to the target guilds
- bot permissions for the channels you want archived
Discord Bot Setup
discrawl needs a real bot token. Not a user token.
Minimum practical setup:
- Create or reuse a Discord application in the Discord developer portal.
- Add a bot user to that application.
- Invite the bot to the target guilds.
- Enable these intents for the bot:
  - Server Members Intent
  - Message Content Intent
- Ensure the bot can at least:
  - view channels
  - read message history
Without those intents/permissions, sync, tail, member snapshots, or message content archiving will be partial or fail.
Bot Token Sources
Token resolution:
- OpenClaw config, if discord.token_source is not env
- DISCORD_BOT_TOKEN or the configured discord.token_env
discrawl accepts either the raw token text or a value prefixed with "Bot " and normalizes the prefix automatically.
Fastest env-only path:
export DISCORD_BOT_TOKEN="your-bot-token"
discrawl doctor
discrawl init
If you keep shell secrets in ~/.profile, add:
export DISCORD_BOT_TOKEN="your-bot-token"
Then reload your shell before running discrawl.
If you already use OpenClaw, discrawl can reuse the Discord token from ~/.openclaw/openclaw.json by default.
Default runtime paths:
- config: ~/.discrawl/config.toml
- database: ~/.discrawl/discrawl.db
- cache: ~/.discrawl/cache/
- logs: ~/.discrawl/logs/
Install
Homebrew (recommended):
brew install steipete/tap/discrawl # auto-taps steipete/tap
discrawl --version
Build from source:
git clone https://github.com/steipete/discrawl.git
cd discrawl
go build -o bin/discrawl ./cmd/discrawl
./bin/discrawl --version
Examples below assume discrawl is on PATH. If you built from source without installing it, replace discrawl with ./bin/discrawl.
Quick Start
Reuse an existing OpenClaw Discord bot config:
discrawl init --from-openclaw ~/.openclaw/openclaw.json
discrawl doctor
discrawl sync --full
discrawl search "panic: nil pointer"
discrawl tail
Multi-account OpenClaw setup:
discrawl init --from-openclaw ~/.openclaw/openclaw.json --account atlas
Env-only setup:
export DISCORD_BOT_TOKEN="..."
discrawl doctor
discrawl init
discrawl sync --full
init discovers accessible guilds and writes ~/.discrawl/config.toml. If exactly one guild is available, that guild becomes the default automatically.
doctor is the fastest sanity check:
- confirms config can be loaded
- shows where the token was resolved from
- verifies bot auth
- shows how many guilds the bot can access
- verifies DB + FTS wiring
Commands
init
Creates the local config and discovers accessible guilds.
discrawl init
discrawl init --from-openclaw ~/.openclaw/openclaw.json
discrawl init --from-openclaw ~/.openclaw/openclaw.json --account atlas
discrawl init --guild 123456789012345678
discrawl init --db ~/data/discrawl.db
When OpenClaw config tokens use ${ENV_VAR} placeholders, init and doctor resolve them before auth.
sync
Backfills guild state into SQLite.
discrawl sync --full
discrawl sync --full --all
discrawl sync --guild 123456789012345678
discrawl sync --guilds 123,456 --concurrency 8
discrawl sync --channels 111,222 --since 2026-03-01T00:00:00Z
sync already uses parallel channel workers. --concurrency overrides the default, and the default is auto-sized from GOMAXPROCS with a floor of 8 and a cap of 32.
--all ignores default_guild_id and fans out across every discovered guild the bot can access.
When --channels includes a forum channel id, discrawl expands that forum's threads and syncs their messages as part of the targeted run.
--since limits initial history/bootstrap and full-history backfill to messages at or after the given RFC3339 timestamp. It does not mark older history as complete, so a later sync --full without --since can continue the backfill.
Long runs now emit periodic progress logs to stderr so large backfills do not look hung.
If in-flight channels stop completing for a while, discrawl now emits "message sync waiting" heartbeat logs with the oldest active channel, per-channel page activity, and skip/defer counters, and every run ends with a "message sync finished" summary.
Each channel crawl also has a bounded runtime budget, so a pathological channel is deferred and retried on the next sync instead of pinning a worker forever.
Member refresh during a full sync is best-effort: when the caller supplies no deadline, it currently gives up after five minutes, so message sync completion is not held hostage by a slow guild member crawl.
When the archive is already complete, sync --full now reuses the stored backlog markers and limits steady-state refresh to live top-level channels plus active threads instead of revisiting every stored archived thread.
If a guild already has a local member snapshot, routine syncs reuse it and skip another full member crawl until that snapshot ages out.
tail
Runs the live Gateway tail and periodic repair loop.
discrawl tail
discrawl tail --guild 123456789012345678
discrawl tail --repair-every 30m
search
Runs FTS search over archived messages.
discrawl search "panic: nil pointer"
discrawl search --guild 123456789012345678 "payment failed"
discrawl search --channel billing --author steipete --limit 50 "invoice"
discrawl search --include-empty "GitHub"
discrawl --json search "websocket closed"
By default, search skips rows with no searchable content. Attachment text, attachment filenames, embeds, and replies still count as content. Use --include-empty to opt back in.
Search returns the newest matching messages first so large local archives stay responsive.
messages
Lists exact message slices by channel, author, and time range.
discrawl messages --channel maintainers --days 7 --all
discrawl messages --channel maintainers --hours 6 --all
discrawl messages --channel "#maintainers" --since 2026-03-01T00:00:00Z
discrawl messages --channel 1456744319972282449 --author steipete --limit 50
discrawl messages --channel maintainers --last 100 --sync
discrawl messages --channel maintainers --days 7 --all --include-empty
discrawl --json messages --channel maintainers --days 3
Notes:
- --channel accepts a channel id, exact name, #name, or partial name match
- --hours is shorthand for "since now minus N hours"
- --days is shorthand for "since now minus N days"
- --last returns the newest N matching messages, then prints them oldest-to-newest
- --all removes the safety limit; default is 200
- --sync runs a blocking pre-query sync for the matching channel or guild scope before reading the local DB
- rows with no displayable/searchable content are skipped by default; --include-empty opts back in
- at least one filter is required
mentions
Lists structured user and role mentions.
discrawl mentions --channel maintainers --days 7
discrawl mentions --target steipete --type user --limit 50
discrawl mentions --target 1456406468898197625
discrawl --json mentions --type role --days 1
Notes:
- --target accepts an id, exact name, or partial name match
- --type can be user or role
- same guild/time filters as messages
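Lookups like --target try progressively looser matches. A Go sketch of one plausible order (exact id, then exact name, then partial name) is below; the precedence and case-insensitive comparison are assumptions, and member, isSnowflake, and resolveTarget are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// member is a hypothetical (id, name) pair from the local archive.
type member struct {
	ID   string
	Name string
}

// isSnowflake reports whether s is non-empty and all ASCII digits, like a Discord ID.
func isSnowflake(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// resolveTarget tries an exact id, then an exact name, then partial name
// matches, returning nil for an empty query.
func resolveTarget(query string, members []member) []member {
	q := strings.TrimSpace(query)
	if q == "" {
		return nil
	}
	if isSnowflake(q) {
		for _, m := range members {
			if m.ID == q {
				return []member{m}
			}
		}
	}
	lq := strings.ToLower(q)
	var exact, partial []member
	for _, m := range members {
		name := strings.ToLower(m.Name)
		switch {
		case name == lq:
			exact = append(exact, m)
		case strings.Contains(name, lq):
			partial = append(partial, m)
		}
	}
	if len(exact) > 0 {
		return exact
	}
	return partial
}

func main() {
	members := []member{{"1456406468898197625", "steipete"}, {"2", "steipete-bot"}, {"3", "alice"}}
	fmt.Println(resolveTarget("steipete", members))  // [{1456406468898197625 steipete}]
	fmt.Println(len(resolveTarget("stei", members))) // 2
}
```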
sql
Runs read-only SQL against the local database.
discrawl sql 'select count(*) as messages from messages'
echo 'select guild_id, count(*) from messages group by guild_id' | discrawl sql -
members
discrawl members list
discrawl members show 123456789012345678
discrawl members show --messages 10 steipete
discrawl members search "peter"
discrawl members search "github"
discrawl members search "https://github.com/steipete"
Notes:
- search matches names plus any offline profile fields present in the archived member payload
- show accepts a user id or query; if it resolves to one member, it also shows recent messages
- extracted profile fields may include bio, pronouns, location, website, x, github, and discovered URLs
- if the bot cannot see a field from Discord, discrawl cannot invent it; this is strictly archive-based offline data
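Because search works on archived payloads only, it amounts to matching a query against whatever fields happen to be stored. A Go sketch of a case-insensitive substring match over names, field values, and URLs follows; the profile struct and matches function are hypothetical, and matching on field values (not field names) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// profile is a hypothetical view of one archived member payload.
type profile struct {
	Username string
	Display  string
	Fields   map[string]string // e.g. bio, pronouns, location, website, x, github
	URLs     []string
}

// matches does a case-insensitive substring search over names, extracted
// profile field values, and discovered URLs. Absent fields simply never match.
func matches(p profile, query string) bool {
	q := strings.ToLower(strings.TrimSpace(query))
	if q == "" {
		return false
	}
	candidates := []string{p.Username, p.Display}
	for _, v := range p.Fields {
		candidates = append(candidates, v)
	}
	candidates = append(candidates, p.URLs...)
	for _, c := range candidates {
		if strings.Contains(strings.ToLower(c), q) {
			return true
		}
	}
	return false
}

func main() {
	p := profile{
		Username: "steipete",
		Display:  "Peter Steinberger",
		Fields:   map[string]string{"github": "steipete", "bio": "Builds native apps and tooling."},
		URLs:     []string{"https://steipete.me", "https://github.com/steipete"},
	}
	fmt.Println(matches(p, "peter"), matches(p, "github"), matches(p, "design engineer")) // true true false
}
```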
Typical workflow:
discrawl sync --full
discrawl members search "design engineer"
discrawl members search "github"
discrawl members show --messages 25 steipete
discrawl messages --author steipete --days 30 --all
Typical members show output:
guild=1456350064065904867
user=37658261826043904
username=steipete
display=Peter Steinberger
joined=2026-03-08T16:03:14Z
bot=false
x=steipete
github=steipete
website=https://steipete.me
bio=Builds native apps and tooling.
urls=https://steipete.me, https://github.com/steipete
message_count=1284
first_message=2026-02-01T09:00:00Z
last_message=2026-03-08T15:59:58Z
