# arroba

Python implementation of Bluesky PDS and AT Protocol, including data repository, Merkle search tree, and XRPC methods.
You can build your own PDS on top of arroba with just a few lines of Python and run it in any WSGI server. You can build a more involved PDS with custom logic and behavior. Or you can build a different ATProto service, eg an AppView, relay (née BGS), or something entirely new!
Install from PyPI with `pip install arroba`.
Arroba is the Spanish word for the @ character ("at sign").
License: This project is placed in the public domain. You may also use it under the CC0 License.
## Usage
Here's a minimal example of a multi-repo PDS on top of arroba and Flask:

```python
from flask import Flask
from google.cloud import ndb
from lexrpc.flask_server import init_flask

from arroba import server
from arroba.datastore_storage import DatastoreStorage
from arroba.firehose import send_events

# for Google Cloud Datastore
ndb_client = ndb.Client()
server.storage = DatastoreStorage(ndb_client=ndb_client)
server.repo.callback = lambda _: send_events()  # to subscribeRepos

app = Flask('my-pds')
init_flask(server.server, app)

def ndb_context_middleware(wsgi_app):
    def wrapper(environ, start_response):
        with ndb_client.context():
            return wsgi_app(environ, start_response)
    return wrapper

app.wsgi_app = ndb_context_middleware(app.wsgi_app)
```
See app.py for a more comprehensive example, including a CORS handler for `OPTIONS` preflight requests and a catch-all `app.bsky.*` XRPC handler that proxies requests to the AppView.
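For a rough idea of what such a catch-all proxy can look like, here's an illustrative sketch, not app.py's actual code. It assumes the `requests` library, forwards only GET queries, and reads the AppView hostname from the `APPVIEW_HOST` environment variable described under Configuration:

```python
import os

import requests
from flask import Flask, request

app = Flask('my-pds')
APPVIEW_HOST = os.environ.get('APPVIEW_HOST', 'api.bsky-sandbox.dev')

@app.route('/xrpc/app.bsky.<path:rest>')
def proxy_appview(rest):
    """Forward unhandled app.bsky.* XRPC queries to the AppView."""
    resp = requests.get(
        f'https://{APPVIEW_HOST}/xrpc/app.bsky.{rest}',
        params=request.args,
        # pass through the client's auth header, if any
        headers={'Authorization': request.headers.get('Authorization', '')},
    )
    return (resp.content, resp.status_code,
            {'Content-Type': resp.headers.get('Content-Type', 'application/json')})
```

A real handler would also forward POST procedures and more headers; this only shows the routing pattern.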
## Overview
Arroba consists of these parts:

- Data structures: repo, MST (Merkle search tree)
- Storage:
  - `Storage` abstract base class
  - `DatastoreStorage` (uses Google Cloud Datastore)
  - TODO: filesystem storage
- XRPC handlers: `com.atproto.repo`, `com.atproto.server`, `com.atproto.sync`
- Utilities:
  - `did`: create and resolve `did:plc`s, `did:web`s, and domain handles
  - `diff`: find the deterministic minimal difference between two `MST`s
  - `util`: miscellaneous utilities for TIDs, AT URIs, signing and verifying signatures, generating JWTs, encoding/decoding, and more
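As background for the TID utilities mentioned above: per the AT Protocol spec, a TID (timestamp identifier) is a 13-character base32-sortable encoding of a 64-bit integer, microseconds since the Unix epoch in the high bits and a 10-bit clock id in the low bits. Here's an illustrative encoder of that layout; it is not arroba's own `util` API:

```python
# base32-sortable alphabet used by AT Protocol TIDs
S32_CHARS = '234567abcdefghijklmnopqrstuvwxyz'

def s32encode(num: int) -> str:
    """Encode a non-negative integer as base32-sortable."""
    encoded = ''
    while num > 0:
        encoded = S32_CHARS[num & 0x1f] + encoded
        num >>= 5
    return encoded or S32_CHARS[0]

def tid(micros: int, clock_id: int = 0) -> str:
    """Pack microseconds-since-epoch and a 10-bit clock id into a TID."""
    return s32encode((micros << 10) | (clock_id & 0x3ff)).rjust(13, S32_CHARS[0])
```

Because the alphabet is in ASCII order and the width is fixed, TIDs sort lexicographically in timestamp order, which is what makes them usable as record keys.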
## Configuration
Configure arroba with these environment variables:
- `APPVIEW_HOST`, default `api.bsky-sandbox.dev`
- `RELAY_HOST`, default `bgs.bsky-sandbox.dev`
- `PLC_HOST`, default `plc.bsky-sandbox.dev`
- `PDS_HOST`, where you're running your PDS
Optional, only used in `com.atproto.repo`, `.server`, and `.sync` XRPC handlers:

- `REPO_TOKEN`, static token to use as both `accessJwt` and `refreshJwt`, defaults to contents of `repo_token` file. Not required to be an actual JWT. If not set, XRPC methods that require auth will return HTTP 501 Not Implemented.
- `ROLLBACK_WINDOW`, number of events to serve in the `subscribeRepos` rollback window, as an integer. Defaults to 50k.
- `PRELOAD_WINDOW`, number of events to preload into the `subscribeRepos` rollback window at startup, as an integer. Defaults to 4k.
- `SUBSCRIBE_REPOS_BATCH_DELAY`, minimum time to wait between datastore queries in `com.atproto.sync.subscribeRepos`, in seconds, as a float. Defaults to 0 if unset.
- `SUBSCRIBE_REPOS_SKIPPED_SEQ_WINDOW`, number of sequence numbers to wait before skipping emitting a missing one over the firehose. Defaults to 300, ie 5 minutes at 1 qps emitted events.
- `SUBSCRIBE_REPOS_SKIPPED_SEQ_DELAY`, seconds to wait before skipping emitting a missing sequence number over the firehose. Defaults to 120, ie 2 minutes.
- `BLOB_MAX_BYTES`, maximum allowed size of blobs, in bytes. Defaults to 100 MB.
- `BLOB_REFETCH_DAYS`, how often, in days, to refetch remote URL-based blobs in the datastore to check that they're still serving. May be integer or float. Defaults to 7. These re-fetches happen on demand, during `com.atproto.sync.getBlob` requests.
- `BLOB_REFETCH_TYPES`, comma-separated list of MIME types (without subtypes, ie the part after the `/`) to refetch blobs for. Defaults to `image`.
- `MEMCACHE_SEQUENCE_BATCH`, integer, size of batch of sequence numbers to allocate from `AtpSequence` into memcache. Defaults to 1000.
- `MEMCACHE_SEQUENCE_BUFFER`, integer, how close we should let memcache get to the current max allocated sequence number in `AtpSequence` before we allocate a new batch. Defaults to 100.
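The two `MEMCACHE_SEQUENCE_*` variables govern a batch-allocation pattern: the authoritative counter lives in the datastore, and blocks of sequence numbers are handed to memcache so that most allocations are a single fast atomic increment. A simplified in-memory sketch of that pattern, with hypothetical names and plain attributes standing in for the datastore and memcache, not arroba's actual `MemcacheSequences`:

```python
import threading

BATCH = 1000   # cf. MEMCACHE_SEQUENCE_BATCH
BUFFER = 100   # cf. MEMCACHE_SEQUENCE_BUFFER

class BatchedSequence:
    """Hands out sequence numbers from a fast local counter,
    refilling it in batches from a slow authoritative store."""

    def __init__(self):
        self.lock = threading.Lock()
        self.authoritative_max = 0  # stand-in for the datastore's AtpSequence row
        self.next_seq = 0           # stand-in for the memcache counter

    def allocate(self) -> int:
        with self.lock:
            # refill before the fast counter gets within BUFFER of the ceiling
            if self.next_seq + BUFFER >= self.authoritative_max:
                self.authoritative_max += BATCH  # one slow datastore write
            self.next_seq += 1                   # fast path: atomic increment
            return self.next_seq
```

Under this scheme the slow store is written roughly once per `BATCH` allocations instead of once per allocation, which is why it reduces datastore contention at high commit rates.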
## Changelog
### 2.1 - unreleased
- `datastore_storage`:
  - `read_blocks_by_seq`: set explicit 30s timeout on datastore query. Evidently, in rare cases, datastore queries can hang indefinitely if they don't have an explicit timeout (snarfed/bridgy-fed#2367).
- `did`:
  - `write_plc` etc: add new optional `new_rotation_key` kwarg. Accept `EllipticCurvePublicKey` for `signing_key` as well as `EllipticCurvePrivateKey`.
### 2.0 - 2026-02-07
Breaking changes:
- When creating a new repo, the first commit is now always empty. `Repo.create_from_commit` has been removed; all repos should now be created with `Repo.create`.
- Removed `Repo.apply_writes`, `format_commit`, `apply_commit`, and `writes_to_commit_ops`. Use the new `Storage.commit` method instead.
- If no sequence number has ever been allocated for a given NSID, `Storage.last_seq` now returns `None`, and does not initialize the sequence.
Non-breaking changes:
- Add new feature to allocate sequence numbers from memcache, atomically, backed by the datastore in batches. Reduces datastore contention when writing commits at 5-10 qps and higher. Enable by passing a `MemcacheSequences` to the `DatastoreStorage` constructor; configure with the `MEMCACHE_SEQUENCE_BUFFER` and `MEMCACHE_SEQUENCE_BATCH` environment variables.
- Add new `SUBSCRIBE_REPOS_SKIPPED_SEQ_WINDOW` and `SUBSCRIBE_REPOS_SKIPPED_SEQ_DELAY` environment variables for `subscribeRepos` (firehose) serving.
- `AtpRemoteBlob`:
  - Add `repos` property to track which repos have which blobs.
  - Switch image handling to pymediainfo, drop Pillow dependency.
- `did`:
  - Add new `rollback_plc` function.
  - `resolve_handle`: raise `ValueError` if the handle has an `_` (underscore), since Bluesky handles don't allow them.
- `firehose`:
  - Omit `prevData` in initial commits instead of setting it to `null`. (It's not a nullable field in `subscribeRepos#commit`.)
- `repo`:
  - Add `lost_seq` kwarg to repo callback for marking sequence numbers lost.
- `storage`:
  - Add new abstract `Sequences` class and concrete subclasses `MemorySequences`, `DatastoreSequences`, and `MemcacheSequences`.
  - Add new optional `sequences` kwarg to `Storage` and subclasses' constructors.
- `xrpc_repo`:
  - `describe_repo`: add `app.bsky.graph.listblock`.
- `xrpc_sync`:
  - `get_blob`: periodically check remote blobs with HTTP GET requests to see if they're still serving.
  - `get_record`: include MST covering proof blocks for record.
  - Implement `listBlobs`.
  - `subscribeRepos`/`fire
