Arroba

Python implementation of Bluesky PDS and AT Protocol, including repo, MST, and sync XRPC methods

Python implementation of Bluesky PDS and AT Protocol, including data repository, Merkle search tree, and XRPC methods.

You can build your own PDS on top of arroba with just a few lines of Python and run it in any WSGI server. You can build a more involved PDS with custom logic and behavior. Or you can build a different ATProto service, eg an AppView, relay (née BGS), or something entirely new!

Install from PyPI with pip install arroba.

Arroba is the Spanish word for the @ character ("at sign").

License: This project is placed in the public domain. You may also use it under the CC0 License.

Usage

Here's minimal example code for a multi-repo PDS on top of arroba and Flask:

from flask import Flask
from google.cloud import ndb
from lexrpc.flask_server import init_flask

from arroba import server
from arroba.datastore_storage import DatastoreStorage
from arroba.firehose import send_events

# for Google Cloud Datastore
ndb_client = ndb.Client()

server.storage = DatastoreStorage(ndb_client=ndb_client)
server.repo.callback = lambda _: send_events()  # to subscribeRepos

app = Flask('my-pds')
init_flask(server.server, app)

def ndb_context_middleware(wsgi_app):
    def wrapper(environ, start_response):
        with ndb_client.context():
            return wsgi_app(environ, start_response)
    return wrapper

app.wsgi_app = ndb_context_middleware(app.wsgi_app)

See app.py for a more comprehensive example, including a CORS handler for OPTIONS preflight requests and a catch-all app.bsky.* XRPC handler that proxies requests to the AppView.
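The ndb_context_middleware wrapper above is a plain WSGI middleware, so the pattern can be tried without Google Cloud or Flask. The sketch below is only an illustration: fake_context is a hypothetical stand-in for ndb_client.context(), and hello_app, context_middleware, and call are made-up names, not part of arroba.

```python
from contextlib import contextmanager
from wsgiref.util import setup_testing_defaults

entered = []  # records enter/exit events so the wrapping is observable


@contextmanager
def fake_context():
    """Hypothetical stand-in for ndb_client.context()."""
    entered.append('enter')
    try:
        yield
    finally:
        entered.append('exit')


def context_middleware(wsgi_app, context_factory):
    """Wrap a WSGI app so each request runs inside context_factory()."""
    def wrapper(environ, start_response):
        with context_factory():
            return wsgi_app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Tiny WSGI app used only for this demonstration."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


app = context_middleware(hello_app, fake_context)


def call(wsgi_app):
    """Invoke a WSGI app once with a default test environ."""
    environ = {}
    setup_testing_defaults(environ)
    statuses = []
    body = b''.join(wsgi_app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], body
```

Note that the context is exited as soon as the wrapped app returns, so a response body that is generated lazily would be produced outside of it.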

Overview

Arroba consists of these parts:

Configuration

Configure arroba with these environment variables:

  • APPVIEW_HOST, default api.bsky-sandbox.dev
  • RELAY_HOST, default bgs.bsky-sandbox.dev
  • PLC_HOST, default plc.bsky-sandbox.dev
  • PDS_HOST, where you're running your PDS

Optional, only used in com.atproto.repo, .server, and .sync XRPC handlers:

  • REPO_TOKEN, static token to use as both accessJwt and refreshJwt, defaults to contents of repo_token file. Not required to be an actual JWT. If not set, XRPC methods that require auth will return HTTP 501 Not Implemented.
  • ROLLBACK_WINDOW, number of events to serve in the subscribeRepos rollback window, as an integer. Defaults to 50k.
  • PRELOAD_WINDOW, number of events to preload into the subscribeRepos rollback window at startup, as an integer. Defaults to 4k.
  • SUBSCRIBE_REPOS_BATCH_DELAY, minimum time to wait between datastore queries in com.atproto.sync.subscribeRepos, in seconds, as a float. Defaults to 0 if unset.
  • SUBSCRIBE_REPOS_SKIPPED_SEQ_WINDOW, number of sequence numbers to wait before skipping emitting a missing one over the firehose. Defaults to 300, ie 5 minutes at 1qps emitted events.
  • SUBSCRIBE_REPOS_SKIPPED_SEQ_DELAY, seconds to wait before skipping emitting a missing sequence number over the firehose. Defaults to 120, ie 2 minutes.
  • BLOB_MAX_BYTES, maximum allowed size of blobs, in bytes. Defaults to 100MB.
  • BLOB_REFETCH_DAYS, how often in days to refetch remote URL-based blobs in the datastore to check that they're still serving. May be integer or float. Defaults to 7. These re-fetches happen on demand, during com.atproto.sync.getBlob requests.
  • BLOB_REFETCH_TYPES, comma-separated list of top-level MIME types (ie without the subtype, the part after /) to refetch blobs for. Defaults to image.
  • MEMCACHE_SEQUENCE_BATCH, integer, size of batch of sequence numbers to allocate from AtpSequence into memcache. Defaults to 1000.
  • MEMCACHE_SEQUENCE_BUFFER, integer, how close we should let memcache get to the current max allocated sequence number in AtpSequence before we allocate a new batch. Defaults to 100.
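As a rough illustration of how the optional variables above and their documented defaults fit together, here is a minimal sketch that parses them from an environment mapping. It is not arroba's own configuration code: load_config, should_refetch, and DEFAULTS are hypothetical names, 50k and 4k are read as 50000 and 4000, and 100MB is assumed to mean 100 * 1000 * 1000 bytes.

```python
import os


def _csv(value):
    """Split a comma-separated string into a list of trimmed, non-empty items."""
    return [part.strip() for part in value.split(',') if part.strip()]


# (parser, default) pairs taken from the list above. The byte value for
# BLOB_MAX_BYTES is an assumption; the README only says "100MB".
DEFAULTS = {
    'ROLLBACK_WINDOW': (int, 50_000),
    'PRELOAD_WINDOW': (int, 4_000),
    'SUBSCRIBE_REPOS_BATCH_DELAY': (float, 0.0),
    'SUBSCRIBE_REPOS_SKIPPED_SEQ_WINDOW': (int, 300),
    'SUBSCRIBE_REPOS_SKIPPED_SEQ_DELAY': (float, 120.0),
    'BLOB_MAX_BYTES': (int, 100 * 1000 * 1000),
    'BLOB_REFETCH_DAYS': (float, 7.0),
    'BLOB_REFETCH_TYPES': (_csv, ['image']),
    'MEMCACHE_SEQUENCE_BATCH': (int, 1000),
    'MEMCACHE_SEQUENCE_BUFFER': (int, 100),
}


def load_config(environ=os.environ):
    """Return a dict of parsed settings, falling back to the documented defaults."""
    config = {}
    for name, (parse, default) in DEFAULTS.items():
        raw = environ.get(name)
        config[name] = default if raw in (None, '') else parse(raw)
    return config


def should_refetch(mime_type, refetch_types):
    """True if the part of mime_type before the / is one of refetch_types."""
    return mime_type.split('/')[0] in refetch_types
```

For example, load_config({'BLOB_REFETCH_TYPES': 'image, video'}) would yield ['image', 'video'] for that setting, and should_refetch('video/mp4', ['image', 'video']) would be true.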
<!-- Only used in app.py: * `REPO_DID`, repo user's DID, defaults to contents of `repo_did` file * `REPO_HANDLE`, repo user's domain handle, defaults to `did:plc:*.json` file * `REPO_PASSWORD`, repo user's password, defaults to contents of `repo_password` file * `REPO_PRIVKEY`, repo user's private key in PEM format, defaults to contents of `privkey.pem` file -->
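The interplay between MEMCACHE_SEQUENCE_BATCH and MEMCACHE_SEQUENCE_BUFFER can be pictured with a toy, single-process model. This is only a sketch of the general batch-allocation idea under stated assumptions, not arroba's implementation, which allocates atomically through memcache. BatchedSequence and all of its attributes are hypothetical names: durable_max stands in for the max allocated value stored in AtpSequence, and cached stands in for the counter held in memcache.

```python
class BatchedSequence:
    """Toy model of handing out sequence numbers from durably reserved batches."""

    def __init__(self, batch=1000, buffer=100):
        self.batch = batch        # how many numbers to reserve per durable write
        self.buffer = buffer      # how close cached may get to durable_max
        self.durable_max = 0      # highest number reserved in durable storage
        self.cached = 0           # last number handed out from the fast counter
        self.durable_writes = 0   # counts the (expensive) durable reservations

    def _reserve_batch(self):
        """Reserve another batch; stands in for a datastore write."""
        self.durable_max += self.batch
        self.durable_writes += 1

    def next(self):
        """Return the next sequence number, reserving a new batch if needed."""
        # Reserve ahead of time once we are within `buffer` of the durable max,
        # so the fast counter never runs past what has been reserved.
        if self.durable_max - self.cached <= self.buffer:
            self._reserve_batch()
        self.cached += 1
        return self.cached
```

With batch=10 and buffer=3, allocating 20 numbers in this model touches the durable store three times rather than twenty, which is the kind of contention reduction batching is meant to provide.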

Changelog

2.1 - unreleased

  • datastore_storage:
    • read_blocks_by_seq: set explicit 30s timeout on datastore query. Apparently, in rare cases, datastore queries without an explicit timeout can hang indefinitely (snarfed/bridgy-fed#2367).
  • did:
    • write_plc etc: add new optional new_rotation_key kwarg. Accept EllipticCurvePublicKey for signing_key as well as EllipticCurvePrivateKey.

2.0 - 2026-02-07

Breaking changes:

  • When creating a new repo, the first commit is now always empty. Repo.create_from_commit has been removed; all repos should now be created with Repo.create.
  • Removed Repo.apply_writes, format_commit, apply_commit, and writes_to_commit_ops. Use the new Storage.commit method instead.
  • If no sequence number has ever been allocated for a given NSID, Storage.last_seq now returns None, and does not initialize the sequence.

Non-breaking changes:

  • Add new feature to allocate sequence numbers from memcache, atomically, backed by the datastore in batches. Reduces datastore contention when writing commits at 5-10qps and higher. Enable by passing a MemcacheSequences to the DatastoreStorage constructor; configure with the MEMCACHE_SEQUENCE_BUFFER and MEMCACHE_SEQUENCE_BATCH environment variables.
  • Add new SUBSCRIBE_REPOS_SKIPPED_SEQ_WINDOW and SUBSCRIBE_REPOS_SKIPPED_SEQ_DELAY environment variables for subscribeRepos (firehose) serving.
  • AtpRemoteBlob:
    • Add repos property to track which repos have which blobs.
    • Switch image handling to pymediainfo, drop Pillow dependency.
  • did:
  • firehose:
    • Omit prevData in initial commits instead of setting it to null. (It's not a nullable field in subscribeRepos#commit.)
  • repo:
    • Add lost_seq kwarg to repo callback for marking sequence numbers lost.
  • storage:
    • Add new abstract Sequences class and concrete subclasses MemorySequences, DatastoreSequences, and MemcacheSequences.
    • Add new optional sequences kwarg to Storage and subclasses' constructors.
  • xrpc_repo:
    • describe_repo: add app.bsky.graph.listblock.
  • xrpc_sync:
    • get_blob: periodically check remote blobs with HTTP GET requests to see if they're still serving.
    • get_record: include MST covering proof blocks for record.
    • Implement listBlobs.
    • subscribeRepos/`fire