Gaia: A decentralized high-performance storage system
This document describes the high-level design and implementation of the Gaia storage system, which is also briefly explained at docs.stacks.co. It includes specifications for backend storage drivers and for interactions between developer APIs and the Gaia service.
Developers who wish to use the Gaia storage system should see the stacks.js documentation here and in particular the storage package here.
Instructions on setting up, configuring and testing a Gaia Hub can be found here and here.
Overview
Gaia works by hosting data in one or more existing storage systems of the user's choice. These storage systems are typically cloud storage systems. We currently have driver support for S3 and Azure Blob Storage, but the driver model allows for other backend support as well. The point is, the user gets to choose where their data lives, and Gaia enables applications to access it via a uniform API.
Blockstack applications use the Gaia storage system to store data on behalf of a user. When the user logs in to an application, the authentication process gives the application the URL of a Gaia hub, which performs writes on behalf of that user. The Gaia hub authenticates writes to a location by requiring a valid authentication token, generated by a private key authorized to write at that location.
User Control: How is Gaia Decentralized?
Gaia's approach to decentralization focuses on user control of data and storage. If a user can choose which Gaia hub and which backend provider to store data with, then that is all the decentralization required to enable user-controlled applications.
In Gaia, the control of user data lies in the way that user data is accessed. When an application
fetches a file data.txt for a given user alice.id, the lookup will follow these steps:
- Fetch the zonefile for alice.id, and read her profile URL from that zonefile
- Fetch Alice's profile and verify that it is signed by alice.id's key
- Read the application root URL (e.g. https://gaia.alice.org/) out of the profile
- Fetch the file from https://gaia.alice.org/data.txt
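The lookup steps above can be sketched as follows. This is a minimal illustration, not the stacks.js implementation: the three callables (`read_profile_url`, `fetch_verified_profile`, `read_app_root`) are hypothetical stand-ins for zonefile resolution, profile fetching with signature verification, and profile parsing, none of which are specified in this document.

```python
from typing import Callable


def lookup_file_url(
    name: str,
    filename: str,
    read_profile_url: Callable[[str], str],
    fetch_verified_profile: Callable[[str, str], dict],
    read_app_root: Callable[[dict], str],
) -> str:
    """Follow the four lookup steps and return the URL the file is read from."""
    # Step 1: resolve the name's zonefile to a profile URL (hypothetical helper).
    profile_url = read_profile_url(name)
    # Step 2: fetch the profile; the helper is expected to raise if the
    # profile is not signed by the name's key.
    profile = fetch_verified_profile(profile_url, name)
    # Step 3: read the application root URL out of the profile.
    app_root = read_app_root(profile)
    # Step 4: the file lives directly under the application root.
    return app_root.rstrip("/") + "/" + filename


# Usage with stubbed helpers; the profile layout here is an assumption.
url = lookup_file_url(
    "alice.id",
    "data.txt",
    read_profile_url=lambda name: "https://example.com/alice/profile.json",
    fetch_verified_profile=lambda url, name: {"app_root": "https://gaia.alice.org/"},
    read_app_root=lambda profile: profile["app_root"],
)
print(url)
```

Because every step goes through data that Alice controls, replacing the zonefile entry or the profile entry changes where the final read lands without any change to the application.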
Because alice.id controls her zonefile, she can change where her profile is stored
if the current storage of the profile is compromised. Similarly, if Alice wishes to change
her Gaia provider, or run her own Gaia node, she can change the entry in her profile.
Applications writing directly on behalf of Alice do not need to perform this lookup. Instead, the stacks.js authentication flow provides Alice's chosen application root URL to the application. This authentication flow is also within Alice's control, because the authentication response must be generated by Alice's browser.
While many Gaia hubs will use backend providers like AWS or Azure, users can easily operate their own hubs and select different backend providers (and we'd like to implement more backend drivers). This enables truly user-controlled data while preserving high performance and high availability for data reads and writes.
Write-to and Read-from URL Guarantees
A performance- and simplicity-oriented guarantee of the Gaia
specification is that when an application submits a write to a URL
https://myhub.service.org/store/foo/bar, the application is guaranteed to
be able to read from a URL https://myreads.com/foo/bar. While the
prefix of the read-from URL may change between the two, the suffix
must be the same as the write-to URL.
This allows an application to know exactly where a written file can be read from, given the read prefix. To obtain that read prefix, the Gaia service defines an endpoint:
GET /hub_info/
which returns a JSON object with a read_url_prefix.
For example, if my service returns:
{ ...,
"read_url_prefix": "https://myservice.org/read/"
}
I know that if I submit a write request to:
https://myservice.org/store/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json
I will be able to read that file from:
https://myservice.org/read/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json
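The prefix swap described above can be sketched as a small helper. This is an illustration only: the function name `read_url_for` is hypothetical, and it assumes the write path is marked by a `/store/` segment, as in the examples in this section.

```python
def read_url_for(write_url: str, read_url_prefix: str, store_marker: str = "/store/") -> str:
    """Map a write-to URL onto its read-from URL by swapping the prefix."""
    _head, sep, suffix = write_url.partition(store_marker)
    if not sep:
        raise ValueError(f"write URL does not contain {store_marker!r}")
    # Keep the suffix (<address>/<file path>) and attach it to the read prefix.
    return read_url_prefix.rstrip("/") + "/" + suffix


# hub_info stands for the JSON object returned by GET /hub_info/.
hub_info = {"read_url_prefix": "https://myservice.org/read/"}
write_url = "https://myservice.org/store/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json"
print(read_url_for(write_url, hub_info["read_url_prefix"]))
```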
Address-based Access-Control
Access control in a Gaia storage hub is performed on a per-address
basis. Writes to URLs /store/<address>/<file> are only allowed if
the writer can demonstrate that they control that address. This is
achieved via an authentication token, which is a message signed by
the private key associated with that address. The message itself is a
challenge text, returned via the /hub_info/ endpoint.
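As a rough sketch of the per-address rule, the check below splits a write path into its address and file parts and compares the address with the one proven by the token. The names `parse_store_path` and `may_write` are hypothetical, and `authenticated_address` is assumed to come from a token the hub has already verified; verification itself is not shown.

```python
from typing import Tuple


def parse_store_path(path: str) -> Tuple[str, str]:
    """Split '/store/<address>/<file>' into (address, file)."""
    prefix = "/store/"
    if not path.startswith(prefix):
        raise ValueError("not a store path")
    address, sep, filename = path[len(prefix):].partition("/")
    if not address or not sep or not filename:
        raise ValueError("expected /store/<address>/<file>")
    return address, filename


def may_write(path: str, authenticated_address: str) -> bool:
    """Allow the write only if the token's address owns the target location."""
    address, _filename = parse_store_path(path)
    return address == authenticated_address
```

Note that the file part may itself contain slashes, so only the first path segment after /store/ is treated as the address.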
V1 Authentication Scheme
The V1 authentication scheme uses a JWT, prefixed with v1: as a
bearer token in the HTTP authorization field. The expected JWT payload
structure is:
{
  'type': 'object',
  'properties': {
    'iss': { 'type': 'string' },
    'exp': { 'type': 'IntDate' },
    'iat': { 'type': 'IntDate' },
    'gaiaChallenge': { 'type': 'string' },
    'associationToken': { 'type': 'string' },
    'salt': { 'type': 'string' }
  },
  'required': [ 'iss', 'gaiaChallenge' ]
}
In addition to iss, exp, and gaiaChallenge claims, clients may
add other properties (e.g., a salt field) to the payload, and they will
not affect the validity of the JWT. Rather, the validity of the JWT is checked
by ensuring:
- That the JWT is signed correctly by verifying with the pubkey hex provided as iss.
- That iss matches the address associated with the bucket.
- That gaiaChallenge is equal to the server's challenge text.
- That the epoch time exp is greater than the server's current epoch time.
- That the epoch time iat (issued-at date) is greater than the bucket's revocation date (only if such a date has been set by the bucket owner).
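The claim checks in the list above can be sketched with the standard library alone. This is a simplified illustration rather than the hub's code: it deliberately omits signature verification and the iss-to-address comparison, both of which need an elliptic-curve library, and the function names are hypothetical. Two assumptions are marked in the comments: exp is only checked when present (the schema does not require it), and a token without iat is rejected once a revocation date is set.

```python
import base64
import json
import time
from typing import List, Optional


def decode_v1_payload(bearer_token: str) -> dict:
    """Strip the 'v1:' prefix and decode the JWT payload. Does NOT verify the signature."""
    if not bearer_token.startswith("v1:"):
        raise ValueError("expected a token prefixed with 'v1:'")
    segments = bearer_token[len("v1:"):].split(".")
    if len(segments) != 3:
        raise ValueError("a JWT has exactly three dot-separated segments")
    payload_b64 = segments[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def failed_claim_checks(
    payload: dict,
    server_challenge: str,
    now: Optional[float] = None,
    revocation_time: Optional[float] = None,
) -> List[str]:
    """Return a description of every claim check that fails (empty list = all pass)."""
    now = time.time() if now is None else now
    failures = []
    if "iss" not in payload:
        failures.append("iss is required")
    if payload.get("gaiaChallenge") != server_challenge:
        failures.append("gaiaChallenge does not match the server's challenge text")
    # Assumption: exp is optional in the schema, so it is only checked when present.
    if "exp" in payload and payload["exp"] <= now:
        failures.append("exp is not later than the current time")
    # Assumption: once a revocation date is set, a token without iat is rejected.
    if revocation_time is not None and payload.get("iat", float("-inf")) <= revocation_time:
        failures.append("iat is not later than the bucket's revocation date")
    return failures
```

Returning a list of failures instead of a single boolean makes it easier to report why a write was refused.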
Association Tokens
The association token specification is considered private, as it is mostly used for internal Gaia use cases. This means that this specification can change or become deprecated in the future.
Often, a single user will use many different keys to store data. These keys may be generated on the fly. Instead of requiring the user to explicitly whitelist each key, the v1 authentication scheme allows the user to bind a key to an already-whitelisted key via an association token.
An association token is a JWT signed by a whitelisted key that, in turn,
contains the public key that signs the authentication JWT that contains it. Put
another way, the Gaia hub will accept a v1 authentication JWT if it contains an
associationToken JWT that (1) was signed by a whitelisted address, and (2)
identifies the signer of the authentication JWT.
The association token JWT has the following structure in its payload:
{
  'type': 'object',
  'properties': {
    'iss': { 'type': 'string' },
    'exp': { 'type': 'IntDate' },
    'iat': { 'type': 'IntDate' },
    'childToAssociate': { 'type': 'string' },
    'salt': { 'type': 'string' }
  },
  'required': [ 'iss', 'exp', 'childToAssociate' ]
}
Here, the iss field should be the public key of a whitelisted address.
The childToAssociate field should be equal to the iss field of the authentication
JWT. Note that the exp field is required in association tokens.
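The relationship between the two payloads can be sketched as below. This is an illustration under stated assumptions, not the hub's implementation: both payloads are taken to be already decoded and signature-verified, `address_of` is a hypothetical callable that turns a public key into its address, and the expiry comparison is assumed to work the same way as for the authentication JWT.

```python
from typing import Callable, Iterable


def association_accepted(
    auth_payload: dict,
    assoc_payload: dict,
    whitelist: Iterable[str],
    address_of: Callable[[str], str],
    now: float,
) -> bool:
    """Check that an association token binds the authentication JWT's signer
    to a whitelisted address. Signature verification is out of scope here."""
    # iss, exp and childToAssociate are all required in an association token.
    if any(claim not in assoc_payload for claim in ("iss", "exp", "childToAssociate")):
        return False
    # Assumption: an association token is rejected once its exp has passed.
    if assoc_payload["exp"] <= now:
        return False
    # The association token must name the key that signed the authentication JWT.
    if assoc_payload["childToAssociate"] != auth_payload.get("iss"):
        return False
    # The association token's issuer must correspond to a whitelisted address.
    return address_of(assoc_payload["iss"]) in set(whitelist)
```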
Legacy authentication scheme
In more detail, this signed message is:
BASE64({ "signature" : ECDSA_SIGN(SHA256(challenge-text)),
"publickey" : PUBLICKEY_HEX })
Currently, challenge-text must match the known challenge-text on the Gaia storage hub. However, as future work enables more extensible forms of authentication, we could extend this to allow the auth token to include the challenge-text as well, which the Gaia storage hub would then also need to validate.
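The shape of the legacy token can be sketched as follows. Several details here are assumptions rather than facts from this document: that the object is serialized as JSON before base64 encoding, that the signature is carried as a string, and that standard (not URL-safe) base64 is used. The `ecdsa_sign` callable is a hypothetical stand-in for a real signer, since the Python standard library has no ECDSA implementation.

```python
import base64
import hashlib
import json
from typing import Callable


def legacy_auth_token(
    challenge_text: str,
    publickey_hex: str,
    ecdsa_sign: Callable[[bytes], str],
) -> str:
    """Build BASE64({signature, publickey}) over SHA256(challenge-text)."""
    # Hash the challenge text; the signer receives the raw 32-byte digest.
    digest = hashlib.sha256(challenge_text.encode("utf-8")).digest()
    message = {"signature": ecdsa_sign(digest), "publickey": publickey_hex}
    # Assumption: the object is JSON-serialized, then base64-encoded.
    return base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
```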
Data storage format
A Gaia storage hub will store the written data exactly as
given. This means that the storage hub does not provide many
different kinds of guarantees about the data. It does not ensure that
data is validly formatted, contains valid signatures, or is
encrypted. Rather, the design philosophy is that these concerns are
client-side concerns. Client libraries (such as stacks.js) are
capable of providing these guarantees, and we use a liberal definition of
the end-to-end principle to guide this design decision.
Operation of a Gaia Hub
Configuration files
A TOML or JSON configuration file should either be stored in the top-level
directory of the hub server, or its location may be specified in the environment
variable CONFIG_PATH.
An example configuration file is provided in ./hub/config.sample.json. You can specify the logging level, the number of social proofs required for addresses to write to the system, the backend driver, the credentials for that backend driver, and the readURL for the storage provider.
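The file lookup described above can be sketched as follows. This is a hedged illustration of the resolution order only, not the hub's own loader: the default file name `config.json`, the function name `load_hub_config`, and the choice to let CONFIG_PATH take precedence over the top-level directory are all assumptions.

```python
import json
import os
from pathlib import Path


def load_hub_config(hub_dir: str, default_name: str = "config.json") -> dict:
    """Load the hub configuration from CONFIG_PATH if set, else from hub_dir."""
    config_path = os.environ.get("CONFIG_PATH")
    # Assumption: an explicit CONFIG_PATH wins over the top-level directory.
    path = Path(config_path) if config_path else Path(hub_dir) / default_name
    if path.suffix == ".toml":
        import tomllib  # part of the standard library from Python 3.11

        with path.open("rb") as f:
            return tomllib.load(f)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```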
Private hubs
A private hub services requests for a single user. This is controlled via whitelisting the addresses allowed to write files. In order to support application storage, because each application uses a different app- and user-specific address, each application you wish to use must be added to the whitelist separately.
Alternatively, the user's client must use the v1 authentication scheme and generate an association token for each app. The user should whitelist her address, and use her associated private key to sign each app's association token. This removes the need to whitelist each application, but with the caveat