DataLoader
DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.
A port of the "Loader" API originally developed by @schrockn at Facebook in 2010 as a simplifying force to coalesce the sundry key-value store back-end APIs which existed at the time. At Facebook, "Loader" became one of the implementation details of the "Ent" framework, a privacy-aware data entity loading and caching layer within web server product code. This ultimately became the underpinning for Facebook's GraphQL server implementation and type definitions.
DataLoader is a simplified version of this original idea implemented in JavaScript for Node.js services. DataLoader is often used when implementing a graphql-js service, though it is also broadly useful in other situations.
This mechanism of batching and caching data requests is certainly not unique to Node.js or JavaScript; it is also the primary motivation for Haxl, Facebook's data loading library for Haskell. More about how Haxl works can be read in this blog post.
DataLoader is provided so that it may be useful not just to build GraphQL services for Node.js but also as a publicly available reference implementation of this concept in the hopes that it can be ported to other languages. If you port DataLoader to another language, please open an issue to include a link from this repository.
Getting Started
First, install DataLoader using npm.
npm install --save dataloader
To get started, create a DataLoader. Each DataLoader instance represents a
unique cache. Typically, instances are created per request when used within a
web server like express if different users can see different things.
Note: DataLoader assumes a JavaScript environment with global ES6
Promise and Map classes, available in all supported versions of Node.js.
Batching
Batching is not an advanced feature; it's DataLoader's primary feature. Create loaders by providing a batch loading function.
const DataLoader = require('dataloader');
const userLoader = new DataLoader(keys => myBatchGetUsers(keys));
A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values (or Error instances; see Batch Function below).
Then load individual values from the loader. DataLoader will coalesce all individual loads which occur within a single frame of execution (a single tick of the event loop) and then call your batch function with all requested keys.
// In one part of your application
(async () => {
  const user = await userLoader.load(1);
  const invitedBy = await userLoader.load(user.invitedByID);
  console.log(`User 1 was invited by ${invitedBy}`);
})();
// Elsewhere in your application
(async () => {
  const user = await userLoader.load(2);
  const lastInvited = await userLoader.load(user.lastInvitedID);
  console.log(`User 2 last invited ${lastInvited}`);
})();
A naive application may have issued four round-trips to a backend for the required information, but with DataLoader this application will make at most two.
DataLoader allows you to decouple unrelated parts of your application without sacrificing the performance of batch data-loading. While the loader presents an API that loads individual values, all concurrent requests will be coalesced and presented to your batch loading function. This lets you safely distribute data fetching requirements throughout your application while keeping outgoing data requests to a minimum.
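The coalescing described above can be pictured with a minimal sketch. This is not DataLoader's source: TinyBatcher is a hypothetical class, and it assumes queueMicrotask is an acceptable stand-in for DataLoader's own scheduling (described under Batch Scheduling). Each load() queues its key, the first load() of a tick schedules a single dispatch, and an Error instance in the batch result rejects only the load() for that key.

```javascript
// Minimal batching sketch (hypothetical TinyBatcher; not DataLoader's source).
// Loads issued in the same tick are queued and sent to batchFn as one Array.
class TinyBatcher {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Only the first load of a batch schedules the dispatch.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  async dispatch() {
    // Take the current queue so later loads start a new batch.
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map(entry => entry.key));
      batch.forEach((entry, i) => {
        // An Error instance rejects only the load() for its own key.
        if (values[i] instanceof Error) entry.reject(values[i]);
        else entry.resolve(values[i]);
      });
    } catch (err) {
      // If the whole batch fails, every load() in it fails.
      batch.forEach(entry => entry.reject(err));
    }
  }
}
```

With this sketch, calling load(1) and load(2) in the same tick produces a single batchFn call with the keys [1, 2].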
Batch Function
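A batch loading function accepts an Array of keys and returns a Promise which
resolves to an Array of values or Error instances. When the batch function is a
regular function (not an arrow function), the loader itself is available as this.
const DataLoader = require('dataloader');
// db.fetchAllKeys is a placeholder for your own back end; it is assumed to
// resolve to an object of results indexed by key.
async function batchFunction(keys) {
  const results = await db.fetchAllKeys(keys);
  // One entry per key, in key order; a missing result becomes an Error.
  return keys.map(key => results[key] ?? new Error(`No result for ${key}`));
}
const loader = new DataLoader(batchFunction);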
There are a few constraints this function must uphold:
- The Array of values must be the same length as the Array of keys.
- Each index in the Array of values must correspond to the same index in the Array of keys.
For example, if your batch function was provided the Array of keys: [ 2, 9, 6, 1 ],
and loading from a back-end service returned the values:
{ id: 9, name: 'Chicago' }
{ id: 1, name: 'New York' }
{ id: 2, name: 'San Francisco' }
Our back-end service returned results in a different order than we requested, likely
because it was more efficient for it to do so. Also, it omitted a result for key 6,
which we can interpret as no value existing for that key.
To uphold the constraints of the batch function, it must return an Array of values
the same length as the Array of keys, and re-order them to ensure each index aligns
with the original keys [ 2, 9, 6, 1 ]:
[
{ id: 2, name: 'San Francisco' },
{ id: 9, name: 'Chicago' },
null, // or perhaps `new Error()`
{ id: 1, name: 'New York' },
];
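The re-ordering step can be written once and reused. The sketch below assumes each back-end row carries an id field equal to the key it was requested by; alignToKeys is a hypothetical helper, not part of DataLoader, and it fills missing keys with null as in the array above.

```javascript
// Hypothetical helper: re-order back-end rows so index i matches keys[i].
// Assumes each row has an `id` property equal to the key it was loaded by.
function alignToKeys(keys, rows) {
  const byId = new Map(rows.map(row => [row.id, row]));
  // Keys with no matching row become null (a batch function could use an Error instead).
  return keys.map(key => (byId.has(key) ? byId.get(key) : null));
}

const rows = [
  { id: 9, name: 'Chicago' },
  { id: 1, name: 'New York' },
  { id: 2, name: 'San Francisco' },
];
console.log(alignToKeys([2, 9, 6, 1], rows));
```

Because Map compares keys with SameValueZero, a string key such as '2' will not match a numeric id of 2, so keys and ids need to share a type.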
Batch Scheduling
By default DataLoader will coalesce all individual loads which occur within a
single frame of execution before calling your batch function with all requested
keys. This ensures no additional latency while capturing many related requests
into a single batch. In fact, this is the same behavior used in Facebook's
original PHP implementation in 2010. See enqueuePostPromiseJob in the
source code for more details about how this works.
However, sometimes this behavior is not desirable or optimal. Perhaps you expect
requests to be spread out over a few subsequent ticks because of an existing use
of setTimeout, or you just want manual control over dispatching regardless of
the run loop. DataLoader allows providing a custom batch scheduler to provide
these or any other behaviors.
A custom scheduler is provided as batchScheduleFn in options. It must be a
function which is passed a callback and is expected to call that callback in the
immediate future to execute the batch request.
As an example, here is a batch scheduler which collects all requests over a 100ms window of time (and as a consequence, adds 100ms of latency):
const myLoader = new DataLoader(myBatchFn, {
  batchScheduleFn: callback => setTimeout(callback, 100),
});
As another example, here is a manually dispatched batch scheduler:
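function createScheduler() {
  let callbacks = [];
  return {
    schedule(callback) {
      callbacks.push(callback);
    },
    dispatch() {
      // Swap first so callbacks scheduled during dispatch are kept for the next call.
      const pending = callbacks;
      callbacks = [];
      pending.forEach(callback => callback());
    },
  };
}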
const { schedule, dispatch } = createScheduler();
const myLoader = new DataLoader(myBatchFn, { batchScheduleFn: schedule });
myLoader.load(1);
myLoader.load(2);
dispatch();
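To see what the scheduler controls, here is a simplified sketch of the loader side of the contract. ScheduledBatcher is hypothetical and not DataLoader's source: the first load() of a batch hands a dispatch callback to batchScheduleFn, and the batch function runs only when that callback is invoked. The queueMicrotask default is an assumption made for the sketch.

```javascript
// Simplified sketch: a loader that delegates "when to dispatch" to a scheduler.
class ScheduledBatcher {
  constructor(batchFn, { batchScheduleFn = cb => queueMicrotask(cb) } = {}) {
    this.batchFn = batchFn;
    this.schedule = batchScheduleFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // First key of a new batch: ask the scheduler to run dispatch later.
      if (this.queue.length === 1) this.schedule(() => this.dispatch());
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map(e => e.key));
      batch.forEach((e, i) => e.resolve(values[i]));
    } catch (err) {
      batch.forEach(e => e.reject(err));
    }
  }
}
```

Passing the schedule function from createScheduler() above as batchScheduleFn would hold each batch until dispatch() is called, while setting it to callback => setTimeout(callback, 100) would delay each batch by 100ms.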
Caching
DataLoader provides a memoization cache for all loads which occur in a single
request to your application. After .load() is called once with a given key,
the resulting value is cached to eliminate redundant loads.
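A sketch of that memoization, assuming keys can be compared as Map keys: the first load() of a key stores its promise, and later load() calls for the same key return that stored promise without queuing the key again. MemoBatcher is hypothetical and not DataLoader's source.

```javascript
// Hypothetical sketch: batching plus a per-instance memoization cache.
class MemoBatcher {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of value
    this.queue = [];        // pending { key, resolve, reject } entries
  }

  load(key) {
    // A repeated key returns the promise created by the first load().
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) queueMicrotask(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map(e => e.key));
      batch.forEach((e, i) => e.resolve(values[i]));
    } catch (err) {
      batch.forEach(e => e.reject(err));
    }
  }
}
```

Because the promise is stored rather than the resolved value, a second load(1) issued before the batch finishes still shares the first request.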
Caching Per-Request
DataLoader caching does not replace Redis, Memcache, or any other shared
application-level cache. DataLoader is first and foremost a data loading mechanism,
and its cache only serves the purpose of not repeatedly loading the same data in
the context of a single request to your application. To do this, it maintains a
simple in-memory memoization cache (more accurately: .load() is a memoized function).
Avoid sharing a DataLoader instance across requests from different users, which could result in cached data incorrectly appearing in each request. Typically, DataLoader instances are created when a request begins and are not used once the request ends.
For example, when using it with express:
const express = require('express');
const DataLoader = require('dataloader');
// genUsers, authenticateUser and renderPage stand in for your own application code.
function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
  };
}
const app = express();
app.get('/', async function (req, res) {
  const authToken = authenticateUser(req);
  // New loaders, and therefore a new cache, for every request.
  const loaders = createLoaders(authToken);
  // await works whether renderPage returns a value or a Promise.
  res.send(await renderPage(req, loaders));
});
app.listen();
Caching and Batching
Subsequent calls to .load() with the same key will result in that key not
appearing in the keys provided to your batch function. However, the resulting
Promise will still wait on the current batch to complete. This way both cached
and uncached requests will resolve at the same time, allowing DataLoader
optimizations for subsequent dependent loads.
In the example below, User 1 happens to be cached. However, because User 1
and 2 are loaded in the same tick, they will resolve at the same time. This
means both user.bestFriendID loads will also happen in the same tick which
results in two total requests (the same as if User 1 had not been cached).
userLoader.prime(1, { bestFriendID: 3 });
async function getBestFriend(userID) {
  const user = await userLoader.load(userID);
  return await userLoader.load(user.bestFriendID);
}
// In one part of your application
getBestFriend(1);
// Elsewhere
getBestFriend(2);
Without this optimization, if the cached User 1 resolved immediately, this
could result in three total requests since each user.bestFriendID load would
happen at different times.
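One way to get this behavior is to have a cache hit return a new promise that is released only when the current batch dispatches. The sketch below is hypothetical (PrimingBatcher is not DataLoader's source) and schedules dispatch with setImmediate, an assumption chosen so that every pending promise continuation runs before the next batch is sent.

```javascript
// Hypothetical sketch (not DataLoader's source): cache hits are held back
// until the current batch dispatches, so cached and uncached loads settle together.
class PrimingBatcher {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of value
    this.queue = [];        // uncached { key, resolve, reject } entries
    this.cacheHits = [];    // callbacks that release waiting cache hits
    this.scheduled = false;
  }

  prime(key, value) {
    if (!this.cache.has(key)) this.cache.set(key, Promise.resolve(value));
    return this;
  }

  load(key) {
    this.scheduleDispatch();
    const cached = this.cache.get(key);
    if (cached) {
      // Cache hit: settle with the cached promise only when the batch runs.
      return new Promise(resolve => this.cacheHits.push(() => resolve(cached)));
    }
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  scheduleDispatch() {
    if (this.scheduled) return;
    this.scheduled = true;
    // Assumption: setImmediate lets all pending promise jobs finish first.
    setImmediate(() => this.dispatch());
  }

  async dispatch() {
    this.scheduled = false;
    const batch = this.queue;
    const hits = this.cacheHits;
    this.queue = [];
    this.cacheHits = [];
    if (batch.length > 0) {
      try {
        const values = await this.batchFn(batch.map(e => e.key));
        batch.forEach((e, i) => e.resolve(values[i]));
      } catch (err) {
        batch.forEach(e => e.reject(err));
      }
    }
    // Release cache hits alongside the batch's own results.
    hits.forEach(release => release());
  }
}
```

For instance, with user 1 primed and user 2 uncached, calling getBestFriend(1) and getBestFriend(2) together against this sketch sends [2] in the first batch and both best-friend IDs in the second, two batch calls in total.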
Clearing Cache
In certain uncommon cases, clearing the request cache may be necessary.
The most common case is after a mutation or update within the same request, when a cached value could be out of date and future loads should not use any possibly cached value.
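After such a mutation, the stale entry has to be dropped so the next load() goes back to the data source. DataLoader provides clear(key) and clearAll() for this; the hypothetical ClearableLoader below only imitates their effect on a Map-based cache, and leaves batching out to keep the focus on invalidation.

```javascript
// Hypothetical sketch: a memoizing loader with DataLoader-style clear/clearAll.
// Batching is omitted to keep the focus on cache invalidation.
class ClearableLoader {
  constructor(fetchOne) {
    this.fetchOne = fetchOne; // async (key) => value
    this.cache = new Map();   // key -> Promise of value
  }

  load(key) {
    if (!this.cache.has(key)) this.cache.set(key, this.fetchOne(key));
    return this.cache.get(key);
  }

  clear(key) {
    this.cache.delete(key); // the next load(key) fetches again
    return this;            // returned so calls can be chained
  }

  clearAll() {
    this.cache.clear();
    return this;
  }
}
```

Within a request, a write followed by loader.clear(id).load(id) returns the updated record, whereas a plain loader.load(id) would still return the memoized, stale one.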
