HaxlSharp

Automatically concurrent data fetching and request deduplication in C#.

A C# implementation of Haxl for composable data fetching with automatic concurrency and request deduplication. Not affiliated with Facebook in any way!

Quick start

Install from nuget: https://www.nuget.org/packages/HaxlSharp

Before you can use the library, you'll need to write a thin layer to integrate your existing data sources with HaxlSharp; see the Integration section, or check out an example application using HaxlFetch here.

Once that's done, you can write your data fetches in a sequential style, and the framework will automatically perform requests as concurrently as possible and deduplicate them.

What's wrong with async/ await?

Async/ await is great for writing sequential-looking code when you're only waiting for a single asynchronous request at a time, allowing us to write code without worrying about asynchronicity. But we often want to combine information from multiple data sources, like different calls on the same API, or multiple remote APIs.

The async/ await abstraction breaks down in these situations (and JavaScript's async/await is no different). To illustrate, let's say we have a blogging site, where a post's metadata and content are retrieved using separate API calls. We could use async/ await to fetch both pieces of information:

public async Task<PostDetails> GetPostDetails(int postId)
{
    var postInfo = await FetchPostInfo(postId);
    var postContent = await FetchPostContent(postId);
    return new PostDetails(postInfo, postContent);
}

Here, we're making two successive await calls, which means execution is suspended at the first request (FetchPostInfo), and the second request (FetchPostContent) only starts once the first has completed.

But FetchPostContent doesn't require the result of FetchPostInfo, which means we could have started both requests concurrently! The "correct" way to write it is:

var postInfoTask = FetchPostInfo(postId);
var postContentTask = FetchPostContent(postId);
return new PostDetails(await postInfoTask, await postContentTask);

But now we are dealing with tasks instead of their values; it's up to the programmer to ensure the task is awaited as late as possible. Async/ await is a good abstraction for asynchronous code, but writing concurrent code requires us to mix code that describes what we want to fetch with how we want to fetch it.
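The difference between the two orderings is easy to measure. The following is a minimal sketch, assuming two hypothetical stub methods that simulate 200 ms of latency each with Task.Delay; the names mirror the ones above, but the bodies are stand-ins and not real API calls.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AwaitOrderingDemo
{
    // Hypothetical stand-ins for the two API calls; each simulates 200 ms of latency.
    static async Task<string> FetchPostInfo(int postId)
    {
        await Task.Delay(200);
        return $"info {postId}";
    }

    static async Task<string> FetchPostContent(int postId)
    {
        await Task.Delay(200);
        return $"content {postId}";
    }

    // Awaits each call before starting the next, so the delays add up.
    public static async Task<string> Sequential(int postId)
    {
        var info = await FetchPostInfo(postId);
        var content = await FetchPostContent(postId);
        return info + " / " + content;
    }

    // Starts both calls before awaiting either, so the delays overlap.
    public static async Task<string> Concurrent(int postId)
    {
        var infoTask = FetchPostInfo(postId);
        var contentTask = FetchPostContent(postId);
        var info = await infoTask;
        var content = await contentTask;
        return info + " / " + content;
    }

    // Measures the wall-clock time of one run in milliseconds.
    static async Task<long> TimeAsync(Func<Task<string>> run)
    {
        var stopwatch = Stopwatch.StartNew();
        await run();
        return stopwatch.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine($"sequential: {await TimeAsync(() => Sequential(1))} ms");
        Console.WriteLine($"concurrent: {await TimeAsync(() => Concurrent(1))} ms");
    }
}
```

Both methods return the same value; the sequential version should take roughly twice as long as the concurrent one, since its two delays cannot overlap.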

Composing async methods

To make matters worse, we can easily call our inefficient GetPostDetails method in a way that compounds the oversequentialization:

public async Task<IEnumerable<PostDetails>> LatestPostContent()
{
  var latest = await GetTwoLatestPostIds();
  var first = await GetPostDetails(latest.Item1);
  var second = await GetPostDetails(latest.Item2);
  return new List<PostDetails>{first, second};
}

This code will sequentially execute four calls that could have been executed concurrently! We should actually write our code like this:

var latest = await GetTwoLatestPostIds();
var first = GetPostDetails(latest.Item1);
var second = GetPostDetails(latest.Item2);
return new List<PostDetails> { await first, await second };

What's wrong with Task.WhenAll/ Promise.all?

We can add concurrency manually, but only by giving up the sequential-looking style that makes no distinction between async values and "normal" values. In practice, this means dealing with both tasks and their awaited values, and sprinkling our code with Task.WhenAll.
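For reference, here is a small sketch of what the Task.WhenAll style looks like for the latest-posts example. The stub methods and their string results are hypothetical placeholders; only Task.WhenAll itself is the real .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stubs standing in for real API calls.
    static async Task<(int, int)> GetTwoLatestPostIds()
    {
        await Task.Delay(50);
        return (0, 1);
    }

    static async Task<string> GetPostDetails(int postId)
    {
        await Task.Delay(50);
        return $"details {postId}";
    }

    public static async Task<List<string>> LatestPostDetails()
    {
        // This await is unavoidable: the ids are needed to build the next requests.
        var latest = await GetTwoLatestPostIds();

        // Start both tasks before awaiting either, then wait for both together.
        // Task.WhenAll returns the results in the same order as the input tasks.
        var tasks = new[] { GetPostDetails(latest.Item1), GetPostDetails(latest.Item2) };
        string[] details = await Task.WhenAll(tasks);
        return details.ToList();
    }

    public static async Task Main()
    {
        var details = await LatestPostDetails();
        Console.WriteLine(string.Join(", ", details)); // details 0, details 1
    }
}
```

Note how the method body now juggles an array of tasks alongside plain values, which is the bookkeeping the rest of this section argues against.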

But hang on, async/await was designed to solve these problems:

  • Writing asynchronous code is error-prone
  • Asynchronous code obscures the meaning of what we're trying to achieve
  • Programmers are bad at reasoning about asynchronous code

Giving up our sequential abstraction means these exact problems have reemerged in the context of concurrency!

  • Writing concurrent code is error-prone
  • Concurrent code obscures the meaning of what we're trying to achieve
  • Programmers are bad at reasoning about concurrent code

Haxl: reclaiming the sequential abstraction

Haxl allows us to write code that looks like it operates sequentially on "normal values", but is capable of being analyzed to determine the requests we can fetch concurrently, and then automatically batch these requests into a list.

This has a number of advantages over async/await and Task.WhenAll:

  • We can write code that uses the results of asynchronous requests, without the risk of losing concurrency.
  • Multiple requests to a single endpoint can be batched and handled more efficiently. For example, multiple concurrent requests to an SQL database could be rewritten into a single SELECT statement.
  • We only fetch duplicate requests once, even if the duplicate requests are started concurrently, which is something we can't achieve with async or Task.WhenAll.
  • Only fetching data once ensures data remains consistent within a request.

Taken together, these advantages leave programmers free to compose complex data fetches, without worrying about concurrency or duplication. It also lessens the need to traverse large parts of the stack to write specialized data fetching methods. Only a small number of "primitive requests" need to be written across the stack; the rest can be composed from these primitives as necessary.
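To make the batching point concrete, here is a rough sketch of how a data source might turn one batch of post-id requests into a single parameterized SELECT. This is an illustration only, not HaxlSharp code: the table and column names (Posts, Id), the parameter naming scheme, and the BuildQuery helper are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchedQueryDemo
{
    // Builds one parameterized query for a whole batch of post-id requests.
    // The table and column names (Posts, Id) are hypothetical.
    public static (string Sql, List<(string Name, int Value)> Parameters) BuildQuery(
        IEnumerable<int> requestedIds)
    {
        // Distinct() drops duplicate requests so each id is fetched only once.
        var ids = requestedIds.Distinct().ToList();
        if (ids.Count == 0)
        {
            throw new ArgumentException("Batch is empty.", nameof(requestedIds));
        }

        // One named parameter per distinct id: @p0, @p1, ...
        var parameters = ids.Select((id, i) => (Name: $"@p{i}", Value: id)).ToList();
        var sql = "SELECT * FROM Posts WHERE Id IN ("
            + string.Join(", ", parameters.Select(p => p.Name)) + ")";
        return (sql, parameters);
    }

    public static void Main()
    {
        // Three requests, two of them for the same post.
        var query = BuildQuery(new[] { 1, 0, 1 });
        Console.WriteLine(query.Sql); // SELECT * FROM Posts WHERE Id IN (@p0, @p1)
    }
}
```

Three incoming requests become one query with two parameters, which covers both the batching and the deduplication described in the list above.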

Let's rewrite GetPostDetails using HaxlSharp:

Fetch<PostDetails> GetPostDetails(int postId) =>
    from info in FetchPostInfo(postId)
    from content in FetchPostContent(postId)
    select new PostDetails(info, content);

The framework can automatically work out that these calls can be parallelized. Here's the debug log from when we fetch GetPostDetails(1):

==== Batch ====
Fetched 'info': PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}
Fetched 'content': Post 1

==== Result ====
PostDetails { Info: PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}, Content: 'Post 1' }

Both requests were automatically placed in a single batch and fetched concurrently!
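The query syntax used here is ordinary C#: the compiler rewrites two from clauses followed by a select into a single SelectMany call on whatever type is being queried. The sketch below shows that translation on a hypothetical toy type, Step<T>, which simply holds a value; it is not HaxlSharp's Fetch<T> and does no batching.

```csharp
using System;

// Hypothetical toy type; not HaxlSharp's Fetch<T>.
public sealed class Step<T>
{
    public T Value { get; }
    public Step(T value) { Value = value; }
}

public static class StepExtensions
{
    // The compiler translates "from a in x from b in y select z"
    // into x.SelectMany(a => y, (a, b) => z), which resolves to this method.
    public static Step<C> SelectMany<A, B, C>(
        this Step<A> source, Func<A, Step<B>> bind, Func<A, B, C> project)
    {
        var a = source.Value;
        var b = bind(a).Value;
        return new Step<C>(project(a, b));
    }
}

public static class QuerySyntaxDemo
{
    // Query syntax, in the same shape as GetPostDetails above.
    public static Step<string> ViaQuery() =>
        from info in new Step<string>("info")
        from content in new Step<string>("content")
        select info + "/" + content;

    // The equivalent explicit method call.
    public static Step<string> ViaMethod() =>
        new Step<string>("info").SelectMany(
            info => new Step<string>("content"),
            (info, content) => info + "/" + content);

    public static void Main()
    {
        Console.WriteLine(ViaQuery().Value);  // info/content
        Console.WriteLine(ViaMethod().Value); // info/content
    }
}
```

Because the library author controls what SelectMany does, a type like Fetch<T> is free to record the requests rather than run them immediately.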

Composing requests

Let's compose our new GetPostDetails function:

Fetch<List<PostDetails>> GetLatestPostDetails() =>
  from latest in FetchTwoLatestPostIds()
  // We must wait here
  from first in GetPostDetails(latest.Item1)
  from second in GetPostDetails(latest.Item2)
  select new List<PostDetails> { first, second };

If we fetch this, we get:

==== Batch ====
Fetched 'latest': (0, 1)

==== Batch ====
Fetched 'content': Post 1
Fetched 'info': PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}
Fetched 'content': Post 0
Fetched 'info': PostInfo { Id: 0, Date: 11/06/2016, Topic: 'Topic 0'}

==== Result ====
[ PostDetails { Info: PostInfo { Id: 0, Date: 11/06/2016, Topic: 'Topic 0'}, Content: 'Post 0' },
PostDetails { Info: PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}, Content: 'Post 1' } ]

The framework has worked out that we have to wait for the first call's result before continuing, because we rely on this result to execute our subsequent calls. But the subsequent two calls only depend on latest, so once latest is fetched, they can both be fetched concurrently!
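One way a C# library can detect this kind of dependency is to accept its lambdas as expression trees and check whether a later step mentions an earlier step's variable. The sketch below illustrates that general technique with the standard ExpressionVisitor class; DependsOnParameter and ParameterFinder are hypothetical names, and this section does not confirm that HaxlSharp's analysis works exactly this way.

```csharp
using System;
using System.Linq.Expressions;

public static class DependencyDemo
{
    // Visitor that records whether a given parameter appears in an expression.
    sealed class ParameterFinder : ExpressionVisitor
    {
        readonly ParameterExpression _target;
        public bool Found { get; private set; }

        public ParameterFinder(ParameterExpression target) { _target = target; }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            if (node == _target) Found = true;
            return base.VisitParameter(node);
        }
    }

    // True when the lambda body mentions its own parameter,
    // i.e. the step needs the previous step's result before it can run.
    public static bool DependsOnParameter<A, B>(Expression<Func<A, B>> step)
    {
        var finder = new ParameterFinder(step.Parameters[0]);
        finder.Visit(step.Body);
        return finder.Found;
    }

    public static void Main()
    {
        int postId = 1;

        // This step ignores 'info', so it could join the same batch.
        Console.WriteLine(DependsOnParameter<string, string>(
            info => "content for " + postId)); // False

        // This step reads 'info', so it has to wait for it.
        Console.WriteLine(DependsOnParameter<string, string>(
            info => "content for " + info)); // True
    }
}
```

The captured local postId does not count as a dependency, because in the expression tree it appears as a closure field access and not as the lambda's parameter.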

Note that we made two parallelizable calls to GetPostDetails, which is itself made up of two parallelizable requests. These requests were "pulled out" and placed into a single batch of four concurrent requests. Let's see what happens if we rewrite GetPostDetails so that it must make two sequential requests:

Fetch<PostDetails> GetPostDetails(int postId) =>
    from info in FetchPostInfo(postId)
    // We need to wait for the result of info before we can get this id!
    from content in FetchPostContent(info.Id)
    select new PostDetails(info, content);

Now, when we fetch GetLatestPostDetails, we get:

==== Batch ====
Fetched 'latest': (0, 1)

==== Batch ====
Fetched 'info': PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}
Fetched 'info': PostInfo { Id: 0, Date: 11/06/2016, Topic: 'Topic 0'}

==== Batch ====
Fetched 'content': Post 1
Fetched 'content': Post 0

==== Result ====
[ PostDetails { Info: PostInfo { Id: 0, Date: 11/06/2016, Topic: 'Topic 0'}, Content: 'Post 0' },
PostDetails { Info: PostInfo { Id: 1, Date: 10/06/2016, Topic: 'Topic 1'}, Content: 'Post 1' } ]

The info requests within GetPostDetails can be fetched with just the result of latest, so they were batched together. The remaining content batch can resume once the info batch completes.

Request deduplication

Because we lazily compose our requests, we can keep track of every subrequest within a particular request, and only fetch a particular subrequest once, even if they're part of the same bat
