Scientist.net

A .NET library for carefully refactoring critical paths. It's a port of GitHub's Ruby Scientist library

Scientist.NET

A .NET Port of the Scientist library for carefully refactoring critical paths.

To give it a twirl, use NuGet to install: Install-Package Scientist

How do I science?

Let's pretend you're changing the way you handle permissions in a large web app. Tests can help guide your refactoring, but you really want to compare the current and refactored behaviors under load.

using GitHub;

...

public bool CanAccess(IUser user)
{
    return Scientist.Science<bool>("widget-permissions", experiment =>
    {
        experiment.Use(() => IsCollaborator(user)); // old way
        experiment.Try(() => HasAccess(user)); // new way
    }); // returns the control value
}

Wrap a Use block around the code's original behavior, and wrap Try around the new behavior. Invoking Scientist.Science<T> will always return whatever the Use block returns, but it does a bunch of stuff behind the scenes:

It decides whether or not to run the Try block,
Randomises, by default, the order in which the Use and Try blocks are run,
Measures the durations of all behaviors,
Compares the result of Try to the result of Use,
Swallows (but records) any exceptions raised in the Try block, and
Publishes all this information.

The Use block is called the control. The Try block is called the candidate.

If you don't declare any Try blocks, none of the Scientist machinery is invoked and the control value is always returned.
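To make the steps above concrete, here is a minimal, self-contained sketch of the idea. It is not Scientist.NET's implementation: MiniScience, Observation<T>, and Run are hypothetical names invented for this illustration. The sketch always runs the candidate (it skips the "decide whether to run" step), treats any exception as a mismatch, and assumes an exception from the control should propagate to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.ExceptionServices;

// Hypothetical types for illustration only; not part of Scientist.NET.
public sealed class Observation<T>
{
    public string Name { get; set; }
    public T Value { get; set; }
    public Exception Exception { get; set; }
    public TimeSpan Duration { get; set; }
}

public static class MiniScience
{
    private static readonly Random Rng = new Random();

    // Runs one behavior, timing it and recording any exception it throws.
    private static Observation<T> Observe<T>(string name, Func<T> behavior)
    {
        var observation = new Observation<T> { Name = name };
        var stopwatch = Stopwatch.StartNew();
        try { observation.Value = behavior(); }
        catch (Exception ex) { observation.Exception = ex; }
        stopwatch.Stop();
        observation.Duration = stopwatch.Elapsed;
        return observation;
    }

    public static T Run<T>(
        Func<T> control,
        Func<T> candidate,
        Action<bool, Observation<T>, Observation<T>> publish)
    {
        // Flip a coin to decide which behavior runs first.
        bool controlFirst;
        lock (Rng) { controlFirst = Rng.Next(2) == 0; }

        Observation<T> controlResult, candidateResult;
        if (controlFirst)
        {
            controlResult = Observe("control", control);
            candidateResult = Observe("candidate", candidate);
        }
        else
        {
            candidateResult = Observe("candidate", candidate);
            controlResult = Observe("control", control);
        }

        // Simplification: a match requires that neither behavior threw
        // and that both values are equal.
        bool matched = controlResult.Exception == null
            && candidateResult.Exception == null
            && EqualityComparer<T>.Default.Equals(controlResult.Value, candidateResult.Value);
        publish(matched, controlResult, candidateResult);

        // Assumption: a control exception reaches the caller, while a
        // candidate exception is only recorded in its observation.
        if (controlResult.Exception != null)
            ExceptionDispatchInfo.Capture(controlResult.Exception).Throw();
        return controlResult.Value;
    }
}
```

Calling MiniScience.Run(() => OldWay(), () => NewWay(), (matched, control, candidate) => ...) returns the value of the first delegate whatever the second one does, which is the property that makes this pattern safe to use on a critical path.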

Making science useful

Publishing results

What good is science if you can't publish your results?

By default, results are published to an in-memory publisher. To override this behavior, create your own implementation of IResultPublisher:

public class MyResultPublisher : IResultPublisher
{
    public Task Publish<T, TClean>(Result<T, TClean> result)
    {
        Logger.Debug($"Publishing results for experiment '{result.ExperimentName}'");
        Logger.Debug($"Result: {(result.Matched ? "MATCH" : "MISMATCH")}");
        Logger.Debug($"Control value: {result.Control.Value}");
        Logger.Debug($"Control duration: {result.Control.Duration}");
        foreach (var observation in result.Candidates)
        {
            Logger.Debug($"Candidate name: {observation.Name}");
            Logger.Debug($"Candidate value: {observation.Value}");
            Logger.Debug($"Candidate duration: {observation.Duration}");
        }

        if (result.Mismatched)
        {
            // save mismatched experiments to the DB
            DbHelpers.SaveExperimentResults(result);
        }

        return Task.FromResult(0);
    }
}

Then set Scientist to use it before running the experiments:

Scientist.ResultPublisher = new MyResultPublisher();

As of v1.0.2, an IResultPublisher can also be wrapped in a FireAndForgetResultPublisher, so that result publishing is delegated to another thread and does not delay running experiments:

Scientist.ResultPublisher = new FireAndForgetResultPublisher(new MyResultPublisher(), onPublisherException);

Controlling comparison

Scientist compares control and candidate values using ==. To override this behavior, use Compare to define how to compare observed values instead:

public IUser GetCurrentUser(string hash)
{
    return Scientist.Science<IUser>("get-current-user", experiment =>
    {
        experiment.Compare((x, y) => x.Name == y.Name);

        experiment.Use(() => LookupUser(hash));
        experiment.Try(() => RetrieveUser(hash));
    });
}
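One thing to watch with a custom comparison like the one above: if either behavior returns null, x.Name or y.Name throws. A null-safe comparison could look like the sketch below. The IUser interface and User class here are hypothetical stand-ins with only a Name property, and UserComparison.SameName is a name made up for this example.

```csharp
using System;

// Hypothetical stand-ins for the application's own user types.
public interface IUser
{
    string Name { get; }
}

public sealed class User : IUser
{
    public User(string name) { Name = name; }
    public string Name { get; }
}

public static class UserComparison
{
    // Returns true when both users are null or both have the same name.
    public static bool SameName(IUser x, IUser y)
    {
        if (ReferenceEquals(x, y)) return true;     // same instance, or both null
        if (x == null || y == null) return false;   // exactly one is null
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }
}
```

Inside an experiment this would be wired up as experiment.Compare((x, y) => UserComparison.SameName(x, y)); in place of the inline lambda.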

Adding context

Results aren't very useful without some way to identify them. Use the AddContext method to add to the context for an experiment:

public IUser GetUserByName(string userName)
{
    return Scientist.Science<IUser>("get-user-by-name", experiment =>
    {
        experiment.AddContext("username", userName);

        experiment.Use(() => FindUser(userName));
        experiment.Try(() => GetUser(userName));
    });
}

AddContext takes a string identifier and an object value, and adds them to an internal Dictionary. When you publish the results, you can access the context by using the Contexts property:

public class MyResultPublisher : IResultPublisher
{
    public Task Publish<T, TClean>(Result<T, TClean> result)
    {
        foreach (var kvp in result.Contexts)
        {
            Console.WriteLine($"Key: {kvp.Key}, Value: {kvp.Value}");
        }
        return Task.FromResult(0);
    }
}

Expensive setup

If an experiment requires expensive setup that should only occur when the experiment is going to be run, define it with the BeforeRun method:

public int DoSomethingExpensive()
{
    return Scientist.Science<int>("expensive-but-worthwhile", experiment =>
    {
        experiment.BeforeRun(() => ExpensiveSetup());

        experiment.Use(() => TheOldWay());
        experiment.Try(() => TheNewWay());
    });
}

Keeping it clean

Sometimes you don't want to store the full value for later analysis. For example, an experiment may return IUser instances, but when researching a mismatch, all you care about is the logins. You can define how to clean these values in an experiment:

public IUser GetUserByEmail(string emailAddress)
{
    return Scientist.Science<IUser, string>("get-user-by-email", experiment =>
    {
        experiment.Use(() => OldApi.FindUserByEmail(emailAddress));
        experiment.Try(() => NewApi.GetUserByEmail(emailAddress));
        
        experiment.Clean(user => user.Login);
    });
}

And this cleaned value is available in the final published result:

public class MyResultPublisher : IResultPublisher
{
    public Task Publish<T, TClean>(Result<T, TClean> result)
    {
        // result.Control.Value = <IUser object>
        IUser user = (IUser)result.Control.Value;
        Console.WriteLine($"Login from raw object: {user.Login}");
        
        // result.Control.CleanedValue = "user name"
        Console.WriteLine($"Login from cleaned object: {result.Control.CleanedValue}");
        
        return Task.FromResult(0);
    }
}

Ignoring mismatches

During the early stages of an experiment, it's possible that some of your code will always generate a mismatch for reasons you know and understand but haven't yet fixed. Instead of these known cases always showing up as mismatches in your metrics or analysis, you can tell an experiment whether or not to ignore a mismatch using the Ignore method. You may include more than one block if needed:

public bool CanAccess(IUser user)
{
    return Scientist.Science<bool>("widget-permissions", experiment =>
    {
        experiment.Use(() => IsCollaborator(user));
        experiment.Try(() => HasAccess(user));

        // user is staff, always an admin in the new system
        experiment.Ignore((control, candidate) => user.IsStaff);
        // new system doesn't handle unconfirmed users yet
        experiment.Ignore((control, candidate) => control && !candidate && !user.ConfirmedEmail);
    });
}

The ignore blocks are only called if the values don't match. If one observation raises an exception and the other doesn't, it's always considered a mismatch. If both observations raise different exceptions, that is also considered a mismatch.
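The ordering described above (compare first, consult ignore blocks only on a mismatch) can be sketched as follows. This is an illustration rather than the library's code: Outcome and MismatchRules.Classify are hypothetical names, exceptions are left out entirely, and the sketch assumes that a single ignore block returning true is enough to ignore the mismatch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names for illustration only.
public enum Outcome { Matched, Ignored, Mismatched }

public static class MismatchRules
{
    public static Outcome Classify<T>(
        T control,
        T candidate,
        IEnumerable<Func<T, T, bool>> ignores)
    {
        // Matching values never reach the ignore blocks.
        if (EqualityComparer<T>.Default.Equals(control, candidate))
            return Outcome.Matched;

        // Assumption: the mismatch is ignored as soon as any block says so.
        return ignores.Any(ignore => ignore(control, candidate))
            ? Outcome.Ignored
            : Outcome.Mismatched;
    }
}
```

Because the ignore blocks run only after a mismatch, they can be relatively expensive without slowing down the common case where both behaviors agree.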

Enabling/disabling experiments

Sometimes you don't want an experiment to run, for example when you want to disable a new code path for anyone who isn't staff. You can disable an experiment by setting a RunIf block. If this returns false, the experiment will merely return the control value. Otherwise, it defers to the global Scientist.Enabled method.

public decimal GetUserStatistic(IUser user)
{
    return Scientist.Science<decimal>("new-statistic-calculation", experiment =>
    {
        experiment.RunIf(() => user.IsTestSubject);

        experiment.Use(() => CalculateStatistic(user));
        experiment.Try(() => NewCalculateStatistic(user));
    });
}
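The order of those two checks can be written down as a tiny sketch. ExperimentGate and ShouldRunCandidates are hypothetical names for this illustration, not library members; a null runIf stands for "no RunIf block was set".

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ExperimentGate
{
    // runIf:   the per-experiment predicate, or null when none was set.
    // enabled: the global switch.
    public static bool ShouldRunCandidates(Func<bool> runIf, Func<bool> enabled)
    {
        // A RunIf block that returns false short-circuits: the global
        // switch is not consulted and only the control runs.
        if (runIf != null && !runIf()) return false;

        // Otherwise the decision is deferred to the global switch.
        return enabled();
    }
}
```

In this sketch a RunIf block can only narrow the set of calls that run the candidate; it cannot force an experiment on when the global switch says no.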

Ramping up experiments

As a scientist, you know it's always important to be able to turn your experiment off, lest it run amok and result in villagers with pitchforks on your doorstep. You can set a global switch to control whether or not experiments are enabled by using the Scientist.Enabled method.

int percentEnabled = 10;
Random rand = new Random();
Scientist.Enabled(() =>
{
    return rand.Next(100) < percentEnabled;
});

This code is invoked every time any method containing an experiment runs, so be mindful of its performance. For example, you can store an experiment's enabled setting in the database but wrap the lookup in various levels of caching.
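As one possible shape for that caching, the sketch below keeps the percentage in memory and reloads it only after a time-to-live expires. CachedPercentage is a hypothetical class written for this example, and the load delegate stands in for whatever slow lookup (such as a database query) holds the real setting.

```csharp
using System;

// Hypothetical helper for illustration only.
public sealed class CachedPercentage
{
    private readonly Func<int> _load;           // slow lookup, e.g. a database query
    private readonly TimeSpan _ttl;             // how long a loaded value stays fresh
    private readonly Func<DateTime> _utcNow;    // injectable clock, handy in tests
    private readonly object _gate = new object();
    private int _value;
    private DateTime _expiresAt = DateTime.MinValue;

    public CachedPercentage(Func<int> load, TimeSpan ttl, Func<DateTime> utcNow = null)
    {
        _load = load ?? throw new ArgumentNullException(nameof(load));
        _ttl = ttl;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public int Value
    {
        get
        {
            lock (_gate)
            {
                DateTime now = _utcNow();
                if (now >= _expiresAt)
                {
                    // Reload at most once per TTL window. Other callers
                    // wait on the lock while the reload is in progress.
                    _value = _load();
                    _expiresAt = now + _ttl;
                }
                return _value;
            }
        }
    }
}
```

The delegate passed to Scientist.Enabled could then compare rand.Next(100) against percent.Value instead of a hard-coded number. Note that a System.Random instance is not thread-safe, so if experiments run on several threads the shared rand should be guarded with a lock as well.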

Running candidates in parallel (asynchronous)

Scientist runs tasks synchronously by default. This can end up doubling (more or less) the time it takes the original method call to complete, depending on how many candidates are added and how long they take to run.

When Scientist is used for refactoring in production, for example, this makes the calling method return more slowly than before, which may affect the performance of your original code. However, if the candidates can run at the same time as the control without affecting each other, they can be run in parallel, so the Scientist call only takes as long as the slowest task (plus a tiny bit of overhead):

await Scientist.ScienceAsync<int>(
    "ExperimentName",
    3, // number of tasks to run concurrently
    experiment => {
        experiment.Use(async () => await StartRunningSomething(myData));
        experiment.Try(asyn
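The timing claim (sequential cost is roughly the sum of all behaviors, concurrent cost is roughly the slowest one) can be checked without the library at all. The sketch below is independent of Scientist.NET: ConcurrencyDemo and Work are hypothetical names, and Task.Delay stands in for real I/O-bound work.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical demo for illustration only.
public static class ConcurrencyDemo
{
    // Simulates a behavior that takes `milliseconds` to produce `value`.
    private static async Task<int> Work(int value, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return value;
    }

    // One after another: total time is roughly the sum of the delays.
    public static async Task<TimeSpan> RunSequentially()
    {
        var stopwatch = Stopwatch.StartNew();
        await Work(1, 200);
        await Work(2, 200);
        await Work(3, 200);
        return stopwatch.Elapsed;
    }

    // All started up front: total time is roughly the longest delay.
    public static async Task<TimeSpan> RunConcurrently()
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Work(1, 200), Work(2, 200), Work(3, 200));
        return stopwatch.Elapsed;
    }
}
```

The gain only materialises when the behaviors really are independent. If the control and candidates contend for the same lock, connection, or mutable state, running them concurrently can change results as well as timings.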
