SharpToken

SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.

SharpToken is a C# library that serves as a port of the Python tiktoken library. It provides functionality for encoding and decoding tokens using GPT-based encodings. This library is built for .NET 6, .NET 8 and .NET Standard 2.0, making it compatible with a wide range of frameworks.

Important: The functionality in SharpToken has been added to Microsoft.ML.Tokenizers, a tokenizer library developed by the .NET team and, going forward, the central place for tokenizer development in .NET. By using Microsoft.ML.Tokenizers, you should see improved performance over existing tokenizer library implementations, including SharpToken. A stable release of Microsoft.ML.Tokenizers is expected alongside the .NET 9.0 release (November 2024). Instructions for migration can be found at https://github.com/dotnet/machinelearning/blob/main/docs/code/microsoft-ml-tokenizers-migration-guide.md.

Installation

To install SharpToken, use the NuGet package manager:

Install-Package SharpToken

Or, if you prefer using the .NET CLI:

dotnet add package SharpToken

For more information, visit the NuGet package page.

Usage

To use SharpToken in your project, first import the library:

using SharpToken;

Next, create an instance of GptEncoding by specifying the desired encoding or model:

// Get encoding by encoding name
var encoding = GptEncoding.GetEncoding("cl100k_base");

// Or get encoding by model name (a separate variable, so both lines compile in one scope)
var gpt4Encoding = GptEncoding.GetEncodingForModel("gpt-4");

You can then use the Encode method to encode a string:

var encoded = encoding.Encode("Hello, world!"); // Output: [9906, 11, 1917, 0]

And use the Decode method to decode the encoded tokens:

var decoded = encoding.Decode(encoded); // Output: "Hello, world!"

SharpToken also provides a high-performance CountTokens method. It is useful for checking the size of a prompt before sending it to an LLM, or for use in a TextSplitter/Chunker for RAG.

var count = encoding.CountTokens("Hello, world!"); // Output: 4
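
To illustrate the prompt-size check mentioned above, here is a minimal sketch of a budget test. FitsBudget, CountWords, contextWindow and reservedForReply are hypothetical names introduced for this example and are not part of SharpToken. The counter is passed in as a delegate so the sketch runs without any package, and a plain word count stands in for a real token count.

```csharp
using System;

// Hypothetical helper (not a SharpToken API): returns true when the prompt
// fits in the context window while leaving room for the model's reply.
static bool FitsBudget(string prompt, Func<string, int> countTokens,
                       int contextWindow, int reservedForReply)
{
    if (reservedForReply < 0 || reservedForReply > contextWindow)
        throw new ArgumentOutOfRangeException(nameof(reservedForReply));

    int promptTokens = countTokens(prompt);
    return promptTokens <= contextWindow - reservedForReply;
}

// Stand-in counter for the sketch only: one "token" per whitespace-separated word.
// Real token counts differ and should come from the encoding.
static int CountWords(string text) =>
    text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;

// Two words fit in a budget of 8 - 4 = 4.
Console.WriteLine(FitsBudget("Hello, world!", CountWords, contextWindow: 8, reservedForReply: 4)); // prints True
```

With SharpToken, the stand-in counter would be replaced by a lambda over the encoding, for example `text => encoding.CountTokens(text)`, and the window size would be set to whatever limit applies to the target model.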

Supported Models

SharpToken currently supports the following encodings:

r50k_base
p50k_base
p50k_edit
cl100k_base
o200k_base
o200k_harmony
claude

You can use any of these encodings when creating an instance of GptEncoding:

var r50kBaseEncoding = GptEncoding.GetEncoding("r50k_base");
var p50kBaseEncoding = GptEncoding.GetEncoding("p50k_base");
var p50kEditEncoding = GptEncoding.GetEncoding("p50k_edit");
var cl100kBaseEncoding = GptEncoding.GetEncoding("cl100k_base");
var o200kBaseEncoding = GptEncoding.GetEncoding("o200k_base");
var o200kHarmonyEncoding = GptEncoding.GetEncoding("o200k_harmony");
var claudeEncoding = GptEncoding.GetEncoding("claude");

Claude Model Support

The claude encoding uses Anthropic's official tokenizer vocabulary with NFKC normalization. It is accurate for pre-Claude 3 models and a rough approximation for Claude 3+.

var encoding = GptEncoding.GetEncodingForModel("claude-3.5-sonnet");
var count = encoding.CountTokens("Hello, Claude!");

All claude-* model names are supported (e.g. claude-3-opus, claude-3.5-sonnet, claude-3.7-sonnet, claude-4-sonnet).

Model Prefix Matching

Besides exact model names, SharpToken maps model names to encodings by prefix, so an encoding can be retrieved for any model name that starts with a supported prefix.

Here are the currently supported prefixes and their corresponding encodings:

| Model Prefix     | Encoding      |
| ---------------- | ------------- |
| claude-          | claude        |
| gpt-5            | o200k_base    |
| gpt-4o           | o200k_base    |
| gpt-4-           | cl100k_base   |
| gpt-3.5-turbo-   | cl100k_base   |
| gpt-35-turbo     | cl100k_base   |

Examples of model names that fall under these prefixes include:

For the prefix claude-: claude-3-opus-20240229, claude-3.5-sonnet-20241022, etc.
For the prefix gpt-5: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, gpt-5-thinking, gpt-5-2024-08-07, etc.
For the prefix gpt-4o: gpt-4o, gpt-4o-2024-05-13, etc.
For the prefix gpt-4-: gpt-4-0314, gpt-4-32k, etc.
For the prefix gpt-3.5-turbo-: gpt-3.5-turbo-0301, gpt-3.5-turbo-0401, etc.
For the Azure deployment name gpt-35-turbo.

To retrieve the encoding name based on a model name or its prefix, you can use the GetEncodingNameForModel method:

string claudeEncodingName = Model.GetEncodingNameForModel("claude-3.5-sonnet"); // Returns "claude"
string gpt4EncodingName = Model.GetEncodingNameForModel("gpt-4-0314");          // Returns "cl100k_base"

If the provided model name doesn't match any direct model names or prefixes, an exception is thrown.
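
The lookup order described in this section (exact name first, then prefix, then an exception) can be sketched as follows. This is an illustration under assumptions, not SharpToken's actual implementation: ResolveEncodingName and directModels are hypothetical names, the single exact-name entry is only a sample, and ArgumentException is a placeholder because this README does not say which exception type the library throws. The prefix table is copied from the one above.

```csharp
using System;
using System.Collections.Generic;

// Sample exact-name entry for illustration; the library's full list is not shown here.
var directModels = new Dictionary<string, string>
{
    ["gpt-4"] = "cl100k_base",
};

// Prefix table taken from the README.
var prefixToEncoding = new (string Prefix, string Encoding)[]
{
    ("claude-", "claude"),
    ("gpt-5", "o200k_base"),
    ("gpt-4o", "o200k_base"),
    ("gpt-4-", "cl100k_base"),
    ("gpt-3.5-turbo-", "cl100k_base"),
    ("gpt-35-turbo", "cl100k_base"),
};

// Hypothetical resolver: exact match first, then the first matching prefix.
string ResolveEncodingName(string modelName)
{
    if (directModels.TryGetValue(modelName, out var direct))
        return direct;

    foreach (var (prefix, encoding) in prefixToEncoding)
    {
        if (modelName.StartsWith(prefix, StringComparison.Ordinal))
            return encoding;
    }

    // Placeholder exception type; the real library may throw something else.
    throw new ArgumentException($"No encoding known for model '{modelName}'.", nameof(modelName));
}

Console.WriteLine(ResolveEncodingName("gpt-4-0314"));        // prints cl100k_base
Console.WriteLine(ResolveEncodingName("claude-3.5-sonnet")); // prints claude
```

Because an unknown model name results in an exception, callers that accept model names from configuration or user input may want to wrap the lookup in a try/catch and fall back to an explicitly chosen encoding.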

Understanding Encoded Values

When you encode a string using the Encode method, the returned value is a list of integers that represent tokens in the specified encoding. These tokens are a compact way of representing the input text and can be processed more efficiently by various algorithms.

For example, encoding the text "Hello world!" using the cl100k_base encoding produces the following list of integers:

var encoded = cl100kBaseEncoding.Encode("Hello world!"); // Output: [9906, 1917, 0]

You can then use the Decode method to convert these tokenized integer values back into the original text:

var decoded = cl100kBaseEncoding.Decode(encoded); // Output: "Hello world!"

With SharpToken, you can seamlessly switch between different encodings to find the one that best suits your needs. Just remember to use the same encoding for both the Encode and Decode methods to ensure accurate results.

Advanced Usage

Custom Allowed Sets

SharpToken allows you to specify custom sets of allowed special tokens when encoding text. To do this, pass a HashSet<string> containing the allowed special tokens as a parameter to the Encode method:

const string encodingName = "cl100k_base";
const string inputText = "Some Text <|endofprompt|>";
var allowedSpecialTokens = new HashSet<string> { "<|endofprompt|>" };

var encoding = GptEncoding.GetEncoding(encodingName);
var encoded = encoding.Encode(inputText, allowedSpecialTokens);
var expectedEncoded = new List<int> { 8538, 2991, 220, 100276 };

Assert.Equal(expectedEncoded, encoded);

Custom Disallowed Sets

Similarly, you can specify custom sets of disallowed special tokens when encoding text. Pass a HashSet<string> containing the disallowed special tokens as a parameter to the Encode method:

const string encodingName = "cl100k_base";
const string inputText = "Some Text";

var encoding = GptEncoding.GetEncoding(encodingName);

void TestAction()
{
    encoding.Encode(inputText, disallowedSpecial: new HashSet<string> { "Some" });
}

Assert.Throws<ArgumentException>(TestAction);

In this example, an ArgumentException is thrown because the input text contains "Some", which was passed as a disallowed special token.
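
One way to picture the disallowed-set behaviour is as a scan of the input for any string in the set before encoding starts. The sketch below is self-contained and makes that idea concrete. EnsureNoDisallowed is a hypothetical helper written for this example, and the way SharpToken performs the check internally is an assumption, not something this README states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a SharpToken API): throws if the text contains
// any of the strings listed as disallowed special tokens.
static void EnsureNoDisallowed(string text, ISet<string> disallowedSpecial)
{
    foreach (var token in disallowedSpecial)
    {
        // Skip empty entries, which would otherwise match every input.
        if (token.Length > 0 && text.IndexOf(token, StringComparison.Ordinal) >= 0)
            throw new ArgumentException($"Disallowed special token found: {token}");
    }
}

var disallowed = new HashSet<string> { "<|endofprompt|>" };

// No disallowed token present: returns without throwing.
EnsureNoDisallowed("Some Text", disallowed);

try
{
    EnsureNoDisallowed("Some Text <|endofprompt|>", disallowed);
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // prints Disallowed special token found: <|endofprompt|>
}
```

Rejecting such strings up front is useful when the text comes from users, since it prevents input that merely looks like a control token from being encoded as one.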

Testing and Validation

SharpToken includes a set of test cases in the TestPlans.txt file to ensure its compatibility with the Python tiktoken library. These test cases validate the functionality and behavior of SharpToken, providing a reliable reference for developers. Running the unit tests and verifying the test cases helps maintain consistency between the C# SharpToken library and the original Python implementation.

Performance Compared to TiktokenSharp and TokenizerLib

According to the project's benchmarks, SharpToken is the fastest of the compared libraries and has the lowest allocations.

<details> <summary>Benchmark Code</summary>

[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net471)]
[RPlotExporter]
[MemoryDiagnoser]
public class CompareBenchmark
{
    private GptEncoding _sharpToken;
    private TikToken _tikToken;
    private ITokenizer _tokenizer;
    private Tokenizer _mlTokenizer;
    private string _kLongText;

    [GlobalSetup]
    public async Task Setup()
    {
        _sharpToken = GptEncoding.GetEncoding("cl100k_base");
        _tikToken = await TikToken.GetEncodingAsync("cl100k_base").ConfigureAwait(false);
        _tokenizer = await TokenizerBuilder.CreateByModelNameAsync("gpt-4").ConfigureAwait(false);
        _kLongText = "King Lear, one of Shakespeare's darkest and most savage plays, tells the story of the foolish and Job-like Lear, who divides his kingdom, as he does his affections, according to vanity and whim. Lear’s failure as a father engulfs himself and his world in turmoil and tragedy.";
    }

    [Benchmark]
    public int SharpToken()
    {
        var sum = 0;
        for (var i = 0; i < 10000; i++)
        {
            var encoded = _sharpToken.Encode(_kLongText);
            var decoded = _sharpToken.

Source repository: dmitry-brazhenko/SharpToken (GitHub)