Tokenizer
Pure Go implementation of OpenAI's tiktoken tokenizer
This is a pure Go port of OpenAI's tokenizer.
Usage
package main

import (
	"fmt"

	"github.com/tiktoken-go/tokenizer"
)

func main() {
	enc, err := tokenizer.Get(tokenizer.Cl100kBase)
	if err != nil {
		panic(err)
	}

	// this should print a list of token ids
	// (the second return value is ignored here)
	ids, _, err := enc.Encode("supercalifragilistic")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)

	// this should print the original string back
	text, err := enc.Decode(ids)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
Alternatively, you can use the included command-line tool:
> tokenizer -h
Usage of tokenizer:
  -decode string
        tokens to decode
  -encode string
        text to encode
  -token string
        text to calculate token
> tokenizer -encode supercalifragilistic
Todo
- ✅ port code
- ✅ o200k_base encoding
- ✅ cl100k_base encoding
- ✅ r50k_base encoding
- ✅ p50k_base encoding
- ✅ p50k_edit encoding
- ✅ tests
- ❌ handle special tokens
- ❌ gpt-2 model
Caveats
This library embeds OpenAI's vocabularies, which are not small (~4 MB), as Go maps. This differs from the Python version of tiktoken, which downloads the dictionaries and stores them in a cache folder.
However, since the dictionaries are compiled during the go build process, performance and start-up times should be better than downloading and loading them at runtime.
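To make the trade-off above concrete, here is a minimal sketch contrasting a vocabulary compiled into the binary as a Go map literal with one parsed at runtime. It is an illustration only, not this library's actual code: `embeddedVocab`, `rawVocab`, and `loadVocab` are hypothetical names, the three-entry vocabulary is made up, and the "token id" line format is a simplification rather than the real dictionary format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// embeddedVocab is a toy, hypothetical vocabulary written as a Go map literal.
// It is compiled into the binary, so no file or network access is needed at start-up.
var embeddedVocab = map[string]uint{
	"super": 0,
	"cali":  1,
	"fragi": 2,
}

// rawVocab stands in for a dictionary file that would otherwise be
// downloaded and read from a cache folder (simplified "token id" lines).
const rawVocab = "super 0\ncali 1\nfragi 2\n"

// loadVocab parses "token id" lines at runtime. This is the extra work
// that an embedded map avoids.
func loadVocab(raw string) (map[string]uint, error) {
	vocab := make(map[string]uint)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", sc.Text())
		}
		id, err := strconv.ParseUint(fields[1], 10, 32)
		if err != nil {
			return nil, err
		}
		vocab[fields[0]] = uint(id)
	}
	return vocab, sc.Err()
}

func main() {
	loaded, err := loadVocab(rawVocab)
	if err != nil {
		panic(err)
	}
	// Both lookups return the same id; only the loading path differs.
	fmt.Println(embeddedVocab["cali"], loaded["cali"])
}
```

The embedded variant costs binary size, while the runtime variant needs I/O and parsing on every start, which is the trade-off this section describes.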
Alternatives
Here is a list of other libraries that do something similar.
- https://github.com/sugarme/tokenizer (A different tokenizer algorithm than OpenAI's)
- https://github.com/pandodao/tokenizer-go (deprecated, calls into JavaScript)
- https://github.com/pkoukk/tiktoken-go
