RustPotion
Oxidized Tokenlearn for blazingly fast word embeddings.
Example
cargo add rustpotion
use rustpotion::{RustPotion, PotionModel};
use std::path::Path;

fn main() {
    let model = RustPotion::new(PotionModel::BASE2M, Path::new("models"));
    model.encode("test");
}
Why
> be me
> saw cool project
> said "why not rust"
> Now cool project is in rust
Speed
Because Tokenlearn is so blazingly fast (mainly because it's just an average of some word vectors), the limiting factor is actually the tokenizer implementation.
That's why it's good news that we get ~27 MB/s of input sentences for potion-base-2M, which is on par with, if not marginally better than, most other high-performing tokenizers.
I will note that I used a custom tokenization function, so it might not produce the same results for hyper-specific edge cases (e.g. weird Unicode characters), but it should otherwise be good enough for 99.99% of inputs.
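To make the "average of some word vectors" point concrete, here is a minimal sketch of mean pooling. It is not RustPotion's actual code: `mean_pool` is a hypothetical helper, and it assumes the tokenizer has already mapped each token to a vector of the same dimension.

```rust
/// Average a set of equal-length token vectors into one sentence embedding.
/// Hypothetical helper for illustration, not part of the rustpotion crate.
/// Returns an empty vector when there are no tokens to average.
fn mean_pool(token_vectors: &[Vec<f32>]) -> Vec<f32> {
    if token_vectors.is_empty() {
        return Vec::new();
    }

    // Accumulate a per-dimension sum (assumes every vector has the same length).
    let mut sum = vec![0.0f32; token_vectors[0].len()];
    for v in token_vectors {
        for (acc, x) in sum.iter_mut().zip(v) {
            *acc += *x;
        }
    }

    // Divide by the token count to turn the sum into a mean.
    let n = token_vectors.len() as f32;
    for x in sum.iter_mut() {
        *x /= n;
    }
    sum
}
```

Since the pooling step is only a handful of additions per token, it makes sense that tokenization ends up dominating the runtime.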
Accuracy
Here is the expected performance of tokenlearn.
| Name | MTEB Score |
| --- | --- |
| potion-base-8M | 50.03 |
| potion-base-4M | 48.23 |
| potion-base-2M | 44.77 |
Limitations
- English only; Unicode slicing is a pain
- No python bindings, just use Tokenlearn (it's secretly rust if you look deep enough)
- RustPotion::encode_many is multithreaded and will use all available resources
- No limit on sentence length, but performance starts to dip after 500 tokens (~100 words) so be careful.
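If long inputs are a concern, one option is to split them into smaller pieces before encoding. The sketch below is an assumption on my part, not something the crate provides: `chunk_by_words` is a hypothetical helper that splits on whitespace, and the word budget you pick (for example, around 100 to match the note above) is up to you.

```rust
/// Split `text` into chunks of at most `max_words` whitespace-separated words.
/// Hypothetical helper for illustration, not part of the rustpotion crate.
/// Runs of whitespace are collapsed to a single space in the output.
fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    assert!(max_words > 0, "max_words must be positive");

    // Collect the words once, then regroup them into fixed-size windows.
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Each chunk can then be embedded on its own, and the per-chunk embeddings averaged if a single vector is needed. Whether that matches the quality of encoding the whole text at once is not something this README measures.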
Warning
If you feed in an empty string:
model.encode("");
The resulting embedding will be a vector of length zero. This may or may not be the behavior you want, especially if you then try to normalize the embedding.
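If you normalize embeddings yourself, a guard like the one below avoids dividing by zero. This is a sketch under assumptions: `l2_normalize` is a hypothetical helper, it assumes the embedding is available as a slice of `f32`, and it treats both an empty slice and an all-zero vector as "nothing to normalize".

```rust
/// L2-normalize `v` in place.
/// Hypothetical helper for illustration, not part of the rustpotion crate.
/// Returns `false` and leaves `v` untouched if it is empty or has (near) zero norm,
/// which would otherwise produce NaN or infinite values.
fn l2_normalize(v: &mut [f32]) -> bool {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if v.is_empty() || norm <= f32::EPSILON {
        return false;
    }
    for x in v.iter_mut() {
        *x /= norm;
    }
    true
}
```

Returning a `bool` lets the caller decide what to do with an empty input, such as skipping it or substituting a default vector.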
