PyPruningRadixTrie
PyPruningRadixTrie - Python version of super fast Radix trie for prefix search & auto-complete
Install / Use
/learn @otto-de/PyPruningRadixTrieREADME
PyPruningRadixTrie
Python Port of Pruning Radix Trie by Wolf Garbe.
Changes compared to original version
- Removed parameter to disable pruning behavior.
- See
test/non_pruning_radix_trie.pyfor a non-pruning version that you can use to see the speed improvement.
- See
- Added outline for more generic
InputProviderand providers that read CSV or JSON as examples
What and Why
A Trie is a tree data structure that is commonly used for searching terms
that start with a given prefix.
It starts with an empty string at the base of the trie, the root node.
Adding a new entry to the trie creates a new branch. This branch shares already present characters with existing nodes
and creates new nodes when it's prefix diverges from the present entries.
# trie containing flower & flowchart (1 char = 1 node)
'' - f - l - o - w - e - r
|
c - h - a - r - t
A RadixTrie is the space optimized version of a Trie.
It combines the nodes with only one sub-node into one, containing more than one character.
# radix trie containing flower & flowchart
'' - flow - er
|
chart
The prefix Pruning references the algorithm to query the RadixTrie.
In order for the pruning to work, the nodes in RadixTrie are stored in a sorted manner.
This structure allows cutting off unpromising branches during querying the trie which makes the algorithm way faster
compared to a non-pruning trie.
Usage
Get from PyPI:
pip install pypruningradixtrie
Create the PRT:
# empty trie
trie = PruningRadixTrie()
# fill with data from CSV file on creation
trie = PruningRadixTrie('./test_data.csv', CSVInputProvider(',', lambda x: float(x[1])))
Add entries:
CSV:
# fill with data from CSV file, score is at position 1, term at position 0
fill_trie_from_file(trie, './test_data.csv', CSVInputProvider(',', lambda x: float(x[1]), 0))
JSON:
# define a functon to calculate the score out of a JSON entry
def score_fun(json_entry: Dict[str, Any]) -> float:
return json_entry["pages"] * json_entry["year"] / 10.0
# "title" = key for term to insert into PRT
fill_trie_from_file(trie, './test_data.json', JSONInputProvider("title", score_fun))
Single Entry:
# insert single entry
insert_term(trie, term="flower", score=20)
Use the PRT:
# get the top 10 entries that start with 'flower'
trie.get_top_k_for_prefix('flower', 10)
Related Skills
node-connect
345.9kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
106.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
345.9kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
345.9kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
