TinySearchEngine
Vector space model implemented in a few lines
Tiny Search
How many lines of code does it take to write a reasonable, understandable, full-text search engine? The code in this repository gives a quick, easy overview of the Vector Space Model (tf-idf). Feel free to contribute improvements and implementations in other languages.
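The following small Scala sketch makes the model concrete by showing tf-idf weighting with cosine-similarity ranking. It is not taken from the repository. The object name `VectorSpace`, the raw-count term frequency, and the `log(N / df)` IDF variant are illustrative assumptions, and documents are assumed to be already tokenized.

```scala
// Illustrative tf-idf / cosine-similarity sketch (assumed design, not the repository's code).
// Documents and queries are bags of already-tokenized terms.
object VectorSpace {
  type Doc = Seq[String]

  // Inverse document frequency: log(N / df), where df is the number of
  // documents containing the term. Terms that appear nowhere get 0.
  def idf(term: String, docs: Seq[Doc]): Double = {
    val df = docs.count(_.contains(term))
    if (df == 0) 0.0 else math.log(docs.size.toDouble / df)
  }

  // tf-idf vector of a bag of terms: raw term count times IDF.
  def vector(terms: Doc, docs: Seq[Doc]): Map[String, Double] =
    terms.groupBy(identity).map { case (t, occ) => t -> occ.size * idf(t, docs) }

  // Cosine similarity between two sparse vectors (0 if either is all zeros).
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot = a.iterator.map { case (t, w) => w * b.getOrElse(t, 0.0) }.sum
    val normA = math.sqrt(a.values.iterator.map(w => w * w).sum)
    val normB = math.sqrt(b.values.iterator.map(w => w * w).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Rank documents by similarity to the query, best first, dropping zero scores.
  def rank(query: Doc, docs: Seq[Doc]): Seq[(Int, Double)] = {
    val q = vector(query, docs)
    docs.indices
      .map(i => i -> cosine(q, vector(docs(i), docs)))
      .filter { case (_, score) => score > 0.0 }
      .sortBy { case (_, score) => -score }
  }
}
```

For example, `VectorSpace.rank(Seq("scala", "search"), docs)` returns `(documentIndex, score)` pairs with the best match first. Rare terms such as "scala" carry more weight than common ones such as "search" because of the IDF factor.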
Other languages
Feel free to submit pull requests with implementations in other languages. You can follow the same requirements as the Scala version:
- in-memory index;
- norms and IDF calculated online, i.e. at query time;
- OR as the default operator between query terms;
- index one document per line from a single file;
- read stopwords from a file.
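Here is a rough Scala sketch of how these requirements could fit together. It is hypothetical: `TinyIndex`, the regex tokenizer, the use of the 0-based line number as the document ID, and the one-stopword-per-line file layout are assumptions, not the repository's actual format.

```scala
import scala.io.Source

// Hypothetical sketch of the requirements above (not the repository's code).
object TinyIndex {
  // Lowercase and split on any run of characters that are not letters or digits.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").toSeq.filter(_.nonEmpty)

  // In-memory inverted index: term -> (docId -> term frequency).
  // Each input line is one document; its docId is its 0-based line number.
  def build(lines: Seq[String], stopwords: Set[String]): Map[String, Map[Int, Int]] = {
    val postings = for {
      (line, docId) <- lines.zipWithIndex
      term <- tokenize(line) if !stopwords.contains(term)
    } yield (term, docId)
    postings.groupBy(_._1).map { case (term, occ) =>
      term -> occ.groupBy(_._2).map { case (docId, hits) => docId -> hits.size }
    }
  }

  // Default OR semantics: a document matches if it contains ANY query term.
  def matchOr(index: Map[String, Map[Int, Int]], query: String, stopwords: Set[String]): Set[Int] =
    tokenize(query)
      .filterNot(stopwords.contains)
      .flatMap(t => index.getOrElse(t, Map.empty[Int, Int]).keys)
      .toSet

  // Assumed file layout: one document per line, one stopword per line.
  def fromFiles(docsPath: String, stopwordsPath: String): (Map[String, Map[Int, Int]], Set[String]) = {
    def readLines(path: String): Seq[String] = {
      val src = Source.fromFile(path)
      try src.getLines().toList finally src.close()
    }
    val stopwords = readLines(stopwordsPath).map(_.trim.toLowerCase).filter(_.nonEmpty).toSet
    (build(readLines(docsPath), stopwords), stopwords)
  }
}
```

Keeping per-document term frequencies in the posting lists lets a scorer compute tf-idf weights later without re-reading the input file.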
Scala
There are two Scala versions of the Vector Space Model. They are similar, except that "freakinTinySearch.scala" squeezes out a few more lines by getting rid of classes.
Warnings:
- I only tested the Scala code with 2.9.
- This is not intended to be real-world production code; it is just for fun and educational purposes.
- The Scala code calculates document norms and term IDFs on the fly while processing the query. This is far from optimal, but it keeps the code shorter.
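The on-the-fly approach can be sketched in Scala as follows, assuming an inverted index of the form term -> (docId -> term frequency). The names `OnTheFlyScoring`, `idf`, `norm`, and `search` are hypothetical. For each matching document, `norm` walks every posting list, and that full scan is the cost the warning refers to. In this sketch, query terms are weighted by IDF only, and the query norm is omitted because it is the same for every document and does not change the ranking.

```scala
// Hypothetical sketch of query-time IDF and norm computation (not the repository's code).
object OnTheFlyScoring {
  type Index = Map[String, Map[Int, Int]] // term -> (docId -> term frequency)

  // IDF computed at query time from the length of the term's posting list.
  def idf(index: Index, numDocs: Int, term: String): Double =
    index.get(term) match {
      case Some(postings) if postings.nonEmpty => math.log(numDocs.toDouble / postings.size)
      case _ => 0.0
    }

  // Document norm computed at query time: scans EVERY posting list in the index.
  def norm(index: Index, numDocs: Int, docId: Int): Double =
    math.sqrt(index.iterator.map { case (term, postings) =>
      val w = postings.getOrElse(docId, 0) * idf(index, numDocs, term)
      w * w
    }.sum)

  // Score every document containing at least one query term (OR semantics).
  // Document weight is tf * idf; query weight is idf, so each term contributes tf * idf^2.
  def search(index: Index, numDocs: Int, query: Seq[String]): Seq[(Int, Double)] = {
    val dots = query.distinct
      .flatMap { t =>
        val idfT = idf(index, numDocs, t)
        index.getOrElse(t, Map.empty[Int, Int]).map { case (d, tf) => d -> tf * idfT * idfT }
      }
      .groupBy(_._1)
      .map { case (d, parts) => d -> parts.map(_._2).sum }
    dots.toSeq
      .map { case (d, dot) =>
        val n = norm(index, numDocs, d)
        d -> (if (n == 0.0) 0.0 else dot / n)
      }
      .sortBy { case (_, score) => -score }
  }
}
```

Precomputing each document's norm once at indexing time (for instance, in a `Map[Int, Double]`) would replace that scan with a lookup, at the cost of a few more lines.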
