Tinysearch
Semantic Search Engine using BERT embeddings
This is a project done as part of CSCE 636 (Neural Networks).
Please note that support for this repo has been closed.
Existing search engines use keyword matching or tf-idf based matching to map the query to web documents and rank them. They also consider other factors such as PageRank, hubs-and-authorities scores, and knowledge graphs to make the results more meaningful. However, existing search engines fail to capture the meaning of a query when it becomes long and complex. BERT, introduced by Google in 2018, provides embeddings for words as well as sentences. In this project, I have developed a semantics-oriented search engine using neural networks and BERT embeddings that can search for a query and rank the documents from most meaningful to least meaningful. The results show an improvement over one existing search engine for complex queries on a given set of documents.
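To make the idea of ranking by meaning concrete, here is a minimal sketch that orders documents by the cosine similarity between a query vector and each document vector. It is an illustration only: the project itself trains a neural network on top of BERT embeddings, whereas this sketch swaps in plain cosine similarity, and the 3-dimensional vectors and the function names (`cosine_similarity`, `rank_documents`) are made up for the example. Real BERT-base embeddings would have 768 dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar document first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for sentence embeddings (assumption: not real BERT output).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [1.0, 0.0, 0.9],  # close to the query
    [0.5, 0.5, 0.5],  # partially related
]

ranking = rank_documents(query, docs)
print([index for index, _ in ranking])
```

With these toy vectors, the second document ranks first because it points in nearly the same direction as the query, regardless of shared keywords.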
Install Dependencies:
- pip install bert-serving-server from here
- pip install tensorflow
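- tkinter (bundled with most Python installations rather than installed via pip; on Debian/Ubuntu it is provided by the python3-tk package)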
- pip install keras
How to run:
- Download the two folders (uncased and model) from this zipped file on Drive.
- Run the bert-serving server as follows:
  bert-serving-start -model_dir=uncased_L-12_H-768_A-12/ -tuned_model_dir=model/ -ckpt_name=model.ckpt-78 -num_worker=1 -pooling_strategy=CLS_TOKEN -max_seq_len=125 -num_worker=4
- To train the model, run python generate_embeddings.py and then python train.py. generate_embeddings.py fetches the embeddings of the Quora question pairs and saves them; train.py loads the embeddings and trains the neural network model.
- To run the GUI, type:
  python gui_v4.py
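Once the bert-serving server from the steps above is running, sentence embeddings can be requested from Python. The sketch below is not the project's generate_embeddings.py; it is a hedged illustration that assumes the separate bert-serving-client package is installed and that the server is listening on localhost with its default ports. The helper names (`make_bert_encoder`, `embed_in_batches`) and the batch size are hypothetical. The encoder is passed in as a plain function so the batching logic can be tried without a server.

```python
from typing import Callable, List, Sequence

# An encoder maps a list of sentences to one vector per sentence.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def make_bert_encoder() -> Encoder:
    """Build an encoder backed by a running bert-serving server.

    Assumption: bert-serving-client is installed and the server uses
    its default host and ports. Imported lazily so the rest of this
    module works without the package.
    """
    from bert_serving.client import BertClient

    client = BertClient()
    return lambda texts: client.encode(texts)


def embed_in_batches(
    texts: List[str], encode: Encoder, batch_size: int = 32
) -> List[List[float]]:
    """Encode texts in fixed-size batches and return one vector per text."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for vec in encode(batch):
            vectors.append([float(x) for x in vec])
    return vectors


# Stand-in encoder for a dry run (hypothetical: vector = [sentence length]).
def fake_encoder(texts: List[str]) -> List[List[float]]:
    return [[float(len(t))] for t in texts]


demo = embed_in_batches(["a", "bb", "ccc"], fake_encoder, batch_size=2)
print(demo)
```

To use the real server, replace `fake_encoder` with `make_bert_encoder()`. Sentences longer than the server's -max_seq_len setting are truncated on the server side, so very long queries lose their tail.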
