QuestionPairing
MERN Stack + Machine Learning
Repository: kashaudhan/QuestionPairing
Quora Question Pairing system
MERN Stack Machine Learning Application
A full-stack, web-based machine learning application that tells whether two questions, either typed in or selected from a list, have the same meaning/intent.
Features:
- Given two input questions, the app predicts whether they have the same meaning/intent.
- The two input questions are stored in the MongoDB database.
- The questions stored in the database are rendered in the UI.
- From the rendered list of questions, the user can select any two and ask the app to predict for that pair; the result is displayed accordingly.
Technologies used:
- Frontend: React.js and Material UI
- Backend: Node.js, Express.js, MongoDB
- Machine Learning: Python, Ensemble Learning Algorithms, Data Analysis
How this application works:
- On submitting the two input questions, they are stored in the database using the post() method.
- At the same time, those questions are passed as parameters to the Python script.
- The Python script on the server processes the input and returns the predicted result.
- The predicted result is rendered in the UI.
- The questions are then fetched from the database using the get() method and re-rendered in the UI.
- If the user instead selects any two questions from the rendered list, those selected questions are passed to the server side for processing.
- After processing, the result is displayed in the UI.
To run the Python script on the server side I have used Node.js's child_process module.
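The README does not show the Python script itself. As a minimal sketch, assuming the server runs something like `python predict.py "<question1>" "<question2>"` through `child_process` and reads the answer back from stdout, the Python side of that contract could look like this. The filename `predict.py`, the JSON output shape and the `predict_duplicate` function are all hypothetical; `predict_duplicate` is only a word-overlap placeholder standing in for the trained XGBoost model.

```python
import json
import sys


def predict_duplicate(q1, q2):
    """HYPOTHETICAL placeholder for the trained model.

    Returns the Jaccard overlap of the lowercase word sets so the script runs
    end to end; a real version would build the features and call the saved
    XGBoost model instead.
    """
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def main(argv):
    # child_process passes the two questions as extra command-line
    # arguments, so they arrive as argv[1] and argv[2].
    if len(argv) != 3:
        return json.dumps({"error": "expected exactly two questions"})
    q1, q2 = argv[1], argv[2]
    prob = predict_duplicate(q1, q2)
    # Whatever is printed to stdout is what the Node.js server reads back.
    return json.dumps({"is_duplicate": prob >= 0.5, "probability": round(prob, 3)})


if __name__ == "__main__":
    print(main(sys.argv))
```

Printing a single JSON line keeps the Node.js side simple: it can collect the child's stdout and parse it in one step, and errors come back in the same shape instead of as a crash.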

Click here to see a video demo.
Machine Learning part:
- The dataset is taken from Kaggle.
- The final training data was prepared after cleaning (removing punctuation, stemming, lemmatisation, etc.) and preprocessing (cosine similarity, polarity, question length, etc.).
- I have used Python's nltk library to do the preprocessing.
- Feature Engineering: features were generated from the cleaned and preprocessed data.
- To train the model I have used the LightGBM, RandomForest and XGBoost algorithms. Out of the three, XGBoost performs best.
- So, the XGBoost model was selected as the final model for prediction.
- Model evaluation metric: Log Loss. The XGBoost model achieved a log loss of 0.428.
- To get the code for the ML part, go to my other repo.
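The cleaning and feature-engineering steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: it uses only the standard library and skips stemming and lemmatisation (which the project did with nltk), and the tiny `STOPWORDS` set and the feature names are hypothetical.

```python
import re
import string

# HYPOTHETICAL, deliberately tiny stand-in for a real stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "does", "i", "to", "of", "in", "what", "how"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean(text):
    """Lowercase, strip punctuation and collapse runs of whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


def pair_features(q1, q2):
    """A few hand-crafted features for one question pair (names are illustrative)."""
    c1, c2 = clean(q1), clean(q2)
    content1 = set(c1.split()) - STOPWORDS
    content2 = set(c2.split()) - STOPWORDS
    common = len(content1 & content2)
    return {
        "len_q1": len(c1),
        "len_q2": len(c2),
        "len_diff": abs(len(c1) - len(c2)),
        "word_count_diff": abs(len(c1.split()) - len(c2.split())),
        "common_words": common,
        # Share of non-stopword tokens that the two questions have in common.
        "word_share": common / max(1, len(content1 | content2)),
    }
```

Each question pair becomes one row of numbers like this, which is the kind of tabular input tree ensembles such as XGBoost, LightGBM and RandomForest are trained on.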
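For reference, binary log loss averages $-[y \log p + (1-y) \log(1-p)]$ over all pairs, where $y$ is the true label (1 = duplicate) and $p$ is the predicted probability of a duplicate. A small self-contained version is below; it follows the standard definition (the same one behind scikit-learn's `sklearn.metrics.log_loss`), but the clipping constant `eps` is an assumption chosen for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; lower is better."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

As a point of comparison, a model that always predicts 0.5 scores ln 2 ≈ 0.693 regardless of the labels, and confident wrong predictions are punished far more heavily than hesitant ones.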
How Cosine Similarity works:
Cosine similarity is a metric used to measure how similar two documents are irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. Cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (because of a difference in document size), they may still be oriented close together. The smaller the angle, the higher the cosine similarity.
To read more, click here.
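A minimal sketch of the idea, assuming plain word-count vectors (the project may have used a different vectorisation, such as TF-IDF). It compares a short text with the same text repeated three times: the Euclidean distance grows with the length difference, while the cosine similarity stays at 1 because both vectors point in the same direction.

```python
import math
from collections import Counter


def _vector(text):
    """Bag-of-words count vector (assumption: simple whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    va, vb = _vector(text_a), _vector(text_b)
    dot = sum(va[w] * vb[w] for w in va)  # only shared words contribute
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(text_a, text_b):
    va, vb = _vector(text_a), _vector(text_b)
    words = set(va) | set(vb)
    return math.sqrt(sum((va[w] - vb[w]) ** 2 for w in words))


short = "learn python fast"
long = " ".join([short] * 3)
print(cosine_similarity(short, long), euclidean_distance(short, long))
```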
How Polarity works:
Polarity analysis takes into account the number of positive or negative terms that appear in a given sentence. It is useful to some extent because it summarises each question as a simple, structured score. If two questions have different polarities, they are more likely to differ in meaning, and vice versa.
To read more, click here.
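A toy sketch of a polarity feature for a question pair, assuming a simple lexicon-count score in the range [-1, 1]. The `POSITIVE` and `NEGATIVE` word lists are hypothetical and far smaller than what real analyzers (for example nltk's VADER or TextBlob) use; the README does not say which scoring method the project used.

```python
import string

# HYPOTHETICAL toy lexicons, for illustration only.
POSITIVE = {"good", "great", "best", "love", "easy", "happy"}
NEGATIVE = {"bad", "worst", "hate", "hard", "sad", "difficult"}


def polarity(text):
    """(positive - negative) / (positive + negative); 0.0 if no sentiment words."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # neutral: nothing sentiment-bearing found
    return (pos - neg) / (pos + neg)


def polarity_diff(q1, q2):
    """Pair feature: a large gap hints that the questions differ in meaning."""
    return abs(polarity(q1) - polarity(q2))
```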
To run this project on your machine:
- Clone this repository to your local machine.
- Make sure you have node >= v14.15.3 installed.
- To install the dependencies, run npm install.
- Run the same command in the backend folder too.
- Now, to start the server, run nodemon server in the backend folder.
- To start the app, run npm start in another terminal.
- To run the Python script, you must have python >= 3.7 installed.
- Make sure your server is running before asking the app to predict.
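If you want to confirm the version requirements before running the steps above, a small preflight script like the following could help. It is a convenience sketch, not part of the repository; it only reads `node --version` and the running interpreter's version and compares them with the minimums stated in this README.

```python
import re
import subprocess
import sys

MIN_NODE = (14, 15, 3)
MIN_PYTHON = (3, 7)


def parse_version(text):
    """Turn output such as 'v14.15.3' into a tuple of ints, e.g. (14, 15, 3)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def node_version():
    """Installed Node.js version, or None if `node` is missing or fails."""
    try:
        out = subprocess.run(["node", "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(out.stdout)


def report():
    ok_python = sys.version_info[:2] >= MIN_PYTHON
    node = node_version()
    ok_node = node is not None and node >= MIN_NODE
    return [
        "python %s: %s" % (sys.version.split()[0], "OK" if ok_python else "too old"),
        "node %s: %s" % (node, "OK" if ok_node else "missing or too old"),
    ]


if __name__ == "__main__":
    for line in report():
        print(line)
```

Versions are compared as integer tuples, so 14.15.10 correctly counts as newer than 14.15.3, which a plain string comparison would get wrong.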
