PrediChurn
A full ML pipeline for customer churn prediction in telecom, banking, or SaaS. Includes robust data cleaning, automatic feature engineering, model training/tuning (Logistic Regression, RF, XGBoost), interpretability, and interactive dashboards for actionable business retention insights.
PrediChurn 🚦 – End-to-End Customer Churn Prediction Suite
"Transforming churn risk into retention strategies with advanced ML."
🔍 Powered by: XGBoost, Random Forest, Optuna, SHAP
🧑💻 Engineered by: vishnupriyanpr
Overview 🚀
PrediChurn is a modular machine learning pipeline for customer churn prediction. Designed for telecom, SaaS, and banking datasets, it automates data wrangling, business-driven feature engineering, model selection, and evaluation, and it produces business insights and analytics dashboards. Its outputs guide retention teams toward targeted, ROI-driven customer strategies.
Key Features 🧠
- 🔄 Multi-model engine: Logistic Regression, Random Forest, XGBoost—all Optuna-optimized
- 🛠️ Feature engineering: Tenure, ARPU, contract/payment, and behavior features with full NaN/infinite safety
- 🔍 Explainable AI: SHAP for both global and local churn driver visualization
- 📊 Business metrics: Churn rate, “revenue at risk”, “potential revenue saved”, intervention ROI
- 📑 Automated reporting: Executive summaries, actionable recommendations, and visualization outputs
ML Pipeline Details 🏗️
1. Data Preparation
- Loads raw CSV data
- Cleans missing values and outliers
- Encodes categoricals
- Scales numerical data
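The preparation steps above can be sketched in plain Python. This is a minimal illustration, not the code in src/data_preprocessor.py: the sample rows, the helper names (`to_float`, `prepare`), and the choice of median imputation and z-score scaling are assumptions. Only the column names follow the Kaggle Telco Churn format.

```python
import statistics

# Hypothetical rows in Kaggle Telco Churn style; the blank TotalCharges mimics a missing value.
rows = [
    {"tenure": "1", "TotalCharges": "29.85", "Churn": "No"},
    {"tenure": "34", "TotalCharges": "1889.50", "Churn": "No"},
    {"tenure": "2", "TotalCharges": " ", "Churn": "Yes"},
    {"tenure": "45", "TotalCharges": "1840.75", "Churn": "No"},
]

def to_float(text):
    """Parse a numeric string; return None for blank or malformed values."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def prepare(rows, numeric_cols=("tenure", "TotalCharges"), target="Churn"):
    """Impute missing numerics with the median, z-score scale them, encode the target as 0/1."""
    columns = {}
    for col in numeric_cols:
        values = [to_float(r[col]) for r in rows]
        median = statistics.median(v for v in values if v is not None)
        filled = [median if v is None else v for v in values]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # guard against constant columns
        columns[col] = [(v - mean) / std for v in filled]
    columns[target] = [1 if r[target] == "Yes" else 0 for r in rows]
    return columns

prepared = prepare(rows)
```

In a real run, the imputation and scaling statistics would normally be computed on the training split only and then reused on the test split, so that no information leaks from evaluation data.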
2. Feature Engineering
- Generates >10 additional business-focused features (e.g., avg_charges_per_tenure, high_value_customer)
- Handles division-by-zero/NaN/infinite edge cases
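The edge-case handling above might look like the following sketch. The feature names `avg_charges_per_tenure` and `high_value_customer` come from this README, but their formulas here (TotalCharges divided by tenure, and a MonthlyCharges cutoff of 80) are assumptions, as are the helper names `safe_ratio` and `add_features`.

```python
import math

def safe_ratio(numerator, denominator, default=0.0):
    """Divide two values, returning `default` when the result is undefined or non-finite."""
    if numerator is None or denominator is None:
        return default
    if math.isnan(numerator) or math.isnan(denominator) or denominator == 0:
        return default
    result = numerator / denominator
    return result if math.isfinite(result) else default

def add_features(customer, high_value_threshold=80.0):
    """Return a copy of the customer record with two derived features added."""
    features = dict(customer)
    # Assumed formula: lifetime charges spread over months of tenure.
    features["avg_charges_per_tenure"] = safe_ratio(customer["TotalCharges"], customer["tenure"])
    # Assumed rule: flag customers whose monthly bill meets a hypothetical cutoff.
    features["high_value_customer"] = int(customer["MonthlyCharges"] >= high_value_threshold)
    return features

# A brand-new customer with tenure 0 would otherwise trigger a division by zero.
new_customer = add_features({"tenure": 0, "TotalCharges": 0.0, "MonthlyCharges": 95.0})
```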
3. Modeling and Optimization
- Trains Logistic Regression, Random Forest, and XGBoost models
- Balances training data with SMOTE for rare churn events
- Tunes hyperparameters with Optuna to maximize ROC-AUC
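To illustrate the balancing step, here is a simplified pure-Python sketch of the interpolation idea behind SMOTE. It is not the project's implementation, and this README does not say which SMOTE library is used. The function name `smote_like` and the toy data are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical training split: 6 non-churners, 3 churners, two scaled features each.
majority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2), (0.1, 0.4)]
minority = [(0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]

new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Oversampling belongs on the training split only; the held-out data should keep the real churn rate so that the evaluation metrics stay meaningful.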
4. Evaluation
- Measures: accuracy, precision, recall, ROC-AUC
- Generates confusion matrix, ROC, Precision-Recall plots
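The four reported measures can be computed from labels and scores as in this sketch. The function names and the toy labels are hypothetical; the definitions are the standard ones, with ROC-AUC written as the probability that a randomly chosen churner is scored above a randomly chosen non-churner.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = churn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def roc_auc(y_true, scores):
    """Fraction of (churner, non-churner) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted churn probabilities for eight customers.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike the other three measures, ROC-AUC does not depend on the 0.5 cutoff, which makes it a reasonable tuning objective when the operating threshold is chosen later.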
5. Explainability
- Computes and saves SHAP summary and bar plots
- Ranks top churn features both globally and per-customer
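The global and per-customer rankings can be derived from a matrix of SHAP values as sketched below. The SHAP values here are made up for illustration; in the pipeline they would come from a SHAP explainer. The global ranking uses the mean absolute SHAP value per feature, which is what a SHAP bar plot displays. The function names are hypothetical.

```python
def global_ranking(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across all customers."""
    n = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, mean_abs), key=lambda kv: kv[1], reverse=True)

def local_ranking(row, feature_names, top=2):
    """Rank the features that moved one customer's prediction the most, keeping the sign."""
    return sorted(zip(feature_names, row), key=lambda kv: abs(kv[1]), reverse=True)[:top]

feature_names = ["avg_charges_per_tenure", "MonthlyCharges", "charges_trend"]
# Hypothetical SHAP values, one row per customer and one column per feature.
shap_values = [
    [0.20, -0.05, 0.10],
    [-0.10, 0.15, 0.02],
    [0.30, 0.05, -0.06],
]

global_rank = global_ranking(shap_values, feature_names)
local_rank = local_ranking(shap_values[1], feature_names)
```

A positive local value pushes that customer's prediction toward churn and a negative one pushes it away, so the sign is worth keeping in per-customer explanations.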
6. Business Analytics
- Calculates "revenue at risk", "potential savings", intervention efficiency
- Generates markdown and visual HTML reports
- Lists top churn drivers and segment-wise action steps
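This README does not define how "revenue at risk" or "potential savings" are calculated, so the sketch below shows one plausible set of definitions rather than the project's own. Every name, the 12-month horizon, the 0.5 threshold, the success rate, and the offer cost are assumptions.

```python
def revenue_at_risk(customers, threshold=0.5, months=12):
    """Projected revenue over `months` from customers whose churn probability meets the threshold."""
    return sum(
        c["monthly_charges"] * months
        for c in customers
        if c["churn_prob"] >= threshold
    )

def potential_savings(at_risk, success_rate, cost_per_offer, n_targeted):
    """Revenue retained if `success_rate` of the at-risk revenue is kept, minus offer costs."""
    return at_risk * success_rate - cost_per_offer * n_targeted

# Hypothetical scored customers.
customers = [
    {"monthly_charges": 100.0, "churn_prob": 0.75},
    {"monthly_charges": 50.0, "churn_prob": 0.25},
    {"monthly_charges": 80.0, "churn_prob": 0.5},
]

at_risk = revenue_at_risk(customers)  # (100 + 80) * 12
targeted = sum(1 for c in customers if c["churn_prob"] >= 0.5)
saved = potential_savings(at_risk, success_rate=0.25, cost_per_offer=20.0, n_targeted=targeted)
```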
Workflow 🔁
1. Clone Project & Install
   git clone https://github.com/vishnupriyanpr/churnguard-ai.git
   cd churnguard-ai
   pip install -r requirements.txt
2. Prepare Dataset
   - Place your CSV data in data/raw/telco_churn.csv (Kaggle Telco Churn format recommended)
3. Run Pipeline
   python main.py
4. View Outputs
   - Metrics, SHAP PNGs, and business report: in reports/
   - Model artifacts: in models/
Workflow ER Diagram 🗺️
erDiagram
RAW_DATA {
string customerID
string features
string churn_label
}
PROCESSED_DATA {
string encoded_features
string target
}
ENGINEERED_DATA {
string new_features
}
TRAIN_DATA {
string balanced_features
string balanced_target
}
MODEL {
string model_type
string hyperparameters
string trained_weights
}
METRICS {
float accuracy
float precision
float recall
float roc_auc
}
SHAP_PLOTS {
string summary_plot
string feature_importance
}
BUSINESS_REPORT {
string revenue_at_risk
string recommendations
string top_drivers
}
RAW_DATA ||--o{ PROCESSED_DATA : cleaned_and_preprocessed
PROCESSED_DATA ||--o{ ENGINEERED_DATA : feature_engineered
ENGINEERED_DATA ||--o{ TRAIN_DATA : balanced_with_SMOTE
TRAIN_DATA ||--o{ MODEL : trained_to
MODEL ||--o{ METRICS : generates
MODEL ||--o{ SHAP_PLOTS : explains
METRICS ||--o{ BUSINESS_REPORT : summarized_in
SHAP_PLOTS ||--o{ BUSINESS_REPORT : visualized_in
Key Results (Latest Run) 📊
- Accuracy: 78.1%
- Precision: 57.9%
- Recall: 65.0%
- ROC-AUC: 0.822
- Churn Rate: 26.5%
- Revenue at Risk: $374,000
- Potential Revenue Saved: $72,900
- Intervention Efficiency: 57.2%
- Top Churn Drivers:
  - avg_charges_per_tenure (0.132)
  - MonthlyCharges (0.083)
  - charges_trend (0.076)
  - TotalCharges (0.076)
  - price_per_month_ratio (0.075)
🧾 Business Recommendations
- Immediate Action: Target high-risk customers (churn probability > 70%) with retention offers
- Monitor Medium-Risk: Engage the 30–70% churn probability group
- Feature Focus: Optimize avg_charges_per_tenure and related drivers
- Ongoing Scoring: Recompute churn risk monthly for all customers
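The tiers in these recommendations can be expressed as a small scoring rule. The 70% and 30% cutoffs come from the list above; the function name, the "low" tier label, and the action strings are hypothetical.

```python
def risk_tier(churn_prob):
    """Map a churn probability to the tiers used in the recommendations."""
    if churn_prob > 0.70:
        return "high"
    if churn_prob >= 0.30:
        return "medium"
    return "low"

# Hypothetical mapping from tier to follow-up action.
ACTIONS = {
    "high": "send retention offer",
    "medium": "engage and monitor",
    "low": "rescore next month",
}

# Hypothetical monthly scores keyed by customer ID.
monthly_scores = {"C001": 0.82, "C002": 0.45, "C003": 0.10}
plan = {cid: ACTIONS[risk_tier(p)] for cid, p in monthly_scores.items()}
```

Rerunning this mapping after each monthly scoring pass keeps the action list aligned with current risk.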
Project Structure 📁
churnguard-ai/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── reports/
├── src/
│   ├── data_loader.py
│   ├── data_preprocessor.py
│   ├── feature_engineer.py
│   ├── model_trainer.py
│   ├── model_evaluator.py
│   └── utils.py
├── main.py
├── requirements.txt
└── README.md
License 📜
MIT License — use, modify, and scale freely!
Credits 🙌
<div align="center"> <table style="width:100%;"> <tr> <td align="center" style="width:50%;"> <a href="https://github.com/vishnupriyanpr"> <img src="https://github.com/vishnupriyanpr.png?size=120" width="120px;" alt="Vishnupriyan P R"/> </a> </td> <td align="center" style="width:50%;"> <blockquote> <p>“Tools should disappear into the background and let you build.”</p> <footer>— Vishnupriyan P R, <i>caffeinated coder ☕</i></footer> </blockquote> </td> </tr> </table> </div>
