RwandaNameGenderModel
A lightweight machine learning model for gender prediction based on Rwandan names using character-level n-gram features and logistic regression.
RwandaNameGenderModel is a machine learning model that predicts gender from Rwandan names: a first name, a surname, or both in any order. It combines character-level n-gram features with a logistic regression classifier, which keeps predictions fast and interpretable, and it reaches about 96.7% accuracy on the validation set and 96.6% on the test set.
🧠 Summary
- Type: Classic ML (Logistic Regression)
- Input: Rwandan name (flexible: single or full name)
- Vectorization: Character-level n-grams (2–3 chars)
- Framework: scikit-learn
- Training Set: 66,735 names (80% of the 83,419 in the dataset)
- Validation / Test Accuracy: 96.72% / 96.64%
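To make the vectorization step concrete, here is a minimal pure-Python sketch of the 2–3 character n-grams a name breaks into. The helper `char_ngrams` is hypothetical and only for illustration; lowercasing the input is an assumption, and the project's actual vectorizer settings may differ.

```python
def char_ngrams(text, n_min=2, n_max=3):
    """Return every character n-gram of length n_min..n_max, shortest first."""
    text = text.lower()  # assumption: features are case-insensitive
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


print(char_ngrams("Gabriel"))
# ['ga', 'ab', 'br', 'ri', 'ie', 'el', 'gab', 'abr', 'bri', 'rie', 'iel']
```

Each distinct n-gram becomes one feature column, so the classifier learns a weight per character fragment rather than per whole name.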
📁 Project Structure
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
🚀 Quickstart
1. Install requirements
pip install -r requirements.txt
2. Train the model
python train.py
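For orientation, the sketch below shows roughly what a training step of this kind could look like in scikit-learn. It is not the contents of `train.py`: the function names, the use of `CountVectorizer` (rather than, say, TF-IDF weighting), the 80/10/10 split, the random seed, and `max_iter` are all assumptions. The 80% training share is at least consistent with the Summary (66,735 of 83,419 names).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def split_80_10_10(names, labels, seed=42):
    """Split into train/validation/test sets (80/10/10 is an assumed ratio)."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        names, labels, test_size=0.2, random_state=seed, stratify=labels)
    # Halve the held-out 20% into validation and test
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=seed, stratify=y_rest)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


def fit_model(train_names, train_labels):
    """Fit a character n-gram vectorizer and a logistic regression on top."""
    # Assumed settings: raw counts of 2- and 3-character n-grams
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
    features = vectorizer.fit_transform(train_names)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, train_labels)
    return model, vectorizer
```

The fitted objects could then be written to `model/logistic_model.joblib` and `model/vectorizer.joblib` with `joblib.dump`, which is the form the prediction snippet in this README loads.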
3. Predict gender from a name using the script
Run interactive inference with:
python inference.py
4. Predict gender from a name using Python code
from joblib import load

# Load the trained classifier and its fitted vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    # Turn the name into n-gram features, then return the predicted label
    X = vectorizer.transform([name])
    return model.predict(X)[0]

# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")              # Output: "male"
predict_gender("Baziramwabo")          # Output: "male"
predict_gender("Baziramwabo Gabriel")  # Output: "male"
predict_gender("Gabriel Baziramwabo")  # Output: "male"
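Because logistic regression also exposes class probabilities, a caller may prefer to abstain on borderline names rather than always return a label. The sketch below is one possible way to do that; `predict_with_confidence` and the 0.8 threshold are hypothetical and not part of the repository. It relies only on the scikit-learn `predict_proba` method and `classes_` attribute of the loaded classifier.

```python
def predict_with_confidence(name, model, vectorizer, threshold=0.8):
    """Return (label, probability); label is None when below the threshold."""
    features = vectorizer.transform([name])
    probs = model.predict_proba(features)[0]  # one probability per class
    # Pick the index of the most probable class
    best = max(range(len(probs)), key=lambda i: probs[i])
    confidence = float(probs[best])
    if confidence < threshold:
        return None, confidence  # abstain on uncertain names
    return model.classes_[best], confidence
```

With the objects loaded above, `predict_with_confidence("Gabriel", model, vectorizer)` would return the label together with its probability, or `None` when the model is less sure than the chosen threshold.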
📈 Performance
| Dataset    | Accuracy | Precision | Recall | F1-Score |
|------------|----------|-----------|--------|----------|
| Validation | 96.72%   | 96.90%    | 96.53% | 96.72%   |
| Test       | 96.64%   | 96.94%    | 96.34% | 96.64%   |
Metrics are logged in both logs/metrics_log.txt and TensorBoard format.
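As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported columns. The snippet below is only an illustrative cross-check: the helper `f1_from_pr` is hypothetical, and how the project averages metrics across classes is not stated, so small differences from rounding are expected.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the table, written as fractions
validation_f1 = f1_from_pr(0.9690, 0.9653)
test_f1 = f1_from_pr(0.9694, 0.9634)

print(f"validation F1: {validation_f1:.4f}")  # close to the reported 96.72%
print(f"test F1:       {test_f1:.4f}")        # close to the reported 96.64%
```

Both recomputed values land within a few hundredths of a percentage point of the reported F1 scores, which is what rounding of the inputs would explain.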
🌍 Use Cases
- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora
🛡️ Ethical Note
This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.
📄 License
This project is maintained by Gabriel Baziramwabo and is open for research and educational use. For commercial use, please contact the author.
🤝 Contributing
We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!
