CATTLEX
AI POWERED LIVESTOCK HEALTH MONITORING
Two-Stage Hierarchical Cattle Disease Classification Pipeline
Overview
This project implements a hierarchical machine learning pipeline for the classification of cattle diseases based on clinical symptom scores. The system operates in two distinct stages:
- Stage 1: Category Classification: Uses Logistic Regression to classify symptom scores into broad disease categories (e.g., Respiratory, Digestive, Infectious).
- Stage 2: Specific Disease Identification: Uses Random Forest classifiers targeted to the predicted category to identify the specific disease.
This hierarchical approach is designed to improve model interpretability and to simplify multi-class diagnosis: each Stage 2 model only has to separate the diseases within one category, rather than all diseases at once as a flat classification model would.
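The two stages can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the code in run_hierarchical_classification.py: the TwoStageClassifier class, its hyperparameters, the synthetic data, and the placeholder disease names are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


class TwoStageClassifier:
    """Hypothetical two-stage classifier: category first, then disease."""

    def __init__(self, random_state=0):
        self.random_state = random_state
        self.stage1 = LogisticRegression(max_iter=1000)
        self.stage2 = {}  # category -> fitted forest, or a single constant label

    def fit(self, X, categories, diseases):
        X = np.asarray(X, dtype=float)
        categories = np.asarray(categories)
        diseases = np.asarray(diseases)
        # Stage 1: one model over all samples predicts the broad category.
        self.stage1.fit(X, categories)
        # Stage 2: one model per category, trained only on that category's rows.
        for category in np.unique(categories):
            mask = categories == category
            labels = np.unique(diseases[mask])
            if len(labels) == 1:
                # A category with a single disease needs no classifier.
                self.stage2[category] = labels[0]
            else:
                forest = RandomForestClassifier(
                    n_estimators=100, random_state=self.random_state
                )
                self.stage2[category] = forest.fit(X[mask], diseases[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        predicted_categories = self.stage1.predict(X)
        predicted_diseases = np.empty(len(X), dtype=object)
        # Route each sample to the Stage 2 model of its predicted category.
        for category in np.unique(predicted_categories):
            mask = predicted_categories == category
            model = self.stage2[category]
            if isinstance(model, RandomForestClassifier):
                predicted_diseases[mask] = model.predict(X[mask])
            else:
                predicted_diseases[mask] = model
        return predicted_categories, predicted_diseases


# Synthetic, well-separated data for illustration only (not the CattleX data).
# The five columns stand in for the symptom scores; disease names are placeholders.
rng = np.random.default_rng(0)
centers = {
    ("Respiratory", "disease_a"): [8, 1, 1, 1, 2],
    ("Respiratory", "disease_b"): [8, 1, 1, 1, 8],
    ("Digestive", "disease_c"): [1, 8, 1, 1, 2],
    ("Digestive", "disease_d"): [1, 8, 1, 1, 8],
}
X_parts, categories, diseases = [], [], []
for (category, disease), center in centers.items():
    X_parts.append(rng.normal(loc=center, scale=0.5, size=(30, 5)))
    categories += [category] * 30
    diseases += [disease] * 30
X = np.vstack(X_parts)

model = TwoStageClassifier().fit(X, categories, diseases)
pred_categories, pred_diseases = model.predict(list(centers.values()))
print(list(zip(pred_categories, pred_diseases)))
```

Note that in a design like this a Stage 1 mistake cannot be corrected in Stage 2, since the sample is routed to the wrong category's model. That is one reason to evaluate both stages separately.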
Project Structure
- hierarchical_cattle_disease_classification.ipynb: The main Jupyter notebook containing data exploration, model training, and performance evaluation.
- run_hierarchical_classification.py: A production-ready Python script that executes the complete two-stage training and validation pipeline.
- validated_cattlex_dataset.csv: The validated dataset containing symptom scores and disease labels.
- notebook_script.txt: A text version of the notebook logic.
- IEEE_single_column_high_clarity.pdf: Supporting documentation/technical paper.
- stage1_confusion_matrix.png: Visual representation of the Stage 1 classification performance.
Dataset Details
The dataset validated_cattlex_dataset.csv contains approximately 2,044 samples.
Features
The models use 5 aggregated clinical symptom scores:
- respiratory_score
- digestive_score
- mobility_score
- skin_score
- systemic_score
Targets
- Stage 1: disease_category (6 unique categories)
- Stage 2: disease_name (26 unique diseases)
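As a rough sketch of how the features and targets might be separated with pandas: the helper split_features_and_targets is hypothetical, and the example assumes the CSV column names match the feature and target names listed above exactly. The three rows below are made-up placeholder values, not records from the dataset.

```python
import pandas as pd

FEATURE_COLUMNS = [
    "respiratory_score",
    "digestive_score",
    "mobility_score",
    "skin_score",
    "systemic_score",
]
STAGE1_TARGET = "disease_category"
STAGE2_TARGET = "disease_name"


def split_features_and_targets(frame):
    """Return (X, y_category, y_disease) after checking required columns."""
    required = FEATURE_COLUMNS + [STAGE1_TARGET, STAGE2_TARGET]
    missing = [name for name in required if name not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame[FEATURE_COLUMNS], frame[STAGE1_TARGET], frame[STAGE2_TARGET]


# Tiny in-memory stand-in with placeholder values. With the real file, this
# would instead be: frame = pd.read_csv("validated_cattlex_dataset.csv")
frame = pd.DataFrame(
    {
        "respiratory_score": [7, 1, 2],
        "digestive_score": [1, 8, 1],
        "mobility_score": [0, 1, 6],
        "skin_score": [0, 0, 1],
        "systemic_score": [3, 4, 2],
        "disease_category": ["Respiratory", "Digestive", "Infectious"],
        "disease_name": ["disease_a", "disease_b", "disease_c"],
    }
)

X, y_category, y_disease = split_features_and_targets(frame)
print(X.shape, y_category.nunique(), y_disease.nunique())
```

On the real dataset, the two nunique() calls give a quick check against the 6 categories and 26 diseases stated above.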
Installation & Setup
Ensure you have Python 3 installed. You can install the required dependencies using pip:
pip install pandas numpy scikit-learn matplotlib seaborn
How to Run
Using the Notebook
Open hierarchical_cattle_disease_classification.ipynb in VS Code or Jupyter Lab and run all cells to see the full analysis, training process, and visualizations.
Using the Script
To run the automated pipeline from the terminal, execute:
python run_hierarchical_classification.py
This will:
- Load the dataset.
- Train and evaluate the Stage 1 Logistic Regression model using Stratified K-Fold cross-validation.
- Train and evaluate Stage 2 Random Forest models for each category.
- Output classification reports and performance metrics.
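The Stage 1 evaluation step above could look something like the following sketch. It is not taken from the script: the number of folds (5), the shuffle seed, and the synthetic three-category data are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data: 3 categories, 40 samples each, 5 score columns.
rng = np.random.default_rng(0)
category_names = np.array(["Respiratory", "Digestive", "Infectious"])
centers = np.array(
    [[8, 1, 1, 1, 1], [1, 8, 1, 1, 1], [1, 1, 1, 1, 8]], dtype=float
)
y = np.repeat(category_names, 40)
X = np.repeat(centers, 40, axis=0) + rng.normal(0.0, 1.0, size=(120, 5))

# Stratified folds keep the class proportions of the full data in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_counts = []
for train_idx, test_idx in cv.split(X, y):
    labels, counts = np.unique(y[test_idx], return_counts=True)
    fold_counts.append(dict(zip(labels, counts)))

# Each sample is predicted by a model that never saw it during training.
stage1 = LogisticRegression(max_iter=1000)
y_pred = cross_val_predict(stage1, X, y, cv=cv)
accuracy = float(np.mean(y_pred == y))

print(fold_counts[0])
print(classification_report(y, y_pred))
```

The same pattern could be repeated per category for the Stage 2 Random Forest models, using only the rows of that category and disease_name as the target. Categories containing a disease with fewer samples than folds would need fewer folds or separate handling.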
Performance
The pipeline uses Stratified K-Fold Cross-Validation, which preserves the class proportions of the full dataset in every fold, to obtain robust performance estimates. Evaluation focuses on the F1-score, which balances precision and recall across all disease classes.
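To illustrate why F1 is more informative than accuracy when classes are imbalanced, here is a small worked example in plain Python. Macro averaging (an unweighted mean of per-class F1) is an assumption here; the project may report a different average. The labels "A" and "B" are placeholders.

```python
def per_class_f1(y_true, y_pred):
    """Compute the F1-score of each class from true and predicted labels."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# A model that always predicts the majority class "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")                  # accuracy = 0.800
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.444
```

Accuracy looks acceptable at 0.8 even though class "B" is never detected, while the macro F1 drops to about 0.44 because the F1 for "B" is 0.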
