CATTLEX

AI POWERED LIVESTOCK HEALTH MONITORING

Two-Stage Hierarchical Cattle Disease Classification Pipeline

Overview

This project implements a hierarchical machine learning pipeline for the classification of cattle diseases based on clinical symptom scores. The system operates in two distinct stages:

  1. Stage 1 (Category Classification): uses Logistic Regression to classify the symptom scores into broad disease categories (e.g., Respiratory, Digestive, Infectious).
  2. Stage 2 (Specific Disease Identification): uses a Random Forest classifier dedicated to the predicted category to identify the specific disease.

This hierarchical approach is designed to improve model interpretability and to simplify multi-class diagnosis: each Stage 2 model only has to distinguish the diseases within one category, rather than all 26 diseases at once as a flat classification model would.
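
The routing between the two stages can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names `fit_two_stage` and `predict_two_stage` are hypothetical, and the hyperparameters (`max_iter=1000`, `n_estimators=100`) are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def fit_two_stage(X, categories, diseases, random_state=0):
    """Fit one category model plus one disease model per category.

    Function name and hyperparameters are illustrative assumptions.
    """
    X = np.asarray(X)
    categories = np.asarray(categories)
    diseases = np.asarray(diseases)

    # Stage 1: predict the broad disease category from the symptom scores.
    stage1 = LogisticRegression(max_iter=1000).fit(X, categories)

    # Stage 2: one Random Forest per category, trained only on that category's rows.
    stage2 = {}
    for cat in np.unique(categories):
        mask = categories == cat
        forest = RandomForestClassifier(n_estimators=100, random_state=random_state)
        stage2[cat] = forest.fit(X[mask], diseases[mask])
    return stage1, stage2


def predict_two_stage(stage1, stage2, X):
    """Route each sample to the Stage 2 model of its predicted category."""
    X = np.asarray(X)
    predicted_cats = stage1.predict(X)
    predicted_diseases = np.empty(len(X), dtype=object)
    for cat in np.unique(predicted_cats):
        mask = predicted_cats == cat
        predicted_diseases[mask] = stage2[cat].predict(X[mask])
    return predicted_cats, predicted_diseases
```

One consequence of this design is that a Stage 1 mistake cannot be corrected in Stage 2, because the sample is handed to a model that has never seen the true disease. Stage 1 accuracy therefore bounds the accuracy of the whole pipeline.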

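Project Structure

Files referenced in this README:

  • validated_cattlex_dataset.csv (dataset)
  • hierarchical_cattle_disease_classification.ipynb (notebook)
  • run_hierarchical_classification.py (command-line script)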

Dataset Details

The dataset, validated_cattlex_dataset.csv, contains approximately 2,044 samples.

Features

The models use five aggregated clinical symptom scores:

  • respiratory_score
  • digestive_score
  • mobility_score
  • skin_score
  • systemic_score

Targets

  • Stage 1: disease_category (6 unique categories)
  • Stage 2: disease_name (26 unique diseases)
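
As a rough sketch, the feature and target columns listed above could be separated like this. The column names come from this README; the helper `split_features_targets` and the constant `FEATURE_COLUMNS` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Column names as listed in this README.
FEATURE_COLUMNS = [
    "respiratory_score",
    "digestive_score",
    "mobility_score",
    "skin_score",
    "systemic_score",
]
CATEGORY_COLUMN = "disease_category"  # Stage 1 target
DISEASE_COLUMN = "disease_name"       # Stage 2 target


def split_features_targets(df):
    """Return (X, y_category, y_disease), failing early if a column is missing."""
    required = FEATURE_COLUMNS + [CATEGORY_COLUMN, DISEASE_COLUMN]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Dataset is missing expected columns: {missing}")
    return df[FEATURE_COLUMNS], df[CATEGORY_COLUMN], df[DISEASE_COLUMN]


# Usage, assuming the CSV is in the working directory:
# df = pd.read_csv("validated_cattlex_dataset.csv")
# X, y_category, y_disease = split_features_targets(df)
```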

Installation & Setup

Ensure you have Python 3 installed. You can install the required dependencies using pip:

pip install pandas numpy scikit-learn matplotlib seaborn

How to Run

Using the Notebook

Open hierarchical_cattle_disease_classification.ipynb in VS Code or Jupyter Lab and run all cells to see the full analysis, training process, and visualizations.

Using the Script

To run the automated pipeline from the terminal, execute:

python run_hierarchical_classification.py

This will:

  1. Load the dataset.
  2. Train and evaluate the Stage 1 Logistic Regression model using Stratified K-Fold cross-validation.
  3. Train and evaluate Stage 2 Random Forest models for each category.
  4. Output classification reports and performance metrics.
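
Steps 2 to 4 above might look roughly like the sketch below. It is not taken from run_hierarchical_classification.py: the function names, the fold count (`n_splits=5`), and the model hyperparameters are assumptions. It also evaluates each Stage 2 model on the rows of its true category, which may differ from how the script combines the stages.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def evaluate_stage1(X, categories, n_splits=5, random_state=0):
    """Cross-validated classification report for the category model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    model = LogisticRegression(max_iter=1000)
    # Each sample is predicted by a model that did not see it during training.
    predicted = cross_val_predict(model, X, categories, cv=cv)
    return classification_report(categories, predicted, zero_division=0)


def evaluate_stage2(X, categories, diseases, n_splits=5, random_state=0):
    """One cross-validated report per category, using the true category labels."""
    X, categories, diseases = (np.asarray(a) for a in (X, categories, diseases))
    reports = {}
    for cat in np.unique(categories):
        mask = categories == cat
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
        model = RandomForestClassifier(n_estimators=100, random_state=random_state)
        predicted = cross_val_predict(model, X[mask], diseases[mask], cv=cv)
        reports[cat] = classification_report(diseases[mask], predicted, zero_division=0)
    return reports
```

Note that `StratifiedKFold` expects every class to have at least `n_splits` samples, so rare diseases may require a smaller fold count.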

Performance

The pipeline uses Stratified K-Fold Cross-Validation, which preserves the class proportions in every fold, and focuses on F1-score to balance precision and recall across all disease classes.
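
The README does not state how the F1-score is averaged across classes. The sketch below, using toy labels that are not drawn from the dataset, shows the two common choices in scikit-learn. The helper name `f1_summary` is hypothetical.

```python
from sklearn.metrics import f1_score


def f1_summary(y_true, y_pred):
    """Return macro- and weighted-average F1 for a set of predictions."""
    return {
        # Macro: unweighted mean of per-class F1, so rare classes count equally.
        "macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        # Weighted: per-class F1 weighted by the number of true samples per class.
        "weighted": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }


# Toy labels for illustration only: one of the two "b" samples is misclassified.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "a", "b"]
summary = f1_summary(y_true, y_pred)
print(summary)
```

In this toy case the macro average is lower than the weighted average, because the error falls on the smaller class. With 26 diseases of potentially uneven frequency, the choice of average can change the reported score noticeably.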
