Metrics

Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave



README

Note: the current releases of this toolbox are beta releases, made to test working with the Haskell, Python, and R package repositories.


Metrics provides implementations of various supervised machine learning evaluation metrics in the following languages:

Python: easy_install ml_metrics (or pip install ml_metrics)
R: install.packages("Metrics") from the R prompt
Haskell: cabal install Metrics
MATLAB / Octave: clone the repository and run setup from the MATLAB command line

For more detailed installation instructions, see the README for each implementation.

EVALUATION METRICS

<table>
<tr><td>Evaluation Metric</td><td>Python</td><td>R</td><td>Haskell</td><td>MATLAB / Octave</td></tr>
<tr><td>Absolute Error (AE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Average Precision at K (APK, AP@K)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Area Under the ROC Curve (AUC)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Classification Error (CE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>F1 Score (F1)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Gini</td><td> </td><td> </td><td> </td><td>✓</td></tr>
<tr><td>Levenshtein</td><td>✓</td><td> </td><td>✓</td><td>✓</td></tr>
<tr><td>Log Loss (LL)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Log Loss (LogLoss)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Absolute Error (MAE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Average Precision at K (MAPK, MAP@K)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Quadratic Weighted Kappa</td><td>✓</td><td>✓</td><td> </td><td>✓</td></tr>
<tr><td>Mean Squared Error (MSE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Squared Log Error (MSLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Normalized Gini</td><td> </td><td> </td><td> </td><td>✓</td></tr>
<tr><td>Quadratic Weighted Kappa</td><td>✓</td><td>✓</td><td> </td><td>✓</td></tr>
<tr><td>Relative Absolute Error (RAE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Mean Squared Error (RMSE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Relative Squared Error (RSE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Relative Squared Error (RRSE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Mean Squared Log Error (RMSLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Squared Error (SE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Squared Log Error (SLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
</table>
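Many of the regression metrics in the table are built from one another: AE, SE, and SLE score each prediction individually, while MAE, MSE, and MSLE average those per-sample scores, and RMSE and RMSLE take a square root of the averages. The pure-Python sketch below illustrates that relationship. The function names mirror the table's abbreviations but are illustrative and not necessarily the ml_metrics package's API; SLE is assumed to use log(1 + x) so that zero values are allowed.

```python
import math
from typing import List, Sequence


def _check_lengths(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")


def ae(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute error of each prediction."""
    _check_lengths(actual, predicted)
    return [abs(a - p) for a, p in zip(actual, predicted)]


def se(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Squared error of each prediction."""
    _check_lengths(actual, predicted)
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]


def sle(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Squared log error of each prediction, assuming log(1 + x)."""
    _check_lengths(actual, predicted)
    return [(math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)]


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def mae(actual, predicted) -> float:
    return _mean(ae(actual, predicted))


def mse(actual, predicted) -> float:
    return _mean(se(actual, predicted))


def rmse(actual, predicted) -> float:
    return math.sqrt(mse(actual, predicted))


def msle(actual, predicted) -> float:
    return _mean(sle(actual, predicted))


def rmsle(actual, predicted) -> float:
    return math.sqrt(msle(actual, predicted))
```

For example, with actual values [1, 2, 3] and predictions [1, 2, 5], AE is [0, 0, 2], MAE is 2/3, and RMSE is sqrt(4/3). Keeping the per-sample functions separate makes the averaged metrics one-liners and makes sample weighting easy to add later.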

TO IMPLEMENT

F1 score
Multiclass log loss
Lift
Average precision for binary classification
Precision / recall break-even point
Cross-entropy
True positive / false positive / true negative / false negative rates
Precision / recall / sensitivity / specificity
Mutual information
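Several of the planned binary classification metrics can all be derived from the four counts of a confusion matrix. A minimal sketch, assuming class labels where `positive` marks the positive class (default 1) and assuming, as a convention chosen here, that a ratio with a zero denominator returns 0.0. The helper names `confusion_counts` and `binary_rates` are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for one positive label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def _ratio(num, den):
    # Assumed convention: an undefined ratio (zero denominator) becomes 0.0.
    return num / den if den else 0.0


def binary_rates(actual, predicted, positive=1):
    """Derive rate-style metrics from the confusion counts."""
    c = confusion_counts(actual, predicted, positive)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)       # also sensitivity / true positive rate
    specificity = _ratio(tn, tn + fp)  # also true negative rate
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

With actual labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1], the counts are tp=2, fp=1, tn=1, fn=1, so precision, recall, and F1 are all 2/3 and specificity is 1/2.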

HIGHER-LEVEL TRANSFORMATIONS TO HANDLE

GroupBy / Reduce
Weight individual samples or groups
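Both transformations can be layered on top of any existing metric. The sketch below shows one possible design, not the package's: `weighted_mean` reduces per-sample scores using sample weights, `grouped_metric` applies a metric to each group separately (the GroupBy step), and `macro_average` reduces the per-group scores with an unweighted mean (the Reduce step). All of these names are hypothetical, and the small `mae` is included only to make the example self-contained.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Reduce per-sample scores to one number, weighting each sample."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def grouped_metric(metric, actual, predicted, groups):
    """GroupBy step: score each group separately, returning {group: score}."""
    buckets = defaultdict(lambda: ([], []))
    for a, p, g in zip(actual, predicted, groups):
        buckets[g][0].append(a)
        buckets[g][1].append(p)
    return {g: metric(a, p) for g, (a, p) in buckets.items()}


def macro_average(scores):
    """Reduce step: unweighted mean of the per-group scores."""
    return sum(scores.values()) / len(scores)


def mae(actual, predicted):
    """Mean absolute error, used here as an example per-group metric."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For instance, absolute errors [0, 0, 2] with weights [1, 1, 2] give a weighted mean of 1.0, and grouping actual [1, 2, 3, 4] against predicted [1, 3, 3, 6] by ["a", "a", "b", "b"] gives MAE scores of 0.5 and 1.0, which macro-average to 0.75. Weighting groups instead of samples would just swap `macro_average` for `weighted_mean` over the group scores.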

PROPERTIES METRICS CAN HAVE

(Non-exhaustive; more properties will be added in the future)

Min or Max (whether the metric is optimized by minimization or maximization)
Binary Classification
- Scores predicted class labels
- Scores predicted ranking (most likely to least likely for being in one class)
- Scores predicted probabilities
Multiclass Classification
- Scores predicted class labels
- Scores predicted probabilities
Regression
Discrete Rater Comparison (confusion matrix)
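Quadratic weighted kappa is an example of a discrete rater comparison metric: it builds the confusion matrix between two raters' integer ratings and compares the observed disagreement with the disagreement expected by chance, penalizing larger rating gaps quadratically. With weights $w_{ij} = (i-j)^2$, observed counts $O_{ij}$, and expected counts $E_{ij} = h^a_i h^b_j / n$ from the two raters' rating histograms, $\kappa = 1 - \sum_{ij} w_{ij} O_{ij} / \sum_{ij} w_{ij} E_{ij}$. The pure-Python sketch below assumes integer ratings; it is not the package's implementation, and the `min_rating` / `max_rating` parameters are illustrative.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """Agreement between two integer raters: 1.0 is perfect, about 0 is chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    if min_rating is None:
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    if not all(min_rating <= r <= max_rating for r in list(rater_a) + list(rater_b)):
        raise ValueError("all ratings must lie within [min_rating, max_rating]")
    n = max_rating - min_rating + 1

    # Observed confusion matrix: rows are rater_a's ratings, columns rater_b's.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater.
    num_items = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # The usual 1 / (n - 1)^2 normalization cancels in the ratio.
            weight = (i - j) ** 2
            expected = hist_a[i] * hist_b[j] / num_items
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Assumed convention: no possible chance disagreement means agreement.
        return 1.0
    return 1.0 - numerator / denominator
```

Identical ratings such as [1, 2, 3] versus [1, 2, 3] score 1.0, while fully reversed ratings [1, 3] versus [3, 1] score -1.0. Because higher is better, kappa falls on the "Max" side of the Min or Max property.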


Author: benhamner
Repository: benhamner/Metrics
GitHub stars: 1.7k
Forks: 454
Category: Education
Updated: 16d ago
Languages: Python
