Metrics
Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave
Note: the current releases of this toolbox are beta releases, published to test working with the Haskell, Python, and R code repositories.

Metrics provides implementations of various supervised machine learning evaluation metrics in the following languages:
- Python (`easy_install ml_metrics`)
- R (`install.packages("Metrics")` from the R prompt)
- Haskell (`cabal install Metrics`)
- MATLAB / Octave (clone the repo & run setup from the MATLAB command line)
For more detailed installation instructions, see the README for each implementation.
EVALUATION METRICS
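Several of the regression metrics in this section build on one another: Squared Error feeds Mean Squared Error, which feeds Root Mean Squared Error, and the log variants follow the same chain. The plain-Python sketch below illustrates that relationship. It is not the code shipped in any of the packages, and the function names are chosen here for illustration only.

```python
import math

def se(actual, predicted):
    """Squared error for each (actual, predicted) pair."""
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]

def mse(actual, predicted):
    """Mean of the squared errors."""
    errors = se(actual, predicted)
    return sum(errors) / len(errors)

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))

def sle(actual, predicted):
    """Squared error on log(1 + x); inputs must be greater than -1."""
    return [(math.log1p(a) - math.log1p(p)) ** 2
            for a, p in zip(actual, predicted)]

def msle(actual, predicted):
    """Mean of the squared log errors."""
    errors = sle(actual, predicted)
    return sum(errors) / len(errors)

def rmsle(actual, predicted):
    """Square root of the mean squared log error."""
    return math.sqrt(msle(actual, predicted))

actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(actual, predicted))    # squared errors are 0, 0, 4, so the mean is 4/3
print(rmse(actual, predicted))
print(rmsle(actual, predicted))
```

The log variants penalize relative rather than absolute differences, which is why they are often preferred when targets span several orders of magnitude.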
<table>
<tr><td>Evaluation Metric</td><td>Python</td><td>R</td><td>Haskell</td><td>MATLAB / Octave</td></tr>
<tr><td>Absolute Error (AE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Average Precision at K (APK, AP@K)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Area Under the ROC (AUC)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Classification Error (CE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>F1 Score (F1)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Gini</td><td> </td><td> </td><td> </td><td>✓</td></tr>
<tr><td>Levenshtein</td><td>✓</td><td> </td><td>✓</td><td>✓</td></tr>
<tr><td>Log Loss (LL)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Log Loss (LogLoss)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Absolute Error (MAE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Average Precision at K (MAPK, MAP@K)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Quadratic Weighted Kappa</td><td>✓</td><td>✓</td><td> </td><td>✓</td></tr>
<tr><td>Mean Squared Error (MSE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Mean Squared Log Error (MSLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Normalized Gini</td><td> </td><td> </td><td> </td><td>✓</td></tr>
<tr><td>Quadratic Weighted Kappa</td><td>✓</td><td>✓</td><td> </td><td>✓</td></tr>
<tr><td>Relative Absolute Error (RAE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Mean Squared Error (RMSE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Relative Squared Error (RSE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Relative Squared Error (RRSE)</td><td> </td><td>✓</td><td> </td><td> </td></tr>
<tr><td>Root Mean Squared Log Error (RMSLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Squared Error (SE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
<tr><td>Squared Log Error (SLE)</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td></tr>
</table>

TO IMPLEMENT
- F1 score
- Multiclass log loss
- Lift
- Average Precision for binary classification
- precision / recall break-even point
- cross-entropy
- True Pos / False Pos / True Neg / False Neg rates
- precision / recall / sensitivity / specificity
- mutual information
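As a rough illustration of what some of the planned metrics compute, the sketch below derives the confusion counts and then precision, recall (sensitivity), specificity, and F1 for binary labels. It is a minimal sketch, not code from this project: the function names, the `positive` parameter, and the choice to return 0.0 when a denominator is zero are all assumptions made for the example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    """Assumed convention: an undefined ratio is reported as 0.0."""
    return numerator / denominator if denominator else 0.0

def binary_rates(actual, predicted, positive=1):
    """Precision, recall (sensitivity), specificity, and F1."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
print(binary_rates(actual, predicted))
```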
HIGHER LEVEL TRANSFORMATIONS TO HANDLE
- GroupBy / Reduce
- Weight individual samples or groups
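One way these two transformations might look is sketched below: a per-sample weighted Mean Absolute Error, and a helper that splits the data by a group key, applies a metric to each group, and reduces the per-group scores to one number. This is only an illustration under assumed names (`weighted_mae`, `grouped_metric`, and the `reduce` parameter are hypothetical); the project does not specify an interface for either feature.

```python
from collections import defaultdict

def mae(actual, predicted):
    """Plain mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def weighted_mae(actual, predicted, weights):
    """Mean absolute error where each sample carries its own weight."""
    weighted = sum(w * abs(a - p)
                   for a, p, w in zip(actual, predicted, weights))
    return weighted / sum(weights)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def grouped_metric(metric, groups, actual, predicted, reduce=mean):
    """Apply `metric` within each group, then reduce the group scores."""
    buckets = defaultdict(lambda: ([], []))
    for g, a, p in zip(groups, actual, predicted):
        buckets[g][0].append(a)
        buckets[g][1].append(p)
    per_group = {g: metric(a, p) for g, (a, p) in buckets.items()}
    return reduce(per_group.values()), per_group

groups = ["a", "a", "b"]
actual = [1.0, 2.0, 10.0]
predicted = [2.0, 2.0, 7.0]

overall, per_group = grouped_metric(mae, groups, actual, predicted)
print(per_group)  # {'a': 0.5, 'b': 3.0}
print(overall)    # 1.75
print(weighted_mae(actual, predicted, [1.0, 1.0, 2.0]))  # 1.75
```

Note that the grouped score (1.75) differs from the ungrouped MAE over all three samples (4/3), because each group counts equally regardless of its size.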
PROPERTIES METRICS CAN HAVE
(Non-exhaustive; more to be added in the future)
- Min or Max (optimize through minimization or maximization)
- Binary Classification
  - Scores predicted class labels
  - Scores predicted ranking (most likely to least likely for being in one class)
  - Scores predicted probabilities
- Multiclass Classification
  - Scores predicted class labels
  - Scores predicted probabilities
- Regression
- Discrete Rater Comparison (confusion matrix)
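The properties listed above could be recorded as metadata next to each metric, so that tooling can tell, for instance, whether a lower or a higher score is better. The sketch below shows one possible encoding. Every name in it (`Task`, `Scores`, `MetricProperties`, `is_improvement`) is hypothetical and not part of any of the packages.

```python
from dataclasses import dataclass
from enum import Enum

class Task(Enum):
    BINARY = "binary classification"
    MULTICLASS = "multiclass classification"
    REGRESSION = "regression"
    RATER_COMPARISON = "discrete rater comparison"

class Scores(Enum):
    LABELS = "predicted class labels"
    RANKING = "predicted ranking"
    PROBABILITIES = "predicted probabilities"
    VALUES = "predicted numeric values"

@dataclass(frozen=True)
class MetricProperties:
    name: str
    minimize: bool  # True if lower is better, False if higher is better
    task: Task
    scores: Scores

def is_improvement(props, new_score, old_score):
    """True if new_score beats old_score in the metric's direction."""
    if props.minimize:
        return new_score < old_score
    return new_score > old_score

RMSE = MetricProperties("RMSE", True, Task.REGRESSION, Scores.VALUES)
AUC = MetricProperties("AUC", False, Task.BINARY, Scores.RANKING)
LOG_LOSS = MetricProperties("LogLoss", True, Task.BINARY, Scores.PROBABILITIES)

print(is_improvement(AUC, 0.9, 0.8))   # True: AUC is maximized
print(is_improvement(RMSE, 0.9, 0.8))  # False: RMSE is minimized
```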
