Pdll

Pairwise Difference Learning (PDL) is a meta-learning framework that leverages pairwise differences to transform multiclass problems into binary tasks. This repository includes the original PDL Classifier implementation, along with extended versions for regression and weighted learning scenarios.


Pairwise difference learning library (pdll)

The Pairwise Difference Learning (PDL) library is a Python module. It contains a scikit-learn-compatible implementation of the PDL Classifier, as described in Belaid et al. 2024.

The PDL Classifier (PDC) is a meta-learner that reduces a multiclass classification problem to a binary classification problem (similar/different).

Installation

To install the package, run the following command:

pip install -U pdll

Usage

from pdll import PairwiseDifferenceClassifier

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_blobs

# Generate random data with 2 features, 10 points, and 3 classes
X, y = make_blobs(n_samples=10, n_features=2, centers=3, random_state=0)

pdc = PairwiseDifferenceClassifier(estimator=RandomForestClassifier())
pdc.fit(X, y)
print('score:', pdc.score(X, y))

y_pred = pdc.predict(X)
proba_pred = pdc.predict_proba(X)

Please consult the examples/ directory for more examples.

How does it work?

The PDL algorithm transforms the multiclass classification problem into a binary one: for a pair of points, a base learner predicts whether the two belong to the same class. The two examples below illustrate the procedure.
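The reduction to pairs can be sketched as follows. This is a minimal illustration, not the library's code: the helper `make_pairs` is hypothetical, and representing a pair as the concatenation of both points and their difference is an assumption that may not match what `pdll` does internally.

```python
import numpy as np


def make_pairs(X, y):
    """Build every ordered pair (i, j) of labelled points.

    Assumption: a pair is encoded as [x_i, x_j, x_i - x_j]; the actual
    feature layout used by the library may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Index grids enumerating all n * n ordered pairs.
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    X_pair = np.hstack([X[i], X[j], X[i] - X[j]])
    # Binary target: 1 if both points share a class (similar), else 0.
    y_pair = (y[i] == y[j]).astype(int)
    return X_pair, y_pair


X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array(["a", "a", "b"])
X_pair, y_pair = make_pairs(X, y)
print(X_pair.shape)
print(y_pair.reshape(3, 3))
```

Any binary classifier can then be fitted on `X_pair` and `y_pair`. Note that the number of pairs grows quadratically with the number of training points.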

Example 1: Graphical abstract

Example 2: PDC trained on the Iris dataset

<details> <summary>Click to show</summary> We provide a minimal classification example using the Iris dataset. The dataset is balanced, so the prior probabilities of the 3 classes are equal: p(Setosa) = p(Versicolour) = p(Virginica) = 1/3.

Three Anchor Points

Flower 1: y1 = Setosa
Flower 2: y2 = Versicolour
Flower 3: y3 = Virginica

One Query Point

Flower Q: yq (unknown target)

Pairwise Predictions

The model predicts the likelihood that both points belong to the same class:

g_sym(Flower Q, Flower 1) = 0.6
g_sym(Flower Q, Flower 2) = 0.3
g_sym(Flower Q, Flower 3) = 0.0

Given the above data, the first step is to update the priors.

Posterior using Flower 1:

p_post,1(Setosa) = 0.6
p_post,1(Versicolour) = (1/3 * (1 - 0.6)) / (1 - 1/3) = 0.2
p_post,1(Virginica) = (1/3 * (1 - 0.6)) / (1 - 1/3) = 0.2

Similarly, we calculate for anchors 2 and 3:

p_post,2(Setosa) = 0.35
p_post,2(Versicolour) = 0.30
p_post,2(Virginica) = 0.35
p_post,3(Setosa) = 0.5
p_post,3(Versicolour) = 0.5
p_post,3(Virginica) = 0.0

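Averaging over the three predictions:

p_post(Setosa) = (0.6 + 0.35 + 0.5) / 3 ≈ 0.483
p_post(Versicolour) = (0.2 + 0.30 + 0.5) / 3 ≈ 0.333
p_post(Virginica) = (0.2 + 0.35 + 0.0) / 3 ≈ 0.183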

Finally, the predicted class is the most likely prediction:

ŷ_q = arg max_{y ∈ Y} p_post(y) = Setosa
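The arithmetic of this example can be checked with a short script. It is a sketch under stated assumptions: the `posterior` helper is hypothetical (not part of `pdll`), and the general form of the update for non-anchor classes, `prior[c] * (1 - g) / (1 - prior[anchor_class])`, is inferred from the numbers above, where all priors are equal.

```python
classes = ["Setosa", "Versicolour", "Virginica"]
prior = {c: 1 / 3 for c in classes}

# (anchor class, predicted similarity g_sym to the query) from the example.
anchors = [("Setosa", 0.6), ("Versicolour", 0.3), ("Virginica", 0.0)]


def posterior(anchor_class, similarity, prior):
    """Posterior over classes given a single anchor (assumed update rule)."""
    post = {}
    for c in prior:
        if c == anchor_class:
            # The anchor's class receives the predicted similarity.
            post[c] = similarity
        else:
            # The remaining mass is shared according to the priors.
            post[c] = prior[c] * (1 - similarity) / (1 - prior[anchor_class])
    return post


posteriors = [posterior(c, g, prior) for c, g in anchors]
# Average the per-anchor posteriors, then take the most likely class.
p_post = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
y_hat = max(p_post, key=p_post.get)

print({c: round(v, 3) for c, v in p_post.items()})
print(y_hat)
```

The averaged posterior sums to 1, and the most likely class is Setosa, in agreement with the result above.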

</details>

Evaluation

To reproduce the experiment of the paper, run run_benchmark.py with a base learner and a dataset number between 0 and 99. Example:

python run_benchmark.py --model DecisionTreeClassifier --data 0

Scores will be stored in the ./results/tmp/ directory.

Experiment

We use 99 datasets from the OpenML repository. We compare the performance of the PDC algorithm with 7 base learners. We use the macro F1 score as a metric. The search space is inspired by TPOT, a state-of-the-art library for optimizing scikit-learn pipelines.

<details> <summary>Description of the search space per estimator</summary>

| Estimator | # parameters | # combinations |
|------------------------|--------------|----------------|
| DecisionTree | 4 | 350 |
| RandomForest | 7 | 1000 |
| ExtraTree | 6 | 648 |
| HistGradientBoosting | 6 | 486 |
| Bagging | 6 | 96 |
| ExtraTrees | 7 | 1000 |
| GradientBoosting | 5 | 900 |

</details> <details> <summary>Search space per estimator</summary>

| Estimator | Parameter | Values |
|----------------------------|-----------------|--------------------------------------------------------|
| DecisionTreeClassifier | criterion | gini, entropy |
| | max depth | None, 1, 2, 4, 6, 8, 11 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| RandomForestClassifier | criterion | gini, entropy |
| | min samples split | 2, 4, 8, 16, 21 |
| | max features | sqrt, 0.05, 0.17, 0.29, 0.41, 0.52, 0.64, 0.76, 0.88, 1.0 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | bootstrap | True, False |
| ExtraTreeClassifier | criterion | gini, entropy |
| | min samples split | 2, 5, 10 |
| | min samples leaf | 1, 2, 4 |
| | max features | sqrt, log2, None |
| | max leaf nodes | None, 2, 12, 56 |
| | min impurity decrease | 0.0, 0.1, 0.5 |
| HistGradientBoostingClassifier | max iter | 100, 10 |
| | learning rate | 0.1, 0.01, 1 |
| | max leaf nodes | 31, 3, 256 |
| | min samples leaf | 20, 4, 64 |
| | l2 regularization | 0, 0.01, 0.1 |
| | max bins | 255, 2, 64 |
| BaggingClassifier | n estimators | 10, 5, 100, 256 |
| | max samples | 1.0, 0.5 |
| | max features | 0.5, 0.9, 1.0 |
| | bootstrap | True, False |
| | bootstrap features | False, True |
| ExtraTreesClassifier | criterion | gini, entropy |
| | max features | sqrt, 0.05, 0.17, 0.29, 0.41, 0.52, 0.64, 0.76, 0.88, 1.0 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | bootstrap | False, True |
| GradientBoostingClassifier | learning rate | 0.1, 0.01, 1 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | subsample | 1.0, 0.05, 0.37, 0.68 |
| | max features | None, 0.15, 0.68 |
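As a quick sanity check, the number of combinations for an estimator is the product of the number of values per parameter. The sketch below does this for the DecisionTreeClassifier rows. The dictionary name `decision_tree_space` is only illustrative, and the keys are copied as written in the table rather than as scikit-learn argument names.

```python
from math import prod

# Values copied from the DecisionTreeClassifier rows of the table above.
decision_tree_space = {
    "criterion": ["gini", "entropy"],
    "max depth": [None, 1, 2, 4, 6, 8, 11],
    "min samples split": [2, 4, 8, 16, 21],
    "min samples leaf": [1, 2, 4, 10, 21],
}

# Size of the full grid: 2 * 7 * 5 * 5.
n_combinations = prod(len(values) for values in decision_tree_space.values())
print(n_combinations)  # 350
```

The result matches the 350 combinations listed for DecisionTree in the description of the search space.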

</details> <details> <summary>OpenML benchmark datasets</summary>

| data_id | NumberOfClasses | NumberOfInstances | NumberOfFeatures | NumberOfSymbolicFeatures | NumberOfFeatures_post_processing | MajorityClassSize | MinorityClassSize |
|----------:|------------------:|--------------------:|-------------------:|---------------------------:|-----------------------------------:|--------------------:|--------------------:|
| 43 | 2 | 306 | 4 | 2 | 3 | 225 | 81 |
| 48 | 3 | 151 | 6 | 3 | 5 | 52 | 49 |
| 59 | 2 | 351 | 35 | 1 | 34 | 225 | 126 |
| 61 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 164 | 2 | 106 | 58 | 58 | 57 | 53 | 53 |
| 333 | 2 | 556 |
