
BanditPAM: Almost Linear-Time $k$-Medoids Clustering


This repo contains a high-performance implementation of BanditPAM, the algorithm introduced in the papers "BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits" and "BanditPAM++: Faster k-medoids Clustering". The code can be called directly from Python, R, or C++.
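For orientation, the k-medoids problem that these algorithms address is usually stated as below. This is the standard textbook formulation, not a quotation from the papers; the symbols $x_i$ (data points), $\mathcal{M}$ (the set of medoids), and $d$ (the dissimilarity function) are notation introduced here for illustration.

```latex
\min_{\mathcal{M} \subseteq \{x_1, \dots, x_n\},\; |\mathcal{M}| = k}
  \; \frac{1}{n} \sum_{i=1}^{n} \min_{m \in \mathcal{M}} d(x_i, m)
```

Unlike $k$-means centers, medoids must be actual data points, and $d$ can be any dissimilarity measure (the examples below use `'L2'`). The factor $1/n$ does not change which medoid set is optimal.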

If you use this software, please cite:

Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony. "BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits" Advances in Neural Information Processing Systems (NeurIPS) 2020.

Mo Tiwari, Ryan Kang*, Donghyun Lee*, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang. "BanditPAM++: Faster k-medoids Clustering" Advances in Neural Information Processing Systems (NeurIPS) 2023.

@inproceedings{tiwari2020banditpam,
  title={BanditPAM: Almost Linear Time $k$-medoids Clustering via Multi-Armed Bandits},
  author={Tiwari, Mo and Zhang, Martin J and Mayclin, James and Thrun, Sebastian and Piech, Chris and Shomorony, Ilan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={368--374},
  year={2020}
}

@inproceedings{tiwari2023banditpam++,
  title={BanditPAM++: Faster $k$-medoids Clustering},
  author={Tiwari, Mo and Kang, Ryan and Lee, Donghyun and Thrun, Sebastian and Shomorony, Ilan and Zhang, Martin J},
  booktitle={Advances in Neural Information Processing Systems},
  volume={36},
  pages={73371--73382},
  year={2023}
}

Requirements

TL;DR: run python -m pip install banditpam (Python) or install.packages("banditpam") (R) and jump to the examples.

If you run into difficulties, please see the platform-specific guides, and file a GitHub issue if the problem persists.
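A quick way to confirm that the Python install worked is to ask the interpreter whether the package can be found. The following is a minimal sketch using only the standard library; the helper name `banditpam_status` is hypothetical and not part of the package.

```python
from importlib import metadata, util


def banditpam_status():
    """Return (importable, version) for the banditpam package.

    version is None when no installed distribution metadata is found.
    Illustrative helper only, not part of banditpam.
    """
    importable = util.find_spec("banditpam") is not None
    try:
        version = metadata.version("banditpam")
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


importable, version = banditpam_status()
print(f"banditpam importable: {importable}, version: {version}")
```

If `importable` is False, the package is not visible to the interpreter that ran the check, which often means pip installed it into a different Python environment.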

Python Quickstart

Install the package and its dependencies. This can be done either through PyPI (recommended):

/BanditPAM/: python -m pip install -r requirements.txt
/BanditPAM/: python -m pip install banditpam

or from the source code:

/BanditPAM/: git submodule update --init --recursive
/BanditPAM/: cd headers/carma
/BanditPAM/: mkdir build && cd build && cmake -DCARMA_INSTALL_LIB=ON .. && sudo cmake --build . --config Release --target install
/BanditPAM/: cd ../../..
/BanditPAM/: python -m pip install -r requirements.txt
/BanditPAM/: sudo python -m pip install .

Example 1: Synthetic data from a Gaussian Mixture Model

from banditpam import KMedoids
import numpy as np
import matplotlib.pyplot as plt

# Generate data from a Gaussian Mixture Model with the given means:
np.random.seed(0)
n_per_cluster = 40
means = np.array([[0,0], [-5,5], [5,5]])
X = np.vstack([np.random.randn(n_per_cluster, 2) + mu for mu in means])

# Fit the data with BanditPAM:
kmed = KMedoids(n_medoids=3, algorithm="BanditPAM")
kmed.fit(X, 'L2')

print(kmed.average_loss)  # prints 1.2482391595840454
print(kmed.labels)  # prints cluster assignments [0] * 40 + [1] * 40 + [2] * 40

# Visualize the data and the medoids:
medoid_idxs = set(map(int, kmed.medoids))  # build the lookup once, not per point
for p_idx in range(len(X)):
    if p_idx in medoid_idxs:
        plt.scatter(X[p_idx, 0], X[p_idx, 1], color='red', s=40)
    else:
        plt.scatter(X[p_idx, 0], X[p_idx, 1], color='blue', s=10)

plt.show()

Example 2: MNIST and its medoids visualized via t-SNE

# Start in the repository root directory, i.e. '/BanditPAM/'.
from banditpam import KMedoids
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load the 1000-point subset of MNIST and calculate its t-SNE embeddings for visualization:
X = pd.read_csv('data/MNIST_1k.csv', sep=' ', header=None).to_numpy()
X_tsne = TSNE(n_components=2).fit_transform(X)

# Fit the data with BanditPAM:
kmed = KMedoids(n_medoids=10, algorithm="BanditPAM")
kmed.fit(X, 'L2')

# Visualize the data and the medoids via t-SNE:
medoid_idxs = set(map(int, kmed.medoids))  # build the lookup once, not per point
for p_idx in range(len(X)):
    if p_idx in medoid_idxs:
        plt.scatter(X_tsne[p_idx, 0], X_tsne[p_idx, 1], color='red', s=40)
    else:
        plt.scatter(X_tsne[p_idx, 0], X_tsne[p_idx, 1], color='blue', s=5)

plt.show()

R Examples

Please see here.

Documentation

Documentation for BanditPAM can be found on Read the Docs.

Building the C++ executable from source

Please note that it is NOT necessary to build the C++ executable from source to use the Python code above. However, if you would like to use the C++ executable directly, follow the instructions below.

Option 1: Building with Docker

We highly recommend building with Docker. Download and install Docker by following the instructions on the Docker install page. Once Docker is installed and the Docker daemon is running, run the following commands:

/BanditPAM/scripts/docker$ chmod +x env_setup.sh
/BanditPAM/scripts/docker$ ./env_setup.sh
/BanditPAM/scripts/docker$ ./run_docker.sh

This starts a Docker container with the necessary dependencies. Then run:

/BanditPAM$ mkdir build && cd build
/BanditPAM/build$ cmake .. && make

This will create an executable named BanditPAM in BanditPAM/build/src.

Option 2: Installing requirements and building directly

Building this repository requires four external dependencies:

  • CMake >= 3.17
  • Armadillo >= 10.5.3
  • OpenMP >= 2.5 (OpenMP is supported by default on most Linux platforms and can be installed through Homebrew on macOS)
  • CARMA >= 0.6.2
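Because the minimum versions matter, it can help to check a tool before starting a build. The following is a minimal sketch for CMake only, assuming that `cmake --version` prints a line of the usual form `cmake version X.Y.Z`; the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


cmake = shutil.which("cmake")
if cmake is None:
    print("cmake not found on PATH")
else:
    out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
    try:
        print("CMake >= 3.17:", meets_minimum(parse_version(out), (3, 17)))
    except ValueError as err:
        print("could not parse cmake version:", err)
```

Comparing integer tuples rather than strings avoids the classic mistake of treating "3.9" as newer than "3.17".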

If installing these requirements from source, each one can generally be built and installed from the library's root folder as follows (Armadillo is used as the example here):

/armadillo$ mkdir build && cd build
/armadillo/build$ cmake .. && make && sudo make install

Note that CARMA is installed differently; see its own installation instructions.

Platform-specific installation guides

Further installation information for macOS, Linux, and Windows is available in the docs folder. Ensure all the requirements above are installed, then run:

/BanditPAM$ mkdir build && cd build
/BanditPAM/build$ cmake .. && make

This will create an executable named BanditPAM in BanditPAM/build/src.

C++ Usage

Once the executable has been built, it can be invoked with:

/BanditPAM/build/src/BanditPAM -f [path/to/input.csv] -k [number of clusters]
  • -f is mandatory and specifies the path to the dataset
  • -k is mandatory and specifies the number of clusters with which to fit the data
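For scripted runs, the invocation above can be assembled from Python. This is a sketch under two assumptions that this README does not confirm: that the executable reads the same space-separated, header-less layout as data/MNIST_1k.csv in Example 2, and that it lives at the build path shown above. The helper names and the file name `toy.csv` are hypothetical.

```python
import csv
import os
import tempfile


def write_dataset(rows, path):
    """Write one point per line, values separated by single spaces, no header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=" ", lineterminator="\n")
        writer.writerows(rows)


def build_command(executable, data_path, k):
    """Build the argument list for: <executable> -f <data_path> -k <k>."""
    return [executable, "-f", data_path, "-k", str(k)]


# A toy dataset with two obvious groups (assumed input layout, see above).
rows = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
data_path = os.path.join(tempfile.mkdtemp(), "toy.csv")
write_dataset(rows, data_path)

cmd = build_command("/BanditPAM/build/src/BanditPAM", data_path, 2)
print(" ".join(cmd))
# To actually run it once the executable has been built:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the dataset path contains spaces.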

For
