Submodlib

Summarize Massive Datasets using Submodular Optimization

<p align="center"><img src="https://github.com/decile-team/submodlib/blob/master/submodlib_logo.png" width="500" /></p>

About SubModLib

SubModLib is an easy-to-use, efficient, and scalable Python library for submodular optimization, backed by a C++ optimization engine. It finds application in summarization, data subset selection, hyperparameter tuning, efficient training, and similar tasks. Its rich API offers a great deal of flexibility in how it can be used.

Please check out our latest arXiv preprint: https://arxiv.org/abs/2202.10680

Salient Features

  • Rich suite of functions for a wide variety of subset selection tasks:
    • regular set (submodular) functions
    • submodular mutual information functions
    • conditional gain functions
    • conditional mutual information functions
  • Supports different types of optimizers
    • naive greedy
    • lazy (accelerated) greedy
    • stochastic (random) greedy
    • lazier than lazy greedy
  • Combines the best of Python's ease of use and C++'s efficiency
  • Rich API that gives the user a variety of options. See this notebook for an example of different usage patterns.
  • Decoupled function and optimizer paradigm makes it suitable for a wide variety of tasks
  • Comprehensive documentation (available here)
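To give a feel for the difference between the naive and lazy greedy optimizers listed above, here is a minimal pure-Python sketch. It does not use SubModLib or reflect how its C++ engine is written; the `coverage` objective, the function names, and the toy `sets` data are all assumptions made for illustration. Naive greedy re-evaluates every remaining candidate in each round, while lazy greedy keeps previously computed gains in a max-heap and only refreshes the top entry, which is valid because submodularity means a stale gain can only overestimate the current one.

```python
import heapq


def coverage(selected, sets):
    """Toy submodular objective: number of distinct items covered."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)


def naive_greedy(sets, budget):
    """Re-evaluate the gain of every remaining candidate in each round."""
    selected = []
    for _ in range(budget):
        base = coverage(selected, sets)
        best, best_gain = None, 0
        for j in range(len(sets)):
            if j in selected:
                continue
            gain = coverage(selected + [j], sets) - base
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no candidate adds any value
            break
        selected.append(best)
    return selected


def lazy_greedy(sets, budget):
    """Keep gains in a max-heap and refresh only the entry at the top."""
    # Heap entries: (-gain, index, size of the selection when gain was computed).
    heap = [(-len(s), j, 0) for j, s in enumerate(sets)]
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < budget:
        neg_gain, j, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date, so j beats every (stale) upper bound.
            if -neg_gain <= 0:
                break
            selected.append(j)
        else:
            # Stale gain: recompute against the current selection and re-insert.
            gain = coverage(selected + [j], sets) - coverage(selected, sets)
            heapq.heappush(heap, (-gain, j, len(selected)))
    return selected


# Hypothetical ground set: each candidate covers a few item ids.
sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 2}, {7}]
naive_pick = naive_greedy(sets, budget=3)
lazy_pick = lazy_greedy(sets, budget=3)
print(naive_pick, lazy_pick, coverage(lazy_pick, sets))
```

On this toy input both variants pick the same candidates; the lazy variant simply skips gain evaluations that cannot change the outcome, which matters far more on large ground sets.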

Google Colab Notebooks Demonstrating the power of SubModLib and sample usage

Setup

Alternative 1

  • $ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Alternative 2 (if local docs need to be built and test cases need to be run)

  • $ git clone https://github.com/decile-team/submodlib.git
  • $ cd submodlib
  • $ pip install .
  • The latest documentation is available at readthedocs. To build the documentation locally instead, follow these steps:
    • $ pip install -U sphinx
    • $ pip install sphinxcontrib-bibtex
    • $ pip install sphinx-rtd-theme
    • $ cd docs
    • $ make clean html
  • To run the tests, follow these steps:
    • $ pip install pytest
    • $ pytest # this runs ALL tests
    • $ pytest -m <marker> --verbose --disable-warnings -rA # this runs only the tests selected by <marker>; the available markers are listed in pyproject.toml

Usage

It is very easy to get started with submodlib. Using a submodular function in submodlib essentially boils down to just two steps:

  1. instantiate the corresponding function object
  2. invoke the desired method on the created object

The most frequently used methods are:

  1. f.evaluate() - takes a subset and returns the score of the subset as computed by the function f
  2. f.marginalGain() - takes a subset and an element and returns the marginal gain of adding the element to the subset, as computed by f
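  3. f.maximize() - takes a budget and an optimizer and returns the subset selected by maximizing f under that budget (the greedy optimizers are approximate, so the result is not guaranteed to be the global optimum)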

For example,

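from submodlib import FacilityLocationFunction

# groundData is assumed to hold the feature vectors of the 43 ground-set items, one per row
objFL = FacilityLocationFunction(n=43, data=groundData, mode="dense", metric="euclidean")
# select 10 items with the naive greedy optimizer
greedyList = objFL.maximize(budget=10, optimizer='NaiveGreedy')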

For a more detailed discussion of all possible usage patterns, please see Different Options of Usage.
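To make the evaluate and marginal-gain methods described above more concrete, the following pure-Python sketch computes both for a facility location objective, f(A) = Σ_i max_{j∈A} s_ij, on a tiny hand-written similarity matrix. It is an illustration under stated assumptions rather than SubModLib's implementation: `fl_evaluate`, `fl_marginal_gain`, and the matrix `sim` are hypothetical names and data, and the library computes its own similarity kernel from the supplied data and metric.

```python
def fl_evaluate(sim, subset):
    """Facility location score: every ground element contributes its best
    similarity to any element of the subset."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def fl_marginal_gain(sim, subset, element):
    """Gain of adding `element` to `subset`: f(A ∪ {e}) - f(A)."""
    return fl_evaluate(sim, set(subset) | {element}) - fl_evaluate(sim, subset)


# Hypothetical 4x4 similarity matrix with two clusters: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]

score = fl_evaluate(sim, {0})                 # how well element 0 represents everything
gain_near = fl_marginal_gain(sim, {0}, 1)     # 1 sits in the same cluster as 0
gain_far = fl_marginal_gain(sim, {0}, 2)      # 2 represents the other cluster
gain_later = fl_marginal_gain(sim, {0, 2}, 3) # 3 is already well covered by 2

print(score, gain_near, gain_far, gain_later)
```

Adding an element from an uncovered cluster yields a larger gain than adding a near-duplicate, and the gain of element 3 shrinks once element 2 is already selected. That diminishing-returns behaviour is what the greedy optimizers exploit.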

Functions

Modelling Capabilities of Different Functions

We demonstrate the representational power and modeling capabilities of different functions qualitatively in the following Google Colab notebooks:
