ImprovedPLBF
This project contains three Python files that implement Turbo Partitioned Learned Bloom Filters (TurboPLBF) and Differentiable Partitioned Learned Bloom Filters (DiffPLBF). These are improved PLBF (Partitioned Learned Bloom Filters) structures which require significantly lower training times compared to traditional PLBFs.
Install / Use
/learn @mshandilya/ImprovedPLBFREADME
Improved Partitioned Bloom Filters
Overview
This project contains three Python files that implement Turbo Partitioned Learned Bloom Filters (TurboPLBF) and Differentiable Partitioned Learned Bloom Filters (DiffPLBF). The files included are:
turbo_structs.py: Contains utility functions and data structures used by both TurboPLBF and DiffPLBF.turbo_plbf.py: Implements the TurboPLBF construction and evaluation.diff_plbf.py: Implements the DiffPLBF construction and evaluation.
Files and Functions
turbo_structs.py
This file provides several utility functions and classes:
create_ideal_g_h(N): Generates ideal probability mass functionsgandhfor a givenN.simple_g_h(N): Generates simple linear probability mass functionsgandhfor a givenN.DataLoader: A class that loads and prepares probability mass functionsgandhbased on different distributions.load(): Returns the generatedg,h, and their cumulative sumspg,ph.
Node: A class representing a node in a doubly linked list.LinkedList: A class implementing a doubly linked list with various operations likeinsert,remove, andfetchfunctions.ImplicitMatrix: A class representing a matrix whose values are derived from a function. Includes methods to fetch elements and delete rows/columns.
turbo_plbf.py
This file implements the TurboPLBF algorithm:
optFPR(ts): Calculates the optimal False Positive Rates (FPR) for given thresholdsts.normBloom(): Returns the space used by a normal Bloom Filter.spaceUsed(ts): Returns the space used by the TurboPLBF for given thresholdsts.dkl(x, y): Computes the Kullback-Leibler divergence between two indices.reduce(mat_a): Reduces the implicit matrixmat_ato find relevant partitions.max_compute(mat_a): Computes the maximum values in the implicit matrixmat_a.- Main section: Initializes parameters, loads data, and constructs the TurboPLBF. Prints the construction time, partition score, and space used.
diff_plbf.py
This file implements the DiffPLBF algorithm:
optFPR(ts): Calculates the optimal False Positive Rates (FPR) for given thresholdsts.normBloom(): Returns the space used by a normal Bloom Filter.spaceUsed(ts): Returns the space used by the DiffPLBF for given thresholdsts.dkl(l, r): Computes the Kullback-Leibler divergence between two indices.F_func(x): Evaluates the functionFat pointx.G(x): Evaluates the constraint functionGat pointx.get_rand_pt(k): Generates a random point of dimensionk.grad(func, pt): Computes the gradient offuncat pointpt.- Main section: Initializes parameters, loads data, and constructs the DiffPLBF. Prints the construction time, partition score, and space used.
How to Run
-
Ensure you have Python installed on your system. These programs require Python 3 and the following libraries:
numpy
-
To run the TurboPLBF program:
python turbo_plbf.py -
To run the DiffPLBF program:
python diff_plbf.py -
Both scripts will output the construction time, partition score, and space used by the respective PLBF implementations.
Example
Running python turbo_plbf.py will output:
Construction Time: XXXXms
Score of partition: X.XXXX
Space Used (Bloom Filter): X.XXXX Mb
Space Used (TurboPLBF): X.XXXX Mb
Running python diff_plbf.py will output:
Construction Time: XXXXms
Score of partition: X.XXXX
Space Used (Bloom Filter): X.XXXX Mb
Space Used (DiffPLBF): X.XXXX Mb
License
This project is licensed under the MIT License - see the LICENSE file for details.
