Content-Based Image Retrieval System

Project Objectives

  • Extracted keypoints and local invariant descriptors from each image in the dataset and stored them in HDF5.
  • Clustered the extracted features in HDF5 to form a codebook (the resulting centroids of the clustered features) and visualized each codeword (centroid) inside the codebook.
  • Constructed a bag-of-visual-words (BOVW) representation for each image by quantizing the associated feature vectors into a histogram using the codebook created.
  • Accepted a query image from the user, constructed the BOVW representation for the query, and performed the actual search.
  • Implemented term frequency-inverse document frequency and spatial verification to improve the accuracy of the system.

Software/Package Used

Algorithms & Methods Involved

  • Keypoints and descriptors extraction
    • Fast Hessian keypoint detector algorithm
    • Local scale-invariant feature descriptors (RootSIFT)
  • Feature storage and indexing
    • Structured HDF5 dataset
  • Clustering features to generate a codebook
    • K-means algorithm
  • Visualizing codeword entries (centroids of clustered features)
  • Vector quantization
    • BOVW extraction
    • BOVW storage and indexing
  • Inverted indexing
    • Redis-backed inverted index
  • Performing the search
  • System accuracy evaluation
    • "Points-based" metric
  • Term frequency-inverse document frequency (tf-idf)
  • Spatial verification (Future Plan)
    • Random Sample Consensus (RANSAC)
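
The tf-idf weighting listed above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation of the standard tf-idf formula applied to BOVW histograms, not the repository's actual code; the function name and smoothing choices are my own.

```python
import numpy as np

def tfidf_weight(histograms):
    """Apply tf-idf weighting to a stack of BOVW histograms.

    histograms: (num_images, num_words) array of raw visual-word counts.
    Returns the tf-idf weighted histograms.
    """
    histograms = np.asarray(histograms, dtype="float64")
    # Term frequency: word count normalized by total words in the image.
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    # Document frequency: number of images containing each visual word.
    df = np.count_nonzero(histograms, axis=0)
    # Inverse document frequency (floored to avoid division by zero).
    idf = np.log(histograms.shape[0] / np.maximum(df, 1))
    return tf * idf
```

Visual words that occur in every image get an idf of zero, so they stop contributing to the ranking, which is exactly why tf-idf helps search accuracy.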

Approaches

  • The dataset consists of about 1,000 images from the UKBench dataset.
  • The figure below shows the CBIR search pipeline.
<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/cbir_searching.jpg" width="600">

Results

Extract keypoints and descriptors

This is step 1 in building the bag of visual words (BOVW).

In order to extract features from each image in the dataset, I use the Fast Hessian method for keypoint detection and RootSIFT for local invariant descriptors.
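
The RootSIFT transform itself is simple enough to sketch here: it takes raw SIFT descriptors, L1-normalizes each one, and applies an element-wise square root. This is a hedged, NumPy-only illustration of the idea (keypoint detection and raw SIFT extraction are assumed to have been done already, e.g. by OpenCV); the function name is my own, not the repository's.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Convert raw SIFT descriptors to RootSIFT.

    RootSIFT L1-normalizes each descriptor and takes the element-wise
    square root, so that Euclidean distance between RootSIFT vectors
    approximates the Hellinger kernel on the original SIFT vectors.
    """
    descriptors = np.asarray(descriptors, dtype="float64")
    # L1-normalize each descriptor row, then take the square root.
    normed = descriptors / (descriptors.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(normed)
```

After this transform every descriptor has (approximately) unit L2 norm, so plain Euclidean comparisons behave much better for matching.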

The descriptors/ directory (inside image_search_engine/image_search_pipeline/ directory) contains detectanddescribe.py (check here), which implements keypoint and local invariant descriptor extraction for the dataset.

The index/ directory inside image_search_engine/image_search_pipeline/ directory contains object-oriented interfaces to the HDF5 dataset to store features. In this part, baseindexer.py (check here) and featureindexer.py (check here) are used for storing features.

The index_features.py (check here) is the driver script that glues together all the pieces mentioned above. After running this driver script, I have the features.hdf5 file shown below, which is about 556 MB.

The following command runs the index_features.py driver script.

python index_features.py --dataset ukbench --features_db output/features.hdf5
<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/hdf5_database.png" width="200">

Figure 1: features.hdf5 file, which contains all the features extracted from the whole dataset.

Figure 2 shows a sample of the interior structure of the features.hdf5 file. I use HDF5 because of the ease of interaction with the data. We can store huge amounts of data in our HDF5 dataset and manipulate the data using NumPy. In addition, the HDF5 format is standardized, meaning that datasets stored in HDF5 format are inherently portable and can be accessed by other developers using different programming languages, such as C, MATLAB, and Java.

<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/hdf_database_layout.png" width="500">

Figure 2: A sample of interior structure of the features.hdf5.

The image_ids dataset has shape (M,), where M is the total number of examples in the dataset (in this case, M = 1000). Each image_ids entry corresponds to an image filename.

The index dataset has shape (M, 2) and stores two integers per image: the start and end indexes into the features dataset for image i.

The features dataset has shape (N, 130), where N is the total number of feature vectors extracted from the M images in the dataset (in this case, N = 523,505). The first two columns are the (x, y)-coordinates of the keypoint associated with the feature vector. The remaining 128 columns form the RootSIFT feature vector.
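
The start/end indexing scheme described above can be illustrated with NumPy alone. The per-image keypoint counts below are hypothetical, chosen just to show how the (M, 2) index maps each image to a contiguous slice of the stacked features matrix.

```python
import numpy as np

# Hypothetical per-image keypoint counts for a tiny 3-image "dataset".
counts = [4, 2, 3]
features = np.random.rand(sum(counts), 130)  # (x, y) + 128-d RootSIFT

# Build the (M, 2) index: start/end offsets into the features matrix.
ends = np.cumsum(counts)
starts = ends - counts
index = np.stack([starts, ends], axis=1)

# All feature vectors belonging to image i are then a simple slice:
i = 1
start, end = index[i]
image_features = features[start:end]  # the 2 feature rows for image 1
```

Storing the offsets this way keeps the features matrix dense and contiguous, which is what makes the HDF5 layout both compact and fast to slice.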

Cluster features

This is step 2 in building the bag of visual words (BOVW).

The next step is to cluster the extracted feature vectors to form a "vocabulary", or simply to obtain the cluster centers generated by the K-means algorithm.

Concept of the bag of visual words

The goal is to take an image that is represented using multiple feature vectors and then construct, for each image, a histogram of image patch occurrences that tabulates the frequency of each of these prototype vectors. A "prototype" vector is simply a "visual word" — an abstract quantification of a region in an image. Some visual words may encode corner regions. Other visual words may represent edges. Still others symbolize areas of low texture. Some sample examples of these "visual words" are demonstrated in the next part.

The Vocabulary class inside vocabulary.py (check here) from the information_retrieval/ directory (inside image_search_engine/image_search_pipeline/ directory) is used to ingest the features.hdf5 dataset of features and then return the cluster centers, i.e. the visual words. These visual words will serve as our vector prototypes when I quantize the feature vectors into a single histogram of visual word occurrences in one of the following steps.

The cluster_features.py (check here) is the driver script that clusters the features.

The MiniBatchKMeans is used, which is a more efficient and scalable version of the original k-means algorithm. It essentially works by breaking the dataset into small segments, clustering each of the segments individually, then merging the clusters resulting from each of these segments together to form the final solution. This is in stark contrast to the standard k-means algorithm, which clusters all of the data in a single segment. While the clusters obtained from mini-batch k-means aren't necessarily as accurate as those from the standard k-means algorithm, the primary benefit is that mini-batch k-means is often an order of magnitude (or more) faster than standard k-means.
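
A minimal sketch of this clustering step with scikit-learn's MiniBatchKMeans is shown below. The descriptor array here is synthetic random data standing in for a sample of RootSIFT descriptors, and the cluster count is reduced for speed; this is an illustration of the technique, not the repository's driver script.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for a sample of 128-d RootSIFT descriptors.
rng = np.random.default_rng(42)
sampled_descriptors = rng.random((5000, 128))

# Cluster into a small codebook; the real run below uses 1536 clusters.
kmeans = MiniBatchKMeans(n_clusters=64, batch_size=1024,
                         random_state=42, n_init=3)
kmeans.fit(sampled_descriptors)

codebook = kmeans.cluster_centers_  # shape (64, 128): the "visual words"
```

The cluster_centers_ array is exactly what gets serialized as the codebook: each row is one visual word.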

The following command clusters the features inside the HDF5 file to generate a codebook. The resulting cluster centers are stored in a pickle file.

python cluster_features.py --features_db output/features.hdf5 --codebook output/vocab.cpickle --clusters 1536 --percentage 0.25
<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/clustered_features.png" width="200">

Figure 3: vocab.cpickle file ("codebook" or "vocabulary") contains 1536 cluster centers.

Visualize features

The visualize_centers.py (check here) helps us visualize the cluster centers from the codebook.

The following command creates a visualization of each codeword inside the codebook (each centroid of the clustered features).

python visualize_centers.py --dataset ukbench --features_db output/features.hdf5 --codebook output/vocab.cpickle --output output/vw_vis

This process takes about 60–90 minutes to finish, depending on the computer.

Here are a few samples (grayscale) of visualizing the features.

<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample1.jpg" width="300"> <img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample2.jpg" width="300">

Figure 4: Book-title features (left), Leaves-of-tree features (right).

<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample3.jpg" width="300"> <img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample4.jpg" width="300">

Figure 5: Detailed grass features (left), Car-light features (right).

<img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample5.jpg" width="300"> <img src="https://github.com/meng1994412/CBIR/blob/master/image_search_engine/results/vis_sample6.jpg" width="300">

Figure 6: Store-logo features (left), car-dashboard features (right).

Vector quantization

This is the last step in building the bag of visual words (BOVW).

By detecting keypoints and describing the image region surrounding each of them, we obtain multiple feature vectors per image. These feature vectors are (more or less) unsuitable for directly applying CBIR or image classification algorithms.

What I need is a method to take these sets of feature vectors and combine them in a way that:

  1. results in a single feature vector per image.
  2. does not reduce the discriminative power of local features.
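
The quantization step that satisfies both requirements — assign each descriptor to its nearest codeword, then count occurrences — can be sketched as follows. This is a hedged NumPy illustration under my own naming, not the repository's implementation.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Quantize one image's descriptors into a BOVW histogram.

    descriptors: (n, d) feature vectors for a single image.
    codebook:    (k, d) cluster centers (visual words).
    Returns a length-k histogram of visual-word occurrences.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # nearest codeword per descriptor
    return np.bincount(words, minlength=len(codebook))
```

The result is a single fixed-length vector per image, regardless of how many keypoints were detected, which is exactly what the search and indexing steps need.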