A Comparison of Spatial Search Using K-D Trees And Quadtrees

This project compares and analyzes the performance of two popular spatial indexing data structures: K-D Trees and Quadtrees for insertion, search and finding the nearest neighbors of points on a 2D plane.

General Information
Design Choices
Comparative Analysis
Conclusion
Contact

General Information

The utilization of spatial data in modern Geographical Information Systems and Image Processing has led to increased research towards efficient spatial search techniques. For finding nearest neighbors of a given point with millions of points in the data set, the conventional queries are too slow to use in practice. K-D Trees and Quadtrees are two efficient data structures for spatial search implementations which support faster organization and processing of spatial data points.

QuadTree

Each Quadtree node has at most four children. The children nodes can be visualized as four quadrants of a 2-dimensional plane. The data points are organized by recursively subdividing the nodes into four quadrants and associating the data with leaf nodes. The subdivided areas can be squares or rectangles depending upon the division.

Some applications of Quadtrees include:

Computer vision, CAD, street maps (Google Maps / Google Earth).
Virtual reality
Computer graphics, games, movies
Image Compression

Construction:

To construct a Quadtree from a 2-D space:

Split the 2-D plane into quadrants.
If a quadrant contains one or more points in it, create a child node.
If a quadrant does not contain any points, do not create child nodes.
Recursively follow the steps for each children.

Figure 1: Quadtree for a set of 6 points.

Insertion:

To insert a given point into the Quadtree, we first check whether the point is within the bounds of the Quadtree window. If not, we cease the insertion. If the point lies within the window boundaries, we create the appropriate child to contain this element.

Search:

To search a given point in the Quadtree, the children are recursed until the leaf containing the point is obtained. For every recursion, the search space gets reduced by one-fourth since only the quadrant containing the point is selected.

Nearest Neighbor:

To search for the nearest neighbor Q we choose a point to search and a 2-dimensional window W around this point in which we expect to find its nearest neighbor. Then, the following steps are performed:

Put the root on a stack
Repeat Pop the next node T from the stack For each child C of T If C intersects with W around Q, add C to the stack If C is a leaf, examine point(s) inside C and record the minimum Euclidean distance to Q.
Return Q

KD-Tree

The idea for constructing a KD-Tree (also known as K-Dimensional Tree) is that each level of the tree compares against one dimension. KD-Tree can be visualized as a binary tree in which every node is a k-dimensional point.

Some applications of KD-Trees include:

Multidimensional search key (range searches and nearest neighbor searches).
Image Compression.

Construction and Insertion:

Each depth level of the tree is associated with a dimension. For each data point insertion, at every level until the insertion location is found, the node is compared with the data point with respect to the dimension of each depth level. If the comparison results in a lower value, the left tree is considered otherwise the right tree is considered.

To insert a point P(x,y) in the tree with dimension = 2, below steps are followed:

A depth level is x-Aligned if for the given depth level, x coordinates are taken into consideration.
Similarly, a depth level is y-Aligned if for the given depth level, y coordinates are taken into consideration.
For each level, check if the current node is x-Aligned or y-Aligned.
If the current node (a,b) is x-Aligned, compare x with a. If x>=a, P needs to be inserted to the right subtree otherwise left subtree.
If the current node (a,b) is y-Aligned, compare y with b. If y>=b, P needs to be inserted to the right subtree otherwise left subtree.
Recurse until a node with null value is obtained. Insert P there.

KD-Tree

Figure 2: KD-Tree Implementation.

Search:

Searching a point P (x,y) in a KD Tree with dimension = 2 involves the following steps:

For each level, check if the current node is x-Aligned or y-Aligned.
If the current node (a,b) is x-Aligned, compare x with a. If x>=a, search for P in the right subtree otherwise left subtree.
If the current node (a,b) is y-Aligned, compare y with b. If y>=b, search for P in the right subtree otherwise left subtree.
Recurse until P is obtained. If the current node can not be traversed further and P is not found, report search unsuccessful.

Nearest Neighbor:

To find the nearest neighbor:

Search from the root node downwards and at each level compare which of the two children nodes have the lesser Euclidian distance with the point for which the nearest neighbour has to be found.
Choose the node with the lesser Euclidian distance to traverse next.
Keep going downwards while maintaining the minimum Euclidian distance found till now and output the node which was the closest when a leaf node is reached.

Design Choices:

The dimension of KD-Tree is kept as 2 since the spatial search is implemented in a 2 dimension plane.
Similarly, the Quadtree constructed is a D2 granular Quadtree. Here D2 means a tree with dimension 2 and granular is the nature in which points are stored in the nodes. The minimum division for the quadtree possible is of area 1 square unit. Hence, there can not be more than one data point present within a granule/ area of 1 square unit.
The coordinates of 2-dimensional plane window W over which spatial search implementations are compared are kept as (0,0) and (100000,100000).
Points to be inserted are selected from the same window W.
The data points selected for insertion are generated randomly. For comparative analysis of insertion, the number of data points are varied from 1000 to 100,000 in multiples of 10,000.
For comparative analysis of the search operation, the number of test cases to be searched for each N (Here N is the number of data points in the data structure) is varied from 1 to 1000.
The density value ranges from 0 to 10 and is being used to segregate points into two regions: one of a high density cluster which occupies 10% of the 2-D plane and the other set will be randomly placed. As the density increases from 0 - 10, the proportion of points allotted to the dense cluster is higher. For eg: At density 0, all points are randomly placed on the 2D plane. At density 1, 10% of the points are in the dense cluster and 90% are random and so on.
To find the nearest neighbor, a window R of area W/10 is created. Nearest neighbor Q is calculated for a point P where P and Q both lie in R. Hence, it is guaranteed that the nearest neighbor is present in R and thus the search is restricted to window R.

Comparative Analysis:

We compare spatial search implementations based on the following factors:

Insertion a. Comparison based on number of data points b. Comparison based on varying the distribution density of points (Evenly spread data v/s Densely populated data) and relative distance.
Search a. Comparison based on number of data points b. Comparison based on varying the distribution density of points (Evenly spread data v/s Densely populated data) and relative distance.
Finding the nearest neighbor a. Comparison based on number of data points averaged over multiple density of data points.

Insertion

a. Comparison based on number of data points:

Insert-Points

To insert an element in a Quadtree, the average case time complexity is O(Logd(N)) where N is size of distance. Here, d=4 since the branching factor is 4.

Similarly, the insertion time complexity for KD Trees is O(Logd(N)) with d=2, since the branching factor is 2.

This is because in Quadtrees, the data reaching a node is split into a fixed (2^d), equal size cells, in our case: 4, whereas in KD-Trees, the data is split into two regions based on the current depth level dimension.

The graph represents the time taken to construct a Quadtree and KD-Tree for n elements. In theory, worst case construction time for a Quadtree and KD-Tree is O(nLogn) but the average time comes out to be O(N). In practice, the construction time is between O(N) and O(NlogN) because some data points result in a skewed insertion.

Our approach to insert a value in the quad tree is granular. It means that in order to insert a value in quadtree, the child nodes are recursively iterated until the exact granular location is obtained and then the point is inserted. Hence, construction of quad tree requires a traversal of greater depth per insertion than KD trees. Thus, in practice, construction of quad trees takes more time than KD-Trees.

b. Comparison based on varying the distribution density of points(Evenly spread data v/s Densely populated data) and relative distance:

Insert-Density

As the density of data points for a specific region within the 2D plane increases, the probability of inserting a data point within the same child nodes increase and hence fewer number of new child nodes are created. This explains a decrease in the average time of construction and insertion for both the trees.

SpatialSearch

Install / Use

README

A Comparison of Spatial Search Using K-D Trees And Quadtrees

Table of contents

General Information

QuadTree

Construction:

Insertion:

Search:

Nearest Neighbor:

KD-Tree

Construction and Insertion:

Search:

Nearest Neighbor:

Design Choices:

Comparative Analysis:

Insertion

a. Comparison based on number of data points:

b. Comparison based on varying the distribution density of points(Evenly spread data v/s Densely populated data) and relative distance:

Search

a. Comparison based on number of data point