Non-Overlapping Hierarchical Index Structure for Similarity Search

In order to accelerate the similarity search in highdimensional database, we propose a new hierarchical indexing method. It is composed of offline and online phases. Our contribution concerns both phases. In the offline phase, after gathering the whole of the data in clusters and constructing a hierarchical index, the main originality of our contribution consists to develop a method to construct bounding forms of clusters to avoid overlapping. For the online phase, our idea improves considerably performances of similarity search. However, for this second phase, we have also developed an adapted search algorithm. Our method baptized NOHIS (Non-Overlapping Hierarchical Index Structure) use the Principal Direction Divisive Partitioning (PDDP) as algorithm of clustering. The principle of the PDDP is to divide data recursively into two sub-clusters; division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. Data of each two sub-clusters obtained are including by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, the nonoverlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SRtree in processing k-nearest neighbors.





References:
[1] C. Faloutsos, "Searching Multimedia Databases by Content". Kluwer
Academic Publishers, 1996.
[2] A. Guttman. "R-trees: A dynamic index structure for spatial searching".
In Proceedings of the ACM SIGMOD International Conference on
Management of data, Boston, Masachussets, USA, pages 47-57. 1984.
[3] T. Zhang, R. Ramakrishnan, M. Linvy, "Birch: An efficient data
clustering method for very large databases". In Proceedings of the ACM
SIGMOD International Conference on Management of Data, Montreal,
Canada, pp.103-114, 1996.
[4] N. Beckman, H.P.Kriegel, R. Schneider, & B. Seeger, (1990). "The R*-
tree: An efficient and robust access method for points and rectangles". In
Proceeding of the ACM SIGMOD International Conference on
Management of Data, Atlantic City, New Jersey, USA, pp. 322-331,
1990.
[5] S. Bertchold, D.A. Keim, H.P. Kriegel, "The X-tree: An index structure
for hight-dimentional data". In Proceeding of the 22nd Internatioanl
Conference on Very Large Data Bases, Mumbai (Bombay), India, pp.
28-39, 1996.
[6] D.A. White and R. Jain, "Similarity Indexing with the SS-Tree". In
Proceeding of the 12th Int-l Conf Data Eng., pp. 516-523, 1996.
[7] N. Katayama, S. Satoh. "The SR-tree : An index structure for highdimensional
nearest neighbor queries". In Proceeding of the ACM
SIGMOD, International Conference on Management of Data, Tuscon,
Arizona, USA, pages 369-380. 1997.
[8] J.T. Robinson, "The k-d-B-Tree: A Search Structure for Large
Multidimensional Dynamic Indexes," Proc. ACM SIGMOD Int-l Conf.
Management of Data, pp. 10-18, Apr. 1981.
[9] D.B. Lomet and B. Salzberg, "The hB-Tree: A Multiattribute Indexing
Method with Good Guaranteed Performance," ACM Trans. Database
Systems, vol. 15, no. 4, pp. 625-658, 1990.
[10] A. Henrich, "The LSDh-Tree: An Access Structure for Feature Vectors".
Proc. 14th Int-l Conf. Data Eng., pp. 362-369, 1998.
[11] [kd-Tree] J. Bentley, "Multidimensional Binary Search Trees Used for
Associative Searching," Comm. ACM, vol. 18, no. 9, pp. 509- 517,
1975.
[12] D. L. Boley, "Principal Direction Divisive Partitioning", Data Mining
and Knowledge Discovery 2(4):325-344, 1998.
[13] N. Bouteldja, V. Gouet-Brunet et M. Scholl. "Evaluation of strategies
for multiple sphere queries with local image descriptors". In IS&T/SPIE
Conference on Multimedia Content Analysis, Management, and
Retrieval, San Jose CA, USA, 2006.
[14] S. Savaresi, D. L. Boley, S. Bittanti, G. Gazzaniga. "Choosing the
cluster to split in bisecting divisive clustering algorithms". CSE Report
TR-00-055, University of Minnesota, 2000.
[15] N. Roussopoulos, S. Kelly, F. Vincent. "Nearest Neighbor Queries". In
Proceeding of ACM SIGMOD, May 1995.