SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space

Locality Sensitive Hashing (LSH) is one of the most
promising techniques for solving nearest neighbour search problem in
high dimensional space. Euclidean LSH is the most popular variation
of LSH that has been successfully applied in many multimedia
applications. However, the Euclidean LSH presents limitations that
affect structure and query performances. The main limitation of the
Euclidean LSH is the large memory consumption. In order to achieve
a good accuracy, a large number of hash tables is required. In this
paper, we propose a new hashing algorithm to overcome the storage
space problem and improve query time, while keeping a good
accuracy as similar to that achieved by the original Euclidean LSH.
The Experimental results on a real large-scale dataset show that the
proposed approach achieves good performances and consumes less
memory than the Euclidean LSH.





References:
[1] A. Guttman. "R-trees: A dynamic index structure for spatial searching”.
In SIGMOD, 1984, pp. 47-57.
[2] J. L. Bentley. "K-d trees for semidynamic point sets”. InSoCG, 1990,
pp. 187-197.
[3] Weber, H.J. Schek and S. Blott, "A quantitative analysis and
performance, study for similarity-search methods in high-dimensional
spaces”. In Proceedings of 24rd International Conference on Very Large
Data Bases, New York, USA, 1998, pp.194-205.
[4] S. Blott and R. Weber."A Simple Vector-Approximation File for
Similarity Search in High-Dimensional Vector Spaces”. Technical
Report 19, ESPRIT project HERMES, March 1997.
[5] L. Ye and Y. Hua. "The CMVAI-File: An Efficient Approximation-
Based High-Dimensional Index Structure”,Multimedia Information
Networking and Security (MINES), 2010.
[6] H.Lu, B.C.Ooi, H.T.Shen and X.Xue. "Hierarchical Indexing Structure
for Efficient Similarity Search in Video Retrieval”. IEEE Transactions
on Knowledge and Data Engineering, November 2006.
[7] D. Gorisse, M. Cord and F. Precioso. "Locality-sensitive hashing for
chi2-Distance”, IEEE Transactions on Pattern Analysis and Machine
Intelligence 34, 2012, pp.402–409.
[8] H. Wang, J. Cao, L. Shu, and D. Rafiei."Locality sensitive hashing
revisited: Filling the gap between theory and algorithm analysis”. In
Proceedings of the 22nd ACM international conference on Conference
on information & knowledge management, 2013, pp. 1969-1978.
[9] A. Andoni and P. Indyk. "Near-Optimal Hashing Algorithms for
Approximate Nearest Neighbor in High Dimensions”. In 47th Annual
IEEE Symposium on foundations of Computer Science, 2006, pp. 459-
468.
[10] C. Calistru, C. Ribeiro and G.David. "High- Dimensional Indexing for
Video Retrieval”. Multimedia - A Multidisciplinary Approach to
Complex Issues, 2012.
[11] I. Daoudi, K. Idrissi and S.E.A Ouatik. "Kernel region approximation
blocks for indexing heterogonous databases”. Multimedia and Expo,
2008 IEEE International Conference, 2008, pp. 1237-1240.
[12] T. Liu, A. W. Moore, A. G. Gray and K. Yang. "An Investigation of
Practical Approximate Nearest Neighbor Algorithm”. In proceeding of:
Advances in Neural Information Processing Systems 17. Vancouver,
British Columbia, Canada, 2004.
[13] J. Gan, J. Feng, Q. Fang and W.Ng, "Locality-sensitive hashing scheme
based on dynamic collision counting”. In proceeding SIGMOD '12
Proceedings of the 2012 ACM SIGMOD International Conference on
Management of Data. New York, USA, 2012, pp. 541-552.
[14] Q. Lv, W. Josephson, Z. Wang, M. Charikar and K.Li. "Multi-probe
LSH: efficient indexing for high-dimensional similarity search”. In
VLDB, 2007, pp.950-96.
[15] P.Indyk, and R.Motwani, "Approximate nearest neighbor: Towards
removing the curse of dimensionality”. In Proceeding STOC '98
Proceedings of the thirtieth annual ACM symposium on Theory of
computing, 1998, pp.604-613.
[16] M.S. Charikar. "Similarity estimation techniques from rounding
algorithms”. In Proceedings of the Symposium on Theory of Computing,
2002.
[17] M. Datar, N. Immorlica, P. Indyk and V. Mirrokni. "Locality sensitive
hashing scheme based on p-stable distributions”. In Proceedings of the
ACM Symposium on Computational Geometry, 2004.
[18] A. Andoni and P. Indyk. E2LSH: Exact Euclidean locality-sensitive
hashing. Implementationavailableat:http://www.mit.edu/~andoni/LSH/.
[19] A. Z. Broder, M. Charikar, A. M. Frieze, andM. Mitzenmacher. "Minwise
independent permutations”. In STOC, 1998, pp. 327–336.
[20] S. Chafik, I.Daoudi, M.A EL Yacoubi, H. El Ouardi and B. Dorizzi.
"Locality sensitive hashing for content-based image retrieval: A
comparative experimental study”. The Fifth International Conference on
Next Generation Networks and services (NGNs), 2014.( unpublished)
[21] Haveliwala, T., Gionis, A. and Indyk. "Scalable Techniques for
Clustering the Web” (Extended Abstract). In Third International
Workshop on the Web and Databases (WebDB 2000), Dallas, Texas,
2000.
[22] D. Feng and J. Yang, C. Liu. "An efficient indexing method for contentbased
image retrieval”.Neurocomputing, April 2013, pp.103-114.
[23] J. Buhler and M. Tompa. "Finding motifs using randomprojections”. In
Proceedings of the Annual International Conference on Computational
Molecular Biology (RECOMB1), 2002, pp. 225-242.
[24] http://corpus-texmex.irisa.fr