Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

Scholarly

Volume:1, Issue: 10, 2007 Page No: 3158 - 3162

International Journal of Information, Control and Computer Sciences

ISSN: 2517-9942

1181 Downloads

Abstract Full Text Download References Share Add to Favorites

DOI:10.5281/zenodo.1071982 BibTeX JSON

Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

As a structure for processing string problem, suffix array is certainly widely-known and extensively-studied. But if the string access pattern follows the “90/10" rule, suffix array can not take advantage of the fact that we often find something that we have just found. Although the splay tree is an efficient data structure for small documents when the access pattern follows the “90/10" rule, it requires many structures and an excessive amount of pointer manipulations for efficiently processing and searching large documents. In this paper, we propose a new and conceptually powerful data structure, called splay suffix arrays (SSA), for string search. This data structure combines the features of splay tree and suffix arrays into a new approach which is suitable to implementation on both conventional and clustered computers.

Authors:

Keywords:

References:

[1] I. H. Written, A. Moffat, and T. C. Bell. Managing Gigabytes:
Compressing and Indexing Documents and Images. Van Nostrand
Reinhold, 1994.
[2] W. B. Cavnar. Using an n-gram based document representation with a
vector processing retrieval model. In Proc. Of the 3rd Text Retrieval
Conference (TREC-3), pages 269-277. NIST special publication, 1995.
[3] G. A. Stephen. String Searching Algorithms. World Scientific Publishing,
1994.
[4] M. Crochemore and W. Rytter. Text Algorithms. Oxford Univ. Press,
New York, 1994
[5] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge
Univ. Press, 1997
[6] E. M. McCreighl. A Space-Economical Suffix Tree Construction
Algorithm. Journal of the ACM, 23, pp. 262-272, 1976.
[7] U. Manber and G. Myers Suffix Arrays: A New Method for On-Line
String Searches. SIAM J. Comput. 22(5), pp 935-948, 1993.
[8] G. H. Gonnet, R. A. Baeza-Yates, et al. New indices for text: PAT trees
and PAT arrays. In W. B. Frakes and R. A. Baeza-Yates, editors,
Information Retrieval: Data Structure and Algorithms, pages 66-82.
Prentice-Hall, New Jersey, 1992.
[9] M. Nagao and S. Mori. A new method of n-gram statistics for large
number of n and automatic extraction of words and phrase form large text
data of Japanese. In Proc of COLING-94, pages 611-615,1994.
[10] A. Apostolico. The myriad virtues of subword trees. In Combinatorial
Algorithms on Words, pages 85-96. Springer-Verlag, 1985.
[11] A. Fellah. Concurrent and Distributed Data Structures for Multikeys
Sorting on Computer Clusters. IEEE Proc. Of the 16th Intern. Symp. On
High Performance Comput. Systems and Applications. Pp. 281-,
Moncton, Canada, 2002.
[12] A. Fellah, A. Maamir and M. Abaza. Distributed Data Structures for
Multikey Sorting. Intern. J. of Parallel and Distributed Systems and
Networks, Vol. 2(2), pp. 62-68, 1999.
[13] Fellah, A. and Mawson, R. Distributed multidimensional suffix arrays for
string search. Communications, Computers and signal Processing, 2003.
PACRIM. 2003 IEEE Pacific Rim Conference on , Volume: 2, pp.
792-795, 2003.
[14] K. Sadakane. A fast algorithm for making suffix arrays and for
burrows-wheeler transformation. In Proc. DCC-98, pages 129-138, 1998.
[15] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer
Programming. Addison-Wesley, 1973.
[16] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching
strings. In the 8th Annual ACM-SIAM Sympo. On Descrete Algorithms,
pages 360-369, 1997.
[17] J. Kitajima, G. Navarro, B. Ribeiro, and N. Ziviani. Distributed generation
of suffix arrays: a quicksort-based approach. In Proc. WSP-97, pages
53-69. Carleton University Press, 1997.
[18] J. Kitajima, B. Ribeiro, and N. Ziviani. Network and memory analysis in
distributed parallel generation of PAT arrays. In Proc. 8th Brazilian Symp.
On Comp. Arch. - High-Performance Processing, pages 193-202.
Brazilian Comp. Soc., 1996.
[19] G. Navarro, J. Kitajima, B. Ribeiro, and N. Ziviani. Distributed generation
of suffix arrays. In Proc. CPM-97, LNCS 1264, pages 102-115, 1997.
[20] Kitajima, J.P., Navarro, G. A fast distributed suffix array generation
algorithm. String Processing and
[21] Information Retrieval Symposium,1999 and International Workshop on
Groupware, Sept,1999,22-24 Pages:97-104
[22] D. D. Sleator and R. E. Tarjan. Self-adjusting Binary Search Trees.
Journal of the ACM 32, pp. 652-686, 1985.

Scholarly

International Journal of Information, Control and Computer Sciences

Archive

Last Issue

Commitee

Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

Scholarly

International Journal of Information, Control and Computer Sciences

Archive

Last Issue

Commitee

Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

Preview