A Practical Distributed String Matching Algorithm Architecture and Implementation

Traditional parallel single string matching algorithms are always based on PRAM computation model. Those algorithms concentrate on the cost optimal design and the theoretical speed. Based on the distributed string matching algorithm proposed by CHEN, a practical distributed string matching algorithm architecture is proposed in this paper. And also an improved single string matching algorithm based on a variant Boyer-Moore algorithm is presented. We implement our algorithm on the above architecture and the experiments prove that it is really practical and efficient on distributed memory machine. Its computation complexity is O(n/p + m), where n is the length of the text, and m is the length of the pattern, and p is the number of the processors.




References:
[1] Zheng Liu, Xin Chen, James Borneman and Tao Jiang, "A fast algorithm
for approximate string matching on gene sequences," in Symposium. 16th
Annu. Combinatorial Pattern Matching, LNCS, Springer-Verlag, vol.
3537, pp. 79-90, June 2005.
[2] N. Tuck, T. Sherwood, B. Calder and G. Varghese, "Deterministic
memory-efficient string matching algorithms for intrusion detection," in
Proc. IEEE INFOCOM, vol. 4, pp. 2628-2639, March 2004.
[3] D. E. Knuth, J. H. Morris, and V. R. Pratt, "Fast pattern matching in
strings," SIAM Journal on Computing, vol. 6, pp. 323-350, 1977.
[4] R. S. Boyer and J. S. Moore, "A fast string searching algorithm,"
Communications of the ACM, vol. 20, pp. 762-772, 1977.
[5] R. N. Horspool, "Practical fast searching in strings," Software - Practice
and Experience, vol. 10, pp. 501-506, 1980.
[6] D. M. Sunday, "A very fast substring search algorithm," Communications
of the ACM, vol. 33, pp. 132-142 ,1990.
[7] D. Cantone and S. Faro, "Fast-Search: A new efficient variant of the
Boyer-Moore string matching algorithm," in Proc. Second International
Workshop on Experimental and Efficient Algorithms, LNCS,
Springer-Verlag, Vol. 2647, pp. 47-58, May 2003.
[8] Z. Galil, "Optimal parallel algorithms for string matching," in Proc. 16th
Annu. ACM symposium on Theory of computing, pp. 240-248, 1984.
[9] U. Vishkin, "Optimal parallel matching in strings," Information and
control, vol. 67, pp. 91-113, 1985.
[10] Y. Takefuji, T. Tanaka, and K. C. Lee, "A parallel string search
algorithm", IEEE Trans. Systems, Man and Cybernetics, vol. 22, pp.
332-336, March-April 1992.
[11] CHEN Guo-liang, LIN-Jie, and GU Nai-jie, "Design and analysis of
string matching algorithm on distributed memory machine," Journal of
Software, vol. 11, pp. 771-778, 2000.
[12] C. Charras and T. Lecroq, "Exact string matching algorithms,"
Laboratoire d'Informatique de Rouen Université de Rouen. Available:
http://www-igm.univ-mlv.fr/~lecroq/string/
[13] Z. Galil, "On improving the worst case running time of the Boyer-Moore
string matching algorithm," Communications of the ACM, vol. 22, pp.
505-508, 1979.
[14] R. Cole. "Tight bounds on the complexity of the Boyer-Moore string
matching algorithm," SIAM Journal on Computing, vol. 23, pp.
1075-1091, 1994.
[15] GU Nai-jie, LI Wei and LIU Jing, "Fibonacci series-based multicast
algorithm," Chinese Journal of Computers, vol. 25, pp. 365-372, 2002.