Approximation Algorithm for the Shortest Approximate Common Superstring Problem

Scholarly

Volume:1, Issue: 12, 2007 Page No: 605 - 610

International Journal of Engineering, Mathematical and Physical Sciences

ISSN: 2517-9934

1010 Downloads

Abstract Full Text Download References Share Add to Favorites

DOI:10.5281/zenodo.1083623 BibTeX JSON

Approximation Algorithm for the Shortest Approximate Common Superstring Problem

The Shortest Approximate Common Superstring (SACS) problem is : Given a set of strings f={w1, w2, ... , wn}, where no wi is an approximate substring of wj, i ≠ j, find a shortest string Sa, such that, every string of f is an approximate substring of Sa. When the number of the strings n>2, the SACS problem becomes NP-complete. In this paper, we present a greedy approximation SACS algorithm. Our algorithm is a 1/2-approximation for the SACS problem. It is of complexity O(n2*(l2+log(n))) in computing time, where n is the number of the strings and l is the length of a string. Our SACS algorithm is based on computation of the Length of the Approximate Longest Overlap (LALO).

Authors:

Keywords:

References:

[1] D. Maier, J. A. Storer, A note on complexity of the superstring problem,
TR-233, Princeton University, Dept. EECS : (1977).
[2] D. Maier, The complexity of some problems on subsequences and
supersequences, J. of ACM, Vol. 25, N┬░2 : (April 1978), p322-336.
[3] M. R. Garey, D. S. Johnson, Computers and intractability, Freeman
(Ed.) : (1979).
[4] J. K. Gallant, D. Maier, J. Storer, On finding minimal length
superstrings, J. Computer and System Sciences, Vol. 20, N┬░1 : (1980),
p50-58.
[5] D. S. Hirschberg, Recent results on the complexity of commonsubsequence
problems, Time Warps, String Edits and Macromolecules
The Theory and Practice of Sequence Comparison, Sankoff and Kruskal
(Eds.), Addison-Wesley Pub. Inc. : (1983), p325-330.
[6] H. Peltola, H. Söderlund, E. Ukkonen, SEQAID : A DNA sequence
assembling program based on a mathematical model, Nucleic Acids
Research, Vol. 12, N┬░1 : (1984), p307-321.
[7] S. Dear, R. Staden, A sequence assembly and editing program for
efficient management of large projects, Nucleic Acids Research, Vol. 19
: (1991), p3907-3911.
[8] X. Huang, A contig assembly program based on sensitive detection of
string overlaps, Genomics, N┬░14 : (1992), p18-25.
[9] J. Kececioglu, E. Myers, Combinatorial algorithms for DNA sequence
assembly, Algorithmica, Vol. 13, N┬░12, Springer International Eds. :
(1995), p7-51.
[10] G. G. Sutton, O. White, M. D. Adams, A. R. Kerlavage, TIGR
Assembler: A new tool for assembing large shotgun sequencing projects,
Genome Science & Technology, Vol. 1, N┬░1, Mary Ann Liebert, Inc. :
(1995), p9-19.
[11] M. Elloumi, DNA Sequence Assembly Algorithms Based on Clustering
Approaches, The 2000 International Conference on Mathematics and
Engineering Techniques in Medicine and Biological Sciences,
METMBS'2000 (Las Vegas, Nevada, USA) : (June 2000).
[12] J. Cohen, Bioinformatics-An Introduction For Computer Scientists,
ACM Computing Surveys, Vol 36, No 2, June 2004, p122-158.
[13] S. Rahmann, The Shortest Common Supersequence Problem In a
Microarray Production Setting, Bioinformatics, Vol 19, (2003), p ii156-
ii161.
[14] E. Ukkonen, A Linear time algorithm for finding approximate shortest
common superstrings, Algorithmica, N┬░5: (1990), p313-323.
[15] J. Kececioglu, Exact and approximation algorithms for DNA sequence
reconstruction, Ph.D. dissertation, Technical Report, N┬░91-26, Dept. of
Computer Science, The University of Arisona, Tucson, AZ 85721 :
(1991).
[16] J. Tarhio, E. Ukkonen, A greedy approximation algorithm for
constructing shortest common superstrings, Theo. Comput. Sci., N┬░57:
(1988), p131-145.
[17] S. Teng, F. Yao, Approximation shortest superstrings, 34th IEEE
Symposium on Foundation of Computer Science : (1993).
[18] T. A. Jenkyns, The greedy travelling salesman's problem, Networks N┬░9
: (1979), p363-373.
[19] J. K. Gallant, The complexity of the overlap method for sequencing
biopolymers, J. Theor. Biol., N┬░101 : (1982), p1-17.
[20] J. Van Leeuwen, Graph algorithms, Handbook of Theoretical Computer
Science, Vol. A : Algorithms and Complexity, J. Van Leeuwen (Ed.),
Elsevier Science Pub. B. V. : (1990), p527-631.
[21] R. E. Bellman, Dynamic Programming, Princeton University Press, New
Jersey : (1957)
[22] R. E. Bellman, S. E. Dreyfus, Applied Dynamic Programming, Princeton
University Press, New Jersey : (1962)
[23] R. A. Wagner, M. J. Fischer, The string-to-string correction problem, J.
of ACM, Vol 21 No 1 : (1974), p168-173.
[24] P. H. Sellers, The theory and computation of Evolutionary distances :
Pattern recognition, J. Algorithms, No 1 : (1980) p359-373.
[25] M. Elloumi, An algorithm for the approximate string-matching problem,
Atlantic Symposium on Computational Biology, Genome Information
Systems & Technology, CBGI'2001, (Durham, North Carolina,
U.S.A.) : (March 2001).
[26] J. S. Vitter, Ph. Flajolet, Average-case analysis of algorithms and data
structures, Handbook of Theoretical Computer Science, Vol. A :
Algorithms and Complexity, J. Van Leeuwen (Ed.), Elsevier Science
Pub. B. V. : (1990), p431-524.

Scholarly

International Journal of Engineering, Mathematical and Physical Sciences

Archive

Last Issue

Commitee

Approximation Algorithm for the Shortest Approximate Common Superstring Problem

Scholarly

International Journal of Engineering, Mathematical and Physical Sciences

Archive

Last Issue

Commitee

Approximation Algorithm for the Shortest Approximate Common Superstring Problem

Preview