Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform

Approximate tandem repeats in a genomic sequence are two or more contiguous, similar copies of a pattern of nucleotides. They are used in DNA mapping, in studying mechanisms of molecular evolution, in forensic analysis, and in research on the diagnosis of inherited diseases. Their functions are still under investigation and are not yet well defined, but growing biological databases, together with tools for identifying these repeats, may lead to the discovery of their specific roles or of correlations with particular features. This paper presents a new approach for finding approximate tandem repeats in a given sequence, where the similarity between consecutive repeats is measured using the Hamming distance. It extends a method for finding exact tandem repeats in DNA sequences that is based on the Burrows-Wheeler transform.
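
To make the similarity criterion concrete, the following is a minimal brute-force sketch of what "consecutive copies within a given Hamming distance" means. It is not the Burrows-Wheeler-based method of this paper; it only illustrates the definition. The function names, the fixed `period` parameter, and the `max_mismatches` threshold are assumptions introduced for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def approximate_tandem_repeats(seq, period, max_mismatches, min_copies=2):
    """Naive scan for approximate tandem repeats of a fixed period.

    Returns a list of (start, copies) pairs. A run is extended while each
    block of length `period` differs from the block immediately before it
    in at most `max_mismatches` positions (hypothetical threshold).
    """
    hits = []
    n = len(seq)
    start = 0
    while start + 2 * period <= n:
        copies = 1
        pos = start
        # Compare each block with its right-hand neighbour.
        while (pos + 2 * period <= n and
               hamming(seq[pos:pos + period],
                       seq[pos + period:pos + 2 * period]) <= max_mismatches):
            copies += 1
            pos += period
        if copies >= min_copies:
            hits.append((start, copies))
            start = pos + period  # continue after the reported run
        else:
            start += 1
    return hits


# Three copies of a 4-mer, the middle one with a single substitution.
print(approximate_tandem_repeats("ACGTACGAACGT", period=4, max_mismatches=1))
# -> [(0, 3)]
```

With `max_mismatches=0` the same call reports nothing, since no two adjacent blocks are identical. The scan costs time proportional to the sequence length times the period for each period tried, which is the kind of cost an index-based method is meant to avoid.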



