Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform
Approximate tandem repeats in a genomic sequence are
two or more contiguous, similar copies of a pattern of nucleotides.
They are used in DNA mapping, studies of molecular evolution
mechanisms, forensic analysis, and research on the diagnosis of
inherited diseases. Their functions are still being investigated and
are not well defined, but growing biological databases, together with
tools for identifying these repeats, may lead to the discovery of their
specific roles or their correlation with particular features. This paper
presents a new approach for finding approximate tandem repeats in a
given sequence, where the similarity between consecutive repeats is
measured using the Hamming distance. It is an enhancement of a method
for finding exact tandem repeats in DNA sequences based on the
Burrows-Wheeler transform.
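The two notions named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the `max_mismatches` threshold, and the `$` sentinel are assumptions chosen for illustration, and the transform is built by the naive textbook method of sorting all rotations.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_approx_tandem_repeat(seq: str, start: int, period: int,
                            copies: int, max_mismatches: int) -> bool:
    """Check a candidate repeat (hypothetical helper, for illustration only).

    The candidate is `copies` contiguous blocks of length `period` beginning
    at `start`. It is accepted when every pair of consecutive blocks differs
    in at most `max_mismatches` positions.
    """
    end = start + period * copies
    if copies < 2 or period < 1 or start < 0 or end > len(seq):
        return False
    blocks = [seq[i:i + period] for i in range(start, end, period)]
    return all(hamming(x, y) <= max_mismatches
               for x, y in zip(blocks, blocks[1:]))


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`. Quadratic in space, so suitable for small inputs only.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    # ACGT, ACCT, ACCA: each consecutive pair differs in exactly one position.
    print(is_approx_tandem_repeat("ACGTACCTACCA", 0, 4, 3, max_mismatches=1))
    print(bwt("ACGTACCTACCA"))
```

Comparing only consecutive copies, rather than every copy against a single consensus pattern, follows the wording of the abstract; other definitions of approximate tandem repeats exist and would need a different check.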
@article{"International Journal of Biological, Life and Agricultural Sciences:50665",
  author = "Agnieszka Danek and Rafał Pokrzywa",
  title = "Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform",
  abstract = "Approximate tandem repeats in a genomic sequence are two or more contiguous, similar copies of a pattern of nucleotides. They are used in DNA mapping, studies of molecular evolution mechanisms, forensic analysis, and research on the diagnosis of inherited diseases. Their functions are still being investigated and are not well defined, but growing biological databases, together with tools for identifying these repeats, may lead to the discovery of their specific roles or their correlation with particular features. This paper presents a new approach for finding approximate tandem repeats in a given sequence, where the similarity between consecutive repeats is measured using the Hamming distance. It is an enhancement of a method for finding exact tandem repeats in DNA sequences based on the Burrows-Wheeler transform.",
  keywords = "approximate tandem repeats, Burrows-Wheeler transform, Hamming distance, suffix array",
  volume = "6",
  number = "1",
  pages = "1-5",
}