An Improved Fast Search Method Using Histogram Features for DNA Sequence Database
In this paper, we propose an efficient hierarchical DNA
sequence search method to improve the search speed while the
accuracy is being kept constant. For a given query DNA sequence,
firstly, a fast local search method using histogram features is used as a
filtering mechanism before scanning the sequences in the database.
An overlapping processing is newly added to improve the robustness
of the algorithm. A large number of DNA sequences with low
similarity will be excluded for latter searching. The Smith-Waterman
algorithm is then applied to each remainder sequences. Experimental
results using GenBank sequence data show the proposed method
combining histogram information and Smith-Waterman algorithm is
more efficient for DNA sequence search.
[1] J. C. Venter, M. D. Adams, E. W. Myers, etc., "The sequence of the
human genome", Science, vol. 291, no. 5507, pp. 1304 -1351, 2001.
[2] F. S. Collins, M. Morgan, A. Patrinos, "The human genome project:
lessons from Large-Scale Biology", Science, vol. 300, no. 5617, pp.
286-290, 2003.
[3] S.B.Needlman and C.D.Wunsch, "A general method applicable to the
search for similarities in the amino acid sequence of two proteins",
Journal of Molecular Biology, vol. 48, pp. 443- 453, 1970.
[4] T. F. Smith and M. S.Waterman, "Identification of common molecular
subsequences", Journal of Molecular Biology, vol. 47, pp. 195- 197,
1981.
[5] S. F. Altscgul, W.Gish, W. Miller, E. W. Myers and D. J. Lipman, "Basic
local alignment search tool", Journal of Molecular Biology, vol. 215, pp.
403- 410, 1990.
[6] D. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity
searches", Science, vol. 227, pp. 1435-1441, 1985.
[7] GenBank, ftp://ftp.ncbi.nih.gov/genbank/
[8] http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html.
[9] M. Li, and B. Ma, "PatternHunter II: highly sensitive and fast homology
search", Genome Informatics, vol. 14, pp. 164-175, 2003.
[10] B. Ma, J. Tromp, and M. Li, "PatternHunter: faster and more sensitive
homology search", Bioinformatics, vol. 18, no. 3, pp. 440- 445, 2002.
[11] Q. Chen, K. Kotani, F. Lee, and T. Ohmi, "A Fast Retrieval of DNA
Sequences Using Histogram Information", 2009 Int-l Conf. on Future
Information Technology and Management Engineering (FITME 2009),
pp. 529-532, Sanya, China, Dec., 2009.
[1] J. C. Venter, M. D. Adams, E. W. Myers, etc., "The sequence of the
human genome", Science, vol. 291, no. 5507, pp. 1304 -1351, 2001.
[2] F. S. Collins, M. Morgan, A. Patrinos, "The human genome project:
lessons from Large-Scale Biology", Science, vol. 300, no. 5617, pp.
286-290, 2003.
[3] S.B.Needlman and C.D.Wunsch, "A general method applicable to the
search for similarities in the amino acid sequence of two proteins",
Journal of Molecular Biology, vol. 48, pp. 443- 453, 1970.
[4] T. F. Smith and M. S.Waterman, "Identification of common molecular
subsequences", Journal of Molecular Biology, vol. 47, pp. 195- 197,
1981.
[5] S. F. Altscgul, W.Gish, W. Miller, E. W. Myers and D. J. Lipman, "Basic
local alignment search tool", Journal of Molecular Biology, vol. 215, pp.
403- 410, 1990.
[6] D. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity
searches", Science, vol. 227, pp. 1435-1441, 1985.
[7] GenBank, ftp://ftp.ncbi.nih.gov/genbank/
[8] http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html.
[9] M. Li, and B. Ma, "PatternHunter II: highly sensitive and fast homology
search", Genome Informatics, vol. 14, pp. 164-175, 2003.
[10] B. Ma, J. Tromp, and M. Li, "PatternHunter: faster and more sensitive
homology search", Bioinformatics, vol. 18, no. 3, pp. 440- 445, 2002.
[11] Q. Chen, K. Kotani, F. Lee, and T. Ohmi, "A Fast Retrieval of DNA
Sequences Using Histogram Information", 2009 Int-l Conf. on Future
Information Technology and Management Engineering (FITME 2009),
pp. 529-532, Sanya, China, Dec., 2009.
@article{"International Journal of Biological, Life and Agricultural Sciences:60036", author = "Qiu Chen and Feifei Lee and Koji Kotani and Tadahiro Ohmi", title = "An Improved Fast Search Method Using Histogram Features for DNA Sequence Database", abstract = "In this paper, we propose an efficient hierarchical DNA
sequence search method to improve the search speed while the
accuracy is being kept constant. For a given query DNA sequence,
firstly, a fast local search method using histogram features is used as a
filtering mechanism before scanning the sequences in the database.
An overlapping processing is newly added to improve the robustness
of the algorithm. A large number of DNA sequences with low
similarity will be excluded for latter searching. The Smith-Waterman
algorithm is then applied to each remainder sequences. Experimental
results using GenBank sequence data show the proposed method
combining histogram information and Smith-Waterman algorithm is
more efficient for DNA sequence search.", keywords = "Fast search, DNA sequence, Histogram feature,Smith-Waterman algorithm, Local search", volume = "4", number = "9", pages = "696-4", }