Application of Exact String Matching Algorithms towards SMILES Representation of Chemical Structure

Bioinformatics and cheminformatics are disciplines that use computers to provide tools for the acquisition, storage, processing, analysis, and integration of biological and chemical data, and for the development of potential applications of that data. A chemical database is a database designed exclusively to store chemical information. NMRShiftDB is one of the main databases used to represent chemical structures in 2D or 3D form. SMILES is one of many ways to write a chemical structure in a linear format. In this study we extracted antimicrobial structures in SMILES format from NMRShiftDB and stored them in our local data warehouse together with their corresponding information. Additionally, we developed a search tool that responds to a user's query through the JME Editor, which allows the user to draw or edit molecules and converts the drawn structure into SMILES format. We applied the Quick Search algorithm to search for antimicrobial structures in our local data warehouse.
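As an illustration of the search step, the following is a minimal Python sketch of the Quick Search algorithm (Sunday's variant of Boyer-Moore) applied to SMILES strings. It is not the implementation used in the study: the function name `quick_search` and the aspirin SMILES used as sample text are assumptions chosen for the example. The algorithm compares the pattern against the current text window and then shifts the window according to the text character that immediately follows it.

```python
def quick_search(pattern: str, text: str) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Illustrative Quick Search sketch; names are hypothetical, not taken
    from the study's tool.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character shift table: for each pattern character, the distance
    # from its last occurrence to one position past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the pattern against the current window of the text.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Stop when there is no character after the window to inspect.
        if pos + m >= n:
            break
        # Shift by the table entry for the character just past the window;
        # characters absent from the pattern allow a full shift of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches


# Sample query: a carboxyl-like fragment searched in the aspirin SMILES.
print(quick_search("C(=O)O", "CC(=O)Oc1ccccc1C(=O)O"))  # [1, 15]
```

Note that this is exact matching on the text of the SMILES string. Because one molecule can be written as several different SMILES strings, a textual match of this kind is not, by itself, equivalent to a chemical substructure search.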




