Abstract: With the rapid development in the field of life
sciences and the flooding of genomic information, the need for faster
and scalable searching methods has become urgent. One of the
approaches that were investigated is indexing. The indexing methods
have been categorized into three categories which are the lengthbased
index algorithms, transformation-based algorithms and mixed
techniques-based algorithms. In this research, we focused on the
transformation based methods. We embedded the N-gram method
into the transformation-based method to build an inverted index
table. We then applied the parallel methods to speed up the index
building time and to reduce the overall retrieval time when querying
the genomic database. Our experiments show that the use of N-Gram
transformation algorithm is an economical solution; it saves time and
space too. The result shows that the size of the index is smaller than
the size of the dataset when the size of N-Gram is 5 and 6. The
parallel N-Gram transformation algorithm-s results indicate that the
uses of parallel programming with large dataset are promising which
can be improved further.
Abstract: The size, complexity and number of databases used
for protein information have caused bioinformatics to lag behind in
adapting to the need to handle this distributed information.
Integrating all the information from different databases into one
database is a challenging problem. Our main research is to develop a
tool which can be used to access and manipulate protein information
from difference databases. In our approach, we have integrated
difference databases such as Swiss-prot, PDB, Interpro, and EMBL
and transformed these databases in flat file format into relational
form using XML and Bioperl. As a result, we showed this tool can
search different sizes of protein information stored in relational
database and the result can be retrieved faster compared to flat file
database. A web based user interface is provided to allow user to
access or search for protein information in the local database.