Abstract: In this paper, we propose an improvement of pattern
growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree
framework and separator database to reduce the execution time and
memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result
shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.
Abstract: In this paper, we propose an efficient hierarchical DNA
sequence search method to improve the search speed while the
accuracy is being kept constant. For a given query DNA sequence,
firstly, a fast local search method using histogram features is used as a
filtering mechanism before scanning the sequences in the database.
An overlapping processing is newly added to improve the robustness
of the algorithm. A large number of DNA sequences with low
similarity will be excluded for latter searching. The Smith-Waterman
algorithm is then applied to each remainder sequences. Experimental
results using GenBank sequence data show the proposed method
combining histogram information and Smith-Waterman algorithm is
more efficient for DNA sequence search.
Abstract: Due to the ever growing amount of publications about
protein-protein interactions, information extraction from text is
increasingly recognized as one of crucial technologies in
bioinformatics. This paper presents a Protein Interaction Extraction
System using a Link Grammar Parser from biomedical abstracts
(PIELG). PIELG uses linkage given by the Link Grammar Parser to
start a case based analysis of contents of various syntactic roles as
well as their linguistically significant and meaningful combinations.
The system uses phrasal-prepositional verbs patterns to overcome
preposition combinations problems. The recall and precision are
74.4% and 62.65%, respectively. Experimental evaluations with two
other state-of-the-art extraction systems indicate that PIELG system
achieves better performance. For further evaluation, the system is
augmented with a graphical package (Cytoscape) for extracting
protein interaction information from sequence databases. The result
shows that the performance is remarkably promising.
Abstract: The size, complexity and number of databases used
for protein information have caused bioinformatics to lag behind in
adapting to the need to handle this distributed information.
Integrating all the information from different databases into one
database is a challenging problem. Our main research is to develop a
tool which can be used to access and manipulate protein information
from difference databases. In our approach, we have integrated
difference databases such as Swiss-prot, PDB, Interpro, and EMBL
and transformed these databases in flat file format into relational
form using XML and Bioperl. As a result, we showed this tool can
search different sizes of protein information stored in relational
database and the result can be retrieved faster compared to flat file
database. A web based user interface is provided to allow user to
access or search for protein information in the local database.