Abstract: The biological function of an RNA molecule depends
on its structure. The objective of the alignment is finding the
homology between two or more RNA secondary structures. Knowing
the common functionalities between two RNA structures allows
a better understanding and a discovery of other relationships
between them. Besides, identifying non-coding RNAs -that is not
translated into a protein- is a popular application in which RNA
structural alignment is the first step A few methods for RNA
structure-to-structure alignment have been developed. Most of these
methods are partial structure-to-structure, sequence-to-structure, or
structure-to-sequence alignment. Less attention is given in the
literature to the use of efficient RNA structure representation and the
structure-to-structure alignment methods are lacking. In this paper,
we introduce an O(N2) Component-based Pairwise RNA Structure
Alignment (CompPSA) algorithm, where structures are given as
a component-based representation and where N is the maximum
number of components in the two structures. The proposed algorithm
compares the two RNA secondary structures based on their weighted
component features rather than on their base-pair details. Extensive
experiments are conducted illustrating the efficiency of the CompPSA
algorithm when compared to other approaches and on different real
and simulated datasets. The CompPSA algorithm shows an accurate
similarity measure between components. The algorithm gives the
flexibility for the user to align the two RNA structures based on
their weighted features (position, full length, and/or stem length).
Moreover, the algorithm proves scalability and efficiency in time and
memory performance.
Abstract: With the increase in popularity of mobile devices,
new and varied forms of malware have emerged. Consequently,
the organizations for cyberdefense have echoed the need to deploy
more effective defensive schemes adapted to the challenges posed
by these recent monitoring environments. In order to contribute to
their development, this paper presents a malware detection strategy
for mobile devices based on sequence alignment algorithms. Unlike
the previous proposals, only the system calls performed during the
startup of applications are studied. In this way, it is possible to
efficiently study in depth, the sequences of system calls executed
by the applications just downloaded from app stores, and initialize
them in a secure and isolated environment. As demonstrated in the
performed experimentation, most of the analyzed malicious activities
were successfully identified in their boot processes.
Abstract: DNA Barcode provides good sources of needed
information to classify living species. The classification problem has
to be supported with reliable methods and algorithms. To analyze
species regions or entire genomes, it becomes necessary to use the
similarity sequence methods. A large set of sequences can be
simultaneously compared using Multiple Sequence Alignment which
is known to be NP-complete. However, all the used methods are still
computationally very expensive and require significant computational
infrastructure. Our goal is to build predictive models that are highly
accurate and interpretable. In fact, our method permits to avoid the
complex problem of form and structure in different classes of
organisms. The empirical data and their classification performances
are compared with other methods. Evenly, in this study, we present
our system which is consisted of three phases. The first one, is called
transformation, is composed of three sub steps; Electron-Ion
Interaction Pseudopotential (EIIP) for the codification of DNA
Barcodes, Fourier Transform and Power Spectrum Signal Processing.
Moreover, the second phase step is an approximation; it is
empowered by the use of Multi Library Wavelet Neural Networks
(MLWNN). Finally, the third one, is called the classification of DNA
Barcodes, is realized by applying the algorithm of hierarchical
classification.
Abstract: Bacterial strains capable of degradation of malathion
from the domestic sewage were isolated by an enrichment culture
technique. Three bacterial strains were screened and identified as
Acinetobacter baumannii (AFA), Pseudomonas aeruginosa (PS1),
and Pseudomonas mendocina (PS2) based on morphological,
biochemical identification and 16S rRNA sequence analysis.
Acinetobacter baumannii AFA was the most efficient malathion
degrading bacterium, so used for further biodegradation study. AFA
was able to grow in mineral salt medium (MSM) supplemented with
malathion (100 mg/l) as a sole carbon source, and within 14 days,
84% of the initial dose was degraded by the isolate measured by high
performance liquid chromatography. Strain AFA could also degrade
other organophosphorus compounds including diazinon, chlorpyrifos
and fenitrothion. The effect of different culture conditions on the
degradation of malathion like inoculum density, other carbon or
nitrogen sources, temperature and shaking were examined.
Degradation of malathion and bacterial cell growth were accelerated
when culture media were supplemented with yeast extract, glucose
and citrate. The optimum conditions for malathion degradation by
strain AFA were; an inoculum density of 1.5x 10^12CFU/ml at 30°C
with shaking. A specific polymerase chain reaction primers were
designed manually using multiple sequence alignment of the
corresponding carboxylesterase enzymes of Acinetobacter species.
Sequencing result of amplified PCR product and phylogenetic
analysis showed low degree of homology with the other
carboxylesterase enzymes of Acinetobacter strains, so we suggested
that this enzyme is a novel esterase enzyme. Isolated bacterial strains
may have potential role for use in bioremediation of malathion
contaminated.
Abstract: Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
Abstract: Due to the tremendous amount of information provided
by the World Wide Web (WWW) developing methods for mining
the structure of web-based documents is of considerable interest. In
this paper we present a similarity measure for graphs representing
web-based hypertext structures. Our similarity measure is mainly
based on a novel representation of a graph as linear integer strings,
whose components represent structural properties of the graph. The
similarity of two graphs is then defined as the optimal alignment of
the underlying property strings. In this paper we apply the well known
technique of sequence alignments for solving a novel and challenging
problem: Measuring the structural similarity of generalized trees.
In other words: We first transform our graphs considered as high
dimensional objects in linear structures. Then we derive similarity
values from the alignments of the property strings in order to
measure the structural similarity of generalized trees. Hence, we
transform a graph similarity problem to a string similarity problem for
developing a efficient graph similarity measure. We demonstrate that
our similarity measure captures important structural information by
applying it to two different test sets consisting of graphs representing
web-based document structures.
Abstract: A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represent one or more hypothetical bases with unspecific pairing. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvements of a primitive DNA repair system could make possible the transition from the ancient to the modern genetic code. Our results suggest that the Watson-Crick base pairing and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the later. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences.
Abstract: A total of twenty tensile biopsies were collected from
children undergoing tonsillectomy from teaching hospital ENT
department and Kurdistan private hospital in sulaimani city. All
biopsies were homogenized and cultured; the obtained bacterial
isolates were purified and identified by biochemical tests and VITEK
2 compact system. Among the twenty studied samples, only one
Pseudomonas putida with probability of 99% was isolated.
Antimicrobial susceptibility was carried out by disk diffusion
method, Pseudomonas putida showed resistance to all antibiotics
used except vancomycin. The isolate further subjected to PCR and
DNA sequence analysis of blaVIM gene using different set of primers
for different regions of VIM gene. The results were found to be PCR
positive for the blaVIM gene. To determine the sequence of blaVIM
gene, DNA sequencing performed. Sequence alignment of blaVIM
gene with previously recorded blaVIM gene in NCBI- database showed
that P. putida isolate have different blaVIM gene.
Abstract: Proteins or genes that have similar sequences are likely to perform the same function. One of the most widely used techniques for sequence comparison is sequence alignment. Sequence alignment allows mismatches and insertion/deletion, which represents biological mutations. Sequence alignment is usually performed only on two sequences. Multiple sequence alignment, is a natural extension of two-sequence alignment. In multiple sequence alignment, the emphasis is to find optimal alignment for a group of sequences. Several applicable techniques were observed in this research, from traditional method such as dynamic programming to the extend of widely used stochastic optimization method such as Genetic Algorithms (GAs) and Simulated Annealing. A framework with combination of Genetic Algorithm and Simulated Annealing is presented to solve Multiple Sequence Alignment problem. The Genetic Algorithm phase will try to find new region of solution while Simulated Annealing can be considered as an alignment improver for any near optimal solution produced by GAs.
Abstract: Source code retrieval is of immense importance in the software engineering field. The complex tasks of retrieving and extracting information from source code documents is vital in the development cycle of the large software systems. The two main subtasks which result from these activities are code duplication prevention and plagiarism detection. In this paper, we propose a Mohamed Amine Ouddan, and Hassane Essafi source code retrieval system based on two-level fingerprint representation, respectively the structural and the semantic information within a source code. A sequence alignment technique is applied on these fingerprints in order to quantify the similarity between source code portions. The specific purpose of the system is to detect plagiarism and duplicated code between programs written in different programming languages belonging to the same class, such as C, Cµ, Java and CSharp. These four languages are supported by the actual version of the system which is designed such that it may be easily adapted for any programming language.
Abstract: The Partitioned Global Address Space (PGAS) programming
paradigm offers ease-of-use in expressing parallelism
through a global shared address space while emphasizing performance
by providing locality awareness through the partitioning of
this address space. Therefore, the interest in PGAS programming
languages is growing and many new languages have emerged and
are becoming ubiquitously available on nearly all modern parallel
architectures. Recently, new parallel machines with multiple cores
are designed for targeting high performance applications. Most of the
efforts have gone into benchmarking but there are a few examples of
real high performance applications running on multicore machines.
In this paper, we present and evaluate a parallelization technique
for implementing a local DNA sequence alignment algorithm using
a PGAS based language, UPC (Unified Parallel C) on a chip
multithreading architecture, the UltraSPARC T1.
Abstract: Background: Dialign is a DNA/Protein alignment tool
for performing pairwise and multiple pairwise alignments through the
comparison of gap-free segments (fragments) between sequence
pairs. An alignment of two sequences is a chain of fragments, i.e
local gap-free pairwise alignments, with the highest total score.
METHOD: A new approach is defined in this article which relies on
the concept of using three-dimensional fragments – i.e. local threeway
alignments -- in the alignment process instead of twodimensional
ones. These three-dimensional fragments are gap-free
alignments constituting of equal-length segments belonging to three
distinct sequences. RESULTS: The obtained results showed good
improvments over the performance of DIALIGN.
Abstract: Spectrum is a scarce commodity, and considering the spectrum scarcity faced by the wireless-based service providers led to high congestion levels. Technical inefficiencies from pooled, since all networks share a common pool of channels, exhausting the available channels will force networks to block the services. Researchers found that cognitive radio (CR) technology may resolve the spectrum scarcity. A CR is a self-configuring entity in a wireless networking that senses its environment, tracks changes, and frequently exchanges information with their networks. However, CRN facing challenges and condition become worst while tracks changes i.e. reallocation of another under-utilized channels while primary network user arrives. In this paper, channels or resource reallocation technique based on DNA-inspired computing algorithm for CRN has been proposed.
Abstract: Multiple sequence alignment is a fundamental part in
many bioinformatics applications such as phylogenetic analysis.
Many alignment methods have been proposed. Each method gives a
different result for the same data set, and consequently generates a
different phylogenetic tree. Hence, the chosen alignment method
affects the resulting tree. However in the literature, there is no
evaluation of multiple alignment methods based on the comparison of
their phylogenetic trees. This work evaluates the following eight
aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN,
ProbCons and Align-m, based on their phylogenetic trees (test trees)
produced on a given data set. The Neighbor-Joining method is used
to estimate trees. Three criteria, namely, the dNNI, the dRF and the
Id_Tree are established to test the ability of different alignment
methods to produce closer test tree compared to the reference one
(true tree). Results show that the method which produces the most
accurate alignment gives the nearest test tree to the reference tree.
MUSCLE outperforms all aligners with respect to the three criteria
and for all datasets, performing particularly better when sequence
identities are within 10-20%. It is followed by T-Coffee at lower
sequence identity (30%), trees scores of all methods
become similar.
Abstract: Protein 3D structure prediction has always been an
important research area in bioinformatics. In particular, the
prediction of secondary structure has been a well-studied research
topic. Despite the recent breakthrough of combining multiple
sequence alignment information and artificial intelligence algorithms
to predict protein secondary structure, the Q3 accuracy of various
computational prediction algorithms rarely has exceeded 75%. In a
previous paper [1], this research team presented a rule-based method
called RT-RICO (Relaxed Threshold Rule Induction from Coverings)
to predict protein secondary structure. The average Q3 accuracy on
the sample datasets using RT-RICO was 80.3%, an improvement
over comparable computational methods. Although this demonstrated
that RT-RICO might be a promising approach for predicting
secondary structure, the algorithm-s computational complexity and
program running time limited its use. Herein a parallelized
implementation of a slightly modified RT-RICO approach is
presented. This new version of the algorithm facilitated the testing of
a much larger dataset of 396 protein domains [2]. Parallelized RTRICO
achieved a Q3 score of 74.6%, which is higher than the
consensus prediction accuracy of 72.9% that was achieved for the
same test dataset by a combination of four secondary structure
prediction methods [2].
Abstract: Bioinformatics and computational biology involve
the use of techniques including applied mathematics,
informatics, statistics, computer science, artificial intelligence,
chemistry, and biochemistry to solve biological problems
usually on the molecular level. Research in computational
biology often overlaps with systems biology. Major research
efforts in the field include sequence alignment, gene finding,
genome assembly, protein structure alignment, protein structure
prediction, prediction of gene expression and proteinprotein
interactions, and the modeling of evolution. Various
global rearrangements of permutations, such as reversals and
transpositions,have recently become of interest because of their
applications in computational molecular biology. A reversal is
an operation that reverses the order of a substring of a permutation.
A transposition is an operation that swaps two adjacent
substrings of a permutation. The problem of determining the
smallest number of reversals required to transform a given
permutation into the identity permutation is called sorting by
reversals. Similar problems can be defined for transpositions
and other global rearrangements. In this work we perform a
study about some genome rearrangement primitives. We show
how a genome is modelled by a permutation, introduce some
of the existing primitives and the lower and upper bounds
on them. We then provide a comparison of the introduced
primitives.