Abstract: Small interfering RNA (siRNA) alters the
regulatory role of mRNA during gene expression by translational
inhibition. Recent studies show that upregulation of mRNA causes
serious diseases like cancer. So designing effective siRNA with good
knockdown effects plays an important role in gene silencing. Various
siRNA design tools have been developed earlier. In this work, we
analyze the existing high-scoring second-generation siRNA
prediction tools and optimize the efficiency of siRNA prediction
by designing a computational model using an Artificial Neural Network
and whole stacking energy (ΔG), which may help in gene silencing
and drug design in cancer therapy. Our model is trained and tested
against a large data set of siRNA sequences. Validation of our results
is done by finding correlation coefficient of experimental versus
observed inhibition efficacy of siRNA. We achieved a correlation
coefficient of 0.727 in our previous computational model and we
could improve the correlation coefficient up to 0.753 when the
threshold of whole stacking energy is greater than or equal to -32.5
kcal/mol.
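The validation described above compares two series of inhibition efficacies by a correlation coefficient, restricted to siRNAs whose whole stacking energy meets a threshold. The following is a minimal sketch of that calculation, assuming the Pearson coefficient is meant (the abstract does not name the variant). The record layout, the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_above_threshold(records, threshold=-32.5):
    """Correlate two efficacy series over records with energy >= threshold.

    Each record is a hypothetical tuple:
    (stacking_energy_kcal_mol, efficacy_a, efficacy_b).
    """
    kept = [r for r in records if r[0] >= threshold]
    return pearson_r([r[1] for r in kept], [r[2] for r in kept])


# Invented values for illustration only; they are not experimental data.
records = [
    (-30.1, 0.82, 0.78),
    (-35.0, 0.40, 0.71),  # below the -32.5 kcal/mol threshold, so excluded
    (-28.4, 0.55, 0.60),
    (-31.9, 0.91, 0.88),
]
print(round(correlation_above_threshold(records), 3))
```

Filtering before correlating mirrors the abstract's statement that the coefficient is reported for sequences at or above the energy threshold; how the authors actually applied the threshold inside their model is not specified.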
Abstract: A gene network gives the knowledge of the regulatory
relationships among the genes. Each gene has its activators and
inhibitors that regulate its expression positively and negatively
respectively. Genes themselves are believed to act as activators and
inhibitors of other genes. They can even activate one set of genes and
inhibit another set. Identifying gene networks is one of the most
crucial and challenging problems in Bioinformatics. Most work done
so far either assumes that there is no time delay in gene regulation or
there is a constant time delay. We here propose a Dynamic Time-
Lagged Correlation Based Method (DTCBM) to learn the gene
networks, which uses time-lagged correlation to find the potential
gene interactions, and then uses a post-processing stage to remove
false gene interactions due to common parents, and finally uses dynamic
correlation thresholds for each gene to construct the gene network.
DTCBM finds correlation between gene expression signals shifted in
time, and therefore takes into consideration the multiple time-delay
relationships among the genes. Our method is implemented
in MATLAB, and experimental results on Saccharomyces
cerevisiae gene expression data and comparison with other methods
indicate that it performs better.
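The core idea above, correlating one gene's expression signal with another's shifted in time, can be sketched as follows. This is an illustrative sketch only, not the authors' DTCBM implementation (which is in MATLAB): it covers the time-lagged correlation step and omits the post-processing and dynamic-threshold stages. The function names, the use of Pearson correlation and the toy signals are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 for a constant series (a sketch choice)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag], lag given in time points."""
    if len(regulator) != len(target):
        raise ValueError("signals must have the same number of time points")
    if lag < 0 or lag > len(regulator) - 2:
        raise ValueError("lag leaves fewer than two overlapping time points")
    overlap = len(regulator) - lag
    return pearson_r(regulator[:overlap], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return (lag, r) with the largest absolute correlation over 0..max_lag."""
    scored = [(lag, lagged_correlation(regulator, target, lag))
              for lag in range(max_lag + 1)]
    return max(scored, key=lambda item: abs(item[1]))


# Toy signals: the target repeats the regulator two time points later.
regulator = [0, 1, 0, -1, 0, 1, 0, -1, 0, 1]
target = [0, 0] + regulator[:-2]
print(best_lag(regulator, target, max_lag=3))
```

Under this sketch, a positive coefficient at the best lag would suggest activation and a negative one inhibition; any cut-off on the coefficient is left out because the abstract does not state how the per-gene thresholds are chosen.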
Abstract: Using Dynamic Bayesian Networks (DBN) to model genetic regulatory networks from gene expression data is one of the major paradigms for inferring the interactions among genes. Averaging a collection of models to predict the network is preferable to relying on a single high-scoring model. In this paper, two kinds of model searching approaches are compared: Greedy hill-climbing Search with Restarts (GSR) and Markov Chain Monte Carlo (MCMC) methods. The GSR is preferred in many papers, but no comparison study has established which one is better for DBN models. Different types of experiments have been carried out to provide a benchmark test of these approaches. Our experimental results demonstrate that, on average, the MCMC methods outperform the GSR in the accuracy of the predicted network while having comparable time efficiency. By proposing different variations of MCMC and employing a simulated annealing strategy, the MCMC methods become more efficient and stable. Apart from comparing these approaches, another objective of this study is to investigate the feasibility of using DBN modeling approaches for inferring gene networks from a few snapshots of high-dimensional gene profiles. Through synthetic data experiments as well as systematic data experiments, the experimental results reveal how the performance of these approaches is influenced as the target gene network varies in network size, data size, and system complexity.
Abstract: Microarray data profiles gene expression on a whole
genome scale and therefore provides a good way to study associations
between gene expression and the occurrence or progression of cancer.
More and more researchers have realized that microarray data is helpful
for predicting cancer samples. However, the dimension of gene
expression data is much larger than the sample size, which makes this
task very difficult. Therefore, how to identify the significant genes
causing cancer has become an urgent, active and difficult research
topic. Many feature selection algorithms have been proposed in
the past focusing on improving cancer predictive accuracy at the
expense of ignoring the correlations between the features. In this
work, a novel framework (named SGS) is presented for stable gene
selection and efficient cancer prediction. The proposed framework
first performs a clustering algorithm to find the gene groups where
genes in each group have a higher correlation coefficient, and then
selects the significant genes in each group with Bayesian Lasso and
important gene groups with group Lasso, and finally builds a prediction
model based on the shrunken gene space with an efficient classification
algorithm (such as SVM, 1NN or regression). Experimental
results on real-world data show that the proposed framework often
outperforms existing feature selection and prediction methods
such as SAM, IG and Lasso-type prediction models.
Abstract: Gene expression profiling is rapidly evolving into a
powerful technique for investigating tumor malignancies. The
researchers are overwhelmed with the microarray-based platforms
and methods that confer them the freedom to conduct large-scale
gene expression profiling measurements. Simultaneously,
investigations into cross-platform integration methods have started
gaining momentum due to their underlying potential to help
comprehend a myriad of broad biological issues in tumor diagnosis,
prognosis, and therapy. However, comparing results from different
platforms remains a challenging task, as various inherent
technical differences exist between the microarray platforms. In this
paper, we explain a simple ratio-transformation method, which can
provide some common ground for the cDNA and Affymetrix platforms
towards cross-platform integration. The method is based on the
characteristic data attributes of the Affymetrix and cDNA platforms. In
this work, we considered seven childhood leukemia patients and their
gene expression levels on both platforms. With a dataset of 822
differentially expressed genes from both these platforms, we carried
out a specific ratio-treatment of the Affymetrix data, which subsequently
showed an improvement in the relationship with the cDNA data.
Abstract: Sparse representation, which can represent high-dimensional
data effectively, has been successfully used in computer vision
and pattern recognition problems. However, it does not consider the
label information of data samples. To overcome this limitation,
we develop a novel dimensionality reduction algorithm, namely
discriminatively regularized sparse subspace learning (DR-SSL), in this
paper. The proposed DR-SSL algorithm can not only make use of
the sparse representation to model the data, but can also effectively
employ the label information to guide the procedure of dimensionality
reduction. In addition, the presented algorithm can effectively deal
with the out-of-sample problem. The experiments on gene-expression
data sets show that the proposed algorithm is an effective tool for
dimensionality reduction and gene-expression data classification.
Abstract: Recent years have seen a growing trend towards the
integration of multiple information sources to support large-scale
prediction of protein-protein interaction (PPI) networks in model
organisms. Despite advances in computational approaches, the
combination of multiple “omic” datasets representing the same type
of data, e.g. different gene expression datasets, has not been
rigorously studied. Furthermore, there is a need to further investigate
the inference capability of powerful approaches, such as fully connected
Bayesian networks, in the context of the prediction of PPI
networks. This paper addresses these limitations by proposing a
Bayesian approach to integrate multiple datasets, some of which
encode the same type of “omic” data to support the identification of
PPI networks. The case study reported involved the combination of
three gene expression datasets relevant to human heart failure (HF).
In comparison with two traditional methods, Naive Bayesian and
maximum likelihood ratio approaches, the proposed technique can
accurately identify known PPI and can be applied to infer potentially
novel interactions.
Abstract: Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.
Abstract: DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell, because most cellular processes are regulated by changes in gene expression. Association rule mining techniques help to find association relationships between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.
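As a concrete illustration of what an association rule between genes measures, the sketch below computes the two standard rule statistics, support and confidence, over discretized expression data. It is a generic sketch and does not reproduce any specific algorithm surveyed in the paper. The item labels such as "g1_up", the sample sets and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions (samples) that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


# Each sample is the set of discretized expression states observed in it.
# These samples are invented for illustration only.
samples = [
    {"g1_up", "g2_up"},
    {"g1_up", "g2_up", "g3_down"},
    {"g1_up"},
    {"g3_down"},
]

print(support({"g1_up", "g2_up"}, samples))       # share of samples with both
print(confidence({"g1_up"}, {"g2_up"}, samples))  # rule g1_up -> g2_up
```

Mining algorithms differ mainly in how they enumerate candidate itemsets efficiently; the statistics they report for each rule are typically these two.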
Abstract: The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.
Abstract: This paper gives a novel method for improving
classification performance for cancer classification with very few
microarray gene expression data. The method employs classification
with individual gene ranking and gene subset ranking. For selection
and classification, the proposed method uses the same classifier. The
method is applied to three publicly available cancer gene expression
datasets: Lymphoma, Liver and Leukaemia. Three
different classifiers, namely Support Vector Machines-One Against All
(SVM-OAA), K Nearest Neighbour (KNN) and Linear Discriminant
Analysis (LDA), were tested, and the results indicate an improvement
in the performance of the SVM-OAA classifier, with satisfactory results on
all three datasets, when compared with the other two classifiers.
Abstract: Prediction of bacterial virulent protein sequences can
give assistance to identification and characterization of novel
virulence-associated factors and discover drug/vaccine targets against
proteins indispensable to pathogenicity. Gene Ontology (GO)
annotation, which describes functions of genes and gene products as a
controlled vocabulary of terms, has been shown to be effective for a
variety of tasks such as gene expression study, GO annotation
prediction, protein subcellular localization, etc. In this study, we
propose a sequence-based method, Virulent-GO, which mines informative
GO terms as features for predicting bacterial virulent proteins.
Each protein in the datasets used by the existing method
VirulentPred is annotated by using BLAST to obtain its homologs
with known accession numbers for retrieving GO terms. After
investigating various popular classifiers using the same five-fold
cross-validation scheme, Virulent-GO, using a single kind of GO
term feature, achieves an accuracy of 82.5%, slightly better than
VirulentPred with 81.8% using five kinds of sequence-based features.
On the independent test, Virulent-GO also yields better
results (82.0%) than VirulentPred (80.7%). When evaluating a single
kind of feature with SVM, the GO term feature performs much better
than each of the five kinds of features.
Abstract: It has been established that microRNAs (miRNAs) play
an important role in gene expression by post-transcriptional regulation
of messenger RNAs (mRNAs). However, the precise relationships
between microRNAs and their target genes in terms of numbers,
types and biological relevance remain largely unclear. Dissecting the
miRNA-target relationships will provide more insight into miRNA
target identification and validation and therefore promote the
understanding of miRNA function. In miRBase, miRanda is the key
algorithm used for target prediction for Zebrafish. This algorithm
is high-throughput but produces many false positives (noise). Since
validating a large number of targets through laboratory experiments
is very time consuming, computational methods for miRNA
target validation should be developed. In this paper, we present an
integrative method to investigate several aspects of the relationships
between miRNAs and their targets, with the final purpose of extracting
high-confidence targets from the pool of miRanda-predicted targets. This is
achieved by using techniques ranging from statistical tests to
clustering and association rules. Our research focuses on Zebrafish.
It was found that validated targets do not necessarily associate with
the highest sequence matching. Besides, for some miRNA families,
the frequency of their predicted targets is significantly higher in the
genomic region near their own physical location. Finally, in a case
study of dre-miR-10 and dre-miR-196, it was found that the predicted
target genes hoxd13a, hoxd11a, hoxd10a and hoxc4a of dre-miR-10
and hoxa9a, hoxc8a and hoxa13a of dre-miR-196 have similar
characteristics to validated target genes and therefore represent
high-confidence target candidates.
Abstract: The MATCH project [1] entails the development of an
automatic diagnosis system that aims to support the treatment of colon
cancer by discovering mutations that occur in tumour
suppressor genes (TSGs) and contribute to the development of
cancerous tumours. The system is based on a)
colon cancer clinical data and b) biological information that will be
derived by data mining techniques from genomic and proteomic
sources. The core mining module will consist of popular, well-tested
hybrid feature extraction methods and new combined
algorithms designed especially for the project. Elements of rough
sets, evolutionary computing, cluster analysis, self-organizing maps
and association rules will be used to discover the annotations
between genes and their influence on tumours [2]-[11].
The methods used to process the data have to address their high
complexity, potential inconsistency and the problem of
missing values. They must integrate all the useful information
necessary to answer the expert's question. For this purpose, the system
has to learn from data, or allow a domain specialist to specify
interactively, the part of the knowledge structure it needs to answer a
given query. The program should also take into account the
importance/rank of the particular parts of the data it analyses, and adjust
the algorithms used accordingly.
Abstract: Heat-inducible gene expression vectors are useful for hyperthermia-induced cancer gene therapy, because the combination
of hyperthermia and gene therapy can considerably improve the therapeutic effects. In the present study, we developed an enhanced
heat-inducible transgene expression system in which a heat-shock
protein (HSP) promoter and tetracycline-responsive transactivator
were combined. When the transactivator plasmid containing the
tetracycline-responsive transactivator gene was co-transfected with
the reporter gene expression plasmid, a high level of heat-induced gene expression was observed compared with that using the HSP
promoter without the transactivator. In vitro evaluation of the
therapeutic effect using HeLa cells showed that heat-induced therapeutic gene expression caused cell death in a high percentage of
these cells, indicating that this strategy is promising for cancer gene therapy.
Abstract: Using DNA microarrays, a comparative analysis of
gene expression profiles was carried out in the liver and kidneys of pigs.
The hypothesis of cross-hybridization of one probe with different
cDNA sites of the same gene or of different genes was tested, and it
is shown that cross-hybridization can be a source of substantial errors
when identifying key genes in an organ-specific transcriptome. It is
revealed that differences in gene expression profiles are well coordinated
with the function, morphology, biochemistry and histology of
these organs.
Abstract: Attachment of circulating monocytes to the
endothelium is the earliest detectable event during the formation of
atherosclerosis. Adhesion molecule, chemokine and matrix
protease genes have been identified as expressed in atherogenesis.
Expressions of these genes may influence structural integrity of the
luminal endothelium. The aim of this study is to relate changes in the
ultrastructural morphology of the aortic luminal surface and gene
expressions of the endothelial surface, chemokine and MMP-12 in
normal and hypercholesterolemic rabbits. Luminal endothelial
surface from rabbit aortic tissue was examined by scanning electron
microscopy (SEM) using low vacuum mode to ascertain
ultrastructural changes in the development of atherosclerotic lesions. Gene
expression of adhesion molecules, MCP-1 and MMP-12 was studied
by real-time PCR. Ultrastructural observations of the aortic luminal
surface exhibited changes from a normal, regular, smooth, intact
endothelium to an irregular luminal surface, including a marked globular
appearance and ruptures of the membrane layer. Real-time PCR
demonstrated differential expression of the studied genes in
atherosclerotic tissues. The appearance of ultrastructural changes in
the aortic tissue of hypercholesterolemic rabbits is suggested to be
related to underlying changes in endothelial surface molecule,
chemokine and MMP-12 gene expression.
Abstract: Bioinformatics and computational biology involve
the use of techniques including applied mathematics,
informatics, statistics, computer science, artificial intelligence,
chemistry, and biochemistry to solve biological problems
usually on the molecular level. Research in computational
biology often overlaps with systems biology. Major research
efforts in the field include sequence alignment, gene finding,
genome assembly, protein structure alignment, protein structure
prediction, prediction of gene expression and protein-protein
interactions, and the modeling of evolution. Various
global rearrangements of permutations, such as reversals and
transpositions, have recently become of interest because of their
applications in computational molecular biology. A reversal is
an operation that reverses the order of a substring of a permutation.
A transposition is an operation that swaps two adjacent
substrings of a permutation. The problem of determining the
smallest number of reversals required to transform a given
permutation into the identity permutation is called sorting by
reversals. Similar problems can be defined for transpositions
and other global rearrangements. In this work we perform a
study about some genome rearrangement primitives. We show
how a genome is modelled by a permutation, introduce some
of the existing primitives and the lower and upper bounds
on them. We then provide a comparison of the introduced
primitives.
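The two primitives defined above can be written directly as operations on a permutation. The sketch below follows the abstract's definitions of reversal and transposition, adds the classical breakpoint lower bound for unsigned sorting by reversals (a single reversal removes at most two breakpoints), and includes a brute-force breadth-first search that is only practical for very small permutations. Function names and index conventions are assumptions made for illustration; the paper's own notation may differ.

```python
from collections import deque


def reversal(perm, i, j):
    """Reverse the substring perm[i..j] (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def transposition(perm, i, j, k):
    """Swap the adjacent substrings perm[i:j] and perm[j:k], with i < j < k."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs that are not consecutive, after framing the
    permutation of 1..n with 0 on the left and n + 1 on the right."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """One reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2


def reversal_distance(perm):
    """Exact unsigned reversal distance by breadth-first search.

    Exponential in the permutation length; intended for tiny examples only.
    """
    start, goal = tuple(perm), tuple(sorted(perm))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == goal:
            return dist
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                nxt = reversal(current, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("input is not a permutation of distinct elements")


perm = [3, 1, 2]
print(reversal(perm, 0, 1), transposition(perm, 0, 1, 3))
print(breakpoints(perm), reversal_lower_bound(perm), reversal_distance(perm))
```

Comparing the lower bound with the exact distance on small inputs gives a quick feel for how tight a bound is, which is the kind of comparison between primitives the abstract describes.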
Abstract: Cancers can normally be marked by a number of
differentially expressed genes, which show enormous potential as
biomarkers for a certain disease. In recent years, cancer classification
based on the investigation of gene expression profiles derived by
high-throughput microarrays has been widely used. The selection of
discriminative genes is, therefore, an essential preprocessing step in
carcinogenesis studies. In this paper, we propose a novel gene
selector using information-theoretic measures for biological
discovery. This multivariate filter is a four-stage framework comprising
the analyses of feature relevance, feature interdependence, feature
redundancy-dependence and subset rankings, and it has been
examined on the colon cancer data set. Our experimental results show
that the proposed method outperformed other information-theory-based
filters in all aspects of classification errors and classification
performance.
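The first stage named above, feature relevance, is commonly measured in information-theoretic filters by the mutual information between a discretized gene and the class label. The sketch below shows only that generic relevance ranking, under the assumption that expression values are already discretized. It does not implement the paper's four-stage filter, and the gene names, data and function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_by_relevance(genes, labels):
    """Sort gene names by mutual information with the class label, highest first."""
    scores = {name: mutual_information(values, labels)
              for name, values in genes.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented discretized expression levels (0 = low, 1 = high) for four samples.
labels = [0, 0, 1, 1]            # e.g. normal vs tumour
genes = {
    "gene_a": [0, 0, 1, 1],      # tracks the label exactly
    "gene_b": [0, 1, 0, 1],      # carries no information about the label
}
print(rank_by_relevance(genes, labels))
```

A univariate ranking like this ignores interactions and redundancy between genes, which is the gap the multivariate stages listed in the abstract are meant to address.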
Abstract: The most common result of the analysis of high-throughput
data in molecular biology is a global list of
genes, ranked according to a certain score. The score can be a
measure of differential expression. Recent work proposed a new
method for selecting a number of genes in a ranked gene list from
microarray gene expression data such that this set forms the
Optimally Functionally Enriched Network (OFTEN), formed by
known physical interactions between genes or their products. Here
we present calculation results of the relative connectivity of genes from
the META-OFTEN network and a tentative biological interpretation of the
most reproducible signal. The relative connectivity and
inbetweenness values of genes from the META-OFTEN network were
estimated.