Abstract: In recent years, the use of technology that helps in discovering diseases has grown explosively. For example, DNA microarrays allow us, for the first time, to obtain a "global" view of the cell. They have great potential to provide accurate medical diagnosis and to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that can predict the occurrence of leukemia. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that, among the tree classifiers, Random Tree performs better and takes the least time to build a model. Among the rule classifiers, the nearest-neighbor-like algorithm (NNge) is the best, owing to its high accuracy and the least time taken to build a model.
Abstract: A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. This is handled by clustering, which reveals natural structures and identifies interesting patterns in the underlying data. In this paper, gene-based clustering of gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experimental results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS on the benchmark datasets. To validate the clustering results, one internal and one external cluster validation index are used.
Abstract: As DNA microarray data have a relatively small
sample size compared to the number of genes, high dimensional
models are often employed. In high dimensional models, the selection
of the tuning parameter (or penalty parameter) is often one of the crucial
parts of the modeling. Cross-validation is one of the most common
methods for tuning parameter selection, which selects the parameter
value with the smallest cross-validated score. However, selecting a
single value as an ‘optimal’ value for the parameter can be very
unstable due to the sampling variation since the sample sizes of
microarray data are often small. Our approach is to choose multiple
candidate values of the tuning parameter
first, then average the candidates with different weights depending
on their performance. The additional step of estimating the weights
and averaging the candidates rarely increases the computational cost,
while it can considerably improve on traditional cross-validation. We
show, using real and simulated data sets, that the value selected by the
suggested methods often leads to stable parameter selection as well as
improved detection of significant genetic variables compared to
traditional cross-validation.
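The averaging idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say which model or weighting scheme is used, so ridge regression, the lambda grid, the number of candidates and the inverse-score weights are all hypothetical choices made here for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_scores(X, y, lambdas, n_folds=5, seed=0):
    """Mean squared prediction error of each lambda, averaged over the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.zeros(len(lambdas))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for j, lam in enumerate(lambdas):
            beta = ridge_fit(X[train], y[train], lam)
            resid = y[test_idx] - X[test_idx] @ beta
            scores[j] += np.mean(resid ** 2) / n_folds
    return scores

def averaged_lambda(lambdas, scores, n_candidates=3):
    """Weighted average of the best few candidates.

    Assumption: weights are inverse cross-validated scores, normalised to
    sum to one. The abstract does not specify the weighting.
    """
    best = np.argsort(scores)[:n_candidates]
    weights = 1.0 / scores[best]
    weights /= weights.sum()
    return float(np.sum(weights * lambdas[best])), lambdas[best], weights

# Synthetic "small n, large p" data standing in for microarray data.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
beta_true = np.zeros(100)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(size=30)

lambdas = np.logspace(-2, 2, 20)
scores = cv_scores(X, y, lambdas)
lam_single = lambdas[np.argmin(scores)]          # traditional choice
lam_avg, candidates, weights = averaged_lambda(lambdas, scores)
print("single best:", lam_single, "weighted average:", lam_avg)
```

Because the averaged value is a convex combination of the best candidates, it always lies between the smallest and largest of them, which is one way to see why it moves less from sample to sample than a single minimiser.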
Abstract: Array-based gene expression analysis is a powerful
tool to profile expression of genes and to generate information on
therapeutic effects of new anti-cancer compounds. The anti-apoptotic
effect of thymoquinone was studied in the MCF7 breast cancer cell line
using gene expression profiling with a cDNA microarray. The purity
and yield of RNA samples were determined using the RNeasyPlus Mini
kit. The Agilent RNA 6000 NanoLabChip kit was used to evaluate the
quantity of the RNA samples. An AffinityScript RT oligo-dT promoter primer
was used to generate cDNA strands. T7 RNA polymerase was used to
convert cDNA to cRNA. The cRNA samples and human universal
reference RNA were labelled with Cy-3-CTP and Cy-5-CTP,
respectively. The data were analysed with the Feature Extraction and
GeneSpring software. The single experiment analysis revealed involvement
of 64 pathways with up-regulated genes and 78 pathways with
down-regulated genes. The MAPK and p38-MAPK pathways were
inhibited due to the up-regulation of the PTPRR gene. The inhibition of
p38-MAPK suggested up-regulation of the TGF-β pathway. Inhibition of
p38-MAPK caused up-regulation of TP53 and down-regulation of
Bcl2 genes, indicating involvement of the intrinsic apoptotic pathway.
Down-regulation of the CARD16 gene, an adaptor molecule, regulated
CASP1 and suggested necrosis-like programmed cell death and
involvement of caspase in apoptosis. Furthermore, down-regulation
of the GPCR and EGF-EGFR signalling pathways suggested a reduction of
ER. Involvement of the AhR pathway, which controls the cytochrome P450
and glucuronidation pathways, indicated metabolism of thymoquinone.
The findings showed differential expression of several genes in
apoptosis pathways with thymoquinone treatment in estrogen
receptor-positive breast cancer cells.
Abstract: Analyzing DNA microarray data sets is a great
challenge for bioinformaticians because of the complexity
of applying statistical and machine learning techniques. The challenge
is compounded when the microarray data sets contain missing data,
which happens regularly, because these techniques cannot deal with
missing data. One of the most important data analysis processes on
a microarray data set is feature selection. This process finds the
most important genes that affect a certain disease. In this paper, we
introduce a technique for imputing the missing data in microarray
data sets while performing feature selection.
Abstract: DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across related samples. A better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. This is handled by clustering, which reveals natural structures and identifies interesting patterns in the underlying data. In the proposed work, gene expression data are clustered using an Advanced Nelder Mead (ANM) algorithm. The Nelder Mead (NM) method is designed for optimization. In the Nelder Mead method, the vertices of a triangle are considered as the solutions, and several operations are performed on this triangle to obtain a better result. In the proposed work, the reflection and expansion operations are eliminated and a new operation called spread-out is introduced. The spread-out operation increases the global search area and thus provides a better optimization result. It yields three points, and the best of these three points is used to replace the worst point. The experimental results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.
Abstract: Inferring the network structure from time series data
is a hard problem, especially if the time series is short and noisy.
DNA microarrays are a technology that allows the mRNA
concentration of thousands of genes to be monitored simultaneously
and that produces data with these characteristics. In this study we
investigate the
influence of the experimental design on the quality of the result.
More precisely, we investigate the influence of two different types of
random single gene perturbations on the inference of genetic networks
from time series data. To obtain an objective quality measure for
this influence we simulate gene expression values with a biologically
plausible model of a known network structure. Within this framework
we study the influence of single gene knock-outs as opposed to
linearly controlled expression for single genes on the quality of the
inferred network structure.
Abstract: DNA microarray technology is widely used by
geneticists to diagnose or treat diseases through gene expression.
This technology is based on the hybridization of a tissue's DNA
sequence onto a substrate and the further analysis of the image
formed by the thousands of genes in the DNA as green, red or yellow
spots. The process of DNA microarray image analysis involves
finding the location of the spots and quantifying their
expression levels. In this paper, a tool to perform DNA
microarray image analysis is presented, including a spot addressing
method based on image projections, spot segmentation
through contour-based segmentation and the extraction of relevant
information about gene expression.
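Projection-based spot addressing can be illustrated with a small sketch. It assumes an idealised image with bright spots on a dark background and locates rows and columns of spots as runs where the projection exceeds its mean. The thresholding rule, the function names and the synthetic image are assumptions made for this example and are not taken from the tool described above.

```python
import numpy as np

def projection_centers(profile):
    """Centres of contiguous runs where a 1-D projection exceeds its mean."""
    above = profile > profile.mean()
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)    # first index of each run
    ends = np.flatnonzero(edges == -1)     # one past the last index of each run
    return [(s + e - 1) / 2 for s, e in zip(starts, ends)]

def address_spots(image):
    """Return estimated row and column coordinates of the spot grid."""
    row_centers = projection_centers(image.sum(axis=1))  # horizontal projection
    col_centers = projection_centers(image.sum(axis=0))  # vertical projection
    return row_centers, col_centers

# Synthetic image: a 3 x 4 grid of 7 x 7 pixel square spots.
image = np.zeros((60, 80))
for r in (10, 30, 50):
    for c in (10, 30, 50, 70):
        image[r - 3:r + 4, c - 3:c + 4] = 1.0

rows, cols = address_spots(image)
print(rows)  # [10.0, 30.0, 50.0]
print(cols)  # [10.0, 30.0, 50.0, 70.0]
```

Real microarray images are noisy and may be rotated, so a practical tool would typically smooth the projections and correct the grid orientation before this step.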
Abstract: DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. Because most cellular processes are regulated by changes in gene expression, it is very important to extract biologically meaningful information from this huge amount of expression data in order to know the current state of the cell. Association rule mining techniques help to find association relationships between genes. Numerous association rule mining algorithms have been developed to analyze this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.
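To make the notion of an association rule between genes concrete, the sketch below computes support and confidence for single-antecedent rules over discretised expression data. It is a brute-force illustration, not any of the algorithms surveyed in the paper. The gene labels, the thresholds and the encoding of each sample as a set of "gene_up"/"gene_down" items are hypothetical.

```python
from itertools import permutations

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules a -> b as (a, b, support, confidence) tuples.

    support(a -> b)    = fraction of samples containing both a and b
    confidence(a -> b) = support({a, b}) / support({a})
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        s_a = support({a})
        if s_ab >= min_support and s_a > 0 and s_ab / s_a >= min_confidence:
            rules.append((a, b, s_ab, s_ab / s_a))
    return rules

# Each sample is the set of discretised expression events observed in it.
samples = [
    {"geneA_up", "geneB_up", "geneC_down"},
    {"geneA_up", "geneB_up"},
    {"geneA_up", "geneB_up", "geneC_down"},
    {"geneB_up", "geneC_down"},
]

for a, b, s, c in mine_pair_rules(samples):
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Algorithms such as Apriori avoid this exhaustive enumeration by pruning itemsets whose subsets are already infrequent, which matters when thousands of genes are involved.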
Abstract: The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As a criterion for selecting genes, we suggest measuring the local changes in the correlation graph of each gene and selecting those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer, where each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.
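A rough sketch of the local decomposition is given below. It makes several assumptions that the abstract does not state: the network is built by thresholding absolute Pearson correlation at 0.8, the graph is treated as unweighted (so breadth-first search gives the same distances as Dijkstra's algorithm), and trees are compared with a level-wise Jaccard index, which is a stand-in for whatever tree similarity measure the authors use. All function names are hypothetical.

```python
import numpy as np
from collections import deque

def correlation_network(expr, threshold=0.8):
    """Adjacency lists of an unweighted graph from a genes x samples matrix."""
    corr = np.corrcoef(expr)
    n = corr.shape[0]
    return {i: [j for j in range(n) if j != i and abs(corr[i, j]) >= threshold]
            for i in range(n)}

def local_tree_levels(adj, root, depth):
    """Level k holds the genes at shortest-path distance k from the root gene."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # do not expand beyond the requested depth
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    levels = [set() for _ in range(depth + 1)]
    for gene, d in dist.items():
        levels[d].add(gene)
    return levels

def level_similarity(levels_a, levels_b):
    """Mean Jaccard index across levels (an assumed similarity measure)."""
    sims = []
    for a, b in zip(levels_a, levels_b):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)

# Hand-built toy networks for one "normal" and one "tumor" tissue.
normal_adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
tumor_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}

normal_levels = local_tree_levels(normal_adj, root=0, depth=2)
tumor_levels = local_tree_levels(tumor_adj, root=0, depth=2)
sim = level_similarity(normal_levels, tumor_levels)
print(normal_levels, tumor_levels, round(sim, 3))
```

Under these assumptions, genes would be ranked by ascending similarity, so that the genes whose local neighborhood changes most between tissues come first.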
Abstract: The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.
Abstract: A series of microarray experiments produces observations
of differential expression for thousands of genes across multiple
conditions.
Principal component analysis (PCA) has been widely used in
multivariate data analysis to reduce the dimensionality of the data in
order to simplify subsequent analysis and allow for summarization of
the data in a parsimonious manner. PCA, which can be implemented
via a singular value decomposition (SVD), is useful for analysis of
microarray data.
To apply PCA using SVD, we use the DNA microarray
data for the small round blue cell tumors (SRBCT) of childhood
by Khan et al. (2001). To decide the number of components that
account for a sufficient amount of information, we draw a scree plot.
The biplot, a graphic display associated with PCA, reveals important
features that exhibit the relationships between variables and also the
relationships of variables with observations.
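A minimal sketch of PCA via SVD is shown below. Random data stand in for the SRBCT expression matrix, which is not reproduced here, and the function name and matrix sizes are illustrative assumptions. The `explained` vector holds the values a scree plot would display, and the `scores` and `loadings` are the two ingredients of a biplot.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of a samples x variables matrix via singular value decomposition."""
    Xc = X - X.mean(axis=0)                        # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # proportion of variance per component
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                 # variable directions
    return scores, loadings, explained

# Placeholder data: 20 samples and 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))

scores, loadings, explained = pca_svd(X, n_components=2)
print("variance explained by first components:", explained[:5])
print("cumulative:", np.cumsum(explained)[:5])
```

The scores equal the centred data projected onto the loadings, so the SVD yields the principal components without forming the covariance matrix explicitly, which is convenient when genes far outnumber samples.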
Abstract: Using DNA microarrays, a comparative analysis of
gene expression profiles is carried out in the liver and kidneys of pigs.
The hypothesis of cross-hybridization of one probe with different
cDNA sites of the same gene or of different genes is tested, and it
is shown that cross-hybridization can be a source of substantial errors
when identifying key genes in an organ-specific transcriptome. It is
revealed that differences in gene expression profiles agree well
with the function, morphology, biochemistry and histology of
these organs.
Abstract: In this paper we present a method for gene ranking
from DNA microarray data. More precisely, we calculate the correlation
networks, which are unweighted and undirected graphs, from
microarray data of cervical cancer, where each network represents
a tissue of a certain tumor stage and each node in the network
represents a gene. From these networks we extract one tree for
each gene by a local decomposition of the correlation network. The
interpretation of a tree is that it represents the n-nearest neighbor
genes on the n-th level of a tree, measured by the Dijkstra distance,
and, hence, gives the local embedding of a gene within the correlation
network. For the obtained trees we measure the pairwise similarity
between trees rooted by the same gene from normal to cancerous
tissues. This evaluates the modification of the tree topology due to
progression of the tumor. Finally, we rank the obtained similarity
values from all tissue comparisons and select the top ranked genes.
For these genes the local neighborhood in the correlation networks
changes most between normal and cancerous tissues. As a result
we find that the top ranked genes are candidates suspected to be
involved in tumor growth, which indicates that our method
captures essential information from the underlying DNA microarray
data of cervical cancer.