Abstract: The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.
Abstract: In molecular biology, microarray technology is widely and successfully utilized to efficiently measure gene activity. If working with less studied organisms, methods to design custom-made microarray probes are available. One design criterion is to select probes with minimal melting temperature variances thus ensuring similar hybridization properties. If the microarray application focuses on the investigation of metabolic pathways, it is not necessary to cover the whole genome. It is more efficient to cover each metabolic pathway with a limited number of genes. Firstly, an approach is presented which minimizes the overall melting temperature variance of selected probes for all genes of interest. Secondly, the approach is extended to include the additional constraints of covering all pathways with a limited number of genes while minimizing the overall variance. The new optimization problem is solved by a bottom-up programming approach which reduces the complexity to make it computationally feasible. The new method is exemplary applied for the selection of microarray probes in order to cover all fungal secondary metabolite gene clusters for Aspergillus terreus.
Abstract: A series of microarray experiments produces observations
of differential expression for thousands of genes across multiple
conditions.
Principal component analysis(PCA) has been widely used in
multivariate data analysis to reduce the dimensionality of the data in
order to simplify subsequent analysis and allow for summarization of
the data in a parsimonious manner. PCA, which can be implemented
via a singular value decomposition(SVD), is useful for analysis of
microarray data.
For application of PCA using SVD we use the DNA microarray
data for the small round blue cell tumors(SRBCT) of childhood
by Khan et al.(2001). To decide the number of components which
account for sufficient amount of information we draw scree plot.
Biplot, a graphic display associated with PCA, reveals important
features that exhibit relationship between variables and also the
relationship of variables with observations.
Abstract: This paper gives a novel method for improving
classification performance for cancer classification with very few
microarray Gene expression data. The method employs classification
with individual gene ranking and gene subset ranking. For selection
and classification, the proposed method uses the same classifier. The
method is applied to three publicly available cancer gene expression
datasets from Lymphoma, Liver and Leukaemia datasets. Three
different classifiers namely Support vector machines-one against all
(SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant
analysis (LDA) were tested and the results indicate the improvement
in performance of SVM-OAA classifier with satisfactory results on
all the three datasets when compared with the other two classifiers.
Abstract: MATCH project [1] entitle the development of an
automatic diagnosis system that aims to support treatment of colon
cancer diseases by discovering mutations that occurs to tumour
suppressor genes (TSGs) and contributes to the development of
cancerous tumours. The constitution of the system is based on a)
colon cancer clinical data and b) biological information that will be
derived by data mining techniques from genomic and proteomic
sources The core mining module will consist of the popular, well
tested hybrid feature extraction methods, and new combined
algorithms, designed especially for the project. Elements of rough
sets, evolutionary computing, cluster analysis, self-organization maps
and association rules will be used to discover the annotations
between genes, and their influence on tumours [2]-[11].
The methods used to process the data have to address their high
complexity, potential inconsistency and problems of dealing with the
missing values. They must integrate all the useful information
necessary to solve the expert's question. For this purpose, the system
has to learn from data, or be able to interactively specify by a domain
specialist, the part of the knowledge structure it needs to answer a
given query. The program should also take into account the
importance/rank of the particular parts of data it analyses, and adjusts
the used algorithms accordingly.
Abstract: Using DNA microarrays the comparative analysis of a
gene expression profiles is carried out in a liver and kidneys of pigs.
The hypothesis of a cross hybridization of one probe with different
cDNA sites of the same gene or different genes is checked up, and it
is shown, that cross hybridization can be a source of essential errors
at revealing of a key genes in organ-specific transcriptome. It is
reveald that distinctions in profiles of a gene expression are well coordinated
with function, morphology, biochemistry and histology of
these organs.
Abstract: Lung cancer accounts for the most cancer related deaths for men as well as for women. The identification of cancer associated genes and the related pathways are essential to provide an important possibility in the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI) are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. It turned out that the modified BMI is well suited for microarray data and therefore BMI is proposed as a powerful tool for the search for new and so far undiscovered genes related to cancer.
Abstract: Cancers could normally be marked by a number of
differentially expressed genes which show enormous potential as
biomarkers for a certain disease. Recent years, cancer classification
based on the investigation of gene expression profiles derived by
high-throughput microarrays has widely been used. The selection of
discriminative genes is, therefore, an essential preprocess step in
carcinogenesis studies. In this paper, we have proposed a novel gene
selector using information-theoretic measures for biological
discovery. This multivariate filter is a four-stage framework through
the analyses of feature relevance, feature interdependence, feature
redundancy-dependence and subset rankings, and having been
examined on the colon cancer data set. Our experimental result show
that the proposed method outperformed other information theorem
based filters in all aspect of classification errors and classification
performance.
Abstract: In this paper we present a method for gene ranking
from DNA microarray data. More precisely, we calculate the correlation
networks, which are unweighted and undirected graphs, from
microarray data of cervical cancer whereas each network represents
a tissue of a certain tumor stage and each node in the network
represents a gene. From these networks we extract one tree for
each gene by a local decomposition of the correlation network. The
interpretation of a tree is that it represents the n-nearest neighbor
genes on the n-th level of a tree, measured by the Dijkstra distance,
and, hence, gives the local embedding of a gene within the correlation
network. For the obtained trees we measure the pairwise similarity
between trees rooted by the same gene from normal to cancerous
tissues. This evaluates the modification of the tree topology due to
progression of the tumor. Finally, we rank the obtained similarity
values from all tissue comparisons and select the top ranked genes.
For these genes the local neighborhood in the correlation networks
changes most between normal and cancerous tissues. As a result
we find that the top ranked genes are candidates suspected to be
involved in tumor growth and, hence, indicates that our method
captures essential information from the underlying DNA microarray
data of cervical cancer.
Abstract: The most common result of analysis of highthroughput
data in molecular biology represents a global list of
genes, ranked accordingly to a certain score. The score can be a
measure of differential expression. Recent work proposed a new
method for selecting a number of genes in a ranked gene list from
microarray gene expression data such that this set forms the
Optimally Functionally Enriched Network (OFTEN), formed by
known physical interactions between genes or their products. Here
we present calculation results of relative connectivity of genes from
META-OFTEN network and tentative biological interpretation of the
most reproducible signal. The relative connectivity and
inbetweenness values of genes from META-OFTEN network were
estimated.
Abstract: Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.