Abstract: Feature Selection is significant in order to perform constructive classification in the area of cancer diagnosis. However, a large number of features compared to the number of samples makes the task of classification computationally very hard and prone to errors in microarray gene expression datasets. In this paper, we present an innovative method for selecting highly informative gene subsets of gene expression data that effectively classifies the cancer data into tumorous and non-tumorous. The hybrid gene selection technique comprises of combined Mutual Information and Fisher score to select informative genes. The gene selection is validated by classification using Support Vector Machine (SVM) which is a supervised learning algorithm capable of solving complex classification problems. The results obtained from improved Mutual Information and F-Score with SVM as a classifier has produced efficient results.
Abstract: A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.
Abstract: The inherent skin patterns created at the joints in the
finger exterior are referred as finger knuckle-print. It is exploited to
identify a person in a unique manner because the finger knuckle print
is greatly affluent in textures. In biometric system, the region of
interest is utilized for the feature extraction algorithm. In this paper,
local and global features are extracted separately. Fast Discrete
Orthonormal Stockwell Transform is exploited to extract the local
features. Global feature is attained by escalating the size of Fast
Discrete Orthonormal Stockwell Transform to infinity. Two features
are fused to increase the recognition accuracy. A matching distance is
calculated for both the features individually. Then two distances are
merged mutually to acquire the final matching distance. The
proposed scheme gives the better performance in terms of equal error
rate and correct recognition rate.
Abstract: The use of eXtensible Markup Language (XML) in
web, business and scientific databases lead to the development of
methods, techniques and systems to manage and analyze XML data.
Semi-structured documents suffer due to its heterogeneity and
dimensionality. XML structure and content mining represent
convergence for research in semi-structured data and text mining. As
the information available on the internet grows drastically, extracting
knowledge from XML documents becomes a harder task. Certainly,
documents are often so large that the data set returned as answer to a
query may also be very big to convey the required information. To
improve the query answering, a Semantic Tree Based Association
Rule (STAR) mining method is proposed. This method provides
intentional information by considering the structure, content and the
semantics of the content. The method is applied on Reuter’s dataset
and the results show that the proposed method outperforms well.
Abstract: Microarray technology is universally used in the study
of disease diagnosis using gene expression levels. The main
shortcoming of gene expression data is that it includes thousands of
genes and a small number of samples. Abundant methods and
techniques have been proposed for tumor classification using
microarray gene expression data. Feature or gene selection methods
can be used to mine the genes that directly involve in the
classification and to eliminate irrelevant genes. In this paper
statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR)
and F-Statistics are used to rank the genes. The ranked genes are used
for further classification. Particle Swarm Optimization (PSO)
algorithm and Shuffled Frog Leaping (SFL) algorithm are used to
find the significant genes from the top-m ranked genes. The Naïve
Bayes Classifier (NBC) is used to classify the samples based on the
significant genes. The proposed work is applied on Lung and Ovarian
datasets. The experimental results show that the proposed method
achieves 100% accuracy in all the three datasets and the results are
compared with previous works.
Abstract: Search is the most obvious application of information
retrieval. The variety of widely obtainable biomedical data is
enormous and is expanding fast. This expansion makes the existing
techniques are not enough to extract the most interesting patterns
from the collection as per the user requirement. Recent researches are
concentrating more on semantic based searching than the traditional
term based searches. Algorithms for semantic searches are
implemented based on the relations exist between the words of the
documents. Ontologies are used as domain knowledge for identifying
the semantic relations as well as to structure the data for effective
information retrieval. Annotation of data with concepts of ontology is
one of the wide-ranging practices for clustering the documents. In
this paper, indexing based on concept and annotation are proposed
for clustering the biomedical documents. Fuzzy c-means (FCM)
clustering algorithm is used to cluster the documents. The
performances of the proposed methods are analyzed with traditional
term based clustering for PubMed articles in five different diseases
communities. The experimental results show that the proposed
methods outperform the term based fuzzy clustering.
Abstract: This paper investigates a new data mining capability that entails mining of High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals with only presence or absence of an items and do not consider the semantic measures like weight or cost of the items. Thus, HUI mining algorithm has evolved. HUI mining is the one kind of utility mining concept, aims to identify itemsets whose utility satisfies a given threshold. Although, the approach of mining HUIs in a distributed environment and mining of the same from XML data have not explored yet. In this work, a novel approach is proposed to mine HUIs from the XML based data in a distributed environment. This work utilizes Service Oriented Computing (SOC) paradigm which provides Knowledge as a Service (KaaS). The interesting patterns are provided via the web services with the help of knowledge server to answer the queries of the consumers. The performance of the approach is evaluated on various databases using execution time and memory consumption.
Abstract: Microarray gene expression data play a vital in biological processes, gene regulation and disease mechanism. Biclustering in gene expression data is a subset of the genes indicating consistent patterns under the subset of the conditions. Finding a biclustering is an optimization problem. In recent years, swarm intelligence techniques are popular due to the fact that many real-world problems are increasingly large, complex and dynamic. By reasons of the size and complexity of the problems, it is necessary to find an optimization technique whose efficiency is measured by finding the near optimal solution within a reasonable amount of time. In this paper, the algorithmic concepts of the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms have been analyzed for the four benchmark gene expression dataset. The experiment results show that CS outperforms PSO and SFL for 3 datasets and SFL give better performance in one dataset. Also this work determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.
Abstract: Privacy and Security have emerged as an important research issue in Mobile Ad Hoc Networks (MANET) due to its unique nature such as scarce of resources and absence of centralized authority. There are number of protocols have been proposed to provide privacy and security for data communication in an adverse environment, but those protocols are compromised in many ways by the attackers. The concept of anonymity (in terms of unlinkability and unobservability) and pseudonymity has been introduced in this paper to ensure privacy and security. In this paper, a Secure Onion Throat (SOT) protocol is proposed to provide complete anonymity in an adverse environment. The SOT protocol is designed based on the combination of group signature and onion routing with ID-based encryption for route discovery. The security analysis demonstrates the performance of SOT protocol against all categories of attacks. The simulation results ensure the necessity and importance of the proposed SOT protocol in achieving such anonymity.
Abstract: The DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across the related samples. The better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. It is handled by clustering which reveals natural structures and identify interesting patterns in the underlying data. In the proposed work clustering gene expression data is done through an Advanced Nelder Mead (ANM) algorithm. Nelder Mead (NM) method is a method designed for optimization process. In Nelder Mead method, the vertices of a triangle are considered as the solutions. Many operations are performed on this triangle to obtain a better result. In the proposed work, the operations like reflection and expansion is eliminated and a new operation called spread-out is introduced. The spread-out operation will increase the global search area and thus provides a better result on optimization. The spread-out operation will give three points and the best among these three points will be used to replace the worst point. The experiment results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.
Abstract: Bloom filter is a probabilistic and memory efficient
data structure designed to answer rapidly whether an element is
present in a set. It tells that the element is definitely not in the set but
its presence is with certain probability. The trade-off to use Bloom
filter is a certain configurable risk of false positives. The odds of a
false positive can be made very low if the number of hash function is
sufficiently large. For spam detection, weight is attached to each set
of elements. The spam weight for a word is a measure used to rate the
e-mail. Each word is assigned to a Bloom filter based on its weight.
The proposed work introduces an enhanced concept in Bloom filter
called Bin Bloom Filter (BBF). The performance of BBF over
conventional Bloom filter is evaluated under various optimization
techniques. Real time data set and synthetic data sets are used for
experimental analysis and the results are demonstrated for bin sizes 4,
5, 6 and 7. Finally analyzing the results, it is found that the BBF
which uses heuristic techniques performs better than the traditional
Bloom filter in spam detection.
Abstract: Tumor classification is a key area of research in the
field of bioinformatics. Microarray technology is commonly used in
the study of disease diagnosis using gene expression levels. The
main drawback of gene expression data is that it contains thousands
of genes and a very few samples. Feature selection methods are used
to select the informative genes from the microarray. These methods
considerably improve the classification accuracy. In the proposed
method, Genetic Algorithm (GA) is used for effective feature
selection. Informative genes are identified based on the T-Statistics,
Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate
solutions of GA are obtained from top-m informative genes. The
classification accuracy of k-Nearest Neighbor (kNN) method is used
as the fitness function for GA. In this work, kNN and Support Vector
Machine (SVM) are used as the classifiers. The experimental results
show that the proposed work is suitable for effective feature
selection. With the help of the selected genes, GA-kNN method
achieves 100% accuracy in 4 datasets and GA-SVM method
achieves in 5 out of 10 datasets. The GA with kNN and SVM
methods are demonstrated to be an accurate method for microarray
based tumor classification.