Abstract: Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.
Abstract: Covering-based rough sets is an extension of rough
sets and it is based on a covering instead of a partition of the
universe. Therefore it is more powerful in describing some practical
problems than rough sets. However, by extending the rough sets,
covering-based rough sets can increase the roughness of each model
in recognizing objects. How to obtain better approximations from
the models of a covering-based rough sets is an important issue.
In this paper, two concepts, determinate elements and indeterminate
elements in a universe, are proposed and given precise definitions
respectively. This research makes a reasonable refinement of the
covering-element from a new viewpoint. And the refinement may
generate better approximations of covering-based rough sets models.
To prove the theory above, it is applied to eight major coveringbased
rough sets models which are adapted from other literature.
The result is, in all these models, the lower approximation increases
effectively. Correspondingly, in all models, the upper approximation
decreases with exceptions of two models in some special situations.
Therefore, the roughness of recognizing objects is reduced. This
research provides a new approach to the study and application of
covering-based rough sets.
Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor. However, there is a lack
of effective analysis tools to discover hidden relationships and trends
in data. In fact, valuable knowledge can be discovered from
application of data mining techniques in healthcare system. In this
study, a proficient methodology for the extraction of significant
patterns from the Coronary Heart Disease warehouses for heart
attack prediction, which unfortunately continues to be a leading cause
of mortality in the whole world, has been presented. For this purpose,
we propose to enumerate dynamically the optimal subsets of the
reduced features of high interest by using rough sets technique
associated to dynamic programming. Therefore, we propose to
validate the classification using Random Forest (RF) decision tree to
identify the risky heart disease cases. This work is based on a large
amount of data collected from several clinical institutions based on
the medical profile of patient. Moreover, the experts- knowledge in
this field has been taken into consideration in order to define the
disease, its risk factors, and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated based
on set of benchmark techniques applied in this classification problem.
Abstract: Knowledge Discovery in Databases (KDD) has
evolved into an important and active area of research because of
theoretical challenges and practical applications associated with the
problem of discovering (or extracting) interesting and previously
unknown knowledge from very large real-world databases. Rough
Set Theory (RST) is a mathematical formalism for representing
uncertainty that can be considered an extension of the classical set
theory. It has been used in many different research areas, including
those related to inductive machine learning and reduction of
knowledge in knowledge-based systems. One important concept
related to RST is that of a rough relation. In this paper we presented
the current status of research on applying rough set theory to KDD,
which will be helpful for handle the characteristics of real-world
databases. The main aim is to show how rough set and rough set
analysis can be effectively used to extract knowledge from large
databases.