Abstract: In this paper, we proposed the notion of single valued
neutrosophic hesitant fuzzy rough set, by combining single valued
neutrosophic hesitant fuzzy set and rough set. The combination
of single valued neutrosophic hesitant fuzzy set and rough set
is a powerful tool for dealing with uncertainty, granularity and
incompleteness of knowledge in information systems. We presented
both definition and some basic properties of the proposed model.
Finally, we gave a general approach which is applied to a
decision making problem in disease diagnoses, and demonstrated the
effectiveness of the approach by a numerical example.
Abstract: Feature selection and attribute reduction are crucial
problems, and widely used techniques in the field of machine
learning, data mining and pattern recognition to overcome the
well-known phenomenon of the Curse of Dimensionality. This paper
presents a feature selection method that efficiently carries out attribute
reduction, thereby selecting the most informative features of a dataset.
It consists of two components: 1) a measure for feature subset
evaluation, and 2) a search strategy. For the evaluation measure,
we have employed the fuzzy-rough dependency degree (FRFDD)
of the lower approximation-based fuzzy-rough feature selection
(L-FRFS) due to its effectiveness in feature selection. As for the
search strategy, a modified version of a binary shuffled frog leaping
algorithm is proposed (B-SFLA). The proposed feature selection
method is obtained by hybridizing the B-SFLA with the FRDD. Nine
classifiers have been employed to compare the proposed approach
with several existing methods over twenty two datasets, including
nine high dimensional and large ones, from the UCI repository.
The experimental results demonstrate that the B-SFLA approach
significantly outperforms other metaheuristic methods in terms of the
number of selected features and the classification accuracy.
Abstract: Relation between tolerance class and indispensable attribute and knowledge dependency in rough set model with tolerance relation is explored. After giving definitions and concepts of knowledge dependency and knowledge dependency degree for incomplete information system in tolerance rough set model by distinguishing decision attribute containing missing attribute value or not, the result of maintaining reflectivity, transitivity, augmentation, decomposition law and merge law for complete knowledge dependency is proved. Knowledge dependency degrees (not complete knowledge dependency degrees) only satisfy some laws after transitivity, augmentation and decomposition operations. An algorithm to solve attribute reduction in an incomplete decision table is designed. The correctness is checked by an example.
Abstract: Some invariant properties of incomplete information systems homomorphism are studied in this paper. Demand conditions of tolerance class, attribute reduction, indispensable attribute and dispensable attribute being invariant under homomorphism in incomplete information system are revealed and discussed. The existing condition of endohomomorphism on an incomplete information system is also explored. It establishes some theoretical foundations for further investigations on incomplete information systems in rough set theory, like in information systems.
Abstract: Some extended rough set models in incomplete information system cannot distinguish the two objects that have few known attributes and more unknown attributes; some cannot make a flexible and accurate discrimination. In order to solve this problem, this paper suggests an improved limited tolerance rough set model using two thresholds to control what two objects have a relationship between them in limited tolerance relation and to classify objects. Our practical study case shows the model can get fine and reasonable decision results.
Abstract: This paper illustrates an application of granular computing approach, namely rough set theory in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinning of rough set theory, which has been widely used by the data mining and the machine learning community. A real-world application is illustrated, and the classification performance is compared with other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to other contending algorithms.
Abstract: Branch of modern mathematics, graphs represent instruments
for optimization and solving practical applications in
various fields such as economic networks, engineering, network optimization,
the geometry of social action, generally, complex systems
including contemporary urban problems (path or transport efficiencies,
biourbanism, & c.). In this paper is studied the interconnection
of some urban network, which can lead to a simulation problem of a
digraph through another digraph. The simulation is made univoc or
more general multivoc. The concepts of fragment and atom are very
useful in the study of connectivity in the digraph that is simulation
- including an alternative evaluation of k- connectivity. Rough set
approach in (bi)digraph which is proposed in premier in this paper
contribute to improved significantly the evaluation of k-connectivity.
This rough set approach is based on generalized rough sets - basic
facts are presented in this paper.
Abstract: Some properties of approximation sets are studied in multi-granulation optimist model in rough set theory using maximal compatible classes. The relationships between or among lower and upper approximations in single and multiple granulation are compared and discussed. Through designing Boolean functions and discernibility matrices in incomplete information systems, the lower and upper approximation sets and reduction in multi-granulation environments can be found. By using examples, the correctness of computation approach is consolidated. The related conclusions obtained are suitable for further investigating in multiple granulation RSM.
Abstract: In rough set models, tolerance relation, similarity
relation and limited tolerance relation solve different situation
problems for incomplete information systems in which there exists a
phenomenon of missing value. If two objects have the same few
known attributes and more unknown attributes, they cannot
distinguish them well. In order to solve this problem, we presented two
improved limited and variable precision rough set models. One is
symmetric, the other one is non-symmetric. They all use more
stringent condition to separate two small probability equivalent objects
into different classes. The two models are needed to engage further
study in detail. In the present paper, we newly form object classes with
a different respect comparing to the first suggested model. We
overcome disadvantages of non-symmetry regarding to the second
suggested model. We discuss relationships between or among several
models and also make rule generation. The obtained results by
applying the second model are more accurate and reasonable.
Abstract: In recent days, there is a change and the ongoing development of the telecommunications sector in the global market. In this sector, churn analysis techniques are commonly used for analysing why some customers terminate their service subscriptions prematurely. In addition, customer churn is utmost significant in this sector since it causes to important business loss. Many companies make various researches in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, their usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed aiming to obtain the feature reducts to predict customer churn. The framework is a cost based optional pre-processing stage to remove redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey and the results obtained with this algorithm.
Abstract: Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, Lower approximation and Upper approximation. In this paper, the rough clustering algorithms are improved by adopting the Similarity, Dissimilarity–Similarity and Entropy based initial centroids selection method on three different clustering algorithms namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM) were developed and executed by yeast dataset. The rough clustering algorithms are validated by cluster validity indexes namely Rand and Adjusted Rand indexes. An experimental result shows that the ERKM clustering algorithm perform effectively and delivers better results than other clustering methods. Outlier detection is an important task in data mining and very much different from the rest of the objects in the clusters. Entropy based Rough Outlier Factor (EROF) method is seemly to detect outlier effectively for yeast dataset. In rough K-Means method, by tuning the epsilon (ᶓ) value from 0.8 to 1.08 can detect outliers on boundary region and the RKM algorithm delivers better results, when choosing the value of epsilon (ᶓ) in the specified range. An experimental result shows that the EROF method on clustering algorithm performed very well and suitable for detecting outlier effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.
Abstract: One of the global combinatorial optimization
problems in machine learning is feature selection. It concerned with
removing the irrelevant, noisy, and redundant data, along with
keeping the original meaning of the original data. Attribute reduction
in rough set theory is an important feature selection method. Since
attribute reduction is an NP-hard problem, it is necessary to
investigate fast and effective approximate algorithms. In this paper,
we proposed two feature selection mechanisms based on memetic
algorithms (MAs) which combine the genetic algorithm with a fuzzy
record to record travel algorithm and a fuzzy controlled great deluge
algorithm, to identify a good balance between local search and
genetic search. In order to verify the proposed approaches, numerical
experiments are carried out on thirteen datasets. The results show that
the MAs approaches are efficient in solving attribute reduction
problems when compared with other meta-heuristic approaches.
Abstract: In this paper we consider the rule reduct generation
problem. Rule Reduct Generation (RG) and Modified Rule
Generation (MRG) algorithms, that are used to solve this problem,
are well-known. Alternative to these algorithms, we develop Pruning
Rule Generation (PRG) algorithm. We compare the PRG algorithm
with RG and MRG.
Abstract: Frequent pattern mining is the process of finding a
pattern (a set of items, subsequences, substructures, etc.) that occurs
frequently in a data set. It was proposed in the context of frequent
itemsets and association rule mining. Frequent pattern mining is used
to find inherent regularities in data. What products were often
purchased together? Its applications include basket data analysis,
cross-marketing, catalog design, sale campaign analysis, Web log
(click stream) analysis, and DNA sequence analysis. However, one of
the bottlenecks of frequent itemset mining is that as the data increase
the amount of time and resources required to mining the data
increases at an exponential rate. In this investigation a new algorithm
is proposed which can be uses as a pre-processor for frequent itemset
mining. FASTER (FeAture SelecTion using Entropy and Rough sets)
is a hybrid pre-processor algorithm which utilizes entropy and roughsets
to carry out record reduction and feature (attribute) selection
respectively. FASTER for frequent itemset mining can produce a
speed up of 3.1 times when compared to original algorithm while
maintaining an accuracy of 71%.
Abstract: Advances in the field of image processing envision a
new era of evaluation techniques and application of procedures in
various different fields. One such field being considered is the
biomedical field for prognosis as well as diagnosis of diseases. This
plethora of methods though provides a wide range of options to select
from, it also proves confusion in selecting the apt process and also in
finding which one is more suitable. Our objective is to use a series of
techniques on bone scans, so as to detect the occurrence of
rheumatoid arthritis (RA) as accurately as possible. Amongst other
techniques existing in the field our proposed system tends to be more
effective as it depends on new methodologies that have been proved
to be better and more consistent than others. Computer aided
diagnosis will provide more accurate and infallible rate of
consistency that will help to improve the efficiency of the system.
The image first undergoes histogram smoothing and specification,
morphing operation, boundary detection by edge following algorithm
and finally image subtraction to determine the presence of
rheumatoid arthritis in a more efficient and effective way. Using preprocessing
noises are removed from images and using segmentation,
region of interest is found and Histogram smoothing is applied for a
specific portion of the images. Gray level co-occurrence matrix
(GLCM) features like Mean, Median, Energy, Correlation, Bone
Mineral Density (BMD) and etc. After finding all the features it
stores in the database. This dataset is trained with inflamed and noninflamed
values and with the help of neural network all the new
images are checked properly for their status and Rough set is
implemented for further reduction.
Abstract: Some properties of Intuitionistic Fuzzy (IF) rough relational algebraic operators under an IF rough relational data model are investigated and illustrated using diabetes and heart disease databases. These properties are important and desirable for processing queries in an effective and efficient manner.
Abstract: In the era of rapidly increasing notebook market, consumer electronics manufacturers are facing a highly dynamic and competitive environment. In particular, the product appearance is the first part for user to distinguish the product from the product of other brands. Notebook product should differ in its appearance to engage users and contribute to the user experience (UX). The UX evaluates various product concepts to find the design for user needs; in addition, help the designer to further understand the product appearance preference of different market segment. However, few studies have been done for exploring the relationship between consumer background and the reaction of product appearance. This study aims to propose a data mining framework to capture the user’s information and the important relation between product appearance factors. The proposed framework consists of problem definition and structuring, data preparation, rules generation, and results evaluation and interpretation. An empirical study has been done in Taiwan that recruited 168 subjects from different background to experience the appearance performance of 11 different portable computers. The results assist the designers to develop product strategies based on the characteristics of consumers and the product concept that related to the UX, which help to launch the products to the right customers and increase the market shares. The results have shown the practical feasibility of the proposed framework.
Abstract: The reduction or removal of noise in a color image is an essential part of image processing, whether the final information is used for human perception or for an automatic inspection and analysis. This paper describes the modeling system based on the rough neural network model to adaptive cellular automata for various image processing tasks and noise remover. In this paper, we consider the problem of object processing in colored image using rough neural networks to help deriving the rules which will be used in cellular automata for noise image. The proposed method is compared with some classical and recent methods. The results demonstrate that the new model is capable of being trained to perform many different tasks, and that the quality of these results is comparable or better than established specialized algorithms.
Abstract: Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.
Abstract: Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.