Abstract: Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
Abstract: In this paper a Pattern Recognition algorithm based on
a constrained version of the k-means clustering algorithm will be
presented. The proposed algorithm is a non parametric supervised
statistical pattern recognition algorithm, i.e. it works under very mild
assumptions on the dataset. The performance of the algorithm will
be tested, togheter with a feature extraction technique that captures
the information on the closed two-dimensional contour of an image,
on images of industrial mineral ores.
Abstract: An unsupervised classification algorithm is derived
by modeling observed data as a mixture of several mutually
exclusive classes that are each described by linear combinations of
independent non-Gaussian densities. The algorithm estimates the
data density in each class by using parametric nonlinear functions
that fit to the non-Gaussian structure of the data. This improves
classification accuracy compared with standard Gaussian mixture
models. When applied to textures, the algorithm can learn basis
functions for images that capture the statistically significant structure
intrinsic in the images. We apply this technique to the problem of
unsupervised texture classification and segmentation.
Abstract: The clustering ensembles combine multiple partitions
generated by different clustering algorithms into a single clustering
solution. Clustering ensembles have emerged as a prominent method
for improving robustness, stability and accuracy of unsupervised
classification solutions. So far, many contributions have been done to
find consensus clustering. One of the major problems in clustering
ensembles is the consensus function. In this paper, firstly, we
introduce clustering ensembles, representation of multiple partitions,
its challenges and present taxonomy of combination algorithms.
Secondly, we describe consensus functions in clustering ensembles
including Hypergraph partitioning, Voting approach, Mutual
information, Co-association based functions and Finite mixture
model, and next explain their advantages, disadvantages and
computational complexity. Finally, we compare the characteristics of
clustering ensembles algorithms such as computational complexity,
robustness, simplicity and accuracy on different datasets in previous
techniques.
Abstract: In this paper a non-parametric statistical pattern recognition algorithm for the problem of credit scoring will be presented. The proposed algorithm is based on a clustering k- means algorithm and allows for the determination of subclasses of homogenous elements in the data. The algorithm will be tested on two benchmark datasets and its performance compared with other well known pattern recognition algorithm for credit scoring.