Abstract: Clustering in high dimensional space is a difficult
problem which is recurrent in many fields of science and
engineering, e.g., bioinformatics, image processing, pattern
reorganization and data mining. In high dimensional space some of
the dimensions are likely to be irrelevant, thus hiding the possible
clustering. In very high dimensions it is common for all the objects in
a dataset to be nearly equidistant from each other, completely
masking the clusters. Hence, performance of the clustering algorithm
decreases.
In this paper, we propose an algorithmic framework which
combines the (reduct) concept of rough set theory with the k-means
algorithm to remove the irrelevant dimensions in a high dimensional
space and obtain appropriate clusters. Our experiment on test data
shows that this framework increases efficiency of the clustering
process and accuracy of the results.
Abstract: It is well known that Logistic Regression is the gold
standard method for predicting clinical outcome, especially
predicting risk of mortality. In this paper, the Decision Tree method
has been proposed to solve specific problems that commonly use
Logistic Regression as a solution. The Biochemistry and
Haematology Outcome Model (BHOM) dataset obtained from
Portsmouth NHS Hospital from 1 January to 31 December 2001 was
divided into four subsets. One subset of training data was used to
generate a model, and the model obtained was then applied to three
testing datasets. The performance of each model from both methods
was then compared using calibration (the χ2 test or chi-test) and
discrimination (area under ROC curve or c-index). The experiment
presented that both methods have reasonable results in the case of the
c-index. However, in some cases the calibration value (χ2) obtained
quite a high result. After conducting experiments and investigating
the advantages and disadvantages of each method, we can conclude
that Decision Trees can be seen as a worthy alternative to Logistic
Regression in the area of Data Mining.