Abstract: The vast amount of information hidden in huge
databases has created tremendous interests in the field of data
mining. This paper examines the possibility of using data clustering
techniques in oral medicine to identify functional relationships
between different attributes and classification of similar patient
examinations. Commonly used data clustering algorithms have been
reviewed and as a result several interesting results have been
gathered.
Abstract: Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.
Abstract: Currently, web usage make a huge data from a lot of
user attention. In general, proxy server is a system to support web
usage from user and can manage system by using hit rates. This
research tries to improve hit rates in proxy system by applying data
mining technique. The data set are collected from proxy servers in the
university and are investigated relationship based on several features.
The model is used to predict the future access websites. Association
rule technique is applied to get the relation among Date, Time, Main
Group web, Sub Group web, and Domain name for created model.
The results showed that this technique can predict web content for the
next day, moreover the future accesses of websites increased from
38.15% to 85.57 %.
This model can predict web page access which tends to increase
the efficient of proxy servers as a result. In additional, the
performance of internet access will be improved and help to reduce
traffic in networks.
Abstract: A data cutting and sorting method (DCSM) is proposed
to optimize the performance of data mining. DCSM reduces the
calculation time by getting rid of redundant data during the data
mining process. In addition, DCSM minimizes the computational units
by splitting the database and by sorting data with support counts. In the
process of searching for the relationship between metabolic syndrome
and lifestyles with the health examination database of an electronics
manufacturing company, DCSM demonstrates higher search
efficiency than the traditional Apriori algorithm in tests with different
support counts.
Abstract: Clustering in high dimensional space is a difficult
problem which is recurrent in many fields of science and
engineering, e.g., bioinformatics, image processing, pattern
reorganization and data mining. In high dimensional space some of
the dimensions are likely to be irrelevant, thus hiding the possible
clustering. In very high dimensions it is common for all the objects in
a dataset to be nearly equidistant from each other, completely
masking the clusters. Hence, performance of the clustering algorithm
decreases.
In this paper, we propose an algorithmic framework which
combines the (reduct) concept of rough set theory with the k-means
algorithm to remove the irrelevant dimensions in a high dimensional
space and obtain appropriate clusters. Our experiment on test data
shows that this framework increases efficiency of the clustering
process and accuracy of the results.
Abstract: It is well known that Logistic Regression is the gold
standard method for predicting clinical outcome, especially
predicting risk of mortality. In this paper, the Decision Tree method
has been proposed to solve specific problems that commonly use
Logistic Regression as a solution. The Biochemistry and
Haematology Outcome Model (BHOM) dataset obtained from
Portsmouth NHS Hospital from 1 January to 31 December 2001 was
divided into four subsets. One subset of training data was used to
generate a model, and the model obtained was then applied to three
testing datasets. The performance of each model from both methods
was then compared using calibration (the χ2 test or chi-test) and
discrimination (area under ROC curve or c-index). The experiment
presented that both methods have reasonable results in the case of the
c-index. However, in some cases the calibration value (χ2) obtained
quite a high result. After conducting experiments and investigating
the advantages and disadvantages of each method, we can conclude
that Decision Trees can be seen as a worthy alternative to Logistic
Regression in the area of Data Mining.