Hierarchical Clustering Algorithms in Data Mining

Clustering is the process of grouping data objects into clusters so that objects in the same cluster are similar to each other. Clustering algorithms are one area of data mining and can be classified as partition-based, hierarchical, density-based and grid-based. In this paper we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON and BIRCH. The resulting state of the art of these algorithms will help in eliminating current problems as well as in deriving more robust and scalable clustering algorithms.

A Survey of Discrete Facility Location Problems

Facility location is a complex real-world problem that requires strategic management decisions. This paper provides a general review of studies, efforts and developments in Facility Location Problems (FLPs), which are classical optimization problems with widespread applications in areas such as transportation, distribution, production, supply chain decisions and telecommunication. Our goal is not to review all variants of different studies in FLPs or to describe very detailed computational techniques and solution approaches, but rather to provide a broad overview of the major location problems that have been studied, indicating how they are formulated and what researchers have proposed to tackle them. A brief, elucidative table grouping the studies by “General Problem Type” and “Methods Proposed” is also presented at the end of the work.

Mathematics Anxiety among Male and Female Students

The purpose of this study is to determine the relationship between gender and anxiety level among male and female undergraduates at a private university in Malaysia. A convenience sampling method was used, in which students were selected based on the grouping assigned by the faculty. A total of 214 undergraduates registered for probability courses participated in the study. The Mathematics Anxiety Rating Scale (MARS) was the instrument used to determine students’ anxiety level towards probability. The reliability and validity of the instrument were established before the major study was conducted. In the major study, students were briefed about the study, participation was voluntary, and students were given a consent form to indicate whether they agreed to participate. Students were given two weeks to complete the online questionnaire. The data collected were analyzed using the Statistical Package for the Social Sciences (SPSS) to determine the level of anxiety. There were three anxiety levels: low, average and high. Each student’s anxiety level was determined by comparing the score obtained with the mean and standard deviation. If the score was more than one standard deviation below the mean, the anxiety level was low. If the score was within one standard deviation of the mean, whether below or above it, the anxiety level was average. If the score was more than one standard deviation above the mean, the anxiety level was high. The results showed that both genders had an average anxiety level. Within each of the low, average and high anxiety levels, the frequency of males was higher than that of females. The mean value obtained for males (M = 3.62) was higher than that for females (M = 3.42). For the difference in anxiety level between the genders to be significant, the p-value should be less than .05. The p-value obtained in this study was .117, which is greater than .05. Thus, there was no significant difference in anxiety level between the genders. In other words, there was no relationship between anxiety level and gender.
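
The three-level rule in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds are read as one standard deviation below and above the mean, the sample standard deviation is used, and both the function name `classify_anxiety` and the scores are hypothetical rather than taken from the study.

```python
from statistics import mean, stdev


def classify_anxiety(scores):
    """Label each score 'low', 'average' or 'high' relative to mean +/- 1 SD.

    Assumption: the cut-offs sit exactly one sample standard deviation
    below and above the mean of the supplied scores.
    """
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (assumption)
    levels = []
    for x in scores:
        if x < m - s:
            levels.append("low")
        elif x > m + s:
            levels.append("high")
        else:
            levels.append("average")
    return levels


# Hypothetical scores, not data from the study.
example_scores = [1.0, 3.0, 3.0, 3.0, 5.0]
example_levels = classify_anxiety(example_scores)
```

With these made-up scores the mean is 3.0, so only the two extreme values fall outside the one-standard-deviation band.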

Various Advanced Statistical Analyses of Index Values Extracted from Outdoor Agricultural Workers Motion Data

We have been grouping and developing various kinds of practical, promising sensing-based applied systems for agricultural advancement and the passing on of techniques (guidance). These include advanced devices that capture real-time data on worker motion, which we analyze with various advanced statistical and human-dynamics methods (e.g. principal component analysis, cluster analysis based on Ward's method, and mapping). In addition, we have been considering workers' daily health and safety issues. The targeted fields are mainly common farms, meadows, and gardens. We then observed and discussed the timeline-style, changing data and made some suggestions. The overall plan makes it possible to improve both the aforementioned applied systems and the farms.

Data Rate Based Grouping Scheme for Cooperative Communications in Wireless LANs

IEEE 802.11a/b/g standards provide multiple transmission rates, which can be changed dynamically according to the channel condition. Cooperative communications were introduced to improve the overall performance of wireless LANs with the help of relay nodes with higher transmission rates. They rely on the fact that sending data packets to a destination node through a relay node with a higher transmission rate can be much faster than sending them directly to the destination node at a low transmission rate. Several MAC protocols have been proposed to apply cooperative communications in wireless LANs, but some of them can cause collisions among relay nodes in a dense network. To solve this problem, we propose a new protocol in which relay nodes are grouped based on their transmission rates, and only the relay nodes in the highest-rate group try to get channel access. Performance evaluation by simulation shows that the proposed protocol significantly outperforms the previous protocol in terms of throughput and collision probability.
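
The grouping idea can be illustrated with a short Python sketch. It is not the protocol from the paper: it assumes each distinct rate forms its own group, ignores all MAC and PHY overhead, and uses hypothetical names (`highest_rate_group`, `relay_is_faster`) and made-up node rates.

```python
from collections import defaultdict


def highest_rate_group(relay_rates):
    """Group relay nodes by transmission rate and return the top group.

    relay_rates maps a node name to its rate in Mbps. Assumption: one
    group per distinct rate; the paper may define groups differently.
    """
    groups = defaultdict(list)
    for node, rate in relay_rates.items():
        groups[rate].append(node)
    if not groups:
        return None, []
    top_rate = max(groups)
    return top_rate, sorted(groups[top_rate])


def relay_is_faster(packet_bits, direct_rate, src_relay_rate, relay_dst_rate):
    """Compare airtime of the two-hop path with the direct path.

    Rates are in Mbps; overhead such as headers, ACKs and backoff is
    deliberately ignored in this sketch.
    """
    direct_time = packet_bits / direct_rate
    relay_time = packet_bits / src_relay_rate + packet_bits / relay_dst_rate
    return relay_time < direct_time


# Hypothetical relay candidates and their rates in Mbps.
candidates = {"A": 54, "B": 24, "C": 54, "D": 12}
top_rate, contenders = highest_rate_group(candidates)
```

Only the nodes in `contenders` would contend for the channel in this simplified picture, which is how grouping reduces the number of relays that can collide.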

The Design and Applied of Learning Management System via Social Media on Internet: Case Study of Operating System for Business Subject

A Learning Management System (LMS) is a system used to manage learning by grouping the content and learning activities shared between lecturer and learner, including online examination and evaluation. In today's era of borderless learning, learning activities can be accessed from anywhere in the world and at any time via information technology and media. Learners can easily access knowledge, so differences in time and distance are no longer a constraint on learning. The learning pattern used in this research integrates in-class learning with online learning via the internet, with progress monitored by the LMS, which creates a fast-response and accessible learning process via social media. To increase the capability and freedom of the learner, the system shows current and past learning documents and video conferences, and provides a chat room in which learners and lecturers can interact with each other. The objectives of “The Design and Applied of Learning Management System via Social Media on Internet: Case Study of Operating System for Business Subject” are therefore to expand learning opportunities, to increase the efficiency of learning, and to add communication channels between lecturer and student. The data for this research were collected from 30 users of the system, who are students enrolled in the subject. The result of the research is at the “Very Good” level, which conforms to the hypothesis.

Performances and Activities of Urban Communities Leader Based On Sufficiency Economy Philosophy in Dusit District, Bangkok Metropolitan

This research studies behaviors based on the sufficiency economy philosophy at the individual and community levels, as well as the satisfaction of urban community leaders, with data collected using a purposive sampling technique. In-depth interviews with 26 urban community leaders show that the leaders have good knowledge and understanding of the sufficiency economy philosophy. In terms of spending money in particular, they hold that one must consider what is needed for living and be economical. Activities in the community or society should not take advantage of others, including colleagues. At present, most of the urban community leaders live in a sufficient way. They often spend time on public service, but many families are dealing with debt. Many communities have some political conflict and high family allowances because they live in urban communities with rapid social and economic change. However, in many communities the leaders have applied their wisdom to developing their people by gathering and grouping professionals to form activities such as making chilli sauce, a textile organization, and making artificial flowers for worship. The most prominent group is the foot massage business in Wat Pracha Rabue Tham. This professional group is supported continuously by the government. One of the satisfaction factors used for evaluating community leaders is customary administration in a brotherly, interdependent way: rather than using absolute or controlling power, leaders use their role to carry out activities with their people intently and determinedly, with a public-minded attitude.

The Efficiency of Mechanization in Weed Control in Artificial Regeneration of Oriental Beech (Fagus orientalis Lipsky.)

This study, conducted in the Akçasu Forest Range District of the Devrek Forest Directorate, compared 3 methods used for weed control in the regeneration of degraded oriental beech forests: weed control with labourer power, cover removal with a Hitachi F20 excavator, and weed control with agricultural equipment mounted on a Ferguson 240S agricultural tractor. The 3 methods were compared by determining work hours and standard durations per unit area (1 hectare). By evaluating the tasks performed with human and machine power in terms of duration, productivity and cost, the study aimed to determine the most productive method under the actual ecological conditions of the research field. Time studies were conducted for the 3 methods, and the implementations were evaluated by dividing them into work stages. Actual data were used in calculating the costs, with the latest formulas and equations that are also used in developed countries. Analysis of variance (ANOVA) was used to determine whether there was any statistically significant difference among the results, and the Duncan test was used for grouping where a significant difference existed. According to the measurements and findings, removing the weed layer (living cover) from 1 hectare during regeneration of degraded oriental beech forests took 920 hours with labourer power, 15.1 hours with the excavator and 60 hours with the equipment mounted on a tractor. The cost of removing the living cover per unit area (1 hectare) was 3220.00 TL for labourer power, 1250 TL for the excavator and 1825 TL for the equipment mounted on a tractor. According to these results, using the excavator for weed control in the regeneration of degraded oriental beech stands was more productive in terms of both duration and cost under the actual ecological conditions of the research field. These determinations should be repeated for weed control in degraded forest fields with different ecological conditions, which is necessary for finding the most efficient weed control method. The findings will guide the technical staff of the forestry directorate in determining the most effective and economical weed control method. More realistic data can thus be used when preparing weed control budgets, contributing significantly to the national economy. The results of this and similar studies are also very important for developing our short- and long-term forestry policies.

Students’ Perception and Patterns of Listening Behavior in an Online Forum Discussion

An online forum is part of a Learning Management System (LMS) environment in which students share their opinions. This study investigates students' perceptions of the online forum and their patterns of listening behavior during forum interaction. The students’ perceptions were measured using a questionnaire covering seven dimensions: online experience, benefits of forum participation, cost of participation, perceived ease of use, usefulness, attitude, and intention. Their patterns of listening behavior were obtained from the log file extracted from the LMS. A total of 25 postgraduate students undertaking a course were involved in this study, and their activities in the forum session were recorded by the LMS and used as a log file. The results of the questionnaire analysis indicated that the students perceived the forum as easy to use, useful, and beneficial to them. They also showed a positive attitude towards the online forum and intend to use it in the future. Based on the log data, the participants were divided into six clusters of listening behavior, which differ in terms of temporality, breadth, depth and speaking level. The findings are compared with previous cluster groupings, and recommendations for the future are also discussed.

Computational Methods in Official Statistics with an Example on Calculating and Predicting Diabetes Mellitus [DM] Prevalence in Different Age Groups within Australia in Future Years, in Light of the Aging Population

An analysis of the Australian Diabetes Screening Study estimated undiagnosed diabetes mellitus [DM] prevalence in a high-risk, general-practice-based cohort. DM prevalence varied from 9.4% to 18.1% depending upon the diagnostic criteria utilised, with age being a highly significant risk factor. Utilising the gold standard oral glucose tolerance test, the prevalence of DM was 22-23% in those aged >= 70 years and

Democratic Political Socialization of the 5th and 6th Graders under the Authority of Dusit District Office, Bangkok

This research aims to study the democratic political socialization of 5th and 6th graders under the authority of Dusit District Office, Bangkok. Stratified sampling was used for probability sampling and purposive sampling for non-probability sampling, and data were collected by distributing questionnaires to 300 respondents. The sample covers all of the schools under the authority of Dusit District Office. The researcher analyzed the data using descriptive statistics, including the arithmetic mean and standard deviation. The results show that 5th and 6th graders under the authority of Dusit District Office, Bangkok, display some characteristics of democratic political socialization inside the classroom, outside the classroom and outside school. However, democratic political socialization in the classroom through grouping and class participation is much more strongly emphasized.

Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. They are used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analyze gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes and biologically relevant groupings of genes and samples. In this work the K-Means algorithm is applied to cluster gene expression data. A rough set based Quick Reduct algorithm is then applied to each cluster in order to select the most similar genes with high correlation. The ACV measure is used to evaluate the refined clusters, and classification is used to evaluate the proposed method. The proposed method could identify compact clusters, with the feature selection method used to select the genes.
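
As a rough illustration of the K-Means step mentioned in this abstract, here is a minimal pure-Python sketch. It is not the authors' implementation: the function name `kmeans`, the deterministic initialization from the first k points (real implementations usually use random or k-means++ seeding), and the tiny expression profiles are all assumptions made for the example, and the rough set and ACV steps are not covered.

```python
def kmeans(points, k, iterations=20):
    """Plain K-Means on lists of equal-length numeric vectors.

    Assumption: the first k points seed the centroids so that the
    example is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical expression profiles: one row per gene, one column per condition.
profiles = [[1.0, 1.1], [5.0, 5.2], [0.9, 1.0], [5.1, 4.9]]
labels, centers = kmeans(profiles, k=2)
```

In this toy data the first and third genes have similar low profiles and the second and fourth have similar high profiles, so they end up in two separate clusters.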

Computer Aided Diagnosis of Polycystic Kidney Disease Using ANN

Many inherited diseases and non-hereditary disorders commonly involve the development of renal cystic disease. Polycystic kidney disease (PKD) is a disorder of the kidneys in which clusters of cysts filled with a water-like fluid develop. PKD is responsible for 5-10% of end-stage renal failure treated by dialysis or transplantation. New experimental models and the application of molecular biology techniques have provided new insights into the pathogenesis of PKD. Researchers are showing keen interest in developing automated systems that apply computer aided techniques to the diagnosis of diseases. In this paper a multilayered feed-forward neural network with one hidden layer is constructed, trained and tested using the back propagation learning rule for the diagnosis of PKD, based on physical symptoms and urinalysis test results collected from individual patients. Data collected from 50 patients are used to train and test the network: 75% of the samples are used for training and the remaining 25% for testing. The trained network is then applied to new samples. The output indicates whether the patient is normal or abnormal.

Collaborative and Content-based Recommender System for Social Bookmarking Website

This study proposes a new recommender system based on collaborative folksonomy. The purpose of the proposed system is to recommend Internet resources (such as books, articles, documents, pictures, audio and video) to users. The proposed method includes four steps: creating the user profile based on the tags, grouping similar users into clusters using agglomerative hierarchical clustering, finding similar resources based on the user's past collections by using content-based filtering, and recommending similar items to the target user. This study examines the system's performance on a dataset collected from “del.icio.us,” which is a famous social bookmarking website. Experimental results show that the proposed hybrid tag-based collaborative and content-based filtering recommender system is promising and effective for folksonomy-based bookmarking websites.
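
The first step of the method, building a user profile from tags, and the notion of "similar users" that the clustering step relies on can be sketched as follows. The abstract does not say which similarity measure is used, so cosine similarity over tag counts is an assumption here, as are the function names and the sample bookmarks.

```python
from collections import Counter
from math import sqrt


def tag_profile(bookmarks):
    """Build a tag-frequency profile from a user's bookmarks.

    bookmarks is a list of tag lists, one list per bookmarked resource.
    """
    return Counter(tag for tags in bookmarks for tag in tags)


def cosine_similarity(p, q):
    """Cosine similarity between two tag profiles (assumed measure)."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical users and tags, not data from del.icio.us.
alice = tag_profile([["python", "ml"], ["python"]])
bob = tag_profile([["python", "ml"]])
carol = tag_profile([["cooking"]])
```

A pairwise similarity of this kind is the sort of input an agglomerative hierarchical clustering of users would need; here `alice` and `bob` share tags and score high, while `alice` and `carol` share none.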

Object Recognition in Color Images by the Self Configuring System MEMORI

The MEMORI system automatically detects and recognizes rotated and/or rescaled versions of the objects of a database within digital color images with cluttered backgrounds. This task is accomplished by means of a region grouping algorithm guided by heuristic rules, whose parameters concern some geometrical properties and the recognition score of the database objects. This paper focuses on the strategies implemented in MEMORI for estimating the heuristic rule parameters. Because this estimation is automatic, the system is a self-configuring and highly user-friendly tool.

Event Template Generation for News Articles

In this paper we focus on event extraction from Tamil news articles. The system uses a scoring scheme, based on a feature score and a condition score, for extracting and grouping event-specific sentences. Events are extracted from each document using this scoring scheme, and event-specific sentences from multiple documents are clustered with the same scheme. The proposed system builds the event template based on a user-specified query. The templates are filled with event-specific details such as person, location and timeline extracted from the formed clusters. The proposed system applies these methodologies to Tamil news articles that have been enconverted into UNL graphs using a Tamil-to-UNL enconverter. The main intention of this work is to generate an event-based template.

A K-Means Based Clustering Approach for Finding Faulty Modules in Open Source Software Systems

Prediction of fault-prone modules provides one way to support software quality engineering. Clustering is used to determine the intrinsic grouping in a set of unlabeled data. Among the various clustering techniques available in the literature, K-Means is the most widely used. This paper introduces a K-Means based clustering approach for finding the fault proneness of object-oriented systems. Its contribution is that it uses metric values of the JEdit open source software to generate rules for categorizing software modules as faulty or non-faulty, followed by empirical validation. The results are measured in terms of accuracy of prediction, probability of detection and probability of false alarms.
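
The three evaluation measures named in this abstract are commonly computed from a confusion matrix, as in the sketch below. It assumes the usual definitions (probability of detection as the true positive rate and probability of false alarm as the false positive rate); the function name and the labels are hypothetical and are not results from the paper.

```python
def fault_prediction_metrics(actual, predicted):
    """Return (accuracy, probability of detection, probability of false alarm).

    actual and predicted are sequences of 1 (faulty) and 0 (non-faulty).
    Assumed definitions: PD = TP / (TP + FN), PF = FP / (FP + TN).
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    accuracy = (tp + tn) / len(actual)
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, pd, pf


# Hypothetical module labels: 1 = faulty, 0 = non-faulty.
actual_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0]
acc, pd_value, pf_value = fault_prediction_metrics(actual_labels, predicted_labels)
```

A good predictor in this setting has a high probability of detection together with a low probability of false alarm; accuracy alone can be misleading when faulty modules are rare.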

A Generic Approach to Achieve Optimal Server Consolidation by Using Existing Servers in Virtualized Data Center

Virtualization-based server consolidation has been proven to be an ideal technique for solving the server sprawl problem: multiple virtualized servers are consolidated onto a few physical servers, leading to improved resource utilization and return on investment. In this paper, we solve this problem by using existing servers, which are heterogeneous and are preferred to different degrees by IT managers. Five practical consolidation rules are introduced, and a decision model is proposed to optimally allocate source services to physical target servers while maximizing the average resource utilization and preference value. Our model can be regarded as a multi-objective multi-dimension bin-packing (MOMDBP) problem with constraints, which is strongly NP-hard. An improved grouping genetic algorithm (GGA) is introduced for the problem. Extensive simulations were performed and the results are given.
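
To make the bin-packing view of consolidation concrete, the sketch below packs services with CPU and memory demands onto servers using a first-fit decreasing heuristic. This is a deliberately simpler stand-in, not the grouping genetic algorithm or the decision model from the paper: it assumes identical servers, a single objective (few servers), no preference values and none of the five consolidation rules. All names and numbers are hypothetical.

```python
def first_fit_decreasing(services, capacity):
    """Pack services onto identical servers with a first-fit decreasing rule.

    services maps a name to a (cpu, mem) demand; capacity is the (cpu, mem)
    of every server. Assumption: servers are identical, unlike the
    heterogeneous servers considered in the paper.
    """
    for name, (cpu, mem) in services.items():
        if cpu > capacity[0] or mem > capacity[1]:
            raise ValueError(f"service {name!r} does not fit on any server")

    # Sort by the larger of the two relative demands, biggest first.
    order = sorted(
        services,
        key=lambda s: max(services[s][0] / capacity[0], services[s][1] / capacity[1]),
        reverse=True,
    )
    servers = []  # each entry: {"free": [cpu, mem], "services": [names]}
    for name in order:
        cpu, mem = services[name]
        for srv in servers:
            if srv["free"][0] >= cpu and srv["free"][1] >= mem:
                srv["free"][0] -= cpu
                srv["free"][1] -= mem
                srv["services"].append(name)
                break
        else:
            # No existing server has room in both dimensions: open a new one.
            servers.append(
                {"free": [capacity[0] - cpu, capacity[1] - mem], "services": [name]}
            )
    return [srv["services"] for srv in servers]


# Hypothetical demands as (CPU units, memory units).
demands = {"web": (4, 2), "db": (6, 8), "cache": (2, 6), "batch": (5, 3)}
placement = first_fit_decreasing(demands, capacity=(10, 10))
```

A greedy rule like this gives a quick baseline; metaheuristics such as a GGA are typically used when several objectives and constraints must be balanced at once.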

Optimizing Allocation of Two Dimensional Irregular Shapes using an Agent Based Approach

Packing problems arise in a wide variety of application areas. The basic problem is that of determining an efficient arrangement of different objects in a region without any overlap and with minimal wasted space between shapes. This paper presents a novel population-based approach for optimizing the arrangement of irregular shapes. In this approach, each shape is coded as an agent, and the agents' reproduction and grouping policies result in arrangements of the objects in positions with the least wasted area between them. The approach is implemented in an application for cutting sheets, and test results on several problems from the literature are presented.

Journey on Image Clustering Based on Color Composition

Image clustering is the process of grouping images based on their similarity. It usually uses the color component, texture, edge, shape, or a mixture of two or more of these components. This research aims to explore image clustering using color composition. Three main components should be considered to complete this kind of image clustering: the color space, the image representation (feature extraction), and the clustering method itself. We aim to explore which combination of these factors produces the best clustering results by combining various techniques from the three components. The color spaces used are RGB, HSV, and L*a*b*. The image representations used are the Histogram and the Gaussian Mixture Model (GMM), whereas the clustering methods used are K-Means and the Agglomerative Hierarchical Clustering algorithm. The results of the experiment show that the GMM representation works better with the RGB and L*a*b* color spaces, whereas the Histogram works better with HSV. The experiments also show that K-Means is better than Agglomerative Hierarchical Clustering for image clustering.