Validation and Selection between Machine Learning Technique and Traditional Methods to Reduce Bullwhip Effects: a Data Mining Approach

The aim of this paper is to present a methodology in three steps to forecast supply chain demand. In first step, various data mining techniques are applied in order to prepare data for entering into forecasting models. In second step, the modeling step, an artificial neural network and support vector machine is presented after defining Mean Absolute Percentage Error index for measuring error. The structure of artificial neural network is selected based on previous researchers' results and in this article the accuracy of network is increased by using sensitivity analysis. The best forecast for classical forecasting methods (Moving Average, Exponential Smoothing, and Exponential Smoothing with Trend) is resulted based on prepared data and this forecast is compared with result of support vector machine and proposed artificial neural network. The results show that artificial neural network can forecast more precisely in comparison with other methods. Finally, forecasting methods' stability is analyzed by using raw data and even the effectiveness of clustering analysis is measured.

MiSense Hierarchical Cluster-Based Routing Algorithm (MiCRA) for Wireless Sensor Networks

Wireless sensor networks (WSN) are currently receiving significant attention due to their unlimited potential. These networks are used for various applications, such as habitat monitoring, automation, agriculture, and security. The efficient nodeenergy utilization is one of important performance factors in wireless sensor networks because sensor nodes operate with limited battery power. In this paper, we proposed the MiSense hierarchical cluster based routing algorithm (MiCRA) to extend the lifetime of sensor networks and to maintain a balanced energy consumption of nodes. MiCRA is an extension of the HEED algorithm with two levels of cluster heads. The performance of the proposed protocol has been examined and evaluated through a simulation study. The simulation results clearly show that MiCRA has a better performance in terms of lifetime than HEED. Indeed, MiCRA our proposed protocol can effectively extend the network lifetime without other critical overheads and performance degradation. It has been noted that there is about 35% of energy saving for MiCRA during the clustering process and 65% energy savings during the routing process compared to the HEED algorithm.

Neuro-fuzzy Model and Regression Model a Comparison Study of MRR in Electrical Discharge Machining of D2 Tool Steel

In the current research, neuro-fuzzy model and regression model was developed to predict Material Removal Rate in Electrical Discharge Machining process for AISI D2 tool steel with copper electrode. Extensive experiments were conducted with various levels of discharge current, pulse duration and duty cycle. The experimental data are split into two sets, one for training and the other for validation of the model. The training data were used to develop the above models and the test data, which was not used earlier to develop these models were used for validation the models. Subsequently, the models are compared. It was found that the predicted and experimental results were in good agreement and the coefficients of correlation were found to be 0.999 and 0.974 for neuro fuzzy and regression model respectively

Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides cell in two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum squared errors of the two cells as much as possible, while at the same time keep the two cells far apart as possible. Cells are partitioned one at a time until the number of cells equals to the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective, converge to better clustering results than those of the random initialization method. The research also indicated the proposed algorithm would greatly improve the likelihood of every cluster containing some data in it.

Observation of the Correlations between Pair Wise Interaction and Functional Organization of the Proteins, in the Protein Interaction Network of Saccaromyces Cerevisiae

Understanding the cell's large-scale organization is an interesting task in computational biology. Thus, protein-protein interactions can reveal important organization and function of the cell. Here, we investigated the correspondence between protein interactions and function for the yeast. We obtained the correlations among the set of proteins. Then these correlations are clustered using both the hierarchical and biclustering methods. The detailed analyses of proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominancy of one functional class is observed. In brief, from interaction data to function, some correlated results are noticed about the relationship between interaction and function which might give clues about the organization of the proteins.

Fuzzy Types Clustering for Microarray Data

The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.

Object-Based Image Indexing and Retrieval in DCT Domain using Clustering Techniques

In this paper, we present a new and effective image indexing technique that extracts features directly from DCT domain. Our proposed approach is an object-based image indexing. For each block of size 8*8 in DCT domain a feature vector is extracted. Then, feature vectors of all blocks of image using a k-means algorithm is clustered into groups. Each cluster represents a special object of the image. Then we select some clusters that have largest members after clustering. The centroids of the selected clusters are taken as image feature vectors and indexed into the database. Also, we propose an approach for using of proposed image indexing method in automatic image classification. Experimental results on a database of 800 images from 8 semantic groups in automatic image classification are reported.

Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions

In the present work, we propose a new technique to enhance the learning capabilities and reduce the computation intensity of a competitive learning multi-layered neural network using the K-means clustering algorithm. The proposed model use multi-layered network architecture with a back propagation learning mechanism. The K-means algorithm is first applied to the training dataset to reduce the amount of samples to be presented to the neural network, by automatically selecting an optimal set of samples. The obtained results demonstrate that the proposed technique performs exceptionally in terms of both accuracy and computation time when applied to the KDD99 dataset compared to a standard learning schema that use the full dataset.

A Comparison between Heuristic and Meta-Heuristic Methods for Solving the Multiple Traveling Salesman Problem

The multiple traveling salesman problem (mTSP) can be used to model many practical problems. The mTSP is more complicated than the traveling salesman problem (TSP) because it requires determining which cities to assign to each salesman, as well as the optimal ordering of the cities within each salesman's tour. Previous studies proposed that Genetic Algorithm (GA), Integer Programming (IP) and several neural network (NN) approaches could be used to solve mTSP. This paper compared the results for mTSP, solved with Genetic Algorithm (GA) and Nearest Neighbor Algorithm (NNA). The number of cities is clustered into a few groups using k-means clustering technique. The number of groups depends on the number of salesman. Then, each group is solved with NNA and GA as an independent TSP. It is found that k-means clustering and NNA are superior to GA in terms of performance (evaluated by fitness function) and computing time.

A Reconfigurable Distributed Multiagent System Optimized for Scalability

This paper proposes a novel solution for optimizing the size and communication overhead of a distributed multiagent system without compromising the performance. The proposed approach addresses the challenges of scalability especially when the multiagent system is large. A modified spectral clustering technique is used to partition a large network into logically related clusters. Agents are assigned to monitor dedicated clusters rather than monitor each device or node. The proposed scalable multiagent system is implemented using JADE (Java Agent Development Environment) for a large power system. The performance of the proposed topologyindependent decentralized multiagent system and the scalable multiagent system is compared by comprehensively simulating different fault scenarios. The time taken for reconfiguration, the overall computational complexity, and the communication overhead incurred are computed. The results of these simulations show that the proposed scalable multiagent system uses fewer agents efficiently, makes faster decisions to reconfigure when a fault occurs, and incurs significantly less communication overhead.

Fuzzy Control of the Air Conditioning System at Different Operating Pressures

The present work demonstrates the design and simulation of a fuzzy control of an air conditioning system at different pressures. The first order Sugeno fuzzy inference system is utilized to model the system and create the controller. In addition, an estimation of the heat transfer rate and water mass flow rate injection into or withdraw from the air conditioning system is determined by the fuzzy IF-THEN rules. The approach starts by generating the input/output data. Then, the subtractive clustering algorithm along with least square estimation (LSE) generates the fuzzy rules that describe the relationship between input/output data. The fuzzy rules are tuned by Adaptive Neuro-Fuzzy Inference System (ANFIS). The results show that when the pressure increases the amount of water flow rate and heat transfer rate decrease within the lower ranges of inlet dry bulb temperatures. On the other hand, and as pressure increases the amount of water flow rate and heat transfer rate increases within the higher ranges of inlet dry bulb temperatures. The inflection in the pressure effect trend occurs at lower temperatures as the inlet air humidity increases.

A Monte Carlo Method to Data Stream Analysis

Data stream analysis is the process of computing various summaries and derived values from large amounts of data which are continuously generated at a rapid rate. The nature of a stream does not allow a revisit on each data element. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms to balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either dataoriented or task-oriented. The data-oriented approach analyzes a subset of data or a smaller transformed representation, whereas taskoriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream has been both statistically transformed to a smaller size and computationally approximated its characteristics. We adopt a Monte Carlo method in the approximation step. The data reduction has been performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm on clustering and classification tasks to evaluate the utility of our approach.

Order Reduction using Modified Pole Clustering and Pade Approximations

The authors present a mixed method for reducing the order of the large-scale dynamic systems. In this method, the denominator polynomial of the reduced order model is obtained by using the modified pole clustering technique while the coefficients of the numerator are obtained by Pade approximations. This method is conceptually simple and always generates stable reduced models if the original high-order system is stable. The proposed method is illustrated with the help of the numerical examples taken from the literature.

Folksonomy-based Recommender Systems with User-s Recent Preferences

Social bookmarking is an environment in which the user gradually changes interests over time so that the tag data associated with the current temporal period is usually more important than tag data temporally far from the current period. This implies that in the social tagging system, the newly tagged items by the user are more relevant than older items. This study proposes a novel recommender system that considers the users- recent tag preferences. The proposed system includes the following stages: grouping similar users into clusters using an E-M clustering algorithm, finding similar resources based on the user-s bookmarks, and recommending the top-N items to the target user. The study examines the system-s information retrieval performance using a dataset from del.icio.us, which is a famous social bookmarking web site. Experimental results show that the proposed system is better and more effective than traditional approaches.

Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map

Self-organizing map (SOM) is a well known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. PSO algorithm utilizes a so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms namely k-means and hierarchical based approach which are normally used to interpret the output of SOM.

Grouping and Indexing Color Features for Efficient Image Retrieval

Content-based Image Retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift algorithm, a robust clustering technique. Then the cluster (region) mode is used as representative of the image in 3-D color space. The feature descriptor consists of the representative color of a region and is indexed using a spatial indexing method that uses *R -tree thus avoiding the high-dimensional indexing problems associated with the traditional color histogram. Alternatively, the images in the database are clustered based on region feature similarity using Euclidian distance. Only representative (centroids) features of these clusters are indexed using *R -tree thus improving the efficiency. For similarity retrieval, each representative color in the query image or region is used independently to find regions containing that color. The results of these methods are compared. A JAVA based query engine supporting query-by- example is built to retrieve images by color.

Modified Data Mining Approach for Defective Diagnosis in Hard Disk Drive Industry

Currently, slider process of Hard Disk Drive Industry become more complex, defective diagnosis for yield improvement becomes more complicated and time-consumed. Manufacturing data analysis with data mining approach is widely used for solving that problem. The existing mining approach from combining of the KMean clustering, the machine oriented Kruskal-Wallis test and the multivariate chart were applied for defective diagnosis but it is still be a semiautomatic diagnosis system. This article aims to modify an algorithm to support an automatic decision for the existing approach. Based on the research framework, the new approach can do an automatic diagnosis and help engineer to find out the defective factors faster than the existing approach about 50%.

On the Noise Distance in Robust Fuzzy C-Means

In the last decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. Robust fuzzy C-means (robust-FCM) is certainly one of the most known among these algorithms. In robust-FCM, noise is modeled as a separate cluster and is characterized by a prototype that has a constant distance δ from all data points. Distance δ determines the boundary of the noise cluster and therefore is a critical parameter of the algorithm. Though some approaches have been proposed to automatically determine the most suitable δ for the specific application, up to today an efficient and fully satisfactory solution does not exist. The aim of this paper is to propose a novel method to compute the optimal δ based on the analysis of the distribution of the percentage of objects assigned to the noise cluster in repeated executions of the robust-FCM with decreasing values of δ . The extremely encouraging results obtained on some data sets found in the literature are shown and discussed.

Neural Network Optimal Power Flow(NN-OPF) based on IPSO with Developed Load Cluster Method

An Optimal Power Flow based on Improved Particle Swarm Optimization (OPF-IPSO) with Generator Capability Curve Constraint is used by NN-OPF as a reference to get pattern of generator scheduling. There are three stages in Designing NN-OPF. The first stage is design of OPF-IPSO with generator capability curve constraint. The second stage is clustering load to specific range and calculating its index. The third stage is training NN-OPF using constructive back propagation method. In training process total load and load index used as input, and pattern of generator scheduling used as output. Data used in this paper is power system of Java-Bali. Software used in this simulation is MATLAB.

Comparative Study of Evolutionary Model and Clustering Methods in Circuit Partitioning Pertaining to VLSI Design

Partitioning is a critical area of VLSI CAD. In order to build complex digital logic circuits its often essential to sub-divide multi -million transistor design into manageable Pieces. This paper looks at the various partitioning techniques aspects of VLSI CAD, targeted at various applications. We proposed an evolutionary time-series model and a statistical glitch prediction system using a neural network with selection of global feature by making use of clustering method model, for partitioning a circuit. For evolutionary time-series model, we made use of genetic, memetic & neuro-memetic techniques. Our work focused in use of clustering methods - K-means & EM methodology. A comparative study is provided for all techniques to solve the problem of circuit partitioning pertaining to VLSI design. The performance of all approaches is compared using benchmark data provided by MCNC standard cell placement benchmark net lists. Analysis of the investigational results proved that the Neuro-memetic model achieves greater performance then other model in recognizing sub-circuits with minimum amount of interconnections between them.