Optimizing of Fuzzy C-Means Clustering Algorithm Using GA

Fuzzy C-means Clustering algorithm (FCM) is a method that is frequently used in pattern recognition. It has the advantage of giving good modeling results in many cases, although, it is not capable of specifying the number of clusters by itself. In FCM algorithm most researchers fix weighting exponent (m) to a conventional value of 2 which might not be the appropriate for all applications. Consequently, the main objective of this paper is to use the subtractive clustering algorithm to provide the optimal number of clusters needed by FCM algorithm by optimizing the parameters of the subtractive clustering algorithm by an iterative search approach and then to find an optimal weighting exponent (m) for the FCM algorithm. In order to get an optimal number of clusters, the iterative search approach is used to find the optimal single-output Sugenotype Fuzzy Inference System (FIS) model by optimizing the parameters of the subtractive clustering algorithm that give minimum least square error between the actual data and the Sugeno fuzzy model. Once the number of clusters is optimized, then two approaches are proposed to optimize the weighting exponent (m) in the FCM algorithm, namely, the iterative search approach and the genetic algorithms. The above mentioned approach is tested on the generated data from the original function and optimal fuzzy models are obtained with minimum error between the real data and the obtained fuzzy models.

Manifold Analysis by Topologically Constrained Isometric Embedding

We present a new algorithm for nonlinear dimensionality reduction that consistently uses global information, and that enables understanding the intrinsic geometry of non-convex manifolds. Compared to methods that consider only local information, our method appears to be more robust to noise. Unlike most methods that incorporate global information, the proposed approach automatically handles non-convexity of the data manifold. We demonstrate the performance of our algorithm and compare it to state-of-the-art methods on synthetic as well as real data.

Inferences on Compound Rayleigh Parameters with Progressively Type-II Censored Samples

This paper considers inference under progressive type II censoring with a compound Rayleigh failure time distribution. The maximum likelihood (ML), and Bayes methods are used for estimating the unknown parameters as well as some lifetime parameters, namely reliability and hazard functions. We obtained Bayes estimators using the conjugate priors for two shape and scale parameters. When the two parameters are unknown, the closed-form expressions of the Bayes estimators cannot be obtained. We use Lindley.s approximation to compute the Bayes estimates. Another Bayes estimator has been obtained based on continuous-discrete joint prior for the unknown parameters. An example with the real data is discussed to illustrate the proposed method. Finally, we made comparisons between these estimators and the maximum likelihood estimators using a Monte Carlo simulation study.

Self-adaptation of Ontologies to Folksonomies in Semantic Web

Ontologies and tagging systems are two different ways to organize the knowledge present in the current Web. In this paper we propose a simple method to model folksonomies, as tagging systems, with ontologies. We show the scalability of the method using real data sets. The modeling method is composed of a generic ontology that represents any folksonomy and an algorithm to transform the information contained in folksonomies to the generic ontology. The method allows representing folksonomies at any instant of time.

A Modified Fuzzy C-Means Algorithm for Natural Data Exploration

In Data mining, Fuzzy clustering algorithms have demonstrated advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews concept of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the cmeans algorithm to overcome the limitations of it in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan-s Gross National Happiness (GNH) program.

Content Based Sampling over Transactional Data Streams

This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.

Modeling of Crude Oil Blending via Discrete-Time Neural Networks

Crude oil blending is an important unit operation in petroleum refining industry. A good model for the blending system is beneficial for supervision operation, prediction of the export petroleum quality and realizing model-based optimal control. Since the blending cannot follow the ideal mixing rule in practice, we propose a static neural network to approximate the blending properties. By the dead-zone approach, we propose a new robust learning algorithm and give theoretical analysis. Real data of crude oil blending is applied to illustrate the neuro modeling approach.

Modeling of Statistically Multiplexed Non Uniform Activity VBR Video

This paper reports the feasibility of the ARMA model to describe a bursty video source transmitting over a AAL5 ATM link (VBR traffic). The traffic represents the activity of the action movie "Lethal Weapon 3" transmitted over the ATM network using the Fore System AVA-200 ATM video codec with a peak rate of 100 Mbps and a frame rate of 25. The model parameters were estimated for a single video source and independently multiplexed video sources. It was found that the model ARMA (2, 4) is well-suited for the real data in terms of average rate traffic profile, probability density function, autocorrelation function, burstiness measure, and the pole-zero distribution of the filter model.

A Hybrid Neural Network and Traditional Approach for Forecasting Lumpy Demand

Accurate demand forecasting is one of the most key issues in inventory management of spare parts. The problem of modeling future consumption becomes especially difficult for lumpy patterns, which characterized by intervals in which there is no demand and, periods with actual demand occurrences with large variation in demand levels. However, many of the forecasting methods may perform poorly when demand for an item is lumpy. In this study based on the characteristic of lumpy demand patterns of spare parts a hybrid forecasting approach has been developed, which use a multi-layered perceptron neural network and a traditional recursive method for forecasting future demands. In the described approach the multi-layered perceptron are adapted to forecast occurrences of non-zero demands, and then a conventional recursive method is used to estimate the quantity of non-zero demands. In order to evaluate the performance of the proposed approach, their forecasts were compared to those obtained by using Syntetos & Boylan approximation, recently employed multi-layered perceptron neural network, generalized regression neural network and elman recurrent neural network in this area. The models were applied to forecast future demand of spare parts of Arak Petrochemical Company in Iran, using 30 types of real data sets. The results indicate that the forecasts obtained by using our proposed mode are superior to those obtained by using other methods.

An Intelligent Human-Computer Interaction System for Decision Support

This paper proposes a novel architecture for developing decision support systems. Unlike conventional decision support systems, the proposed architecture endeavors to reveal the decision-making process such that humans' subjectivity can be incorporated into a computerized system and, at the same time, to preserve the capability of the computerized system in processing information objectively. A number of techniques used in developing the decision support system are elaborated to make the decisionmarking process transparent. These include procedures for high dimensional data visualization, pattern classification, prediction, and evolutionary computational search. An artificial data set is first employed to compare the proposed approach with other methods. A simulated handwritten data set and a real data set on liver disease diagnosis are then employed to evaluate the efficacy of the proposed approach. The results are analyzed and discussed. The potentials of the proposed architecture as a useful decision support system are demonstrated.

A Comparison of the Sum of Squares in Linear and Partial Linear Regression Models

In this paper, estimation of the linear regression model is made by ordinary least squares method and the partially linear regression model is estimated by penalized least squares method using smoothing spline. Then, it is investigated that differences and similarity in the sum of squares related for linear regression and partial linear regression models (semi-parametric regression models). It is denoted that the sum of squares in linear regression is reduced to sum of squares in partial linear regression models. Furthermore, we indicated that various sums of squares in the linear regression are similar to different deviance statements in partial linear regression. In addition to, coefficient of the determination derived in linear regression model is easily generalized to coefficient of the determination of the partial linear regression model. For this aim, it is made two different applications. A simulated and a real data set are considered to prove the claim mentioned here. In this way, this study is supported with a simulation and a real data example.

Culturally Enhanced Collaborative Filtering

We propose an enhanced collaborative filtering method using Hofstede-s cultural dimensions, calculated for 111 countries. We employ 4 of these dimensions, which are correlated to the costumers- buying behavior, in order to detect users- preferences for items. In addition, several advantages of this method demonstrated for data sparseness and cold-start users, which are important challenges in collaborative filtering. We present experiments using a real dataset, Book Crossing Dataset. Experimental results shows that the proposed algorithm provide significant advantages in terms of improving recommendation quality.

Tracking Activity of Real Individuals in Web Logs

This paper describes an enhanced cookie-based method for counting the visitors of web sites by using a web log processing system that aims to cope with the ambitious goal of creating countrywide statistics about the browsing practices of real human individuals. The focus is put on describing a new more efficient way of detecting human beings behind web users by placing different identifiers on the client computers. We briefly introduce our processing system designed to handle the massive amount of data records continuously gathered from the most important content providers of the Hungary. We conclude by showing statistics of different time spans comparing the efficiency of multiple visitor counting methods to the one presented here, and some interesting charts about content providers and web usage based on real data recorded in 2007 will also be presented.

The Benefits of IFRS Adoption – A Survey of Chief Financial Officers of Romanian Listed Companies

The move towards internationalization of accounting encountered a great boost, when in 2002 EU delegated the IASB to provide the accounting standards to be applied inside its frontiers. Among the incentives of the standardization of accounting on the international level, is the reduction of the cost of capital. Romania made the move towards IFRS before EU, when the country was not yet a member of it. Even if this made Romania a special case, it was scarcely approached. The leak of real data is usually the reason for avoiding. The novelty of this paper is that it offers an insight from the reality of Romanian companies and their view regarding the IFRS. The paper is based on a survey that the authors made among the companies listed on the first two tiers of the Bucharest Stock Exchange (BSE), which are basically, the most important companies in the country.

Developing Examination Management System: Senior Capstone Project, a Case Study

This paper presents the result of three senior capstone projects at the Department of Computer Engineering, Prince of Songkla University, Thailand. These projects focus on developing an examination management system for the Faculty of Engineering in order to manage the examination both the examination room assignments and the examination proctor assignments in each room. The current version of the software is a web-based application. The developed software allows the examination proctors to select their scheduled time online while each subject is assigned to each available examination room according to its type and the room capacity. The developed system is evaluated using real data by prospective users of the system. Several suggestions for further improvements are given by the testers. Even though the features of the developed software are not superior, the developing process can be a case study for a projectbased teaching style. Furthermore, the process of developing this software can show several issues in developing an educational support application.

Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set

The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, the ease of use is crucial. The results of SVDD are massively determined by the choice of the regularisation parameter C and the kernel parameter  of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD solely based on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.

An Algorithm of Finite Capacity Material Requirement Planning System for Multi-stage Assembly Flow Shop

This paper aims to develop an algorithm of finite capacity material requirement planning (FCMRP) system for a multistage assembly flow shop. The developed FCMRP system has two main stages. The first stage is to allocate operations to the first and second priority work centers and also determine the sequence of the operations on each work center. The second stage is to determine the optimal start time of each operation by using a linear programming model. Real data from a factory is used to analyze and evaluate the effectiveness of the proposed FCMRP system and also to guarantee a practical solution to the user. There are five performance measures, namely, the total tardiness, the number of tardy orders, the total earliness, the number of early orders, and the average flow-time. The proposed FCMRP system offers an adjustable solution which is a compromised solution among the conflicting performance measures. The user can adjust the weight of each performance measure to obtain the desired performance. The result shows that the combination of FCMRP NP3 and EDD outperforms other combinations in term of overall performance index. The calculation time for the proposed FCMRP system is about 10 minutes which is practical for the planners of the factory.

EMD-Based Signal Noise Reduction

This paper introduces a new signal denoising based on the Empirical mode decomposition (EMD) framework. The method is a fully data driven approach. Noisy signal is decomposed adaptively into oscillatory components called Intrinsic mode functions (IMFs) by means of a process called sifting. The EMD denoising involves filtering or thresholding each IMF and reconstructs the estimated signal using the processed IMFs. The EMD can be combined with a filtering approach or with nonlinear transformation. In this work the Savitzky-Golay filter and shoftthresholding are investigated. For thresholding, IMF samples are shrinked or scaled below a threshold value. The standard deviation of the noise is estimated for every IMF. The threshold is derived for the Gaussian white noise. The method is tested on simulated and real data and compared with averaging, median and wavelet approaches.

Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Distant-talking voice-based HCI system suffers from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). Mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, affecting speech dynamics inside the room before reaching the microphones. Moreover, as the speech signal is reflected, its acoustical characteristic is also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach in dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room property are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance is selected to correct the statistics of the generic model which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results to an improved likelihood in predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows voice recognition performance using our method is more robust to the change in distance compared to the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved 24.2% improvement in recognition performance against the best-performing conventional method.

A Decision Boundary based Discretization Technique using Resampling

Many supervised induction algorithms require discrete data, even while real data often comes in a discrete and continuous formats. Quality discretization of continuous attributes is an important problem that has effects on speed, accuracy and understandability of the induction models. Usually, discretization and other types of statistical processes are applied to subsets of the population as the entire population is practically inaccessible. For this reason we argue that the discretization performed on a sample of the population is only an estimate of the entire population. Most of the existing discretization methods, partition the attribute range into two or several intervals using a single or a set of cut points. In this paper, we introduce a technique by using resampling (such as bootstrap) to generate a set of candidate discretization points and thus, improving the discretization quality by providing a better estimation towards the entire population. Thus, the goal of this paper is to observe whether the resampling technique can lead to better discretization points, which opens up a new paradigm to construction of soft decision trees.