Abstract: The Fuzzy C-Means (FCM) clustering algorithm is a method frequently used in pattern recognition. It has the advantage of giving good modeling results in many cases, although it cannot determine the number of clusters by itself. Moreover, in the FCM algorithm most researchers fix the weighting exponent (m) to the conventional value of 2, which might not be appropriate for all applications. Consequently, the main objective of this paper is to use the subtractive clustering algorithm to provide the optimal number of clusters needed by the FCM algorithm, by optimizing the parameters of the subtractive clustering algorithm with an iterative search approach, and then to find an optimal weighting exponent (m) for the FCM algorithm. To obtain an optimal number of clusters, the iterative search approach is used to find the optimal single-output Sugeno-type Fuzzy Inference System (FIS) model by optimizing the parameters of the subtractive clustering algorithm that give the minimum least-squares error between the actual data and the Sugeno fuzzy model. Once the number of clusters is optimized, two approaches are proposed to optimize the weighting exponent (m) in the FCM algorithm, namely an iterative search approach and genetic algorithms. The proposed approach is tested on data generated from the original function, and optimal fuzzy models are obtained with minimum error between the real data and the obtained fuzzy models.
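For readers who want to experiment with the weighting exponent, the following is a minimal sketch of the standard FCM iteration with an adjustable m; this is generic FCM, not the authors' full pipeline with subtractive clustering and Sugeno FIS modeling:

```python
# Minimal sketch of the standard FCM loop with adjustable weighting
# exponent m (must be > 1). Generic FCM, not the paper's full method.
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                      # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                          # fuzzified memberships
        V = (W @ X) / W.sum(axis=1, keepdims=True)       # cluster centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U_new = D ** (-2.0 / (m - 1))       # standard FCM membership update
        U_new /= U_new.sum(axis=0)
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return V, U
```

An outer loop can then scan m by iterative search, or drive it with a genetic algorithm, and keep the value that minimizes the least-squares error of the resulting fuzzy model.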
Abstract: We present a new algorithm for nonlinear dimensionality reduction that consistently uses global information, and that enables understanding the intrinsic geometry of non-convex manifolds. Compared to methods that consider only local information, our method appears to be more robust to noise. Unlike most methods that incorporate global information, the proposed approach automatically handles non-convexity of the data manifold. We demonstrate the performance of our algorithm and compare it to state-of-the-art methods on synthetic as well as real data.
Abstract: This paper considers inference under progressive type II censoring with a compound Rayleigh failure time distribution. The maximum likelihood (ML) and Bayes methods are used for estimating the unknown parameters as well as some lifetime parameters, namely the reliability and hazard functions. We obtain Bayes estimators using conjugate priors for the shape and scale parameters. When the two parameters are unknown, closed-form expressions for the Bayes estimators cannot be obtained, so we use Lindley's approximation to compute the Bayes estimates. Another Bayes estimator is obtained based on a continuous-discrete joint prior for the unknown parameters. An example with real data is discussed to illustrate the proposed method. Finally, we compare these estimators with the maximum likelihood estimators using a Monte Carlo simulation study.
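For reference, under the parameterization of the compound Rayleigh distribution commonly used in this censoring literature (an assumption here, since the abstract does not state the form), the density and the lifetime parameters being estimated are:

```latex
f(x;\alpha,\beta) = 2\alpha\beta^{\alpha}\,x\,(\beta + x^{2})^{-(\alpha+1)}, \quad x>0,\ \alpha,\beta>0,
\qquad
R(t) = \beta^{\alpha}(\beta + t^{2})^{-\alpha},
\qquad
h(t) = \frac{2\alpha t}{\beta + t^{2}}.
```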
Abstract: Ontologies and tagging systems are two different ways to organize the knowledge present in the current Web. In this paper we propose a simple method to model folksonomies, as tagging systems, with ontologies. We show the scalability of the method using real data sets. The modeling method is composed of a generic ontology that represents any folksonomy and an algorithm to transform the information contained in folksonomies to the generic ontology. The method allows representing folksonomies at any instant of time.
Abstract: In data mining, fuzzy clustering algorithms have demonstrated advantages over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews the concepts of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the c-means algorithm to overcome its limitations in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan's Gross National Happiness (GNH) program.
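For context, the classical FCM algorithm reviewed here minimizes the following objective, with the standard center and membership updates that the proposed modification targets:

```latex
J_m(U,V)=\sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - v_j\rVert^{2},
\qquad
v_j=\frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}},
\qquad
u_{ij}=\left(\sum_{k=1}^{c}\left(\frac{\lVert x_i-v_j\rVert}{\lVert x_i-v_k\rVert}\right)^{\frac{2}{m-1}}\right)^{-1}.
```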
Abstract: This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS, a content-based sampling algorithm that works on a landmark window model of data streams and preserves a more informative sample in the sample space. The algorithm, which is based on closed frequent itemset mining tasks, first initializes a concept lattice using the initial data and then updates the lattice structure using an incremental mechanism. The incremental mechanism inserts, updates, and deletes nodes in the concept lattice in a batch manner. The algorithm extracts the final samples on user demand. Experimental results show the accuracy of CFISDS on synthetic and real datasets, although CFISDS is not faster than existing sampling algorithms such as Z and DSS.
Abstract: Crude oil blending is an important unit operation in the petroleum refining industry. A good model for the blending system is beneficial for supervisory operation, prediction of export petroleum quality, and realizing model-based optimal control. Since blending cannot follow the ideal mixing rule in practice, we propose a static neural network to approximate the blending properties. Using a dead-zone approach, we propose a new robust learning algorithm and give a theoretical analysis. Real data from crude oil blending are used to illustrate the neuro-modeling approach.
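The following is a minimal sketch (an assumed form, not the paper's exact algorithm) of dead-zone gradient learning for a single-layer approximator: weights are updated only when the modeling error exceeds an assumed bound on the noise, which keeps bounded disturbances from driving the weights.

```python
# Dead-zone gradient learning sketch: skip updates for small errors so
# bounded noise cannot corrupt the learned weights.
import numpy as np

def deadzone_train(phi, y, lr=0.01, noise_bound=0.1, epochs=50, seed=0):
    """phi: (n_samples, n_features) regressors; y: (n_samples,) targets."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=phi.shape[1])
    for _ in range(epochs):
        for x, target in zip(phi, y):
            e = target - x @ w
            if abs(e) > noise_bound:        # dead zone: ignore small errors
                w += lr * e * x             # gradient step on squared error
    return w
```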
Abstract: This paper reports the feasibility of the ARMA model for describing a bursty video source transmitting over an AAL5 ATM link (VBR traffic). The traffic represents the activity of the action movie "Lethal Weapon 3" transmitted over the ATM network using the Fore Systems AVA-200 ATM video codec with a peak rate of 100 Mbps and a frame rate of 25 frames per second. The model parameters were estimated for a single video source and for independently multiplexed video sources. It was found that the ARMA(2,4) model is well-suited to the real data in terms of average rate traffic profile, probability density function, autocorrelation function, burstiness measure, and the pole-zero distribution of the filter model.
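As a minimal sketch (not the paper's estimation procedure), an ARMA(2,4) model can be fitted to a frame-size trace with statsmodels; the trace file name below is hypothetical:

```python
# Fit ARMA(2,4) to a video frame-size trace. ARMA(p,q) is ARIMA(p,0,q).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rates = np.loadtxt("frame_sizes.txt")       # hypothetical trace file
result = ARIMA(rates, order=(2, 0, 4)).fit()
print(result.summary())                     # estimated AR/MA coefficients
```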
Abstract: Accurate demand forecasting is one of the key issues in the inventory management of spare parts. The problem of modeling future consumption becomes especially difficult for lumpy patterns, which are characterized by intervals in which there is no demand and periods with actual demand occurrences with large variation in demand levels. Many forecasting methods may perform poorly when demand for an item is lumpy. In this study, based on the characteristics of the lumpy demand patterns of spare parts, a hybrid forecasting approach has been developed that uses a multi-layered perceptron neural network and a traditional recursive method for forecasting future demands. In the described approach, the multi-layered perceptron is adapted to forecast occurrences of non-zero demands, and a conventional recursive method is then used to estimate the quantity of non-zero demands. To evaluate the performance of the proposed approach, its forecasts were compared with those obtained using the Syntetos-Boylan approximation and the multi-layered perceptron, generalized regression, and Elman recurrent neural networks recently employed in this area. The models were applied to forecast future demand for spare parts of Arak Petrochemical Company in Iran, using 30 real data sets. The results indicate that the forecasts obtained using the proposed model are superior to those obtained using the other methods.
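The following is a minimal sketch (an assumed structure, not the authors' exact model) of the hybrid idea: a classifier predicts whether the next period has any demand, and a recursive smoother estimates the size of non-zero demands.

```python
# Hybrid lumpy-demand sketch: MLP for demand occurrence, exponential
# smoothing (a simple recursive method) for non-zero demand size.
import numpy as np
from sklearn.neural_network import MLPClassifier

def hybrid_forecast(demand, lags=6, alpha=0.2):
    d = np.asarray(demand, dtype=float)
    X = np.array([d[i:i + lags] for i in range(len(d) - lags)])
    y = (d[lags:] > 0).astype(int)                   # occurrence indicator
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=0).fit(X, y)
    # Recursive (exponential-smoothing) estimate of non-zero demand size.
    nonzero = d[d > 0]
    size = nonzero[0] if len(nonzero) else 0.0
    for v in nonzero[1:]:
        size = alpha * v + (1 - alpha) * size
    p_nonzero = clf.predict_proba(d[-lags:].reshape(1, -1))[0, 1]
    return p_nonzero * size                          # expected next demand
```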
Abstract: This paper proposes a novel architecture for developing decision support systems. Unlike conventional decision support systems, the proposed architecture endeavors to reveal the decision-making process so that humans' subjectivity can be incorporated into a computerized system while, at the same time, preserving the capability of the computerized system to process information objectively. A number of techniques used in developing the decision support system are elaborated to make the decision-making process transparent. These include procedures for high-dimensional data visualization, pattern classification, prediction, and evolutionary computational search. An artificial data set is first employed to compare the proposed approach with other methods. A simulated handwritten data set and a real data set on liver disease diagnosis are then employed to evaluate the efficacy of the proposed approach. The results are analyzed and discussed. The potential of the proposed architecture as a useful decision support system is demonstrated.
Abstract: In this paper, the linear regression model is estimated by the ordinary least squares method and the partially linear regression model is estimated by the penalized least squares method using a smoothing spline. The differences and similarities between the sums of squares for the linear regression and partially linear regression (semi-parametric regression) models are then investigated. It is shown that the sums of squares in linear regression reduce to the sums of squares in partially linear regression models. Furthermore, we indicate that the various sums of squares in linear regression correspond to different deviance statements in partially linear regression. In addition, the coefficient of determination derived for the linear regression model is easily generalized to the coefficient of determination of the partially linear regression model. To this end, two applications are made: a simulated and a real data set are considered to support the claims, so the study is backed by both a simulation and a real data example.
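For concreteness, the partially linear model and the penalized least-squares criterion with a smoothing-spline penalty take the standard form (standard notation assumed here, since the abstract does not fix it):

```latex
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + f(t_i) + \varepsilon_i,
\qquad
\min_{\boldsymbol{\beta},\,f}\ \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} - f(t_i)\bigr)^{2}
+ \lambda \int \bigl(f''(t)\bigr)^{2}\,dt.
```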
Abstract: We propose an enhanced collaborative filtering method using Hofstede's cultural dimensions, calculated for 111 countries. We employ four of these dimensions, which are correlated with customers' buying behavior, in order to detect users' preferences for items. In addition, several advantages of this method are demonstrated for data sparseness and cold-start users, which are important challenges in collaborative filtering. We present experiments using a real dataset, the Book-Crossing dataset. Experimental results show that the proposed algorithm provides significant advantages in terms of improving recommendation quality.
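The following is a minimal sketch of one way to blend rating-based user similarity with similarity over Hofstede's cultural dimensions for the users' countries; the combination rule and weight are assumptions for illustration, not the authors' formula:

```python
# Blend Pearson rating similarity with cultural-dimension similarity.
import numpy as np

def blended_similarity(r_u, r_v, h_u, h_v, w=0.3):
    """r_u, r_v: rating vectors (NaN = unrated); h_u, h_v: Hofstede dims."""
    mask = ~np.isnan(r_u) & ~np.isnan(r_v)
    if mask.sum() >= 2 and r_u[mask].std() > 0 and r_v[mask].std() > 0:
        rating_sim = np.corrcoef(r_u[mask], r_v[mask])[0, 1]  # Pearson
    else:
        rating_sim = 0.0                # fall back for cold-start users
    # Hofstede dimensions are scored roughly on a 0-100 scale (assumption).
    cultural_sim = 1.0 - np.abs(h_u - h_v).mean() / 100.0
    return (1 - w) * rating_sim + w * cultural_sim
```

For cold-start users with no rating overlap, the cultural term alone still yields a usable similarity, which is the sparsity advantage the abstract points to.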
Abstract: This paper describes an enhanced cookie-based method for counting the visitors of web sites using a web log processing system that aims to cope with the ambitious goal of creating countrywide statistics about the browsing practices of real human individuals. The focus is on describing a new, more efficient way of detecting the human beings behind web users by placing different identifiers on the client computers. We briefly introduce our processing system, designed to handle the massive amount of data records continuously gathered from the most important content providers of Hungary. We conclude by showing statistics over different time spans comparing the efficiency of multiple visitor counting methods with the one presented here, along with some interesting charts about content providers and web usage based on real data recorded in 2007.
Abstract: The move towards internationalization of accounting received a great boost when, in 2002, the EU delegated the IASB to provide the accounting standards to be applied inside its frontiers. Among the incentives for standardizing accounting at the international level is the reduction of the cost of capital. Romania made the move towards IFRS before the EU, when the country was not yet a member of it. Even though this made Romania a special case, it has scarcely been studied; the lack of real data is usually the reason for avoiding it. The novelty of this paper is that it offers an insight into the reality of Romanian companies and their view of IFRS. The paper is based on a survey that the authors conducted among the companies listed on the first two tiers of the Bucharest Stock Exchange (BSE), which are, basically, the most important companies in the country.
Abstract: This paper presents the results of three senior capstone projects at the Department of Computer Engineering, Prince of Songkla University, Thailand. These projects focus on developing an examination management system for the Faculty of Engineering in order to manage both the examination room assignments and the examination proctor assignments for each room. The current version of the software is a web-based application. The developed software allows the examination proctors to select their scheduled time online, while each subject is assigned to an available examination room according to its type and the room capacity. The developed system is evaluated using real data by prospective users of the system, who have given several suggestions for further improvements. Even though the features of the developed software are not superior, the development process can serve as a case study for a project-based teaching style. Furthermore, the process of developing this software reveals several issues in developing an educational support application.
Abstract: The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, ease of use is crucial. The results of SVDD are largely determined by the choice of the regularisation parameter C and the kernel parameter of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD based solely on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.
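The following generic sketch illustrates the tuning problem, not the paper's selection criterion. scikit-learn's OneClassSVM (the ν-formulation, which is equivalent to SVDD for the RBF kernel) stands in for SVDD, and the data are placeholders:

```python
# Train one-class RBF models over a parameter grid and record the fraction
# of training points accepted. With only one class available, a confusion
# matrix reduces to true positives and false negatives, so this acceptance
# fraction is the only such statistic obtainable from the training set.
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.default_rng(0).normal(size=(200, 2))   # placeholder data
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
        accept = (clf.predict(X) == 1).mean()        # training acceptance
        print(f"nu={nu}, gamma={gamma}: accept={accept:.2f}")
```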
Abstract: This paper aims to develop an algorithm for a finite capacity material requirement planning (FCMRP) system for a multi-stage assembly flow shop. The developed FCMRP system has two main stages. The first stage allocates operations to the first- and second-priority work centers and also determines the sequence of the operations on each work center. The second stage determines the optimal start time of each operation using a linear programming model. Real data from a factory are used to analyze and evaluate the effectiveness of the proposed FCMRP system and to guarantee a practical solution for the user. There are five performance measures, namely, the total tardiness, the number of tardy orders, the total earliness, the number of early orders, and the average flow time. The proposed FCMRP system offers an adjustable solution that is a compromise among the conflicting performance measures. The user can adjust the weight of each performance measure to obtain the desired performance. The results show that the combination of FCMRP NP3 and EDD outperforms the other combinations in terms of the overall performance index. The calculation time of the proposed FCMRP system is about 10 minutes, which is practical for the planners of the factory.
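A much-simplified sketch of the second-stage idea follows, reduced to a single three-operation chain with one due date; the paper's LP covers work centers and multiple orders, so the model below is an illustration of the weighted tardiness/earliness trade-off, not the authors' formulation:

```python
# Choose operation start times trading off tardiness and earliness.
from scipy.optimize import linprog

p = [2.0, 3.0, 1.0]          # processing times of a 3-operation chain
d = 8.0                      # due date of the order
w_tardy, w_early = 5.0, 1.0  # user-adjustable weights

# Variables: x = [s1, s2, s3, T, E] (start times, tardiness, earliness)
c = [0, 0, 0, w_tardy, w_early]
A_ub = [[1, -1, 0, 0, 0],    # s1 + p1 <= s2   (precedence)
        [0, 1, -1, 0, 0],    # s2 + p2 <= s3
        [0, 0, 1, -1, 0],    # s3 + p3 - d <= T
        [0, 0, -1, 0, -1]]   # d - (s3 + p3) <= E
b_ub = [-p[0], -p[1], d - p[2], p[2] - d]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 5)
print(res.x)                 # optimal start times, tardiness, earliness
```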
Abstract: This paper introduces a new signal denoising method based on the Empirical Mode Decomposition (EMD) framework. The method is a fully data-driven approach. The noisy signal is decomposed adaptively into oscillatory components called Intrinsic Mode Functions (IMFs) by means of a process called sifting. EMD denoising involves filtering or thresholding each IMF and reconstructing the estimated signal from the processed IMFs. EMD can be combined with a filtering approach or with a nonlinear transformation; in this work the Savitzky-Golay filter and soft thresholding are investigated. For thresholding, IMF samples are shrunk or scaled below a threshold value. The standard deviation of the noise is estimated for every IMF, and the threshold is derived for Gaussian white noise. The method is tested on simulated and real data and compared with averaging, median, and wavelet approaches.
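The following is a minimal sketch (an assumed pipeline with generic parameters, not the paper's exact settings) of EMD-based soft-threshold denoising: decompose, estimate each IMF's noise level robustly, threshold, and reconstruct with the residue left untouched:

```python
# EMD denoising sketch: per-IMF robust noise estimate + universal threshold.
import numpy as np
import pywt
from PyEMD import EMD   # pip install EMD-signal

def emd_denoise(signal):
    emd = EMD()
    emd.emd(signal)
    imfs, residue = emd.get_imfs_and_residue()
    out = np.zeros_like(signal)
    for imf in imfs:
        sigma = np.median(np.abs(imf)) / 0.6745        # robust noise estimate
        thr = sigma * np.sqrt(2 * np.log(len(imf)))    # universal threshold
        out += pywt.threshold(imf, thr, mode="soft")   # soft shrinkage
    return out + residue                               # keep the trend
```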
Abstract: Distant-talking voice-based HCI systems suffer from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). The mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, which affects speech dynamics inside the room before the signal reaches the microphones. Moreover, as the speech signal is reflected, its acoustical characteristics are also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach to dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room properties, are used to predict the optimal distance of the speech source. Consequently, pre-computed statistical priors corresponding to the optimal distance are selected to correct the statistics of the generic model, which was frozen during training. Thus, the model combination is post-conditioned to match the power of the instantaneous speech acoustics at runtime. This results in an improved likelihood of predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows that voice recognition performance using our method is more robust to the change in distance than the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2 at 2.5 meters), our method achieved a 24.2% improvement in recognition performance over the best-performing conventional method.
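The following is a minimal sketch (an assumed setup, not the paper's system) of the distance-selection step: one GMM per known talking distance is trained on power-related features, and the best-scoring GMM picks the distance whose pre-computed compensation priors should be applied:

```python
# Distance-sensitive GMM selection sketch using scikit-learn.
from sklearn.mixture import GaussianMixture

def train_distance_gmms(features_by_distance, n_components=4):
    """features_by_distance: {distance: (n_frames, n_features) array}."""
    return {dist: GaussianMixture(n_components, random_state=0).fit(feats)
            for dist, feats in features_by_distance.items()}

def predict_distance(gmms, frames):
    # Pick the distance whose GMM gives the highest average log-likelihood.
    return max(gmms, key=lambda d: gmms[d].score(frames))
```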
Abstract: Many supervised induction algorithms require discrete data, even though real data often come in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of the induction models. Usually, discretization and other types of statistical processing are applied to subsets of the population, as the entire population is practically inaccessible. For this reason, we argue that a discretization performed on a sample of the population is only an estimate of that of the entire population. Most of the existing discretization methods partition the attribute range into two or more intervals using a single cut point or a set of cut points. In this paper, we introduce a technique that uses resampling (such as the bootstrap) to generate a set of candidate discretization points, thereby improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is thus to observe whether the resampling technique can lead to better discretization points, which opens up a new paradigm for the construction of soft decision trees.
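The following is a minimal sketch (an assumed aggregation rule, not the authors' procedure) of the bootstrap idea: draw resamples, compute an entropy-based binary cut point on each, and collect the resulting distribution of candidate cut points:

```python
# Bootstrap candidate cut points for discretizing a continuous attribute.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_cut(x, y):
    """Information-gain-maximizing binary cut point on attribute x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n, best, best_gain = len(x), x[0], -1.0
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue
        gain = (entropy(y) - (i / n) * entropy(y[:i])
                - ((n - i) / n) * entropy(y[i:]))
        if gain > best_gain:
            best_gain, best = gain, (x[i] + x[i - 1]) / 2
    return best

def bootstrap_cuts(x, y, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    idx = [rng.integers(0, len(x), len(x)) for _ in range(n_boot)]
    return np.array([best_cut(x[i], y[i]) for i in idx])
```

The spread of the returned cut points indicates how stable the discretization is across resamples; a soft decision tree could use this distribution instead of a single hard threshold.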