Abstract: Traffic Management and Information Systems, which rely on a network of sensors, aim to describe urban traffic in real time by estimating a set of parameters. Although the state of the art focuses on data analysis, little has been done in the direction of prediction. In this paper, we describe a machine learning system for traffic flow management and control, applied to the traffic flow prediction problem. The new algorithm is obtained by using the Random Forests algorithm as the weak learner inside the AdaBoost algorithm. We show that our algorithm performs well on real data and, according to the Traffic Flow Evaluation model, makes it possible to estimate and predict whether or not there is congestion at a given time at road intersections.
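A minimal sketch of the boosting setup described above: a small Random Forest used as the weak learner inside AdaBoost. The sensor features and the binary congestion label are illustrative placeholders, not the paper's data.

```python
# AdaBoost with a Random Forest base learner (scikit-learn).
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Hypothetical sensor features: flow, occupancy, average speed, time of day.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)   # toy congestion label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow, small forests keep the base learner "weak" enough for boosting.
base = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
# Note: the keyword is `base_estimator` in scikit-learn < 1.2.
model = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```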
Abstract: Using a neural network, we try to model an unknown function f for given input-output data pairs. The connection strength of each neuron is updated through learning. Repeated simulations of a crisp neural network produce different values of the weight factors, which are directly affected by changes in different parameters. We propose the idea that, for each neuron in the network, quasi-fuzzy weight sets (QFWS) can be obtained by repeated simulation of the crisp neural network. Such fuzzy weight functions may be applied where multivariate crisp input needs to be adjusted after iterative learning, as in claim amount distribution analysis. Since real data are subject to noise and uncertainty, QFWS may help to simplify such complex problems. Secondly, these QFWS provide a good initial solution for training fuzzy neural networks with reduced computational complexity.
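A minimal sketch of building quasi-fuzzy weight sets, assuming they are formed by training the same crisp network several times and summarising each weight by the (min, mean, max) observed across runs, read as a triangular fuzzy number. The network architecture and data are illustrative placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([0.5, -1.2, 2.0]) + 0.1 * rng.standard_normal(200)

runs = []
for seed in range(20):                       # repeated crisp simulations
    net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000,
                       random_state=seed).fit(X, y)
    runs.append(np.concatenate([w.ravel() for w in net.coefs_]))

W = np.vstack(runs)                          # runs x weights
qfws = np.stack([W.min(axis=0), W.mean(axis=0), W.max(axis=0)], axis=1)
# Each row (a, m, b) can be read as a triangular quasi-fuzzy weight set.
print(qfws[:3])
```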
Abstract: Traditional multivariate control charts assume that measurements from manufacturing processes follow a multivariate normal distribution. However, this assumption may not hold, or may be difficult to verify, because in practice not all measurements from manufacturing processes are normally distributed. This study develops a new multivariate control chart for monitoring processes with non-normal data. We propose a mechanism that integrates a one-class classification method with an adaptive technique. The adaptive technique is used to improve the sensitivity of one-class classification to small shifts in statistical process control. In addition, the design provides an easy way to allocate the type I error, which makes it easier to implement. Finally, a simulation study and real data from industry are used to demonstrate the effectiveness of the proposed control charts.
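A minimal sketch of a one-class-classification control chart, assuming a one-class SVM trained on in-control data; new observations whose decision score falls below a control limit are flagged. Setting the limit from an empirical quantile of the in-control scores to target a chosen type I error is an illustrative simplification of the adaptive design, not the paper's exact mechanism.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
in_control = rng.lognormal(mean=0.0, sigma=0.3, size=(500, 3))  # non-normal data

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(in_control)

alpha = 0.01                                  # desired type I error
scores = clf.decision_function(in_control)
limit = np.quantile(scores, alpha)            # empirical control limit

new_obs = rng.lognormal(mean=0.4, sigma=0.3, size=(10, 3))      # shifted process
out_of_control = clf.decision_function(new_obs) < limit
print(out_of_control)
```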
Abstract: One of the main processes of supply chain management is supplier selection, whose accurate implementation can dramatically increase company competitiveness. In the presented article, a model is developed based on the features of second-tier suppliers, and four scenarios are set out in order to help the decision maker (DM) make up his or her mind. In addition, two tiers of suppliers have been considered as a chain of suppliers. The proposed approach is then solved by a method combining concepts of fuzzy set theory (FST) and linear programming (LP), fed with real data extracted from an engineering design and parts supplying company. In the end, the results reveal the high importance of considering second-tier supplier features as criteria for selecting the best supplier.
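A minimal sketch of combining fuzzy set theory with linear programming for supplier selection, assuming triangular fuzzy criterion scores that are defuzzified (centroid) into per-unit value coefficients and an LP that allocates demand across suppliers. The supplier scores, capacities, and demand are illustrative placeholders, not the paper's data or model.

```python
import numpy as np
from scipy.optimize import linprog

# Triangular fuzzy scores (low, mode, high) for three candidate suppliers.
fuzzy_scores = np.array([[0.5, 0.7, 0.9],
                         [0.4, 0.6, 0.7],
                         [0.6, 0.8, 1.0]])
value = fuzzy_scores.mean(axis=1)             # centroid defuzzification

demand = 100.0
capacity = np.array([60.0, 70.0, 50.0])

# Maximise total value => minimise its negative.  Equality: meet demand.
res = linprog(c=-value,
              A_eq=np.ones((1, 3)), b_eq=[demand],
              bounds=[(0, cap) for cap in capacity])
print("order quantities per supplier:", res.x)
```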
Abstract: This paper presents a computational methodology based on matrix operations for a computer-based solution to the problem of performance analysis of software reliability models (SRMs). A set of seven comparison criteria has been formulated to rank the various non-homogeneous Poisson process software reliability models proposed during the past 30 years to estimate software reliability measures such as the number of remaining faults, the software failure rate, and software reliability. Selecting the optimal SRM for a particular case has long been an area of interest for researchers in the field of software reliability. Tools and techniques for software reliability model selection found in the literature cannot be used with a high level of confidence because they rely on a limited number of model selection criteria. A real data set from a middle-sized software project, taken from published papers, is used to demonstrate the matrix method. The result of this study is a ranking of SRMs based on the permanent value of the criteria matrix formed for each model from the comparison criteria. The software reliability model with the highest value of the permanent is ranked number 1, and so on.
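A minimal sketch of ranking models by the permanent of a criteria matrix, assuming each model has a small square matrix built from its scores on the comparison criteria; the permanent is computed with Ryser's inclusion-exclusion formula. The model names and criteria values below are illustrative placeholders, not the paper's seven-criterion matrices.

```python
from itertools import combinations
import numpy as np

def permanent(A):
    """Permanent of a square matrix via Ryser's inclusion-exclusion formula."""
    n = A.shape[0]
    total = 0.0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            total += (-1) ** (n - r) * np.prod(A[:, list(cols)].sum(axis=1))
    return total

# Hypothetical 3x3 criteria matrices for three well-known NHPP SRMs.
models = {
    "Goel-Okumoto":     np.array([[0.9, 0.7, 0.8], [0.6, 0.9, 0.7], [0.8, 0.6, 0.9]]),
    "Musa-Okumoto":     np.array([[0.8, 0.8, 0.7], [0.7, 0.8, 0.6], [0.7, 0.7, 0.8]]),
    "Delayed S-shaped": np.array([[0.7, 0.9, 0.6], [0.8, 0.7, 0.9], [0.6, 0.8, 0.7]]),
}
ranking = sorted(models, key=lambda m: permanent(models[m]), reverse=True)
print("ranking (best first):", ranking)
```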
Abstract: As email communications have no consistent authentication procedure to ensure authenticity, we present an investigation analysis approach for detecting forged emails based on Random Forests and Naïve Bayes classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for each possible suspect. Our approach consists of four main steps: (1) the cybercrime investigator extracts different effective features, including structural, lexical, linguistic, and syntactic evidence, from previous emails of all possible suspects; (2) the extracted feature vectors are normalized to increase the accuracy rate; (3) the normalized features are then used to train the learning engine; (4) upon receiving the anonymous email M, we apply the feature extraction process to produce its feature vector. Finally, using the machine learning classifiers, the email is assigned to the suspect whose writing style most closely matches M. Experimental results on real data sets show the improved performance of the proposed method and its ability to identify the authors with a very limited number of features.
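A minimal sketch of the authorship-attribution step, assuming simple character n-gram (TF-IDF) features stand in for the structural, lexical, linguistic, and syntactic features described above, with the normalization and training steps made explicit. The emails and suspect labels are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["Please find the report attached, regards.",
          "hey!! send me the file asap, thx",
          "Kindly review the attached document at your convenience.",
          "yo, did u get my last msg??"]
suspects = ["A", "B", "A", "B"]

for clf in (RandomForestClassifier(random_state=0), MultinomialNB()):
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),  # step 1: features
        Normalizer(),                                          # step 2: normalization
        clf)                                                   # step 3: train engine
    model.fit(emails, suspects)
    anonymous_email = ["pls send the file asap!!"]             # step 4: classify M
    print(type(clf).__name__, "->", model.predict(anonymous_email))
```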
Abstract: This paper deals with the localization of wideband sources. We develop a new approach for estimating the wideband source parameters. The method is based on the higher-order statistics of the recorded data in order to eliminate the Gaussian components from the signals received on the various hydrophones; indeed, the sea-bottom noise is regarded as Gaussian. Thanks to the coherent signal subspace algorithm based on the cumulant matrix of the received data instead of the cross-spectral matrix, the wideband correlated sources are correctly located even in a very noisy environment. We demonstrate the performance of the proposed algorithm on real data recorded during an underwater acoustics experiment.
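A minimal sketch of the Gaussian-noise suppression idea via fourth-order cumulants, under a narrowband simplification: one slice of the fourth-order cumulant matrix of the array snapshots replaces the covariance (cross-spectral) matrix before a MUSIC-style subspace scan. The array geometry, source bearing, signal model, and reference-sensor cumulant slice are all illustrative assumptions, not the paper's wideband coherent-subspace algorithm or its sea-trial data.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 5000                       # hydrophones, snapshots
d, wavelength = 0.5, 1.0             # half-wavelength spacing
theta_true = 20.0                    # source bearing in degrees

def steering(theta_deg):
    k = 2 * np.pi / wavelength
    return np.exp(1j * k * d * np.arange(M) * np.sin(np.radians(theta_deg)))

# Non-Gaussian (QPSK-like) source in circular Gaussian noise.
s = (rng.choice([1, -1], N) + 1j * rng.choice([1, -1], N)) / np.sqrt(2)
noise = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x = np.outer(steering(theta_true), s) + noise

# One slice of the fourth-order cumulant matrix with sensor 0 as reference:
# C[i, j] = cum(x_i, x_0*, x_0, x_j*).  Gaussian noise has zero 4th-order cumulant.
x0  = x[0]
R   = x @ np.conj(x).T / N                      # E[x_i x_j*]
m0  = np.mean(np.abs(x0) ** 2)                  # E[|x_0|^2]
r0  = x @ np.conj(x0) / N                       # E[x_i x_0*]
r0T = x0 @ np.conj(x).T / N                     # E[x_0 x_j*]
p0  = x @ x0 / N                                # E[x_i x_0]
p0T = np.conj(x0) @ np.conj(x).T / N            # E[x_0* x_j*]
C = (x * np.abs(x0) ** 2) @ np.conj(x).T / N \
    - np.outer(r0, r0T) - np.outer(p0, p0T) - R * m0

# MUSIC-style scan on the cumulant slice.
eigval, eigvec = np.linalg.eigh((C + np.conj(C).T) / 2)    # Hermitian part
order = np.argsort(np.abs(eigval))
En = eigvec[:, order[:-1]]                                 # noise subspace (one source)
angles = np.arange(-90.0, 90.0, 0.5)
spectrum = [1.0 / np.linalg.norm(np.conj(En).T @ steering(a)) ** 2 for a in angles]
print("estimated bearing (deg):", angles[int(np.argmax(spectrum))])
```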
Abstract: Detection of incipient abnormal events is important for improving the safety and reliability of machine operations and for reducing losses caused by failures. Improper set-up or alignment of parts often leads to severe problems in many machines. Constructing prediction models for faulty conditions is therefore essential for deciding when to perform machine maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of machine measurement data. The calibration model is used to predict two faulty conditions from historical reference data. The approach utilizes genetic algorithm (GA) based variable selection, and we evaluate the predictive performance of several prediction methods using real data. The results show that the calibration model based on supervised probabilistic principal component analysis (SPPCA) yielded the best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from the model building step.
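A minimal sketch of GA-based variable selection, assuming binary masks over the candidate variables are evolved to maximise cross-validated prediction accuracy of a simple classifier (standing in for the SPPCA calibration model, which scikit-learn does not provide). The data and GA settings are illustrative placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)   # population of masks
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]                    # keep the best half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])                      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05                   # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected variables:", np.flatnonzero(best))
```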
Abstract: In this paper, we propose mixtures of two different distributions, namely Exponential-Gamma, Exponential-Weibull, and Gamma-Weibull, to model heterogeneous survival data. Various properties of the proposed mixtures of two different distributions are discussed. Maximum likelihood estimates of the parameters are obtained using the EM algorithm. An illustrative example based on real data is also given.
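A minimal sketch of EM for one of the proposed mixtures (Exponential-Gamma), assuming complete (uncensored) survival times. The simulated data and initial values are illustrative; the gamma shape is updated by direct numerical maximisation of the weighted log-likelihood in the M-step rather than by the paper's exact update.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
t = np.concatenate([rng.exponential(2.0, 300), rng.gamma(3.0, 1.5, 200)])

pi, lam, k, theta = 0.5, 1.0, 2.0, 1.0          # initial mixing prop. and parameters
for _ in range(200):
    # E-step: responsibilities of the exponential component.
    f1 = stats.expon.pdf(t, scale=1 / lam)
    f2 = stats.gamma.pdf(t, a=k, scale=theta)
    r = pi * f1 / (pi * f1 + (1 - pi) * f2)

    # M-step: closed forms for the mixing proportion and the exponential rate ...
    pi = r.mean()
    lam = r.sum() / (r * t).sum()
    # ... and a numerical update of the gamma shape, then its scale.
    w = 1 - r
    k = minimize_scalar(
        lambda a: -(w * stats.gamma.logpdf(
            t, a=a, scale=(w * t).sum() / (w.sum() * a))).sum(),
        bounds=(0.1, 20), method="bounded").x
    theta = (w * t).sum() / (w.sum() * k)

print(f"pi={pi:.3f}, lambda={lam:.3f}, shape={k:.3f}, scale={theta:.3f}")
```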
Abstract: This paper presents the determination of the quality cost parameters that provide the optimum return. System dynamics simulation was applied, and the simulation model was constructed from real data for a case of an electronic devices manufacturer in Thailand. The Steepest Descent algorithm was employed for the optimisation. The experimental results show that the company should spend 850 and 10 Baht/day on prevention and appraisal activities, respectively. This provides the minimum cumulative total quality cost, which is 258,000 Baht over twelve months. The effect of the step size used when improving the variables towards the optimum was also investigated. It can be stated that a smaller step size provided a better result at the cost of more experimental runs. However, the difference in yield in this case is not significant in practice. Therefore, the greater step size is recommended, because the region of the optimum can be reached more easily and rapidly.
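A minimal sketch of steepest descent on a total-quality-cost surface, assuming the simulation can be queried as a black box (the quadratic cost function below is an illustrative stand-in for the system dynamics model) and the gradient is estimated by finite differences; two step sizes are compared as in the study.

```python
import numpy as np

def total_quality_cost(x):
    """Placeholder for the system dynamics simulation: cost of spending
    x = [prevention, appraisal] Baht/day (minimum placed at 850 and 10)."""
    prevention, appraisal = x
    return 1e5 + 0.4 * (prevention - 850) ** 2 + 0.6 * (appraisal - 10) ** 2

def steepest_descent(f, x0, step_size, n_iter=2000, h=1e-3):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])       # central differences
        x = x - step_size * grad                         # move downhill
    return x

for step in (0.05, 0.5):                                 # effect of step size
    x_opt = steepest_descent(total_quality_cost, [500.0, 50.0], step)
    print(f"step={step}: prevention={x_opt[0]:.0f}, appraisal={x_opt[1]:.0f} Baht/day")
```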
Abstract: This paper describes an automated event detection and location system for water distribution pipelines, based upon low-cost sensor technology and signature analysis by an Artificial Neural Network (ANN). A low-cost failure sensor which measures the opacity, or cloudiness, of the local water flow has been designed, developed, and validated, and an ANN-based system is then described which uses the time series data produced by the sensors to construct an empirical model for time series prediction and classification of events. These two components have been installed, tested, and verified at an experimental site in a UK water distribution system. Verification of the system has been achieved through a series of simulated burst trials, which provided real data sets. It is concluded that the system has potential in water distribution network management.
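A minimal sketch of the ANN signature-analysis step, assuming sliding windows of opacity readings are classified as "burst event" or "normal" by a small multilayer perceptron. The synthetic opacity traces and burst signature are illustrative placeholders for the sensor data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
window = 20

def make_window(event):
    base = 0.1 + 0.02 * rng.standard_normal(window)                 # normal opacity trace
    if event:
        base[10:] += 0.3 * np.exp(-0.2 * np.arange(window - 10))    # burst signature
    return base

events = rng.random(400) < 0.5
X = np.array([make_window(e) for e in events])
y = events.astype(int)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X[:300], y[:300])
print("hold-out accuracy:", clf.score(X[300:], y[300:]))
```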
Abstract: The objective of the present research manuscript is to perform parametric, nonparametric, and decision tree analysis to evaluate two treatments that are being used for breast cancer patients. Our study utilizes real data initially used in "Tamoxifen with or without breast irradiation in women 50 years of age or older with early breast cancer" [1]; the data were supplied to us by N.A. Ibrahim, "Decision tree for competing risks survival probability in breast cancer study" [2]. Certain aspects of our findings agree with the published results. However, in this manuscript we focus on the relapse time of breast cancer patients instead of the survival time, and parametric analysis is applied instead of semi-parametric decision tree analysis, to provide more precise recommendations on the effectiveness of the two treatments with respect to recurrence of breast cancer.
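A minimal sketch of comparing two treatments with a parametric (Weibull) and a nonparametric (Kaplan-Meier) relapse-time analysis using the lifelines library. The simulated relapse times and censoring flags are illustrative placeholders, not the breast cancer data of [1] and [2].

```python
import numpy as np
from lifelines import WeibullFitter, KaplanMeierFitter

rng = np.random.default_rng(0)
groups = {
    "tamoxifen":               (rng.weibull(1.5, 150) * 8.0,  rng.random(150) < 0.7),
    "tamoxifen + irradiation": (rng.weibull(1.5, 150) * 11.0, rng.random(150) < 0.7),
}

for name, (relapse_time, observed) in groups.items():
    wf = WeibullFitter().fit(relapse_time, event_observed=observed)
    km = KaplanMeierFitter().fit(relapse_time, event_observed=observed)
    print(f"{name}: Weibull median relapse {wf.median_survival_time_:.1f} years, "
          f"Kaplan-Meier median {km.median_survival_time_:.1f} years")
```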
Abstract: Recurrent event data are a special type of multivariate survival data. Dynamic models and frailty models are two of the approaches that deal with this kind of data. A comparison between these two models is studied using the empirical standard deviation of the standardized martingale residual processes as a way of assessing the fit of the two models, based on the Aalen additive regression model. We found that both approaches take heterogeneity into account and produce residual standard deviations close to each other, both in the simulation study and in the real data set.
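A minimal sketch of fitting the Aalen additive regression model that underlies the residual-based comparison described above, using the lifelines library. The covariates, frailty proxy, and event times are illustrative placeholders; the dynamic/frailty comparison via standardized martingale residuals is not reproduced here.

```python
import numpy as np
import pandas as pd
from lifelines import AalenAdditiveFitter

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age":   rng.normal(60, 10, n),
    "frail": rng.gamma(2.0, 0.5, n),                   # proxy for unobserved heterogeneity
})
df["T"] = rng.exponential(10 / (1 + 0.02 * df["age"] * df["frail"]))
df["E"] = (rng.random(n) < 0.8).astype(int)            # observed event indicator

aaf = AalenAdditiveFitter(coef_penalizer=0.5)
aaf.fit(df, duration_col="T", event_col="E")
print(aaf.cumulative_hazards_.tail())                  # cumulative regression functions
```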
Abstract: The belief K-modes method (BKM) is a new clustering technique that handles uncertainty in the attribute values of objects in both the cluster construction task and the classification task. Like the standard version of the method, the BKM results depend on the chosen initial modes. Therefore, a selection method for the initial modes is developed in this paper, aiming at improving the performance of the BKM approach. Experiments with several real data sets show that, with the developed initial mode selection method, the clustering algorithm produces more accurate results.
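A minimal sketch of a frequency-based initial mode selection for (belief) K-modes on categorical data: candidate objects are scored by how frequent their attribute values are, and modes are picked greedily so that they also differ from the already chosen modes. This is an illustrative heuristic, not the exact selection method developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(100, 5)).astype(str)     # categorical data set
k = 3

# Density score: sum over attributes of the relative frequency of each value.
density = np.zeros(len(X))
for j in range(X.shape[1]):
    values, counts = np.unique(X[:, j], return_counts=True)
    freq = dict(zip(values, counts / len(X)))
    density += np.array([freq[v] for v in X[:, j]])

modes = [int(np.argmax(density))]                      # densest object first
while len(modes) < k:
    # Hamming distance of every object to its nearest already chosen mode.
    dist = np.min([(X != X[m]).sum(axis=1) for m in modes], axis=0)
    modes.append(int(np.argmax(density * dist)))       # dense but far from chosen modes

print("initial modes:", X[modes])
```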
Abstract: Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, and web navigation prediction. However, raw log files are not directly usable; they have to be preprocessed in order to transform them into a suitable format for different data mining tasks. One of the key issues in the preprocessing phase is to identify web users. Identifying users based on web log files is not a straightforward problem, so various methods have been developed. Several difficulties have to be overcome, such as client-side caching and changing or shared IP addresses. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third one is our novel approach, which uses a complex cookie-based method to identify web users. Furthermore, we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method, we developed an implementation called the Web Activity Tracking (WAT) system, which aims at a more precise distinction of web users based on log data. We present some statistical analyses created by WAT on real data about the behavior of Hungarian web users, and a comprehensive analysis and comparison of the three methods.
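A minimal sketch contrasting two common log-based user identification heuristics (by IP address, and by IP plus user agent) with a cookie-based one. The log records and field names are illustrative placeholders, not the WAT system's log format.

```python
from collections import defaultdict

log = [
    {"ip": "10.0.0.1", "agent": "Firefox", "cookie": "u1", "url": "/a"},
    {"ip": "10.0.0.1", "agent": "Chrome",  "cookie": "u2", "url": "/b"},  # shared IP
    {"ip": "10.0.0.2", "agent": "Firefox", "cookie": "u1", "url": "/c"},  # changed IP
]

def identify(records, key):
    """Group page requests into per-user click streams by the given key."""
    users = defaultdict(list)
    for r in records:
        users[key(r)].append(r["url"])
    return dict(users)

print("by IP:        ", identify(log, lambda r: r["ip"]))
print("by IP + agent:", identify(log, lambda r: (r["ip"], r["agent"])))
print("by cookie:    ", identify(log, lambda r: r["cookie"]))
```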