Traffic Flow Prediction using Adaboost Algorithm with Random Forests as a Weak Learner

Traffic Management and Information Systems, which rely on a system of sensors, aim to describe real-time traffic in urban areas by estimating a set of parameters. Although the state of the art focuses on data analysis, little has been done on prediction. In this paper, we describe a machine learning system for traffic flow management and control, applied to the problem of traffic flow prediction. The new algorithm is obtained by using the Random Forests algorithm as the weak learner within the AdaBoost algorithm. We show that our algorithm performs relatively well on real data and, according to the Traffic Flow Evaluation model, makes it possible to estimate and predict whether there is congestion at road intersections at a given time.
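
The abstract does not give the boosting details. As a hedged sketch, assuming the standard discrete AdaBoost update with labels +1 (congestion) and -1 (no congestion), one boosting round reweights the training samples from the weak learner's predictions. Here `y_pred` stands for the output of a Random Forest trained on the weighted sample, which is not implemented. The function names are illustrative and not taken from the paper.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    weights: current sample weights (summing to 1).
    y_pred: predictions of the weak learner trained on the weighted sample
    (in the paper, a Random Forest; not shown here).
    Returns the learner's vote weight alpha and the renormalized sample weights.
    """
    # Weighted training error of the weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp so that a perfect or useless learner does not cause log(0) or division by 0.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p = -1) gain weight; correct ones (t * p = +1) lose it.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


def boosted_predict(alphas, predictions):
    """Sign of the alpha-weighted vote; predictions[m][i] is learner m's label for sample i."""
    n = len(predictions[0])
    scores = [sum(a * preds[i] for a, preds in zip(alphas, predictions)) for i in range(n)]
    return [1 if s >= 0 else -1 for s in scores]
```

For example, with four equally weighted samples and one misclassification, the weighted error is 0.25 and alpha = 0.5 ln 3. The misclassified sample's weight rises to 0.5, while each of the others falls to 1/6.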

Approximate Bounded Knowledge Extraction Using Type-I Fuzzy Logic

Using a neural network, we try to model the unknown function f from given input-output data pairs. The connection strength of each neuron is updated through learning. Repeated simulations of a crisp neural network produce different weight values, which are directly affected by changes in various parameters. We propose that, for each neuron in the network, quasi-fuzzy weight sets (QFWS) can be obtained from repeated simulations of the crisp neural network. Such fuzzy weight functions may be applied where multivariate crisp input needs to be adjusted after iterative learning, as in claim amount distribution analysis. Because real data are subject to noise and uncertainty, QFWS may help simplify such complex problems. Second, QFWS provide a good initial solution for training fuzzy neural networks with reduced computational complexity.
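
The abstract does not say how the weights from repeated runs are turned into a fuzzy set. One simple reading, assumed here, is a triangular fuzzy number whose support runs from the smallest to the largest observed weight and whose peak sits at the median. The one-weight linear neuron in `train_single_weight` is a hypothetical stand-in for the crisp network. Each run differs only in its random initial weight and sample order, which is enough to produce a spread of trained weights.

```python
import random
import statistics


def train_single_weight(xs, ys, seed, epochs=5, lr=0.05):
    """Hypothetical crisp 'network': one linear neuron y = w * x trained by SGD."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # random initial weight differs per run
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # sample order also differs per run
        for x, y in data:
            w -= lr * (w * x - y) * x  # gradient step on 0.5 * (w*x - y)^2
    return w


def quasi_fuzzy_weight(samples):
    """Assumed triangular fuzzy number (low, peak, high) from repeated-run weights."""
    return min(samples), statistics.median(samples), max(samples)


def membership(w, tri):
    """Triangular membership degree of weight value w."""
    low, peak, high = tri
    if w < low or w > high:
        return 0.0
    if w == peak:
        return 1.0
    if w < peak:
        return (w - low) / (peak - low)
    return (high - w) / (high - peak)
```

Training, say, ten runs with different seeds and passing the resulting weights to `quasi_fuzzy_weight` gives one such set per connection. `membership` then returns 1 at the median weight and falls linearly to 0 at the extremes.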

An AK-Chart for the Non-Normal Data

Traditional multivariate control charts assume that measurements from manufacturing processes follow a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because, in practice, not all measurements from manufacturing processes are normally distributed. This study develops a new multivariate control chart for monitoring processes with non-normal data. We propose a mechanism that integrates the one-class classification method with an adaptive technique. The adaptive technique is used to improve the sensitivity of one-class classification to small shifts in statistical process control. In addition, the design provides an easy way to set the type I error rate, which makes the chart easier to implement. Finally, a simulation study and real data from industry are used to demonstrate the effectiveness of the proposed control chart.
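
The abstract does not name the one-class classifier or spell out the adaptive rule. As a hedged illustration of how a one-class approach lets the type I error be set directly, the sketch below assumes a k-nearest-neighbour distance score as a stand-in for the paper's classifier. It places the control limit at the empirical (1 - alpha) quantile of leave-one-out scores on in-control (Phase I) data, so a chosen alpha approximately fixes the in-control false-alarm rate without any normality assumption. The adaptive technique is not shown.

```python
import math


def knn_score(point, reference, k=3):
    """One-class score: mean Euclidean distance from `point` to its k nearest reference points."""
    dists = sorted(math.dist(point, r) for r in reference)
    return sum(dists[:k]) / k


def control_limit(phase1, alpha=0.05, k=3):
    """Empirical (1 - alpha) quantile of leave-one-out scores of in-control (Phase I) data."""
    scores = sorted(knn_score(p, phase1[:i] + phase1[i + 1:], k) for i, p in enumerate(phase1))
    idx = min(len(scores) - 1, max(0, math.ceil((1 - alpha) * len(scores)) - 1))
    return scores[idx]


def monitor(new_points, phase1, alpha=0.05, k=3):
    """Signal (True) for each Phase II observation whose score exceeds the control limit."""
    limit = control_limit(phase1, alpha, k)
    return [knn_score(p, phase1, k) > limit for p in new_points]
```

For instance, with a 5 x 5 grid of in-control points, a new point at the grid centre does not signal, while a point far outside the grid does.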

A Scenario Oriented Supplier Selection by Considering a Multi Tier Supplier Network

Supplier selection is one of the main processes of supply chain management, and its accurate implementation can dramatically increase a company's competitiveness. In this article, a model is developed based on the features of second-tier suppliers, and four scenarios are predicted to help the decision maker (DM) reach a decision. In addition, two tiers of suppliers are considered as a chain of suppliers. The proposed model is then solved by a method that combines concepts from fuzzy set theory (FST) and linear programming (LP), using real data from an engineering design and parts supply company. The results reveal the importance of considering the features of second-tier suppliers as criteria for selecting the best supplier.

Performance Analysis of Software Reliability Models using Matrix Method

This paper presents a computational methodology based on matrix operations for a computer-based solution to the problem of performance analysis of software reliability models (SRMs). A set of seven comparison criteria has been formulated to rank various non-homogeneous Poisson process software reliability models proposed during the past 30 years to estimate software reliability measures such as the number of remaining faults, software failure rate, and software reliability. Selecting the optimal SRM for a particular case has been an area of interest for researchers in software reliability. Tools and techniques for software reliability model selection found in the literature cannot be used with a high level of confidence, as they use only a limited number of model selection criteria. A real data set from a medium-size software project, taken from published papers, is used to demonstrate the matrix method. The result of this study is a ranking of SRMs based on the permanent of the criteria matrix formed for each model from the comparison criteria. The software reliability model with the highest permanent is ranked first, and so on.
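
The abstract does not describe how each model's criteria matrix is built, so the matrices passed in are hypothetical inputs. As a minimal sketch of the ranking step itself, the permanent (the determinant's expansion with every sign taken as +) can be computed by brute force over permutations. This is cheap at this scale (7! = 5,040 permutations for a 7 x 7 matrix); Ryser's formula would be faster for larger matrices.

```python
from itertools import permutations
from math import prod


def permanent(matrix):
    """Permanent of a square matrix: sum over permutations of products, all with + sign."""
    n = len(matrix)
    return sum(prod(matrix[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))


def rank_models(criteria_matrices):
    """Rank models by decreasing permanent of their (hypothetical) criteria matrices.

    criteria_matrices: {model_name: square matrix}. Returns [(rank, name, permanent)].
    """
    perms = {name: permanent(m) for name, m in criteria_matrices.items()}
    order = sorted(perms, key=perms.get, reverse=True)
    return [(rank, name, perms[name]) for rank, name in enumerate(order, start=1)]
```

For example, the permanent of [[1, 2], [3, 4]] is 1*4 + 2*3 = 10, whereas its determinant is -2.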

Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers

Because email communication has no consistent authentication procedure to ensure authenticity, we present an investigative analysis approach for detecting forged emails based on Random Forests and Naïve Bayes classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for each possible suspect. Our approach consists of four main steps: (1) the cybercrime investigator extracts effective structural, lexical, linguistic, and syntactic features from previous emails of all possible suspects; (2) the extracted feature vectors are normalized to increase the accuracy rate; (3) the normalized features are used to train the learning engine; and (4) upon receiving an anonymous email M, the feature extraction process is applied to produce its feature vector. Finally, the machine learning classifiers assign the email to the suspect whose writing style most closely matches M. Experimental results on real data sets show the improved performance of the proposed method and its ability to identify authors with a very limited number of features.
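
The abstract names the steps but not the exact normalization or classifier settings. A minimal sketch of steps (2) to (4) under assumed choices: min-max scaling to [0, 1] and a Gaussian Naïve Bayes classifier written from scratch. The Random Forests classifier is not shown, and any feature values fed to these functions are hypothetical.

```python
import math
from collections import defaultdict


def min_max_fit(rows):
    """Per-feature (min, max) computed from the suspects' training feature vectors."""
    return [(min(c), max(c)) for c in zip(*rows)]


def min_max_apply(row, bounds):
    # Scale each feature to [0, 1] with the training bounds; constant features map to 0.
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]


def train_gaussian_nb(rows, labels, var_floor=1e-6):
    """Per-suspect log prior, feature means, and variances (floored to avoid zero variance)."""
    groups = defaultdict(list)
    for r, y in zip(rows, labels):
        groups[y].append(r)
    model = {}
    for y, rs in groups.items():
        cols = list(zip(*rs))
        means = [sum(c) / len(c) for c in cols]
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
                     for c, m in zip(cols, means)]
        model[y] = (math.log(len(rs) / len(rows)), means, variances)
    return model


def predict(model, row):
    """Suspect with the highest log posterior, assuming independent Gaussian features."""
    def log_post(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))
    return max(model, key=lambda y: log_post(model[y]))
```

Fitting the bounds on the suspects' emails only, and reusing them for M, keeps the anonymous email on the same scale as the training data.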

Identification of Wideband Sources Using Higher Order Statistics in Noisy Environment

This paper deals with the localization of wideband sources. We develop a new approach for estimating wideband source parameters. The method is based on the higher-order statistics of the recorded data, in order to eliminate the Gaussian components from the signals received on the various hydrophones, since the sea-bottom noise is regarded as Gaussian. By applying the coherent signal subspace algorithm to the cumulant matrix of the received data instead of the cross-spectral matrix, the correlated wideband sources are perfectly located in a very noisy environment. We demonstrate the performance of the proposed algorithm on real data recorded during underwater acoustics experiments.
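
The property this method relies on is that cumulants of order higher than two vanish for Gaussian processes. Additive Gaussian noise therefore drops out of a fourth-order cumulant while it still inflates second-order statistics. The sketch below only checks this on a hypothetical real-valued ±1 test signal in white Gaussian noise. It implements the sample fourth-order cumulant formula for zero-mean data, not the coherent signal subspace algorithm or the frequency-domain cumulant matrix used in the paper.

```python
import random


def cum4(x1, x2, x3, x4):
    """Sample fourth-order cross-cumulant of four zero-mean, real-valued sequences."""
    n = len(x1)

    def moment(*seqs):
        # Sample mean of the elementwise product of the given sequences.
        total = 0.0
        for vals in zip(*seqs):
            p = 1.0
            for v in vals:
                p *= v
            total += p
        return total / n

    return (moment(x1, x2, x3, x4)
            - moment(x1, x2) * moment(x3, x4)
            - moment(x1, x3) * moment(x2, x4)
            - moment(x1, x4) * moment(x2, x3))


def noise_demo(n=100_000, noise_std=1.0, seed=0):
    """Variance and fourth-order cumulant of a hypothetical ±1 signal, clean and in Gaussian noise."""
    rng = random.Random(seed)
    clean = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    noisy = [s + rng.gauss(0.0, noise_std) for s in clean]

    def variance(x):
        return sum(v * v for v in x) / n

    return (variance(clean), variance(noisy),
            cum4(clean, clean, clean, clean), cum4(noisy, noisy, noisy, noisy))
```

For the ±1 signal the fourth-order cumulant is exactly -2. Adding unit-variance noise roughly doubles the variance, but the noisy cumulant stays close to -2 because the Gaussian noise contributes nothing to it in expectation.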

Automated Process Quality Monitoring with Prediction of Fault Condition Using Measurement Data

Detection of incipient abnormal events is important for improving the safety and reliability of machine operations and for reducing losses caused by failures. Improper set-up or alignment of parts often leads to severe problems in many machines. Building models that predict faulty conditions is essential for deciding when to perform machine maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of machine measurement data. The calibration model is used to predict two faulty conditions from historical reference data. The approach uses genetic algorithm (GA)-based variable selection, and we evaluate the predictive performance of several prediction methods on real data. The results show that the calibration model based on supervised probabilistic principal component analysis (SPPCA) yielded the best performance in this work. By adopting a proper variable selection scheme in calibration models, prediction performance can be improved by excluding non-informative variables from the model-building step.
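
The abstract does not give the GA settings or its fitness function. A minimal sketch of GA-based variable selection follows. It assumes binary masks, tournament selection, one-point crossover, bit-flip mutation, and elitism. In the paper's setting the fitness would be a prediction score of the calibration model built on the selected variables. The `toy_fitness` function here is a hypothetical stand-in.

```python
import random


def ga_select(n_vars, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Genetic-algorithm search over binary masks (1 = keep the variable).

    `fitness(mask)` should return a score to maximize, e.g. the negated
    cross-validated prediction error of a calibration model built on the
    selected variables. Requires n_vars >= 2 for one-point crossover.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of three random masks becomes a parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_vars)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            next_pop.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = next_pop
    return max(pop, key=fitness)


# Hypothetical toy fitness: variables 0, 2 and 5 are informative; every other
# selected variable costs 0.5, mimicking the penalty for non-informative inputs.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    return sum(1.0 if i in INFORMATIVE else -0.5 for i, g in enumerate(mask) if g)


best = ga_select(8, toy_fitness)
```

Elitism guarantees that the best mask found never gets worse from one generation to the next. The cost of that guarantee is some loss of population diversity.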

A Mixture Model of Two Different Distributions Approach to the Analysis of Heterogeneous Survival Data

In this paper we propose mixtures of two different distributions, such as Exponential-Gamma, Exponential-Weibull, and Gamma-Weibull, to model heterogeneous survival data. Various properties of the proposed mixtures are discussed. Maximum likelihood estimates of the parameters are obtained using the EM algorithm. Illustrative examples based on real data are also given.
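
As an illustrative sketch of the EM procedure for one of these mixtures, Exponential-Weibull, the code below makes two assumptions. It treats the survival times as uncensored, and it holds the Weibull shape parameter fixed at a known value so that the M-step stays in closed form. Estimating the shape as well would require a numerical M-step, and handling censored observations would change the likelihood. Neither simplification is taken from the paper.

```python
import math


def exp_pdf(t, rate):
    return rate * math.exp(-rate * t)


def weibull_pdf(t, shape, scale):
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def mixture_loglik(ts, p, rate, shape, scale):
    return sum(math.log(p * exp_pdf(t, rate) + (1 - p) * weibull_pdf(t, shape, scale))
               for t in ts)


def em_exp_weibull(ts, shape, p=0.5, rate=1.0, scale=1.0, iters=50):
    """EM for f(t) = p*Exp(rate) + (1 - p)*Weibull(shape, scale) on uncensored times.

    The Weibull shape is an assumed, fixed input so that every M-step is closed form.
    Returns the fitted (p, rate, scale) and the log-likelihood after each iteration.
    """
    history = []
    n = len(ts)
    for _ in range(iters):
        # E-step: posterior probability that each time belongs to the exponential part.
        resp = []
        for t in ts:
            a = p * exp_pdf(t, rate)
            b = (1 - p) * weibull_pdf(t, shape, scale)
            resp.append(a / (a + b))
        # M-step: weighted maximum likelihood estimates for each component.
        w_exp = sum(resp)
        w_wei = n - w_exp
        p = w_exp / n
        rate = w_exp / sum(r * t for r, t in zip(resp, ts))
        scale = (sum((1 - r) * t ** shape for r, t in zip(resp, ts)) / w_wei) ** (1 / shape)
        history.append(mixture_loglik(ts, p, rate, shape, scale))
    return p, rate, scale, history
```

Each M-step exactly maximizes the expected complete-data log-likelihood, so the values in `history` never decrease. This makes them a convenient sanity check when implementing EM.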

Determination of the Proper Quality Costs Parameters via Variable Step Size Steepest Descent Algorithm

This paper determines the quality cost parameters that provide the optimum return. System dynamics simulation was applied, and the simulation model was constructed from real data from an electronic devices manufacturer in Thailand. The Steepest Descent algorithm was employed for optimisation. The experimental results show that the company should spend 850 and 10 Baht/day on prevention and appraisal activities, respectively. This yields the minimum cumulative total quality cost of 258,000 Baht over twelve months. The effect of the step size while moving the variables toward the optimum was also investigated. A smaller step size gave a better result but required more experimental runs. However, the difference in yield is not significant in practice in this case. Therefore, a larger step size is recommended, because the region of the optimum can be reached more easily and rapidly.
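
The abstract does not give the simulation model or the step-size rule. The following is a minimal sketch of steepest descent with a variable step size. It assumes a central-difference gradient and a rule that halves the step whenever a move fails to lower the cost. The quadratic `toy_cost` is a hypothetical stand-in for the system dynamics simulation, not the manufacturer's cost model.

```python
def numerical_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def steepest_descent(f, x0, step=1.0, shrink=0.5, min_step=1e-8, max_evals=10_000):
    """Minimize f by steepest descent, shrinking the step whenever a move fails.

    Returns (x, f(x), evals), where evals counts evaluations of f at the start
    point and at each candidate point (the gradient probes are not counted).
    """
    x, fx, evals = list(x0), f(x0), 1
    while step > min_step and evals < max_evals:
        g = numerical_grad(f, x)
        cand = [xi - step * gi for xi, gi in zip(x, g)]
        fc = f(cand)
        evals += 1
        if fc < fx:
            x, fx = cand, fc  # improvement: accept the move and keep the step size
        else:
            step *= shrink  # overshoot or no gain: retry from x with a smaller step
    return x, fx, evals


# Hypothetical stand-in for the simulated total quality cost of two spending
# variables (e.g. prevention and appraisal); the real model is a system dynamics simulation.
def toy_cost(x):
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2
```

Comparing `evals` across different initial `step` values on a given cost function is one way to examine the trade-off between step size and number of runs that the abstract reports.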

Artificial Neural Network Model for a Low Cost Failure Sensor: Performance Assessment in Pipeline Distribution

This paper describes an automated event detection and location system for water distribution pipelines, based on low-cost sensor technology and signature analysis by an Artificial Neural Network (ANN). A low-cost failure sensor that measures the opacity, or cloudiness, of the local water flow has been designed, developed, and validated. An ANN-based system is then described that uses the time series data produced by the sensors to construct an empirical model for time series prediction and classification of events. These two components have been installed, tested, and verified at an experimental site in a UK water distribution system. The system was verified through a series of simulated burst trials, which provided real data sets. It is concluded that the system has potential for water distribution network management.

Parametric and Nonparametric Analysis of Breast Cancer Treatments

The objective of this manuscript is to perform parametric, nonparametric, and decision tree analyses to evaluate two treatments used for breast cancer patients. Our study uses real data originally analysed in “Tamoxifen with or without breast irradiation in women of 50 years of age or older with early breast cancer” [1]; the data were supplied to us by N.A. Ibrahim (“Decision tree for competing risks survival probability in breast cancer study” [2]). Some aspects of our findings agree with the published results. However, in this manuscript we focus on the relapse time of breast cancer patients rather than their survival time. We also apply parametric analysis instead of semi-parametric decision tree analysis, to provide more precise recommendations on the effectiveness of the two treatments with respect to breast cancer recurrence.
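
The abstract does not say which parametric family was fitted to the relapse times. As one hedged example of a parametric two-arm comparison, the sketch assumes an exponential (constant-hazard) model for right-censored relapse times. It tests whether the two treatments share a relapse rate with a likelihood-ratio statistic. The exponential choice is an assumption, not the authors' model.

```python
import math


def exp_rate_mle(times, events):
    """MLE of a constant relapse hazard from right-censored follow-up times.

    events[i] is 1 if relapse was observed at times[i] and 0 if censored.
    """
    d = sum(events)
    if d == 0:
        raise ValueError("no observed relapses: the rate MLE is 0 and log(0) is undefined")
    return d / sum(times)


def exp_loglik(times, events, rate):
    # A relapse at t contributes log(rate) - rate * t; a censored time contributes -rate * t.
    return sum(e * math.log(rate) - rate * t for t, e in zip(times, events))


def lr_test_equal_rates(arm_a, arm_b):
    """Likelihood-ratio statistic for H0: both treatment arms share one relapse rate.

    Each arm is a (times, events) pair. Under H0 the statistic is approximately
    chi-square with 1 degree of freedom (5% critical value about 3.84).
    """
    (ta, ea), (tb, eb) = arm_a, arm_b
    ll_separate = (exp_loglik(ta, ea, exp_rate_mle(ta, ea))
                   + exp_loglik(tb, eb, exp_rate_mle(tb, eb)))
    pooled_t, pooled_e = list(ta) + list(tb), list(ea) + list(eb)
    ll_pooled = exp_loglik(pooled_t, pooled_e, exp_rate_mle(pooled_t, pooled_e))
    return 2 * (ll_separate - ll_pooled)
```

Under this model the rate estimate is simply the number of observed relapses divided by the total follow-up time. Censored patients therefore still contribute their time at risk.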

Dynamic Models versus Frailty Models for Recurrent Event Data

Recurrent event data are a special type of multivariate survival data. Dynamic models and frailty models are two approaches for dealing with this kind of data. We compare the two models, both based on the Aalen additive regression model, using the empirical standard deviation of the standardized martingale residual processes to assess their fit. We found that both approaches take heterogeneity into account and produce similar residual standard deviations, both in the simulation study and on the real data set.

Selection Initial modes for Belief K-modes Method

The belief K-modes method (BKM) is a new clustering technique that handles uncertainty in the attribute values of objects in both the cluster construction and classification tasks. As with the standard version of this method, the BKM results depend on the chosen initial modes. This paper therefore develops a method for selecting the initial modes, with the aim of improving the performance of the BKM approach. Experiments on several real data sets show that, with the proposed initial mode selection method, the clustering algorithm produces more accurate results.

Analysis of Web User Identification Methods

Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, and web navigation prediction. However, raw log files are not directly usable; they have to be preprocessed to transform them into a format suitable for different data mining tasks. One of the key issues in the preprocessing phase is identifying web users. Identifying users from web log files is not straightforward, so various methods have been developed. Several difficulties have to be overcome, such as client-side caching and changing or shared IP addresses. This paper presents three different methods for identifying web users. Two of them are the methods most commonly used in web log mining systems, whereas the third is our novel approach, which uses a complex cookie-based method to identify web users. Furthermore, we take steps towards identifying the individuals behind impersonal web users. To demonstrate the efficiency of the new method, we developed an implementation called the Web Activity Tracking (WAT) system, which aims at a more precise distinction of web users based on log data. We present statistical analyses produced by WAT on real data about the behavior of Hungarian web users, together with a comprehensive analysis and comparison of the three methods.
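
As a hedged illustration of why shared and changing IP addresses matter, the sketch below groups a hypothetical log in three ways: by IP address alone, by IP address plus user agent, and by a persistent cookie ID. The abstract does not state which two common methods were compared. The paper's complex cookie-based method is not reproduced here, and the plain cookie grouping is a simplification.

```python
from collections import defaultdict


def identify_users(log, key):
    """Group log entries into 'users' by a key function; returns {user_key: [urls]}."""
    users = defaultdict(list)
    for entry in log:
        users[key(entry)].append(entry["url"])
    return dict(users)


def by_ip(entry):
    return entry["ip"]


def by_ip_and_agent(entry):
    return (entry["ip"], entry["agent"])


def by_cookie(entry):
    # Simplified stand-in: a persistent cookie ID set by the site (hypothetical field).
    return entry["cookie"]


# Hypothetical log: two browsers share one NAT address, and one browser changes IP.
LOG = [
    {"ip": "192.0.2.1", "agent": "Firefox", "cookie": "c1", "url": "/a"},
    {"ip": "192.0.2.1", "agent": "Chrome", "cookie": "c2", "url": "/b"},
    {"ip": "192.0.2.7", "agent": "Firefox", "cookie": "c1", "url": "/c"},
]
```

On this toy log, grouping by IP alone yields two "users" that are both wrong: one merges two browsers, and the other is a fragment of a browser whose address changed. IP plus user agent yields three users, splitting the Firefox browser. The cookie key recovers the two browsers.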