Data Analysis Techniques for Predictive Maintenance on Fleet of Heavy-Duty Vehicles

The present study proposes a methodology for the efficient daily management of fleet vehicles and construction machinery. The application covers the area of remote monitoring of heavy-duty vehicles operation parameters, where specific sensor data are stored and examined in order to provide information about the vehicle’s health. The vehicle diagnostics allow the user to inspect whether maintenance tasks need to be performed before a fault occurs. A properly designed machine learning model is proposed for the detection of two different types of faults through classification. Cross validation is used and the accuracy of the trained model is checked with the confusion matrix.

Genetic Algorithms for Feature Generation in the Context of Audio Classification

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

The Design of a Vehicle Traffic Flow Prediction Model for a Gauteng Freeway Based on an Ensemble of Multi-Layer Perceptron

The cities of Johannesburg and Pretoria both located in the Gauteng province are separated by a distance of 58 km. The traffic queues on the Ben Schoeman freeway which connects these two cities can stretch for almost 1.5 km. Vehicle traffic congestion impacts negatively on the business and the commuter’s quality of life. The goal of this paper is to identify variables that influence the flow of traffic and to design a vehicle traffic prediction model, which will predict the traffic flow pattern in advance. The model will unable motorist to be able to make appropriate travel decisions ahead of time. The data used was collected by Mikro’s Traffic Monitoring (MTM). Multi-Layer perceptron (MLP) was used individually to construct the model and the MLP was also combined with Bagging ensemble method to training the data. The cross—validation method was used for evaluating the models. The results obtained from the techniques were compared using predictive and prediction costs. The cost was computed using combination of the loss matrix and the confusion matrix. The predicted models designed shows that the status of the traffic flow on the freeway can be predicted using the following parameters travel time, average speed, traffic volume and day of month. The implications of this work is that commuters will be able to spend less time travelling on the route and spend time with their families. The logistics industry will save more than twice what they are currently spending.

Least-Squares Support Vector Machine for Characterization of Clusters of Microcalcifications

Clusters of Microcalcifications (MCCs) are most frequent symptoms of Ductal Carcinoma in Situ (DCIS) recognized by mammography. Least-Square Support Vector Machine (LS-SVM) is a variant of the standard SVM. In the paper, LS-SVM is proposed as a classifier for classifying MCCs as benign or malignant based on relevant extracted features from enhanced mammogram. To establish the credibility of LS-SVM classifier for classifying MCCs, a comparative evaluation of the relative performance of LS-SVM classifier for different kernel functions is made. For comparative evaluation, confusion matrix and ROC analysis are used. Experiments are performed on data extracted from mammogram images of DDSM database. A total of 380 suspicious areas are collected, which contain 235 malignant and 145 benign samples, from mammogram images of DDSM database. A set of 50 features is calculated for each suspicious area. After this, an optimal subset of 23 most suitable features is selected from 50 features by Particle Swarm Optimization (PSO). The results of proposed study are quite promising.

Isolation and Classification of Red Blood Cells in Anemic Microscopic Images

Red blood cells (RBCs) are among the most commonly and intensively studied type of blood cells in cell biology. Anemia is a lack of RBCs is characterized by its level compared to the normal hemoglobin level. In this study, a system based image processing methodology was developed to localize and extract RBCs from microscopic images. Also, the machine learning approach is adopted to classify the localized anemic RBCs images. Several textural and geometrical features are calculated for each extracted RBCs. The training set of features was analyzed using principal component analysis (PCA). With the proposed method, RBCs were isolated in 4.3secondsfrom an image containing 18 to 27 cells. The reasons behind using PCA are its low computation complexity and suitability to find the most discriminating features which can lead to accurate classification decisions. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and neural network RBFNN, respectively. Classification was evaluated in highly sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained within short time period, and the results became better when PCA was used.

N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts

This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. A possible way for reordering the words is to use all the permutations. The problem is that for a sentence with length N words the number of all permutations is N!. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed in order to reduce the search space among permuted sentences. The limitation of search space is succeeded using the statistical inference of N-grams. The results of this technique are very interesting and prove that the number of permuted sentences can be reduced by 98,16%. For experimental purposes a test set of TOEFL sentences was used and the results show that more than 95% can be repaired using the proposed method.

Puff Noise Detection and Cancellation for Robust Speech Recognition

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Satellite Data Classification Accuracy Assessment Based from Reference Dataset

In order to develop forest management strategies in tropical forest in Malaysia, surveying the forest resources and monitoring the forest area affected by logging activities is essential. There are tremendous effort has been done in classification of land cover related to forest resource management in this country as it is a priority in all aspects of forest mapping using remote sensing and related technology such as GIS. In fact classification process is a compulsory step in any remote sensing research. Therefore, the main objective of this paper is to assess classification accuracy of classified forest map on Landsat TM data from difference number of reference data (200 and 388 reference data). This comparison was made through observation (200 reference data), and interpretation and observation approaches (388 reference data). Five land cover classes namely primary forest, logged over forest, water bodies, bare land and agricultural crop/mixed horticultural can be identified by the differences in spectral wavelength. Result showed that an overall accuracy from 200 reference data was 83.5 % (kappa value 0.7502459; kappa variance 0.002871), which was considered acceptable or good for optical data. However, when 200 reference data was increased to 388 in the confusion matrix, the accuracy slightly improved from 83.5% to 89.17%, with Kappa statistic increased from 0.7502459 to 0.8026135, respectively. The accuracy in this classification suggested that this strategy for the selection of training area, interpretation approaches and number of reference data used were importance to perform better classification result.

Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set

The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, the ease of use is crucial. The results of SVDD are massively determined by the choice of the regularisation parameter C and the kernel parameter  of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD solely based on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.

Words Reordering based on Statistical Language Model

There are multiple reasons to expect that detecting the word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computer linguists improve the performance of grammar checker in word order diagnosis, the repairing task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that works with a large set of words, and avoids the laborious and costly process of collecting word order errors for creating error patterns.