Abstract: The present study proposes a methodology for the efficient daily management of fleet vehicles and construction machinery. The application covers the remote monitoring of heavy-duty vehicle operating parameters, where specific sensor data are stored and examined in order to provide information about the vehicle’s health. The vehicle diagnostics allow the user to determine whether maintenance tasks need to be performed before a fault occurs. A machine learning model is proposed for detecting two different types of faults through classification. Cross-validation is used, and the accuracy of the trained model is checked with the confusion matrix.
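As an illustration of the final check, the sketch below tallies a confusion matrix from true and predicted labels. The class names and the label sequences are hypothetical; the abstract does not name the two fault types or report any counts.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Hypothetical class names and predictions (not from the study).
labels = ["normal", "fault_a", "fault_b"]
y_true = ["normal", "normal", "fault_a", "fault_a", "fault_b", "fault_b"]
y_pred = ["normal", "fault_a", "fault_a", "fault_a", "fault_b", "normal"]

cm = confusion_matrix(y_true, y_pred, labels)

# Accuracy is the share of samples on the main diagonal.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
```

In a cross-validation setting, the out-of-fold predictions of every fold could be concatenated and passed to the same function, giving one pooled matrix.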
Abstract: Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning aims to learn a transformation of raw data into a representation that is useful for machine learning tasks. In automatic audio classification tasks, this is interesting because audio, which is usually complex information, needs to be transformed into a computationally convenient input. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have been used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, such as PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.
Abstract: The cities of Johannesburg and Pretoria, both located in the Gauteng province, are separated by a distance of 58 km. The traffic queues on the Ben Schoeman freeway, which connects these two cities, can stretch for almost 1.5 km. Vehicle traffic congestion impacts negatively on business and on commuters’ quality of life. The goal of this paper is to identify variables that influence the flow of traffic and to design a vehicle traffic prediction model that predicts the traffic flow pattern in advance. The model will enable motorists to make appropriate travel decisions ahead of time. The data used was collected by Mikro’s Traffic Monitoring (MTM). A multi-layer perceptron (MLP) was used on its own to construct the model, and the MLP was also combined with the Bagging ensemble method to train on the data. Cross-validation was used to evaluate the models. The results obtained from the techniques were compared using predictive and prediction costs. The cost was computed using a combination of the loss matrix and the confusion matrix. The prediction models show that the status of the traffic flow on the freeway can be predicted using the following parameters: travel time, average speed, traffic volume, and day of month. The implication of this work is that commuters will be able to spend less time travelling on the route and more time with their families. The logistics industry will save more than twice what it is currently spending.
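One common way to combine a loss matrix with a confusion matrix is to weight each cell count by the loss assigned to that kind of error and sum the result. The sketch below assumes this element-wise combination; the abstract does not give the exact formula, and the two traffic states, the counts, and the loss values are hypothetical.

```python
def total_cost(confusion, loss):
    """Sum of count * loss over all (true, predicted) cells."""
    return sum(
        count * penalty
        for conf_row, loss_row in zip(confusion, loss)
        for count, penalty in zip(conf_row, loss_row)
    )

# Hypothetical two-state example: rows are true states, columns are
# predicted states, in the order [free_flow, congested].
confusion = [[80, 20],
             [10, 90]]

# Hypothetical losses: missing real congestion (row 2, column 1) is
# penalised five times as much as a false congestion alarm.
loss = [[0, 1],
        [5, 0]]

cost = total_cost(confusion, loss)        # 20 * 1 + 10 * 5 = 70
n_samples = sum(sum(row) for row in confusion)
average_cost = cost / n_samples           # cost per prediction
```

With such a cost, two models with the same accuracy can still be ranked differently if one of them makes more of the expensive errors.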
Abstract: Clusters of microcalcifications (MCCs) are the most frequent symptoms of Ductal Carcinoma in Situ (DCIS) recognized by mammography. The Least-Squares Support Vector Machine (LS-SVM) is a variant of the standard SVM. In this paper, LS-SVM is proposed as a classifier for classifying MCCs as benign or malignant based on relevant features extracted from enhanced mammograms. To establish the credibility of the LS-SVM classifier for classifying MCCs, its relative performance is compared across different kernel functions. The confusion matrix and ROC analysis are used for this comparative evaluation. Experiments are performed on data extracted from mammogram images of the DDSM database. A total of 380 suspicious areas are collected, comprising 235 malignant and 145 benign samples. A set of 50 features is calculated for each suspicious area. An optimal subset of the 23 most suitable features is then selected from these 50 features by Particle Swarm Optimization (PSO). The results of the proposed study are quite promising.
Abstract: Red blood cells (RBCs) are among the most
commonly and intensively studied types of blood cells in cell biology.
Anemia is a lack of RBCs and is characterized by the hemoglobin level
compared to the normal level. In this study, an image-processing-based
system was developed to localize and extract RBCs
from microscopic images. Also, a machine learning approach is
adopted to classify the localized anemic RBC images. Several
textural and geometrical features are calculated for each extracted
RBC. The training set of features was analyzed using principal
component analysis (PCA). With the proposed method, RBCs were
isolated in 4.3 seconds from an image containing 18 to 27 cells. The
reasons behind using PCA are its low computational complexity and
its suitability for finding the most discriminating features, which can lead to
accurate classification decisions. Our classifiers yielded
accuracy rates of 100%, 99.99%, and 96.50% for the K-nearest neighbor
(K-NN) algorithm, the support vector machine (SVM), and the RBF neural
network (RBFNN), respectively. Classification was evaluated using
sensitivity, specificity, and the kappa statistic, all of which were high. In
conclusion, the classification results were obtained within a short time
period, and the results improved when PCA was used.
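Sensitivity and specificity follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the figures behind the accuracy rates reported above.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Standard definitions for a binary classifier.

    sensitivity = TP / (TP + FN): share of positive cases that are found
    specificity = TN / (TN + FP): share of negative cases that are cleared
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts: 50 anemic and 50 normal cells.
sens, spec = sensitivity_specificity(tp=45, fn=5, fp=2, tn=48)
```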
Abstract: This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One possible way of reordering the words is to use all the permutations. The problem is that for a sentence of N words the number of permutations is N!. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed to reduce the search space among permuted sentences. The search space is limited using the statistical inference of N-grams. The results of this technique are very interesting and show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes a test set of TOEFL sentences was used, and the results show that more than 95% of them can be repaired using the proposed method.
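The brute-force baseline that the abstract argues against can be sketched as follows: enumerate every permutation and keep the one with the most trigram hits. This is not the paper's confusion matrix technique, which the abstract does not describe in detail; the set of known trigrams below is a hypothetical stand-in for a language model.

```python
from itertools import permutations
from math import factorial

def trigram_hits(words, known_trigrams):
    """Count the consecutive word triples that appear in the model."""
    return sum(
        tuple(words[i:i + 3]) in known_trigrams
        for i in range(len(words) - 2)
    )

def best_reordering(words, known_trigrams):
    """Exhaustive search over all N! orderings (baseline only)."""
    return max(permutations(words),
               key=lambda order: trigram_hits(order, known_trigrams))

# Hypothetical toy language model containing two trigrams.
known = {("the", "cat", "sat"), ("cat", "sat", "down")}
scrambled = ["cat", "the", "down", "sat"]

best = best_reordering(scrambled, known)

# The search space grows factorially: 24 orderings for 4 words,
# 3,628,800 for 10 words.
search_space_4 = factorial(len(scrambled))
search_space_10 = factorial(10)
```

The factorial growth is what makes pruning the candidate orderings with N-gram statistics attractive for longer sentences.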
Abstract: In this paper, an algorithm for detecting and attenuating
puff noises frequently generated in mobile environments is
proposed. As a baseline, a puff detection system is designed
based on a Gaussian Mixture Model (GMM), with 39th-order Mel Frequency
Cepstral Coefficients (MFCCs) extracted as feature parameters. To
improve the detection performance, effective acoustic features for puff
detection are proposed. In addition, detected puff intervals are
attenuated by high-pass filtering. The speech recognition rate was
measured for evaluation, and the confusion matrix and ROC curve are used
to confirm the validity of the proposed system.
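An ROC curve can be traced by sweeping a decision threshold over the detector scores and recording the false positive rate and true positive rate at each step. The sketch below is a generic illustration; the scores and labels are hypothetical, and the abstract does not say how the detector scores were produced.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs; labels use 1 = puff, 0 = non-puff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step so more frames are flagged.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical detector scores for six frames.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = area_under_curve(points)   # 8/9 for this toy data
```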
Abstract: In order to develop forest management strategies for
tropical forests in Malaysia, surveying the forest resources and
monitoring the forest area affected by logging activities are essential.
Tremendous effort has been devoted to the classification of land
cover related to forest resource management in this country, as it is a
priority in all aspects of forest mapping using remote sensing and
related technology such as GIS. In fact, the classification process is a
compulsory step in any remote sensing research. Therefore, the main
objective of this paper is to assess the classification accuracy of a
classified forest map derived from Landsat TM data using different numbers of
reference data (200 and 388 reference data). This comparison was
made through observation (200 reference data), and through interpretation
and observation approaches (388 reference data). Five land cover
classes, namely primary forest, logged-over forest, water bodies, bare
land, and agricultural crop/mixed horticulture, can be identified by
the differences in spectral wavelength. Results showed that the overall
accuracy from 200 reference data was 83.5% (kappa value
0.7502459; kappa variance 0.002871), which was considered
acceptable or good for optical data. However, when the number of reference
data in the confusion matrix was increased from 200 to 388, the accuracy
improved slightly from 83.5% to 89.17%, and the kappa statistic
increased from 0.7502459 to 0.8026135. The accuracy
of this classification suggests that the strategy for selecting the
training area, the interpretation approaches, and the number of reference data
used are important for achieving better classification results.
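Overall accuracy and the kappa statistic are both derived from the confusion matrix of reference data against classified data. The sketch below uses the standard Cohen's kappa definition; the two-class matrix is hypothetical and is not the five-class matrix from the study.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of reference samples of class i mapped to class j."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class matrix with 100 reference samples.
cm = [[40, 10],
      [5, 45]]

accuracy, kappa = overall_accuracy_and_kappa(cm)   # 0.85 and 0.70
```

Kappa is lower than overall accuracy here because half of the agreement would already be expected by chance given the class totals.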
Abstract: The one-class support vector machine “support vector
data description” (SVDD) is an ideal approach for anomaly or outlier
detection. However, for the applicability of SVDD in real-world
applications, the ease of use is crucial. The results of SVDD are
strongly determined by the choice of the regularisation parameter C
and the kernel parameter of the widely used RBF kernel. While for
two-class SVMs the parameters can be tuned using cross-validation
based on the confusion matrix, for a one-class SVM this is not
possible, because only true positives and false negatives can occur
during training. This paper proposes an approach to find the optimal
set of parameters for SVDD solely based on a training set from
one class and without any user parameterisation. Results on artificial
and real data sets are presented, underpinning the usefulness of the
approach.
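The remark about true positives and false negatives can be made concrete with a toy data description. The sketch below is not SVDD and not the proposed tuning method: it assumes a plain sphere around the training mean, with a hypothetical parameter allowed_outliers in place of C and the kernel parameter. Since every training point belongs to the target class, each one is either accepted (TP) or rejected (FN), and the FP and TN cells cannot be filled.

```python
import math

def fit_sphere(points, allowed_outliers):
    """Toy stand-in for SVDD: mean centre, radius excluding the farthest points."""
    dim = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    distances = sorted(math.dist(p, centre) for p in points)
    radius = distances[len(distances) - 1 - allowed_outliers]
    return centre, radius

def training_confusion(points, centre, radius):
    """All training points are target-class, so only TP and FN can occur."""
    tp = sum(1 for p in points if math.dist(p, centre) <= radius)
    return {"TP": tp, "FN": len(points) - tp, "FP": 0, "TN": 0}

# Hypothetical one-class training set with one distant point.
train = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]

centre, radius = fit_sphere(train, allowed_outliers=1)
confusion = training_confusion(train, centre, radius)

# An arbitrarily large sphere also scores perfectly on the training set,
# which is why this matrix alone cannot guide parameter selection.
loose = training_confusion(train, centre, radius=1e6)
```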
Abstract: There are multiple reasons to expect that detecting the
word order errors in a text will be a difficult problem, and detection
rates reported in the literature are in fact low. Although grammatical
rules constructed by computational linguists improve the performance of
grammar checkers in word order diagnosis, the repair task is still
very difficult. This paper presents an approach for repairing word
order errors in English text by reordering words in a sentence and
choosing the version that maximizes the number of trigram hits
according to a language model. The novelty of this method concerns
the use of an efficient confusion matrix technique for reordering the
words. The comparative advantage of this method is that it works with
a large set of words, and avoids the laborious and costly process of
collecting word order errors for creating error patterns.