Abstract: Biometric identification recognizes a person by unique features such as fingerprints, iris, ear, and voice, which need the subject's permission and physical contact. Gait biometrics identifies a person by their unique gait, extracted from motion features. Its main advantage is that a person's gait can be identified at a distance, without any physical contact. In this work, gait biometrics is used for person re-identification. A person walking naturally is compared with the same person walking with a bag, coat, and case, recorded using long-wave infrared, short-wave infrared, medium-wave infrared, and visible cameras. The videos are recorded in rural and urban environments. Pre-processing consists of human detection using You Only Look Once, background subtraction, silhouette extraction, and synthesis of the Gait Entropy Image by averaging the silhouettes. The motion features are extracted from the Gait Entropy Energy Image. The extracted features are reduced in dimensionality by Principal Component Analysis and recognized using different classifiers. A comparison of the classifiers shows that Linear Discriminant Analysis outperforms the others, with 95.8% for visible in the rural dataset and 94.8% for long-wave infrared in the urban dataset.
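The silhouette-averaging step can be illustrated with a minimal sketch. It assumes the commonly used definition in which the per-pixel mean of binary silhouettes (the Gait Energy Image) is treated as a foreground probability, and the entropy image is the Shannon entropy of that probability. The function names and the toy frames are hypothetical and are not taken from the paper.

```python
import math

def gait_energy_image(silhouettes):
    """Average equally sized binary silhouettes (lists of rows of 0/1) per pixel."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[r][c] for s in silhouettes) / n for c in range(cols)]
            for r in range(rows)]

def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli variable with P(foreground) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pixel that never changes carries no motion information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gait_entropy_image(silhouettes):
    """Per-pixel entropy of the averaged silhouettes (assumed definition)."""
    gei = gait_energy_image(silhouettes)
    return [[binary_entropy(p) for p in row] for row in gei]

# Toy gait cycle of four 2x3 silhouettes (hypothetical data).
frames = [
    [[1, 0, 1], [1, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
    [[1, 0, 1], [1, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
]
geni = gait_entropy_image(frames)
```

Pixels that are always foreground or always background get entropy 0, while pixels that toggle between frames (limbs, for example) get high entropy, which is why such an image emphasizes the moving parts of the body.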
Abstract: Network security plays an important role in the ICT
environment because the number of malicious users continues to grow
in education, business, and other ICT-related realms. Network security
violations are typically described and examined centrally by a
security event management system. Firewalls, Intrusion
Detection Systems (IDS), and Intrusion Prevention Systems are
becoming essential to monitor or prevent potential violations,
attack incidents, and imminent threats. In this system, firewall
rules are set only where the system policies require them. The datasets
used in this system are derived from the testbed environment.
DoS and PortScan traffic is applied in the testbed, which
implements a firewall and an IDS. Network traffic in the testbed
is classified as normal or attack using six machine learning
classification methods. The testbed must be run with DoS and
PortScan traffic to obtain the datasets. The dataset is based on
CICIDS2017, with some features added. The system tested 26 features
from this dataset. The system aims to reduce false positive rates and
improve accuracy in the implemented testbed design. It also shows
good performance by selecting important features and comparing
against an existing dataset using machine learning classifiers.
Abstract: In machine learning, ensembles are commonly employed to improve on the performance of multiple base classifiers. However, true predictions are often canceled out by false ones during consensus due to a phenomenon called the “curse of correlation”, which manifests as strong interference among the predictions produced by the base classifiers. In addition, existing practices are still unable to effectively mitigate the problem of imbalanced classification. Based on an analysis of our experimental results, we conclude that both problems are caused by inherent deficiencies in the consensus approach. Therefore, we create an enhanced ensemble algorithm that adopts a rank-based chain-mode consensus designed to overcome the two problems. To evaluate the proposed ensemble algorithm, we use the well-known benchmark dataset NSL-KDD (the improved version of the KDDCup99 dataset produced by the University of New Brunswick) to compare the proposed algorithm with 8 common ensemble algorithms. In particular, each compared ensemble classifier uses the same 22 base classifiers, so that differences in the improvement in accuracy and reliability over the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus proves to be a more effective ensemble solution than the traditional consensus approach, outperforming the 8 ensemble algorithms by 20% on almost all compared metrics, which include accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve.
Abstract: This paper aims to make a scientific contribution to biomedical diagnosis systems for cardiac arrhythmia; more precisely, it studies the improvement of cardiac arrhythmia classification performance using artificial neural network, adaptive neuro-fuzzy, and fuzzy inference system classifiers. The purpose of this improvement is to enable cardiologists to make reliable diagnoses through automatic cardiac arrhythmia analysis and classification based on high-confidence classifiers. In this study, six classes of the most commonly encountered arrhythmias are considered: Right Bundle Branch Block, Left Bundle Branch Block, Ventricular Extrasystole, Auricular Extrasystole, Atrial Fibrillation, and Normal Cardiac rate beat. From the parameters extracted from the electrocardiogram (ECG), we constructed a matrix (360x360) serving as the input data sample for the classifiers based on neural networks and a matrix (1x6) for the classifier based on fuzzy logic. By varying three parameters (the quality of the neural network learning, the data size, and the quality of the input parameters), the automatic classification achieved the following correct classification rates: 83.6% with the fuzzy-logic-based classifier, 99.7% with the neural-network-based classifier, and 99.8% with the adaptive neuro-fuzzy classifier. These results are based on signals containing at least 360 cardiac cycles. Based on a comparative analysis of the three arrhythmia classifiers, the classifiers based on neural networks exhibit better performance.
Abstract: Foot recognition can be applied in many medical fields, such as gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system captures a patient image in a controlled room and background in order to recognize the foot in a limited set of views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes a deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with a traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method recognizes foot images from online databases with better accuracy than the traditional classification method, but with higher complexity.
Abstract: With recent trends in Big Data and advancements
in Information and Communication Technologies, the healthcare
industry is at the stage of its transition from clinician oriented to
technology oriented. Many people around the world die of cancer
because the diagnosis of disease was not done at an early stage.
Nowadays, the computational methods in the form of Machine
Learning (ML) are used to develop automated decision support
systems that can diagnose cancer with high confidence in a timely
manner. This paper aims to carry out the comparative evaluation
of a selected set of ML classifiers on two existing datasets: breast
cancer and cervical cancer. The ML classifiers compared in this study
are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest
Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and
Artificial Neural Networks (ANN). The evaluation is carried out based
on standard evaluation metrics Precision (P), Recall (R), F1-score and
Accuracy. The experimental results based on the evaluation metrics
show that ANN achieved the highest accuracy (99.4%) when
tested with the breast cancer dataset. On the other hand, when these
ML classifiers are tested with the cervical cancer dataset, the Ensemble
(Bagged Tree) technique gave better accuracy (93.1%) than
the other classifiers.
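For reference, the four evaluation metrics named in this abstract can be computed from confusion-matrix counts as in the following minimal sketch. The function name, the label encoding (1 = positive), and the toy labels are assumptions for illustration only; they are not the paper's data or code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
```

In a diagnostic setting, recall on the positive class is often the figure to watch alongside accuracy, since a missed positive (false negative) is usually the costlier error.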
Abstract: This paper investigates successful sub-bands of wave atom transform via classification of mammograms, when the coefficients of sub-bands are used as features. A computer-aided diagnosis system is constructed by using wave atom transform, support vector machine and k-nearest neighbor classifiers. Two-class classification is studied in detail using two data sets, separately. The successful sub-bands are determined according to the accuracy rates, coefficient numbers, and sensitivity rates.
Abstract: Accurate road environment information is needed for applications such as road maintenance and virtual 3D city modeling. Mobile laser scanning (MLS) efficiently produces dense point clouds of huge areas, from which the road and its environment can be modeled in detail. Objects such as buildings, cars, and trees are an important part of road environments. Different methods have been developed for detecting such objects, but accuracy is still lacking due to problems of illumination, environmental changes, and multiple objects with the same features. In this work, different classifiers such as Multiclass SVM, kNN, and Multiclass LDA are compared for road environment detection. Finally, kNN with the LBP feature achieved a classification accuracy of 93.3%, higher than the other classifiers.
Abstract: Sentiment analysis and opinion mining have become
emerging topics of research in recent years but most of the work
is focused on data in the English language. A comprehensive
research and analysis are essential which considers multiple
languages, machine translation techniques, and different classifiers.
This paper presents a comparative analysis of different approaches
for multilingual sentiment analysis. These approaches are divided
into two parts: the first classifies text without language
translation, and the second translates the testing data to a
target language, such as English, before classification. The presented
research and results are useful for understanding whether machine
translation should be used for multilingual sentiment analysis or
building language specific sentiment classification systems is a better
approach. The effects of language translation techniques, features,
and the accuracy of various classifiers for multilingual sentiment analysis
are also discussed in this study.
Abstract: Attribute or feature selection is one of the basic
strategies to improve the performances of data classification tasks,
and, at the same time, to reduce the complexity of classifiers,
and it is a particularly fundamental one when the number
of attributes is relatively high. Its application to unsupervised
classification is restricted to a limited number of experiments in
the literature. Evolutionary computation has already proven itself
to be a very effective choice to consistently reduce the number
of attributes towards a better classification rate and a simpler
semantic interpretation of the inferred classifiers. We present a feature
selection wrapper model composed of a multi-objective evolutionary
algorithm, the clustering method Expectation-Maximization (EM),
and the classifier C4.5 for the unsupervised classification of data
extracted from a psychological test named BASC-II (Behavior
Assessment System for Children - II ed.) with two objectives:
Maximizing the likelihood of the clustering model and maximizing
the accuracy of the obtained classifier. We present a methodology
to integrate feature selection for unsupervised classification, model
evaluation, decision making (to choose the most satisfactory model
according to an a posteriori process in a multi-objective context), and
testing. We compare the performance of the classifier obtained by the
multi-objective evolutionary algorithms ENORA and NSGA-II, and
the best solution is then validated by the psychologists that collected
the data.
Abstract: One of the major developments in machine learning in the past decade is the ensemble method, which builds a highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed, a homogeneous ensemble classifier using bagging and a heterogeneous ensemble classifier using arcing, and their performance is analyzed in terms of accuracy. A classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and benefits of the proposed approaches are demonstrated on standard intrusion detection datasets. The main originality of the proposed approach lies in its three main parts: the preprocessing phase, the classification phase, and the combining phase. A wide range of comparative experiments is conducted on standard intrusion detection datasets. The performance of the proposed homogeneous and heterogeneous ensemble classifiers is compared with that of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include error-correcting output codes (ECOC) and Dagging; the heterogeneous ensemble methods include majority voting and stacking. The proposed ensemble methods provide a significant improvement in accuracy over individual classifiers: the proposed bagged RBF and SVM perform significantly better than ECOC and Dagging, and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Heterogeneous models also exhibit better results than homogeneous models on standard intrusion detection datasets.
Abstract: This paper presents a classifier ensemble approach for
predicting the survivability of the breast cancer patients using the
latest database version of the Surveillance, Epidemiology, and End
Results (SEER) Program of the National Cancer Institute. The system
consists of two main components: a feature selection component and a
classifier ensemble component. The feature selection component divides the
features in the SEER database into four groups. It then tries to find
the most important features among the four groups that maximize the
weighted average F-score of a certain classification algorithm. The
ensemble component uses three different classifiers, each of which
models a different set of features from SEER through the feature
selection module. On top of them, another classifier is used to give
the final decision based on the output decisions and confidence
scores from each of the underlying classifiers. Different classification
algorithms have been examined; the best setup found uses the
decision tree, Bayesian network, and Naïve Bayes algorithms for the
underlying classifiers and Naïve Bayes for the classifier ensemble
step. The system outperforms all published systems to date when
evaluated against the exact same data of SEER (period of 1973-2002).
It gives 87.39% weighted average F-score compared to 85.82% and
81.34% of the other published systems. By increasing the data size to
cover the whole database (period of 1973-2014), the overall weighted
average F-score jumps to 92.4% on the held-out unseen test set.
Abstract: We assume an IoT-based smart-home environment where the on-off status of each electrical appliance, including the room lights, can be recognized in real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that activates a robot vacuum cleaner at the right time by recognizing the user situation, which requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). As candidate models, we train a few classifiers, such as naïve Bayes, decision tree, and logistic regression, that map the appliance-status data to Yes and No situations. Our training and test data are obtained from simulations of user behavior, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated, with the status of the relevant appliances changing in accordance with the situation changes. During the simulation, both the situation transitions and the resulting appliance statuses are determined stochastically. To compare the performance of these classifiers, we obtain their learning curves for different types of users through simulations. The results of our empirical study reveal that naïve Bayes achieves slightly better classification accuracy than the other compared classifiers.
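The mapping from appliance on-off status to a Yes/No situation can be sketched with a small Bernoulli naïve Bayes classifier. This is only an illustration: the class name, the three appliances (stove, TV, lights), the Laplace smoothing, and the six training rows are all assumptions and do not come from the paper's simulation.

```python
import math

class BernoulliNaiveBayes:
    """Naive Bayes for binary (on/off) features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p_on = {}  # p_on[c][j] = estimated P(feature j is on | class c)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p_on[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                            for j in range(n_features)]
        return self

    def predict(self, x):
        def log_score(c):
            score = self.log_prior[c]
            for xj, p in zip(x, self.p_on[c]):
                score += math.log(p if xj else 1 - p)
            return score
        return max(self.classes, key=log_score)

# Hypothetical status vectors: [stove_on, tv_on, lights_on].
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1],   # cooking, cooking, watching TV
     [0, 0, 0], [0, 0, 0], [0, 0, 1]]   # user away or idle
y = ["No", "No", "No", "Yes", "Yes", "Yes"]

model = BernoulliNaiveBayes().fit(X, y)
away = model.predict([0, 0, 0])      # everything off
cooking = model.predict([1, 0, 1])   # stove and lights on
```

With this toy data, the all-off vector is classified as a situation that allows vacuuming and the cooking vector as one that does not; a learning curve like the one described above could be obtained by refitting on growing prefixes of the simulated data.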
Abstract: Texture is an important characteristic in real and
synthetic scenes. Texture analysis plays a critical role in inspecting
surfaces and provides important techniques in a variety of
applications. Although several descriptors have been presented to
extract texture features, the development of object recognition is still a
difficult task due to the complex aspects of texture. Recently, many
robust and scaling-invariant image features such as SIFT, SURF and
ORB have been successfully used in image retrieval and object
recognition. In this paper, we compare the performance
of these feature descriptors for texture classification with k-means
clustering. Different classifiers, including K-NN, Naive Bayes, Back
Propagation Neural Network, Decision Tree, and Kstar, were applied to
three texture image sets: UIUCTex, KTH-TIPS, and Brodatz.
Experimental results show that SIFT achieves the best average
accuracy rate on UIUCTex and KTH-TIPS, while SURF has the
advantage on the Brodatz texture set. The BP neural network performs best in
test set classification among all classifiers used.
Abstract: As smartphones are equipped with various sensors,
there have been many studies focused on using these sensors to create
valuable applications. Human activity recognition is one such
application motivated by various welfare applications, such as the
support for the elderly, measurement of calorie consumption, lifestyle
and exercise patterns analyses, and so on. One of the challenges one
faces when using smartphone sensors for activity recognition is that
the number of sensors should be minimized to save battery power. In
this paper, we show that a fairly accurate classifier can be built that
can distinguish ten different activities using data from only a single
sensor, i.e., the smartphone accelerometer. The approach that we
adopt to deal with this twelve-class problem uses various methods.
The features used for classifying these activities include not only the
magnitude of acceleration vector at each time point, but also the
maximum, the minimum, and the standard deviation of vector
magnitude within a time window. The experiments compared the
performance of four kinds of basic multi-class classifiers and the
performance of four kinds of ensemble learning methods based on
three kinds of basic multi-class classifiers. The results show that
the method with the highest accuracy is ECOC based on
Random Forest.
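The features described in this abstract (the acceleration magnitude at each time point, plus the maximum, minimum, and standard deviation of the magnitude within a time window) can be sketched as follows. The non-overlapping windows, the use of the population standard deviation, the function names, and the sample readings are assumptions made for illustration.

```python
import math
import statistics

def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer reading."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def window_features(samples, window_size):
    """Per-sample magnitudes plus max/min/std for each non-overlapping window."""
    mags = [magnitude(s) for s in samples]
    features = []
    for start in range(0, len(mags) - window_size + 1, window_size):
        window = mags[start:start + window_size]
        features.append({
            "max": max(window),
            "min": min(window),
            "std": statistics.pstdev(window),  # population standard deviation
        })
    return mags, features

# Hypothetical readings; real data would come from the smartphone sensor.
samples = [(3, 4, 0), (0, 0, 5), (6, 8, 0), (0, 0, 10)]
mags, feats = window_features(samples, window_size=4)
```

Using the magnitude rather than the raw axes makes the features independent of how the phone is oriented, which is one plausible reason for this choice when only a single sensor is available.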
Abstract: This paper introduces an original method for
guaranteed estimation of the accuracy for an ensemble of Lipschitz
classifiers. The solution was obtained as a finite closed set of
alternative hypotheses, which contains an object of classification with
probability of not less than the specified value. Thus, the
classification is represented by a set of hypothetical classes. In this
case, the smaller the cardinality of the discrete set of hypothetical
classes is, the higher is the classification accuracy. Experiments have
shown that if cardinality of the classifiers ensemble is increased then
the cardinality of this set of hypothetical classes is reduced. The
problem of the guaranteed estimation of the accuracy for an ensemble
of Lipschitz classifiers is relevant in multichannel classification of
target events in C-OTDR monitoring systems. Results of the practical
application of the suggested approach to accuracy control in C-OTDR
monitoring systems are presented.
Abstract: This paper introduces an original method for the
parametric optimization of the structure of a multimodal decision-level
fusion scheme, which combines the partial solutions
of the classification task obtained from an assembly of mono-modal
classifiers. As a result, a multimodal fusion classifier with the
minimum total error rate has been obtained.
Abstract: Neurons in the nervous system communicate with
each other by producing electrical signals called spikes. To
investigate the physiological function of the nervous system it is essential
to study the activity of neurons by detecting and sorting spikes in the
recorded signal. In this paper a method is proposed for addressing
the spike sorting problem, based on the nonlinear modeling
of spikes using an exponential autoregressive model. The genetic
algorithm is utilized for model parameter estimation. In this regard
some selected model coefficients are used as features for sorting
purposes. For optimal selection of model coefficients, a self-organizing
feature map is used. The results show that modeling spikes with a
nonlinear autoregressive model outperforms its linear counterpart.
Also, the features extracted from the coefficients of the exponential
autoregressive model are better than wavelet-based features
and yield more compact and well-separated clusters. In the case of
spikes that differ in small-scale structures, where principal component
analysis fails to produce separated clouds in the feature space, the
proposed method can obtain well-separated clusters, which removes
the necessity of applying complex classifiers.
Abstract: Different strategies and tools are available in the oil
and gas industry for detecting and analyzing tension and possible
fractures in borehole walls. Most of these techniques are based on
manual observation of the captured borehole images. While this
strategy may be feasible and convenient with small images and little
data, it may become difficult and prone to errors when large
databases of images must be processed. Moreover, the patterns may differ
across the image area, depending on many characteristics (drilling
strategy, rock components, rock strength, etc.). In this work we
propose the inclusion of data-mining classification strategies in order
to create a knowledge database of the segmented curves. After
some time spent using the system and manually marking
the parts of borehole images that correspond to tension regions and
breakout areas, these classifiers allow the system to automatically
indicate and suggest new candidate regions with higher accuracy. We suggest
the use of different classification methods in order to achieve different
knowledge dataset configurations.
Abstract: A brief review of the empirical studies on stock market decision support methodology indicates that they are on the threshold of validating the accuracy of traditional, fuzzy, artificial neural network, and decision tree models. Many researchers have attempted to compare these models using various datasets worldwide. However, the research community is still on the way to conclusive confidence in the emerging models. This paper uses automotive sector stock prices from the National Stock Exchange (NSE), India, and analyzes them for intra-sectoral support for stock market decisions. The study identifies the significant variables and their lags that affect stock prices using OLS analysis and decision tree classifiers.