Abstract: The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.
Abstract: The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporarily defaulters, meaning that the clients are overdue on their payments for up 180 days. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain different results when the best of each technique was compared. Initially the data were coded in thermometer code (numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporarily defaulter classification). However, the best accuracy does not always represent the best technique. For instance, on the classification of temporarily defaulters, this technique, in terms of false positives, was surpassed by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.
Abstract: This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.
Abstract: Eyes are considered to be the most sensitive and
important organ for human being. Thus, any eye disorder will affect
the patient in all aspects of life. Cataract is one of those eye disorders
that lead to blindness if not treated correctly and quickly. This paper
demonstrates a model for automatic detection, classification, and
grading of cataracts based on image processing techniques and
artificial intelligence. The proposed system is developed to ease the
cataract diagnosis process for both ophthalmologists and patients.
The wavelet transform combined with 2D Log Gabor Wavelet
transform was used as feature extraction techniques for a dataset of
120 eye images followed by a classification process that classified the
image set into three classes; normal, early, and advanced stage. A
comparison between the two used classifiers, the support vector
machine SVM and the artificial neural network ANN were done for
the same dataset of 120 eye images. It was concluded that SVM gave
better results than ANN. SVM success rate result was 96.8%
accuracy where ANN success rate result was 92.3% accuracy.
Abstract: With the widespread adoption of the Internet-connected
devices, and with the prevalence of the Internet of Things (IoT)
applications, there is an increased interest in machine learning
techniques that can provide useful and interesting services in the
smart home domain. The areas that machine learning techniques
can help advance are varied and ever-evolving. Classifying smart
home inhabitants’ Activities of Daily Living (ADLs), is one
prominent example. The ability of machine learning technique to find
meaningful spatio-temporal relations of high-dimensional data is an
important requirement as well. This paper presents a comparative
evaluation of state-of-the-art machine learning techniques to classify
ADLs in the smart home domain. Forty-two synthetic datasets and
two real-world datasets with multiple inhabitants are used to evaluate
and compare the performance of the identified machine learning
techniques. Our results show significant performance differences
between the evaluated techniques. Such as AdaBoost, Cortical
Learning Algorithm (CLA), Decision Trees, Hidden Markov Model
(HMM), Multi-layer Perceptron (MLP), Structured Perceptron and
Support Vector Machines (SVM). Overall, neural network based
techniques have shown superiority over the other tested techniques.
Abstract: Osteoporosis is a common disease characterized by low bone mass and deterioration of micro-architectural bone tissue, which provokes an increased risk of fracture. This work treats the texture characterization of trabecular bone radiographs. The aim was to analyze according to clinical research a group of 174 subjects: 87 osteoporotic patients (OP) with various bone fracture types and 87 control cases (CC). To characterize osteoporosis, Fractal and MultiFractal (MF) methods were applied to images for features (attributes) extraction. In order to improve the results, a new method of MF spectrum based on the q-stucture function calculation was proposed and a combination of Fractal and MF attributes was used. The Support Vector Machines (SVM) was applied as a classifier to distinguish between OP patients and CC subjects. The features fusion (fractal and MF) allowed a good discrimination between the two groups with an accuracy rate of 96.22%.
Abstract: This study presents new gait representations for improving gait recognition accuracy on cross gait appearances, such as normal walking, wearing a coat and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations. One is to append lower knee regions to the original GEI, and the other is to apply convolutional operations to the GEI and its variants. A set of new gait representations are created and used for training multi-class Support Vector Machines (SVMs). Tests are conducted on the CASIA dataset B. Various combinations of the gait representations with different convolutional kernel size and different numbers of kernels used in the convolutional processes are examined. Both the entire images as features and reduced dimensional features by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques, appending the lower knee regions to the original GEI and convolutional GEI, can significantly contribute to the performance improvement in the gait recognition. The experimental results have shown that the average recognition rate can be improved from 75.65% to 87.50%.
Abstract: Useful information has been extracted from the
road accident data in United Kingdom (UK), using data analytics
method, for avoiding possible accidents in rural and urban areas.
This analysis make use of several methodologies such as data
integration, support vector machines (SVM), correlation machines
and multinomial goodness. The entire datasets have been imported
from the traffic department of UK with due permission. The
information extracted from these huge datasets forms a basis for
several predictions, which in turn avoid unnecessary memory
lapses. Since data is expected to grow continuously over a period
of time, this work primarily proposes a new framework model
which can be trained and adapt itself to new data and make
accurate predictions. This work also throws some light on use of
SVM’s methodology for text classifiers from the obtained traffic
data. Finally, it emphasizes the uniqueness and adaptability of
SVMs methodology appropriate for this kind of research work.
Abstract: In this study, our goal was to perform tumor staging and subtype determination automatically using different texture analysis approaches for a very common cancer type, i.e., non-small cell lung carcinoma (NSCLC). Especially, we introduced a texture analysis approach, called Law’s texture filter, to be used in this context for the first time. The 18F-FDG PET images of 42 patients with NSCLC were evaluated. The number of patients for each tumor stage, i.e., I-II, III or IV, was 14. The patients had ~45% adenocarcinoma (ADC) and ~55% squamous cell carcinoma (SqCCs). MATLAB technical computing language was employed in the extraction of 51 features by using first order statistics (FOS), gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), and Laws’ texture filters. The feature selection method employed was the sequential forward selection (SFS). Selected textural features were used in the automatic classification by k-nearest neighbors (k-NN) and support vector machines (SVM). In the automatic classification of tumor stage, the accuracy was approximately 59.5% with k-NN classifier (k=3) and 69% with SVM (with one versus one paradigm), using 5 features. In the automatic classification of tumor subtype, the accuracy was around 92.7% with SVM one vs. one. Texture analysis of FDG-PET images might be used, in addition to metabolic parameters as an objective tool to assess tumor histopathological characteristics and in automatic classification of tumor stage and subtype.
Abstract: Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.
Abstract: Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.
Abstract: Corporate credit rating prediction is one of the most important topics, which has been studied by researchers in the last decade. Over the last decade, researchers are pushing the limit to enhance the exactness of the corporate credit rating prediction model by applying several data-driven tools including statistical and artificial intelligence methods. Among them, multiclass support vector machine (MSVM) has been widely applied due to its good predictability. However, heuristics, for example, parameters of a kernel function, appropriate feature and instance subset, has become the main reason for the critics on MSVM, as they have dictate the MSVM architectural variables. This study presents a hybrid MSVM model that is intended to optimize all the parameter such as feature selection, instance selection, and kernel parameter. Our model adopts genetic algorithm (GA) to simultaneously optimize multiple heterogeneous design factors of MSVM.
Abstract: In recent years, a wide variety of applications are developed with Support Vector Machines -SVM- methods and Artificial Neural Networks -ANN-. In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, detectors are employed to detect anomalies in network communication scenarios according to user’s connections behavior. The first detector based on training dataset is deployed in different real-world networks with mobile and non-mobile devices to analyze the performance and accuracy over static detection. The vulnerabilities are based on previous work in telemedicine apps that were developed on the research group. This paper presents the differences on detections results between some network scenarios by applying traditional detectors deployed with artificial neural networks and support vector machines.
Abstract: In this paper, an approach for the liver tumor detection
in computed tomography (CT) images is represented. The detection
process is based on classifying the features of target liver cell to
either tumor or non-tumor. Fractional differential (FD) is applied for
enhancement of Liver CT images, with the aim of enhancing texture
and edge features. Later on, a fusion method is applied to merge
between the various enhanced images and produce a variety of
feature improvement, which will increase the accuracy of
classification. Each image is divided into NxN non-overlapping
blocks, to extract the desired features. Support vector machines
(SVM) classifier is trained later on a supplied dataset different from
the tested one. Finally, the block cells are identified whether they are
classified as tumor or not. Our approach is validated on a group of
patients’ CT liver tumor datasets. The experiment results
demonstrated the efficiency of detection in the proposed technique.
Abstract: Advances in spatial and spectral resolution of satellite
images have led to tremendous growth in large image databases. The
data we acquire through satellites, radars, and sensors consists of
important geographical information that can be used for remote
sensing applications such as region planning, disaster management.
Spatial data classification and object recognition are important tasks
for many applications. However, classifying objects and identifying
them manually from images is a difficult task. Object recognition is
often considered as a classification problem, this task can be
performed using machine-learning techniques. Despite of many
machine-learning algorithms, the classification is done using
supervised classifiers such as Support Vector Machines (SVM) as the
area of interest is known. We proposed a classification method,
which considers neighboring pixels in a region for feature extraction
and it evaluates classifications precisely according to neighboring
classes for semantic interpretation of region of interest (ROI). A
dataset has been created for training and testing purpose; we
generated the attributes by considering pixel intensity values and
mean values of reflectance. We demonstrated the benefits of using
knowledge discovery and data-mining techniques, which can be on
image data for accurate information extraction and classification from
high spatial resolution remote sensing imagery.
Abstract: In this paper, we present a comparative study of three
methods of 2D face recognition system such as: Iso-Geodesic Curves
(IGC), Geodesic Distance (GD) and Geodesic-Intensity Histogram
(GIH). These approaches are based on computing of geodesic
distance between points of facial surface and between facial curves.
In this study we represented the image at gray level as a 2D surface in
a 3D space, with the third coordinate proportional to the intensity
values of pixels. In the classifying step, we use: Neural Networks
(NN), K-Nearest Neighbor (KNN) and Support Vector Machines
(SVM). The images used in our experiments are from two wellknown
databases of face images ORL and YaleB. ORL data base was
used to evaluate the performance of methods under conditions where
the pose and sample size are varied, and the database YaleB was used
to examine the performance of the systems when the facial
expressions and lighting are varied.
Abstract: The present paper attempts to investigate the
prediction of air entrainment rate and aeration efficiency of a free
overfall jets issuing from a triangular sharp crested weir by using
regression based modelling. The empirical equations, Support vector
machine (polynomial and radial basis function) models and the linear
regression techniques were applied on the triangular sharp crested
weirs relating the air entrainment rate and the aeration efficiency to
the input parameters namely drop height, discharge, and vertex angle.
It was observed that there exists a good agreement between the
measured values and the values obtained using empirical equations,
Support vector machine (Polynomial and rbf) models and the linear
regression techniques. The test results demonstrated that the SVM
based (Poly & rbf) model also provided acceptable prediction of the
measured values with reasonable accuracy along with empirical
equations and linear regression techniques in modelling the air
entrainment rate and the aeration efficiency of a free overfall jets
issuing from triangular sharp crested weir. Further sensitivity analysis
has also been performed to study the impact of input parameter on the
output in terms of air entrainment rate and aeration efficiency.
Abstract: The 3D body movement signals captured during
human-human conversation include clues not only to the content of
people’s communication but also to their culture and personality.
This paper is concerned with automatic extraction of this information
from body movement signals. For the purpose of this research, we
collected a novel corpus from 27 subjects, arranged them into groups
according to their culture. We arranged each group into pairs and
each pair communicated with each other about different topics.
A state-of-art recognition system is applied to the problems of
person, culture, and topic recognition. We borrowed modeling,
classification, and normalization techniques from speech recognition.
We used Gaussian Mixture Modeling (GMM) as the main technique
for building our three systems, obtaining 77.78%, 55.47%, and
39.06% from the person, culture, and topic recognition systems
respectively. In addition, we combined the above GMM systems with
Support Vector Machines (SVM) to obtain 85.42%, 62.50%, and
40.63% accuracy for person, culture, and topic recognition
respectively.
Although direct comparison among these three recognition
systems is difficult, it seems that our person recognition system
performs best for both GMM and GMM-SVM, suggesting that intersubject
differences (i.e. subject’s personality traits) are a major
source of variation. When removing these traits from culture and
topic recognition systems using the Nuisance Attribute Projection
(NAP) and the Intersession Variability Compensation (ISVC)
techniques, we obtained 73.44% and 46.09% accuracy from culture
and topic recognition systems respectively.
Abstract: In this study, data loss tolerance of Support Vector Machines (SVM) based activity recognition model and multi activity classification performance when data are received over a lossy wireless sensor network is examined. Initially, the classification algorithm we use is evaluated in terms of resilience to random data loss with 3D acceleration sensor data for sitting, lying, walking and standing actions. The results show that the proposed classification method can recognize these activities successfully despite high data loss. Secondly, the effect of differentiated quality of service performance on activity recognition success is measured with activity data acquired from a multi hop wireless sensor network, which introduces high data loss. The effect of number of nodes on the reliability and multi activity classification success is demonstrated in simulation environment. To the best of our knowledge, the effect of data loss in a wireless sensor network on activity detection success rate of an SVM based classification algorithm has not been studied before.
Abstract: ‘Steganalysis’ is one of the challenging and attractive interests for the researchers with the development of information hiding techniques. It is the procedure to detect the hidden information from the stego created by known steganographic algorithm. In this paper, a novel feature based image steganalysis technique is proposed. Various statistical moments have been used along with some similarity metric. The proposed steganalysis technique has been designed based on transformation in four wavelet domains, which include Haar, Daubechies, Symlets and Biorthogonal. Each domain is being subjected to various classifiers, namely K-nearest-neighbor, K* Classifier, Locally weighted learning, Naive Bayes classifier, Neural networks, Decision trees and Support vector machines. The experiments are performed on a large set of pictures which are available freely in image database. The system also predicts the different message length definitions.