Abstract: In the past few years, the amount of malicious software
has increased exponentially and, therefore, machine learning algorithms
have become instrumental in identifying clean and malware files through
(semi)-automated classification. When working with very large
datasets, the major challenge is to reach both a very high malware
detection rate and a very low false positive rate. Another challenge
is to minimize the time needed for the machine learning algorithm to
do so. This paper presents a comparative study between different
machine learning techniques such as linear classifiers, ensembles,
decision trees or various hybrids thereof. The training dataset consists
of approximately 2 million clean files and 200,000 infected files,
which is a realistic quantitative mixture. The paper investigates the
above-mentioned methods with respect to both their performance
(detection rate and false positive rate) and their practicability.
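The two performance measures named above can be computed from a confusion matrix. The following is a minimal sketch, assuming binary labels where 1 means malware and 0 means clean; the function name `detection_metrics` is illustrative and not taken from the paper.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Assumed label convention: 1 = malware, 0 = clean.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean file flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean file passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate
```

On a set with roughly 2 million clean files, a false positive rate of 0.1% would still mean about 2,000 clean files flagged, which is why both measures have to be tracked together.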
Abstract: Customer churn prediction is one of the most useful
areas of study in customer analytics. Due to the enormous amount
of data available for such predictions, machine learning and data
mining have been heavily used in this domain. Many machine learning
algorithms are directly applicable to the problem of customer churn
prediction. Here, we experiment with a novel approach that uses a
cognitive-learning-based technique, attempting to improve the results
by combining supervised learning methods with cognitive unsupervised
learning methods.
Abstract: In this paper, we use data mining to extract
biomedical knowledge. In general, complex biomedical data
collected in population studies are treated with statistical methods.
Although these methods are robust, they are not sufficient in themselves
to harness the potential wealth of the data. For that reason, in a second
step we use two learning algorithms: Decision Trees and Support Vector
Machines (SVM). These supervised classification methods are used to
diagnose thyroid disease. In this context, we propose to promote
the study and use of symbolic data mining techniques.
Abstract: A Distributed Denial of Service (DDoS) attack is a
major threat to cyber security. It originates from the network layer or
the application layer of compromised/attacker systems which are
connected to the network. The impact of this attack ranges from the
simple inconvenience of being unable to use a particular service to
major failures at the targeted server. When there is heavy traffic flow
to a target server, it is necessary to distinguish legitimate access
from attacks. In this paper, a novel method is proposed to detect DDoS
attacks from the traces of traffic flow. An access matrix is created
from the traces. As the access matrix is multidimensional, Principal
Component Analysis (PCA) is used to reduce the attributes used for
detection. Two classifiers, Naive Bayes and K-Nearest Neighbors,
are used to classify the traffic as normal or abnormal. The
performance of the classifiers with the PCA-selected attributes and
with the actual attributes of the access matrix is compared in terms
of detection rate and False Positive Rate (FPR).
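The attribute-reduction step can be illustrated with a small PCA sketch. It estimates only the leading principal component, using power iteration on the sample covariance matrix, and assumes the access matrix is given as a list of numeric rows. The function names and the power-iteration choice are illustrative assumptions, not the paper's implementation.

```python
import math

def first_principal_component(rows, iterations=100):
    """Estimate the leading principal component by power iteration.

    rows: list of equal-length numeric lists (at least two rows).
    Returns (column_means, unit_vector).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current direction
            break
        v = [x / norm for x in w]
    return means, v

def project(row, means, component):
    """Project one row onto the component, giving a single reduced attribute."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))
```

The projected values would then be passed to the classifiers in place of the full attribute set. A starting direction that happens to be orthogonal to the leading component would not converge, so a production implementation would normally rely on a linear algebra library.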
Abstract: Red blood cells (RBCs) are the most common type of
blood cell and are the most intensively studied in cell biology. A
lack of RBCs is a condition in which the hemoglobin level
is lower than normal and is referred to as “anemia”. Abnormalities in
RBCs affect the exchange of oxygen. This paper presents a
comparative study of various techniques for classifying RBCs as
normal or abnormal (anemic) using WEKA. WEKA is open-source
software that consists of different machine learning algorithms for
data mining applications. The algorithms tested are the Radial Basis
Function neural network, the Support Vector Machine, and the
K-Nearest Neighbors algorithm. Two sets of combined features were
utilized for the classification of blood cell images. The first set,
consisting exclusively of geometrical features, was used to identify
whether the tested blood cell has a spherical or non-spherical shape,
while the second set, consisting mainly of textural features, was used
to recognize the types of the spherical cells. We provide an evaluation
based on applying these classification methods to our RBC image
dataset, which was obtained from Serdang Hospital, Malaysia, and
measuring the accuracy of the test results. The best achieved
classification rates are 97%, 98%, and 79% for the Support Vector
Machine, the Radial Basis Function neural network, and the K-Nearest
Neighbors algorithm, respectively.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm without requiring extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system that analyzes the congruence between annotations. This ensured satisfactory results even without annotators specialized in the context of the research, and avoided generating biased training data for the classifiers. A case study was conducted on an entrepreneurship blog. The experimental results were consistent with the literature on annotation using a formalized process with experts.
Abstract: As the Internet continues to expand its usage with an
enormous number of applications, cyber-threats have significantly
increased accordingly. Thus, accurate and timely detection of malicious
traffic is a critical concern for today’s Internet security.
One approach for intrusion detection is to use Machine Learning (ML)
techniques. Several methods based on ML algorithms have been
introduced over the past years, but they are largely limited in terms of
detection accuracy and/or time and space complexity to run. In this
work, we present a novel method for intrusion detection that
incorporates a set of supervised learning algorithms. The proposed
technique provides high accuracy and outperforms existing techniques
that simply utilize a single learning method. In addition, our
technique relies on partial flow information (rather than full
information) for detection, and thus it is lightweight and desirable for
online operation with the property of early identification. Using the
publicly available Mid-Atlantic CCDC intrusion dataset, we show that
our proposed technique yields a detection rate of over 99%
with a very low false alarm rate (0.4%).
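The abstract does not state how the set of supervised learners is combined. One common option is majority voting, sketched below purely as an assumption. The three rule functions are hypothetical stand-ins for trained models, and the feature names `packets` and `mean_size` are invented to represent partial-flow information.

```python
from collections import Counter

def majority_vote(classifiers, features):
    """Return the label predicted by most classifiers for one flow."""
    votes = [classify(features) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained models. Each one sees only
# partial-flow features (assumed here: packet count and mean packet size).
def rule_packet_count(flow):
    return "attack" if flow["packets"] > 100 else "normal"

def rule_packet_size(flow):
    return "attack" if flow["mean_size"] < 80 else "normal"

def rule_combined(flow):
    return "attack" if flow["packets"] * flow["mean_size"] > 10000 else "normal"

ENSEMBLE = [rule_packet_count, rule_packet_size, rule_combined]
```

Using an odd number of voters avoids ties in the two-class case.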
Abstract: Because recent experimental evidence has shown its fast
convergence and good accuracy, neural network training via the
extended Kalman filter (EKF) method is widely applied. However,
under uncertainty in the system dynamics or modeling error, the
performance of the method is unreliable. To overcome this
problem, this paper proposes a new finite impulse response (FIR)
filter based learning algorithm to train radial basis function neural
networks (RBFN) for nonlinear function approximation. Compared
to the EKF training method, the proposed FIR filter training method
is more robust to such conditions. Furthermore, the
number of centers is considered, since it affects the performance
of approximation.
Abstract: This paper presents a content-based image retrieval (CBIR) framework with relevance feedback (RF) based on the combined learning of support vector machines (SVM) and AdaBoost. The framework incorporates only the most relevant images obtained from both learning algorithms. To speed up the system, it removes from the database the irrelevant images returned by the SVM learner. This is key to achieving effective retrieval performance in terms of time and accuracy. The experimental results show that the framework significantly improves retrieval effectiveness, which in turn improves the overall retrieval performance.
Abstract: A gradient learning method to regulate the trajectories
of some nonlinear chaotic systems is proposed. The method is
motivated by the gradient descent learning algorithms for neural
networks. It is based on two systems: a dynamic optimization system
and a system for finding sensitivities. Numerical results of several
examples are presented, which convincingly illustrate the efficiency
of the method.
Abstract: Reliable water level forecasts are particularly
important for warning against dangerous floods and inundation. The
current study aims at investigating the suitability of the adaptive
network based fuzzy inference system for continuous water level
modeling. A hybrid learning algorithm, which combines the least
square method and the back propagation algorithm, is used to
identify the parameters of the network. For this study, water level
data are available for the hydrological year 2002 with a sampling
interval of 1 hour. The number of antecedent water levels that should
be included in the input variables is determined by two statistical
methods, i.e. the autocorrelation function and the partial
autocorrelation function between the variables. Forecasting was done
for 1 hour up to 12 hours ahead in order to compare the models'
generalization at longer horizons. The results demonstrate that the
adaptive network-based fuzzy inference system model can be applied
successfully and provides high accuracy and reliability for river
water level estimation. In general, the adaptive network-based fuzzy
inference system provides accurate and reliable water level prediction
for 1 hour ahead, where MAPE = 1.15% and correlation = 0.98 were
achieved. Up to 12 hours ahead, the model still shows relatively good
performance, with a prediction error of less than 9.65%. The
information gathered from the preliminary results provides useful
guidance or reference for flood early warning system design in which
the magnitude and the timing of a potential extreme flood are
indicated.
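The evaluation measure and the input-selection step mentioned above can be sketched as follows. This is a minimal illustration assuming the water levels are a plain list of hourly values. The function names are illustrative, and the fuzzy inference model itself is not reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def lagged_inputs(series, n_lags, horizon=1):
    """Build (inputs, target) pairs: n_lags past values -> value `horizon` steps ahead."""
    pairs = []
    for t in range(n_lags - 1, len(series) - horizon):
        pairs.append((series[t - n_lags + 1:t + 1], series[t + horizon]))
    return pairs
```

Lags whose autocorrelation stays high are candidates for antecedent inputs, and `horizon` would range from 1 to 12 to mirror the 1-hour to 12-hour forecasts described above.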
Abstract: The goal of a network-based intrusion detection
system is to classify network traffic activities into two major
categories: normal and attack (intrusive) activities. Nowadays, data
mining and machine learning play an important role in many
sciences, including intrusion detection systems (IDS), using both
supervised and unsupervised techniques. One of the
essential steps of data mining is feature selection, which helps
improve the efficiency, performance and prediction rate of the
proposed approach. This paper applies the unsupervised K-means
clustering algorithm with information gain (IG) for feature selection
and reduction to build a network intrusion detection system. For our
experimental analysis, we have used the new NSL-KDD dataset,
which is a modified version of the KDD Cup 1999 intrusion detection
benchmark dataset. With a split of 60.0% for the training set and the
remainder for the testing set, a two-class classification (Normal,
Attack) has been implemented. The Weka framework, a Java-based
open-source software package consisting of a collection of machine
learning algorithms for data mining tasks, has been used in the testing
process. The experimental results show that the proposed approach is
very accurate, with a low false positive rate and a high true positive
rate, and that it takes less learning time than using the full
features of the dataset with the same algorithm.
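Information gain, the feature-selection criterion named above, can be sketched for a discrete attribute as follows. The function names and the toy protocol values in the usage note are assumptions for illustration. Continuous attributes would need to be discretized first.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

For example, with labels `["normal", "normal", "attack", "attack"]`, a feature taking the values `["tcp", "tcp", "udp", "udp"]` separates the classes perfectly and has a gain of 1 bit, while a feature that splits each class evenly has a gain of 0. Ranking attributes by this score and keeping the top ones is one way to reduce the feature set before clustering.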
Abstract: This paper presents the development of a recurrent neural network based fuzzy inference system for the identification and control of a dynamic nonlinear plant. The structure and algorithms of the fuzzy system based on a recurrent neural network are described. A supervised learning algorithm is used to train the unknown parameters of the system. As a result of learning, the rules of the neuro-fuzzy system are formed. The neuro-fuzzy system is used for the identification and control of a nonlinear dynamic plant. The simulation results of the identification and control systems based on the recurrent neuro-fuzzy network are compared with the simulation results of other neural systems. It is found that the recurrent neuro-fuzzy based system performs better than the others.
Abstract: Optimal control is one of the possible controllers
for a dynamic system, having a linear quadratic regulator and using
Pontryagin's principle or the dynamic programming method.
Stochastic disturbances may affect the coefficients (multiplicative
disturbances) or the equations (additive disturbances), provided that
the shocks are not too great. Nevertheless, this approach encounters
difficulties when uncertainties are very large or when probability
calculus is of no help with very imprecise data. Fuzzy
logic contributes to a pragmatic solution of such a problem since it
operates on fuzzy numbers. A fuzzy controller acts as an artificial
decision maker that operates in a closed-loop system in real time.
This contribution seeks to explore the tracking problem and control
of dynamic macroeconomic models using a fuzzy learning algorithm.
A two-input, single-output (TISO) fuzzy model is applied to the
linear fluctuation model of Phillips and to the nonlinear growth model
of Goodwin.
Abstract: In neural networks, when new patterns are learned by a network, the new information radically interferes with previously stored patterns. This drawback is called catastrophic forgetting or catastrophic interference. In this paper, we propose a biologically inspired neural network model which overcomes this problem. The proposed model consists of two distinct networks: one is a Hopfield-type chaotic associative memory and the other is a multilayer neural network. We consider that these networks correspond to the hippocampus and the neocortex of the brain, respectively. The given information is first stored in the hippocampal network with a fast learning algorithm. Then the stored information is recalled by the chaotic behavior of each neuron in the hippocampal network. Finally, it is consolidated in the neocortical network by using pseudopatterns. Computer simulation results show that the proposed model is much better at avoiding catastrophic forgetting than conventional models.
Abstract: A direct adaptive controller for a class of unknown nonlinear discrete-time systems is presented in this article. The proposed controller is constructed from a fuzzy rules emulated network (FREN). With its simple structure, human knowledge about the plant is transferred into if-then rules for setting up the network. The adjustable parameters inside FREN are tuned by a learning mechanism with a time-varying step size, or learning rate. The variation of the learning rate is introduced by the main theorem to improve system performance and stabilization. Furthermore, the bounds of the adjustable parameters are guaranteed through the on-line learning and the properties of the membership functions. The theoretical findings are validated by illustrative examples.
Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to classification problems
was the premature convergence of some individuals to local optima.
To eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were iteratively
re-explored using RVNS. Based on the generated results on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.
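For orientation, a plain Differential Evolution loop (the common DE/rand/1/bin variant) can be sketched as below. This is not DEPRO-RVNS: the RVNS re-exploration step and the PROAFTN parameter encoding are omitted, and the default values for `pop_size`, `f`, `cr`, and `generations` are illustrative assumptions.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clamp to the search box
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In the hybrid described above, individuals that stagnate in local optima would additionally be handed to a neighborhood search. Here they simply remain in the population until a better trial vector replaces them.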
Abstract: Modeling of complex dynamic systems, for which
mathematical models are very difficult to establish, requires new and
modern methodologies that exploit the existing expert
knowledge, human experience and historical data. Fuzzy cognitive
maps are very suitable, simple, and powerful tools for simulation and
analysis of these kinds of dynamic systems. However, human experts
are subjective and can handle only relatively simple fuzzy cognitive
maps; therefore, there is a need to develop new approaches for the
automated generation of fuzzy cognitive maps using historical data.
In this study, a new learning algorithm, called Big Bang-Big
Crunch, is proposed for the first time in the literature for the
automated generation of fuzzy cognitive maps from data. Two real-world
examples, namely a process control system and a radiation therapy
process, and one synthetic model are used to demonstrate the
effectiveness and usefulness of the proposed methodology.
Abstract: Mobile agents are a powerful approach to developing distributed systems, since they migrate to hosts on which they have the resources to execute individual tasks. In a dynamic environment like a peer-to-peer network, agents have to be generated frequently and dispatched to the network, so they inevitably consume a certain amount of bandwidth on each link. If too many agents migrate through one or several links at the same time, they introduce too much transfer overhead; eventually these links become busy and indirectly block the network traffic. There is therefore a need to develop routing algorithms that take traffic load into account. In this paper, we seek to combine a probabilistic scheme driven by a quality measure of the network traffic situation with the agent's decision on which next hop to migrate to, which is based on decision tree learning algorithms.
Abstract: Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model-building phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes Ant Colony Optimization (ACO) is presented. ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.