Abstract: Due to the rapid increase of Internet, web opinion
sources dynamically emerge which is useful for both potential
customers and product manufacturers for prediction and decision
purposes. These are the user generated contents written in natural
languages and are unstructured-free-texts scheme. Therefore, opinion
mining techniques become popular to automatically process customer
reviews for extracting product features and user opinions expressed
over them. Since customer reviews may contain both opinionated and
factual sentences, a supervised machine learning technique applies
for subjectivity classification to improve the mining performance. In
this paper, we dedicate our work is the task of opinion
summarization. Therefore, product feature and opinion extraction is
critical to opinion summarization, because its effectiveness
significantly affects the identification of semantic relationships. The
polarity and numeric score of all the features are determined by
Senti-WordNet Lexicon. The problem of opinion summarization
refers how to relate the opinion words with respect to a certain
feature. Probabilistic based model of supervised learning will
improve the result that is more flexible and effective.
Abstract: Artificial Neural Network (ANN) can be trained using
back propagation (BP). It is the most widely used algorithm for
supervised learning with multi-layered feed-forward networks.
Efficient learning by the BP algorithm is required for many practical
applications. The BP algorithm calculates the weight changes of
artificial neural networks, and a common approach is to use a twoterm
algorithm consisting of a learning rate (LR) and a momentum
factor (MF). The major drawbacks of the two-term BP learning
algorithm are the problems of local minima and slow convergence
speeds, which limit the scope for real-time applications. Recently the
addition of an extra term, called a proportional factor (PF), to the
two-term BP algorithm was proposed. The third increases the speed
of the BP algorithm. However, the PF term also reduces the
convergence of the BP algorithm, and criteria for evaluating
convergence are required to facilitate the application of the three
terms BP algorithm. Although these two seem to be closely related,
as described later, we summarize various improvements to overcome
the drawbacks. Here we compare the different methods of
convergence of the new three-term BP algorithm.
Abstract: This paper aims at finding a suitable neural network
for monitoring congestion level in electrical power systems. In this
paper, the input data has been framed properly to meet the target
objective through supervised learning mechanism by defining normal
and abnormal operating conditions for the system under study. The
congestion level, expressed as line congestion index (LCI), is
evaluated for each operating condition and is presented to the NN
along with the bus voltages to represent the input and target data.
Once, the training goes successful, the NN learns how to deal with a
set of newly presented data through validation and testing
mechanism. The crux of the results presented in this paper rests on
performance comparison of a multi-layered feed forward neural
network with eleven types of back propagation techniques so as to
evolve the best training criteria. The proposed methodology has been
tested on the standard IEEE-14 bus test system with the support of
MATLAB based NN toolbox. The results presented in this paper
signify that the Levenberg-Marquardt backpropagation algorithm
gives best training performance of all the eleven cases considered in
this paper, thus validating the proposed methodology.
Abstract: An extensive amount of work has been done in data
clustering research under the unsupervised learning technique in Data
Mining during the past two decades. Moreover, several approaches
and methods have been emerged focusing on clustering diverse data
types, features of cluster models and similarity rates of clusters.
However, none of the single clustering algorithm exemplifies its best
nature in extracting efficient clusters. Consequently, in order to
rectify this issue, a new challenging technique called Cluster
Ensemble method was bloomed. This new approach tends to be the
alternative method for the cluster analysis problem. The main
objective of the Cluster Ensemble is to aggregate the diverse
clustering solutions in such a way to attain accuracy and also to
improve the eminence the individual clustering algorithms. Due to
the massive and rapid development of new methods in the globe of
data mining, it is highly mandatory to scrutinize a vital analysis of
existing techniques and the future novelty. This paper shows the
comparative analysis of different cluster ensemble methods along
with their methodologies and salient features. Henceforth this
unambiguous analysis will be very useful for the society of clustering
experts and also helps in deciding the most appropriate one to resolve
the problem in hand.
Abstract: Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm and disregarding the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system of analysis of congruence between the notes. This ensured satisfactory results even without using specialized annotators in the context of the research, avoiding the generation of biased training data for the classifiers. For this, a case
study was conducted in a blog of entrepreneurship. The experimental results were consistent with the literature related annotation using formalized process with experts.
Abstract: As internet continues to expand its usage with an
enormous number of applications, cyber-threats have significantly
increased accordingly. Thus, accurate detection of malicious traffic in
a timely manner is a critical concern in today’s Internet for security.
One approach for intrusion detection is to use Machine Learning (ML)
techniques. Several methods based on ML algorithms have been
introduced over the past years, but they are largely limited in terms of
detection accuracy and/or time and space complexity to run. In this
work, we present a novel method for intrusion detection that
incorporates a set of supervised learning algorithms. The proposed
technique provides high accuracy and outperforms existing techniques
that simply utilizes a single learning method. In addition, our
technique relies on partial flow information (rather than full
information) for detection, and thus, it is light-weight and desirable for
online operations with the property of early identification. With the
mid-Atlantic CCDC intrusion dataset publicly available, we show that
our proposed technique yields a high degree of detection rate over 99%
with a very low false alarm rate (0.4%).
Abstract: Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.
Abstract: Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.
Abstract: This paper presents the development of recurrent neural network based fuzzy inference system for identification and control of dynamic nonlinear plant. The structure and algorithms of fuzzy system based on recurrent neural network are described. To train unknown parameters of the system the supervised learning algorithm is used. As a result of learning, the rules of neuro-fuzzy system are formed. The neuro-fuzzy system is used for the identification and control of nonlinear dynamic plant. The simulation results of identification and control systems based on recurrent neuro-fuzzy network are compared with the simulation results of other neural systems. It is found that the recurrent neuro-fuzzy based system has better performance than the others.
Abstract: Identification of cancer genes that might anticipate
the clinical behaviors from different types of cancer disease is
challenging due to the huge number of genes and small number of
patients samples. The new method is being proposed based on
supervised learning of classification like support vector machines
(SVMs).A new solution is described by the introduction of the
Maximized Margin (MM) in the subset criterion, which permits to
get near the least generalization error rate. In class prediction
problem, gene selection is essential to improve the accuracy and to
identify genes for cancer disease. The performance of the new
method was evaluated with real-world data experiment. It can give
the better accuracy for classification.
Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to some classification problems
was the premature convergence of some individuals to local optima.
To eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were set to iteratively
re-explored using RVNS. Based on the generated results on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.
Abstract: Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in midi music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs then mined using supervised learning data mining techniques. Two attributes are identified that provide high informational gain. These attributes are then used as style markers to predict authorship. Using these style markers the authors are able to correctly distinguish songs written by the Beatles from those that were not with a precision and accuracy of over 98 per cent. The identification of these style markers as well as the architecture for this research provides a foundation for future research in musical stylometry.
Abstract: An evolutionary computing technique for solving initial value problems in Ordinary Differential Equations is proposed in this paper. Neural network is used as a universal approximator while the adaptive parameters of neural networks are optimized by genetic algorithm. The solution is achieved on the continuous grid of time instead of discrete as in other numerical techniques. The comparison is carried out with classical numerical techniques and the solution is found with a uniform accuracy of MSE ≈ 10-9 .
Abstract: Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.
Abstract: In this paper, a comparative study of application of
supervised and unsupervised learning algorithms on illumination
invariant face recognition has been carried out. The supervised
learning has been carried out with the help of using a bi-layered
artificial neural network having one input, two hidden and one output
layer. The gradient descent with momentum and adaptive learning
rate back propagation learning algorithm has been used to implement
the supervised learning in a way that both the inputs and
corresponding outputs are provided at the time of training the
network, thus here is an inherent clustering and optimized learning of
weights which provide us with efficient results.. The unsupervised
learning has been implemented with the help of a modified
Counterpropagation network. The Counterpropagation network
involves the process of clustering followed by application of Outstar
rule to obtain the recognized face. The face recognition system has
been developed for recognizing faces which have varying
illumination intensities, where the database images vary in lighting
with respect to angle of illumination with horizontal and vertical
planes. The supervised and unsupervised learning algorithms have
been implemented and have been tested exhaustively, with and
without application of histogram equalization to get efficient results.
Abstract: This paper presents an ESN-based Arabic phoneme
recognition system trained with supervised, forced and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC)
techniques are used and compared as the input feature extraction
technique. The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for Saudi Arabia
dialectic and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database of speakers with different
dialectics from 12 Arabic countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
performances of 72.31% and 38.20% respectively.
Abstract: Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.
Abstract: Nonlinear system identification is becoming an important tool which can be used to improve control performance. This paper describes the application of adaptive neuro-fuzzy inference system (ANFIS) model for controlling a car. The vehicle must follow a predefined path by supervised learning. Backpropagation gradient descent method was performed to train the ANFIS system. The performance of the ANFIS model was evaluated in terms of training performance and classification accuracies and the results confirmed that the proposed ANFIS model has potential in controlling the non linear system.
Abstract: In this paper, a model of self-organizing spiking neural networks is introduced and applied to mobile robot environment representation and path planning problem. A network of spike-response-model neurons with a recurrent architecture is used to create robot-s internal representation from surrounding environment. The overall activity of network simulates a self-organizing system with unsupervised learning. A modified A* algorithm is used to find the best path using this internal representation between starting and goal points. This method can be used with good performance for both known and unknown environments.