Abstract: Obtaining labeled data for supervised learning is often
difficult and expensive, so the trained model tends to overfit
because of the small amount of training data. As a result,
some researchers have focused on using unlabeled data, which need
not follow the same generative distribution as the labeled
data, to construct high-level features that improve performance on
supervised learning tasks. In this paper, we investigate the impact of
the relationship between unlabeled and labeled data on classification
performance. Specifically, we apply different sets of unlabeled data,
with different degrees of relation to the labeled data, to a
handwritten digit classification task based on the MNIST dataset. Our
experimental results show that the higher the degree of relation
between unlabeled and labeled data, the better the classification
performance. Although unlabeled data drawn from a completely
different generative distribution than the labeled data gives the lowest
classification performance, it still achieves high classification performance.
This expands the applicability of supervised
learning algorithms that use unsupervised learning.
Abstract: Term Extraction, a key data preparation step in Text
Mining, extracts terms, i.e. relevant collocations of words
attached to specific concepts (e.g. genetic-algorithms and decision-trees
are terms associated with the concept "Machine Learning"). In
this paper, the task of extracting interesting collocations is achieved
through a supervised learning algorithm that exploits a few
collocations manually labelled as interesting or not interesting. From
these examples, the ROGER algorithm learns a numerical function
that induces a ranking on the collocations. This ranking is optimized
using genetic algorithms, maximizing the trade-off between the false
positive and true positive rates (the area under the ROC curve). This
approach uses a particular representation for word collocations,
namely the vector of values of the standard statistical
interestingness measures attached to each collocation. As this
representation is general (across corpora and natural languages),
generality tests were performed by applying the ranking
function learned from an English corpus in biology to a French
corpus of curricula vitae, and vice versa, showing good
robustness of the approach compared to a state-of-the-art Support
Vector Machine (SVM).
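The ranking criterion named in the abstract above, the area under the ROC curve, can be illustrated with a minimal sketch. It assumes each collocation already has a numerical score from some ranking function. The function name `auc` and the toy scores are hypothetical and are not taken from ROGER.

```python
def auc(scores, labels):
    """Area under the ROC curve of a ranking.

    Equals the fraction of (interesting, not interesting) pairs in which
    the interesting item receives the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five collocations; 1 = interesting, 0 = not.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

A ranking that places every interesting collocation above every uninteresting one scores 1.0, and a random ranking scores about 0.5, which is why the measure suits a learner whose output is an ordering rather than a hard classification.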
Abstract: In this paper, we present a medical decision support system
(MDSS) based on a user pattern learning algorithm for ubiquitous
environments. Most research focuses on hardware systems, hospital
management and the overall concept of the ubiquitous environment, even
though it is hard to implement. Our objective is to design
an MDSS framework that supports medical treatment and
prevention for high-risk patients (COPD, heart disease, diabetes).
This framework consists of a database, a CAD (computer-aided diagnosis
support system) and a CAP (computer-aided user vital sign prediction
system). It can be applied to develop user pattern learning algorithm
based MDSS for home care and silver town services. In particular, the
CAD has decision-making capability: it compares the current vital
signs with the user's normal-condition pattern data. In addition, the CAP
predicts the user's vital signs from the patient's past data. The
novelty of the approach lies in combining a neural network method, wireless
vital sign acquisition devices and a personal computer database system. An
intelligent agent based MDSS will help elderly people and high-risk patients
prevent sudden death and disease, give physicians online
access to patients' data, and support planning of medication service
priority (e.g. emergency cases).
Abstract: This paper explores the effectiveness of machine
learning techniques in detecting firms that issue fraudulent financial
statements (FFS) and deals with the identification of factors
associated with FFS. To this end, a number of experiments have been
conducted using representative learning algorithms, which were
trained using a data set of 164 fraud and non-fraud Greek firms in the
recent period 2001-2002. The decision of which particular method to
choose is a complicated problem. A good alternative to choosing
only one method is to create a hybrid forecasting system
incorporating a number of possible solution methods as components
(an ensemble of classifiers). For this purpose, we have implemented
a hybrid decision support system that combines the representative
algorithms using a stacking variant methodology and achieves better
performance than any examined simple and ensemble method. To
sum up, this study indicates that the investigation of financial
information can be used in the identification of FFS and underlines the
importance of financial ratios.
Abstract: Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in turn require tuning nontrivial internal parameters. Although partial studies are available, many authors fail to report values of the parameters they use in their experiments, or reasons why these values were used instead of others. The goal of this work then is to create a more thorough comparison of preprocessing parameters and their mutual influence, and report interesting observations and results.
Abstract: Data mining is the process of sifting through large
volumes of data, analyzing data from different perspectives and
summarizing it into useful information. One of the most widely used
desktop applications for data mining is the Weka tool, which is
a collection of machine learning algorithms implemented
in Java and released as open source under the General Public License (GPL). A
web service is a software system designed to support interoperable
machine-to-machine interaction over a network using SOAP
messages. Unlike a desktop application, a web service is easy to
upgrade, deliver and access, and does not occupy any memory on the
user's system. Keeping in mind the advantages of a web service over a
desktop application, in this paper we demonstrate how this Java-based
desktop data mining application can be implemented as a web
service to support data mining across the internet.
Abstract: Using spatial models as a shared common basis of
information about the environment for different kinds of context-aware
systems has been a heavily researched topic in recent years.
This research has focused on how to create, update, and
merge spatial models so as to enable highly dynamic, consistent and
coherent spatial models at large scale. In this paper, however, we
concentrate on how context-aware applications could use this
information to adapt their behavior according to the situation
they are in. The main idea is to provide the spatial model
infrastructure with a situation recognition component based on
generic situation templates. A situation template is, as part of a
much larger situation template library, an abstract, machine-readable
description of a certain basic situation type, which could be
used by different applications to evaluate their situation. In this
paper, different theoretical and practical issues (technical, ethical
and philosophical ones) that are important for understanding
and developing situation-dependent systems based on situation
templates are discussed. A basic system design is presented which allows for
reasoning with uncertain data using an improved version of a
learning algorithm for the automatic adaptation of situation templates.
Finally, for supporting the development of adaptive applications, we
present a new situation-aware adaptation concept based on
workflows.
Abstract: Bagging and boosting are among the most popular re-sampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as base-learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using an averaging methodology of bagging and boosting ensembles with 10 sub-learners in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-learners on standard benchmark datasets and the proposed ensemble gave better accuracy.
Abstract: A cognitive collaborative reinforcement learning
algorithm (CCRL) that incorporates an advisor into the learning
process is developed to improve supervised learning. An autonomous
learner is equipped with a self-awareness cognitive skill to decide
when to solicit instructions from the advisor. The learner can also
assess the value of advice, and accept or reject it. The method is
evaluated for robotic motion planning using simulation. Tests are
conducted for advisors with skill levels from expert to novice. The
CCRL algorithm and a combined method integrating its logic with
Clouse's Introspection Approach outperformed a baseline fully
autonomous learner and demonstrated robust performance when
dealing with various advisor skill levels, learning to accept advice
received from an expert while rejecting that of less skilled
collaborators. Although the CCRL algorithm is based on RL, it fits
other machine learning methods, since the advisor's actions are only
added to the outer layer.
Abstract: On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from a spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra using nearest neighbor algorithms and distance-based search methods. Searching for nearest neighbors in the spectral space is computationally demanding: the cost increases with the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time, some kind of indexing can be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirements of estimation algorithms by providing a two-dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance- and similarity-based techniques but also enables the use of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means that the prediction need not use the high-dimensional space but can also be based on the mapped space. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance-based learning algorithms.
Abstract: In this paper, we study cooperative communications in which multiple cognitive radio (CR) transmit-receive pairs competitively maximize their own throughputs. In CR networks, the influence of primary users and the spectrum availability usually differ among CR users. Due to the existence of multiple relay nodes and the different spectrum availability, each CR transmit-receive pair must not only select a relay node but also choose an appropriate channel. We propose a game-theoretic framework to formulate this distributed problem and apply a regret-matching learning algorithm that leads to a correlated equilibrium. We further formulate a modified regret-matching learning algorithm that is fully distributed and uses only the local information of each CR transmit-receive pair. This modified algorithm is more practical and more suitable for cooperative communications in CR networks. Simulation results show that the algorithm converges and that the modified learning algorithm achieves performance comparable to the original regret-matching learning algorithm.
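The regret-matching idea mentioned in the abstract above can be sketched as follows. This is a simplified, unconditional variant, not the paper's algorithm: each player tracks one cumulative regret per action and plays actions in proportion to their positive regrets. The two-pair, two-channel game, its payoffs and all function names are hypothetical, and the sketch assumes each player can evaluate what every alternative action would have earned, which a fully distributed variant would have to estimate from local information.

```python
import random


def regret_matching_probs(cum_regret):
    """Play each action with probability proportional to its positive regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action is regretted: fall back to a uniform choice.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def update_regrets(cum_regret, played, payoffs):
    """Add, for every action, the payoff it would have earned minus the payoff earned."""
    return [r + payoffs[a] - payoffs[played] for a, r in enumerate(cum_regret)]


def simulate(rounds=2000, seed=0):
    """Toy game (an assumption): two pairs pick one of two channels, payoff 1 if they differ."""
    rng = random.Random(seed)
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    counts = {}
    for _ in range(rounds):
        actions = [
            rng.choices([0, 1], weights=regret_matching_probs(r))[0]
            for r in regrets
        ]
        for i in (0, 1):
            other = actions[1 - i]
            payoffs = [1.0 if a != other else 0.0 for a in (0, 1)]
            regrets[i] = update_regrets(regrets[i], actions[i], payoffs)
        key = tuple(actions)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Empirical frequency of each joint channel choice.
print(simulate())
```

The empirical distribution of joint actions returned by `simulate` is the object whose long-run behavior equilibrium results for regret-based learning describe.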
Abstract: Support vector machines (SVMs) are considered to be
the best machine learning algorithms for minimizing the predictive
probability of misclassification. However, their drawback is that for
large data sets the computation of the optimal decision boundary is a
time-consuming function of the size of the training set. Hence several
methods have been proposed to speed up the SVM algorithm. Here
three methods used to speed up the computation of the SVM
classifiers are compared experimentally using a musical genre
classification problem. The simplest method pre-selects a random
sample of the data before the application of the SVM algorithm. Two
additional methods use proximity graphs to pre-select data that are
near the decision boundary. One uses k-Nearest Neighbor graphs and
the other Relative Neighborhood Graphs to accomplish the task.
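The proximity-graph pre-selection described above can be illustrated with a minimal sketch for the k-Nearest Neighbor case. It assumes one plausible selection rule, keeping a point when at least one of its k nearest neighbors carries a different label; the rule, the function name `knn_boundary_subset` and the toy data are assumptions for illustration, not the method evaluated in the paper.

```python
import math


def knn_boundary_subset(points, labels, k=3):
    """Return indices of points that have a differently labeled point
    among their k nearest neighbors (brute-force search)."""
    keep = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        if any(labels[j] != labels[i] for j in others[:k]):
            keep.append(i)
    return keep


# Toy one-dimensional data: class 0 on the left, class 1 on the right.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (7.0,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(knn_boundary_subset(points, labels, k=2))
```

Only the retained points would then be passed to the SVM trainer, which is where the reduction in training time comes from.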
Abstract: Heterogeneity of solid waste characteristics as well as the complex processes taking place within the landfill ecosystem motivated the implementation of soft computing methodologies such as artificial neural networks (ANN), fuzzy logic (FL), and their combination. The present work uses a hybrid ANN-FL model that employs knowledge-based FL to describe the process qualitatively and implements the learning algorithm of ANN to optimize model parameters. The model was developed to simulate and predict the landfill gas production at a given time based on operational parameters. The experimental data used were compiled from a lab-scale experiment that involved various operating scenarios. The developed model was validated and statistically analyzed using the F-test, linear regression between actual and predicted data, and mean squared error measures. Overall, the simulated landfill gas production rates demonstrated reasonable agreement with actual data. The discussion focused on the effect of the size of training datasets and the number of training epochs.
Abstract: Recently, information security has become a key issue
in information technology as computer systems are exposed to an
increasing number of security threats. Over the last decades, a
variety of intrusion detection systems (IDS) have been employed for
protecting computers and networks from malicious network-based or
host-based attacks, using approaches that range from traditional
statistical methods to new data mining approaches. However, today's
commercially available intrusion detection systems are signature-based
and are not capable of detecting unknown attacks. In this paper, we
present a new learning algorithm for an anomaly-based network intrusion
detection system using a decision tree algorithm that distinguishes
attacks from normal behaviors and identifies different types of
intrusions. Experimental results on the KDD99 benchmark network
intrusion detection dataset demonstrate that the proposed learning
algorithm achieves a 98% detection rate (DR) in comparison with
other existing methods.
Abstract: This paper employs a new approach to regulate the
blood glucose level of a type I diabetic patient under an intensive
insulin treatment. The closed-loop control scheme incorporates
expert knowledge about treatment by using reinforcement learning
theory to maintain the normoglycemic average of 80 mg/dl and the
normal condition for free plasma insulin concentration from a severe
initial state. The insulin delivery rate is obtained off-line using the
Q-learning algorithm, without requiring an explicit model of the
environment dynamics. The implementation of the insulin delivery
rate, therefore, requires simple function evaluation and minimal
online computations. Controller performance is assessed in terms of
its ability to reject the effect of meal disturbance and to overcome the
variability in the glucose-insulin dynamics from patient to patient.
Computer simulations are used to evaluate the effectiveness of the
proposed technique and to show its superiority in controlling
hyperglycemia over other existing algorithms.
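The model-free character of Q-learning mentioned above can be seen in its standard tabular update rule, sketched below. The state names, action names, reward and parameter values are hypothetical placeholders and do not come from the paper; only the update formula itself is the standard one.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    No model of the environment dynamics is needed, only observed transitions.
    """
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# Hypothetical discretized glucose states and insulin-rate actions.
q = {
    "high": {"low_rate": 0.0, "high_rate": 0.0},
    "normal": {"low_rate": 1.0, "high_rate": 0.0},
}
# Observed transition: in state "high", action "high_rate" led to "normal".
new_value = q_update(q, "high", "high_rate", reward=1.0, next_state="normal",
                     alpha=0.5, gamma=0.9)
print(new_value)
```

Once such a table has been learned off-line, choosing a delivery rate on-line reduces to looking up the best action for the current state, which matches the low on-line computation the abstract describes.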
Abstract: Bagging and boosting are among the most popular re-sampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology of bagging and boosting ensembles with 10 sub-classifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well-known combining methods, on standard benchmark datasets, and the proposed technique was the most accurate.
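One plausible reading of the voting step above is to pool the class votes of the sub-classifiers from both ensembles and return the most frequent label. The sketch below assumes exactly that; the abstract does not specify the voting rule, so the pooling scheme, the function name `pooled_vote` and the example votes are assumptions.

```python
from collections import Counter


def pooled_vote(bagging_votes, boosting_votes):
    """Return the class label with the most votes across both ensembles."""
    tally = Counter(bagging_votes) + Counter(boosting_votes)
    return tally.most_common(1)[0][0]


# Hypothetical votes from 10 bagging and 10 boosting sub-classifiers
# for a single test instance.
bagging_votes = ["a"] * 6 + ["b"] * 4
boosting_votes = ["b"] * 9 + ["a"] * 1
print(pooled_vote(bagging_votes, boosting_votes))
```

In this example the bagging ensemble alone would predict "a", but the pooled tally favors "b", which shows how the combined vote can differ from either ensemble taken separately.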
Abstract: Support Vector Domain Description (SVDD) is one of the best-known one-class support vector learning methods, in which balls defined on the feature space are used to distinguish a set of normal data from all other possible abnormal objects. As with all kernel-based learning algorithms, its performance depends heavily on the proper choice of the kernel parameter. This paper proposes a new approach to selecting the kernel parameter based on maximizing the distance between the gravity centers of the normal and abnormal classes while minimizing the variance within each class. The performance of the proposed algorithm is evaluated on several benchmarks. The experimental results demonstrate the feasibility and the effectiveness of the presented method.
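The selection idea above, rewarding distance between class centers and penalizing within-class variance, can be sketched for a Gaussian (RBF) kernel. The sketch makes several assumptions that the abstract does not confirm: both quantities are measured in the kernel feature space, they are combined by a simple difference, and the parameter is chosen from a small grid. All names and the toy data are hypothetical.

```python
import math


def rbf(x, y, sigma):
    """Gaussian kernel value for two points."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def mean_kernel(a, b, sigma):
    """Average kernel value over all pairs drawn from the sets a and b."""
    return sum(rbf(x, y, sigma) for x in a for y in b) / (len(a) * len(b))


def separation_score(normal, abnormal, sigma):
    """Squared distance between class centers minus within-class variances,
    all computed in feature space from kernel values only."""
    k_nn = mean_kernel(normal, normal, sigma)
    k_aa = mean_kernel(abnormal, abnormal, sigma)
    k_na = mean_kernel(normal, abnormal, sigma)
    center_dist = k_nn + k_aa - 2.0 * k_na
    # For an RBF kernel k(x, x) = 1, so each variance is 1 - mean kernel.
    within_var = (1.0 - k_nn) + (1.0 - k_aa)
    return center_dist - within_var


# Toy data: two tight, well-separated clusters.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
abnormal = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
grid = [0.01, 1.0, 100.0]
best_sigma = max(grid, key=lambda s: separation_score(normal, abnormal, s))
print(best_sigma)
```

A very small width makes every point nearly orthogonal to every other, which inflates the within-class variance, while a very large width collapses both classes onto almost the same center; the score is highest between those extremes.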
Abstract: In this paper, the modelling and design of an artificial neural network architecture for load forecasting is investigated. The primary prerequisite for power system planning is to arrive at realistic estimates of the future demand for power, which is known as load forecasting. Short Term Load Forecasting (STLF) helps in determining economic, reliable and secure operating strategies for the power system. The dependence of load on several factors makes load forecasting a very challenging job. An overestimation of the load may cause premature investment and unnecessary blocking of capital, whereas an underestimation of the load may result in a shortage of equipment and circuits. It is always better to plan the system for a load slightly higher than the expected one so that no exigency arises. In this paper, a load-forecasting model is proposed using a multilayer neural network with an appropriately modified back propagation learning algorithm. Once the neural network model is designed and trained, it can forecast the load of the power system 24 hours ahead on a daily basis and can also forecast the cumulative load on a daily basis. The real load data used for training the Artificial Neural Network was taken from LDC, Gujarat Electricity Board, Jambuva, Gujarat, India. The results show that the load forecast of the ANN model follows the actual load pattern more accurately throughout the forecasted period.
Abstract: Classification is one of the primary themes in
computational biology. The accuracy of classification strongly
depends on the quality of the dataset, and we need a method to
evaluate this quality. In this paper, we propose a new graphical
analysis method using the 'Membership-Deviation Graph (MDG)' for
analyzing the quality of a dataset. The MDG represents the degree of
membership and the deviations for instances of a class in the dataset. The
result of the MDG analysis is used for understanding a specific feature and
for selecting the best feature for classification.
Abstract: The control of undesired sprayer boom vibrations poses a great challenge to investigators due to various disturbances and conditions. Sprayer boom movements reduce spread efficiency and crop yield. This paper describes the design of a novel control method for an active suspension system that applies a proportional-integral-derivative (PID) controller with an active force control (AFC) scheme, integrated with an iterative learning algorithm, to a sprayer boom. The iterative learning algorithm, as an intelligent method, is principally used to calculate the best value of the estimated inertia of the sprayer boom needed for the AFC loop. Results show that the proposed AFC-based scheme performs much better than the standard PID control technique and that the system is more robust and accurate.