Abstract: We present an Electronic Nose (ENose), which is
aimed at identifying the presence of one out of two gases, possibly
detecting the presence of a mixture of the two. Estimation of the
concentrations of the components is also performed for a volatile
organic compound (VOC) mixture of methanol and acetone, over the
ranges 40-400 ppm and 22-220 ppm (parts per million), respectively.
Our system contains 8 sensors, 5 of them being gas sensors (of the
class TGS from FIGARO USA, INC., whose sensing element is a tin
dioxide (SnO2) semiconductor), the remaining being a temperature
sensor (LM35 from National Semiconductor Corporation), a
humidity sensor (HIH–3610 from Honeywell), and a pressure sensor
(XFAM from Fujikura Ltd.).
Our integrated hardware–software system uses machine learning
principles and least-squares regression, first to identify a new gas
sample or a mixture, and then to estimate the concentrations. In
particular, we adopt a training model using the Support Vector
Machine (SVM) approach with a linear kernel to teach the system how
to discriminate among different gases, and then apply a second
training model, based on least-squares regression, to predict the
concentrations.
The experimental results demonstrate that the proposed
multiclassification and regression scheme is effective in the
identification of the tested VOCs of methanol and acetone with
96.61% correctness. The concentration prediction is obtained with
0.979 and 0.964 correlation coefficient for the predicted versus real
concentrations of methanol and acetone, respectively.
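The two-stage scheme described above (SVM classification followed by least-squares concentration regression) can be sketched as follows. The synthetic sensor patterns, the linear concentration model, and the use of the Pegasos training rule are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-sensor response patterns, one per gas (not the authors' data).
n = 200
X_pos = rng.normal([2.0, 1.5, 0.5, 0.2, 0.1], 0.3, (n, 5))   # "methanol-like"
X_neg = rng.normal([0.3, 0.4, 1.8, 1.6, 1.2], 0.3, (n, 5))   # "acetone-like"
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(n), -np.ones(n)])

def pegasos_svm(X, y, lam=0.01, epochs=20, seed=1):
    """Linear SVM trained with the Pegasos stochastic subgradient method."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:          # margin violated: move toward x_i
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                              # only shrink (regularization)
                w = (1 - eta * lam) * w
    return w

w = pegasos_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)

# Stage 2: least-squares regression from sensor responses to concentration,
# on synthetic responses assumed linear in the (hidden) concentration.
c_true = rng.uniform(40, 400, 100)             # methanol-like range, in ppm
R = np.outer(c_true, [0.010, 0.020, 0.005, 0.001, 0.001])
R += rng.normal(0, 0.05, R.shape)
coef, *_ = np.linalg.lstsq(R, c_true, rcond=None)
corr = np.corrcoef(c_true, R @ coef)[0, 1]
print(f"classification accuracy {acc:.3f}, concentration correlation {corr:.3f}")
```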
Abstract: A novel fuzzy neural network combined with a support vector learning mechanism, called support-vector-based fuzzy neural network (SVBFNN), is proposed. The SVBFNN combines the capability of support vector learning to minimize the empirical risk (training error) and the expected risk (testing error) in high-dimensional data spaces with the efficient, human-like reasoning of fuzzy neural networks.
Abstract: Fault-proneness of a software module is the
probability that the module contains faults. Different techniques
have been proposed to predict the fault-proneness of modules,
including statistical methods, machine learning techniques, neural
network techniques and clustering techniques. The aim of the
proposed study is to explore whether metrics available in the early
lifecycle (i.e. requirement metrics), metrics available in the late
lifecycle (i.e. code metrics), and the combination of the two can be
used to identify fault-prone modules using a Genetic Algorithm
technique. The approach has been tested with real defect datasets,
written in the C programming language, from NASA software
projects. The results show that the fusion of requirement and code
metrics yields the best prediction model for detecting faults,
compared with the commonly used code-based model.
Abstract: Serial Analysis of Gene Expression (SAGE) is a powerful
quantification technique for generating cell or tissue gene expression
data. Gene expression profiles of cells or tissues in several different
states are difficult for biologists to analyze because of the large
number of genes typically involved. However, feature selection in
machine learning can substantially reduce this problem: it reduces
the features (genes) in a specific SAGE data set to only the relevant
genes. In this study, we used a genetic algorithm to implement
feature selection, and evaluated the classification accuracy of the
selected features with the K-nearest-neighbor method. To validate
the proposed method, we tested it on two SAGE data sets. The
results show that the number of features of the original SAGE data
sets can be significantly reduced while higher classification accuracy
is achieved.
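A minimal sketch of the described pipeline (genetic-algorithm feature selection evaluated by K-nearest-neighbor accuracy), on a synthetic stand-in rather than real SAGE profiles; the population size, mutation rate and leave-one-out fitness are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a SAGE matrix: 60 samples x 30 "genes",
# where only genes 0-2 carry class signal; the rest are noise.
n, d = 60, 30
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[:, :3] += y[:, None] * 2.0

def knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN on the selected feature subset."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    Z = X[:, cols]
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)                 # exclude the sample itself
    idx = np.argsort(D, axis=1)[:, :k]
    pred = (y[idx].mean(axis=1) > 0.5).astype(int)
    return float(np.mean(pred == y))

def ga_select(X, y, pop=20, gens=15, seed=1):
    """Tiny GA over binary feature masks; fitness is k-NN LOO accuracy."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    P = rng.random((pop, d)) < 0.5
    P[0] = True                                 # seed the all-features mask
    for _ in range(gens):
        fit = np.array([knn_accuracy(X, y, m) for m in P])
        P = P[np.argsort(fit)[::-1]]            # elitism: keep the top half
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(0, pop // 2, size=2)]  # parents from top half
            cut = rng.integers(1, d)
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            children.append(child ^ (rng.random(d) < 0.05))  # bit-flip mutation
        P[pop // 2:] = children
    fit = np.array([knn_accuracy(X, y, m) for m in P])
    return P[np.argmax(fit)], float(fit.max())

mask, acc = ga_select(X, y)
baseline = knn_accuracy(X, y, np.ones(d, dtype=bool))
print(f"selected {mask.sum()}/{d} genes, LOO accuracy {acc:.3f} (all features: {baseline:.3f})")
```

Because the all-features mask is seeded into the population and the best individual is always retained, the final accuracy can never fall below the all-features baseline.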
Abstract: More and more natural disasters are happening every
year: floods, earthquakes, volcanic eruptions, etc. In order to reduce
the risk of possible damage, governments all around the world are
investing in the development of Early Warning Systems (EWS) for
environmental applications. The most important task of an EWS is
the identification of the onset of critical situations affecting the
environment and population, early enough to inform the authorities
and the general public. This paper describes an approach for the
monitoring of flood protection systems based on machine learning
methods. An
Artificial Intelligence (AI) component has been developed for
detection of abnormal dike behaviour. The AI module has been
integrated into an EWS platform of the UrbanFlood project (EU
Seventh Framework Programme) and validated on real-time
measurements from the sensors installed in a dike.
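The abstract does not specify the detection algorithm; as a hedged illustration of the general idea of flagging abnormal dike sensor behaviour, a simple rolling z-score detector on a hypothetical pore-pressure series might look like this:

```python
import numpy as np

def rolling_zscore_alarms(x, window=50, threshold=4.0):
    """Flag readings that deviate strongly from the recent sensor history.

    A deliberately simple stand-in for an abnormal-behaviour detector:
    each point is compared against the mean/std of the preceding window.
    """
    alarms = []
    for t in range(window, len(x)):
        hist = x[t - window:t]
        sigma = hist.std()
        if sigma > 0 and abs(x[t] - hist.mean()) / sigma > threshold:
            alarms.append(t)
    return alarms

rng = np.random.default_rng(0)
pressure = rng.normal(10.0, 0.1, 300)   # hypothetical pore-pressure series
pressure[200] += 2.0                    # injected anomaly
print(rolling_zscore_alarms(pressure))
```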
Abstract: Fundamental sensor-motor couplings form the backbone
of most mobile robot control tasks, and often need to be implemented
fast, efficiently and nevertheless reliably. Machine learning
techniques are therefore often used to obtain the desired sensor-motor
competences.
In this paper we present an alternative to established machine
learning methods such as artificial neural networks that is very fast,
easy to implement, and has the distinct advantage of generating
transparent, analysable sensor-motor couplings: system identification
through nonlinear polynomial mapping.
This work, which is part of the RobotMODIC project at the
universities of Essex and Sheffield, aims to develop a theoretical understanding
of the interaction between the robot and its environment.
One of the purposes of this research is to enable the principled design
of robot control programs.
As a first step towards this aim we model the behaviour of the
robot, as this emerges from its interaction with the environment, with
the NARMAX modelling method (Nonlinear, Auto-Regressive, Moving
Average models with eXogenous inputs). This method produces
explicit polynomial functions that can be subsequently analysed using
established mathematical methods.
In this paper we demonstrate the fidelity of the obtained NARMAX
models in the challenging task of robot route learning; we present a
set of experiments in which a Magellan Pro mobile robot was taught
to follow four different routes, always using the same mechanism to
obtain the required control law.
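The NARMAX idea of fitting an explicit polynomial model over lagged inputs and outputs can be illustrated with a minimal ARX-style example; the data, lag structure and coefficients below are assumptions chosen so that ordinary least squares recovers the generating law exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a sensor reading u(t) and a motor output y(t)
# generated by a known polynomial law, which the model should recover.
T = 400
u = rng.uniform(-1, 1, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * u[t - 1] - 0.3 * u[t - 1] ** 2

# Regressor matrix of lagged polynomial terms: [y(t-1), u(t-1), u(t-1)^2].
Y = y[1:]
Phi = np.column_stack([y[:-1], u[:-1], u[:-1] ** 2])
theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print("estimated coefficients:", np.round(theta, 3))
```

Since the data are noiseless and the model structure matches the generating law, the estimated coefficients reproduce the true values (0.5, 0.8, -0.3), and the resulting polynomial can be inspected term by term, which is the transparency argument made in the abstract.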
Abstract: Many natural language expressions are ambiguous, and
need to draw on other sources of information to be interpreted. For
instance, interpreting the word تعاون as a noun or a verb depends on
the presence of contextual cues. To interpret words we need to be
able to discriminate between different usages. This paper proposes a
hybrid of rule-based and machine learning methods for tagging
Arabic words. Since an Arabic word may be composed of a stem
plus affixes and clitics, a small number of rules dominate the
performance (affixes include inflexional markers for tense, gender
and number; clitics include some prepositions, conjunctions and
others). Tagging is closely related to the notion of word class used
in syntax. The method is based firstly on rules (which consider the
post-position, the ending of a word, and patterns); the anomalies are
then corrected by adopting a memory-based learning (MBL) method.
Memory-based learning is an efficient method to integrate various
sources of information and to handle exceptional data in natural
language processing tasks. Secondly, the exceptional cases of the
rules are checked, and more information is made available to the
learner for treating those exceptional cases. To evaluate the proposed
method, a number of experiments have been run, in order to assess
the contribution of the various sources of information to learning.
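A minimal sketch of the memory-based learning (MBL) step: stored training instances are feature tuples, and a new word is tagged by majority vote over the instances with the greatest feature overlap. The feature set (word suffix, previous tag) and the toy memory below are illustrative assumptions, not the paper's actual features:

```python
from collections import Counter

# Toy instance memory: ((suffix, previous tag), correct tag) pairs.
memory = [
    (("on", "DET"), "NOUN"),
    (("on", "DET"), "NOUN"),
    (("ed", "NOUN"), "VERB"),
    (("ed", "PRON"), "VERB"),
    (("ly", "VERB"), "ADV"),
]

def mbl_tag(features):
    """Tag by majority vote over the stored instances with maximal overlap."""
    overlaps = [(sum(a == b for a, b in zip(feats, features)), tag)
                for feats, tag in memory]
    best = max(o for o, _ in overlaps)
    votes = Counter(tag for o, tag in overlaps if o == best)
    return votes.most_common(1)[0][0]

print(mbl_tag(("ed", "PRON")))   # matched most closely by the "ed" instances
```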
Abstract: As the web continues to grow exponentially, the idea
of crawling the entire web on a regular basis becomes less and less
feasible, so domain-specific search engines, which index the
information on a specific domain, have been proposed. As more
information becomes available on the World Wide Web, it becomes
more difficult to provide effective search tools for information
access. Today, people access web information through two main
kinds of search interfaces: browsers (clicking and following
hyperlinks) and query engines (queries in the form of a set of
keywords indicating the topic of interest) [2]. Better support is
needed from web search tools for expressing one's information need
and returning high-quality search results. There appears to be a need
for systems that reason under uncertainty and are flexible enough to
recover from the contradictions, inconsistencies, and irregularities
that such reasoning involves. In a multi-view problem, the features
of the domain can be partitioned into disjoint subsets (views), each
of which is sufficient to learn the target concept. Semi-supervised,
multi-view algorithms, which reduce the amount of labeled data
required for learning, rely on the assumptions that the views are
compatible and uncorrelated. This paper describes the use of a
semi-supervised machine learning approach with active learning for
domain-specific search engines. A domain-specific search engine is
“an information access system that allows access to all the
information on the web that is relevant to a particular domain". The
proposed work shows that, with the help of this approach, relevant
data can be extracted with a minimum of queries fired by the user. It
requires a small number of labeled data and a pool of unlabelled
data, on which the learning algorithm is applied to extract the
required data.
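The compatible-views assumption behind semi-supervised multi-view learning can be illustrated with a toy co-training loop, in which two single-feature views alternately add confident pseudo-labels to a shared pool; the data, the nearest-centroid base learner and the confidence rule are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-view data: one feature per view, both views class-informative.
n = 200
y_true = np.repeat([0, 1], n // 2)
XA = rng.normal(y_true * 2.0, 1.0, n)
XB = rng.normal(y_true * 2.0, 1.0, n)

labels = np.full(n, -1)                      # -1 marks unlabeled examples
for cls in (0, 1):                           # five seed labels per class
    labels[np.flatnonzero(y_true == cls)[:5]] = cls

def centroids(X, labels):
    """Class centroids of one view, fitted on the currently labeled pool."""
    return X[labels == 0].mean(), X[labels == 1].mean()

for _ in range(10):                          # co-training rounds
    for X in (XA, XB):                       # each view labels for the shared pool
        c0, c1 = centroids(X, labels)
        margin = np.abs(X - c0) - np.abs(X - c1)    # >0 means closer to class 1
        cand = np.flatnonzero(labels == -1)
        pick = cand[np.argsort(np.abs(margin[cand]))[-5:]]  # most confident
        labels[pick] = (margin[pick] > 0).astype(int)

mask = labels != -1
acc = np.mean(labels[mask] == y_true[mask])
print(f"pseudo-label accuracy after co-training: {acc:.2f}")
```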
Abstract: Keystroke authentication is a new access control system
to identify legitimate users via their typing behavior. In this paper,
machine learning techniques are adapted for keystroke authentication.
Seven learning methods are used to build models to differentiate user
keystroke patterns. The selected classification methods are Decision
Tree, Naive Bayesian, Instance Based Learning, Decision Table, One
Rule, Random Tree and K-star. Three of these methods are studied
in more detail. The results show that machine learning is a feasible
alternative for keystroke authentication. Compared to the
conventional Nearest Neighbour method used in recent research,
learning methods, especially Decision Tree, can be more accurate.
In addition, the experimental results reveal that 3-grams are more
accurate than 2-grams and 4-grams for feature extraction. Also,
combinations of attributes tend to result in higher accuracy.
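The n-gram features evaluated above are, presumably, timing statistics over short key sequences; a sketch of extracting mean n-gram latencies from (key, press-time) events follows, where the event format is an assumption:

```python
from collections import defaultdict

def ngram_latency_features(events, n=3):
    """Mean inter-key latency for each character n-gram in a typing sample.

    `events` is a list of (char, press_time_ms) pairs; hypothetical format.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for i in range(len(events) - n + 1):
        gram = "".join(c for c, _ in events[i:i + n])
        latency = events[i + n - 1][1] - events[i][1]   # span of the n-gram
        totals[gram] += latency
        counts[gram] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Example: the same word typed twice with slightly different rhythm.
sample = [("p", 0), ("a", 120), ("s", 260), ("s", 390),
          ("p", 800), ("a", 930), ("s", 1050), ("s", 1190)]
feats = ngram_latency_features(sample, n=3)
print(feats)
```

The resulting per-gram averages (here, e.g., "pas" and "ass") would be the attributes fed to the classifiers.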
Abstract: Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. The performance of neural network learning is known to be sensitive to the initial weights and architecture. This paper discusses the use of multilayer neural network initialization with a decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree, from the root node to a final leaf, is used for the initialization of a multilayer neural network. The experimental evaluation demonstrates that this approach provides better classification accuracy on the Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with a multilayer neural network initialized with the traditional random method, and with decision tree classifiers.
Abstract: Recent scientific investigations indicate that
multimodal biometrics overcome the technical limitations of
unimodal biometrics, making them ideally suited for everyday life
applications that require a reliable authentication system. However,
for a successful adoption of multimodal biometrics, such systems
would require large heterogeneous datasets with complex multimodal
fusion and privacy schemes spanning various distributed
environments. Based on experimental investigations of current
multimodal systems, this paper reports the various issues related to
speed, error recovery and privacy that impede the diffusion of such
systems in real life. This calls for a robust mechanism that caters to
the desired real-time performance, robust fusion schemes,
interoperability and adaptable privacy policies.
The main objective of this paper is to present a framework that
addresses the abovementioned issues by leveraging the
heterogeneous resource sharing capacities of Grid services and the
efficient machine learning capabilities of artificial neural networks
(ANN). Hence, this paper proposes a Grid-based neural network
framework for adopting multimodal biometrics with the view of
overcoming the barriers of performance, privacy and risk issues that
are associated with shared heterogeneous multimodal data centres.
The framework combines the concept of Grid services for reliable
brokering and privacy policy management of shared biometric
resources along with a momentum back propagation ANN (MBPANN)
model of machine learning for efficient multimodal fusion and
authentication schemes. Real-life applications would be able to adopt
the proposed framework to cater to varying business requirements
and user privacy needs, ensuring a successful diffusion of multimodal
biometrics in various day-to-day transactions.
Abstract: Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in the document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain-independent keyphrase extraction algorithm. The algorithm approaches keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000's AutoSummarize feature. The domain independence of the algorithm has also been confirmed in our experiments.
Abstract: In this paper, we propose a hybrid machine learning
system based on Genetic Algorithm (GA) and Support Vector
Machines (SVM) for stock market prediction. A variety of indicators
from the technical analysis field of study are used as input features.
We also make use of the correlation between stock prices of different
companies to forecast the price of a stock, making use of technical
indicators of highly correlated stocks, not only the stock to be
predicted. The genetic algorithm is used to select the set of most
informative input features from among all the technical indicators.
The results show that the hybrid GA-SVM system outperforms the
stand-alone SVM system.
Abstract: In this paper, we propose a robust disease detection
method, called adaptive orientation code matching (Adaptive OCM),
which is developed from a robust image registration algorithm,
orientation code matching (OCM), to achieve continuous and
site-specific detection of changes in plant disease. We use a
two-stage framework. In the first stage, adaptive OCM is employed;
it not only realizes continuous and site-specific observation of
disease development, but also shows excellent robustness for
non-rigid plant object searching under changes in scene illumination,
translation, small rotation and occlusion. In the second stage, a
machine learning method, a support vector machine (SVM) based on
a two-dimensional (2D) xy-color histogram feature, is utilized for
pixel-wise disease classification and quantification. The indoor
experimental results demonstrate the feasibility and potential of the
proposed algorithm, which could be implemented in real field
situations for better observation of plant disease development.
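The 2D xy-color histogram feature can be sketched as below; here we assume normalized rg-chromaticity coordinates as the two axes, which may differ from the authors' exact definition:

```python
import numpy as np

def xy_color_histogram(pixels, bins=8):
    """2D chromaticity histogram with x = R/(R+G+B) and y = G/(R+G+B)."""
    rgb = np.asarray(pixels, dtype=float)
    s = rgb.sum(axis=1)
    s[s == 0] = 1.0                      # avoid division by zero on black pixels
    x = rgb[:, 0] / s
    yc = rgb[:, 1] / s
    hist, _, _ = np.histogram2d(x, yc, bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()             # normalize to a distribution

# Hypothetical pixel patches: greenish (healthy) vs brownish (diseased).
healthy = [[40, 160, 50], [35, 150, 45], [50, 170, 60]]
diseased = [[120, 80, 40], [130, 85, 45], [110, 75, 35]]
h1 = xy_color_histogram(healthy)
h2 = xy_color_histogram(diseased)
print("histogram L1 distance:", np.abs(h1 - h2).sum())
```

The flattened histogram would serve as the per-pixel-region feature vector fed to the SVM.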
Abstract: As test costs in today's semiconductor industry can make up to 50 percent of the total production costs, efficient test error detection becomes more and more important. In this paper, we present a new machine learning approach to test error detection that should provide faster recognition of test system faults as well as improved test error recall. The key idea is to learn a classifier ensemble that detects typical test error patterns in wafer test results immediately after these tests finish. Since test error detection has not yet been discussed in the machine learning community, we define the central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies reflecting the failure detection performance of three individual classifiers and three ensemble methods based upon them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple and weighted majority vote as well as stacking. For the evaluation, we used cross-validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department for the observation of two products, we proved its practical applicability.
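Of the compared ensemble methods, weighted majority vote is the simplest to illustrate; the base-classifier outputs and weights below are hypothetical:

```python
import numpy as np

def weighted_majority_vote(predictions, weights):
    """Combine binary classifier outputs (+1 error / -1 pass) with per-model weights."""
    score = np.tensordot(weights, predictions, axes=1)   # weighted sum per sample
    return np.sign(score)

# Hypothetical outputs of three base classifiers on five wafer-test records.
preds = np.array([[+1, -1, +1, -1, +1],
                  [+1, +1, -1, -1, +1],
                  [-1, -1, +1, -1, +1]])
weights = np.array([0.5, 0.3, 0.2])   # e.g. derived from validation accuracy
print(weighted_majority_vote(preds, weights))
```

With these weights, the first classifier's vote dominates whenever the other two disagree with each other.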
Abstract: Ensemble learning algorithms such as AdaBoost and
Bagging have been in active research and shown improvements in
classification results for several benchmarking data sets with mainly
decision trees as their base classifiers. In this paper we experiment
with applying these meta-learning techniques to classifiers such as
random forests, neural networks and support vector machines. The
data sets are from MAGIC, a Cherenkov telescope experiment. The
task is to classify gamma signals against an overwhelming
background of hadron and muon signals, a rare-class classification
problem. We compare the individual classifiers with their ensemble
counterparts and discuss the results. WEKA, a machine learning
toolkit, has been used to run the experiments.
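AdaBoost itself is easy to sketch with decision stumps as base learners on a rare-class toy problem; the synthetic one-feature data set stands in for the MAGIC data and is purely an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rare-class toy data: ~10% "gamma" (+1) against a hadron/muon background (-1).
n = 300
y = np.where(rng.random(n) < 0.1, 1, -1)
x = rng.normal(y * 1.5, 1.0)

def stump_predict(x, thr, pol):
    pred = pol * np.sign(x - thr)
    pred[pred == 0] = pol                    # break ties consistently
    return pred

def best_stump(x, y, w):
    """Decision stump (threshold + polarity) minimizing the weighted error."""
    best = (np.inf, 0.0, 1)
    for thr in x:
        for pol in (1, -1):
            err = w[stump_predict(x, thr, pol) != y].sum()
            if err < best[0]:
                best = (err, thr, pol)
    return best

w = np.ones(n) / n
ensemble = []
for _ in range(20):                          # AdaBoost rounds
    err, thr, pol = best_stump(x, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    pred = stump_predict(x, thr, pol)
    w *= np.exp(-alpha * y * pred)           # up-weight misclassified examples
    w /= w.sum()
    ensemble.append((alpha, thr, pol))

F = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
acc = np.mean(np.sign(F) == y)
print(f"training accuracy of the boosted stumps: {acc:.3f}")
```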
Abstract: This paper proposes an innovative methodology for
Acceptance Sampling by Variables, a particular category of
Statistical Quality Control dealing with the assurance of product
quality. Our contribution lies in the exploitation of machine learning
techniques to address the complexity of, and remedy the drawbacks
of, existing approaches. More specifically, the proposed
methodology exploits Artificial Neural Networks (ANNs) to aid
decision making about the acceptance or rejection of an inspected
sample. For any type of inspection, ANNs are trained on data from
the corresponding tables of a standard's sampling plan schemes.
Once trained, ANNs can give closed-form solutions for any
acceptance quality level and sample size, thus automating the
reading of the sampling plan tables, without any need to compromise
with the values of the specific standard chosen each time. The proposed
methodology provides enough flexibility to quality control engineers
during the inspection of their samples, allowing the consideration of
specific needs, while it also reduces the time and the cost required for
these inspections. Its applicability and advantages are demonstrated
through two numerical examples.
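The core idea, training an ANN to interpolate a sampling-plan table, can be sketched with a small one-hidden-layer network trained by plain gradient descent; the table fragment and network size below are invented for illustration, not taken from any standard:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fragment of a sampling-plan table: inputs are
# (acceptance quality level, log sample size), output is an acceptance number.
X = np.array([[0.65, 2.0], [1.0, 2.0], [1.5, 2.3], [2.5, 2.3],
              [0.65, 2.7], [1.0, 2.7], [1.5, 3.0], [2.5, 3.0]])
y = np.array([1.0, 2.0, 3.0, 5.0, 2.0, 3.0, 5.0, 7.0])

# One-hidden-layer network trained by full-batch gradient descent.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

_, out0 = forward(X)
mse0 = np.mean((out0.ravel() - y) ** 2)
lr = 0.01
for _ in range(5000):
    H, out = forward(X)
    err = (out.ravel() - y)[:, None]
    gW2 = H.T @ err / len(y); gb2 = err.mean(0)
    dH = err @ W2.T * (1 - H ** 2)            # backprop through tanh
    gW1 = X.T @ dH / len(y); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, out = forward(X)
mse = np.mean((out.ravel() - y) ** 2)
print(f"MSE before/after training: {mse0:.3f} -> {mse:.3f}")
```

Once fitted, the network can be queried at (AQL, sample size) combinations that fall between the table's rows, which is the claimed automation benefit.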
Abstract: Term Extraction, a key data preparation step in Text
Mining, extracts terms, i.e. relevant collocations of words, attached
to specific concepts (e.g. "genetic algorithms" and "decision trees"
are terms associated with the concept “Machine Learning"). In this
paper, the task of extracting interesting collocations is achieved
through a supervised learning algorithm, exploiting a few
collocations manually labelled as interesting or not interesting.
From these examples, the ROGER algorithm learns a numerical
function inducing a ranking on the collocations. This ranking is
optimized using genetic algorithms, maximizing the trade-off
between the false positive and true positive rates (the area under the
ROC curve). The approach uses a particular representation for the
word collocations, namely the vector of values of the standard
statistical interestingness measures attached to each collocation. As
this representation is general (across corpora and natural languages),
generality tests were performed by applying the ranking function
learned from an English corpus in biology to a French corpus of
curricula vitae, and vice versa, showing good robustness of the
approach compared to the state-of-the-art Support Vector Machine
(SVM).
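The quantity being maximized, the area under the ROC curve, equals the probability that a randomly chosen positive is ranked above a randomly chosen negative, which gives a direct way to compute it; the scores below are hypothetical:

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive example is ranked above a randomly chosen negative one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical interestingness scores for six candidate collocations.
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.1]
labels = [1,   1,   1,    0,   0,   0]
print(auc(scores, labels))
```

A GA maximizing this criterion, as in the abstract, would evaluate each candidate ranking function by exactly such a pairwise score.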
Abstract: This paper explores the effectiveness of machine
learning techniques in detecting firms that issue fraudulent financial
statements (FFS) and deals with the identification of factors
associated with FFS. To this end, a number of experiments have been
conducted using representative learning algorithms, which were
trained using a data set of 164 fraud and non-fraud Greek firms in the
recent period 2001-2002. The decision of which particular method to
choose is a complicated problem. A good alternative to choosing
only one method is to create a hybrid forecasting system
incorporating a number of possible solution methods as components
(an ensemble of classifiers). For this purpose, we have implemented
a hybrid decision support system that combines the representative
algorithms using a stacking variant methodology and achieves better
performance than any of the examined simple and ensemble
methods. To sum up, this study indicates that the investigation of
financial information can be used in the identification of FFS, and
underlines the importance of financial ratios.
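The stacking variant can be sketched as follows: out-of-fold predictions of two simple base learners become the features of a least-squares meta-learner. The toy financial-ratio data and the choice of base learners are assumptions, not the paper's actual components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical financial-ratio data: two informative features, binary fraud label.
n = 120
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 4))
X[:, 0] += y * 1.5
X[:, 1] -= y * 1.5
perm = rng.permutation(n)
X, y = X[perm], y[perm]

def centroid_score(Xtr, ytr, Xte):
    """Base learner 1: signed distance difference to the two class centroids."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return np.linalg.norm(Xte - c0, axis=1) - np.linalg.norm(Xte - c1, axis=1)

def knn_score(Xtr, ytr, Xte, k=5):
    """Base learner 2: fraction of fraud labels among the k nearest neighbours."""
    D = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
    idx = np.argsort(D, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

# Level-1 features: out-of-fold predictions of both base learners.
folds = np.array_split(np.arange(n), 5)
Z = np.zeros((n, 2))
for te in folds:
    tr = np.setdiff1d(np.arange(n), te)
    Z[te, 0] = centroid_score(X[tr], y[tr], X[te])
    Z[te, 1] = knn_score(X[tr], y[tr], X[te])

# Meta-learner: least-squares combiner over the stacked predictions.
A = np.column_stack([Z, np.ones(n)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = (A @ w > 0.5).astype(int)
acc = np.mean(pred == y)
print("stacked training accuracy:", acc)
```

Using out-of-fold rather than in-sample base predictions is the standard precaution that keeps the meta-learner from simply memorizing the base learners' training errors.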
Abstract: Injection molding is a very complicated process to
monitor and control. With its high complexity and many process
parameters, the optimization of these systems is a very challenging
problem. To meet the requirements and costs demanded by the
market, there has been intense development and research aimed at
keeping the process under control. This paper outlines the latest
advances in the algorithms needed for plastic injection process
control and monitoring, together with a flexible data acquisition
system that allows rapid implementation of complex algorithms,
makes it possible to assess their performance, and can be integrated
into the quality control process. Finally, to demonstrate the
performance achieved by this combination, a real case of use is
presented.