Abstract: The dramatic rise in the use of Social Media (SM)
platforms such as Facebook and Twitter provide access to an
unprecedented amount of user data. Users may post reviews on
products and services they bought, write about their interests, share
ideas or give their opinions and views on political issues. There is a
growing interest in the analysis of SM data from organisations for
detecting new trends, obtaining user opinions on their products and
services or finding out about their online reputations. A recent
research trend in SM analysis is making predictions based on
sentiment analysis of SM. Often indicators of historic SM data are
represented as time series and correlated with a variety of real world
phenomena like the outcome of elections, the development of
financial indicators, box office revenue and disease outbreaks. This
paper examines the current state of research in the area of SM mining
and predictive analysis and gives an overview of the analysis
methods using opinion mining and machine learning techniques.
Abstract: Real-time or in-line process monitoring frameworks are designed to give early warnings for a fault along with meaningful identification of its assignable causes. In artificial intelligence and machine learning fields of pattern recognition various promising approaches have been proposed such as kernel-based nonlinear machine learning techniques. This work presents a kernel-based empirical monitoring scheme for batch type production processes with small sample size problem of partially unbalanced data. Measurement data of normal operations are easy to collect whilst special events or faults data are difficult to collect. In such situations, noise filtering techniques can be helpful in enhancing process monitoring performance. Furthermore, preprocessing of raw process data is used to get rid of unwanted variation of data. The performance of the monitoring scheme was demonstrated using three-dimensional batch data. The results showed that the monitoring performance was improved significantly in terms of detection success rate of process fault.
Abstract: The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.
Abstract: Hospital staff and managers are under pressure and
concerned for effective use and management of scarce resources. The
hospital admissions require many decisions that have complex and
uncertain consequences for hospital resource utilization and patient
flow. It is challenging to predict risk of admissions and length of stay
of a patient due to their vague nature. There is no method to capture
the vague definition of admission of a patient. Also, current methods
and tools used to predict patients at risk of admission fail to deal with
uncertainty in unplanned admission, LOS, patients- characteristics.
The main objective of this paper is to deal with uncertainty in
health system variables, and handles uncertain relationship among
variables. An introduction of machine learning techniques along with
statistical methods like Regression methods can be a proposed
solution approach to handle uncertainty in health system variables. A
model that adapts fuzzy methods to handle uncertain data and
uncertain relationships can be an efficient solution to capture the
vague definition of admission of a patient.
Abstract: Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.
Abstract: Financial forecasting using machine learning techniques has received great efforts in the last decide . In this ongoing work, we show how machine learning of graphical models will be able to infer a visualized causal interactions between different banks in the Saudi equities market. One important discovery from such learned causal graphs is how companies influence each other and to what extend. In this work, a set of graphical models named Gaussian graphical models with developed ensemble penalized feature selection methods that combine ; filtering method, wrapper method and a regularizer will be shown. A comparison between these different developed ensemble combinations will also be shown. The best ensemble method will be used to infer the causal relationships between banks in Saudi equities market.
Abstract: Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.
Abstract: In this paper, we present an innovative scheme of
blindly extracting message bits from an image distorted by an attack.
Support Vector Machine (SVM) is used to nonlinearly classify the
bits of the embedded message. Traditionally, a hard decoder is used
with the assumption that the underlying modeling of the Discrete
Cosine Transform (DCT) coefficients does not appreciably change.
In case of an attack, the distribution of the image coefficients is
heavily altered. The distribution of the sufficient statistics at the
receiving end corresponding to the antipodal signals overlap and a
simple hard decoder fails to classify them properly. We are
considering message retrieval of antipodal signal as a binary
classification problem. Machine learning techniques like SVM is
used to retrieve the message, when certain specific class of attacks is
most probable. In order to validate SVM based decoding scheme, we
have taken Gaussian noise as a test case. We generate a data set using
125 images and 25 different keys. Polynomial kernel of SVM has
achieved 100 percent accuracy on test data.
Abstract: Modeling the behavior of the dialogue management in
the design of a spoken dialogue system using statistical methodologies
is currently a growing research area. This paper presents a work
on developing an adaptive learning approach to optimize dialogue
strategy. At the core of our system is a method formalizing dialogue
management as a sequential decision making under uncertainty whose
underlying probabilistic structure has a Markov Chain. Researchers
have mostly focused on model-free algorithms for automating the
design of dialogue management using machine learning techniques
such as reinforcement learning. But in model-free algorithms there
exist a dilemma in engaging the type of exploration versus exploitation.
Hence we present a model-based online policy learning
algorithm using interconnected learning automata for optimizing
dialogue strategy. The proposed algorithm is capable of deriving
an optimal policy that prescribes what action should be taken in
various states of conversation so as to maximize the expected total
reward to attain the goal and incorporates good exploration and
exploitation in its updates to improve the naturalness of humancomputer
interaction. We test the proposed approach using the most
sophisticated evaluation framework PARADISE for accessing to the
railway information system.
Abstract: This paper investigates how the use of machine learning techniques can significantly predict the three major dimensions of learner-s emotions (pleasure, arousal and dominance) from brainwaves. This study has adopted an experimentation in which participants were exposed to a set of pictures from the International Affective Picture System (IAPS) while their electrical brain activity was recorded with an electroencephalogram (EEG). The pictures were already rated in a previous study via the affective rating system Self-Assessment Manikin (SAM) to assess the three dimensions of pleasure, arousal, and dominance. For each picture, we took the mean of these values for all subjects used in this previous study and associated them to the recorded brainwaves of the participants in our study. Correlation and regression analyses confirmed the hypothesis that brainwave measures could significantly predict emotional dimensions. This can be very useful in the case of impassive, taciturn or disabled learners. Standard classification techniques were used to assess the reliability of the automatic detection of learners- three major dimensions from the brainwaves. We discuss the results and the pertinence of such a method to assess learner-s emotions and integrate it into a brainwavesensing Intelligent Tutoring System.
Abstract: Fault-proneness of a software module is the
probability that the module contains faults. To predict faultproneness
of modules different techniques have been proposed which
includes statistical methods, machine learning techniques, neural
network techniques and clustering techniques. The aim of proposed
study is to explore whether metrics available in the early lifecycle
(i.e. requirement metrics), metrics available in the late lifecycle (i.e.
code metrics) and metrics available in the early lifecycle (i.e.
requirement metrics) combined with metrics available in the late
lifecycle (i.e. code metrics) can be used to identify fault prone
modules using Genetic Algorithm technique. This approach has been
tested with real time defect C Programming language datasets of
NASA software projects. The results show that the fusion of
requirement and code metric is the best prediction model for
detecting the faults as compared with commonly used code based
model.
Abstract: Keystroke authentication is a new access control system
to identify legitimate users via their typing behavior. In this paper,
machine learning techniques are adapted for keystroke authentication.
Seven learning methods are used to build models to differentiate user
keystroke patterns. The selected classification methods are Decision
Tree, Naive Bayesian, Instance Based Learning, Decision Table, One
Rule, Random Tree and K-star. Among these methods, three of them
are studied in more details. The results show that machine learning
is a feasible alternative for keystroke authentication. Compared to
the conventional Nearest Neighbour method in the recent research,
learning methods especially Decision Tree can be more accurate. In
addition, the experiment results reveal that 3-Grams is more accurate
than 2-Grams and 4-Grams for feature extraction. Also, combination
of attributes tend to result higher accuracy.
Abstract: This paper proposes an innovative methodology for
Acceptance Sampling by Variables, which is a particular category of
Statistical Quality Control dealing with the assurance of products
quality. Our contribution lies in the exploitation of machine learning
techniques to address the complexity and remedy the drawbacks of
existing approaches. More specifically, the proposed methodology
exploits Artificial Neural Networks (ANNs) to aid decision making
about the acceptance or rejection of an inspected sample. For any
type of inspection, ANNs are trained by data from corresponding
tables of a standard-s sampling plan schemes. Once trained, ANNs
can give closed-form solutions for any acceptance quality level and
sample size, thus leading to an automation of the reading of the
sampling plan tables, without any need of compromise with the
values of the specific standard chosen each time. The proposed
methodology provides enough flexibility to quality control engineers
during the inspection of their samples, allowing the consideration of
specific needs, while it also reduces the time and the cost required for
these inspections. Its applicability and advantages are demonstrated
through two numerical examples.
Abstract: This paper presents the methodology from machine
learning approaches for short-term rain forecasting system. Decision
Tree, Artificial Neural Network (ANN), and Support Vector Machine
(SVM) were applied to develop classification and prediction models
for rainfall forecasts. The goals of this presentation are to
demonstrate (1) how feature selection can be used to identify the
relationships between rainfall occurrences and other weather
conditions and (2) what models can be developed and deployed for
predicting the accurate rainfall estimates to support the decisions to
launch the cloud seeding operations in the northeastern part of
Thailand. Datasets collected during 2004-2006 from the
Chalermprakiat Royal Rain Making Research Center at Hua Hin,
Prachuap Khiri khan, the Chalermprakiat Royal Rain Making
Research Center at Pimai, Nakhon Ratchasima and Thai
Meteorological Department (TMD). A total of 179 records with 57
features was merged and matched by unique date. There are three
main parts in this work. Firstly, a decision tree induction algorithm
(C4.5) was used to classify the rain status into either rain or no-rain.
The overall accuracy of classification tree achieves 94.41% with the
five-fold cross validation. The C4.5 algorithm was also used to
classify the rain amount into three classes as no-rain (0-0.1 mm.),
few-rain (0.1- 10 mm.), and moderate-rain (>10 mm.) and the overall
accuracy of classification tree achieves 62.57%. Secondly, an ANN
was applied to predict the rainfall amount and the root mean square
error (RMSE) were used to measure the training and testing errors of
the ANN. It is found that the ANN yields a lower RMSE at 0.171 for
daily rainfall estimates, when compared to next-day and next-2-day
estimation. Thirdly, the ANN and SVM techniques were also used to
classify the rain amount into three classes as no-rain, few-rain, and
moderate-rain as above. The results achieved in 68.15% and 69.10%
of overall accuracy of same-day prediction for the ANN and SVM
models, respectively. The obtained results illustrated the comparison
of the predictive power of different methods for rainfall estimation.
Abstract: Understanding proteins functions is a major goal in
the post-genomic era. Proteins usually work in context of other
proteins and rarely function alone. Therefore, it is highly relevant to
study the interaction partners of a protein in order to understand its
function. Machine learning techniques have been widely applied to
predict protein-protein interactions. Kernel functions play an
important role for a successful machine learning technique. Choosing
the appropriate kernel function can lead to a better accuracy in a
binary classifier such as the support vector machines. In this paper,
we describe a Bayesian kernel for the support vector machine to
predict protein-protein interactions. The use of Bayesian kernel can
improve the classifier performance by incorporating the probability
characteristic of the available experimental protein-protein
interactions data that were compiled from different sources. In
addition, the probabilistic output from the Bayesian kernel can assist
biologists to conduct more research on the highly predicted
interactions. The results show that the accuracy of the classifier has
been improved using the Bayesian kernel compared to the standard
SVM kernels. These results imply that protein-protein interaction can
be predicted using Bayesian kernel with better accuracy compared to
the standard SVM kernels.
Abstract: Support vector machines (SVMs) have shown
superior performance compared to other machine learning techniques,
especially in classification problems. Yet one limitation of SVMs is
the lack of an explanation capability which is crucial in some
applications, e.g. in the medical and security domains. In this paper, a
novel approach for eclectic rule-extraction from support vector
machines is presented. This approach utilizes the knowledge acquired
by the SVM and represented in its support vectors as well as the
parameters associated with them. The approach includes three stages;
training, propositional rule-extraction and rule quality evaluation.
Results from four different experiments have demonstrated the value
of the approach for extracting comprehensible rules of high accuracy
and fidelity.