Abstract: Data mining can be described as a technique for extracting
information from data. It is the process of obtaining hidden
information and then turning it into qualified knowledge by statistical
and artificial intelligence techniques. One of its application areas is
medicine, where decision support systems for diagnosis are formed by
extracting meaningful information from given medical data. In this
study, a decision support system for the diagnosis of illness is presented
that makes use of data mining and three different artificial intelligence
classifier algorithms, namely Multilayer Perceptron, Naive Bayes Classifier and
J.48. The Pima Indian dataset of the UCI Machine Learning Repository was
used. This dataset includes urinary and blood test results of 768
patients. These test results consist of 8 different features.
The classification results obtained were compared with those of previous
studies. Suggestions for future studies are presented.
Abstract: In this paper, we present an innovative scheme of
blindly extracting message bits from an image distorted by an attack.
Support Vector Machine (SVM) is used to nonlinearly classify the
bits of the embedded message. Traditionally, a hard decoder is used
with the assumption that the underlying modeling of the Discrete
Cosine Transform (DCT) coefficients does not appreciably change.
In case of an attack, the distribution of the image coefficients is
heavily altered. The distributions of the sufficient statistics at the
receiving end corresponding to the antipodal signals overlap, and a
simple hard decoder fails to classify them properly. We treat
message retrieval of antipodal signals as a binary
classification problem. A machine learning technique such as SVM is
used to retrieve the message when a certain specific class of attacks is
most probable. In order to validate the SVM-based decoding scheme, we
have taken Gaussian noise as a test case. We generate a data set using
125 images and 25 different keys. The polynomial kernel SVM
achieved 100 percent accuracy on test data.
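To illustrate why a fixed-threshold hard decoder can fail once an attack changes the statistics, the following minimal Python sketch compares a sign-based hard decoder with a decision threshold learned from labelled examples. It is not the SVM decoder of the abstract: the one-dimensional statistic, the offset-plus-noise attack, and the midpoint rule (a deliberately simple stand-in for a trained classifier) are all assumptions made for illustration.

```python
import random

def hard_decode(stat):
    """Classic hard decoder: the sign of the statistic decides the bit."""
    return 1 if stat > 0 else 0

def fit_threshold(stats, bits):
    """Learn a threshold as the midpoint of the two class means.

    A simple stand-in for a trained classifier such as an SVM.
    """
    ones = [s for s, b in zip(stats, bits) if b == 1]
    zeros = [s for s, b in zip(stats, bits) if b == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2

def simulate(n, offset, noise, rng):
    """Antipodal embedding (+1 / -1) followed by a hypothetical attack
    that adds a constant offset plus Gaussian noise."""
    bits = [rng.randint(0, 1) for _ in range(n)]
    stats = [(1 if b else -1) + offset + rng.gauss(0, noise) for b in bits]
    return stats, bits

def accuracy(decoder, stats, bits):
    """Fraction of bits that the decoder recovers correctly."""
    return sum(decoder(s) == b for s, b in zip(stats, bits)) / len(bits)

rng = random.Random(0)
train_stats, train_bits = simulate(2000, offset=1.5, noise=0.5, rng=rng)
test_stats, test_bits = simulate(2000, offset=1.5, noise=0.5, rng=rng)

threshold = fit_threshold(train_stats, train_bits)
hard_acc = accuracy(hard_decode, test_stats, test_bits)
learned_acc = accuracy(lambda s: 1 if s > threshold else 0, test_stats, test_bits)
```

In this toy setting the attack shifts both classes to the same side of zero, so the sign rule misreads most 0 bits, while the learned threshold adapts to the shifted statistics.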
Abstract: Music segmentation is a key issue in music information
retrieval (MIR) as it provides an insight into the
internal structure of a composition. Structural information about
a composition can improve several tasks related to MIR such
as searching and browsing large music collections, visualizing
musical structure, lyric alignment, and music summarization.
The authors of this paper present the MTSSM framework, a two-layer
framework for the multi-track segmentation of symbolic
music. The strength of this framework lies in the combination of
existing methods for local track segmentation and the application
of global structure information spanning multiple tracks.
The first layer of the MTSSM uses various string matching
techniques to detect the best candidate segmentations for each
track of a multi-track composition independently. The second
layer combines all single track results and determines the best
segmentation for each track with respect to the global structure of
the composition.
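As a rough idea of what string matching on a single track can look like, the sketch below finds the longest repeated, non-overlapping subsequence in a sequence of pitches. It is only one generic technique, not the MTSSM first layer; the function name `longest_repeat` and the MIDI-pitch encoding of the track are assumptions.

```python
def longest_repeat(seq):
    """Return the longest subsequence that occurs twice without overlapping.

    table[i][j] holds the length of the common suffix of seq[:i] and
    seq[:j] (with i < j), capped so that the two copies cannot overlap.
    """
    n = len(seq)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if seq[i - 1] == seq[j - 1] and table[i - 1][j - 1] < j - i:
                table[i][j] = table[i - 1][j - 1] + 1
                if table[i][j] > best_len:
                    best_len, best_end = table[i][j], i
    return seq[best_end - best_len:best_end]

# Hypothetical track given as MIDI pitch numbers.
track = [60, 62, 64, 60, 62, 64, 67]
motif = longest_repeat(track)  # [60, 62, 64]
```

A repeated motif of this kind is a natural candidate segment boundary; a second layer could then compare such candidates across tracks.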
Abstract: The myoelectric signal (MES) is one of the biosignals
utilized in helping humans to control equipment. Recent approaches
to MES classification for controlling prosthetic devices with pattern
recognition techniques revealed two problems. First, the classification
performance of the system starts degrading when the number of
motion classes to be classified increases. Second, in order to solve the
first problem, additional complicated methods were utilized, which
increase the computational cost of a multifunction myoelectric
control system. In an effort to solve these problems and to achieve a
feasible design for real time implementation with high overall
accuracy, this paper presents a new method for feature extraction in
MES recognition systems. The method works by extracting features
using Wavelet Packet Transform (WPT) applied on the MES from
multiple channels, and then employs the Fuzzy c-means (FCM)
algorithm to generate a measure that judges the suitability of features for
classification. Finally, Principal Component Analysis (PCA) is
utilized to reduce the size of the data before computing the
classification accuracy with a multilayer perceptron neural network.
The proposed system produces powerful classification results (99%
accuracy) by using only a small portion of the original feature set.
Abstract: This paper summarizes the results of some experiments for finding effective features for the disambiguation of Turkish verbs. Word sense disambiguation is a current area of investigation in which verbs have the dominant role. On average, verbs have more senses than other types of words, and detecting these features for verbs may lead to some improvements for other word types. In this paper we have considered only the syntactic features that can be obtained from the corpus, and tested them using some well-known machine learning algorithms.
Abstract: In recent years, real estate prediction or valuation has
been a topic of discussion in many developed countries. Improper
hype created by investors leads to fluctuating real estate prices,
affecting many consumers' ability to purchase their own homes. Therefore,
scholars from various countries have conducted research in real estate
valuation and prediction. Using the back-propagation neural network,
which has been popular in recent years, together with the orthogonal
array of the Taguchi method, this study aimed to find the optimal parameter
combination across the levels of the orthogonal array after the system
presented different parameter combinations, so that the artificial
neural network obtained the most accurate results. The experimental
results also demonstrated that the method presented in the study gave
better results than traditional machine learning. Finally, they also showed
that the model proposed in this study had the optimal predictive effect
and could significantly reduce the time cost of simulation.
The best predictive results could be found more efficiently, with fewer
experiments. Thus users could predict a real estate
transaction price that is not far from the current actual prices.
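The idea of searching parameter combinations with an orthogonal array can be sketched as follows. The example uses the standard L4 array (three two-level factors in four runs instead of eight) and picks the best level of each factor from its mean response. The factor names, the level values, and the `validation_error` function (a stand-in for actually training a back-propagation network) are hypothetical and not taken from the study.

```python
# Standard L4 orthogonal array: every pair of columns contains each
# combination of levels exactly once.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Hypothetical factors and levels for a back-propagation network.
LEVELS = {
    "learning_rate": (0.01, 0.1),
    "hidden_units": (4, 16),
    "momentum": (0.0, 0.9),
}

def validation_error(learning_rate, hidden_units, momentum):
    """Stand-in for training a network and measuring its error."""
    return (abs(learning_rate - 0.1) * 10
            + abs(hidden_units - 16) / 16
            + abs(momentum - 0.9))

def taguchi_best(levels, array, response):
    """Run one experiment per array row, then choose for each factor
    the level with the lowest mean response (main-effects analysis)."""
    names = list(levels)
    runs = []
    for row in array:
        setting = {name: levels[name][lv] for name, lv in zip(names, row)}
        runs.append((row, response(**setting)))
    best = {}
    for col, name in enumerate(names):
        means = []
        for lv in range(len(levels[name])):
            values = [r for row, r in runs if row[col] == lv]
            means.append(sum(values) / len(values))
        best[name] = levels[name][means.index(min(means))]
    return best

best_setting = taguchi_best(LEVELS, L4, validation_error)
```

Note that the selected combination need not be one of the four rows that were actually run; with an additive response such as this toy one, the main effects are enough to identify it.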
Abstract: In the present study, a support vector machine (SVM) learning approach to character recognition is proposed. Simple
feature detectors, similar to those found in the human visual system, were used in the SVM classifier. Alphabetic characters were rotated
to 8 different angles and, using the proposed cognitive model, all characters were recognized with 100% accuracy and specificity.
These same results were found in psychiatric studies of human character recognition.
Abstract: Combining classifiers is a useful method for solving
complex problems in machine learning. The ECOC (Error Correcting
Output Codes) method has been widely used for designing combining
classifiers with an emphasis on the diversity of classifiers. In this
paper, in contrast to the standard ECOC approach in which individual
classifiers are chosen homogeneously, classifiers are selected
according to the complexity of the corresponding binary problem. We
use the SATIMAGE database (containing 6 classes) for our experiments.
The recognition error rate of our proposed method is 10.37%, which
indicates a considerable improvement in comparison with the
conventional ECOC and stacked generalization methods.
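For readers unfamiliar with ECOC, the decoding step can be sketched in a few lines: each class has a binary codeword, each bit is predicted by one binary classifier, and the class whose codeword is closest in Hamming distance wins. The code matrix below is the exhaustive 7-bit code for four classes, chosen for brevity; it is an assumption and not the matrix used for SATIMAGE, and the class names are placeholders.

```python
# One codeword per class; column k is the target of binary classifier k.
CODE_MATRIX = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(u, v):
    """Number of positions in which two bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def ecoc_decode(outputs, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is nearest to the classifier outputs."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], outputs))

# The first classifier is wrong, yet the decoded class is still "c2".
predicted = ecoc_decode((1, 0, 0, 0, 1, 1, 1))
```

Because any two codewords here differ in four positions, a single faulty binary classifier cannot change the decoded class.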
Abstract: Modeling the behavior of the dialogue management in
the design of a spoken dialogue system using statistical methodologies
is currently a growing research area. This paper presents work
on developing an adaptive learning approach to optimize dialogue
strategy. At the core of our system is a method formalizing dialogue
management as a sequential decision-making problem under uncertainty whose
underlying probabilistic structure is a Markov chain. Researchers
have mostly focused on model-free algorithms for automating the
design of dialogue management using machine learning techniques
such as reinforcement learning. But in model-free algorithms there
exists a dilemma between exploration and exploitation.
Hence we present a model-based online policy learning
algorithm using interconnected learning automata for optimizing
dialogue strategy. The proposed algorithm is capable of deriving
an optimal policy that prescribes what action should be taken in
various states of conversation so as to maximize the expected total
reward to attain the goal and incorporates good exploration and
exploitation in its updates to improve the naturalness of human-computer
interaction. We test the proposed approach using PARADISE, a
sophisticated evaluation framework, on the task of accessing a
railway information system.
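A single learning automaton, the building block of such an approach, can be sketched as below. The example uses the well-known linear reward-inaction update, in which the probability of a rewarded action is increased and nothing changes on failure. It is not the interconnected, model-based algorithm of the abstract; the class name, the learning rate, and the toy reward probabilities are assumptions.

```python
import random

class LinearRewardInaction:
    """Learning automaton with the linear reward-inaction update."""

    def __init__(self, n_actions, rate=0.1, rng=None):
        self.probs = [1.0 / n_actions] * n_actions
        self.rate = rate
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        """Sample an action according to the current probabilities."""
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, rewarded):
        """Shift probability mass towards a rewarded action; ignore failures."""
        if not rewarded:
            return
        for i in range(len(self.probs)):
            if i == action:
                self.probs[i] += self.rate * (1.0 - self.probs[i])
            else:
                self.probs[i] *= 1.0 - self.rate

# Toy environment: three dialogue actions with hypothetical success rates.
rng = random.Random(0)
automaton = LinearRewardInaction(3, rate=0.05, rng=rng)
success = [0.2, 0.8, 0.4]
for _ in range(2000):
    action = automaton.choose()
    automaton.update(action, rng.random() < success[action])
```

The update keeps the probabilities summing to one, so sampling itself provides exploration while rewarded actions are gradually exploited more often.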
Abstract: In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation by manipulating the membership degree for the training data and the degree value for a test web page. Six measures are used and compared in this study: the Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference approaches. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. An analysis of these measures and concluding remarks are presented in this study.
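Four of the named measures correspond to standard fuzzy conjunctions (t-norms), which the sketch below implements and uses to score a page against categories. How the study actually aggregates them is not stated in the abstract, so the summation in `similarity`, the membership values, and the English placeholder terms are assumptions; the MinMax and Special case fuzzy measures are omitted.

```python
def algebraic(a, b):
    """Algebraic product."""
    return a * b

def einstein(a, b):
    """Einstein product."""
    return a * b / (2.0 - (a + b - a * b))

def hamacher(a, b):
    """Hamacher product (defined as 0 when both arguments are 0)."""
    return 0.0 if a == 0 and b == 0 else a * b / (a + b - a * b)

def bounded_difference(a, b):
    """Bounded difference."""
    return max(0.0, a + b - 1.0)

def similarity(page, category, tnorm):
    """Sum the t-norm of page and category memberships over the page's terms."""
    return sum(tnorm(mu, category.get(term, 0.0)) for term, mu in page.items())

def classify(page, categories, tnorm):
    """Pick the category with the highest fuzzy similarity to the page."""
    return max(categories, key=lambda name: similarity(page, categories[name], tnorm))

# Hypothetical term-category membership degrees and one test page.
categories = {
    "sports": {"match": 0.9, "goal": 0.8},
    "economy": {"market": 0.9, "goal": 0.2},
}
page = {"goal": 0.7, "match": 0.5}
label = classify(page, categories, einstein)
```

For memberships in [0, 1] these four operators are ordered as bounded difference ≤ Einstein ≤ algebraic ≤ Hamacher, which is one reason they can rank categories differently on borderline pages.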
Abstract: Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of the ontology based approaches, however, focus on indexing texts at document level. Recently, with the advances in ontological engineering, it became clear that information indexing can largely benefit from the use of general purpose ontologies which aid the indexing of documents at word level. This paper presents a concept indexing algorithm, which adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. This algorithm uses a general purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91% which is encouraging in terms of the further improvement of the algorithm.
Abstract: This paper explores the scalability issues associated
with solving the Named Entity Recognition (NER) problem using
Support Vector Machines (SVM) and high-dimensional features. The
performance results of a set of experiments conducted using binary
and multi-class SVM with increasing training data sizes are
examined. The NER domain chosen for these experiments is the
biomedical publications domain, especially selected due to its
importance and inherent challenges. A simple machine learning
approach is used that eliminates prior language knowledge such as
part-of-speech or noun phrase tagging thereby allowing for its
applicability across languages. No domain-specific knowledge is
included. The accuracy measures achieved are comparable to those
obtained using more complex approaches, which constitutes a
motivation to investigate ways to improve the scalability of multi-class
SVM in order to make the solution more practical and usable.
Improving training time of multi-class SVM would make support
vector machines a more viable and practical machine learning
solution for real-world problems with large datasets. An initial
prototype results in great improvement of the training time at the
expense of memory requirements.
Abstract: This paper presents a simple and effective method for approximate indexing of instances for instance-based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective, especially when only the first nearest neighbor is required.
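The general pattern of "find a good starting point, then stop early" can be sketched with a simpler index than an interval tree. The code below swaps in a list sorted on the first feature: a binary search gives the starting point, the search expands outwards, and it stops once the gap along that feature already exceeds the best distance found. Unlike the approximate method of the abstract, this variant is exact; all function names are assumptions.

```python
import bisect
import math

def build_index(points):
    """Sort the instances by their first feature and keep the sort keys."""
    ordered = sorted(points, key=lambda p: p[0])
    return [p[0] for p in ordered], ordered

def nearest(index, query):
    """Return (nearest instance, distance) using an early stopping rule."""
    keys, ordered = index
    start = bisect.bisect_left(keys, query[0])
    lo, hi = start - 1, start
    best, best_dist = None, math.inf
    while lo >= 0 or hi < len(ordered):
        lo_gap = query[0] - keys[lo] if lo >= 0 else math.inf
        hi_gap = keys[hi] - query[0] if hi < len(ordered) else math.inf
        # Early stop: no remaining instance can be closer than the best one.
        if min(lo_gap, hi_gap) >= best_dist:
            break
        if lo_gap <= hi_gap:
            candidate, lo = ordered[lo], lo - 1
        else:
            candidate, hi = ordered[hi], hi + 1
        dist = math.dist(candidate, query)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist

index = build_index([(0, 0), (1, 5), (2, 1), (5, 5), (9, 9)])
neighbour, distance = nearest(index, (2.2, 1.1))  # neighbour is (2, 1)
```

In the usage above the search inspects a single instance before the stopping rule fires, which is the behaviour that makes first-nearest-neighbour queries cheap.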
Abstract: This paper presents four unsupervised clustering algorithms, namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer, that previous works have not used for network traffic classification. The methodology, the results, the resulting clusters, and an evaluation of the efficiency of each algorithm in terms of accuracy are shown. The efficiency of these algorithms is also considered in terms of the time each one takes to generate the clusters quickly and correctly. Our work studies and tests the best algorithm by classifying traffic anomalies in network traffic using different attributes that have not been used before. We analyze the algorithm that has the best efficiency or the best learning and compare it to the previously used K-Means. Our research will be used to make anomaly detection systems more efficient, as will increasingly be required in the future.
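Of the four algorithms, FarthestFirst is the easiest to illustrate. The sketch below implements plain farthest-first traversal: start from one instance, repeatedly add the instance farthest from all centers chosen so far, then assign every instance to its nearest center. The two-feature flow records are invented, and starting from index 0 (rather than a random instance) is an assumption made to keep the example deterministic.

```python
import math

def farthest_first(points, k, first=0):
    """Choose k cluster centers by farthest-first traversal."""
    centers = [points[first]]
    # Distance from each point to its nearest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for p, d in zip(points, dist)]
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

# Hypothetical flow records: (scaled packet size, scaled duration).
flows = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
centers = farthest_first(flows, k=3)
labels = assign(flows, centers)
```

Only one pass over the data is needed per center, which is why the method is fast compared with iterative schemes such as K-Means.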
Abstract: Complex assemblies of interacting proteins carry out
most of the interesting jobs in a cell, such as metabolism, DNA
synthesis, mitosis and cell division. These physiological properties
play out as a subtle molecular dance, choreographed by underlying
regulatory networks that control the activities of cyclin-dependent
kinases (CDK). The network can be modeled by a set of nonlinear
differential equations and its behavior predicted by numerical
simulation. In this paper, an innovative approach has been proposed
that uses genetic algorithms to mine a set of behavior data output by
a biological system in order to determine the kinetic parameters of
the system. In our approach, the machine learning method is
integrated with the framework of existing biological information in a
wiring diagram so that its findings are expressed in the form of system
dynamic behavior. Numerical simulations illustrate
that the model is consistent with experiments and show
that such an application of genetic algorithms greatly improves the
performance of a mathematical model of the cell division cycle in
simulating such a complicated bio-system.
Abstract: This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as in shared-memory parallel mode, we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but the usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and to the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.
Abstract: Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences, in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows that the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy-based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.
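The compression-based approximation mentioned above is commonly expressed as the normalized compression distance (NCD). The sketch below computes it with `zlib` as the compressor; this is the conventional baseline, not the new conditional-entropy attributes of the abstract, and the synthetic DNA strings are invented for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable complexity proxy."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def mutate(seq: bytes, positions) -> bytes:
    """Replace the base at each position with the next base in 'ACGT'."""
    bases = b"ACGT"
    out = bytearray(seq)
    for pos in positions:
        out[pos] = bases[(bases.index(out[pos]) + 1) % 4]
    return bytes(out)

rng = random.Random(0)
original = bytes(rng.choice(b"ACGT") for _ in range(800))
related = mutate(original, [100, 300, 500, 700])      # a few point mutations
unrelated = bytes(rng.choice(b"ACGT") for _ in range(800))

d_related = ncd(original, related)
d_unrelated = ncd(original, unrelated)
```

For sequences of only a few dozen bases the fixed overhead of the compressed format dominates the sizes, which illustrates why plain compression is a poor complexity estimate on short inputs.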
Abstract: This paper investigates how the use of machine learning techniques can significantly predict the three major dimensions of a learner's emotions (pleasure, arousal and dominance) from brainwaves. This study adopted an experiment in which participants were exposed to a set of pictures from the International Affective Picture System (IAPS) while their electrical brain activity was recorded with an electroencephalogram (EEG). The pictures had already been rated in a previous study via the affective rating system Self-Assessment Manikin (SAM) to assess the three dimensions of pleasure, arousal, and dominance. For each picture, we took the mean of these values over all subjects in the previous study and associated them with the recorded brainwaves of the participants in our study. Correlation and regression analyses confirmed the hypothesis that brainwave measures could significantly predict emotional dimensions. This can be very useful in the case of impassive, taciturn or disabled learners. Standard classification techniques were used to assess the reliability of the automatic detection of learners' three major dimensions from the brainwaves. We discuss the results and the pertinence of such a method to assess a learner's emotions and integrate it into a brainwave-sensing Intelligent Tutoring System.
Abstract: Logic-based methods for learning from structured data
are limited w.r.t. handling large search spaces, preventing large-sized
substructures from being considered by the resulting classifiers. A
novel approach to learning from structured data is introduced that
employs a structure transformation method, called finger printing, for
addressing these limitations. The method, which generates features
corresponding to arbitrarily complex substructures, is implemented in
a system, called DIFFER. The method is demonstrated to perform
comparably to an existing state-of-the-art method on some benchmark
data sets without requiring restrictions on the search space.
Furthermore, learning from the union of features generated by finger
printing and the previous method outperforms learning from each
individual set of features on all benchmark data sets, demonstrating
the benefit of developing complementary, rather than competing,
methods for structure classification.
Abstract: To create a solution for a specific problem in machine
learning, the solution is constructed from the data or by using a search
method. Genetic algorithms are a model of machine learning that can
be used to find a near-optimal solution. While the great advantage of
genetic algorithms is the fact that they find a solution through
evolution, this is also their biggest disadvantage. Evolution is inductive;
in nature, life does not evolve towards a good solution but evolves
away from bad circumstances. This can cause a species to evolve into
an evolutionary dead end. In order to reduce the effect of this
disadvantage, we propose a new learning tool (criterion) which can be
included in the generations of a genetic algorithm to compare the
previous population with the current population and then decide
whether it is more effective to continue with the previous population or the
current population. The proposed learning tool is called Keeping
Efficient Population (KEP). We applied a GA based on KEP to the
production line layout problem; as a result, KEP keeps the evaluation
direction increasing and stops any deviation in the evaluation.
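The population-keeping idea can be sketched on a toy problem. In the code below, each generation produces a candidate population, and the algorithm continues with whichever of the previous and candidate populations has the higher mean fitness. The comparison criterion (mean fitness), the OneMax objective, and the GA operators are assumptions, since the abstract does not specify them, and the layout problem itself is not modelled.

```python
import random

def fitness(individual):
    """Toy OneMax objective: the number of 1 bits."""
    return sum(individual)

def mean_fitness(population):
    return sum(map(fitness, population)) / len(population)

def next_generation(population, rng, mutation_rate=0.05):
    """Binary tournament selection, one-point crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    children = []
    while len(children) < len(population):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
    return children

def run_with_kep(generations=50, size=20, length=16, seed=0):
    """GA loop that keeps the more efficient of two successive populations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
    history = [mean_fitness(population)]
    for _ in range(generations):
        candidate = next_generation(population, rng)
        # Assumed KEP criterion: accept the candidate only if it is not worse.
        if mean_fitness(candidate) >= mean_fitness(population):
            population = candidate
        history.append(mean_fitness(population))
    return population, history

final_population, history = run_with_kep()
```

By construction the recorded mean fitness never decreases from one generation to the next, which mirrors the claim that the evaluation direction keeps increasing.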