Abstract: Bioinformatics methods for predicting the T cell
coreceptor usage from the array of membrane protein of HIV-1 are
investigated. In this study, we aim to propose an effective prediction
method for dealing with the three-class classification problem of
CXCR4 (X4), CCR5 (R5) and CCR5/CXCR4 (R5X4). We investigate
the coreceptor prediction problem in three ways: 1)
proposing a feature set of informative physicochemical properties
that is combined with an SVM to achieve a high prediction test
accuracy of 81.48%, compared with 70.00% for the existing
method; 2) establishing a large up-to-date data set by
increasing the size from 159 to 1225 sequences to verify the proposed
prediction method where the mean test accuracy is 88.59%, and 3)
analyzing the set of 14 informative physicochemical properties to
further understand the characteristics of HIV-1 coreceptors.
Abstract: Most Question Answering (QA) systems are
composed of three main modules: question processing,
document processing and answer processing. The question
processing module plays an important role in QA systems. If
this module does not work properly, it causes problems for
the other modules. Moreover, the answer processing module is an
emerging topic in Question Answering, where these systems
are often required to rank and validate candidate answers.
These techniques, which aim at finding short and precise answers,
are often based on semantic classification.
This paper discusses a new model for question
answering that improves two main modules: question
processing and answer processing.
Two important components form the basis
of question processing. The first component is question
classification, which specifies the types of question and answer.
The second is reformulation, which converts the user's
question into a question the QA system can understand in a
specific domain. The answer processing module consists of
candidate answer filtering and candidate answer ordering
components, and it also has a validation section for interacting
with the user. This module makes the system better suited to finding
the exact answer. In this paper we describe the question and answer
processing modules, covering the modeling, implementation and
evaluation of the system. The system was implemented in two versions.
Results show that 'Version No.1' gave correct answers to 70%
of questions (30 correct answers to 50 asked questions) and
'Version No.2' gave correct answers to 94% of questions (47
correct answers to 50 asked questions).
Abstract: The paper describes a knowledge based system for
analysis of microscopic wear particles. Wear particles contained in
lubricating oil carry important information concerning machine
condition, in particular the state of wear. Experts (Tribologists) in the
field extract this information to monitor the operation of the machine
and ensure safety, efficiency, quality, productivity, and economy of
operation. This procedure is not always objective and it can also be
expensive. The aim is to classify these particles according to their
morphological attributes of size, shape, edge detail, thickness ratio,
color, and texture, and by using this classification thereby predict
wear failure modes in engines and other machinery. The attribute
knowledge links human expertise to the devised Knowledge Based
Wear Particle Analysis System (KBWPAS). The system provides an
automated and systematic approach to wear particle identification
which is linked directly to wear processes and modes that occur in
machinery. This brings consistency in wear judgment prediction
which leads to standardization and also less dependence on
Tribologists.
Abstract: A lot of research has been done in the past decade in the field of audio content analysis on extracting various kinds of information from audio signals. One significant piece of information is the "perceived mood" or the "emotions" related to a music or audio clip. This information is extremely useful in applications such as creating or adapting a playlist based on the mood of the listener. It could also help in better classifying a music database. In this paper we present a method that classifies music not just on the basis of the metadata of the audio clip but also includes the "mood" factor to help improve music classification. We propose an automated and efficient way of classifying music samples based on mood detection from the audio data. In particular, we try to classify Indian Bollywood music based on mood. The proposed method addresses the following problem: genre information (usually part of the audio metadata) alone does not lead to better music classification. For example, the acoustic version of the song "Nothing Else Matters" by Metallica can be classified as melody music, so a person in a relaxed or chill-out mood might want to listen to this track. But more often than not this track is associated with the metal / heavy rock genre, and if a listener built a playlist for the current mood based on genre information alone, the listener would miss out on this track. Methods currently exist to detect mood in Western or similar kinds of music. Our paper tries to solve the issue for Indian Bollywood music in an Indian cultural context.
Abstract: We compare three categorical data clustering
algorithms with respect to the problem of classifying cultural data
related to the aesthetic judgment of comics artists. Such a
classification is very important in Comics Art theory, since the
determination of any classes of similarities in this kind of data will
provide art historians with very fruitful information about Comics Art's
evolution. To establish this, we use a categorical data set and we
study it by employing three categorical data clustering algorithms.
The performances of these algorithms are compared with each other,
and interpretations of the clustering results are also given.
Abstract: In this paper we present a GP-based method for automatically evolving projections, so that data can be more easily classified in the projected spaces. At the same time, our approach can reduce dimensionality by constructing more relevant attributes. The fitness of each projection measures how easy it is to classify the dataset after applying the projection. This is quickly computed by a Simple Linear Perceptron. We have tested our approach in three domains. The experiments show that it obtains good results compared to other Machine Learning approaches, while reducing dimensionality in many cases.
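The fitness idea described above can be illustrated with a minimal sketch. It assumes that fitness is the training accuracy of a linear perceptron on the projected data; the abstract does not give the exact fitness definition, and the GP search that evolves projections is not shown. The function names, the toy data and the hand-written product projection are all hypothetical.

```python
def perceptron_accuracy(points, labels, epochs=50):
    """Train a linear perceptron (labels in {-1, +1}); return training accuracy."""
    weights = [0.0] * len(points[0])
    bias = 0.0

    def activation(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Classic perceptron rule: update only on misclassified points.
            if y * activation(x) <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    correct = sum(1 for x, y in zip(points, labels) if y * activation(x) > 0)
    return correct / len(points)


def projection_fitness(projection, points, labels):
    """Assumed fitness: perceptron accuracy after applying the projection."""
    projected = [projection(x) for x in points]
    return perceptron_accuracy(projected, labels)


# Hypothetical XOR-like data: the class is the sign of x * y.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

identity_fitness = projection_fitness(lambda p: p, points, labels)
product_fitness = projection_fitness(lambda p: (p[0] * p[1],), points, labels)
print(identity_fitness, product_fitness)
```

In this toy case the one-dimensional product projection makes the classes linearly separable, so it scores higher than the original two-dimensional space while also reducing dimensionality.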
Abstract: In this paper, we propose a method of resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. In order to solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among the verb phrase, verb and endings in the clauses. As a result, this problem is represented as a binary classification task. In order to apply SVMs to this problem, we selected two kinds of features: static and dynamic features. The experimental results on the STEP2000 corpus show that our system achieves an accuracy of 73.5%.
Abstract: Early and accurate detection of Alzheimer's disease (AD) is an important stage in the treatment of individuals suffering from AD. We present an approach based on the use of structural magnetic resonance imaging (sMRI) phase images to distinguish between normal controls (NC), mild cognitive impairment (MCI) and AD patients with a clinical dementia rating (CDR) of 1. The independent component analysis (ICA) technique is used to extract useful features, which form the inputs to the support vector machine (SVM), K nearest neighbour (kNN) and multilayer artificial neural network (ANN) classifiers that discriminate between the three classes. The obtained results are encouraging in terms of classification accuracy and confirm the usefulness of phase images for the classification of different stages of Alzheimer's disease.
Abstract: A microaneurysm is a key indicator of diabetic retinopathy that can potentially cause damage to the retina. Early detection and automatic quantification are the keys to preventing further damage. In this paper, which focuses on automatic microaneurysm detection in images acquired through non-dilated pupils, we present a series of experiments on feature selection and automatic microaneurysm pixel classification. We found that the best feature set is a combination of 10 features: the pixel's intensity in the shade-corrected image, the pixel hue, the standard deviation of the shade-corrected image, DoG4, the area of the candidate MA, the perimeter of the candidate MA, the eccentricity of the candidate MA, the circularity of the candidate MA, the mean intensity of the candidate MA on the shade-corrected image, and the ratio of the major axis length to the minor axis length of the candidate MA. The overall sensitivity, specificity, precision, and accuracy are 84.82%, 99.99%, 89.01%, and 99.99%, respectively.
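For readers unfamiliar with the four reported measures, the following sketch computes them from confusion-matrix counts using the standard definitions. The counts in the example are invented for illustration and are not taken from the paper; they only show how specificity and accuracy can both sit near 100% when background pixels vastly outnumber microaneurysm pixels.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard pixel-classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true MA pixels that are found
        "specificity": tn / (tn + fp),   # share of background pixels kept as background
        "precision": tp / (tp + fp),     # share of detected pixels that are true MA pixels
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical, heavily imbalanced counts (not from the paper).
example = binary_metrics(tp=85, fp=10, tn=99890, fn=15)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With such imbalance, sensitivity and precision are the more informative figures, because the large number of true negatives dominates both specificity and accuracy.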
Abstract: In this paper a Pattern Recognition algorithm based on
a constrained version of the k-means clustering algorithm will be
presented. The proposed algorithm is a non-parametric supervised
statistical pattern recognition algorithm, i.e. it works under very mild
assumptions on the dataset. The performance of the algorithm will
be tested, together with a feature extraction technique that captures
the information on the closed two-dimensional contour of an image,
on images of industrial mineral ores.
Abstract: In this note we first define the notions of intuitionistic
fuzzy dual positive implicative hyper K-ideals of types
1,2,3,4 and intuitionistic fuzzy dual hyper K-ideals. Then we
give some classifications of these notions according to the
level subsets. Also, by giving some examples, we show that these
notions are not equivalent; however, we prove some theorems
which show that there are some relationships between these
notions. Finally we define the notions of product and antiproduct
of two fuzzy subsets and then give some theorems
about the relationships between the intuitionistic fuzzy dual
positive implicative hyper K-ideals of types 1,2,3,4 and their
(anti-)products; in particular, we give a main decomposition
theorem.
Abstract: Patients with diabetes are susceptible to chronic foot
wounds which may be difficult to manage and slow to heal.
Diagnosis and treatment currently rely on the subjective judgement of
experienced professionals. An objective method of tissue assessment
is required. In this paper, a data fusion approach was taken to wound
tissue classification. The supervised Maximum Likelihood and
unsupervised Multi-Modal Expectation Maximisation algorithms
were used to classify tissues within simulated wound models by
weighting the contributions of both colour and 3D depth information.
It was found that, at low weightings, depth information could show
significant improvements in classification accuracy when compared
to classification by colour alone, particularly when using the
maximum likelihood method. However, larger weightings were
found to have an entirely negative effect on accuracy.
Abstract: In this study, a fuzzy similarity approach for Arabic
web page classification is presented. The approach uses a fuzzy
term-category relation by manipulating the membership degree for the
training data and the degree value for a test web page. Six measures
are used and compared in this study. These measures include the
Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and
Bounded Difference approaches. These measures are applied and
compared using 50 different Arabic web pages. The Einstein measure
gave the best performance among the measures. An analysis of
these measures and concluding remarks are given in this study.
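As an illustration of how such measures combine two membership degrees, the sketch below implements the textbook t-norm forms of four of the named operators. This is an assumption about what the names refer to: the abstract does not state the formulas used in the study, and the MinMax and Special case fuzzy measures are omitted because their definitions cannot be inferred from it.

```python
def algebraic_product(a, b):
    """Algebraic product t-norm."""
    return a * b


def einstein_product(a, b):
    """Einstein product t-norm."""
    return (a * b) / (2.0 - (a + b - a * b))


def hamacher_product(a, b):
    """Hamacher product t-norm (defined as 0 when both degrees are 0)."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return (a * b) / (a + b - a * b)


def bounded_difference(a, b):
    """Bounded difference (Lukasiewicz) t-norm."""
    return max(0.0, a + b - 1.0)


# Hypothetical membership degrees of one term in a category and in a test page.
term_in_category, term_in_page = 0.5, 0.5
for measure in (algebraic_product, einstein_product,
                hamacher_product, bounded_difference):
    print(measure.__name__, measure(term_in_category, term_in_page))
```

All four operators map two degrees in [0, 1] to a degree in [0, 1], but they penalise partial membership differently, which is why they can rank candidate categories differently.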
Abstract: Classification is an important topic in machine learning
and bioinformatics. Many datasets have been introduced for
classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset.
The power of classification for each feature differs. In this study, we
suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the
features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII
and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement
of the classification accuracy.
Abstract: A dissimilarity measure between the empirical
characteristic functions of the subsamples associated with the different
classes in a multivariate data set is proposed. This measure can be
efficiently computed, and it depends on all the cases of each class. It
may be used to find groups of similar classes, which could be joined
for further analysis, or it could be employed to perform an
agglomerative hierarchical cluster analysis of the set of classes. The
final tree can serve to build a family of binary classification models,
offering an alternative approach to the multi-class SVM problem. We
have compared this dendrogram-based SVM approach with the
one-against-one SVM approach over four publicly available data sets,
three of them being microarray data. Both performances have been
found equivalent, but the first solution requires a smaller number of
binary SVM models.
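The difference in model counts can be made concrete with a short sketch. It assumes that the dendrogram is a binary tree with one binary SVM per internal node, which the abstract does not state explicitly; the function names are hypothetical.

```python
def one_against_one_models(num_classes):
    """One binary model per unordered pair of classes: k(k-1)/2."""
    return num_classes * (num_classes - 1) // 2


def dendrogram_models(num_classes):
    """Assumed: one binary model per internal node of a binary tree with k leaves."""
    return num_classes - 1


for k in (3, 4, 10):
    print(k, one_against_one_models(k), dendrogram_models(k))
```

Under this assumption the tree-based family grows linearly with the number of classes, while one-against-one grows quadratically.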
Abstract: Naïve Bayes classifiers are simple probabilistic
classifiers. Classification extracts patterns from a data file with a set
of labeled training examples and is currently one of the most
significant areas in data mining. However, Naïve Bayes assumes
independence among the features. Structural learning among the
features thus helps in the classification problem. In this study, the use
of structural learning in a Bayesian Network is proposed for cases
where there are relationships between the features when using
Naïve Bayes. The improvement in classification using structural
learning is shown when relationships exist between the features, that is,
when they are not independent.
Abstract: Several studies have been carried out, using various techniques including neural networks, to discriminate vigilance states in humans from electroencephalographic (EEG) signals, but we are still far from satisfactorily usable results. The work presented in this paper aims at improving this situation with regard to two aspects. Firstly, we introduce an original procedure that combines two neural networks, a self-organizing map (SOM) and a learning vector quantization (LVQ) network, and that makes it possible to automatically detect artefacted states and to separate the different levels of vigilance, which is a major breakthrough in the field of vigilance. Secondly, and more importantly, our study has been oriented toward real-world situations, and the resulting model can be easily implemented in a wearable device. It has restricted computational and memory requirements, and data access is very limited in time. Furthermore, ongoing work indicates that this study should shortly result in the design and conception of a non-invasive electronic wearable device.
Abstract: Much research has been carried out on the analysis of
traces in digital learning environments. These studies produce large
volumes of usage traces from the various actions performed by a
user. However, exploiting these data to compare and improve
performance raises several issues. To remedy this, several recent works
have dealt with this problem. This research studied a series of
questions about the format and description of the data to be shared. Our
goal is to share thoughts on these issues by presenting our experience
in the analysis of trace-based log files, comparing several approaches
used in automatic classification applied to e-learning platforms.
Finally, the obtained results are discussed.
Abstract: Text document categorization involves a large amount
of data or features. The high dimensionality of features is
troublesome and can affect the performance of the classification.
Therefore, feature selection is strongly considered one of the
crucial parts of text document categorization. Selecting the best
features to represent documents can reduce the dimensionality of the
feature space and hence increase the performance. Many
approaches have been implemented by various researchers to
overcome this problem. This paper proposes a novel hybrid approach
for feature selection in text document categorization based on Ant
Colony Optimization (ACO) and Information Gain (IG). We also
present state-of-the-art algorithms by several other researchers.
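The Information Gain half of the hybrid can be sketched as follows. This uses the standard entropy-based definition of IG for the presence or absence of a term; how the paper combines IG with ACO is not described in the abstract and is not shown. The toy documents, labels and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a term: class entropy minus entropy conditioned on term presence."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "election"}]
labels = ["sport", "sport", "politics", "politics"]

for term in ("goal", "team"):
    print(term, round(information_gain(docs, labels, term), 4))
```

Terms with higher IG separate the categories better, so ranking by IG and keeping the top-scoring terms is one simple way to shrink the feature space before or alongside a search-based selector.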
Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to classification problems
was the premature convergence of some individuals to local optima.
To eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were iteratively
re-explored using RVNS. Based on the generated results on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.