Abstract: One of the approaches enabling people with amputated
limbs to establish some sort of interface with the real world includes
the utilization of the myoelectric signal (MES) from the remaining
muscles of those limbs. The MES can be used as a control input to a
multifunction prosthetic device. In this control scheme, known as the
myoelectric control, a pattern recognition approach is usually utilized
to discriminate between the MES signals that belong to different
classes of the forearm movements. Since the MES is recorded using
multiple channels, the feature vector size can become very large. In
order to reduce the computational cost and enhance the generalization
capability of the classifier, a dimensionality reduction method is
needed to identify an informative yet moderate size feature set. This
paper proposes a new fuzzy version of the well-known Fisher's
Linear Discriminant Analysis (LDA) feature projection technique.
Furthermore, based on the fact that certain muscles might contribute
more to the discrimination process, a novel feature weighting scheme
is also presented by employing Particle Swarm Optimization (PSO)
for estimating the weight of each feature. The new method, called
PSOFLDA, is tested on real MES datasets and compared with other
techniques to prove its superiority.
Abstract: This paper introduces a measure of similarity between
two clusterings of the same dataset produced by two different
algorithms, or even the same algorithm (K-means, for instance, with
different initializations usually produces different results in clustering
the same dataset). We then apply the measure to calculate the
similarity between pairs of clusterings, with special interest directed
at comparing the similarity between various machine clusterings and
human clustering of datasets. The similarity measure thus can be used
to identify the best (in terms of most similar to human) clustering
algorithm for a specific problem at hand. Experimental results
pertaining to the text categorization problem of a Portuguese corpus
(wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The
significance and other potential applications of the proposed measure
are discussed.
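The abstract above does not define its similarity measure. As an illustrative stand-in only, the sketch below computes the Rand index, a standard pair-counting measure that scores two clusterings by the fraction of item pairs on which they agree (both put the pair together, or both keep it apart). The function name and the toy label lists are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings of the same items agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agree = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy labelings of six items (not data from the paper).
human = [0, 0, 1, 1, 2, 2]
kmeans_run = [1, 1, 0, 0, 2, 2]  # same partition as `human`, labels permuted
other_run = [0, 0, 0, 1, 1, 1]   # a genuinely different partition

print(rand_index(human, kmeans_run))  # 1.0
print(rand_index(human, other_run))   # 10 of 15 pairs agree
```

Because the score depends only on which items are grouped together, it is unaffected by how the clusters happen to be numbered, which is what makes machine and human clusterings comparable.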
Abstract: The occurrence of missing values in databases is a serious problem for Data Mining tasks, degrading both data quality and the accuracy of analyses. The area lacks standardized experiments for treating missing values, and the absence of common parameters makes it difficult to compare results across studies. This paper proposes a testbed intended to facilitate the implementation of experiments and to provide unbiased parameters, using available datasets and suitable performance metrics, in order to improve the evaluation and comparison of state-of-the-art missing value treatments.
Abstract: Missing data is a persistent problem in almost all
areas of empirical research. The missing data must be treated very
carefully, as data plays a fundamental role in every analysis.
Improper treatment can distort the analysis or generate biased results.
In this paper, we compare and contrast various imputation techniques
on missing data sets and make an empirical evaluation of these
methods so as to construct quality software models. Our empirical
study is based on two public NASA datasets, KC4 and KC1. The
actual data sets of 125 cases and 2107 cases respectively, without
any missing values, were considered. The data sets were used to create
Missing at Random (MAR) data. Listwise Deletion (LD), Mean
Substitution (MS), Interpolation, Regression with an error term and
Expectation-Maximization (EM) approaches were used to compare
the effects of the various techniques.
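Two of the techniques named above, Listwise Deletion and Mean Substitution, are simple enough to sketch directly. The example below is a minimal illustration that assumes missing values are encoded as None. The function names and the toy table are hypothetical and are not drawn from the KC1 or KC4 datasets.

```python
def listwise_deletion(rows):
    """Drop every row that contains at least one missing value (None)."""
    return [row for row in rows if all(value is not None for value in row)]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        # Assumes every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy table with two attributes and two missing entries.
data = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [4.0, 40.0],
]

print(listwise_deletion(data))  # [[1.0, 10.0], [4.0, 40.0]]
print(mean_substitution(data))
```

Listwise deletion discards half of this toy table, while mean substitution keeps every row but shrinks the variance of each imputed column. That trade-off is the kind of effect an empirical comparison such as this one sets out to measure.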
Abstract: In this work, we present an automatic vehicle detection
system for airborne videos using combined features. We propose a
pixel-wise classification method for vehicle detection using Dynamic
Bayesian Networks. In spite of performing pixel-wise classification,
relations among neighboring pixels in a region are preserved in the
feature extraction process. The main novelty of the detection scheme is
that the extracted combined features comprise not only pixel-level
information but also region-level information. Afterwards, tracking is
performed on the detected vehicles using an efficient Kalman filter
with dynamic particle sampling. Experiments
were conducted on a wide variety of airborne videos. We do not
assume prior information of camera heights, orientation, and target
object sizes in the proposed framework. The results demonstrate
flexibility and good generalization abilities of the proposed method on
a challenging dataset.
Abstract: The Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence technique for clustering. It produces higher quality clusters than other population-based algorithms, but it suffers from poor energy efficiency, inconsistent cluster quality and typically slower convergence. Inspired by the energy-saving foraging behavior of natural honey bees, this paper presents a Quality and Quantity Aware Artificial Bee Colony (Q2ABC) algorithm to improve the cluster quality, energy efficiency and convergence speed of the original ABC. To evaluate the performance of the Q2ABC algorithm, experiments were conducted on a suite of ten benchmark UCI datasets. The results demonstrate that Q2ABC outperformed the ABC and K-means algorithms in the quality of clusters delivered.
Abstract: Although the capabilities of mobile devices have improved
significantly in recent years, rendering large terrain remains challenging
because of the limited resources of mobile devices. This
paper focuses on the implementation of terrain rendering on
mobile devices in order to observe the issues and constraints
currently encountered. Experiments are performed using two datasets, with
results based on rendering speed and appearance to ascertain both
the issues and constraints. The results show a drop in frame
rate as the number of triangles increases. Since the
resolution of a computer and a mobile device differs, the
terrain surface on a mobile device looks less realistic than
on a computer. Thus, more attention to the development of
terrain rendering on mobile devices is required. The problems
highlighted in this paper will be the focus of future research and
will be of great importance for 3D visualization on mobile devices.
Abstract: Fuzzy random variables (FRVs) have been introduced as an imprecise concept of numeric values for characterizing imprecise knowledge. Descriptive parameters can be used to describe the primary features of a set of fuzzy random observations. In fuzzy environments, expected values are usually represented as fuzzy-valued, interval-valued or numeric-valued descriptive parameters using various metrics. Instead of the area metric usually adopted in related studies, this study proposes a numeric expected value based on a distance metric and on the two characteristics of FRVs (fuzziness and randomness). Compared with existing measures, the proposed numeric expected value is the same as those obtained with the other metrics when only triangular membership functions are used. However, the proposed approach has the advantages of intuitiveness and computational efficiency when the membership functions are not triangular. An example with three datasets is provided to verify the proposed approach.
Abstract: The study of the interaction between humans and
computers has been emerging during the last few years. This
interaction will be more powerful if computers are able to perceive
and respond to human nonverbal communication such as emotions. In
this study, we present an image-based approach to emotion
classification through lower facial expression. We employ a set of
feature points in the lower face image according to the particular face
model used and consider their motion across each emotive expression
of images. The vector of displacements of all feature points is input to
the Adaptive Support Vector Machines (A-SVMs) classifier, which
classifies it into one of seven basic emotions, namely neutral, angry,
disgust, fear, happy, sad and surprise. The system was tested on the
Japanese Female Facial Expression (JAFFE) dataset of frontal view
facial expressions [7]. Our experiments on emotion classification
through lower facial expressions demonstrate the robustness of
the Adaptive SVM classifier and verify the high efficiency of our
approach.
Abstract: A large amount of valuable information is available in
plain text clinical reports. New techniques and technologies are
applied to extract information from these reports. In this study, we
developed a domain-based software system to transform 600
Otorhinolaryngology discharge notes into a structured form for
extracting clinical data from the discharge notes. In order to decrease
the system processing time, discharge notes were transformed into a data
table after preprocessing. Several word lists were constituted to
identify common sections in the discharge notes, including patient
history, age, problems, and diagnosis. The N-gram method was used
for discovering term co-occurrences within each section. Using this
method, a dataset of concept candidates was generated for the
validation step, and then the Predictive Apriori algorithm for Association
Rule Mining (ARM) was applied to validate candidate concepts.
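The N-gram step described above can be sketched as counting adjacent word sequences within each section and keeping the recurring ones as concept candidates. This is a minimal sketch under stated assumptions: whitespace tokenization, bigrams, and a frequency threshold of two are illustrative choices, and the sample phrases are invented rather than taken from the discharge notes. The Predictive Apriori validation step is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sections, n=2):
    """Count n-grams across section texts, never crossing a section boundary."""
    counts = Counter()
    for text in sections:
        tokens = text.lower().split()  # assumed tokenization: lowercase, split on whitespace
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical section texts (invented for illustration).
sections = [
    "chronic otitis media left ear",
    "chronic otitis media right ear",
    "acute otitis externa",
]

counts = count_ngrams(sections, n=2)
# Keep n-grams seen at least twice as concept candidates (assumed threshold).
candidates = sorted(gram for gram, count in counts.items() if count >= 2)
print(candidates)  # [('chronic', 'otitis'), ('otitis', 'media')]
```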
Abstract: Developing an accurate classifier for high-dimensional microarray datasets is a challenging task because of the small sample sizes available. Therefore, it is important to determine a set of relevant genes that classify the data well. Traditional gene selection methods often select the top-ranked genes according to their discriminatory power. Often these genes are correlated with each other, resulting in redundancy. In this paper, we propose a hybrid method using feature ranking and a wrapper method (Genetic Algorithm with multiclass SVM) to identify a set of relevant genes that classify the data more accurately. A new fitness function for the genetic algorithm is defined that focuses on selecting the smallest set of genes that provides maximum accuracy. Experiments have been carried out on four well-known datasets. The proposed method provides better results than those found in the literature in terms of both classification accuracy and number of genes selected.
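The abstract does not give the formula of its fitness function. One common way to express the stated goal (maximum accuracy with the smallest gene set) is to subtract a size penalty from the classifier accuracy. The sketch below shows that idea only. The penalty form, the `size_weight` parameter and the example values are assumptions, not the authors' definition.

```python
def fitness(mask, accuracy, size_weight=0.1):
    """Score a candidate gene subset for a genetic algorithm.

    mask        -- list of 0/1 flags, one per gene (1 = gene selected)
    accuracy    -- cross-validated accuracy of the classifier on the subset, in [0, 1]
    size_weight -- hypothetical trade-off between accuracy and subset size
    """
    selected = sum(mask)
    if selected == 0:
        return 0.0  # an empty subset cannot classify anything
    return accuracy - size_weight * selected / len(mask)


# Two hypothetical chromosomes that reach the same accuracy:
small_subset = [1, 0, 0, 0]
large_subset = [1, 1, 1, 1]
print(fitness(small_subset, 0.9) > fitness(large_subset, 0.9))  # True
```

With equal accuracy the smaller subset scores higher, so selection pressure favours compact gene sets without sacrificing classification quality.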
Abstract: The features of the HIV genome span a wide range because it is highly heterogeneous. Hence, the infection ability of the virus varies with the chemokine receptors it uses. R5 and X4 HIV viruses use the CCR5 and CXCR4 coreceptors respectively, while R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, studies have sought to classify R5X4 viruses by using the coreceptors of the HIV genome.
The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this purpose, the number of units in the hidden layer was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses was preprocessed by signal processing methods. Accessible residues of these virus sequences were extracted and modeled by an Auto-Regressive (AR) model because the dimensions of the residues are large and differ from each other. Finally, the pre-processed dataset was used to evolve MLPs with various numbers of hidden units to determine R5X4 viruses. Furthermore, ROC analysis was used to determine the optimal MLP structure.
Abstract: In the past decade, artificial neural networks (ANNs)
have been regarded as an instrument for problem-solving and
decision-making; indeed, they have already delivered substantial
improvements in efficiency and effectiveness in industries and businesses.
In this paper, Back-Propagation neural Networks (BPNs) will be
modularized to demonstrate the performance of the collaborative
forecasting (CF) function of a Collaborative Planning, Forecasting and
Replenishment (CPFR®) system. CPFR maintains the balance between
sufficient product supply and the necessary customer demand in a
Supply and Demand Chain (SDC). Several classical standard BPNs will
be grouped, combined and exploited for the easy implementation of
the proposed modular ANN framework based on the topology of an
SDC. Each individual BPN is applied as a modular tool to perform the
task of forecasting SKU (Stock-Keeping Unit) levels that are
managed and supervised at a POS (point of sale), a wholesaler, and a
manufacturer in an SDC. The proposed modular BPN-based CF
system will be exemplified and experimentally verified using many
datasets of the simulated SDC. The experimental results showed that a
complex CF problem can be divided into a group of simpler
sub-problems based on the single independent trading partners
distributed over the SDC, and that its SKU forecasting accuracy was
satisfactory when the system's forecasted values were compared with the
original simulated SDC data. The primary task of implementing an autonomous CF
involves the study of supervised ANN learning methodology, which
aims at making "knowledgeable" decisions for the best SKU sales plan
and stock management.
Abstract: As the majority of faults in a software system are found in a few of its
modules, there is a need to investigate the modules that are
affected more severely than other modules, so that proper
maintenance can be done in time, especially for critical
applications. Neural networks have already been applied
in software engineering applications to build reliability growth
models and to predict the gross change or reusability metrics. Neural
networks are sophisticated non-linear modeling techniques that are
able to model complex functions. Neural network techniques are
used when the exact nature of the inputs and outputs is not known. A key
feature is that they learn the relationship between input and output
through training. In the present work, various neural network based
techniques are explored and a comparative analysis is performed for
predicting the level of maintenance needed by predicting the severity
level of the faults present in NASA's public domain defect dataset.
The comparison of different algorithms is made on the basis of Mean
Absolute Error, Root Mean Square Error and Accuracy Values. It is
concluded that the Generalized Regression Network is the best
algorithm for classifying software components into different
levels of severity of impact of the faults. The algorithm can be used to
develop a model for identifying modules that are
heavily affected by faults.
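The three comparison criteria named above have standard definitions, sketched here for reference. The sketch assumes severity levels are encoded as integers so that an error magnitude is meaningful. The function names and the toy labels are hypothetical and are not results from the NASA dataset.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def accuracy(actual, predicted):
    """Fraction of cases where the predicted level equals the actual level."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical severity levels (1 = low, 3 = high) for five modules.
actual = [1, 2, 3, 2, 1]
predicted = [1, 2, 2, 2, 3]

print(mean_absolute_error(actual, predicted))     # 0.6
print(root_mean_square_error(actual, predicted))  # 1.0
print(accuracy(actual, predicted))                # 0.6
```

RMSE penalizes the single two-level miss more heavily than MAE does, which is why the two error measures are usually reported together.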
Abstract: This paper presents an information retrieval model on
XML documents based on tree matching. Queries and documents are
represented by extended trees. An extended tree is built starting from
the original tree, with additional weighted virtual links between each
node and its indirect descendants so that each descendant can be
reached directly. Therefore only one level separates each node
from its indirect descendants. This allows the user query
and the document to be compared with flexibility and with respect to the structural
constraints of the query. The content of each node is very important in
deciding whether a document element is relevant or not, thus the content
should be taken into account in the retrieval process. We separate
the structure-based and the content-based retrieval processes.
The content-based score of each node is commonly based on the
well-known Tf × Idf criterion. In this paper, we compare
this criterion with another one we call Tf × Ief. The comparison
is based on experiments on a dataset provided by INEX to
show the effectiveness of our approach on one hand and that of
both weighting functions on the other.
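The extended-tree construction described above can be sketched as follows: every node receives a direct link to each of its descendants, so that any descendant is one level away. The abstract does not say how the virtual links are weighted. The geometric decay used here (weight 1.0 for a real parent-child link, `decay ** (depth - 1)` for a descendant at a greater depth) is purely an assumption, as are the function name and the toy document.

```python
def extend_tree(children, root, decay=0.5):
    """Return, for every node, weighted links to all of its descendants.

    children -- dict mapping a node to the list of its direct children
    decay    -- hypothetical weighting factor for virtual links
    """
    links = {}

    def collect(node):
        # Gather (descendant, depth below node) pairs and record this node's links.
        found = []
        for child in children.get(node, []):
            found.append((child, 1))
            found.extend((desc, depth + 1) for desc, depth in collect(child))
        links[node] = {desc: decay ** (depth - 1) for desc, depth in found}
        return found

    collect(root)
    return links


# Hypothetical XML structure: article > section > (title, paragraph).
children = {
    "article": ["section"],
    "section": ["title", "paragraph"],
}

links = extend_tree(children, "article")
print(links["article"])  # {'section': 1.0, 'title': 0.5, 'paragraph': 0.5}
```

With these links, a query such as article/paragraph can match the document even though a section element sits between the two nodes, at the cost of a lower structural weight.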
Abstract: In the last few years, the Semantic Web has gained scientific acceptance as a means of identifying relationships in knowledge bases, widely known as semantic associations. Querying complex relationships between entities is a strong requirement for many applications in analytical domains. In bioinformatics, for example, it is critical to extract exchanges between proteins. Currently, the usual result of such queries is a set of paths between connected entities in the data graph. However, these queries do not always give good results when the user needs the best association or a limited set of best associations, because they consider all existing paths but ignore path evaluation. In this paper, we present an approach for supporting association discovery queries. Our proposal includes (i) a query language, PmSPRQL, which provides multiparadigm query expressions for association extraction and (ii) some quantification measures that ease the process of association ranking. The originality of our proposal is demonstrated by a performance evaluation of our approach on real-world datasets.
Abstract: Mammographic image and data analysis to facilitate modelling or computer aided diagnostic (CAD) software development is best done using a common database that can handle various mammographic image file formats and relate these to other patient information. This would optimize the use of the data, as both primary reporting and enhanced information extraction of research data could be performed from the single dataset. One desired improvement is the integration of DICOM file header information into the database, as an efficient and reliable source of supplementary patient information intrinsically available in the images.
The purpose of this paper was to design a suitable database to link and integrate different types of image files and gather common information that can be further used for research purposes. An interface was developed for accessing, adding, updating, modifying and extracting data from the common database, enhancing the possible future application of the data in CAD processing.
Technically, future developments envisaged include the creation of an advanced search function to select image files based on descriptor combinations. Results can be further used for specific CAD processing and other research. A user-friendly configuration utility for importing the required fields from the DICOM files must also be designed.
Abstract: Much work has been done on predicting the fault proneness of software systems. However, the severity of the faults is more important than the number of faults existing in the developed system, as the major faults matter most to a developer and need immediate attention. In this paper, we try to predict the level of impact of the existing faults in software systems. Neuro-fuzzy based predictor models are applied to NASA's public domain defect dataset coded in the C programming language. Correlation-based Feature Selection (CFS) evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. CFS is therefore used for selecting the metrics that are most highly correlated with the severity level of faults. The results are compared with the prediction results of Logistic Model Trees (LMT), which was earlier quoted as the best technique in [17]. The results are recorded in terms of Accuracy, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results show that the neuro-fuzzy based model provides relatively better prediction accuracy than the other models and hence can be used for modeling the level of impact of faults in function-based systems.
Abstract: The self-organizing map (SOM) provides both clustering and visualization capabilities in mining data. Dynamic self-organizing maps such as the Growing Self-organizing Map (GSOM) have been developed to overcome the problem of the fixed structure in SOM and enable better representation of the discovered patterns. However, in mining large datasets or historical data, the hierarchical structure of the data is also useful for viewing the cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of trees from different spread factor values of the GSOM is also investigated and the quality of the trees is analyzed. The results show that concept trees can be generated from the GSOM, thus eliminating the need to re-cluster the data from scratch to obtain a hierarchical view of the data under study.
Abstract: Discovering new biological knowledge from high-throughput biological data is a major challenge for bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily, and thereby functionally, related are said to belong to the same classification. Identifying protein classification is of fundamental importance for documenting the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique uses datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on the Fast Fourier Transform (FFT) is used. The proposed classifier uses a multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that the spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance, especially at the family level.
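To make the idea of spectral domain features concrete, the sketch below turns a protein sequence into a numeric signal and computes its power spectrum. The abstract does not state how residues are encoded numerically. The binary indicator encoding used here (1 where a chosen residue occurs, 0 elsewhere) is an assumption, and a plain discrete Fourier transform stands in for an FFT library call, since it yields the same coefficients. The sequence is a toy example, not SCOP data.

```python
import cmath


def indicator(sequence, residue):
    """Binary signal: 1.0 where `residue` occurs in the sequence, 0.0 elsewhere."""
    return [1.0 if aa == residue else 0.0 for aa in sequence]


def power_spectrum(signal):
    """Squared magnitude of each discrete Fourier coefficient of the signal."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Toy sequence in which alanine (A) recurs with period 2.
sequence = "ALALALAL"
features = power_spectrum(indicator(sequence, "A"))
print([round(value, 6) for value in features])
# Peaks at k = 0 (mean level) and k = 4 (period-2 repetition); other bins are ~0.
```

A feature vector of this kind has a fixed meaning per frequency bin, which is what allows it to be fed to a neural network classifier regardless of where in the sequence a periodic pattern occurs.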