Abstract: Feature selection and attribute reduction are crucial problems and widely used techniques in the fields of machine learning, data mining, and pattern recognition for overcoming the well-known phenomenon of the curse of dimensionality. This paper presents a feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we employ the fuzzy-rough dependency degree (FRDD) of the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As the search strategy, a modified binary shuffled frog leaping algorithm (B-SFLA) is proposed. The proposed feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Nine classifiers have been employed to compare the proposed approach with several existing methods over twenty-two datasets, including nine high-dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.
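As a rough illustration of how such a hybrid can be organized, the following Python sketch couples a binary shuffled frog leaping search with a subset-evaluation fitness. The `dependency_degree` function below is only a hypothetical placeholder standing in for the FRDD of L-FRFS, and all parameter values are illustrative, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dependency_degree(subset_mask, X, y):
    """Hypothetical placeholder for the fuzzy-rough dependency degree (FRDD).
    Crude proxy: reward subsets that correlate with the label and are small."""
    if subset_mask.sum() == 0:
        return 0.0
    cols = np.c_[X[:, subset_mask.astype(bool)], y]
    corr = np.abs(np.corrcoef(cols, rowvar=False)[-1, :-1])
    return corr.mean() - 0.01 * subset_mask.sum()

def binary_sfla(X, y, n_frogs=20, n_memeplexes=4, n_iter=30):
    n_features = X.shape[1]
    frogs = rng.integers(0, 2, size=(n_frogs, n_features))
    for _ in range(n_iter):
        fitness = np.array([dependency_degree(f, X, y) for f in frogs])
        frogs = frogs[np.argsort(-fitness)]            # sort frogs best first
        # partition frogs into memeplexes and improve the worst frog of each
        for m in range(n_memeplexes):
            idx = np.arange(m, n_frogs, n_memeplexes)
            best, worst = idx[0], idx[-1]
            # probabilistic move of the worst frog toward the memeplex best,
            # kept binary through a sigmoid transfer function
            step = rng.random(n_features) * (frogs[best] - frogs[worst])
            prob = 1.0 / (1.0 + np.exp(-step))
            candidate = (rng.random(n_features) < prob).astype(int)
            if dependency_degree(candidate, X, y) > dependency_degree(frogs[worst], X, y):
                frogs[worst] = candidate
    fitness = np.array([dependency_degree(f, X, y) for f in frogs])
    return frogs[np.argmax(fitness)]

# usage: mask = binary_sfla(X, y).astype(bool) gives a boolean feature mask
```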
Abstract: We show that a simple transformation between the regular lattices of the same dimensionality (the square, the triangular, and the honeycomb) can explain in a natural way the universality of the critical exponents found in phase transitions and critical phenomena. It suffices that the Hamiltonian and the lattice can be written in similar forms. In addition, it appears that if a property can be calculated for a given lattice, then it can be extrapolated simply to any other lattice of the same dimensionality. In this study, we restrict ourselves to the spectral power amplification (SPA); we note that the SPA has no effect on the critical exponents but does affect the critical temperature of the lattice. The generalisation to other lattices could be shown according to the containment principle.
Abstract: Electromyography (EMG) is one of the most important interfaces between humans and robots for rehabilitation. Decoding this signal helps to recognize muscle activation and convert it into smooth motion for the robots. Detecting each muscle's pattern during walking and running is vital for improving the quality of a patient's life. In this study, EMG data from 10 muscles in 10 subjects at 4 different speeds were analyzed. EMG signals are nonlinear and high-dimensional. To deal with this challenge, we extracted features in the time-frequency domain and used manifold learning, specifically the Laplacian Eigenmaps algorithm, to find the intrinsic features that represent the data in a low-dimensional space. We then used a Bayesian classifier to identify the various patterns of EMG signals for different muscles across a range of running speeds. The best result, obtained for the vastus medialis muscle, was a sensitivity of 97.87±0.69, a specificity of 88.37±0.79, and an accuracy of 97.07±0.29 with the Bayesian classifier. The results of this study provide important insight into human movement and its applications in robotics research.
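A minimal sketch of this kind of pipeline, assuming scikit-learn's SpectralEmbedding (an implementation of Laplacian Eigenmaps) followed by a Gaussian naive Bayes classifier; the feature files, label encoding, and parameter values are assumptions for illustration, not the study's actual setup.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding   # Laplacian Eigenmaps
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# X: (n_windows, n_time_frequency_features) EMG feature matrix,
# y: muscle/pattern labels -- hypothetical files, assumed inputs.
X = np.load("emg_features.npy")
y = np.load("emg_labels.npy")

# Embed the high-dimensional features into a low-dimensional manifold.
# SpectralEmbedding has no out-of-sample transform, so the whole set is
# embedded before the train/test split.
embedding = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
                              n_neighbors=10)
X_low = embedding.fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_low, y, test_size=0.3, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```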
Abstract: One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work, the efficiency of completely nonparametric regression estimators, such as the Loess, is compared to that of estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with respect to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criterion is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of times it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included for when the nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.
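The following sketch illustrates one possible form of such an AIC-driven backward elimination. The Gaussian AIC formula, the use of the number of covariates as the complexity term, and the k-nearest-neighbor regressor standing in for the fit are assumptions for illustration; the paper computes the criterion from the additive or the fully nonparametric fit.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def gaussian_aic(y, y_hat, k):
    """AIC under a Gaussian error model: n*log(RSS/n) + 2*k."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

def backward_elimination(X, y, make_model=lambda: KNeighborsRegressor(n_neighbors=10)):
    """Repeatedly drop the covariate whose removal most decreases the AIC."""
    active = list(range(X.shape[1]))

    def aic_of(cols):
        model = make_model().fit(X[:, cols], y)
        return gaussian_aic(y, model.predict(X[:, cols]), k=len(cols))

    best_aic = aic_of(active)
    while len(active) > 1:
        # evaluate every single-variable removal and keep the best one
        candidates = [(aic_of([c for c in active if c != j]), j) for j in active]
        cand_aic, drop = min(candidates)
        if cand_aic >= best_aic:
            break                      # no removal improves the criterion
        best_aic = cand_aic
        active = [c for c in active if c != drop]
    return active                      # indices of the selected covariates
```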
Abstract: Hyperspectral imagery (HSI) typically provides a wealth of information captured over a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Semantic interpretation is therefore a challenging task in HSI analysis. In this paper, we focus on object classification as a form of HSI semantic interpretation. However, HSI classification still faces several issues, among them the spatial variability of spectral signatures, the high number of spectral bands, and the high cost of labeling true samples. The high number of spectral bands combined with the low number of training samples poses the problem of the curse of dimensionality. In order to resolve this problem, we introduce a dimensionality reduction process aiming to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on a spatial hypergraph embedding model that represents higher-order relationships, with different weights assigned to the spatial neighbors of each centroid pixel. This semi-supervised band selection has been developed to select bands that are useful for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.
Abstract: The psychological profile has become one of the most important sources of information when it comes to individual selection and the hiring process in any organization. Psychological instruments are used to collect data about variables that are considered critically important for work performance. However, because of conceptual chaos in organizational psychology, most of the information provided by psychological testing is not directly useful for Mexican human resources professionals when making hiring decisions. The aims of this paper are 1) to underline the lack of conceptual precision in the theoretical foundations of testing in Mexico and 2) to present a reliability and validity analysis of a frustration tolerance instrument created as an alternative to heuristically conducted individual assessment in organizations. First, a description of the assessment conditions in Mexico is given. Second, an instrument and a theoretical framework are presented as an alternative to the assessment practices in the country. A total of 65 psychology students from the Iztacala Superior Studies Faculty were assessed. Cronbach's alpha coefficient was calculated and an exploratory factor analysis was carried out to test the scale's unidimensionality. The reliability analysis revealed good internal consistency of the scale (Cronbach's α = 0.825). The factor analysis produced 4 factors for the scale; however, the factor loadings and explained variation support the scale's unidimensionality. It is concluded that the instrument has good psychometric properties that will allow human resources professionals to collect useful data. Different possibilities for conducting psychological assessment are suggested for future development.
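For reference, the internal-consistency coefficient reported above can be computed from an item-score matrix with the standard Cronbach's alpha formula, as in this short sketch; the input layout is an assumption.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) matrix of item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of respondents' totals
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# the study reports cronbach_alpha(scores) = 0.825 for its 65-respondent sample
```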
Abstract: The use of eXtensible Markup Language (XML) in web, business, and scientific databases has led to the development of methods, techniques, and systems to manage and analyze XML data. Semi-structured documents suffer from heterogeneity and high dimensionality. XML structure and content mining represent a convergence of research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Indeed, documents are often so large that the data set returned as the answer to a query may itself be too big to convey the required information. To improve query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intensional information by considering the structure, the content, and the semantics of the content. The method is applied to the Reuters dataset and the results show that the proposed method performs well.
Abstract: Service quality is users' foremost requirement, especially for services in electronic government. During the past decades, it has become a major area of academic investigation. On this issue, many studies have evaluated quality dimensions in e-service contexts. This study also identifies the dimensions of service quality, but it focuses on a new concept and provides a new methodology for developing measurement scales of e-service quality, such as information quality, service quality, and organization quality. Finally, this study suggests a key factor for better evaluating e-government service quality.
Abstract: As network-based technologies become omnipresent, the demand to secure networks and systems against threats increases. One of the effective ways to achieve higher security is through the use of intrusion detection systems (IDS), which are software tools that detect anomalous activity in a computer or network. In this paper, an IDS has been developed using an improved machine-learning algorithm, the Locally Linear Neuro-Fuzzy model (LLNF), for classification, although this model was originally used for system identification. A key technical challenge in IDS and LLNF learning is the curse of high dimensionality. Therefore, a feature selection phase, applicable to any IDS, is proposed. By investigating the use of three feature selection algorithms in this model, it is shown that adding a feature selection phase reduces the computational complexity of our model. Feature selection algorithms require the use of a feature goodness measure. The use of both a linear measure, the linear correlation coefficient, and a non-linear measure, mutual information, is investigated.
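A small sketch of how these two feature goodness measures might be computed to rank IDS features, assuming scikit-learn's mutual_info_classif for the non-linear measure and the Pearson correlation coefficient for the linear one; the data layout and label encoding are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# X: (n_connections, n_features) traffic features,
# y: attack/normal labels encoded as 0/1 -- assumed inputs.
def rank_features(X, y, top_k=10):
    y_num = y.astype(float)
    # linear measure: absolute Pearson correlation of each feature with the label
    lin = np.array([abs(np.corrcoef(X[:, j], y_num)[0, 1]) for j in range(X.shape[1])])
    # non-linear measure: mutual information between each feature and the label
    mi = mutual_info_classif(X, y, random_state=0)
    return np.argsort(-lin)[:top_k], np.argsort(-mi)[:top_k]
```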
Abstract: The design of a pattern classifier includes an attempt
to select, among a set of possible features, a minimum subset of
weakly correlated features that better discriminate the pattern classes.
This is usually a difficult task in practice, normally requiring the
application of heuristic knowledge about the specific problem
domain. The selection and quality of the features representing each
pattern have a considerable bearing on the success of subsequent
pattern classification. Feature extraction is the process of deriving
new features from the original features in order to reduce the cost of
feature measurement, increase classifier efficiency, and allow higher
classification accuracy. Many current feature extraction techniques
involve linear transformations of the original pattern vectors to new
vectors of lower dimensionality. While this is useful for data
visualization and increasing classification efficiency, it does not
necessarily reduce the number of features that must be measured
since each new feature may be a linear combination of all of the
features in the original pattern vector. In this paper, a new approach
to feature extraction is presented in which feature selection, feature
extraction, and classifier training are performed simultaneously using
a genetic algorithm. In this approach each feature value is first
normalized by a linear equation, then scaled by the associated weight
prior to training, testing, and classification. A knn classifier is used to
evaluate each set of feature weights. The genetic algorithm optimizes
a vector of feature weights, which are used to scale the individual
features in the original pattern vectors in either a linear or a nonlinear
fashion. With this approach, the number of features used in classification can be greatly reduced.
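A compact sketch of this idea, assuming a real-valued weight vector evolved by a simple generational GA and evaluated by cross-validated knn accuracy; the GA operators and parameter values are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def fitness(weights, X, y):
    """Cross-validated knn accuracy on the weighted features; near-zero
    weights effectively deselect a feature."""
    return cross_val_score(KNeighborsClassifier(n_neighbors=5), X * weights, y, cv=3).mean()

def ga_feature_weights(X, y, pop_size=30, n_gen=40, mut_rate=0.1):
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features))          # weights in [0, 1]
    for _ in range(n_gen):
        scores = np.array([fitness(w, X, y) for w in pop])
        new_pop = [pop[np.argmax(scores)]]            # elitism: keep the best
        while len(new_pop) < pop_size:
            # tournament selection of two parents
            a, b = rng.integers(0, pop_size, 2), rng.integers(0, pop_size, 2)
            p1 = pop[a[np.argmax(scores[a])]]
            p2 = pop[b[np.argmax(scores[b])]]
            # uniform crossover followed by Gaussian mutation, clipped to [0, 1]
            mask = rng.random(n_features) < 0.5
            child = np.where(mask, p1, p2)
            mutate = rng.random(n_features) < mut_rate
            child = np.clip(child + mutate * rng.normal(0, 0.2, n_features), 0, 1)
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(w, X, y) for w in pop])
    return pop[np.argmax(scores)]   # best weights; near-zero entries mark dropped features
```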
Abstract: Dealing with hundreds of features in character recognition systems is not unusual. This large number of features increases the computational workload of the recognition process. Many methods have been proposed to remove unnecessary or redundant features and reduce feature dimensionality. Moreover, because of the characteristics of Farsi script, it is not possible to apply algorithms developed for other languages to Farsi directly. In this paper, several methods for feature subset selection using genetic algorithms are applied to a Farsi optical character recognition (OCR) system. Experimental results show that applying genetic algorithms (GA) to feature subset selection in Farsi OCR results in lower computational complexity and an enhanced recognition rate.
Abstract: The number of features required to represent an image can be very large. Using all available features to recognize objects can suffer from the curse of dimensionality. Feature selection and extraction form the pre-processing step of image mining. The main issues in analyzing images are the effective identification of features and their extraction. The mining problem addressed here is the grouping of features for different shapes. Experiments have been conducted using the shape outline as the feature. The shape outline readings are put through a normalization and dimensionality reduction process using an eigenvector-based method to produce a new set of readings. After this pre-processing step, the data are grouped by shape. Through statistical analysis of these readings together with peak measures, a robust classification and recognition process is achieved. Tests showed that the suggested methods are able to automatically recognize objects through their shapes. Finally, experiments also demonstrate the system's invariance to rotation, translation, scale, reflection, and a small degree of distortion.
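A minimal sketch of an eigenvector-based reduction of shape-outline readings (essentially a PCA-style projection), assuming the outlines have already been resampled to a fixed length; whether this matches the paper's exact method is an assumption.

```python
import numpy as np

def eigen_reduce(readings, n_components=2):
    """readings: (n_shapes, n_outline_samples) matrix of shape-outline readings.
    Normalize, then project onto the top eigenvectors of the covariance matrix."""
    X = np.asarray(readings, dtype=float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # normalization step
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]             # keep leading eigenvectors
    return X @ top                                       # new low-dimensional readings
```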
Abstract: Current image-based individual human recognition
methods, such as fingerprint, face, or iris biometric modalities,
generally require a cooperative subject, views from certain aspects,
and physical contact or close proximity. These methods cannot
reliably recognize non-cooperating individuals at a distance in the
real world under changing environmental conditions. Gait, which
concerns recognizing individuals by the way they walk, is a relatively
new biometric without these disadvantages. The inherent gait
characteristic of an individual makes it irreplaceable and useful in
visual surveillance.
In this paper, an efficient gait recognition system for human identification is proposed, based on two extracted features: the width vector of the binary silhouette and MPEG-7 region-based shape descriptors. In the proposed method, foreground objects, i.e., humans and other moving objects, are extracted by estimating the background with a Gaussian Mixture Model (GMM); subsequently, a median filtering operation is performed to remove noise from the background-subtracted image. A moving target classification algorithm is used to separate human beings (i.e., pedestrians) from other foreground objects (e.g., vehicles). Shape and boundary information is used in the moving target classification algorithm. Subsequently, the width vector of the outer contour of the binary silhouette and the MPEG-7 Angular Radial Transform coefficients are taken as the feature vector. Next, Principal Component Analysis (PCA) is applied to the selected feature vector to reduce its dimensionality. These extracted feature vectors are used to train a Hidden Markov Model (HMM) for the identification of individuals. The proposed system is evaluated on several gait sequences and the experimental results show the efficacy of the proposed algorithm.
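A minimal sketch of the foreground-extraction front end, assuming OpenCV's MOG2 subtractor as the Gaussian mixture background model; the file name and parameter values are illustrative, and the moving-target classification, ART, PCA, and HMM stages are omitted.

```python
import cv2
import numpy as np

# Assumed input: a video of a walking subject; parameters are illustrative.
cap = cv2.VideoCapture("gait_sequence.avi")
bg_model = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                               detectShadows=True)

width_vectors = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg_model.apply(frame)                   # GMM background subtraction
    mask = cv2.medianBlur(mask, 5)                 # remove salt-and-pepper noise
    # drop shadow pixels (marked 127 by MOG2) to get a clean binary silhouette
    _, silhouette = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    # width vector: per-row distance between leftmost and rightmost silhouette pixels
    widths = np.zeros(silhouette.shape[0])
    for r, row in enumerate(silhouette):
        cols = np.flatnonzero(row)
        if cols.size:
            widths[r] = cols[-1] - cols[0]
    width_vectors.append(widths)
cap.release()
# width_vectors would next be combined with ART coefficients, reduced by PCA,
# and used to train an HMM -- those steps are not shown here.
```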