Abstract: Feature selection and attribute reduction are crucial
problems and widely used techniques in machine learning, data
mining, and pattern recognition for overcoming the well-known
curse of dimensionality. This paper presents a feature selection
method that efficiently carries out attribute reduction, thereby
selecting the most informative features of a dataset.
It consists of two components: 1) a measure for feature subset
evaluation, and 2) a search strategy. For the evaluation measure,
we have employed the fuzzy-rough dependency degree (FRDD)
of the lower approximation-based fuzzy-rough feature selection
(L-FRFS) due to its effectiveness in feature selection. As for the
search strategy, a modified version of the binary shuffled frog leaping
algorithm (B-SFLA) is proposed. The proposed feature selection
method is obtained by hybridizing the B-SFLA with the FRDD. Nine
classifiers have been employed to compare the proposed approach
with several existing methods over twenty-two datasets, including
nine high-dimensional and large ones, from the UCI repository.
The experimental results demonstrate that the B-SFLA approach
significantly outperforms other metaheuristic methods in terms of the
number of selected features and the classification accuracy.
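The evaluation measure can be illustrated with a minimal sketch of a
fuzzy-rough dependency degree for a crisp class label. The abstract does
not state which fuzzy connectives are used, so the choices below are
assumptions: a range-normalised similarity per attribute, the minimum
t-norm to combine attributes, and the Lukasiewicz implicator, under which
the lower-approximation membership of a sample reduces to one minus its
highest similarity to any sample of a different class. The function name
`dependency_degree` is hypothetical, and the B-SFLA search itself is not
shown.

```python
def dependency_degree(X, labels, subset):
    """Fuzzy-rough dependency of the class label on the attributes in `subset`.

    Assumed connectives (not taken from the abstract): range-normalised
    similarity per attribute, minimum t-norm, Lukasiewicz implicator.
    """
    n = len(X)
    # Attribute ranges for normalisation; a constant attribute gets range 1.
    ranges = {
        a: (max(row[a] for row in X) - min(row[a] for row in X)) or 1.0
        for a in subset
    }

    def similarity(i, j):
        # Minimum t-norm over per-attribute similarities; the empty subset
        # makes every pair of samples fully similar.
        return min(
            (1.0 - abs(X[i][a] - X[j][a]) / ranges[a] for a in subset),
            default=1.0,
        )

    positive = 0.0
    for i in range(n):
        # Membership of sample i in the fuzzy positive region: it is limited
        # by the most similar sample that carries a different label.
        positive += min(
            (1.0 - similarity(i, j) for j in range(n) if labels[j] != labels[i]),
            default=1.0,
        )
    return positive / n


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
labels = [0, 0, 1, 1]
for subset in ([0], [1], [0, 1]):
    print(subset, round(dependency_degree(X, labels, subset), 3))
```

A search strategy such as the B-SFLA would call a measure of this kind as
its fitness function, looking for small subsets with high dependency.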
Abstract: One of the biggest challenges in nonparametric
regression is the curse of dimensionality. Additive models are known
to overcome this problem by estimating only the individual additive
effects of each covariate. However, if the model is misspecified, the
accuracy of the estimator compared to the fully nonparametric one
is unknown. In this work, the efficiency of fully nonparametric
regression estimators such as Loess is compared with that of estimators
that assume additivity in several situations, including additive and
non-additive regression scenarios. The comparison is done by
computing the oracle mean squared error of the estimators with respect
to the true nonparametric regression function. Then, a backward
elimination selection procedure based on the Akaike Information
Criterion is proposed, which is computed from either the additive or
the nonparametric model. Simulations show that if the additive model
is misspecified, the percentage of times it fails to select important
variables can be higher than that of the fully nonparametric approach.
A dimension reduction step is included when the nonparametric estimator
cannot be computed due to the curse of dimensionality. Finally, the
Boston housing dataset is analyzed using the proposed backward
elimination procedure and the selected variables are identified.
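Backward elimination driven by an information criterion can be sketched
as follows. The abstract does not specify the smoother or the exact form
of the criterion, so this sketch makes two assumptions: a Nadaraya-Watson
estimator with a Gaussian kernel stands in for Loess, and the criterion
is the corrected AIC for linear smoothers, log(sigma^2) + 1 +
2(tr(H) + 1)/(n - tr(H) - 2), where H is the smoother matrix. The names
`nw_fit`, `aicc`, `backward_elimination` and the bandwidth `h` are
hypothetical.

```python
import math
import random


def nw_fit(X, y, cols, h):
    """Nadaraya-Watson fit on the columns in `cols`.

    Returns the fitted values and the trace of the smoother matrix.
    """
    n = len(y)
    fitted, trace = [], 0.0
    for i in range(n):
        weights = [
            math.exp(-sum((X[i][c] - X[j][c]) ** 2 for c in cols) / (2 * h * h))
            for j in range(n)
        ]
        total = sum(weights)
        fitted.append(sum(w * yj for w, yj in zip(weights, y)) / total)
        trace += weights[i] / total  # diagonal entry of the smoother matrix
    return fitted, trace


def aicc(X, y, cols, h):
    """Corrected AIC for a linear smoother (an assumed form of the criterion)."""
    n = len(y)
    fitted, trace = nw_fit(X, y, cols, h)
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    return math.log(sigma2) + 1 + 2 * (trace + 1) / (n - trace - 2)


def backward_elimination(X, y, h=0.2):
    """Drop one variable at a time while doing so lowers the criterion."""
    selected = list(range(len(X[0])))
    current = aicc(X, y, selected, h)
    while len(selected) > 1:
        score, drop = min(
            (aicc(X, y, [c for c in selected if c != d], h), d) for d in selected
        )
        if score >= current:
            break  # no single removal improves the criterion
        selected.remove(drop)
        current = score
    return selected


# Toy data: the response depends on column 0 only; column 1 is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(80)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.05) for row in X]
print(backward_elimination(X, y))
```

The sketch stops at a single variable rather than testing the empty
model, which keeps the criterion well defined without a separate
intercept-only case.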
Abstract: Hyperspectral imagery (HSI) typically provides a
wealth of information captured in a wide range of the
electromagnetic spectrum for each pixel in the image. Hence, a
pixel in HSI is a high-dimensional vector of intensities with a
large spectral range and a high spectral resolution. Semantic
interpretation is therefore a challenging task in HSI analysis. In
this paper, we focus on object classification as a form of HSI
semantic interpretation. However, HSI classification still faces
several issues, including the spatial variability of spectral
signatures, the high number of spectral bands, and the high cost
of true sample labeling. In particular, the high number of spectral
bands combined with the low number of training samples gives rise to
the curse of dimensionality. To address this problem, we
introduce a dimensionality reduction step intended
to improve the classification of HSI. The presented approach is a
semi-supervised band selection method based on a spatial hypergraph
embedding model that represents higher-order relationships, with
different weights for the spatial neighbors corresponding to the
centroid pixel. This semi-supervised band selection has been
developed to select useful bands for object classification. The
presented approach is evaluated on AVIRIS and ROSIS HSIs
and compared to other dimensionality reduction methods. The
experimental results demonstrate the efficacy of our approach
compared to many existing dimensionality reduction methods for
HSI classification.
Abstract: Locality Sensitive Hashing (LSH) is one of the most
promising techniques for solving the nearest neighbour search problem in
high-dimensional spaces. Euclidean LSH is the most popular variation
of LSH and has been successfully applied in many multimedia
applications. However, the Euclidean LSH has limitations that
affect its structure and query performance. Its main limitation is
large memory consumption: achieving
good accuracy requires a large number of hash tables. In this
paper, we propose a new hashing algorithm to overcome the storage
space problem and improve query time, while keeping an
accuracy similar to that achieved by the original Euclidean LSH.
Experimental results on a real large-scale dataset show that the
proposed approach achieves good performance and consumes less
memory than the Euclidean LSH.
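For context, the baseline Euclidean LSH can be sketched as follows. This
is not the new algorithm proposed in the abstract. It uses the standard
p-stable hash h(v) = floor((a . v + b) / w), with a drawn from a standard
Gaussian and b uniform on [0, w). The class name `EuclideanLSH` and the
parameter defaults are hypothetical. Every point is stored once per hash
table, which is the source of the memory cost described above.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Baseline Euclidean LSH with several independent hash tables."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.points = []
        self.tables = []
        for _ in range(num_tables):
            # Each hash function is a random direction a and an offset b.
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # Concatenate the quantised projections into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def insert(self, v):
        idx = len(self.points)
        self.points.append(v)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q):
        """Index of the closest candidate sharing a bucket with q, or None."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))

    def stored_entries(self):
        """Bucket entries across all tables: num_tables times the point count."""
        return sum(len(ids) for _, buckets in self.tables
                   for ids in buckets.values())


rng = random.Random(1)
data = [[rng.uniform(-10, 10) for _ in range(8)] for _ in range(200)]
index = EuclideanLSH(dim=8)
for v in data:
    index.insert(v)
print(index.query(data[17]))   # 17: a point always shares a bucket with itself
print(index.stored_entries())  # 800: 4 tables x 200 points
```

Adding tables raises the chance that a true neighbour shares a bucket
with the query, but `stored_entries` grows linearly with the number of
tables, which is the trade-off a more compact scheme has to improve on.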
Abstract: A novel path planning approach is presented for finding
optimal paths in stochastic, time-varying networks under a priori traffic
information. Most existing studies use dynamic programming
to find the optimal path. However, those methods have been proven
unable to obtain the global optimum; moreover, designing
efficient algorithms remains a challenge.
This paper employs a decision-theoretic framework for defining the
optimal path: for a given source S and destination D in an urban transit
network, we seek an S-D path of lowest expected travel time,
where the link travel times are discrete random variables. To address the
deficiencies of dynamic programming methods, such as the
curse of dimensionality and violation of the principle of optimality, an integer
programming model is built to assign discrete travel
time variables to arcs. In addition, pruning techniques are
applied to reduce the computational complexity of the algorithm. The final
experiments show the feasibility of the novel approach.
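The objective can be illustrated in a simplified setting. The sketch
below assumes a static network with independent link travel times, which
is much weaker than the stochastic, time-varying setting of the abstract
and is not the integer programming model it proposes. Under that
assumption the expected time of a path is the sum of the expected link
times, so a standard shortest-path search on the expectations finds the
S-D path of lowest expected travel time. The function names and the toy
network are hypothetical.

```python
import heapq


def expected(dist):
    """Mean of a discrete travel time given as (time, probability) pairs."""
    return sum(t * p for t, p in dist)


def lowest_expected_time_path(links, source, dest):
    """Dijkstra's algorithm on expected link times (static, independent links).

    `links` maps (u, v) to a list of (time, probability) pairs.
    Returns (path, expected_time), or None if dest is unreachable.
    """
    graph = {}
    for (u, v), dist in links.items():
        graph.setdefault(u, []).append((v, expected(dist)))

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dest:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            new_cost = cost + w
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))

    if dest not in best:
        return None
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[dest]


# Toy network: S-B is usually fast but occasionally very slow.
links = {
    ("S", "A"): [(2, 0.5), (6, 0.5)],    # mean 4.0
    ("A", "D"): [(3, 1.0)],              # mean 3.0
    ("S", "B"): [(1, 0.9), (20, 0.1)],   # mean 2.9
    ("B", "D"): [(5, 1.0)],              # mean 5.0
}
print(lowest_expected_time_path(links, "S", "D"))
```

Once link time distributions depend on the arrival time at a node, the
expected cost of a subpath is no longer independent of how the node was
reached, which is where the dynamic programming difficulties named in the
abstract arise.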