Abstract: This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional, and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract these representative words. The candidate words are first selected according to the weights calculated by our new algorithm. The candidates are then fed into WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms such as Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.
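The per-cluster keyword weighting step can be sketched as follows. The specific formula used here (weights inversely related to within-cluster dispersion, controlled by an exponent parameter `beta`, in the style of common subspace k-means schemes) is an illustrative assumption, not necessarily the paper's exact update rule:

```python
import numpy as np

def cluster_keyword_weights(X, labels, k, beta=2.0, eps=1e-12):
    """Per-cluster feature (keyword) weights: features with low
    within-cluster dispersion receive higher weight, so high-weight
    words identify a cluster's topic. Assumed dispersion-based formula
    for illustration; weights in each cluster sum to 1."""
    n_features = X.shape[1]
    W = np.zeros((k, n_features))
    for j in range(k):
        members = X[labels == j]
        center = members.mean(axis=0)
        # per-feature within-cluster dispersion (plus eps to avoid /0)
        D = ((members - center) ** 2).sum(axis=0) + eps
        inv = D ** (-1.0 / (beta - 1.0))
        W[j] = inv / inv.sum()
    return W
```

In a full implementation this step would alternate with the usual assignment and centroid updates, and the top-weighted terms of each cluster would then be passed to the WordNet-based extraction stage.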
Abstract: Many-core GPUs provide high computing power and
substantial memory bandwidth; however, optimizing irregular applications
such as SpMV on GPUs remains a difficult but worthwhile task. In this
paper, we propose a novel method to improve the performance of
SpMV on GPUs. A new storage format called HYB-R is proposed to
exploit the GPU architecture more efficiently. When the HYB-R format is created, the COO portion of the matrix is recursively partitioned into an ELL portion and a COO portion, so that as many non-zeros as possible are stored in the ELL format. How the matrix is partitioned is an important problem for the HYB-R kernel, so we also tune the partitioning parameters for higher performance. Experimental results show that our method outperforms the fastest kernel (HYB) in NVIDIA's SpMV library, with speedups of up to 17%.
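The recursive ELL/COO partitioning can be sketched as follows. This is an illustrative host-side sketch of the format construction only (the actual work would be a GPU kernel); the split rule (the first `k` non-zeros of each row go to ELL) follows the classic HYB layout, and the recursion depth and width `k` stand in for the paper's tuned parameters:

```python
import numpy as np

def split_hyb(triplets, n_rows, k):
    """Split COO triplets (row, col, val) into one ELL slice holding
    the first k non-zeros of every row, plus a COO remainder."""
    ell_cols = np.full((n_rows, k), -1, dtype=int)  # -1 marks padding
    ell_vals = np.zeros((n_rows, k))
    count = np.zeros(n_rows, dtype=int)
    remainder = []
    for r, c, v in triplets:
        if count[r] < k:
            ell_cols[r, count[r]] = c
            ell_vals[r, count[r]] = v
            count[r] += 1
        else:
            remainder.append((r, c, v))
    return (ell_cols, ell_vals), remainder

def hyb_r(triplets, n_rows, k, depth):
    """HYB-R sketch: recursively re-split the COO remainder so that as
    many non-zeros as possible end up in regular, GPU-friendly ELL
    slices, leaving only a small final COO portion."""
    ell_parts = []
    for _ in range(depth):
        if not triplets:
            break
        ell, triplets = split_hyb(triplets, n_rows, k)
        ell_parts.append(ell)
    return ell_parts, triplets
```

The design intuition is that ELL's fixed row width gives coalesced memory access on the GPU, so moving non-zeros out of the irregular COO remainder into additional ELL slices tends to pay off as long as the padding overhead stays small.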
Abstract: The Continuously Adaptive Mean-Shift (CamShift) algorithm, incorporating scene depth information, is combined with an l1-minimization sparse-representation method to form a hybrid kernel- and state-space-based tracking algorithm. We take advantage of the efficiency of the former together with the robustness to occlusion of the latter. A simple interchange scheme transfers control between the algorithms based on the likelihood of drift and occlusion, which is quantified by projecting target candidates onto a depth map of the 2D scene obtained with a low-cost stereo vision webcam. The result is improved tracking, in terms of drift, over each algorithm individually in a challenging practical outdoor test case with multiple occlusions.
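One way the depth-based occlusion likelihood could be quantified is sketched below: the fraction of the candidate region whose depth reading lies in front of the tracked target. The bounding-box convention and the `margin` threshold are assumptions for illustration, not the paper's exact measure:

```python
import numpy as np

def occlusion_likelihood(depth_map, bbox, target_depth, margin=0.1):
    """Fraction of a candidate box (x, y, w, h) whose depth reading is
    nearer the camera than the tracked target, i.e. likely occluder
    pixels. A high value would trigger the interchange scheme to hand
    control to the occlusion-robust l1 tracker (illustrative sketch)."""
    x, y, w, h = bbox
    patch = depth_map[y:y + h, x:x + w]
    return float(np.mean(patch < target_depth - margin))
```

In the hybrid scheme described above, a score near zero would keep the cheap CamShift tracker in control, while a large score would indicate occlusion and justify switching to the sparse-representation tracker.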
Abstract: This paper considers robust recovery of sparse frequencies from partial phase-only measurements. The proposed method reconstructs the sparse frequencies by making full use of the sparse distribution of the complex-valued time signal in its Fourier representation. Simulation experiments illustrate the proposed method's advantages over conventional methods in both the noiseless and the additive white Gaussian noise case.
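The basic intuition that phase-only measurements can preserve sparse frequency locations can be illustrated with a toy example: for a complex tone, discarding the magnitude of the time samples leaves the spectral peak at the true bin. This uses full (not partial) measurements and a single tone for simplicity, and is only a motivating illustration, not the paper's recovery algorithm:

```python
import numpy as np

N = 256
f0 = 37                                  # true frequency bin
n = np.arange(N)
x = 3.0 * np.exp(2j * np.pi * f0 * n / N)  # complex tone, amplitude 3

z = x / np.abs(x)                        # keep phase only, drop magnitude
spectrum = np.abs(np.fft.fft(z))
est = int(np.argmax(spectrum))           # estimated frequency bin
```

A practical method, as the abstract describes, must additionally cope with partial measurements, multiple frequencies, and noise, which is where the proposed robust reconstruction comes in.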
Abstract: A number of routing algorithms based on the learning automata technique have been proposed for communication networks. However, there has been little work on how variations in graph sparsity affect the performance of these algorithms. In this paper, a comprehensive study is conducted to investigate the performance of LASPA, the first learning automata based solution to dynamic shortest path routing, across graph structures of varying sparsity. The sensitivity of three main performance parameters of the algorithm, namely the average number of processed nodes, the average number of scanned edges, and the average time per update, to variations in graph sparsity is reported. Simulation results indicate that the LASPA algorithm adapts well to sparsity variations in the graph structure and substantially outperforms existing dynamic and fixed algorithms on these performance criteria.
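For readers unfamiliar with learning automata, the core probability update underlying such routing schemes can be sketched with the generic linear reward-inaction (L_R-I) rule; this is a standard textbook scheme assumed here for illustration, and LASPA's exact update may differ:

```python
def lri_update(p, chosen, rewarded, a=0.1):
    """Linear reward-inaction (L_R-I) update for a learning automaton:
    on reward, shift probability mass toward the chosen action; on
    penalty, leave the probability vector unchanged. In a routing
    setting, each action would correspond to an outgoing edge."""
    if rewarded:
        p = [(pi + a * (1.0 - pi)) if i == chosen else pi * (1.0 - a)
             for i, pi in enumerate(p)]
    return p
```

Repeated rewarded selections of an edge that lies on a shortest path drive its probability toward 1, which is how an automaton placed at each node can converge to the shortest-path routing decision.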
Abstract: This study presents a new approach to automatic
data clustering and classification problems in large and complex
databases and, at the same time, derives specific types of explicit rules
describing each cluster. The method works well in both sparse and
dense multidimensional data spaces. The members of the data space
can be of the same nature or represent different classes. A number
of N-dimensional ellipsoids are used for enclosing the data clouds.
Due to the geometry of an ellipsoid and its free rotation in space,
the detection of clusters becomes very efficient. The method is based
on genetic algorithms that are used for the optimization of location,
orientation and geometric characteristics of the hyper-ellipsoids. The
proposed approach can serve as a basis for the development of
general knowledge systems for discovering hidden knowledge and
unexpected patterns and rules in various large databases.
Abstract: In the last few years, three multivariate spectral
analysis techniques, namely Principal Component Analysis (PCA),
Independent Component Analysis (ICA) and Non-negative Matrix
Factorization (NMF) have emerged as effective tools for oscillation
detection and isolation. While the first method is used in determining
the number of oscillatory sources, the latter two methods
are used to identify source signatures by formulating the detection
problem as a source identification problem in the spectral domain.
In this paper, we present a critical drawback of the underlying linear
(mixing) model which strongly limits the ability of the associated
source separation methods to determine the number of sources
and/or identify the physical source signatures. It is shown that the
assumed mixing model is only valid if each unit of the process gives
equal weighting (all-pass filter) to all oscillatory components in its
inputs. This is in contrast to the fact that each unit, in general, acts
as a filter with non-uniform frequency response. Thus, the model
can only facilitate correct identification of a source with a single
frequency component, which is again unrealistic. To overcome
this deficiency, an iterative post-processing algorithm that correctly
identifies the physical source(s) is developed. An additional issue
with the existing methods is that they lack a procedure to pre-screen
non-oscillatory/noisy measurements which obscure the identification
of oscillatory sources. In this regard, a pre-screening procedure
is proposed based on the notion of a sparseness index to eliminate
the noisy and non-oscillatory measurements from the data set used
for analysis.
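One concrete candidate for the sparseness index used in such pre-screening is Hoyer's sparseness measure, applied to a measurement's magnitude spectrum: an oscillatory signal concentrates its energy in a few spectral bins and scores near 1, while broadband noise scores near 0. Using this particular measure is an assumption for illustration, not confirmed to be the paper's exact definition:

```python
import numpy as np

def sparseness_index(v, eps=1e-12):
    """Hoyer's sparseness measure of a non-negative vector, based on
    the ratio of its l1 and l2 norms: 0 for a uniform (dense) vector,
    1 for a one-hot (maximally sparse) vector. Applied to a magnitude
    spectrum, it flags oscillatory (sparse-spectrum) measurements."""
    v = np.abs(np.asarray(v, dtype=float))
    n = v.size
    l1 = v.sum()
    l2 = np.sqrt((v ** 2).sum()) + eps
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)
```

A pre-screening step would then keep only those measurements whose spectral sparseness exceeds a chosen threshold, removing the noisy and non-oscillatory channels before PCA/ICA/NMF source identification.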