Abstract: Clustering algorithms help to understand the hidden
information present in datasets. A dataset may contain intrinsic and
nested clusters, the detection of which is of utmost importance. This
paper presents a Distributed Grid-based Density Clustering algorithm
capable of identifying arbitrarily shaped embedded clusters as well as
multi-density clusters over large spatial datasets. To handle
massive datasets, we implemented our method using a 'shared-nothing'
architecture in which multiple computers are interconnected
over a network. Experimental results are reported to establish the
superiority of the technique in terms of scale-up, speedup as well as
cluster quality.
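
As a rough illustration of the grid-based density idea mentioned in this abstract, the sketch below bins 2-D points into square cells, keeps cells holding at least a threshold number of points, and merges adjacent dense cells into clusters. It is a single-machine simplification and not the paper's algorithm: the function name `grid_density_clusters` and the parameters `cell_size` and `min_pts` are hypothetical, one global density threshold is assumed (so multi-density and embedded clusters are not handled), and no distribution across machines is attempted.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_pts):
    """Label 2-D points by connected groups of dense grid cells.

    Points in cells with fewer than min_pts members keep the label -1 (noise).
    All names and parameters here are illustrative assumptions.
    """
    # Bin every point into the square grid cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # A cell is "dense" when it holds at least min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over the 8-connected dense neighbours.
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

In a shared-nothing setting, one plausible arrangement would be for each node to run this step on its own spatial partition and then merge clusters that touch across partition borders; the abstract does not say how the authors handle this.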
Abstract: In Thailand, pre-hospital Emergency
Medical Service (EMS) practice has grown at different rates and with
differing effectiveness from area to area, resulting in diverse
quality and quantity of service. To shorten the learning curve and
speed up the practice in other areas, storytelling and lessons learnt
from effective practices are valued as meaningful knowledge. This
paper aims to ascertain the factors, lessons learnt and best
practices that contribute to the success of the pre-hospital
EMS system. These were formulated as a model to
speed up the practice in other areas. To develop the model, the Malcolm
Baldrige National Quality Award (MBNQA), which is widely
recognized as a framework for organizational quality assessment and
improvement, was chosen as the discussion framework. Notably,
this study considers knowledge capture only;
it does not complete the loop of knowledge activities.
Rather, it highlights the importance of knowledge
capture, which is the starting point of knowledge management.
Abstract: The main mission of Ezilla is to provide a friendly
interface for accessing virtual machines and for quickly deploying a
high-performance computing environment. Ezilla has been developed by
the Pervasive Computing Team at the National Center for High-performance
Computing (NCHC). Ezilla integrates Cloud middleware,
virtualization technology, and a Web-based Operating System (WebOS)
to form a virtual computer in a distributed computing environment. To
handle a larger dataset and improve speed, we propose a sensor
observation system that deals with a huge amount of data in a
Cassandra database. The sensor observation system is built on
Ezilla and stores raw sensor data in a distributed database. We use the
Ezilla Cloud service to create virtual machines and log in to a virtual
machine to deploy the sensor observation system. Integrating the
sensor observation system with Ezilla makes it possible to quickly deploy
an experiment environment and to access a huge amount of data in a
distributed database that supports a replication mechanism to protect
the data.
Abstract: Histograms play an important statistical role in digital
image processing. However, existing quantum image models are
ill-suited to this kind of statistical image processing because
different gray scales are not distinguishable. In this paper, a novel
quantum image representation model is first proposed, in which
pixels with different gray scales can be distinguished and operated on
simultaneously. Based on the new model, a fast quantum algorithm for
constructing the histogram of a quantum image is designed. Performance
comparison reveals that the new quantum algorithm could achieve an
approximately quadratic speedup over its classical counterpart. The
proposed quantum model and algorithm are significant for
future research in quantum image processing.
Abstract: The problem addressed herein is the efficient management of the intense Grid/Cluster computation involved when the preconditioned Bi-CGSTAB Krylov method is employed for the iterative solution of the large and sparse linear system arising from the discretization of the Modified Helmholtz-Dirichlet problem by the Hermite Collocation method. Taking advantage of the Collocation matrix's red-black ordered structure, we organize the whole computation efficiently and map it onto a pipeline architecture with master-slave communication. The implementation, through MPI programming tools, is realized on a SUN V240 cluster interconnected through a 100Mbps and 1Gbps Ethernet network, and its performance is presented through speedup measurements.
Abstract: Speedups from mapping four real-life DSP
applications on an embedded system-on-chip that couples coarse-grained
reconfigurable logic with an instruction-set processor are
presented. The reconfigurable logic is realized by a 2-Dimensional
Array of Processing Elements. A design flow for improving
an application's performance is proposed. Critical software parts, called
kernels, are accelerated on the Coarse-Grained Reconfigurable
Array. The kernels are detected by profiling the source code. For
mapping the detected kernels on the reconfigurable logic, a priority-based
mapping algorithm has been developed. Two 4x4 array
architectures, which differ in their interconnection structure among
the Processing Elements, are considered. Experiments on eight
different instances of a generic system show significant overall
application speedups for the four applications.
The speedups range from 1.86 to 3.67, with an
average value of 2.53, compared with an all-software execution.
These speedups are quite close to the maximum theoretical speedups
imposed by Amdahl's law.
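
The bound from Amdahl's law referred to in this abstract can be made concrete with a small calculation. The sketch below assumes a fraction `kernel_fraction` of the all-software run time is spent in the accelerated kernels and that those kernels run `kernel_speedup` times faster on the reconfigurable array. Both parameter names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when only a fraction of the run time is accelerated.

    kernel_fraction: share of the original run time spent in kernels (0..1).
    kernel_speedup:  how many times faster the kernels run once accelerated.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction):
    """Upper bound on the speedup as the kernel speedup grows without limit."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)
```

For instance, with an assumed kernel share of 0.75 and a tenfold kernel speedup, `amdahl_speedup(0.75, 10)` is about 3.08, while `amdahl_limit(0.75)` is 4.0: the non-accelerated 25% caps the overall gain regardless of how fast the array runs the kernels.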
Abstract: Recently, an enhanced hexagon-based search (EHS)
algorithm was proposed to speed up the original hexagon-based search
(HS) by exploiting the group-distortion information of some evaluated
points. In this paper, a second version of the EHS is proposed with a
new point-oriented inner search technique which can further speed up
the HS in both large and small motion environments. Experimental
results show that the enhanced hexagon-based search version-2
(EHS2) is up to 34% faster than the HS with negligible PSNR
degradation.
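
For orientation, the baseline HS that EHS and EHS2 accelerate can be sketched as follows. This is a minimal sketch of the basic hexagon-based search only, assuming the usual large pattern of six points around the center followed by a four-point inner search; the group-distortion and point-oriented refinements of EHS and EHS2 are not reproduced. The block-matching distortion is abstracted into a caller-supplied `cost` function, and `hexagon_search`, `start` and `search_range` are hypothetical names.

```python
# Offsets (dx, dy) of the six outer points of the large hexagon pattern.
LARGE_HEXAGON = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
# Offsets of the four points checked by the final small inner pattern.
SMALL_PATTERN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def hexagon_search(cost, start=(0, 0), search_range=16):
    """Return (motion_vector, cost) found by the basic hexagon-based search.

    cost(mv) gives the block distortion (for example SAD) of candidate mv.
    """
    def in_range(mv):
        return abs(mv[0]) <= search_range and abs(mv[1]) <= search_range

    center, best = start, cost(start)
    # Coarse stage: move the hexagon until its center is the best point.
    while True:
        scored = [
            (cost(mv), mv)
            for mv in ((center[0] + dx, center[1] + dy) for dx, dy in LARGE_HEXAGON)
            if in_range(mv)
        ]
        if not scored:
            break
        candidate_cost, candidate = min(scored)
        if candidate_cost >= best:
            break
        center, best = candidate, candidate_cost

    # Inner stage: check the four nearest neighbours of the final center.
    best_mv = center
    for dx, dy in SMALL_PATTERN:
        mv = (center[0] + dx, center[1] + dy)
        if in_range(mv):
            c = cost(mv)
            if c < best:
                best_mv, best = mv, c
    return best_mv, best
```

The search only finds the true minimum when the distortion surface is reasonably unimodal around the start point. Each pass of the coarse stage as written evaluates all six hexagon points, which is the cost that faster variants aim to reduce.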
Abstract: Many-core GPUs provide high computing ability and
substantial bandwidth; however, optimizing irregular applications
like sparse matrix-vector multiplication (SpMV) on GPUs remains a
difficult but worthwhile task. In this
paper, we propose a novel method to improve the performance of
SpMV on GPUs. A new storage format called HYB-R is proposed to
exploit GPU architecture more efficiently. The COO portion of the
matrix is partitioned recursively into an ELL portion and a COO
portion in the process of creating the HYB-R format to ensure that there
are as many non-zeros as possible in ELL format. How to
partition the matrix is an important problem for the HYB-R kernel, so
we also tune the partitioning parameters for higher
performance. Experimental results show that our method achieves
better performance than the fastest kernel (HYB) in NVIDIA's
SpMV library, with a speedup of up to 17%.
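
The ELL/COO split that HYB-R builds on can be illustrated with a plain, CPU-only sketch. It performs a single-level hybrid split: the first `ell_width` non-zeros of each row go into a padded fixed-width ELL part and the overflow goes into a COO list. The recursive re-partitioning of the COO part that defines HYB-R, and any GPU kernel, are not shown. The function names, the row-list input layout and the `ell_width` parameter are assumptions made for illustration.

```python
def split_hyb(rows, ell_width):
    """Split a sparse matrix into a padded ELL part and a COO overflow part.

    rows: one list per matrix row, each holding (column, value) pairs.
    Returns (ell_cols, ell_vals, coo); padding slots use column -1.
    """
    ell_cols, ell_vals, coo = [], [], []
    for r, entries in enumerate(rows):
        head, tail = entries[:ell_width], entries[ell_width:]
        pad = ell_width - len(head)
        ell_cols.append([c for c, _ in head] + [-1] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        # Non-zeros beyond the ELL width spill into coordinate format.
        coo.extend((r, c, v) for c, v in tail)
    return ell_cols, ell_vals, coo


def spmv_hyb(ell_cols, ell_vals, coo, x):
    """Compute y = A @ x from the ELL and COO parts produced by split_hyb."""
    y = [0.0] * len(ell_cols)
    # ELL part: every row has the same number of slots, padding is skipped.
    for r, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):
        for c, v in zip(cols, vals):
            if c >= 0:
                y[r] += v * x[c]
    # COO part: irregular leftovers are accumulated entry by entry.
    for r, c, v in coo:
        y[r] += v * x[c]
    return y
```

The choice of `ell_width` is the trade-off that the abstract's parameter tuning addresses: a wider ELL part covers more non-zeros with regular accesses but wastes storage on padding for short rows.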
Abstract: The goal of data mining algorithms is to discover
useful information embedded in large databases. One of the most
important data mining problems is the discovery of frequently occurring
patterns in sequential data. In a multidimensional sequence each
event depends on more than one dimension. The search space is quite
large and serial algorithms do not scale to very large
datasets. To address this, it is necessary to study scalable parallel
implementations of sequence mining algorithms.
In this paper, we present a model for multidimensional sequences
and describe a parallel algorithm based on data parallelism.
Simulation experiments show good load balancing and scalable,
acceptable speedup across different numbers of processors and problem
sizes, and demonstrate that our approach can work efficiently in a real
parallel computing environment.
Abstract: In this paper, 3x3 routing nodes are proposed to
provide speedup and parallel processing capability in Data Vortex
network architectures. The new design not only significantly
improves network throughput and latency, but also eliminates the
need for the distributive traffic control mechanism originally embedded
among nodes and the need for nodal buffering. Cost effectiveness
is studied through a comparison with the previously proposed
2-input buffered networks, and considerable performance enhancement
can be achieved with similar or lower hardware cost. Unlike
previous implementations, the network leaves a small probability of
contention; therefore, the packet drop rate must be kept low for such
an implementation to be feasible and attractive, which can be achieved
with a proper choice of operating conditions.
Abstract: One of the major problems in the genomic field is performing sequence comparison on DNA and protein sequences. Executing sequence comparison on DNA and protein data is a computationally intensive task. Sequence comparison is the basic step for all protein sequence similarity algorithms. Parallel computing is an attractive way to provide the computational power needed to speed up the lengthy process of sequence comparison. Our main research aim is to enhance the protein sequence algorithm using the dynamic programming method. In our approach, we parallelize the dynamic programming algorithm using a multithreaded program to perform the sequence comparison, and we also developed a protein database distributed among many PCs using Remote Method Invocation (RMI). As a result, we show how different sizes of protein sequence data and the computation of the scoring matrix of these protein sequences on different numbers of processors affect processing time and speed, as opposed to sequential processing.
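
The scoring-matrix computation mentioned in this abstract can be illustrated with a sequential dynamic programming sketch. The abstract does not say which alignment variant or scoring scheme the authors use, so the global (Needleman-Wunsch style) recurrence and the `match`, `mismatch` and `gap` values below are assumptions. The multithreaded and RMI-based parts are not reproduced.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b via dynamic programming.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    The scoring parameters are illustrative defaults, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]
```

Each cell depends only on its upper, left and upper-left neighbours, so all cells on the same anti-diagonal are independent of one another. That independence is what a multithreaded version of this computation would typically exploit.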
Abstract: Computing a solution to the facility location problem for
every location in the country simultaneously is not easy. This paper
describes how to solve the problem using cluster computing. The technique
is to design a parallel algorithm that uses local search with the single
swap method to solve the problem on clusters. The parallel
implementation uses the portable parallel programming standard
Message Passing Interface (MPI) on Microsoft Windows Compute
Cluster. This paper presents the algorithm, which uses local search
with the single swap method, and the implementation of a system for
choosing a facility to be opened using MPI on a cluster. If large datasets are
considered, the process of calculating a reasonable cost for a facility
becomes time consuming. The results show that parallel computation of
the facility location problem on a cluster speeds up and scales well as
the problem size increases.
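
The local search with single swap described in this abstract can be sketched serially as follows. The abstract does not give the exact formulation, so this sketch assumes a k-median style objective (open exactly `k` facilities, minimize the total client-to-nearest-facility distance) and a first-improvement swap rule. The function names and the `dist[client][site]` layout are hypothetical, and the MPI parallelization is not shown.

```python
def assignment_cost(open_sites, dist):
    """Total distance from each client to its nearest open facility.

    dist[client][site] is the distance between a client and a candidate site.
    """
    return sum(min(row[s] for s in open_sites) for row in dist)


def single_swap_local_search(dist, k):
    """Improve a set of k open facilities by swapping one site at a time.

    Starts from sites 0..k-1 and repeatedly closes one open site while
    opening one closed site whenever that lowers the total cost.
    """
    n_sites = len(dist[0])
    open_sites = set(range(k))
    best = assignment_cost(open_sites, dist)

    improved = True
    while improved:
        improved = False
        for out in sorted(open_sites):
            for inn in range(n_sites):
                if inn in open_sites:
                    continue
                trial = (open_sites - {out}) | {inn}
                cost = assignment_cost(trial, dist)
                if cost < best:
                    # Accept the first improving swap and restart the scan.
                    open_sites, best, improved = trial, cost, True
                    break
            if improved:
                break
    return sorted(open_sites), best
```

Evaluating the candidate swaps is the expensive step, and each swap can be scored independently, so a cluster version could plausibly divide the (close, open) pairs among MPI processes and reduce to the best swap per round.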