A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'sharednothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.

A Study of Distinctive Models for Pre-hospital EMS in Thailand: Knowledge Capture

In Thailand, the practice of pre-hospital Emergency Medical Service (EMS) in each area reveals the different growth rates and effectiveness of the practices. Those can be found as the diverse quality and quantity. To shorten the learning curve prior to speed-up the practices in other areas, story telling and lessons learnt from the effective practices are valued as meaningful knowledge. To this paper, it was to ascertain the factors, lessons learnt and best practices that have impact as contributing to the success of prehospital EMS system. Those were formulized as model prior to speedup the practice in other areas. To develop the model, Malcolm Baldrige National Quality Award (MBNQA), which is widely recognized as a framework for organizational quality assessment and improvement, was chosen as the discussion framework. Remarkably, this study was based on the consideration of knowledge capture; however it was not to complete the loop of knowledge activities. Nevertheless, it was to highlight the recognition of knowledge capture, which is the initiation of knowledge management.

Ezilla Cloud Service with Cassandra Database for Sensor Observation System

The main mission of Ezilla is to provide a friendly interface to access the virtual machine and quickly deploy the high performance computing environment. Ezilla has been developed by Pervasive Computing Team at National Center for High-performance Computing (NCHC). Ezilla integrates the Cloud middleware, virtualization technology, and Web-based Operating System (WebOS) to form a virtual computer in distributed computing environment. In order to upgrade the dataset and speedup, we proposed the sensor observation system to deal with a huge amount of data in the Cassandra database. The sensor observation system is based on the Ezilla to store sensor raw data into distributed database. We adopt the Ezilla Cloud service to create virtual machines and login into virtual machine to deploy the sensor observation system. Integrating the sensor observation system with Ezilla is to quickly deploy experiment environment and access a huge amount of data with distributed database that support the replication mechanism to protect the data security.

A Quantum Algorithm of Constructing Image Histogram

Histogram plays an important statistical role in digital image processing. However, the existing quantum image models are deficient to do this kind of image statistical processing because different gray scales are not distinguishable. In this paper, a novel quantum image representation model is proposed firstly in which the pixels with different gray scales can be distinguished and operated simultaneously. Based on the new model, a fast quantum algorithm of constructing histogram for quantum image is designed. Performance comparison reveals that the new quantum algorithm could achieve an approximately quadratic speedup than the classical counterpart. The proposed quantum model and algorithm have significant meanings for the future researches of quantum image processing.

Grid Computing for the Bi-CGSTAB Applied to the Solution of the Modified Helmholtz Equation

The problem addressed herein is the efficient management of the Grid/Cluster intense computation involved, when the preconditioned Bi-CGSTAB Krylov method is employed for the iterative solution of the large and sparse linear system arising from the discretization of the Modified Helmholtz-Dirichlet problem by the Hermite Collocation method. Taking advantage of the Collocation ma-trix's red-black ordered structure we organize efficiently the whole computation and map it on a pipeline architecture with master-slave communication. Implementation, through MPI programming tools, is realized on a SUN V240 cluster, inter-connected through a 100Mbps and 1Gbps ethernet network,and its performance is presented by speedup measurements included.

Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Speedups from mapping four real-life DSP applications on an embedded system-on-chip that couples coarsegrained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized by a 2-Dimensional Array of Processing Elements. A design flow for improving application-s performance is proposed. Critical software parts, called kernels, are accelerated on the Coarse-Grained Reconfigurable Array. The kernels are detected by profiling the source code. For mapping the detected kernels on the reconfigurable logic a prioritybased mapping algorithm has been developed. Two 4x4 array architectures, which differ in their interconnection structure among the Processing Elements, are considered. The experiments for eight different instances of a generic system show that important overall application speedups have been reported for the four applications. The performance improvements range from 1.86 to 3.67, with an average value of 2.53, compared with an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl-s law.

New Enhanced Hexagon-Based Search Using Point-Oriented Inner Search for Fast Block Motion Estimation

Recently, an enhanced hexagon-based search (EHS) algorithm was proposed to speedup the original hexagon-based search (HS) by exploiting the group-distortion information of some evaluated points. In this paper, a second version of the EHS is proposed with a new point-oriented inner search technique which can further speedup the HS in both large and small motion environments. Experimental results show that the enhanced hexagon-based search version-2 (EHS2) is faster than the HS up to 34% with negligible PSNR degradation.

Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned recursively into a ELL portion and a COO portion in the process of creating HYB-R format to ensure that there are as many non-zeros as possible in ELL format. The method of partitioning the matrix is an important problem for HYB-R kernel, so we also try to tune the parameters to partition the matrix for higher performance. Experimental results show that our method can get better performance than the fastest kernel (HYB) in NVIDIA-s SpMV library with as high as 17% speedup.

Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Speedup of Data Vortex Network Architecture

In this paper, 3X3 routing nodes are proposed to provide speedup and parallel processing capability in Data Vortex network architectures. The new design not only significantly improves network throughput and latency, but also eliminates the need for distributive traffic control mechanism originally embedded among nodes and the need for nodal buffering. The cost effectiveness is studied by a comparison study with the previously proposed 2- input buffered networks, and considerable performance enhancement can be achieved with similar or lower cost of hardware. Unlike previous implementation, the network leaves small probability of contention, therefore, the packet drop rate must be kept low for such implementation to be feasible and attractive, and it can be achieved with proper choice of operation conditions.

Parallezation Protein Sequence Similarity Algorithms using Remote Method Interface

One of the major problems in genomic field is to perform sequence comparison on DNA and protein sequences. Executing sequence comparison on the DNA and protein data is a computationally intensive task. Sequence comparison is the basic step for all algorithms in protein sequences similarity. Parallel computing is an attractive solution to provide the computational power needed to speedup the lengthy process of the sequence comparison. Our main research is to enhance the protein sequence algorithm using dynamic programming method. In our approach, we parallelize the dynamic programming algorithm using multithreaded program to perform the sequence comparison and also developed a distributed protein database among many PCs using Remote Method Interface (RMI). As a result, we showed how different sizes of protein sequences data and computation of scoring matrix of these protein sequence on different number of processors affected the processing time and speed, as oppose to sequential processing.

Solving Facility Location Problem on Cluster Computing

Computation of facility location problem for every location in the country is not easy simultaneously. Solving the problem is described by using cluster computing. A technique is to design parallel algorithm by using local search with single swap method in order to solve that problem on clusters. Parallel implementation is done by the use of portable parallel programming, Message Passing Interface (MPI), on Microsoft Windows Compute Cluster. In this paper, it presents the algorithm that used local search with single swap method and implementation of the system of a facility to be opened by using MPI on cluster. If large datasets are considered, the process of calculating a reasonable cost for a facility becomes time consuming. The result shows parallel computation of facility location problem on cluster speedups and scales well as problem size increases.