Abstract: Modern computing technology enters the era of parallel computing with the trend of sustainable and scalable parallelism. Single Instruction Multiple Data (SIMD) is an important way to go along with the trend. It is able to gather more and more computing ability by increasing the number of processor cores without the need of modifying the program. Meanwhile, in the field of scientific computing and engineering design, many computation intensive applications are facing the challenge of increasingly large amount of data. Data parallel computing will be an important way to further improve the performance of these applications. In this paper, we take the accurate collision detection in building information modeling as an example. We demonstrate a model for constructing a data parallel algorithm. According to the model, a complex object is decomposed into the sets of simple objects; collision detection among complex objects is converted into those among simple objects. The resulting algorithm is a typical SIMD algorithm, and its advantages in parallelism and scalability is unparalleled in respect to the traditional algorithms.
Abstract: The aim of this paper is to propose a general
framework for storing, analyzing, and extracting knowledge from
two-dimensional echocardiographic images, color Doppler images,
non-medical images, and general data sets. A number of high
performance data mining algorithms have been used to carry out this
task. Our framework encompasses four layers namely physical
storage, object identification, knowledge discovery, user level.
Techniques such as active contour model to identify the cardiac
chambers, pixel classification to segment the color Doppler echo
image, universal model for image retrieval, Bayesian method for
classification, parallel algorithms for image segmentation, etc., were
employed. Using the feature vector database that have been
efficiently constructed, one can perform various data mining tasks
like clustering, classification, etc. with efficient algorithms along
with image mining given a query image. All these facilities are
included in the framework that is supported by state-of-the-art user
interface (UI). The algorithms were tested with actual patient data
and Coral image database and the results show that their performance
is better than the results reported already.
Abstract: This work is the first dowel in a rather wide research
activity in collaboration with Euro Mediterranean Center for Climate
Changes, aimed at introducing scalable approaches in Ocean
Circulation Models. We discuss designing and implementation of
a parallel algorithm for solving the Variational Data Assimilation
(DA) problem on Graphics Processing Units (GPUs). The algorithm
is based on the fully scalable 3DVar DA model, previously proposed
by the authors, which uses a Domain Decomposition approach
(we refer to this model as the DD-DA model). We proceed with
an incremental porting process consisting of 3 distinct stages:
requirements and source code analysis, incremental development of
CUDA kernels, testing and optimization. Experiments confirm the
theoretic performance analysis based on the so-called scale up factor
demonstrating that the DD-DA model can be suitably mapped on
GPU architectures.
Abstract: In this paper, getting an high-efficiency parallel algorithm to solve sparse block pentadiagonal linear systems suitable for vectors and parallel processors, stair matrices are used to construct some parallel polynomial approximate inverse preconditioners. These preconditioners are appropriate when the desired target is to maximize parallelism. Moreover, some theoretical results about these preconditioners are presented and how to construct preconditioners effectively for any nonsingular block pentadiagonal H-matrices is also described. In addition, the availability of these preconditioners is illustrated with some numerical experiments arising from two dimensional biharmonic equation.
Abstract: A strip domain decomposition parallel algorithm for fast direct Poisson solver is presented on a 3D Cartesian staggered grid. The parallel algorithm follows the principles of sequential algorithm for fast direct Poisson solver. Both Dirichlet and Neumann boundary conditions are addressed. Several test cases are likewise addressed in order to shed light on accuracy and efficiency in the strip domain parallelization algorithm. Actually the current implementation shows a very high efficiency when dealing with a large grid mesh up to 3.6 * 109 under massive parallel approach, which explicitly demonstrates that the proposed algorithm is ready for massive parallel computing.
Abstract: This paper describes a new algorithm of arrangement
in parallel, based on Odd-Even Mergesort, called division and
concurrent mixes. The main idea of the algorithm is to achieve that
each processor uses a sequential algorithm for ordering a part of the
vector, and after that, for making the processors work in pairs in
order to mix two of these sections ordered in a greater one, also
ordered; after several iterations, the vector will be completely
ordered. The paper describes the implementation of the new
algorithm on a Message Passing environment (such as MPI). Besides,
it compares the obtained experimental results with the quicksort
sequential algorithm and with the parallel implementations (also on
MPI) of the algorithms quicksort and bitonic sort. The comparison
has been realized in an 8 processors cluster under GNU/Linux which
is running on a unique PC processor.
Abstract: The spectral action balance equation is an equation that
used to simulate short-crested wind-generated waves in shallow water
areas such as coastal regions and inland waters. This equation consists
of two spatial dimensions, wave direction, and wave frequency which
can be solved by finite difference method. When this equation with
dominating convection term are discretized using central differences,
stability problems occur when the grid spacing is chosen too coarse.
In this paper, we introduce the splitting upwind schemes for avoiding
stability problems and prove that it is consistent to the upwind scheme
with same accuracy. The splitting upwind schemes was adopted
to split the wave spectral action balance equation into four onedimensional
problems, which for each small problem obtains the
independently tridiagonal linear systems. For each smaller system
can be solved by direct or iterative methods at the same time which
is very fast when performed by a multi-processor computer.
Abstract: In this paper we describe the design and implementation of a parallel algorithm for data assimilation with ensemble Kalman filter (EnKF) for oil reservoir history matching problem. The use of large number of observations from time-lapse seismic leads to a large turnaround time for the analysis step, in addition to the time consuming simulations of the realizations. For efficient parallelization it is important to consider parallel computation at the analysis step. Our experiments show that parallelization of the analysis step in addition to the forecast step has good scalability, exploiting the same set of resources with some additional efforts.
Abstract: In this paper developed and realized absolutely new
algorithm for solving three-dimensional Poisson equation. This
equation used in research of turbulent mixing, computational fluid
dynamics, atmospheric front, and ocean flows and so on. Moreover in
the view of rising productivity of difficult calculation there was
applied the most up-to-date and the most effective parallel
programming technology - MPI in combination with OpenMP
direction, that allows to realize problems with very large data
content. Resulted products can be used in solving of important
applications and fundamental problems in mathematics and physics.
Abstract: In order to make conventional implicit algorithm to be applicable in large scale parallel computers , an interface prediction and correction of discontinuous finite element method is presented to solve time-dependent neutron transport equations under 2-D cylindrical geometry. Domain decomposition is adopted in the computational domain.The numerical experiments show that our parallel algorithm with explicit prediction and implicit correction has good precision, parallelism and simplicity. Especially, it can reach perfect speedup even on hundreds of processors for large-scale problems.
Abstract: A parallel block method based on Backward
Differentiation Formulas (BDF) is developed for the parallel solution
of stiff Ordinary Differential Equations (ODEs). Most common
methods for solving stiff systems of ODEs are based on implicit
formulae and solved using Newton iteration which requires repeated
solution of systems of linear equations with coefficient matrix, I -
hβJ . Here, J is the Jacobian matrix of the problem. In this paper,
the matrix operations is paralleled in order to reduce the cost of the
iterations. Numerical results are given to compare the speedup and
efficiency of parallel algorithm and that of sequential algorithm.
Abstract: Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.
Abstract: The shortest path question is in a graph theory model
question, and it is applied in many fields. The most short-path
question may divide into two kinds: Single sources most short-path,
all apexes to most short-path. This article mainly introduces the
problem of all apexes to most short-path, and gives a new parallel
algorithm of all apexes to most short-path according to the Dijkstra
algorithm. At last this paper realizes the parallel algorithms in the
technology of C # multithreading.
Abstract: Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.
Abstract: The spectral action balance equation is an equation that
used to simulate short-crested wind-generated waves in shallow water
areas such as coastal regions and inland waters. This equation consists
of two spatial dimensions, wave direction, and wave frequency which
can be solved by finite difference method. When this equation with
dominating propagation velocity terms are discretized using central
differences, stability problems occur when the grid spacing is chosen
too coarse. In this paper, we introduce the splitting modified donorcell
scheme for avoiding stability problems and prove that it is
consistent to the modified donor-cell scheme with same accuracy. The
splitting modified donor-cell scheme was adopted to split the wave
spectral action balance equation into four one-dimensional problems,
which for each small problem obtains the independently tridiagonal
linear systems. For each smaller system can be solved by direct or
iterative methods at the same time which is very fast when performed
by a multi-cores computer.
Abstract: In this paper, parallelism in the solution of Ordinary
Differential Equations (ODEs) to increase the computational speed is
studied. The focus is the development of parallel algorithm of the two
point Block Backward Differentiation Formulas (PBBDF) that can
take advantage of the parallel architecture in computer technology.
Parallelism is obtained by using Message Passing Interface (MPI).
Numerical results are given to validate the efficiency of the PBBDF
implementation as compared to the sequential implementation.
Abstract: The goal of data mining algorithms is to discover
useful information embedded in large databases. One of the most
important data mining problems is discovery of frequently occurring
patterns in sequential data. In a multidimensional sequence each
event depends on more than one dimension. The search space is quite
large and the serial algorithms are not scalable for very large
datasets. To address this, it is necessary to study scalable parallel
implementations of sequence mining algorithms.
In this paper, we present a model for multidimensional sequence
and describe a parallel algorithm based on data parallelism.
Simulation experiments show good load balancing and scalable and
acceptable speedup over different processors and problem sizes and
demonstrate that our approach can works efficiently in a real parallel
computing environment.
Abstract: Computation of facility location problem for every
location in the country is not easy simultaneously. Solving the
problem is described by using cluster computing. A technique is to
design parallel algorithm by using local search with single swap
method in order to solve that problem on clusters. Parallel
implementation is done by the use of portable parallel programming,
Message Passing Interface (MPI), on Microsoft Windows Compute
Cluster. In this paper, it presents the algorithm that used local search
with single swap method and implementation of the system of a
facility to be opened by using MPI on cluster. If large datasets are
considered, the process of calculating a reasonable cost for a facility
becomes time consuming. The result shows parallel computation of
facility location problem on cluster speedups and scales well as
problem size increases.