Abstract: The star network is one of the promising
interconnection networks for future high speed parallel computers, it
is expected to be one of the future-generation networks. The star
network is both edge and vertex symmetry, it was shown to have
many gorgeous topological proprieties also it is owns hierarchical
structure framework. Although much of the research work has been
done on this promising network in literature, it still suffers from
having enough algorithms for load balancing problem. In this paper
we try to work on this issue by investigating and proposing an
efficient algorithm for load balancing problem for the star network.
The proposed algorithm is called Star Clustered Dimension Exchange
Method SCDEM to be implemented on the star network. The
proposed algorithm is based on the Clustered Dimension Exchange
Method (CDEM). The SCDEM algorithm is shown to be efficient in
redistributing the load balancing as evenly as possible among all
nodes of different factor networks.
Abstract: Advances in processors architecture, such as multicore,
increase the size of complexity of parallel computer systems.
With multi-core architecture there are different parallel languages
that can be used to run parallel programs. One of these languages is
OpenMP which embedded in C/Cµ or FORTRAN. Because of this
new architecture and the complexity, it is very important to evaluate
the performance of OpenMP constructs, kernels, and application
program on multi-core systems. Performance is the activity of
collecting the information about the execution characteristics of a
program. Performance tools consists of at least three interfacing
software layers, including instrumentation, measurement, and
analysis. The instrumentation layer defines the measured
performance events. The measurement layer determines what
performance event is actually captured and how it is measured by the
tool. The analysis layer processes the performance data and
summarizes it into a form that can be displayed in performance tools.
In this paper, a number of OpenMP performance tools are surveyed,
explaining how each is used to collect, analyse, and display data
collection.
Abstract: Many problems in computer vision and image
processing present potential for parallel implementations through one
of the three major paradigms of geometric parallelism, algorithmic
parallelism and processor farming. Static process scheduling
techniques are used successfully to exploit geometric and algorithmic
parallelism, while dynamic process scheduling is better suited to
dealing with the independent processes inherent in the process
farming paradigm. This paper considers the application of parallel or
multi-computers to a class of problems exhibiting spatial data
characteristic of the geometric paradigm. However, by using
processor farming paradigm, a dynamic scheduling technique is
developed to suit the MIMD structure of the multi-computers. A
hybrid scheme of scheduling is also developed and compared with
the other schemes. The specific problem chosen for the investigation
is the Hough transform for line detection.
Abstract: In this paper, we propose a fast and efficient method for drawing very large-scale graph data. The conventional force-directed method proposed by Fruchterman and Rheingold (FR method) is well-known. It defines repulsive forces between every pair of nodes and attractive forces between connected nodes on a edge and calculates corresponding potential energy. An optimal layout is obtained by iteratively updating node positions to minimize the potential energy. Here, the positions of the nodes are updated every global timestep at the same time. In the proposed method, each node has its own individual time and time step, and nodes are updated at different frequencies depending on the local situation. The proposed method is inspired by the hierarchical individual time step method used for the high accuracy calculations for dense particle fields such as star clusters in astrophysical dynamics. Experiments show that the proposed method outperforms the original FR method in both speed and accuracy. We implement the proposed method on the MDGRAPE-3 PCI-X special purpose parallel computer and realize a speed enhancement of several hundred times.
Abstract: Fair share is one of the scheduling objectives supported on many production systems. However, fair share has been shown to cause performance problems for some users, especially the users with difficult jobs. This work is focusing on extending goaloriented parallel computer job scheduling policies to cover the fair share objective. Goal-oriented parallel computer job scheduling policies have been shown to achieve good scheduling performances when conflicting objectives are required. Goal-oriented policies achieve such good performance by using anytime combinatorial search techniques to find a good compromised schedule within a time limit. The experimental results show that the proposed goal-oriented parallel computer job scheduling policy (namely Tradeofffs( Tw:avgX)) achieves good scheduling performances and also provides good fair share performance.
Abstract: In order to make conventional implicit algorithm to be applicable in large scale parallel computers , an interface prediction and correction of discontinuous finite element method is presented to solve time-dependent neutron transport equations under 2-D cylindrical geometry. Domain decomposition is adopted in the computational domain.The numerical experiments show that our parallel algorithm with explicit prediction and implicit correction has good precision, parallelism and simplicity. Especially, it can reach perfect speedup even on hundreds of processors for large-scale problems.
Abstract: The evaluation of residual reliability of large sized
parallel computer interconnection systems is not practicable with
the existing methods. Under such conditions, one must go for
approximation techniques which provide the upper bound and lower
bound on this reliability. In this context, a new approximation method
for providing bounds on residual reliability is proposed here. The
proposed method is well supported by two algorithms for simulation
purpose. The bounds on residual reliability of three different categories
of interconnection topologies are efficiently found by using
the proposed method
Abstract: Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.
Abstract: In this paper we present high performance
dynamically allocated multi-queue (DAMQ) buffer schemes for fault
tolerance systems on chip applications that require an interconnection
network. Two virtual channels shared the same buffer space. Fault
tolerant mechanisms for interconnection networks are becoming a
critical design issue for large massively parallel computers. It is also
important to high performance SoCs as the system complexity keeps
increasing rapidly. On the message switching layer, we make
improvement to boost system performance when there are faults
involved in the components communication. The proposed scheme is
when a node or a physical channel is deemed as faulty, the previous
hop node will terminate the buffer occupancy of messages destined
to the failed link. The buffer usage decisions are made at switching
layer without interactions with higher abstract layer, thus buffer
space will be released to messages destined to other healthy nodes
quickly. Therefore, the buffer space will be efficiently used in case
fault occurs at some nodes.
Abstract: Fair share objective has been included into the goaloriented
parallel computer job scheduling policy recently. However,
the previous work only presented the overall scheduling performance.
Thus, the per-user performance of the policy is still lacking. In this
work, the details of per-user fair share performance under the
Tradeoff-fs(Tx:avgX) policy will be further evaluated. A basic fair
share priority backfill policy namely RelShare(1d) is also studied.
The performance of all policies is collected using an event-driven
simulator with three real job traces as input. The experimental results
show that the high demand users are usually benefited under most
policies because their jobs are large or they have a lot of jobs. In the
large job case, one job executed may result in over-share during that
period. In the other case, the jobs may be backfilled for
performances. However, the users with a mixture of jobs may suffer
because if the smaller jobs are executing the priority of the remaining
jobs from the same user will be lower. Further analysis does not show
any significant impact of users with a lot of jobs or users with a large
runtime approximation error.