A Parallel Implementation of k-Means in MATLAB

The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Warning about the Risk of Blood Flow Stagnation after Transcatheter Aortic Valve Implantation

In this work, the hemodynamics in the sinuses of Valsalva after Transcatheter Aortic Valve Implantation is numerically examined. We focus on the physical results in the two-dimensional case. We use a finite element methodology based on a Lagrange multiplier technique that enables to couple the dynamics of blood flow and the leaflets’ movement. A massively parallel implementation of a monolithic and fully implicit solver allows more accuracy and significant computational savings. The elastic properties of the aortic valve are disregarded, and the numerical computations are performed under physiologically correct pressure loads. Computational results depict that blood flow may be subject to stagnation in the lower domain of the sinuses of Valsalva after Transcatheter Aortic Valve Implantation.

Using High Performance Computing for Online Flood Monitoring and Prediction

The main goal of this article is to describe the online flood monitoring and prediction system Floreon+ primarily developed for the Moravian-Silesian region in the Czech Republic and the basic process it uses for running automatic rainfall-runoff and hydrodynamic simulations along with their calibration and uncertainty modeling. It takes a long time to execute such process sequentially, which is not acceptable in the online scenario, so the use of a high performance computing environment is proposed for all parts of the process to shorten their duration. Finally, a case study on the Ostravice River catchment is presented that shows actual durations and their gain from the parallel implementation.

Processor Scheduling on Parallel Computers

Many problems in computer vision and image processing present potential for parallel implementations through one of the three major paradigms of geometric parallelism, algorithmic parallelism and processor farming. Static process scheduling techniques are used successfully to exploit geometric and algorithmic parallelism, while dynamic process scheduling is better suited to dealing with the independent processes inherent in the process farming paradigm. This paper considers the application of parallel or multi-computers to a class of problems exhibiting spatial data characteristic of the geometric paradigm. However, by using processor farming paradigm, a dynamic scheduling technique is developed to suit the MIMD structure of the multi-computers. A hybrid scheme of scheduling is also developed and compared with the other schemes. The specific problem chosen for the investigation is the Hough transform for line detection.

Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations

The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.

A Message Passing Implementation of a New Parallel Arrangement Algorithm

This paper describes a new algorithm of arrangement in parallel, based on Odd-Even Mergesort, called division and concurrent mixes. The main idea of the algorithm is to achieve that each processor uses a sequential algorithm for ordering a part of the vector, and after that, for making the processors work in pairs in order to mix two of these sections ordered in a greater one, also ordered; after several iterations, the vector will be completely ordered. The paper describes the implementation of the new algorithm on a Message Passing environment (such as MPI). Besides, it compares the obtained experimental results with the quicksort sequential algorithm and with the parallel implementations (also on MPI) of the algorithms quicksort and bitonic sort. The comparison has been realized in an 8 processors cluster under GNU/Linux which is running on a unique PC processor.

Some Computational Results on MPI Parallel Implementation of Dense Simplex Method

There are two major variants of the Simplex Algorithm: the revised method and the standard, or tableau method. Today, all serious implementations are based on the revised method because it is more efficient for sparse linear programming problems. Moreover, there are a number of applications that lead to dense linear problems so our aim in this paper is to present some computational results on parallel implementation of dense Simplex Method. Our implementation is implemented on a SMP cluster using C programming language and the Message Passing Interface MPI. Preliminary computational results on randomly generated dense linear programs support our results.

A Web Oriented Spread Spectrum Watermarking Procedure for MPEG-2 Videos

In the last decade digital watermarking procedures have become increasingly applied to implement the copyright protection of multimedia digital contents distributed on the Internet. To this end, it is worth noting that a lot of watermarking procedures for images and videos proposed in literature are based on spread spectrum techniques. However, some scepticism about the robustness and security of such watermarking procedures has arisen because of some documented attacks which claim to render the inserted watermarks undetectable. On the other hand, web content providers wish to exploit watermarking procedures characterized by flexible and efficient implementations and which can be easily integrated in their existing web services frameworks or platforms. This paper presents how a simple spread spectrum watermarking procedure for MPEG-2 videos can be modified to be exploited in web contexts. To this end, the proposed procedure has been made secure and robust against some well-known and dangerous attacks. Furthermore, its basic scheme has been optimized by making the insertion procedure adaptive with respect to the terminals used to open the videos and the network transactions carried out to deliver them to buyers. Finally, two different implementations of the procedure have been developed: the former is a high performance parallel implementation, whereas the latter is a portable Java and XML based implementation. Thus, the paper demonstrates that a simple spread spectrum watermarking procedure, with limited and appropriate modifications to the embedding scheme, can still represent a valid alternative to many other well-known and more recent watermarking procedures proposed in literature.

Parallel Explicit Group Domain Decomposition Methods for the Telegraph Equation

In a previous work, we presented the numerical solution of the two dimensional second order telegraph partial differential equation discretized by the centred and rotated five-point finite difference discretizations, namely the explicit group (EG) and explicit decoupled group (EDG) iterative methods, respectively. In this paper, we utilize a domain decomposition algorithm on these group schemes to divide the tasks involved in solving the same equation. The objective of this study is to describe the development of the parallel group iterative schemes under OpenMP programming environment as a way to reduce the computational costs of the solution processes using multicore technologies. A detailed performance analysis of the parallel implementations of points and group iterative schemes will be reported and discussed.

A Parallel Implementation of the Reverse Converter for the Moduli Set {2n, 2n–1, 2n–1–1}

In this paper, a new reverse converter for the moduli set {2n, 2n–1, 2n–1–1} is presented. We improved a previously introduced conversion algorithm for deriving an efficient hardware design for reverse converter. Hardware architecture of the proposed converter is based on carry-save adders and regular binary adders, without the requirement for modular adders. The presented design is faster than the latest introduced reverse converter for moduli set {2n, 2n–1, 2n–1–1}. Also, it has better performance than the reverse converters for the recently introduced moduli set {2n+1–1, 2n, 2n–1}

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.

Concurrency without Locking in Parallel Hash Structures used for Data Processing

Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers or other traditional approaches in this paper we focus on alternative ways for making better use of modern multithreaded architectures and preparing hash tables for concurrent accesses. Hash structures will be used to demonstrate and compare two entirely different approaches (rule based cooperation and hardware synchronization support) to an efficient parallel implementation using traditional locks. Comparison includes implementation details, performance ranking and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment with special focus on the memory system and memory access characteristics.

HIV Modelling - Parallel Implementation Strategies

We report on the development of a model to understand why the range of experience with respect to HIV infection is so diverse, especially with respect to the latency period. To investigate this, an agent-based approach is used to extract highlevel behaviour which cannot be described analytically from the set of interaction rules at the cellular level. A network of independent matrices mimics the chain of lymph nodes. Dealing with massively multi-agent systems requires major computational effort. However, parallelisation methods are a natural consequence and advantage of the multi-agent approach and, using the MPI library, are here implemented, tested and optimized. Our current focus is on the various implementations of the data transfer across the network. Three communications strategies are proposed and tested, showing that the most efficient approach is communication based on the natural lymph-network connectivity.