JConqurr - A Multi-Core Programming Toolkit for Java

With the popularity of the multi-core and many-core architectures there is a great requirement for software frameworks which can support parallel programming methodologies. In this paper we introduce an Eclipse toolkit, JConqurr which is easy to use and provides robust support for flexible parallel progrmaming. JConqurr is a multi-core and many-core programming toolkit for Java which is capable of providing support for common parallel programming patterns which include task, data, divide and conquer and pipeline parallelism. The toolkit uses an annotation and a directive mechanism to convert the sequential code into parallel code. In addition to that we have proposed a novel mechanism to achieve the parallelism using graphical processing units (GPU). Experiments with common parallelizable algorithms have shown that our toolkit can be easily and efficiently used to convert sequential code to parallel code and significant performance gains can be achieved.

Simulation and Workspace Analysis of a Tripod Parallel Manipulator

Industrial robots play a vital role in automation however only little effort are taken for the application of robots in machining work such as Grinding, Cutting, Milling, Drilling, Polishing etc. Robot parallel manipulators have high stiffness, rigidity and accuracy, which cannot be provided by conventional serial robot manipulators. The aim of this paper is to perform the modeling and the workspace analysis of a 3 DOF Parallel Manipulator (3 DOF PM). The 3 DOF PM was modeled and simulated using 'ADAMS'. The concept involved is based on the transformation of motion from a screw joint to a spherical joint through a connecting link. This paper work has been planned to model the Parallel Manipulator (PM) using screw joints for very accurate positioning. A workspace analysis has been done for the determination of work volume of the 3 DOF PM. The position of the spherical joints connected to the moving platform and the circumferential points of the moving platform were considered for finding the workspace. After the simulation, the position of the joints of the moving platform was noted with respect to simulation time and these points were given as input to the 'MATLAB' for getting the work envelope. Then 'AUTOCAD' is used for determining the work volume. The obtained values were compared with analytical approach by using Pappus-Guldinus Theorem. The analysis had been dealt by considering the parameters, link length and radius of the moving platform. From the results it is found that the radius of moving platform is directly proportional to the work volume for a constant link length and the link length is also directly proportional to the work volume, at a constant radius of the moving platform.

A Mathematical Modelling to Predict Rhamnolipid Production by Pseudomonas aeruginosa under Nitrogen Limiting Fed-Batch Fermentation

In this study, a mathematical model was proposed and the accuracy of this model was assessed to predict the growth of Pseudomonas aeruginosa and rhamnolipid production under nitrogen limiting (sodium nitrate) fed-batch fermentation. All of the parameters used in this model were achieved individually without using any data from the literature. The overall growth kinetic of the strain was evaluated using a dual-parallel substrate Monod equation which was described by several batch experimental data. Fed-batch data under different glycerol (as the sole carbon source, C/N=10) concentrations and feed flow rates were used to describe the proposed fed-batch model and other parameters. In order to verify the accuracy of the proposed model several verification experiments were performed in a vast range of initial glycerol concentrations. While the results showed an acceptable prediction for rhamnolipid production (less than 10% error), in case of biomass prediction the errors were less than 23%. It was also found that the rhamnolipid production by P. aeruginosa was more sensitive at low glycerol concentrations. Based on the findings of this work, it was concluded that the proposed model could effectively be employed for rhamnolipid production by this strain under fed-batch fermentation on up to 80 g l- 1 glycerol.

Image Segment Matching Using Affine- Invariant Regions

In this paper, a method for matching image segments using triangle-based (geometrical) regions is proposed. Triangular regions are formed from triples of vertex points obtained from a keypoint detector (SIFT). However, triangle regions are subject to noise and distortion around the edges and vertices (especially acute angles). Therefore, these triangles are expanded into parallelogramshaped regions. The extracted image segments inherit an important triangle property; the invariance to affine distortion. Given two images, matching corresponding regions is conducted by computing the relative affine matrix, rectifying one of the regions w.r.t. the other one, then calculating the similarity between the reference and rectified region. The experimental tests show the efficiency and robustness of the proposed algorithm against geometrical distortion.

Design and Implementation of Shared Memory based Parallel File System Logging Method for High Performance Computing

I/O workload is a critical and important factor to analyze I/O pattern and file system performance. However tracing I/O operations on the fly distributed parallel file system is non-trivial due to collection overhead and a large volume of data. In this paper, we design and implement a parallel file system logging method for high performance computing using shared memory-based multi-layer scheme. It minimizes the overhead with reduced logging operation response time and provides efficient post-processing scheme through shared memory. Separated logging server can collect sequential logs from multiple clients in a cluster through packet communication. Implementation and evaluation result shows low overhead and high scalability of this architecture for high performance parallel logging analysis.

Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations

The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.

A Heuristic Algorithm Approach for Scheduling of Multi-criteria Unrelated Parallel Machines

In this paper we address a multi-objective scheduling problem for unrelated parallel machines. In unrelated parallel systems, the processing cost/time of a given job on different machines may vary. The objective of scheduling is to simultaneously determine the job-machine assignment and job sequencing on each machine. In such a way the total cost of the schedule is minimized. The cost function consists of three components, namely; machining cost, earliness/tardiness penalties and makespan related cost. Such scheduling problem is combinatorial in nature. Therefore, a Simulated Annealing approach is employed to provide good solutions within reasonable computational times. Computational results show that the proposed approach can efficiently solve such complicated problems.

Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining

This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode we introduce a data transformation that allows for the usage of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm we analyze the problem of working set selection for large data sets and analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our modifications and settings lead to improvement of support vector learning performance and thus allow using extensive parameter search methods to optimize classification accuracy.

Optimal Design of Two-Channel Recursive Parallelogram Quadrature Mirror Filter Banks

This paper deals with the optimal design of two-channel recursive parallelogram quadrature mirror filter (PQMF) banks. The analysis and synthesis filters of the PQMF bank are composed of two-dimensional (2-D) recursive digital all-pass filters (DAFs) with nonsymmetric half-plane (NSHP) support region. The design problem can be facilitated by using the 2-D doubly complementary half-band (DC-HB) property possessed by the analysis and synthesis filters. For finding the coefficients of the 2-D recursive NSHP DAFs, we appropriately formulate the design problem to result in an optimization problem that can be solved by using a weighted least-squares (WLS) algorithm in the minimax (L∞) optimal sense. The designed 2-D recursive PQMF bank achieves perfect magnitude response and possesses satisfactory phase response without requiring extra phase equalizer. Simulation results are also provided for illustration and comparison.

A New Efficient Scalable BIST Full Adder using Polymorphic Gates

Among various testing methodologies, Built-in Self- Test (BIST) is recognized as a low cost, effective paradigm. Also, full adders are one of the basic building blocks of most arithmetic circuits in all processing units. In this paper, an optimized testable 2- bit full adder as a test building block is proposed. Then, a BIST procedure is introduced to scale up the building block and to generate a self testable n-bit full adders. The target design can achieve 100% fault coverage using insignificant amount of hardware redundancy. Moreover, Overall test time is reduced by utilizing polymorphic gates and also by testing full adder building blocks in parallel.

Transient Currents in a Double Conductor Line above a Conducting Half-Space

Transient eddy current problem is solved in the present paper by the method of the Laplace transform for the case of a double conductor line located parallel to a conducting half-space. The Fourier sine and cosine integral transforms are used in order to find the Laplace transform of the solution. The inverse Laplace transform of the solution is found in closed form. The integrated electromotive force per unit length of the double conductor line is calculated in the form of an improper integral.

Fabrication of Single Crystal of Mg Alloys Containing Rare Earth Elements

Single crystals of Magnesium alloys such as Mg-1Al, Mg-1Zn-0.5Y, Mg-3Li, and AZ31 alloys were successfully fabricated in this study by employing the modified Bridgman method. Single crystals of pure Mg were also made in this study. To determine the exact orientation of crystals, Laue back-reflection method and pole figure measurement were carried out on each single crystal. Dimensions of single crystals were 10 mm in diameter and 120 mm in length. Hardness and compression tests were conducted and the results revealed that hardness and the strength strongly depended on the orientation. The closer to basal one the orientation was, the higher hardness and compressive strength were. The effect of alloying was not higher than that of orientation. After compressive deformation of single crystals, the orientation of the crystals was found to rotate and to be parallel to the basal orientation.

Noise-Improved Signal Detection in Nonlinear Threshold Systems

We discuss the signal detection through nonlinear threshold systems. The detection performance is assessed by the probability of error Per . We establish that: (1) when the signal is complete suprathreshold, noise always degrades the signal detection both in the single threshold system and in the parallel array of threshold devices. (2) When the signal is a little subthreshold, noise degrades signal detection in the single threshold system. But in the parallel array, noise can improve signal detection, i.e., stochastic resonance (SR) exists in the array. (3) When the signal is predominant subthreshold, noise always can improve signal detection and SR always exists not only in the single threshold system but also in the parallel array. (4) Array can improve signal detection by raising the number of threshold devices. These results extend further the applicability of SR in signal detection.

Spacecraft Neural Network Control System Design using FPGA

Designing and implementing intelligent systems has become a crucial factor for the innovation and development of better products of space technologies. A neural network is a parallel system, capable of resolving paradigms that linear computing cannot. Field programmable gate array (FPGA) is a digital device that owns reprogrammable properties and robust flexibility. For the neural network based instrument prototype in real time application, conventional specific VLSI neural chip design suffers the limitation in time and cost. With low precision artificial neural network design, FPGAs have higher speed and smaller size for real time application than the VLSI and DSP chips. So, many researchers have made great efforts on the realization of neural network (NN) using FPGA technique. In this paper, an introduction of ANN and FPGA technique are briefly shown. Also, Hardware Description Language (VHDL) code has been proposed to implement ANNs as well as to present simulation results with floating point arithmetic. Synthesis results for ANN controller are developed using Precision RTL. Proposed VHDL implementation creates a flexible, fast method and high degree of parallelism for implementing ANN. The implementation of multi-layer NN using lookup table LUT reduces the resource utilization for implementation and time for execution.

Fast Wavelet Image Denoising Based on Local Variance and Edge Analysis

The approach based on the wavelet transform has been widely used for image denoising due to its multi-resolution nature, its ability to produce high levels of noise reduction and the low level of distortion introduced. However, by removing noise, high frequency components belonging to edges are also removed, which leads to blurring the signal features. This paper proposes a new method of image noise reduction based on local variance and edge analysis. The analysis is performed by dividing an image into 32 x 32 pixel blocks, and transforming the data into wavelet domain. Fast lifting wavelet spatial-frequency decomposition and reconstruction is developed with the advantages of being computationally efficient and boundary effects minimized. The adaptive thresholding by local variance estimation and edge strength measurement can effectively reduce image noise while preserve the features of the original image corresponding to the boundaries of the objects. Experimental results demonstrate that the method performs well for images contaminated by natural and artificial noise, and is suitable to be adapted for different class of images and type of noises. The proposed algorithm provides a potential solution with parallel computation for real time or embedded system application.

Isotropic Stress Distribution in Cu/(001) Fe Two Sheets

The nanotechnology based on epitaxial systems includes single or arranged misfit dislocations. In general, whatever is the type of dislocation or the geometry of the array formed by the dislocations; it is important for experimental studies to know exactly the stress distribution for which there is no analytical expression [1, 2]. This work, using a numerical analysis, deals with relaxation of epitaxial layers having at their interface a periodic network of edge misfit dislocations. The stress distribution is estimated by using isotropic elasticity. The results show that the thickness of the two sheets is a crucial parameter in the stress distributions and then in the profile of the two sheets. A comparative study between the case of single dislocation and the case of parallel network shows that the layers relaxed better when the interface is covered by a parallel arrangement of misfit. Consequently, a single dislocation at the interface produces an important stress field which can be reduced by inserting a parallel network of dislocations with suitable periodicity.

An Approach for Blind Source Separation using the Sliding DFT and Time Domain Independent Component Analysis

''Cocktail party problem'' is well known as one of the human auditory abilities. We can recognize the specific sound that we want to listen by this ability even if a lot of undesirable sounds or noises are mixed. Blind source separation (BSS) based on independent component analysis (ICA) is one of the methods by which we can separate only a special signal from their mixed signals with simple hypothesis. In this paper, we propose an online approach for blind source separation using the sliding DFT and the time domain independent component analysis. The proposed method can reduce calculation complexity in comparison with conventional methods, and can be applied to parallel processing by using digital signal processors (DSPs) and so on. We evaluate this method and show its availability.

Detection and Classification of Faults on Parallel Transmission Lines Using Wavelet Transform and Neural Network

The protection of parallel transmission lines has been a challenging task due to mutual coupling between the adjacent circuits of the line. This paper presents a novel scheme for detection and classification of faults on parallel transmission lines. The proposed approach uses combination of wavelet transform and neural network, to solve the problem. While wavelet transform is a powerful mathematical tool which can be employed as a fast and very effective means of analyzing power system transient signals, artificial neural network has a ability to classify non-linear relationship between measured signals by identifying different patterns of the associated signals. The proposed algorithm consists of time-frequency analysis of fault generated transients using wavelet transform, followed by pattern recognition using artificial neural network to identify the type of the fault. MATLAB/Simulink is used to generate fault signals and verify the correctness of the algorithm. The adaptive discrimination scheme is tested by simulating different types of fault and varying fault resistance, fault location and fault inception time, on a given power system model. The simulation results show that the proposed scheme for fault diagnosis is able to classify all the faults on the parallel transmission line rapidly and correctly.

FPGA Based Longitudinal and Lateral Controller Implementation for a Small UAV

This paper presents implementation of attitude controller for a small UAV using field programmable gate array (FPGA). Due to the small size constrain a miniature more compact and computationally extensive; autopilot platform is needed for such systems. More over UAV autopilot has to deal with extremely adverse situations in the shortest possible time, while accomplishing its mission. FPGAs in the recent past have rendered themselves as fast, parallel, real time, processing devices in a compact size. This work utilizes this fact and implements different attitude controllers for a small UAV in FPGA, using its parallel processing capabilities. Attitude controller is designed in MATLAB/Simulink environment. The discrete version of this controller is implemented using pipelining followed by retiming, to reduce the critical path and thereby clock period of the controller datapath. Pipelined, retimed, parallel PID controller implementation is done using rapidprototyping and testing efficient development tool of “system generator", which has been developed by Xilinx for FPGA implementation. The improved timing performance enables the controller to react abruptly to any changes made to the attitudes of UAV.

Low Power and Less Area Architecture for Integer Motion Estimation

Full search block matching algorithm is widely used for hardware implementation of motion estimators in video compression algorithms. In this paper we are proposing a new architecture, which consists of a 2D parallel processing unit and a 1D unit both working in parallel. The proposed architecture reduces both data access power and computational power which are the main causes of power consumption in integer motion estimation. It also completes the operations with nearly the same number of clock cycles as compared to a 2D systolic array architecture. In this work sum of absolute difference (SAD)-the most repeated operation in block matching, is calculated in two steps. The first step is to calculate the SAD for alternate rows by a 2D parallel unit. If the SAD calculated by the parallel unit is less than the stored minimum SAD, the SAD of the remaining rows is calculated by the 1D unit. Early termination, which stops avoidable computations has been achieved with the help of alternate rows method proposed in this paper and by finding a low initial SAD value based on motion vector prediction. Data reuse has been applied to the reference blocks in the same search area which significantly reduced the memory access.