IMLFQ Scheduling Algorithm with Combinational Fault Tolerant Method

Scheduling algorithms are used in operating systems to optimize the usage of processors. One of the most efficient algorithms for scheduling is Multi-Layer Feedback Queue (MLFQ) algorithm which uses several queues with different quanta. The most important weakness of this method is the inability to define the optimized the number of the queues and quantum of each queue. This weakness has been improved in IMLFQ scheduling algorithm. Number of the queues and quantum of each queue affect the response time directly. In this paper, we review the IMLFQ algorithm for solving these problems and minimizing the response time. In this algorithm Recurrent Neural Network has been utilized to find both the number of queues and the optimized quantum of each queue. Also in order to prevent any probable faults in processes' response time computation, a new fault tolerant approach has been presented. In this approach we use combinational software redundancy to prevent the any probable faults. The experimental results show that using the IMLFQ algorithm results in better response time in comparison with other scheduling algorithms also by using fault tolerant mechanism we improve IMLFQ performance.

An Implementation of MacMahon's Partition Analysis in Ordering the Lower Bound of Processing Elements for the Algorithm of LU Decomposition

A lot of Scientific and Engineering problems require the solution of large systems of linear equations of the form bAx in an effective manner. LU-Decomposition offers good choices for solving this problem. Our approach is to find the lower bound of processing elements needed for this purpose. Here is used the so called Omega calculus, as a computational method for solving problems via their corresponding Diophantine relation. From the corresponding algorithm is formed a system of linear diophantine equalities using the domain of computation which is given by the set of lattice points inside the polyhedron. Then is run the Mathematica program DiophantineGF.m. This program calculates the generating function from which is possible to find the number of solutions to the system of Diophantine equalities, which in fact gives the lower bound for the number of processors needed for the corresponding algorithm. There is given a mathematical explanation of the problem as well. Keywordsgenerating function, lattice points in polyhedron, lower bound of processor elements, system of Diophantine equationsand : calculus.

RFU Based Computational Unit Design For Reconfigurable Processors

Fully customized hardware based technology provides high performance and low power consumption by specializing the tasks in hardware but lacks design flexibility since any kind of changes require re-design and re-fabrication. Software based solutions operate with software instructions due to which a great flexibility is achieved from the easy development and maintenance of the software code. But this execution of instructions introduces a high overhead in performance and area consumption. In past few decades the reconfigurable computing domain has been introduced which overcomes the traditional trades-off between flexibility and performance and is able to achieve high performance while maintaining a good flexibility. The dramatic gains in terms of chip performance and design flexibility achieved through the reconfigurable computing systems are greatly dependent on the design of their computational units being integrated with reconfigurable logic resources. The computational unit of any reconfigurable system plays vital role in defining its strength. In this research paper an RFU based computational unit design has been presented using the tightly coupled, multi-threaded reconfigurable cores. The proposed design has been simulated for VLIW based architectures and a high gain in performance has been observed as compared to the conventional computing systems.

Qualitative Parametric Comparison of Load Balancing Algorithms in Parallel and Distributed Computing Environment

Decrease in hardware costs and advances in computer networking technologies have led to increased interest in the use of large-scale parallel and distributed computing systems. One of the biggest issues in such systems is the development of effective techniques/algorithms for the distribution of the processes/load of a parallel program on multiple hosts to achieve goal(s) such as minimizing execution time, minimizing communication delays, maximizing resource utilization and maximizing throughput. Substantive research using queuing analysis and assuming job arrivals following a Poisson pattern, have shown that in a multi-host system the probability of one of the hosts being idle while other host has multiple jobs queued up can be very high. Such imbalances in system load suggest that performance can be improved by either transferring jobs from the currently heavily loaded hosts to the lightly loaded ones or distributing load evenly/fairly among the hosts .The algorithms known as load balancing algorithms, helps to achieve the above said goal(s). These algorithms come into two basic categories - static and dynamic. Whereas static load balancing algorithms (SLB) take decisions regarding assignment of tasks to processors based on the average estimated values of process execution times and communication delays at compile time, Dynamic load balancing algorithms (DLB) are adaptive to changing situations and take decisions at run time. The objective of this paper work is to identify qualitative parameters for the comparison of above said algorithms. In future this work can be extended to develop an experimental environment to study these Load balancing algorithms based on comparative parameters quantitatively.

Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Control of Commutation of SR Motor Using Its Magnetic Characteristics and Back-of-Core Saturation Effects

The control of commutation of switched reluctance (SR) motor has nominally depended on a physical position detector. The physical rotor position sensor limits robustness and increases size and inertia of the SR drive system. The paper describes a method to overcome these limitations by using magnetization characteristics of the motor to indicate rotor and stator teeth overlap status. The method is using active current probing pulses of same magnitude that is used to simulate flux linkage in the winding being probed. A microprocessor is used for processing magnetization data to deduce rotor-stator teeth overlap status and hence rotor position. However, the back-of-core saturation and mutual coupling introduces overlap detection errors, hence that of commutation control. This paper presents the concept of the detection scheme and the effects of backof core saturation.

Parallel-computing Approach for FFT Implementation on Digital Signal Processor (DSP)

An efficient parallel form in digital signal processor can improve the algorithm performance. The butterfly structure is an important role in fast Fourier transform (FFT), because its symmetry form is suitable for hardware implementation. Although it can perform a symmetric structure, the performance will be reduced under the data-dependent flow characteristic. Even though recent research which call as novel memory reference reduction methods (NMRRM) for FFT focus on reduce memory reference in twiddle factor, the data-dependent property still exists. In this paper, we propose a parallel-computing approach for FFT implementation on digital signal processor (DSP) which is based on data-independent property and still hold the property of low-memory reference. The proposed method combines final two steps in NMRRM FFT to perform a novel data-independent structure, besides it is very suitable for multi-operation-unit digital signal processor and dual-core system. We have applied the proposed method of radix-2 FFT algorithm in low memory reference on TI TMSC320C64x DSP. Experimental results show the method can reduce 33.8% clock cycles comparing with the NMRRM FFT implementation and keep the low-memory reference property.

Numerical Study of Airfoils Aerodynamic Performance in Heavy Rain Environment

Heavy rainfall greatly affects the aerodynamic performance of the aircraft. There are many accidents of aircraft caused by aerodynamic efficiency degradation by heavy rain. In this Paper we have studied the heavy rain effects on the aerodynamic efficiency of cambered NACA 64-210 and symmetric NACA 0012 airfoils. Our results show significant increase in drag and decrease in lift. We used preprocessing software gridgen for creation of geometry and mesh, used fluent as solver and techplot as postprocessor. Discrete phase modeling called DPM is used to model the rain particles using two phase flow approach. The rain particles are assumed to be inert. Both airfoils showed significant decrease in lift and increase in drag in simulated rain environment. The most significant difference between these two airfoils was the NACA 64-210 more sensitivity than NACA 0012 to liquid water content (LWC). We believe that the results showed in this paper will be useful for the designer of the commercial aircrafts and UAVs, and will be helpful for training of the pilots to control the airplanes in heavy rain.

3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor

With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.

Performance Evaluation of Neural Network Prediction for Data Prefetching in Embedded Applications

Embedded systems need to respect stringent real time constraints. Various hardware components included in such systems such as cache memories exhibit variability and therefore affect execution time. Indeed, a cache memory access from an embedded microprocessor might result in a cache hit where the data is available or a cache miss and the data need to be fetched with an additional delay from an external memory. It is therefore highly desirable to predict future memory accesses during execution in order to appropriately prefetch data without incurring delays. In this paper, we evaluate the potential of several artificial neural networks for the prediction of instruction memory addresses. Neural network have the potential to tackle the nonlinear behavior observed in memory accesses during program execution and their demonstrated numerous hardware implementation emphasize this choice over traditional forecasting techniques for their inclusion in embedded systems. However, embedded applications execute millions of instructions and therefore millions of addresses to be predicted. This very challenging problem of neural network based prediction of large time series is approached in this paper by evaluating various neural network architectures based on the recurrent neural network paradigm with pre-processing based on the Self Organizing Map (SOM) classification technique.

Verification of On-Line Vehicle Collision Avoidance Warning System using DSRC

Many accidents were happened because of fast driving, habitual working overtime or tired spirit. This paper presents a solution of remote warning for vehicles collision avoidance using vehicular communication. The development system integrates dedicated short range communication (DSRC) and global position system (GPS) with embedded system into a powerful remote warning system. To transmit the vehicular information and broadcast vehicle position; DSRC communication technology is adopt as the bridge. The proposed system is divided into two parts of the positioning andvehicular units in a vehicle. The positioning unit is used to provide the position and heading information from GPS module, and furthermore the vehicular unit is used to receive the break, throttle, and othersignals via controller area network (CAN) interface connected to each mechanism. The mobile hardware are built with an embedded system using X86 processor in Linux system. A vehicle is communicated with other vehicles via DSRC in non-addressed protocol with wireless access in vehicular environments (WAVE) short message protocol. From the position data and vehicular information, this paper provided a conflict detection algorithm to do time separation and remote warning with error bubble consideration. And the warning information is on-line displayed in the screen. This system is able to enhance driver assistance service and realize critical safety by using vehicular information from the neighbor vehicles.KeywordsDedicated short range communication, GPS, Control area network, Collision avoidance warning system.

An Efficient Hardware Implementation of Extended and Fast Physical Addressing in Microprocessor-Based Systems Using Programmable Logic

This paper describes an efficient hardware implementation of a new technique for interfacing the data exchange between the microprocessor-based systems and the external devices. This technique, based on the use of software/hardware system and a reduced physical address, enlarges the interfacing capacity of the microprocessor-based systems, uses the Direct Memory Access (DMA) to increases the frequency of the new bus, and improves the speed of data exchange. While using this architecture in microprocessor-based system or in computer, the input of the hardware part of our system will be connected to the bus system, and the output, which is a new bus, will be connected to an external device. The new bus is composed of a data bus, a control bus and an address bus. A Xilinx Integrated Software Environment (ISE) 7.1i has been used for the programmable logic implementation.

Performance Evaluation of Para-virtualization on Modern Mobile Phone Platform

Emergence of smartphones brings to live the concept of converged devices with the availability of web amenities. Such trend also challenges the mobile devices manufactures and service providers in many aspects, such as security on mobile phones, complex and long time design flow, as well as higher development cost. Among these aspects, security on mobile phones is getting more and more attention. Microkernel based virtualization technology will play a critical role in addressing these challenges and meeting mobile market needs and preferences, since virtualization provides essential isolation for security reasons and it allows multiple operating systems to run on one processor accelerating development and cutting development cost. However, virtualization benefits do not come for free. As an additional software layer, it adds some inevitable virtualization overhead to the system, which may decrease the system performance. In this paper we evaluate and analyze the virtualization performance cost of L4 microkernel based virtualization on a competitive mobile phone by comparing the L4Linux, a para-virtualized Linux on top of L4 microkernel, with the native Linux performance using lmbench and a set of typical mobile phone applications.

Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Speedups from mapping four real-life DSP applications on an embedded system-on-chip that couples coarsegrained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized by a 2-Dimensional Array of Processing Elements. A design flow for improving application-s performance is proposed. Critical software parts, called kernels, are accelerated on the Coarse-Grained Reconfigurable Array. The kernels are detected by profiling the source code. For mapping the detected kernels on the reconfigurable logic a prioritybased mapping algorithm has been developed. Two 4x4 array architectures, which differ in their interconnection structure among the Processing Elements, are considered. The experiments for eight different instances of a generic system show that important overall application speedups have been reported for the four applications. The performance improvements range from 1.86 to 3.67, with an average value of 2.53, compared with an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl-s law.

An Ant Colony Optimization for Dynamic JobScheduling in Grid Environment

Grid computing is growing rapidly in the distributed heterogeneous systems for utilizing and sharing large-scale resources to solve complex scientific problems. Scheduling is the most recent topic used to achieve high performance in grid environments. It aims to find a suitable allocation of resources for each job. A typical problem which arises during this task is the decision of scheduling. It is about an effective utilization of processor to minimize tardiness time of a job, when it is being scheduled. This paper, therefore, addresses the problem by developing a general framework of grid scheduling using dynamic information and an ant colony optimization algorithm to improve the decision of scheduling. The performance of various dispatching rules such as First Come First Served (FCFS), Earliest Due Date (EDD), Earliest Release Date (ERD), and an Ant Colony Optimization (ACO) are compared. Moreover, the benefit of using an Ant Colony Optimization for performance improvement of the grid Scheduling is also discussed. It is found that the scheduling system using an Ant Colony Optimization algorithm can efficiently and effectively allocate jobs to proper resources.

A Hyper-Domain Image Watermarking Method based on Macro Edge Block and Wavelet Transform for Digital Signal Processor

In order to protect original data, watermarking is first consideration direction for digital information copyright. In addition, to achieve high quality image, the algorithm maybe can not run on embedded system because the computation is very complexity. However, almost nowadays algorithms need to build on consumer production because integrator circuit has a huge progress and cheap price. In this paper, we propose a novel algorithm which efficient inserts watermarking on digital image and very easy to implement on digital signal processor. In further, we select a general and cheap digital signal processor which is made by analog device company to fit consumer application. The experimental results show that the image quality by watermarking insertion can achieve 46 dB can be accepted in human vision and can real-time execute on digital signal processor.

Performance Analysis of Digital Signal Processors Using SMV Benchmark

Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.

Parallel Distributed Computational Microcontroller System for Adaptive Antenna Downlink Transmitter Power Optimization

This paper presents a tested research concept that implements a complex evolutionary algorithm, genetic algorithm (GA), in a multi-microcontroller environment. Parallel Distributed Genetic Algorithm (PDGA) is employed in adaptive beam forming technique to reduce power usage of adaptive antenna at WCDMA base station. Adaptive antenna has dynamic beam that requires more advanced beam forming algorithm such as genetic algorithm which requires heavy computation and memory space. Microcontrollers are low resource platforms that are normally not associated with GAs, which are typically resource intensive. The aim of this project was to design a cooperative multiprocessor system by expanding the role of small scale PIC microcontrollers to optimize WCDMA base station transmitter power. Implementation results have shown that PDGA multi-microcontroller system returned optimal transmitted power compared to conventional GA.

Genetic Algorithm Parameters Optimization for Bi-Criteria Multiprocessor Task Scheduling Using Design of Experiments

Multiprocessor task scheduling is a NP-hard problem and Genetic Algorithm (GA) has been revealed as an excellent technique for finding an optimal solution. In the past, several methods have been considered for the solution of this problem based on GAs. But, all these methods consider single criteria and in the present work, minimization of the bi-criteria multiprocessor task scheduling problem has been considered which includes weighted sum of makespan & total completion time. Efficiency and effectiveness of genetic algorithm can be achieved by optimization of its different parameters such as crossover, mutation, crossover probability, selection function etc. The effects of GA parameters on minimization of bi-criteria fitness function and subsequent setting of parameters have been accomplished by central composite design (CCD) approach of response surface methodology (RSM) of Design of Experiments. The experiments have been performed with different levels of GA parameters and analysis of variance has been performed for significant parameters for minimisation of makespan and total completion time simultaneously.

An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators - developed on FPGAs or ASICs - were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures depending on carry save adders and look up tables. Look up tables should be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both look up tables and pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than other architectures. It also has a better overall absolute time for a single operation.