CScheme in Traditional Concurrency Problems

CScheme, a concurrent programming paradigm based on scheme concept enables concurrency schemes to be constructed from smaller synchronization units through a GUI based composer and latter be reused on other concurrency problems of a similar nature. This paradigm is particularly important in the multi-core environment prevalent nowadays. In this paper, we demonstrate techniques to separate concurrency from functional code using the CScheme paradigm. Then we illustrate how the CScheme methodology can be used to solve some of the traditional concurrency problems – critical section problem, and readers-writers problem - using synchronization schemes such as Single Threaded Execution Scheme, and Readers Writers Scheme.

Enhancing Cache Performance Based on Improved Average Access Time

A high performance computer includes a fast processor and millions bytes of memory. During the data processing, huge amount of information are shuffled between the memory and processor. Because of its small size and its effectiveness speed, cache has become a common feature of high performance computers. Enhancing cache performance proved to be essential in the speed up of cache-based computers. Most enhancement approaches can be classified as either software based or hardware controlled. The performance of the cache is quantified in terms of hit ratio or miss ratio. In this paper, we are optimizing the cache performance based on enhancing the cache hit ratio. The optimum cache performance is obtained by focusing on the cache hardware modification in the way to make a quick rejection to the missed line's tags from the hit-or miss comparison stage, and thus a low hit time for the wanted line in the cache is achieved. In the proposed technique which we called Even- Odd Tabulation (EOT), the cache lines come from the main memory into cache are classified in two types; even line's tags and odd line's tags depending on their Least Significant Bit (LSB). This division is exploited by EOT technique to reject the miss match line's tags in very low time compared to the time spent by the main comparator in the cache, giving an optimum hitting time for the wanted cache line. The high performance of EOT technique against the familiar mapping technique FAM is shown in the simulated results.

Fuzzy Logic Speed Controller for Direct Vector Control of Induction Motor

This paper presents a new method for the implementation of a direct rotor flux control (DRFOC) of induction motor (IM) drives. It is based on the rotor flux components regulation. The d and q axis rotor flux components feed proportional integral (PI) controllers. The outputs of which are the target stator voltages (vdsref and vqsref). While, the synchronous speed is depicted at the output of rotor speed controller. In order to accomplish variable speed operation, conventional PI like controller is commonly used. These controllers provide limited good performances over a wide range of operations even under ideal field oriented conditions. An alternate approach is to use the so called fuzzy logic controller. The overall investigated system is implemented using dSpace system based on digital signal processor (DSP). Simulation and experimental results have been presented for a one kw IM drives to confirm the validity of the proposed algorithms.

An Innovational Intermittent Algorithm in Networks-On-Chip (NOC)

Every day human life experiences new equipments more automatic and with more abilities. So the need for faster processors doesn-t seem to finish. Despite new architectures and higher frequencies, a single processor is not adequate for many applications. Parallel processing and networks are previous solutions for this problem. The new solution to put a network of resources on a chip is called NOC (network on a chip). The more usual topology for NOC is mesh topology. There are several routing algorithms suitable for this topology such as XY, fully adaptive, etc. In this paper we have suggested a new algorithm named Intermittent X, Y (IX/Y). We have developed the new algorithm in simulation environment to compare delay and power consumption with elders' algorithms.

A Tutorial on Dynamic Simulation of DC Motor and Implementation of Kalman Filter on a Floating Point DSP

With the advent of inexpensive 32 bit floating point digital signal processor-s availability in market, many computationally intensive algorithms such as Kalman filter becomes feasible to implement in real time. Dynamic simulation of a self excited DC motor using second order state variable model and implementation of Kalman Filter in a floating point DSP TMS320C6713 is presented in this paper with an objective to introduce and implement such an algorithm, for beginners. A fractional hp DC motor is simulated in both Matlab® and DSP and the results are included. A step by step approach for simulation of DC motor in Matlab® and “C" routines in CC Studio® is also given. CC studio® project file details and environmental setting requirements are addressed. This tutorial can be used with 6713 DSK, which is based on floating point DSP and CC Studio either in hardware mode or in simulation mode.

GPU-Based Volume Rendering for Medical Imagery

We present a method for fast volume rendering using graphics hardware (GPU). To our knowledge, it is the first implementation on the GPU. Based on the Shear-Warp algorithm, our GPU-based method provides real-time frame rates and outperforms the CPU-based implementation. When the number of slices is not sufficient, we add in-between slices computed by interpolation. This improves then the quality of the rendered images. We have also implemented the ray marching algorithm on the GPU. The results generated by the three algorithms (CPU-based and GPU-based Shear- Warp, GPU-based Ray Marching) for two test models has proved that the ray marching algorithm outperforms the shear-warp methods in terms of speed up and image quality.

Design and Control of DC-DC Converter for the Military Application Fuel Cell

This paper presents a 24 watts SEPIC converter design and control using microprocessor. SEPIC converter has advantages of a wide input range and miniaturization caused by the low stress at elements. There is also an advantage that the input and output are isolated in MOSFET-off state. This paper presents the PID control through the SEPIC converter transfer function using a DSP and the protective circuit for fuel cell from the over-current and inverse-voltage by using the characteristic of SEPIC converter. Then it derives them through the experiments.

Authenticated Mobile Device Proxy Service

In the current study we present a system that is capable to deliver proxy based differentiated service. It will help the carrier service node to sell a prepaid service to clients and limit the use to a particular mobile device or devices for a certain time. The system includes software and hardware architecture for a mobile device with moderate computational power, and a secure protocol for communication between it and its carrier service node. On the carrier service node a proxy runs on a centralized server to be capable of implementing cryptographic algorithms, while the mobile device contains a simple embedded processor capable of executing simple algorithms. One prerequisite is needed for the system to run efficiently that is a presence of Global Trusted Verification Authority (GTVA) which is equivalent to certifying authority in IP networks. This system appears to be of great interest for many commercial transactions, business to business electronic and mobile commerce, and military applications.

Optimal Placement of Processors based on Effective Communication Load

This paper presents a new technique for the optimum placement of processors to minimize the total effective communication load under multi-processor communication dominated environment. This is achieved by placing heavily loaded processors near each other and lightly loaded ones far away from one another in the physical grid locations. The results are mathematically proved for the Algorithms are described.

Two Wheels Balancing Robot with Line Following Capability

This project focuses on the development of a line follower algorithm for a Two Wheels Balancing Robot. In this project, ATMEGA32 is chosen as the brain board controller to react towards the data received from Balance Processor Chip on the balance board to monitor the changes of the environment through two infra-red distance sensor to solve the inclination angle problem. Hence, the system will immediately restore to the set point (balance position) through the implementation of internal PID algorithms at the balance board. Application of infra-red light sensors with the PID control is vital, in order to develop a smooth line follower robot. As a result of combination between line follower program and internal self balancing algorithms, we are able to develop a dynamically stabilized balancing robot with line follower function.

A PIM (Processor-In-Memory) for Computer Graphics : Data Partitioning and Placement Schemes

The demand for higher performance graphics continues to grow because of the incessant desire towards realism. And, rapid advances in fabrication technology have enabled us to build several processor cores on a single die. Hence, it is important to develop single chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architectures tailored for computer graphics which requires a large number of memory accesses. We then address the two important tasks necessary for maximally exploiting the parallelism provided by the architecture, namely, partitioning and placement of graphic data, which affect respectively load balances and communication costs. Under the constraints of uniform partitioning, we develop approaches for optimal partitioning and placement, which significantly reduce search space. We also present heuristics for identifying near-optimal placement, since the search space for placement is impractically large despite our optimization. We then demonstrate the effectiveness of our partitioning and placement approaches via analysis of example scenes; simulation results show considerable search space reductions, and our heuristics for placement performs close to optimal – the average ratio of communication overheads between our heuristics and the optimal was 1.05. Our uniform partitioning showed average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which is reasonable.

A 10 Giga VPN Accelerator Board for Trust Channel Security System

This paper proposes a VPN Accelerator Board (VPN-AB), a virtual private network (VPN) protocol designed for trust channel security system (TCSS). TCSS supports safety communication channel between security nodes in internet. It furnishes authentication, confidentiality, integrity, and access control to security node to transmit data packets with IPsec protocol. TCSS consists of internet key exchange block, security association block, and IPsec engine block. The internet key exchange block negotiates crypto algorithm and key used in IPsec engine block. Security Association blocks setting-up and manages security association information. IPsec engine block treats IPsec packets and consists of networking functions for communication. The IPsec engine block should be embodied by H/W and in-line mode transaction for high speed IPsec processing. Our VPN-AB is implemented with high speed security processor that supports many cryptographic algorithms and in-line mode. We evaluate a small TCSS communication environment, and measure a performance of VPN-AB in the environment. The experiment results show that VPN-AB gets a performance throughput of maximum 15.645Gbps when we set the IPsec protocol with 3DES-HMAC-MD5 tunnel mode.

A Distributed Group Mutual Exclusion Algorithm for Soft Real Time Systems

The group mutual exclusion (GME) problem is an interesting generalization of the mutual exclusion problem. Several solutions of the GME problem have been proposed for message passing distributed systems. However, none of these solutions is suitable for real time distributed systems. In this paper, we propose a token-based distributed algorithms for the GME problem in soft real time distributed systems. The algorithm uses the concepts of priority queue, dynamic request set and the process state. The algorithm uses first come first serve approach in selecting the next session type between the same priority levels and satisfies the concurrent occupancy property. The algorithm allows all n processors to be inside their CS provided they request for the same session. The performance analysis and correctness proof of the algorithm has also been included in the paper.

Design of Auto Exposure Unit Based On 2-Way Histogram Equalization

Histogram equalization is often used in image enhancement, but it can be also used in auto exposure. However, conventional histogram equalization does not work well when many pixels are concentrated in a narrow luminance range.This paper proposes an auto exposure method based on 2-way histogram equalization. Two cumulative distribution functions are used, where one is from dark to bright and the other is from bright to dark. In this paper, the proposed auto exposure method is also designed and implemented for image signal processors with full-HD images.

64 bit Computer Architectures for Space Applications – A study

The more recent satellite projects/programs makes extensive usage of real – time embedded systems. 16 bit processors which meet the Mil-Std-1750 standard architecture have been used in on-board systems. Most of the Space Applications have been written in ADA. From a futuristic point of view, 32 bit/ 64 bit processors are needed in the area of spacecraft computing and therefore an effort is desirable in the study and survey of 64 bit architectures for space applications. This will also result in significant technology development in terms of VLSI and software tools for ADA (as the legacy code is in ADA). There are several basic requirements for a special processor for this purpose. They include Radiation Hardened (RadHard) devices, very low power dissipation, compatibility with existing operational systems, scalable architectures for higher computational needs, reliability, higher memory and I/O bandwidth, predictability, realtime operating system and manufacturability of such processors. Further on, these may include selection of FPGA devices, selection of EDA tool chains, design flow, partitioning of the design, pin count, performance evaluation, timing analysis etc. This project deals with a brief study of 32 and 64 bit processors readily available in the market and designing/ fabricating a 64 bit RISC processor named RISC MicroProcessor with added functionalities of an extended double precision floating point unit and a 32 bit signal processing unit acting as co-processors. In this paper, we emphasize the ease and importance of using Open Core (OpenSparc T1 Verilog RTL) and Open “Source" EDA tools such as Icarus to develop FPGA based prototypes quickly. Commercial tools such as Xilinx ISE for Synthesis are also used when appropriate.

Iterative Joint Power Control and Partial Crosstalk Cancellation in Upstream VDSL

Crosstalk is the major limiting issue in very high bit-rate digital subscriber line (VDSL) systems in terms of bit-rate or service coverage. At the central office side, joint signal processing accompanied by appropriate power allocation enables complex multiuser processors to provide near capacity rates. Unfortunately complexity grows with the square of the number of lines within a binder, so by taking into account that there are only a few dominant crosstalkers who contribute to main part of crosstalk power, the canceller structure can be simplified which resulted in a much lower run-time complexity. In this paper, a multiuser power control scheme, namely iterative waterfilling, is combined with previously proposed partial crosstalk cancellation approaches to demonstrate the best ever achieved performance which is verified by simulation results.

Performance Analysis of Load Balancing Algorithms

Load balancing is the process of improving the performance of a parallel and distributed system through a redistribution of load among the processors [1] [5]. In this paper we present the performance analysis of various load balancing algorithms based on different parameters, considering two typical load balancing approaches static and dynamic. The analysis indicates that static and dynamic both types of algorithm can have advancements as well as weaknesses over each other. Deciding type of algorithm to be implemented will be based on type of parallel applications to solve. The main purpose of this paper is to help in design of new algorithms in future by studying the behavior of various existing algorithms.

Implementation of Parallel Interface for Microprocessor Trainer

In this paper, parallel interface for microprocessor trainer was implemented. A programmable parallel–port device such as the IC 8255A is initialized for simple input or output and for handshake input or output by choosing kinds of modes. The hardware connections and the programs can be used to interface microprocessor trainer and a personal computer by using IC 8255A. The assembly programs edited on PC-s editor can be downloaded to the trainer.

Unified Fusion Approach with Application to SLAM

In this paper, we propose the pre-processor based on the Evidence Supporting Measure of Similarity (ESMS) filter and also propose the unified fusion approach (UFA) based on the general fusion machine coupled with ESMS filter, which improve the correctness and precision of information fusion in any fields of application. Here we mainly apply the new approach to Simultaneous Localization And Mapping (SLAM) of Pioneer II mobile robots. A simulation experiment was performed, where an autonomous virtual mobile robot with sonar sensors evolves in a virtual world map with obstacles. By comparing the result of building map according to the general fusion machine (here DSmT-based fusing machine and PCR5-based conflict redistributor considereded) coupling with ESMS filter and without ESMS filter, it shows the benefit of the selection of the sources as a prerequisite for improvement of the information fusion, and also testifies the superiority of the UFA in dealing with SLAM.

An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees

Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.