Abstract: CScheme, a concurrent programming paradigm based on the
scheme concept, enables concurrency schemes to be constructed
from smaller synchronization units through a GUI-based composer
and later reused on other concurrency problems of a similar
nature. This paradigm is particularly important in the multi-core
environment prevalent nowadays. In this paper, we demonstrate
techniques to separate concurrency from functional code using the
CScheme paradigm. Then we illustrate how the CScheme
methodology can be used to solve some of the traditional
concurrency problems (the critical section problem and the
readers-writers problem) using synchronization schemes such as the
Single Threaded Execution Scheme and the Readers Writers Scheme.
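The idea of keeping synchronization apart from functional code can be sketched as follows. This is a minimal illustration only, not the CScheme API: the class `ReadersWritersScheme`, its `reading`/`writing` methods, and the `Counter` class are hypothetical names, and the scheme shown is the classic readers-preference variant (writers may starve under heavy read load).

```python
import threading
from contextlib import contextmanager


class ReadersWritersScheme:
    """Hypothetical reusable scheme: many concurrent readers or one writer."""

    def __init__(self):
        self._readers = 0
        self._readers_lock = threading.Lock()  # protects the reader count
        self._resource = threading.Lock()      # held by a writer or by the reader group

    @contextmanager
    def reading(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:
                self._resource.acquire()       # first reader locks out writers
        try:
            yield
        finally:
            with self._readers_lock:
                self._readers -= 1
                if self._readers == 0:
                    self._resource.release()   # last reader lets writers in

    @contextmanager
    def writing(self):
        with self._resource:                   # exclusive access
            yield


class Counter:
    """Functional code: contains no synchronization at all."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def get(self):
        return self.value


def run_demo(n_threads=4, n_increments=200):
    scheme = ReadersWritersScheme()
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            with scheme.writing():
                counter.increment()
            with scheme.reading():
                counter.get()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because the scheme object is independent of `Counter`, the same scheme could be reused around any other shared resource with the same access pattern.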
Abstract: A high performance computer includes a fast
processor and millions of bytes of memory. During data processing,
huge amounts of information are shuffled between the memory and the
processor. Because of its small size and its effective speed, cache
has become a common feature of high performance computers.
Enhancing cache performance has proved essential to the speedup
of cache-based computers. Most enhancement approaches can be
classified as either software based or hardware controlled. The
performance of the cache is quantified in terms of hit ratio or miss
ratio. In this paper, we optimize cache performance by
improving the cache hit ratio. The optimum cache performance is
obtained by modifying the cache hardware so that the tags of
missed lines are rejected quickly at the hit-or-miss comparison
stage, which gives a low hit time for the wanted line in
the cache. In the proposed technique, which we call
Even-Odd Tabulation (EOT), the cache lines coming from main
memory into the cache are classified into two types, even line tags
and odd line tags, depending on their Least Significant Bit (LSB).
The EOT technique exploits this division to reject mismatched line
tags in much less time than the main comparator in the cache would
spend, giving an optimum hit time for the wanted cache line. The
simulation results show the high performance of the EOT technique
compared with the familiar mapping technique FAM.
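The even/odd split described above can be modelled in software as follows. This is only a behavioural sketch under stated assumptions, not the hardware design from the paper: the class `EvenOddTagStore` and its methods are hypothetical, and the tag store is treated as a simple list searched sequentially so that the number of comparisons can be counted.

```python
class EvenOddTagStore:
    """Hypothetical software model: tags are kept in two tables by LSB."""

    def __init__(self):
        # tables[0] holds even tags (LSB = 0), tables[1] holds odd tags (LSB = 1)
        self.tables = {0: [], 1: []}

    def insert(self, tag):
        table = self.tables[tag & 1]
        if tag not in table:
            table.append(tag)

    def lookup(self, tag):
        """Return (hit, comparisons), searching only the table with matching LSB."""
        comparisons = 0
        for stored in self.tables[tag & 1]:
            comparisons += 1
            if stored == tag:
                return True, comparisons
        return False, comparisons


store = EvenOddTagStore()
for t in range(8):
    store.insert(t)
```

With eight stored tags, a lookup in this model inspects at most the four tags that share the requested tag's LSB; the other four are rejected without any comparison.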
Abstract: This paper presents a new method for the
implementation of direct rotor flux control (DRFOC) of induction
motor (IM) drives. It is based on the regulation of the rotor flux
components. The d- and q-axis rotor flux components feed
proportional-integral (PI) controllers, whose outputs are the target
stator voltages (vdsref and vqsref), while the synchronous speed is
obtained at the output of the rotor speed controller. To accomplish
variable speed operation, conventional PI-like controllers are
commonly used. These controllers provide only limited performance
over a wide range of operating conditions, even under ideal
field-oriented conditions. An alternative approach is to use a fuzzy
logic controller. The overall system under investigation is
implemented using a dSpace system based on a digital signal processor
(DSP). Simulation and experimental results are presented for a 1 kW
IM drive to confirm the validity of the proposed algorithms.
Abstract: Every day brings new equipment that is more
automated and more capable, so the need for faster
processors does not seem to end. Despite new architectures and
higher frequencies, a single processor is not adequate for many
applications. Parallel processing and networks are earlier solutions
to this problem. The newer solution, putting a network of resources on
a chip, is called NOC (network on a chip). The most common topology for
NOC is the mesh topology. There are several routing algorithms suitable
for this topology, such as XY and fully adaptive routing. In this paper
we propose a new algorithm named Intermittent X, Y (IX/Y). We
have implemented the new algorithm in a simulation environment to
compare its delay and power consumption with those of older algorithms.
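For reference, the baseline XY routing mentioned above can be sketched as below. This shows only the well-known dimension-order scheme on a 2D mesh (travel along X until the destination column is reached, then along Y); it is not the proposed IX/Y algorithm, whose details are not given in the abstract. The function name `xy_route` and the `(x, y)` coordinate convention are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst under XY routing."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: move along the X dimension until the column matches.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension until the row matches.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

The number of hops is always the Manhattan distance between source and destination, which makes XY routing a convenient baseline for delay comparisons.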
Abstract: With the availability of inexpensive 32-bit floating-point digital signal processors on the market, many computationally intensive algorithms, such as the Kalman filter, have become feasible to implement in real time. This paper presents the dynamic simulation of a self-excited DC motor using a second-order state-variable model and the implementation of a Kalman filter on a TMS320C6713 floating-point DSP, with the objective of introducing such an algorithm to beginners and showing how to implement it. A fractional-hp DC motor is simulated both in Matlab® and on the DSP, and the results are included. A step-by-step approach to simulating the DC motor in Matlab® and in C routines in CC Studio® is also given. CC Studio® project file details and environment setting requirements are addressed. This tutorial can be used with the 6713 DSK, which is based on a floating-point DSP, and with CC Studio in either hardware mode or simulation mode.
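The predict/update cycle of a Kalman filter can be sketched in a few lines. The example below is deliberately reduced to a single state variable for brevity, whereas the tutorial uses a second-order motor model; the function names and the parameter values (`a`, `h`, `q`, `r`) are illustrative assumptions and do not come from the paper.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=1e-4, r=0.1):
    """One scalar Kalman filter step.

    x, p : previous state estimate and its variance
    z    : new measurement
    a, h : state transition and measurement coefficients (assumed values)
    q, r : process and measurement noise variances (assumed values)
    """
    # Predict: propagate the estimate and its uncertainty.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def run_filter(measurements, x0=0.0, p0=1.0):
    """Run the filter over a sequence of measurements; return the final (x, p)."""
    x, p = x0, p0
    for z in measurements:
        x, p = kalman_step(x, p, z)
    return x, p


estimate, variance = run_filter([5.0] * 50)
```

Fed a constant measurement, the estimate moves from the initial guess toward the measured value while the variance shrinks, which is a quick sanity check before porting the same arithmetic to C on a DSP.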
Abstract: We present a method for fast volume rendering using
graphics hardware (GPU). To our knowledge, it is the first implementation
on the GPU. Based on the Shear-Warp algorithm, our
GPU-based method provides real-time frame rates and outperforms
the CPU-based implementation. When the number of slices is not
sufficient, we add in-between slices computed by interpolation, which
improves the quality of the rendered images. We have also
implemented the ray marching algorithm on the GPU. The results
generated by the three algorithms (CPU-based and GPU-based
Shear-Warp, GPU-based ray marching) for two test models show that
the ray marching algorithm outperforms the shear-warp methods in
terms of speedup and image quality.
Abstract: This paper presents the design of a 24 W SEPIC converter
and its control using a microprocessor. The SEPIC converter has the
advantages of a wide input range and miniaturization owing to the low
stress on its elements. A further advantage is that the input and
output are isolated in the MOSFET-off state. This paper presents PID
control based on the SEPIC converter transfer function using a DSP,
and a circuit that protects the fuel cell from over-current and
inverse voltage by using the characteristics of the SEPIC converter.
Both are then demonstrated through experiments.
Abstract: In the current study we present a system that is
capable of delivering proxy-based differentiated service. It will help
the carrier service node to sell a prepaid service to clients and limit
its use to a particular mobile device or devices for a certain time. The
system includes a software and hardware architecture for a mobile
device with moderate computational power, and a secure protocol for
communication between it and its carrier service node. On the
carrier service node, a proxy runs on a centralized server that is
capable of implementing cryptographic algorithms, while the mobile
device contains a simple embedded processor capable of executing
simple algorithms. One prerequisite for the system to run
efficiently is the presence of a Global Trusted Verification Authority
(GTVA), which is equivalent to a certifying authority in IP networks.
This system appears to be of great interest for many commercial
transactions, business to business electronic and mobile commerce,
and military applications.
Abstract: This paper presents a new technique for the optimum
placement of processors to minimize the total effective
communication load in a multi-processor, communication-dominated
environment. This is achieved by placing heavily loaded
processors near each other and lightly loaded ones far away from
one another in the physical grid locations. The results are
mathematically proved and the algorithms are described.
Abstract: This project focuses on the development of a line
follower algorithm for a two-wheeled balancing robot. In this
project, an ATMEGA32 is chosen as the brain board controller. It
reacts to the data received from the Balance Processor Chip on the
balance board, which monitors changes in the environment through
two infra-red distance sensors to solve the inclination angle problem.
The system therefore restores itself immediately to the set point
(balance position) through the internal PID algorithms implemented on
the balance board. Combining infra-red light sensors with PID
control is vital for developing a smooth line follower robot. By
combining the line follower program with the internal
self-balancing algorithms, we are able to develop a dynamically
stabilized balancing robot with a line follower function.
Abstract: The demand for higher performance graphics
continues to grow because of the incessant desire for realism.
Moreover, rapid advances in fabrication technology have enabled us to
build several processor cores on a single die. Hence, it is important to
develop single-chip parallel architectures for such data-intensive
applications. In this paper, we propose an efficient PIM architecture
tailored for computer graphics, which requires a large number of
memory accesses. We then address the two important tasks necessary
for maximally exploiting the parallelism provided by the architecture,
namely, partitioning and placement of graphic data, which affect
respectively load balances and communication costs. Under the
constraints of uniform partitioning, we develop approaches for optimal
partitioning and placement, which significantly reduce search space.
We also present heuristics for identifying near-optimal placement,
since the search space for placement is impractically large despite our
optimization. We then demonstrate the effectiveness of our partitioning
and placement approaches via analysis of example scenes; simulation
results show considerable search space reductions, and our heuristics
for placement perform close to optimal: the average ratio of
communication overheads between our heuristics and the optimal was
1.05. Our uniform partitioning showed an average load-balance ratio of
1.47 for geometry processing and 1.44 for rasterization, which is
reasonable.
Abstract: This paper proposes a VPN Accelerator Board
(VPN-AB), a virtual private network (VPN) protocol designed for a
trust channel security system (TCSS). TCSS supports a secure
communication channel between security nodes on the Internet. It
provides authentication, confidentiality, integrity, and access control
so that a security node can transmit data packets with the IPsec
protocol. TCSS consists of an internet key exchange block, a security
association block, and an IPsec engine block. The internet key exchange
block negotiates the crypto algorithm and key used in the IPsec engine
block. The security association block sets up and manages security
association information. The IPsec engine block processes IPsec packets
and contains networking functions for communication. The IPsec engine
block should be implemented in hardware and operate in in-line mode for
high-speed IPsec processing. Our VPN-AB is implemented with a high-speed
security processor that supports many cryptographic algorithms and
in-line mode. We evaluate a small TCSS communication environment
and measure the performance of VPN-AB in that environment. The
experimental results show that VPN-AB achieves a maximum throughput
of 15.645 Gbps when the IPsec protocol is configured with
3DES-HMAC-MD5 in tunnel mode.
Abstract: The group mutual exclusion (GME) problem is an
interesting generalization of the mutual exclusion problem. Several
solutions of the GME problem have been proposed for message
passing distributed systems. However, none of these solutions is
suitable for real time distributed systems. In this paper, we propose a
token-based distributed algorithm for the GME problem in soft real
time distributed systems. The algorithm uses the concepts of a priority
queue, a dynamic request set, and the process state. It uses a
first-come, first-served approach to select the next session type
among requests of the same priority level and satisfies the concurrent
occupancy property. The algorithm allows all n processors to be
inside their CS provided they request the same session. The
performance analysis and correctness proof of the algorithm are also
included in the paper.
Abstract: Histogram equalization is often used in image enhancement, but it can also be used in auto exposure. However, conventional histogram equalization does not work well when many pixels are concentrated in a narrow luminance range. This paper proposes an auto exposure method based on 2-way histogram equalization. Two cumulative distribution functions are used, where one runs from dark to bright and the other from bright to dark. The proposed auto exposure method is also designed and implemented for image signal processors with full-HD images.
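The two cumulative distribution functions mentioned above can be computed as in the following sketch. It covers only the construction of the dark-to-bright and bright-to-dark CDFs from a luminance histogram; how the method combines them into an exposure decision is not described in the abstract and is not attempted here. The function name `two_way_cdfs` and its parameters are hypothetical.

```python
def two_way_cdfs(pixels, levels=256):
    """Return (forward, backward) CDFs of integer luminance values.

    forward[v]  = fraction of pixels with luminance <= v (dark to bright)
    backward[v] = fraction of pixels with luminance >= v (bright to dark)
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)

    forward = []
    running = 0
    for count in hist:
        running += count
        forward.append(running / total)

    backward = [0.0] * levels
    running = 0
    for level in range(levels - 1, -1, -1):
        running += hist[level]
        backward[level] = running / total

    return forward, backward


forward, backward = two_way_cdfs([0, 0, 1, 3], levels=4)
```

For the four-pixel example, `forward` is `[0.5, 0.75, 0.75, 1.0]` and `backward` is `[1.0, 0.5, 0.25, 0.25]`.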
Abstract: Recent satellite projects and programs make
extensive use of real-time embedded systems. 16-bit processors
that meet the Mil-Std-1750 standard architecture have been used in
on-board systems. Most space applications have been written
in ADA. Looking ahead, 32-bit and 64-bit processors are
needed in the area of spacecraft computing, and therefore a
study and survey of 64-bit architectures for space
applications is desirable. This will also result in significant
technology development in terms of VLSI and software tools for ADA
(as the legacy code is in ADA).
There are several basic requirements for a special processor for
this purpose. They include Radiation Hardened (RadHard) devices,
very low power dissipation, compatibility with existing operational
systems, scalable architectures for higher computational needs,
reliability, higher memory and I/O bandwidth, predictability, a
real-time operating system, and manufacturability of such processors.
Further considerations may include selection of FPGA devices, selection
of EDA tool chains, design flow, partitioning of the design, pin
count, performance evaluation, timing analysis, etc.
This project deals with a brief study of the 32-bit and 64-bit
processors readily available on the market and with designing and
fabricating a 64-bit RISC processor named RISC MicroProcessor, with
the added functionality of an extended double precision floating
point unit and a 32-bit signal processing unit acting as
co-processors. In this paper, we emphasize the ease and importance of
using Open Core (OpenSparc T1 Verilog RTL) and Open Source EDA tools
such as Icarus to develop FPGA-based prototypes quickly. Commercial
tools such as Xilinx ISE for synthesis are also used when appropriate.
Abstract: Crosstalk is the major limiting factor in very high bit-rate digital subscriber line (VDSL) systems in terms of bit rate and service coverage. At the central office side, joint signal processing accompanied by appropriate power allocation enables complex multiuser processors to provide near-capacity rates. Unfortunately, complexity grows with the square of the number of lines within a binder. Because only a few dominant crosstalkers contribute the main part of the crosstalk power, the canceller structure can be simplified, resulting in much lower run-time complexity. In this paper, a multiuser power control scheme, namely iterative waterfilling, is combined with previously proposed partial crosstalk cancellation approaches to demonstrate the best performance achieved to date, which is verified by simulation results.
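The building block of iterative waterfilling is the single-user waterfilling step, which each user repeats in turn while treating the other users' crosstalk as noise. The sketch below shows that single step only, under simplifying assumptions: `noise[k]` stands for the effective noise-to-gain ratio on tone k, and the function name and interface are hypothetical rather than taken from the paper.

```python
def waterfill(noise, total_power):
    """Allocate total_power over tones so that power[k] = max(level - noise[k], 0).

    noise       : effective noise-to-gain ratio per tone (assumed given)
    total_power : power budget, must be positive
    """
    order = sorted(range(len(noise)), key=lambda k: noise[k])
    active = len(noise)
    level = 0.0
    # Drop the worst tones until the water level sits above every active tone.
    while active > 0:
        level = (total_power + sum(noise[k] for k in order[:active])) / active
        if level > noise[order[active - 1]]:
            break
        active -= 1
    return [max(level - n, 0.0) for n in noise]


allocation = waterfill([1.0, 2.0, 10.0], 3.0)
```

In the example, the tone with noise 10.0 lies above the water level of 3.0 and receives no power, while the two better tones receive 2.0 and 1.0 respectively.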
Abstract: Load balancing is the process of improving the
performance of a parallel and distributed system through a
redistribution of load among the processors [1] [5]. In this paper we
present a performance analysis of various load balancing
algorithms based on different parameters, considering two typical
load balancing approaches, static and dynamic. The analysis indicates
that both static and dynamic algorithms can have
advantages as well as weaknesses relative to each other. The choice of
algorithm type depends on the type of parallel
application to be solved. The main purpose of this paper is to help in
the design of new algorithms in future by studying the behavior of
various existing algorithms.
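The difference between the two approaches can be illustrated with two toy policies. These are illustrative assumptions, not algorithms analysed in the paper: `static_round_robin` fixes the assignment from the task index alone, while `dynamic_least_loaded` uses the load observed at assignment time.

```python
def static_round_robin(task_costs, n_procs):
    """Static policy: task i always goes to processor i mod n_procs."""
    loads = [0] * n_procs
    for i, cost in enumerate(task_costs):
        loads[i % n_procs] += cost
    return loads


def dynamic_least_loaded(task_costs, n_procs):
    """Dynamic policy: each task goes to the currently least-loaded processor."""
    loads = [0] * n_procs
    for cost in task_costs:
        target = loads.index(min(loads))  # ties go to the lowest index
        loads[target] += cost
    return loads


tasks = [5, 1, 5, 1, 5, 1]
static_loads = static_round_robin(tasks, 2)
dynamic_loads = dynamic_least_loaded(tasks, 2)
```

On this workload the static policy yields loads of 15 and 3, while the dynamic policy yields 11 and 7. The dynamic policy balances better here but needs load information at run time, which is the kind of trade-off the analysis weighs.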
Abstract: In this paper, a parallel interface for a microprocessor
trainer is implemented. A programmable parallel-port device such
as the IC 8255A is initialized for simple input or output and for
handshake input or output by choosing the appropriate mode. The
hardware connections and the programs can be used to interface a
microprocessor trainer and a personal computer by using the IC 8255A.
Assembly programs edited in the PC's editor can be downloaded to
the trainer.
Abstract: In this paper, we propose a pre-processor based on
the Evidence Supporting Measure of Similarity (ESMS) filter, and also
propose a unified fusion approach (UFA) based on a general
fusion machine coupled with the ESMS filter, which improves the
correctness and precision of information fusion in any field of
application. Here we mainly apply the new approach to Simultaneous
Localization And Mapping (SLAM) of Pioneer II mobile robots. A
simulation experiment was performed in which an autonomous virtual
mobile robot with sonar sensors moves through a virtual world map with
obstacles. Comparing the maps built by the
general fusion machine (here a DSmT-based fusion machine with a
PCR5-based conflict redistributor) with and without the ESMS
filter shows the benefit of selecting
the sources as a prerequisite for improving information
fusion, and also demonstrates the superiority of the UFA in dealing with
SLAM.
Abstract: Distributed computing systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper, an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system shows that it reduces the time complexity of the read query to O(Log(Log(N))) and that of the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using the VHDL and Verilog hardware description languages, with Xilinx ISE 10.1 as the development environment and ModelSim 6.1c as the simulation tool. The simulation shows that the overhead resulting from the wiring and communication between the system fragments can reasonably be neglected, which makes it practical to approach the maximum speedup offered by the proposed model.
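For comparison, the sequential baseline with O(Log(N)) read and update queries can be sketched as a standard Binary Indexed Tree (Fenwick tree). This is the textbook software structure, not the distributed hardware design from the paper; the class and method names are chosen for illustration.

```python
class BinaryIndexedTree:
    """Standard Binary Indexed Tree over positions 1..n (1-based)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        """Add delta to element i; touches O(log n) nodes."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # jump to the next node responsible for position i

    def prefix_sum(self, i):
        """Return the sum of elements 1..i; visits O(log n) nodes."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total


bit = BinaryIndexedTree(8)
for position in range(1, 9):
    bit.update(position, position)  # element at each position equals its index
```

Both loops advance by the lowest set bit of the index, so each query visits at most about log2(N) nodes, which is the bound the proposed system aims to improve on.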