Abstract: An innovative approach to developing modified scaling-free CORDIC-based two-parallel pipelined Multipath Delay Commutator (MDC) FFT and IFFT architectures for the radix-2^2 FFT algorithm is presented. Multipliers and adders are the most important datapath elements in FFT and IFFT architectures, and multipliers occupy a large area and consume considerable power. To reduce the area and power overhead, a modified scaling-free CORDIC-based complex multiplier is used in the proposed design. Twiddle factor values are generally stored in a RAM block; in the proposed work, a modified scaling-free CORDIC-based twiddle factor generator unit produces the twiddle factors, and efficient switching units are used. In addition, four-point FFT operations are performed without complex multiplication, which helps to reduce area and power in the last two stages of the pipelined architectures. The design is based on the multipath delay commutator method and can be extended to any radix-2^n FFT/IFFT algorithm to improve throughput. The work is synthesized with Synopsys Design Compiler using a TSMC 90-nm library. The proposed method outperforms the reference design in terms of area, throughput and power consumption. A comparative analysis of the proposed design on a Xilinx FPGA platform is also discussed in the paper.
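To illustrate how a CORDIC rotator can replace a complex multiplier when applying a twiddle factor, the following is a minimal sketch of the conventional (scaled) CORDIC rotation. It is not the modified scaling-free variant proposed in the paper; the function names, the iteration count and the final gain division are assumptions made for illustration.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Conventional CORDIC: valid here for |angle| <= pi/2. In hardware each
    multiplication by 2**-i is a wire shift; floats are used for clarity.
    """
    if abs(angle) > math.pi / 2:
        raise ValueError("angle must lie in [-pi/2, pi/2]")
    # Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Both updates use the old x and y (tuple assignment).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Conventional CORDIC needs this scale correction; scaling-free
    # variants are designed to avoid it.
    return x / gain, y / gain

def twiddle_multiply(value, k, n):
    """Multiply complex `value` by W_n^k = exp(-2j*pi*k/n), for 0 <= k <= n/4."""
    angle = -2.0 * math.pi * k / n
    xr, yr = cordic_rotate(value.real, value.imag, angle)
    return complex(xr, yr)
```

For example, `twiddle_multiply(3 + 4j, 1, 16)` approximates `(3 + 4j) * exp(-2j*pi/16)` to within the precision set by `iterations`.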
Abstract: A simulation-based VLSI implementation of the FELICS
(Fast Efficient Lossless Image Compression System) algorithm is
proposed to provide lossless image compression. The aim is to
analyze the performance of lossless image compression and to
reduce the image size without losing image quality, using a VLSI
implementation of the FELICS algorithm. The FELICS algorithm uses
a simplified adjusted binary code for image compression; the
compressed image is converted into pixels and then implemented in
the VLSI domain. The design targets high processing speed with
minimal area and power. The simplified adjusted binary code
reduces the number of arithmetic operations and achieves high
processing speed. Color difference preprocessing is also proposed
to improve coding efficiency with simple arithmetic operations.
The VLSI-based FELICS algorithm provides an effective hardware
architecture with a regular, parallel pipelined data flow in four
stages. With two-level parallelism, consecutive pixels are
classified into even and odd samples, and an individual hardware
engine is dedicated to each. The method can be further enhanced by
multilevel parallelism.
Abstract: An adder is one of the most integral components of a digital system such as a digital signal processor or a microprocessor. Because it is an extremely computationally intensive part of a system, optimizing the adder for speed and power consumption is of prime importance. In this paper we design a 1-bit full adder cell based on dynamic TSPC logic to achieve high-speed operation. A high-threshold-voltage sleep transistor is used to reduce static power dissipation in standby mode. The circuit is designed and simulated in TSPICE using the TSMC 180 nm CMOS process. Average power consumption, delay and power-delay product are measured and show considerable improvement in performance over existing full adder designs.
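For reference, the logic function that any 1-bit full adder cell must realize can be written as a short behavioural model. This sketch says nothing about the dynamic TSPC circuit or the sleep transistor described in the paper; the function name is an assumption.

```python
def full_adder(a, b, carry_in):
    """Behavioural model of a 1-bit full adder; inputs and outputs are 0 or 1."""
    propagate = a ^ b                          # 1 when exactly one input is set
    sum_bit = propagate ^ carry_in             # sum = a XOR b XOR carry_in
    carry_out = (a & b) | (propagate & carry_in)
    return sum_bit, carry_out
```

Chaining `carry_out` of one cell into `carry_in` of the next gives a ripple-carry adder, which is why the delay of this single cell matters for wider datapaths.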
Abstract: In the literature, a surfing technique has been proposed for single-ended wave-pipelined serial interconnects to increase the data transfer rate. In this paper, a novel surfing technique is proposed for differential wave-pipelined serial interconnects, which uses a 'controllable inverter pair' for surfing. To evaluate the efficiency of this technique, a transceiver comprising a transmitter, a receiver and a delay-locked loop (DLL), along with 40 mm metal-4 interconnects using the proposed surfing technique, is implemented in UMC 180 nm technology, and its performance is studied through post-layout simulations. The study shows that the proposed scheme permits a data transmission rate 1.875 times higher than that of the single-ended scheme, whose maximum data transfer rate is 1.33 GB/s. The proposed scheme can receive the correct data even with stuck-at faults in the complementary line.
Abstract: The design and implementation of a novel B-ACOSD CFAR algorithm is presented in this paper. It is proposed for detecting radar targets in a log-normal distribution environment. The B-ACOSD detector can automatically determine the number of interfering targets in the reference cells and detect the real target using an adaptive threshold. The detector is implemented as a System on Chip on an Altera Stratix II FPGA using parallelism and pipelining techniques. For a reference window of 16 cells, experimental results show that the processor works properly at a processing speed of up to 115.13 MHz with a processing time of 0.29 µs, thus meeting the real-time requirements of a typical radar system.
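The idea of an adaptive threshold derived from ordered reference cells can be sketched with a plain ordered-statistic CFAR test. This is a simpler, swapped-in technique, not the B-ACOSD algorithm itself (which additionally estimates how many cells to censor); the function name and the parameters `k` and `scale` are hypothetical.

```python
def os_cfar_detect(cell_under_test, reference_cells, k, scale):
    """Ordered-statistic CFAR decision for one cell.

    The threshold is `scale` times the k-th smallest reference sample
    (1-based), so a few strong interfering targets among the reference
    cells do not raise the threshold.
    """
    if not 1 <= k <= len(reference_cells):
        raise ValueError("k must be between 1 and the number of reference cells")
    ordered = sorted(reference_cells)
    threshold = scale * ordered[k - 1]
    return cell_under_test > threshold, threshold
```

With 16 reference cells of which two contain strong interferers, choosing `k = 12` keeps the threshold tied to the background level rather than to the interferers.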
Abstract: The SAD (Sum of Absolute Differences) algorithm is
heavily used in motion estimation, which is a computationally
demanding process in motion picture encoding. To enhance the
performance of motion picture encoding on a VLIW processor, an
efficient implementation of the SAD algorithm on the VLIW processor is
essential. The SAD algorithm is programmed as a nested loop with a
conditional branch. In VLIW processors, loops are usually optimized by
software pipelining, but research on optimal scheduling of software
pipelining for nested loops, especially nested loops with conditional
branches, is rare. In this paper, we propose an optimal scheduling and
implementation of SAD algorithm with conditional branch on a VLIW
DSP processor. The proposed optimal scheduling first transforms the
nested loop with conditional branch into a single loop with conditional
branch with consideration of full utilization of ILP capability of the
VLIW processor and realization of earlier escape from the loop. Next,
the proposed optimal scheduling applies a modulo scheduling
technique developed for single loops. Based on this optimal scheduling
strategy, an optimal implementation of the SAD algorithm on the
TMS320C67x, a VLIW DSP, is presented. Experiments on the TMS320C6713
DSK show that an H.263 encoder with the proposed SAD
implementation performs better than H.263 encoders with other
SAD implementations, and that the code size of the optimal SAD
implementation is small enough to be appropriate for embedded
environments.
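The loop transformation described above can be sketched at the source level: a nested SAD loop with an early-exit test is flattened into a single loop that carries the same conditional branch. This shows only the flattening idea, not the modulo scheduling or the actual TMS320C67x schedule from the paper; the function names and the choice to test for early exit at the end of each row are assumptions.

```python
def sad_nested(cur, ref, best_so_far):
    """SAD of two equally sized blocks; nested loops, early exit after each row."""
    total = 0
    for row in range(len(cur)):
        for col in range(len(cur[0])):
            total += abs(cur[row][col] - ref[row][col])
        if total >= best_so_far:   # this candidate cannot beat the current best
            return total
    return total

def sad_single_loop(cur, ref, best_so_far):
    """Same computation with the two loops collapsed into one."""
    rows, cols = len(cur), len(cur[0])
    total = 0
    for idx in range(rows * cols):
        row, col = divmod(idx, cols)
        total += abs(cur[row][col] - ref[row][col])
        # The early-exit test is kept at row boundaries, so both
        # versions return identical values.
        if col == cols - 1 and total >= best_so_far:
            return total
    return total
```

A single loop of this form is the shape that single-loop software pipelining techniques expect, which is the motivation the abstract gives for the transformation.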
Abstract: This paper presents the implementation of an attitude controller for a small UAV using a field programmable gate array (FPGA). Because of the small-size constraint, such systems need a miniature, compact and computationally capable autopilot platform. Moreover, a UAV autopilot has to deal with extremely adverse situations in the shortest possible time while accomplishing its mission. In the recent past, FPGAs have established themselves as fast, parallel, real-time processing devices in a compact size. This work exploits this fact and implements different attitude controllers for a small UAV in an FPGA, using its parallel processing capabilities. The attitude controller is designed in the MATLAB/Simulink environment. The discrete version of this controller is implemented using pipelining followed by retiming to reduce the critical path and thereby the clock period of the controller datapath. The pipelined, retimed, parallel PID controller is implemented using System Generator, a rapid-prototyping and testing tool developed by Xilinx for FPGA implementation. The improved timing performance enables the controller to react quickly to any changes in the attitude of the UAV.
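As a point of reference for the discrete controller mentioned above, a textbook discrete-time PID update can be sketched as follows. The gains, the sample time and the rectangular integration and backward-difference derivative are assumptions; the paper's Simulink model and its pipelined, retimed datapath are not reproduced here.

```python
class DiscretePID:
    """Textbook discrete PID: u[k] = Kp*e[k] + Ki*sum(e)*Ts + Kd*(e[k]-e[k-1])/Ts."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = 0.0     # running sum of error * Ts
        self.prev_error = 0.0   # e[k-1], used by the derivative term

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.ts
        derivative = (error - self.prev_error) / self.ts
        self.prev_error = error
        # The three terms depend only on the error and stored state, so in
        # hardware they can be evaluated in parallel and summed at the end.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For example, `DiscretePID(2.0, 0.0, 0.0, 0.01).update(1.0, 0.5)` gives a purely proportional output of `1.0`.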
Abstract: In this paper, we study FPGA implementation of a
novel supra-optimal receiver diversity combining technique,
generalized maximal ratio combining (GMRC), for wireless
transmission over fading channels in SIMO systems. Prior
published results using ML-detected GMRC diversity signal
driven by BPSK showed superior bit error rate performance to
the widely used MRC combining scheme in an imperfect
channel estimation (ICE) environment. Under perfect channel
estimation conditions, the performance of GMRC and MRC
was identical. The main drawback of the GMRC study was
that it was theoretical; a successful FPGA implementation
using pipelining techniques is therefore needed as a wireless
communication test-bed for practical real-life situations.
Simulation results showed that the hardware implementation
was efficient in terms of both speed and area. Since diversity
combining is especially effective in small femto- and picocells,
internet-associated wireless peripheral systems stand to
benefit most from GMRC. As a result, many spinoff
applications are possible for the hardware of IP-based 4th
generation networks.
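For context, the conventional MRC baseline that GMRC is compared against can be sketched for a SIMO link with BPSK. This is standard maximal ratio combining, not the GMRC scheme, whose details are not given in the abstract; the function names and the normalization by total channel energy are assumptions.

```python
def mrc_combine(received, channel_estimates):
    """Maximal ratio combining of one symbol received on several antennas.

    Each branch is weighted by the conjugate of its channel estimate and
    the result is normalized by the total estimated channel energy.
    """
    numerator = sum(h.conjugate() * r for h, r in zip(channel_estimates, received))
    energy = sum(abs(h) ** 2 for h in channel_estimates)
    return numerator / energy

def detect_bpsk(statistic):
    """Hard decision for BPSK symbols in {+1, -1}."""
    return 1 if statistic.real >= 0 else -1
```

With perfect channel estimates and no noise, the combined statistic equals the transmitted symbol; with imperfect estimates the weights are mismatched, which is the setting in which the abstract reports a benefit for GMRC.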
Abstract: Long-number multiplications (n ≥ 128 bits) are a
primitive in most cryptosystems. They can be performed more
efficiently using the Karatsuba-Ofman technique. This algorithm is
easy to parallelize on workstation networks and on distributed
memory, and it is known as the practical method of choice.
Multiplying long numbers using the Karatsuba-Ofman algorithm is fast
but highly recursive. In this paper, we propose different designs for
implementing a Karatsuba-Ofman multiplier. A mixture of sequential
and combinational system design techniques involving pipelining is
applied to our proposed designs. Multiplication of large numbers can
thus be adapted flexibly to time, area and power criteria. In
computation- and area-constrained embedded systems such as smart
cards and mobile phones, multiplication of finite field elements can
be achieved more efficiently. The proposed designs are compared to
other existing techniques. Mathematical models (Area(n), Delay(n))
of our proposed designs are also elaborated and evaluated on
different FPGA devices.
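The recursion at the heart of the Karatsuba-Ofman technique can be sketched in software as follows. This shows only the algorithm (three half-size multiplications instead of four), not any of the proposed hardware designs; the function name and the 32-bit base-case threshold are assumptions.

```python
def karatsuba(x, y, threshold_bits=32):
    """Multiply non-negative integers with the Karatsuba-Ofman recursion."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    n = max(x.bit_length(), y.bit_length())
    if n <= threshold_bits:
        return x * y                      # base case: ordinary multiplication
    half = n // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask      # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & mask      # y = y_hi * 2^half + y_lo
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    # One extra multiplication yields the middle term:
    # (x_hi + x_lo)(y_hi + y_lo) - z2 - z0 = x_hi*y_lo + x_lo*y_hi
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The three recursive calls are independent of one another, which is what makes the algorithm straightforward to parallelize or pipeline.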
Abstract: In this paper, we propose a fully utilized, block-based 2D DWT (discrete wavelet transform) architecture, which consists of four 1D DWT filters with a two-channel QMF lattice structure. The proposed architecture requires about 2MN - 3N registers to save the intermediate results for higher-level decomposition, where M and N stand for the filter length and the row width of the image, respectively. Furthermore, the proposed 2D DWT processes the horizontal and vertical directions simultaneously without an idle period, so that it computes the DWT of an N×N image in a period of N^2(1 - 2^(-2J))/3. Compared to existing approaches, the proposed architecture achieves 100% hardware utilization and high throughput rates. To mitigate the long critical path delay due to the cascaded lattices, a four-stage pipeline can be applied while retaining 100% hardware utilization. The proposed architecture can be applied in real-time video signal processing.
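The stated period can be checked with a short calculation. This is a sketch under assumptions not stated in the abstract: J is taken to be the number of decomposition levels, the first level is taken to occupy N^2/4 cycles with all four filters busy, and each further level is taken to operate on one quarter of the previous level's data.

```latex
\begin{align*}
T(J) &= \sum_{j=1}^{J} \frac{N^2}{4} \cdot \frac{1}{4^{\,j-1}}
      = \frac{N^2}{4} \cdot \frac{1 - 4^{-J}}{1 - \tfrac{1}{4}}
      = \frac{N^2 \left(1 - 2^{-2J}\right)}{3}, \\
\lim_{J \to \infty} T(J) &= \frac{N^2}{3}.
\end{align*}
```

Under these assumptions the period never exceeds N^2/3 cycles, however many levels are computed.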
Abstract: In the MPEG and H.26x standards, motion estimation is
used to eliminate temporal redundancy. Because the motion
estimation stage is very demanding in terms of computational
effort, a hardware implementation on a reconfigurable circuit is
crucial for meeting the requirements of different real-time
multimedia applications. In this paper, we present a hardware
architecture for motion estimation based on the "Full Search Block
Matching" (FSBM) algorithm. The architecture offers minimum latency,
maximum throughput and full utilization of hardware resources such
as embedded memory blocks, and combines pipelining and parallel
processing techniques. Our design is described in VHDL,
verified by simulation and implemented in a Stratix II
EP2S130F1020C4 FPGA. Experimental results show that the
optimum operating clock frequency of the proposed design is 89 MHz,
which achieves 160M pixels/sec.
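The Full Search Block Matching algorithm named above can be sketched in software as an exhaustive search over all displacements in a window, using the sum of absolute differences as the matching cost. This is a reference model of the algorithm only, not the pipelined VHDL architecture; the function names, the SAD cost and the parameter names are assumptions.

```python
def block_sad(cur, ref, cur_top, cur_left, ref_top, ref_left, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cur_top + r][cur_left + c] - ref[ref_top + r][ref_left + c])
        for r in range(size)
        for c in range(size)
    )

def full_search(cur, ref, top, left, size, search_range):
    """Return (cost, dy, dx) of the best match for the block at (top, left).

    Every displacement within +/- search_range is evaluated; candidates
    that fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = block_sad(cur, ref, top, left, y, x, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Each candidate's cost is independent of the others, so the candidate evaluations can be spread over parallel processing elements, which is the kind of parallelism the abstract refers to.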
Abstract: A new code synchronization algorithm is proposed in
this paper for the secondary cell-search stage in wideband CDMA
systems. Rather than using the Cyclically Permutable (CP) code in the
Secondary Synchronization Channel (S-SCH) to simultaneously
determine the frame boundary and scrambling code group, the new
synchronization algorithm implements the same function with less
system complexity and less Mean Acquisition Time (MAT). The
Secondary Synchronization Code (SSC) is redesigned by splitting into
two sub-sequences. We treat the scrambling code group information
as data bits and use simple time-diversity BCH coding for further
reliability. This avoids complex and time-consuming Reed-Solomon (RS)
code computations and comparisons. Analysis and simulation results
show that the Synchronization Error Rate (SER) yielded by the new
algorithm in Rayleigh fading channels is close to that of the
conventional algorithm in the standard. The new synchronization
algorithm reduces system complexity, shortens the average
cell-search time and can be implemented in the slot-based cell-search
pipeline. By exploiting antenna diversity and pipelining the
correlation processes, the new algorithm can also be applied flexibly
in multiple-antenna systems.