A Low-Area Fully-Reconfigurable Hardware Design of Fast Fourier Transform System for 3GPP-LTE Standard

This paper presents a low-area and fully-reconfigurable Fast Fourier Transform (FFT) hardware design for 3GPP-LTE communication standard. It can fully support 32 different FFT sizes, up to 2048 FFT points. Besides, a special processing element is developed for making reconfigurable computing characteristics possible, while first-in first-out (FIFO) scheduling scheme design technique is proposed for hardware-friendly FIFO resource arranging. In a synthesis chip realization via TSMC 40 nm CMOS technology, the hardware circuit only occupies core area of 0.2325 mm2 and dissipates 233.5 mW at maximal operating frequency of 250 MHz.

Neural Network Implementation Using FPGA: Issues and Application

.Hardware realization of a Neural Network (NN), to a large extent depends on the efficient implementation of a single neuron. FPGA-based reconfigurable computing architectures are suitable for hardware implementation of neural networks. FPGA realization of ANNs with a large number of neurons is still a challenging task. This paper discusses the issues involved in implementation of a multi-input neuron with linear/nonlinear excitation functions using FPGA. Implementation method with resource/speed tradeoff is proposed to handle signed decimal numbers. The VHDL coding developed is tested using Xilinx XC V50hq240 Chip. To improve the speed of operation a lookup table method is used. The problems involved in using a lookup table (LUT) for a nonlinear function is discussed. The percentage saving in resource and the improvement in speed with an LUT for a neuron is reported. An attempt is also made to derive a generalized formula for a multi-input neuron that facilitates to estimate approximately the total resource requirement and speed achievable for a given multilayer neural network. This facilitates the designer to choose the FPGA capacity for a given application. Using the proposed method of implementation a neural network based application, namely, a Space vector modulator for a vector-controlled drive is presented

RFU Based Computational Unit Design For Reconfigurable Processors

Fully customized hardware based technology provides high performance and low power consumption by specializing the tasks in hardware but lacks design flexibility since any kind of changes require re-design and re-fabrication. Software based solutions operate with software instructions due to which a great flexibility is achieved from the easy development and maintenance of the software code. But this execution of instructions introduces a high overhead in performance and area consumption. In past few decades the reconfigurable computing domain has been introduced which overcomes the traditional trades-off between flexibility and performance and is able to achieve high performance while maintaining a good flexibility. The dramatic gains in terms of chip performance and design flexibility achieved through the reconfigurable computing systems are greatly dependent on the design of their computational units being integrated with reconfigurable logic resources. The computational unit of any reconfigurable system plays vital role in defining its strength. In this research paper an RFU based computational unit design has been presented using the tightly coupled, multi-threaded reconfigurable cores. The proposed design has been simulated for VLIW based architectures and a high gain in performance has been observed as compared to the conventional computing systems.

Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Speedups from mapping four real-life DSP applications on an embedded system-on-chip that couples coarsegrained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized by a 2-Dimensional Array of Processing Elements. A design flow for improving application-s performance is proposed. Critical software parts, called kernels, are accelerated on the Coarse-Grained Reconfigurable Array. The kernels are detected by profiling the source code. For mapping the detected kernels on the reconfigurable logic a prioritybased mapping algorithm has been developed. Two 4x4 array architectures, which differ in their interconnection structure among the Processing Elements, are considered. The experiments for eight different instances of a generic system show that important overall application speedups have been reported for the four applications. The performance improvements range from 1.86 to 3.67, with an average value of 2.53, compared with an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl-s law.