FPGA-based Systems for Evolvable Hardware

Since 1992, year where Hugo de Garis has published the first paper on Evolvable Hardware (EHW), a period of intense creativity has followed. It has been actively researched, developed and applied to various problems. Different approaches have been proposed that created three main classifications: extrinsic, mixtrinsic and intrinsic EHW. Each of these solutions has a real interest. Nevertheless, although the extrinsic evolution generates some excellent results, the intrinsic systems are not so advanced. This paper suggests 3 possible solutions to implement the run-time configuration intrinsic EHW system: FPGA-based Run-Time Configuration system, JBits-based Run-Time Configuration system and Multi-board functional-level Run-Time Configuration system. The main characteristic of the proposed architectures is that they are implemented on Field Programmable Gate Array. A comparison of proposed solutions demonstrates that multi-board functional-level run-time configuration is superior in terms of scalability, flexibility and the implementation easiness.

Efficient Hardware Realization of Truncated Multipliers using FPGA

Truncated multiplier is a good candidate for digital signal processing (DSP) applications including finite impulse response (FIR) and discrete cosine transform (DCT). Through truncated multiplier a significant reduction in Field Programmable Gate Array (FPGA) resources can be achieved. This paper presents for the first time a comparison of resource utilization of Spartan-3AN and Virtex-5 implementation of standard and truncated multipliers using Very High Speed Integrated Circuit Hardware Description Language (VHDL). The Virtex-5 FPGA shows significant improvement as compared to Spartan-3AN FPGA device. The Virtex-5 FPGA device shows better performance with a percentage ratio of number of occupied slices for standard to truncated multipliers is increased from 40% to 73.86% as compared to Spartan- 3AN is decreased from 68.75% to 58.78%. Results show that the anomaly in Spartan-3AN FPGA device average connection and maximum pin delay have been efficiently reduced in Virtex-5 FPGA device.

Performance Comparison of Real Time EDAC Systems for Applications On-Board Small Satellites

On-board Error Detection and Correction (EDAC) devices aim to secure data transmitted between the central processing unit (CPU) of a satellite onboard computer and its local memory. This paper presents a comparison of the performance of four low complexity EDAC techniques for application in Random Access Memories (RAMs) on-board small satellites. The performance of a newly proposed EDAC architecture is measured and compared with three different EDAC strategies, using the same FPGA technology. A statistical analysis of single-event upset (SEU) and multiple-bit upset (MBU) activity in commercial memories onboard Alsat-1 is given for a period of 8 years

Power Optimization Techniques in FPGA Devices: A Combination of System- and Low-Levels

This paper presents preliminary results regarding system-level power awareness for FPGA implementations in wireless sensor networks. Re-configurability of field programmable gate arrays (FPGA) allows for significant flexibility in its applications to embedded systems. However, high power consumption in FPGA becomes a significant factor in design considerations. We present several ideas and their experimental verifications on how to optimize power consumption at high level of designing process while maintaining the same energy per operation (low-level methods can be used additionally). This paper demonstrates that it is possible to estimate feasible power consumption savings even at the high level of designing process. It is envisaged that our results can be also applied to other embedded systems applications, not limited to FPGA-based.

An Improvement of PDLZW implementation with a Modified WSC Updating Technique on FPGA

In this paper, an improvement of PDLZW implementation with a new dictionary updating technique is proposed. A unique dictionary is partitioned into hierarchical variable word-width dictionaries. This allows us to search through dictionaries in parallel. Moreover, the barrel shifter is adopted for loading a new input string into the shift register in order to achieve a faster speed. However, the original PDLZW uses a simple FIFO update strategy, which is not efficient. Therefore, a new window based updating technique is implemented to better classify the difference in how often each particular address in the window is referred. The freezing policy is applied to the address most often referred, which would not be updated until all the other addresses in the window have the same priority. This guarantees that the more often referred addresses would not be updated until their time comes. This updating policy leads to an improvement on the compression efficiency of the proposed algorithm while still keep the architecture low complexity and easy to implement.

Efficient Large Numbers Karatsuba-Ofman Multiplier Designs for Embedded Systems

Long number multiplications (n ≥ 128-bit) are a primitive in most cryptosystems. They can be performed better by using Karatsuba-Ofman technique. This algorithm is easy to parallelize on workstation network and on distributed memory, and it-s known as the practical method of choice. Multiplying long numbers using Karatsuba-Ofman algorithm is fast but is highly recursive. In this paper, we propose different designs of implementing Karatsuba-Ofman multiplier. A mixture of sequential and combinational system design techniques involving pipelining is applied to our proposed designs. Multiplying large numbers can be adapted flexibly to time, area and power criteria. Computationally and occupation constrained in embedded systems such as: smart cards, mobile phones..., multiplication of finite field elements can be achieved more efficiently. The proposed designs are compared to other existing techniques. Mathematical models (Area (n), Delay (n)) of our proposed designs are also elaborated and evaluated on different FPGAs devices.

The Simulation and Realization of Input-Buffer Scheduling Algorithm in Satellite Switching System

Scheduling algorithm is a key technology in satellite switching system with input-buffer. In this paper, a new scheduling algorithm and its realization are proposed. Based on Crossbar switching fabric, the algorithm adopts serial scheduling strategy and adjusts the output port arbitrating strategy for the better equity of every port. Consequently, it increases the matching probability. The algorithm can greatly reduce the scheduling delay and cell loss rate. The analysis and simulation results by OPNET show that the proposed algorithm has the better performance than others in average delay and cell loss rate, and has the equivalent complexity. On the basis of these results, the hardware realization and simulation based on FPGA are completed, which validate the feasibility of the new scheduling algorithm.

Analysis of CNT Bundle and its Comparison with Copper for FPGAs Interconnects

Each new semiconductor technology node brings smaller transistors and wires. Although this makes transistors faster, wires get slower. In nano-scale regime, the standard copper (Cu) interconnect will become a major hurdle for FPGA interconnect due to their high resistivity and electromigration. This paper presents the comprehensive evaluation of mixed CNT bundle interconnects and investigates their prospects as energy efficient and high speed interconnect for future FPGA routing architecture. All HSPICE simulations are carried out at operating frequency of 1GHz and it is found that mixed CNT bundle implemented in FPGAs as interconnect can potentially provide a substantial delay and energy reduction over traditional interconnects at 32nm process technology.

An Innovative Wireless Sensor Network Protocol Implementation using a Hybrid FPGA Technology

Traditional development of wireless sensor network mote is generally based on SoC1 platform. Such method of development faces three main drawbacks: lack of flexibility in terms of development due to low resource and rigid architecture of SoC; low capability of evolution and portability versus performance if specific micro-controller architecture features are used; and the rapid obsolescence of micro-controller comparing to the long lifetime of power plants or any industrial installations. To overcome these drawbacks, we have explored a new approach of development of wireless sensor network mote using a hybrid FPGA technology. The application of such approach is illustrated through the implementation of an innovative wireless sensor network protocol called OCARI.

A Microcontroller Implementation of Model Predictive Control

Model Predictive Control (MPC) is increasingly being proposed for real time applications and embedded systems. However comparing to PID controller, the implementation of the MPC in miniaturized devices like Field Programmable Gate Arrays (FPGA) and microcontrollers has historically been very small scale due to its complexity in implementation and its computation time requirement. At the same time, such embedded technologies have become an enabler for future manufacturing enterprises as well as a transformer of organizations and markets. Recently, advances in microelectronics and software allow such technique to be implemented in embedded systems. In this work, we take advantage of these recent advances in this area in the deployment of one of the most studied and applied control technique in the industrial engineering. In fact in this paper, we propose an efficient framework for implementation of Generalized Predictive Control (GPC) in the performed STM32 microcontroller. The STM32 keil starter kit based on a JTAG interface and the STM32 board was used to implement the proposed GPC firmware. Besides the GPC, the PID anti windup algorithm was also implemented using Keil development tools designed for ARM processor-based microcontroller devices and working with C/Cµ langage. A performances comparison study was done between both firmwares. This performances study show good execution speed and low computational burden. These results encourage to develop simple predictive algorithms to be programmed in industrial standard hardware. The main features of the proposed framework are illustrated through two examples and compared with the anti windup PID controller.

FPGA Based Parallel Architecture for the Computation of Third-Order Cross Moments

Higher-order Statistics (HOS), also known as cumulants, cross moments and their frequency domain counterparts, known as poly spectra have emerged as a powerful signal processing tool for the synthesis and analysis of signals and systems. Algorithms used for the computation of cross moments are computationally intensive and require high computational speed for real-time applications. For efficiency and high speed, it is often advantageous to realize computation intensive algorithms in hardware. A promising solution that combines high flexibility together with the speed of a traditional hardware is Field Programmable Gate Array (FPGA). In this paper, we present FPGA-based parallel architecture for the computation of third-order cross moments. The proposed design is coded in Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) and functionally verified by implementing it on Xilinx Spartan-3 XC3S2000FG900-4 FPGA. Implementation results are presented and it shows that the proposed design can operate at a maximum frequency of 86.618 MHz.

Design and Implementation of TMS320C31 DSP and FPGA for Conventional Direct Torque Control (DTC) of Induction Machines

This paper introduces a new digital logic design, which combines the DSP and FPGA to implement the conventional DTC of induction machine. The DSP will be used for floating point calculation whereas the FPGA main task is to implement the hysteresis-based controller. The emphasis is on FPGA digital logic design. The simulation and experimental results are presented and summarized.

FPGA Implementation of RSA Cryptosystem

In this paper, the hardware implementation of the RSA public-key cryptographic algorithm is presented. The RSA cryptographic algorithm is depends on the computation of repeated modular exponentials. The Montgomery algorithm is used and modified to reduce hardware resources and to achieve reasonable operating speed for FPGA. An efficient architecture for modular multiplications based on the array multiplier is proposed. We have implemented a RSA cryptosystem based on Montgomery algorithm. As a result, it is shown that proposed architecture contributes to small area and reasonable speed.

FPGA Implement of a Vision Based Lane Departure Warning System

Using vision based solution in intelligent vehicle application often needs large memory to handle video stream and image process which increase complexity of hardware and software. In this paper, we present a FPGA implement of a vision based lane departure warning system. By taking frame of videos, the line gradient of line is estimated and the lane marks are found. By analysis the position of lane mark, departure of vehicle will be detected in time. This idea has been implemented in Xilinx Spartan6 FPGA. The lane departure warning system used 39% logic resources and no memory of the device. The average availability is 92.5%. The frame rate is more than 30 frames per second (fps).

Digital Power Management Hardware Realization Using FPGA

This paper describes design of a digital feedback loop for a low switching frequency dc-dc switching converters. Low switching frequencies were selected in this design. A look up table for the digital PID (proportional integrator differentiator) compensator was implemented using Altera Stratix II with built-in ADC (analog-to-digital converter) to achieve this hardware realization. Design guidelines are given for the PID compensator, high frequency DPWM (digital pulse width modulator) and moving average filter.

Program Memories Error Detection and Correction On-Board Earth Observation Satellites

Memory Errors Detection and Correction aim to secure the transaction of data between the central processing unit of a satellite onboard computer and its local memory. In this paper, the application of a double-bit error detection and correction method is described and implemented in Field Programmable Gate Array (FPGA) technology. The performance of the proposed EDAC method is measured and compared with two different EDAC devices, using the same FPGA technology. Statistical analysis of single-event upset (SEU) and multiple-bit upset (MBU) activity in commercial memories onboard the first Algerian microsatellite Alsat-1 is given.

Fully Parameterizable FPGA based Crypto-Accelerator

In this paper, RSA encryption algorithm and its hardware implementation in Xilinx-s Virtex Field Programmable Gate Arrays (FPGA) is analyzed. The issues of scalability, flexible performance, and silicon efficiency for the hardware acceleration of public key crypto systems are being explored in the present work. Using techniques based on the interleaved math for exponentiation, the proposed RSA calculation architecture is compared to existing FPGA-based solutions for speed, FPGA utilization, and scalability. The paper covers the RSA encryption algorithm, interleaved multiplication, Miller Rabin algorithm for primality test, extended Euclidean math, basic FPGA technology, and the implementation details of the proposed RSA calculation architecture. Performance of several alternative hardware architectures is discussed and compared. Finally, conclusion is drawn, highlighting the advantages of a fully flexible & parameterized design.

Performance Evaluation of a Neural Network based General Purpose Space Vector Modulator

Space Vector Modulation (SVM) is an optimum Pulse Width Modulation (PWM) technique for an inverter used in a variable frequency drive applications. It is computationally rigorous and hence limits the inverter switching frequency. Increase in switching frequency can be achieved using Neural Network (NN) based SVM, implemented on application specific chips. This paper proposes a neural network based SVM technique for a Voltage Source Inverter (VSI). The network proposed is independent of switching frequency. Different architectures are investigated keeping the total number of neurons constant. The performance of the inverter is compared for various switching frequencies for different architectures of NN based SVM. From the results obtained, the network with minimum resource and appropriate word length is identified. The bit precision required for this application is identified. The network with 8-bit precision is implemented in the IC XCV 400 and the results are presented. The performance of NN based general purpose SVM with higher bit precision is discussed.

Compact Binary Tree Representation of Logic Function with Enhanced Throughput

An effective approach for realizing the binary tree structure, representing a combinational logic functionality with enhanced throughput, is discussed in this paper. The optimization in maximum operating frequency was achieved through delay minimization, which in turn was possible by means of reducing the depth of the binary network. The proposed synthesis methodology has been validated by experimentation with FPGA as the target technology. Though our proposal is technology independent, yet the heuristic enables better optimization in throughput even after technology mapping for such Boolean functionality; whose reduced CNF form is associated with a lesser literal cost than its reduced DNF form at the Boolean equation level. For cases otherwise, our method converges to similar results as that of [12]. The practical results obtained for a variety of case studies demonstrate an improvement in the maximum throughput rate for Spartan IIE (XC2S50E-7FT256) and Spartan 3 (XC3S50-4PQ144) FPGA logic families by 10.49% and 13.68% respectively. With respect to the LUTs and IOBUFs required for physical implementation of the requisite non-regenerative logic functionality, the proposed method enabled savings to the tune of 44.35% and 44.67% respectively, over the existing efficient method available in literature [12].

VLSI Design of 2-D Discrete Wavelet Transform for Area-Efficient and High-Speed Image Computing

This paper presents a VLSI design approach of a highspeed and real-time 2-D Discrete Wavelet Transform computing. The proposed architecture, based on new and fast convolution approach, reduces the hardware complexity in addition to reduce the critical path to the multiplier delay. Furthermore, an advanced twodimensional (2-D) discrete wavelet transform (DWT) implementation, with an efficient memory area, is designed to produce one output in every clock cycle. As a result, a very highspeed is attained. The system is verified, using JPEG2000 coefficients filters, on Xilinx Virtex-II Field Programmable Gate Array (FPGA) device without accessing any external memory. The resulting computing rate is up to 270 M samples/s and the (9,7) 2-D wavelet filter uses only 18 kb of memory (16 kb of first-in-first-out memory) with 256×256 image size. In this way, the developed design requests reduced memory and provide very high-speed processing as well as high PSNR quality.