A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to the high accuracy and good performance, Convolutional Neural Networks (CNNs) especially have become a research hot spot in the past few years. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses a significant challenge to construct a high-performance implementation of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and low-power consumption of hardware devices. Therefore, it is particularly critical to choose a moderate computing platform for hardware acceleration of CNNs. This article aimed to survey the recent advance in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of the accelerator based on FPGA under different devices and network models are overviewed, and the versions of Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we give a discussion on different perspectives of these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research. More helpfully, we give a prospect for future development of the FPGA-based accelerator.

Implementation of Edge Detection Based on Autofluorescence Endoscopic Image of Field Programmable Gate Array

Autofluorescence Imaging (AFI) is a technology for detecting early carcinogenesis of the gastrointestinal tract in recent years. Compared with traditional white light endoscopy (WLE), this technology greatly improves the detection accuracy of early carcinogenesis, because the colors of normal tissues are different from cancerous tissues. Thus, edge detection can distinguish them in grayscale images. In this paper, based on the traditional Sobel edge detection method, optimization has been performed on this method which considers the environment of the gastrointestinal, including adaptive threshold and morphological processing. All of the processes are implemented on our self-designed system based on the image sensor OV6930 and Field Programmable Gate Array (FPGA), The system can capture the gastrointestinal image taken by the lens in real time and detect edges. The final experiments verified the feasibility of our system and the effectiveness and accuracy of the edge detection algorithm.

FPGA Implementation of Adaptive Clock Recovery for TDMoIP Systems

Circuit switched networks widely used until the end of the 20th century have been transformed into packages switched networks. Time Division Multiplexing over Internet Protocol (TDMoIP) is a system that enables Time Division Multiplexing (TDM) traffic to be carried over packet switched networks (PSN). In TDMoIP systems, devices that send TDM data to the PSN and receive it from the network must operate with the same clock frequency. In this study, it was aimed to implement clock synchronization process in Field Programmable Gate Array (FPGA) chips using time information attached to the packages received from PSN. The designed hardware is verified using the datasets obtained for the different carrier types and comparing the results with the software model. Field tests are also performed by using the real time TDMoIP system.

Field-Programmable Gate Array Based Tester for Protective Relay

The reliability of the power grid depends on the successful operation of thousands of protective relays. The failure of one relay to operate as intended may lead the entire power grid to blackout. In fact, major power system failures during transient disturbances may be caused by unnecessary protective relay tripping rather than by the failure of a relay to operate. Adequate relay testing provides a first defense against false trips of the relay and hence improves power grid stability and prevents catastrophic bulk power system failures. The goal of this research project is to design and enhance the relay tester using a technology such as Field Programmable Gate Array (FPGA) card NI 7851. A PC based tester framework has been developed using Simulink power system model for generating signals under different conditions (faults or transient disturbances) and LabVIEW for developing the graphical user interface and configuring the FPGA. Besides, the interface system has been developed for outputting and amplifying the signals without distortion. These signals should be like the generated ones by the real power system and large enough for testing the relay’s functionality. The signals generated that have been displayed on the scope are satisfactory. Furthermore, the proposed testing system can be used for improving the performance of protective relay.

Run-Time Customisation of Soft-Core CPUs on Field Programmable Gate Array

The use of customised soft-core processors in which instructions can be integrated into a system in application hardware is increasing in the Field Programmable Gate Array (FPGA) field. Specifically, the partial run-time reconfiguration of FPGAs in specialised processors for a particular domain can be very beneficial. In this report, the design and implementation for the customisation of a soft-core MIPS processor using an FPGA and partial reconfiguration (PR) of FPGA technology will be addressed to achieve efficient resource use. This can be achieved using a PR design flow that helps the design fit into a smaller device. Moreover, the impact of static power consumption could be reduced due to runtime reconfiguration. This will be done by configurable custom instructions implemented in the hardware as an extension on the MIPS CPU. The aim of this project is to investigate the PR of FPGAs for run-time adaptations of the instruction set of a soft-core CPU, including the integration of custom instructions and the exploration of the potential to use the MultiBoot feature available in Xilinx FPGAs to carry out the PR process. The system will be evaluated and tested on a Nexus 3 development board featuring a Xilinx Spartran-6 FPGA. The system will be able to load reconfigurable custom instructions dynamically into user programs with the help of the trap handler when the custom instruction is called by the MIPS CPU. The results of this experiment demonstrate that custom instructions in hardware can speed up a certain function and many instructions can be saved when compared to a software implementation of the same function. Implementing custom instructions in hardware is perfectly possible and worth exploring.

A High Time Resolution Digital Pulse Width Modulator Based on Field Programmable Gate Array’s Phase Locked Loop Megafunction

The digital pulse width modulator (DPWM) is the crucial building block for digitally-controlled DC-DC switching converter, which converts the digital duty ratio signal into its analog counterpart to control the power MOSFET transistors on or off. With the increase of switching frequency of digitally-controlled DC-DC converter, the DPWM with higher time resolution is required. In this paper, a 15-bits DPWM with three-level hybrid structure is presented; the first level is composed of a7-bits counter and a comparator, the second one is a 5-bits delay line, and the third one is a 3-bits digital dither. The presented DPWM is designed and implemented using the PLL megafunction of FPGA (Field Programmable Gate Arrays), and the required frequency of clock signal is 128 times of switching frequency. The simulation results show that, for the switching frequency of 2 MHz, a DPWM which has the time resolution of 15 ps is achieved using a maximum clock frequency of 256MHz. The designed DPWM in this paper is especially useful for high-frequency digitally-controlled DC-DC switching converters.

Experimental Investigation of Indirect Field Oriented Control of Field Programmable Gate Array Based Five-Phase Induction Motor Drive

This paper analyzes the experimental investigation of indirect field oriented control of Field Programmable Gate Array (FPGA) based five-phase induction motor drive. A detailed d-q modeling and Space Vector Pulse Width Modulation (SVPWM) technique of 5-phase drive is elaborated in this paper. In the proposed work, the prototype model of 1 hp 5-phase Voltage Source Inverter (VSI) fed drive is implemented in hardware. SVPWM pulses are generated in FPGA platform through Very High Speed Integrated Circuit Hardware Description Language (VHDL) coding. The experimental results are observed under different loading conditions and compared with simulation results to validate the simulation model.

Field Programmable Gate Array Based Infinite Impulse Response Filter Using Multipliers

In this paper, an Infinite Impulse Response (IIR) filter has been designed and simulated on an Field Programmable Gate Arrays (FPGA). The implementation is based on Multiply Add and Accumulate (MAC) algorithm which uses multiply operations for design implementation. Parallel Pipelined structure is used to implement the proposed IIR Filter taking optimal advantage of the look up table of target device. The designed filter has been synthesized on Digital Signal Processor (DSP) slice based FPGA to perform multiplier function of MAC unit. The DSP slices are useful to enhance the speed performance. The proposed design is simulated with Matlab, synthesized with Xilinx Synthesis Tool, and implemented on FPGA devices. The Virtex 5 FPGA based design can operate at an estimated frequency of 81.5 MHz as compared to 40.5 MHz in case of Spartan 3 ADSP based design. The Virtex 5 based implementation also consumes less slices and slice flip flops of target FPGA in comparison to Spartan 3 ADSP based implementation to provide cost effective solution for signal processing applications.

Matrix Converter Fed Brushless DC Motor Using Field Programmable Gate Array

Brushless DC motors (BLDC) are widely used in industrial areas. The BLDC motors are driven either by indirect ACAC converters or by direct AC-AC converters. Direct AC-AC converters i.e. matrix converters are used in this paper to drive the three phase BLDC motor and it eliminates the bulky DC link energy storage element. A matrix converter converts the AC power supply to an AC voltage of variable amplitude and variable frequency. A control technique is designed to generate the switching pulses for the three phase matrix converter. For the control of speed of the BLDC motor a separate PI controller and Fuzzy Logic Controller (FLC) are designed and a hysteresis current controller is also designed for the control of motor torque. The control schemes are designed and tested separately. The simulation results of both the schemes are compared and contrasted in this paper. The results show that the fuzzy logic control scheme outperforms the PI control scheme in terms of dynamic performance of the BLDC motor. Simulation results are validated with the experimental results.

Supremacy of Differential Evolution Algorithm in Designing Multiplier-Less Low-Pass FIR Filter

In this communication, we have made an attempt to design multiplier-less low-pass finite impulse response (FIR) filter with the aid of various mutation strategies of Differential Evolution (DE) algorithm. Impulse response coefficient of the designed FIR filter has been represented as sums or differences of powers of two. Performance of the proposed filter has been evaluated in terms of its frequency response and associated hardware cost. Supremacy of our approach has been substantiated by comparing our result with many of the existing multiplier-less filter design algorithms of recent interest. It has also been demonstrated that DE-optimized filter outperforms Genetic Algorithm (GA) based design by a large margin.  Hardware efficiency of our algorithm has further been validated by implementing those filters on a Field Programmable Gate Array (FPGA) chip.

Comparison between Haar and Daubechies Wavelet Transformations on FPGA Technology

Recently, the Field Programmable Gate Array (FPGA) technology offers the potential of designing high performance systems at low cost. The discrete wavelet transform has gained the reputation of being a very effective signal analysis tool for many practical applications. However, due to its computation-intensive nature, current implementation of the transform falls short of meeting real-time processing requirements of most application. The objectives of this paper are implement the Haar and Daubechies wavelets using FPGA technology. In addition, the Bit Error Rate (BER) between the input audio signal and the reconstructed output signal for each wavelet is calculated. From the BER, it is seen that the implementations execute the operation of the wavelet transform correctly and satisfying the perfect reconstruction conditions. The design procedure has been explained and designed using the stat-ofart Electronic Design Automation (EDA) tools for system design on FPGA. Simulation, synthesis and implementation on the FPGA target technology has been carried out.

Low Power Approach for Decimation Filter Hardware Realization

There are multiple ways to implement a decimator filter. This paper addresses usage of CIC (cascaded-integrator-comb) filter and HB (half band) filter as the decimator filter to reduce the frequency sample rate by factor of 64 and detail of the implementation step to realize this design in hardware. Low power design approach for CIC filter and half band filter will be discussed. The filter design is implemented through MATLAB system modeling, ASIC (application specific integrated circuit) design flow and verified using a FPGA (field programmable gate array) board and MATLAB analysis.

A Novel Digital Implementation of AC Voltage Controller for Speed Control of Induction Motor

In this paper a novel, simple and reliable digital firing scheme has been implemented for speed control of three-phase induction motor using ac voltage controller. The system consists of three-phase supply connected to the three-phase induction motor via three triacs and its control circuit. The ac voltage controller has three modes of operation depending on the shape of supply current. The performance of the induction motor differs in each mode where the speed is directly proportional with firing angle in two modes and inversely in the third one. So, the control system has to detect the current mode of operation to choose the correct firing angle of triacs. Three sensors are used to feed the line currents to control system to detect the mode of operation. The control strategy is implemented using a low cost Xilinx Spartan-3E field programmable gate array (FPGA) device. Three PI-controllers are designed on FPGA to control the system in the three-modes. Simulation of the system is carried out using PSIM computer program. The simulation results show stable operation for different loading conditions especially in mode 2/3. The simulation results have been compared with the experimental results from laboratory prototype.

FPGA Implementation of the “PYRAMIDS“ Block Cipher

The “PYRAMIDS" Block Cipher is a symmetric encryption algorithm of a 64, 128, 256-bit length, that accepts a variable key length of 128, 192, 256 bits. The algorithm is an iterated cipher consisting of repeated applications of a simple round transformation with different operations and different sequence in each round. The algorithm was previously software implemented in Cµ code. In this paper, a hardware implementation of the algorithm, using Field Programmable Gate Arrays (FPGA), is presented. In this work, we discuss the algorithm, the implemented micro-architecture, and the simulation and implementation results. Moreover, we present a detailed comparison with other implemented standard algorithms. In addition, we include the floor plan as well as the circuit diagrams of the various micro-architecture modules.

Spacecraft Neural Network Control System Design using FPGA

Designing and implementing intelligent systems has become a crucial factor for the innovation and development of better products of space technologies. A neural network is a parallel system, capable of resolving paradigms that linear computing cannot. Field programmable gate array (FPGA) is a digital device that owns reprogrammable properties and robust flexibility. For the neural network based instrument prototype in real time application, conventional specific VLSI neural chip design suffers the limitation in time and cost. With low precision artificial neural network design, FPGAs have higher speed and smaller size for real time application than the VLSI and DSP chips. So, many researchers have made great efforts on the realization of neural network (NN) using FPGA technique. In this paper, an introduction of ANN and FPGA technique are briefly shown. Also, Hardware Description Language (VHDL) code has been proposed to implement ANNs as well as to present simulation results with floating point arithmetic. Synthesis results for ANN controller are developed using Precision RTL. Proposed VHDL implementation creates a flexible, fast method and high degree of parallelism for implementing ANN. The implementation of multi-layer NN using lookup table LUT reduces the resource utilization for implementation and time for execution.

FPGA Based Longitudinal and Lateral Controller Implementation for a Small UAV

This paper presents implementation of attitude controller for a small UAV using field programmable gate array (FPGA). Due to the small size constrain a miniature more compact and computationally extensive; autopilot platform is needed for such systems. More over UAV autopilot has to deal with extremely adverse situations in the shortest possible time, while accomplishing its mission. FPGAs in the recent past have rendered themselves as fast, parallel, real time, processing devices in a compact size. This work utilizes this fact and implements different attitude controllers for a small UAV in FPGA, using its parallel processing capabilities. Attitude controller is designed in MATLAB/Simulink environment. The discrete version of this controller is implemented using pipelining followed by retiming, to reduce the critical path and thereby clock period of the controller datapath. Pipelined, retimed, parallel PID controller implementation is done using rapidprototyping and testing efficient development tool of “system generator", which has been developed by Xilinx for FPGA implementation. The improved timing performance enables the controller to react abruptly to any changes made to the attitudes of UAV.

Position Control of an AC Servo Motor Using VHDL and FPGA

In this paper, a new method of controlling position of AC Servomotor using Field Programmable Gate Array (FPGA). FPGA controller is used to generate direction and the number of pulses required to rotate for a given angle. Pulses are sent as a square wave, the number of pulses determines the angle of rotation and frequency of square wave determines the speed of rotation. The proposed control scheme has been realized using XILINX FPGA SPARTAN XC3S400 and tested using MUMA012PIS model Alternating Current (AC) servomotor. Experimental results show that the position of the AC Servo motor can be controlled effectively. KeywordsAlternating Current (AC), Field Programmable Gate Array (FPGA), Liquid Crystal Display (LCD).

FPGA-based Systems for Evolvable Hardware

Since 1992, year where Hugo de Garis has published the first paper on Evolvable Hardware (EHW), a period of intense creativity has followed. It has been actively researched, developed and applied to various problems. Different approaches have been proposed that created three main classifications: extrinsic, mixtrinsic and intrinsic EHW. Each of these solutions has a real interest. Nevertheless, although the extrinsic evolution generates some excellent results, the intrinsic systems are not so advanced. This paper suggests 3 possible solutions to implement the run-time configuration intrinsic EHW system: FPGA-based Run-Time Configuration system, JBits-based Run-Time Configuration system and Multi-board functional-level Run-Time Configuration system. The main characteristic of the proposed architectures is that they are implemented on Field Programmable Gate Array. A comparison of proposed solutions demonstrates that multi-board functional-level run-time configuration is superior in terms of scalability, flexibility and the implementation easiness.

Power Optimization Techniques in FPGA Devices: A Combination of System- and Low-Levels

This paper presents preliminary results regarding system-level power awareness for FPGA implementations in wireless sensor networks. Re-configurability of field programmable gate arrays (FPGA) allows for significant flexibility in its applications to embedded systems. However, high power consumption in FPGA becomes a significant factor in design considerations. We present several ideas and their experimental verifications on how to optimize power consumption at high level of designing process while maintaining the same energy per operation (low-level methods can be used additionally). This paper demonstrates that it is possible to estimate feasible power consumption savings even at the high level of designing process. It is envisaged that our results can be also applied to other embedded systems applications, not limited to FPGA-based.

A Microcontroller Implementation of Model Predictive Control

Model Predictive Control (MPC) is increasingly being proposed for real time applications and embedded systems. However comparing to PID controller, the implementation of the MPC in miniaturized devices like Field Programmable Gate Arrays (FPGA) and microcontrollers has historically been very small scale due to its complexity in implementation and its computation time requirement. At the same time, such embedded technologies have become an enabler for future manufacturing enterprises as well as a transformer of organizations and markets. Recently, advances in microelectronics and software allow such technique to be implemented in embedded systems. In this work, we take advantage of these recent advances in this area in the deployment of one of the most studied and applied control technique in the industrial engineering. In fact in this paper, we propose an efficient framework for implementation of Generalized Predictive Control (GPC) in the performed STM32 microcontroller. The STM32 keil starter kit based on a JTAG interface and the STM32 board was used to implement the proposed GPC firmware. Besides the GPC, the PID anti windup algorithm was also implemented using Keil development tools designed for ARM processor-based microcontroller devices and working with C/Cµ langage. A performances comparison study was done between both firmwares. This performances study show good execution speed and low computational burden. These results encourage to develop simple predictive algorithms to be programmed in industrial standard hardware. The main features of the proposed framework are illustrated through two examples and compared with the anti windup PID controller.