A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to the high accuracy and good performance, Convolutional Neural Networks (CNNs) especially have become a research hot spot in the past few years. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses a significant challenge to construct a high-performance implementation of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and low-power consumption of hardware devices. Therefore, it is particularly critical to choose a moderate computing platform for hardware acceleration of CNNs. This article aimed to survey the recent advance in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of the accelerator based on FPGA under different devices and network models are overviewed, and the versions of Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we give a discussion on different perspectives of these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research. More helpfully, we give a prospect for future development of the FPGA-based accelerator.

Digital Encoder Based Power Frequency Deviation Measurement

In this paper, a simple method is presented for measurement of power frequency deviations. A phase locked loop (PLL) is used to multiply the signal under test by a factor of 100. The number of pulses in this pulse train signal is counted over a stable known period, using decade driving assemblies (DDAs) and flip-flops. These signals are combined using logic gates and then passed through decade counters to give a unique combination of pulses or levels, which are further encoded. These pulses are equally suitable for both control applications and display units. The experimental circuit developed gives a resolution of 1 Hz within the measurement period of 20 ms. The proposed circuit is also simulated in Verilog Hardware Description Language (VHDL) and implemented using Field Programing Gate Arrays (FPGAs). A Mixed signal Oscilloscope (MSO) is used to observe the results of FPGA implementation. These results are compared with the results of the proposed circuit of discrete components. The proposed system is useful for frequency deviation measurement and control in power systems.

Implementation of Edge Detection Based on Autofluorescence Endoscopic Image of Field Programmable Gate Array

Autofluorescence Imaging (AFI) is a technology for detecting early carcinogenesis of the gastrointestinal tract in recent years. Compared with traditional white light endoscopy (WLE), this technology greatly improves the detection accuracy of early carcinogenesis, because the colors of normal tissues are different from cancerous tissues. Thus, edge detection can distinguish them in grayscale images. In this paper, based on the traditional Sobel edge detection method, optimization has been performed on this method which considers the environment of the gastrointestinal, including adaptive threshold and morphological processing. All of the processes are implemented on our self-designed system based on the image sensor OV6930 and Field Programmable Gate Array (FPGA), The system can capture the gastrointestinal image taken by the lens in real time and detect edges. The final experiments verified the feasibility of our system and the effectiveness and accuracy of the edge detection algorithm.

FPGA Implementation of the BB84 Protocol

The development of a quantum key distribution (QKD) system on a field-programmable gate array (FPGA) platform is the subject of this paper. A quantum cryptographic protocol is designed based on the properties of quantum information and the characteristics of FPGAs. The proposed protocol performs key extraction, reconciliation, error correction, and privacy amplification tasks to generate a perfectly secret final key. We modeled the presence of the spy in our system with a strategy to reveal some of the exchanged information without being noticed. Using an FPGA card with a 100 MHz clock frequency, we have demonstrated the evolution of the error rate as well as the amounts of mutual information (between the two interlocutors and that of the spy) passing from one step to another in the key generation process.

FPGA Implementation of Adaptive Clock Recovery for TDMoIP Systems

Circuit switched networks widely used until the end of the 20th century have been transformed into packages switched networks. Time Division Multiplexing over Internet Protocol (TDMoIP) is a system that enables Time Division Multiplexing (TDM) traffic to be carried over packet switched networks (PSN). In TDMoIP systems, devices that send TDM data to the PSN and receive it from the network must operate with the same clock frequency. In this study, it was aimed to implement clock synchronization process in Field Programmable Gate Array (FPGA) chips using time information attached to the packages received from PSN. The designed hardware is verified using the datasets obtained for the different carrier types and comparing the results with the software model. Field tests are also performed by using the real time TDMoIP system.

The DAQ Debugger for iFDAQ of the COMPASS Experiment

In general, state-of-the-art Data Acquisition Systems (DAQ) in high energy physics experiments must satisfy high requirements in terms of reliability, efficiency and data rate capability. This paper presents the development and deployment of a debugging tool named DAQ Debugger for the intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN. Utilizing a hardware event builder, the iFDAQ is designed to be able to readout data at the average maximum rate of 1.5 GB/s of the experiment. In complex softwares, such as the iFDAQ, having thousands of lines of code, the debugging process is absolutely essential to reveal all software issues. Unfortunately, conventional debugging of the iFDAQ is not possible during the real data taking. The DAQ Debugger is a tool for identifying a problem, isolating the source of the problem, and then either correcting the problem or determining a way to work around it. It provides the layer for an easy integration to any process and has no impact on the process performance. Based on handling of system signals, the DAQ Debugger represents an alternative to conventional debuggers provided by most integrated development environments. Whenever problem occurs, it generates reports containing all necessary information important for a deeper investigation and analysis. The DAQ Debugger was fully incorporated to all processes in the iFDAQ during the run 2016. It helped to reveal remaining software issues and improved significantly the stability of the system in comparison with the previous run. In the paper, we present the DAQ Debugger from several insights and discuss it in a detailed way.

The Communication Library DIALOG for iFDAQ of the COMPASS Experiment

Modern experiments in high energy physics impose great demands on the reliability, the efficiency, and the data rate of Data Acquisition Systems (DAQ). This contribution focuses on the development and deployment of the new communication library DIALOG for the intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN. The iFDAQ utilizing a hardware event builder is designed to be able to readout data at the maximum rate of the experiment. The DIALOG library is a communication system both for distributed and mixed environments, it provides a network transparent inter-process communication layer. Using the high-performance and modern C++ framework Qt and its Qt Network API, the DIALOG library presents an alternative to the previously used DIM library. The DIALOG library was fully incorporated to all processes in the iFDAQ during the run 2016. From the software point of view, it might be considered as a significant improvement of iFDAQ in comparison with the previous run. To extend the possibilities of debugging, the online monitoring of communication among processes via DIALOG GUI is a desirable feature. In the paper, we present the DIALOG library from several insights and discuss it in a detailed way. Moreover, the efficiency measurement and comparison with the DIM library with respect to the iFDAQ requirements is provided.

CPU Architecture Based on Static Hardware Scheduler Engine and Multiple Pipeline Registers

The development of CPUs and of real-time systems based on them made it possible to use time at increasingly low resolutions. Together with the scheduling methods and algorithms, time organizing has been improved so as to respond positively to the need for optimization and to the way in which the CPU is used. This presentation contains both a detailed theoretical description and the results obtained from research on improving the performances of the nMPRA (Multi Pipeline Register Architecture) processor by implementing specific functions in hardware. The proposed CPU architecture has been developed, simulated and validated by using the FPGA Virtex-7 circuit, via a SoC project. Although the nMPRA processor hardware structure with five pipeline stages is very complex, the present paper presents and analyzes the tests dedicated to the implementation of the CPU and of the memory on-chip for instructions and data. In order to practically implement and test the entire SoC project, various tests have been performed. These tests have been performed in order to verify the drivers for peripherals and the boot module named Bootloader.

Field-Programmable Gate Array Based Tester for Protective Relay

The reliability of the power grid depends on the successful operation of thousands of protective relays. The failure of one relay to operate as intended may lead the entire power grid to blackout. In fact, major power system failures during transient disturbances may be caused by unnecessary protective relay tripping rather than by the failure of a relay to operate. Adequate relay testing provides a first defense against false trips of the relay and hence improves power grid stability and prevents catastrophic bulk power system failures. The goal of this research project is to design and enhance the relay tester using a technology such as Field Programmable Gate Array (FPGA) card NI 7851. A PC based tester framework has been developed using Simulink power system model for generating signals under different conditions (faults or transient disturbances) and LabVIEW for developing the graphical user interface and configuring the FPGA. Besides, the interface system has been developed for outputting and amplifying the signals without distortion. These signals should be like the generated ones by the real power system and large enough for testing the relay’s functionality. The signals generated that have been displayed on the scope are satisfactory. Furthermore, the proposed testing system can be used for improving the performance of protective relay.

Improving the Performances of the nMPRA Architecture by Implementing Specific Functions in Hardware

Minimizing the response time to asynchronous events in a real-time system is an important factor in increasing the speed of response and an interesting concept in designing equipment fast enough for the most demanding applications. The present article will present the results regarding the validation of the nMPRA (Multi Pipeline Register Architecture) architecture using the FPGA Virtex-7 circuit. The nMPRA concept is a hardware processor with the scheduler implemented at the processor level; this is done without affecting a possible bus communication, as is the case with the other CPU solutions. The implementation of static or dynamic scheduling operations in hardware and the improvement of handling interrupts and events by the real-time executive described in the present article represent a key solution for eliminating the overhead of the operating system functions. The nMPRA processor is capable of executing a preemptive scheduling, using various algorithms without a software scheduler. Therefore, we have also presented various scheduling methods and algorithms used in scheduling the real-time tasks.

Analysis of Lightweight Register Hardware Threat

In this paper, we present a design methodology of lightweight register transfer level (RTL) hardware threat implemented based on a MAX II FPGA platform. The dynamic power consumed by the toggling of the various bit of registers as well as the dynamic power consumed per unit of logic circuits were analyzed. The hardware threat was designed taking advantage of the differences in dynamic power consumed per unit of logic circuits to hide the transfer information. The experiment result shows that the register hardware threat was successfully implemented by using different dynamic power consumed per unit of logic circuits to hide the key information of DES encryption module. It needs more than 100000 sample curves to reduce the background noise by comparing the sample space when it completely meets the time alignment requirement. In additional, an external trigger signal is playing a very important role to detect the hardware threat in this experiment.

Run-Time Customisation of Soft-Core CPUs on Field Programmable Gate Array

The use of customised soft-core processors in which instructions can be integrated into a system in application hardware is increasing in the Field Programmable Gate Array (FPGA) field. Specifically, the partial run-time reconfiguration of FPGAs in specialised processors for a particular domain can be very beneficial. In this report, the design and implementation for the customisation of a soft-core MIPS processor using an FPGA and partial reconfiguration (PR) of FPGA technology will be addressed to achieve efficient resource use. This can be achieved using a PR design flow that helps the design fit into a smaller device. Moreover, the impact of static power consumption could be reduced due to runtime reconfiguration. This will be done by configurable custom instructions implemented in the hardware as an extension on the MIPS CPU. The aim of this project is to investigate the PR of FPGAs for run-time adaptations of the instruction set of a soft-core CPU, including the integration of custom instructions and the exploration of the potential to use the MultiBoot feature available in Xilinx FPGAs to carry out the PR process. The system will be evaluated and tested on a Nexus 3 development board featuring a Xilinx Spartran-6 FPGA. The system will be able to load reconfigurable custom instructions dynamically into user programs with the help of the trap handler when the custom instruction is called by the MIPS CPU. The results of this experiment demonstrate that custom instructions in hardware can speed up a certain function and many instructions can be saved when compared to a software implementation of the same function. Implementing custom instructions in hardware is perfectly possible and worth exploring.

Design of Local Interconnect Network Controller for Automotive Applications

Local interconnect network (LIN) is a communication protocol that combines sensors, actuators, and processors to a functional module in automotive applications. In this paper, a LIN ver. 2.2A controller was designed in Verilog hardware description language (Verilog HDL) and implemented in field-programmable gate array (FPGA). Its operation was verified by making full-scale LIN network with the presented FPGA-implemented LIN controller, commercial LIN transceivers, and commercial processors. When described in Verilog HDL and synthesized in 0.18 μm technology, its gate size was about 2,300 gates.

A High Time Resolution Digital Pulse Width Modulator Based on Field Programmable Gate Array’s Phase Locked Loop Megafunction

The digital pulse width modulator (DPWM) is the crucial building block for digitally-controlled DC-DC switching converter, which converts the digital duty ratio signal into its analog counterpart to control the power MOSFET transistors on or off. With the increase of switching frequency of digitally-controlled DC-DC converter, the DPWM with higher time resolution is required. In this paper, a 15-bits DPWM with three-level hybrid structure is presented; the first level is composed of a7-bits counter and a comparator, the second one is a 5-bits delay line, and the third one is a 3-bits digital dither. The presented DPWM is designed and implemented using the PLL megafunction of FPGA (Field Programmable Gate Arrays), and the required frequency of clock signal is 128 times of switching frequency. The simulation results show that, for the switching frequency of 2 MHz, a DPWM which has the time resolution of 15 ps is achieved using a maximum clock frequency of 256MHz. The designed DPWM in this paper is especially useful for high-frequency digitally-controlled DC-DC switching converters.

Experimental Investigation of Indirect Field Oriented Control of Field Programmable Gate Array Based Five-Phase Induction Motor Drive

This paper analyzes the experimental investigation of indirect field oriented control of Field Programmable Gate Array (FPGA) based five-phase induction motor drive. A detailed d-q modeling and Space Vector Pulse Width Modulation (SVPWM) technique of 5-phase drive is elaborated in this paper. In the proposed work, the prototype model of 1 hp 5-phase Voltage Source Inverter (VSI) fed drive is implemented in hardware. SVPWM pulses are generated in FPGA platform through Very High Speed Integrated Circuit Hardware Description Language (VHDL) coding. The experimental results are observed under different loading conditions and compared with simulation results to validate the simulation model.

Power Integrity Analysis of Power Delivery System in High Speed Digital FPGA Board

Power plane noise is the most significant source of signal integrity (SI) issues in a high-speed digital design. In this paper, power integrity (PI) analysis of multiple power planes in a power delivery system of a 12-layer high-speed FPGA board is presented. All 10 power planes of HSD board are analyzed separately by using 3D Electromagnetic based PI solver, then the transient simulation is performed on combined PI data of all planes along with voltage regulator modules (VRMs) and 70 current drawing chips to get the board level power noise coupling on different high-speed signals. De-coupling capacitors are placed between power planes and ground to reduce power noise coupling with signals.

Real-Time Image Encryption Using a 3D Discrete Dual Chaotic Cipher

In this paper, an encryption algorithm is proposed for real-time image encryption. The scheme employs a dual chaotic generator based on a three dimensional (3D) discrete Lorenz attractor. Encryption is achieved using non-autonomous modulation where the data is injected into the dynamics of the master chaotic generator. The second generator is used to permute the dynamics of the master generator using the same approach. Since the data stream can be regarded as a random source, the resulting permutations of the generator dynamics greatly increase the security of the transmitted signal. In addition, a technique is proposed to mitigate the error propagation due to the finite precision arithmetic of digital hardware. In particular, truncation and rounding errors are eliminated by employing an integer representation of the data which can easily be implemented. The simple hardware architecture of the algorithm makes it suitable for secure real-time applications.

A Simple and Efficient Method for Accurate Measurement and Control of Power Frequency Deviation

In the presented technique, a simple method is given for accurate measurement and control of power frequency deviation. The sinusoidal signal for which the frequency deviation measurement is required is transformed to a low voltage level and passed through a zero crossing detector to convert it into a pulse train. Another stable square wave signal of 10 KHz is obtained using a crystal oscillator and decade dividing assemblies (DDA). These signals are combined digitally and then passed through decade counters to give a unique combination of pulses or levels, which are further encoded to make them equally suitable for both control applications and display units. The developed circuit using discrete components has a resolution of 0.5 Hz and completes measurement within 20 ms. The realized circuit is simulated and synthesized using Verilog HDL and subsequently implemented on FPGA. The results of measurement on FPGA are observed on a very high resolution logic analyzer. These results accurately match the simulation results as well as the results of same circuit implemented with discrete components. The proposed system is suitable for accurate measurement and control of power frequency deviation.

Massively-Parallel Bit-Serial Neural Networks for Fast Epilepsy Diagnosis: A Feasibility Study

There are about 1% of the world population suffering from the hidden disability known as epilepsy and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched by using a artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through a dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE) is presented which implements the functionality of a complete neuron using variable accuracy. The proposed design has been tested taking into consideration non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier which uses only 16 logic elements on an Altera Cyclone IV FPGA and a bit-serial ALU as well as a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in a portable equipment that suits the needs of a single epileptic patient in his or her daily activities to predict the occurrences of impending tonic conic seizures.

An Efficient Implementation of High Speed Vedic Multiplier Using Compressors for Image Processing Applications

Digital signal processor, image signal processor and FIR filters have multipliers as an important part of their design. On the basis of Vedic mathematics, Vedic multipliers have come out to be very fast multipliers. One of the image processing applications is edge detection. This research presents a small area and high speed 8 bit Vedic multiplier system comprising of compressor based adders. This results in faster edge detection. This architecture is tested on Xilinx vertex 4 FPGA board and simulations were carried out using the Xilinx synthesis tool. Comparisons are made and this system is found to be smaller in area with high speed (the lesser propagation delay). This compressor based Vedic multiplier is 1.1 times speedier than a typical Vedic multiplier. Also, this Vedic Multiplier is 2 times speedier than a ‘simple’ multiplier.