Neural Network Implementation Using FPGA: Issues and Application

.Hardware realization of a Neural Network (NN), to a large extent depends on the efficient implementation of a single neuron. FPGA-based reconfigurable computing architectures are suitable for hardware implementation of neural networks. FPGA realization of ANNs with a large number of neurons is still a challenging task. This paper discusses the issues involved in implementation of a multi-input neuron with linear/nonlinear excitation functions using FPGA. Implementation method with resource/speed tradeoff is proposed to handle signed decimal numbers. The VHDL coding developed is tested using Xilinx XC V50hq240 Chip. To improve the speed of operation a lookup table method is used. The problems involved in using a lookup table (LUT) for a nonlinear function is discussed. The percentage saving in resource and the improvement in speed with an LUT for a neuron is reported. An attempt is also made to derive a generalized formula for a multi-input neuron that facilitates to estimate approximately the total resource requirement and speed achievable for a given multilayer neural network. This facilitates the designer to choose the FPGA capacity for a given application. Using the proposed method of implementation a neural network based application, namely, a Space vector modulator for a vector-controlled drive is presented

Game-Tree Simplification by Pattern Matching and Its Acceleration Approach using an FPGA

In this paper, we propose a Connect6 solver which adopts a hybrid approach based on a tree-search algorithm and image processing techniques. The solver must deal with the complicated computation and provide high performance in order to make real-time decisions. The proposed approach enables the solver to be implemented on a single Spartan-6 XC6SLX45 FPGA produced by XILINX without using any external devices. The compact implementation is achieved through image processing techniques to optimize a tree-search algorithm of the Connect6 game. The tree search is widely used in computer games and the optimal search brings the best move in every turn of a computer game. Thus, many tree-search algorithms such as Minimax algorithm and artificial intelligence approaches have been widely proposed in this field. However, there is one fundamental problem in this area; the computation time increases rapidly in response to the growth of the game tree. It means the larger the game tree is, the bigger the circuit size is because of their highly parallel computation characteristics. Here, this paper aims to reduce the size of a Connect6 game tree using image processing techniques and its position symmetric property. The proposed solver is composed of four computational modules: a two-dimensional checkmate strategy checker, a template matching module, a skilful-line predictor, and a next-move selector. These modules work well together in selecting next moves from some candidates and the total amount of their circuits is small. The details of the hardware design for an FPGA implementation are described and the performance of this design is also shown in this paper.

Adaptive Motion Estimator Based on Variable Block Size Scheme

This paper presents an adaptive motion estimator that can be dynamically reconfigured by the best algorithm depending on the variation of the video nature during the lifetime of an application under running. The 4 Step Search (4SS) and the Gradient Search (GS) algorithms are integrated in the estimator in order to be used in the case of rapid and slow video sequences respectively. The Full Search Block Matching (FSBM) algorithm has been also integrated in order to be used in the case of the video sequences which are not real time oriented. In order to efficiently reduce the computational cost while achieving better visual quality with low cost power, the proposed motion estimator is based on a Variable Block Size (VBS) scheme that uses only the 16x16, 16x8, 8x16 and 8x8 modes. Experimental results show that the adaptive motion estimator allows better results in term of Peak Signal to Noise Ratio (PSNR), computational cost, FPGA occupied area, and dissipated power relatively to the most popular variable block size schemes presented in the literature.

Design and Implementation of Real-Time Automatic Censoring System on Chip for Radar Detection

Design and implementation of a novel B-ACOSD CFAR algorithm is presented in this paper. It is proposed for detecting radar target in log-normal distribution environment. The BACOSD detector is capable to detect automatically the number interference target in the reference cells and detect the real target by an adaptive threshold. The detector is implemented as a System on Chip on FPGA Altera Stratix II using parallelism and pipelining technique. For a reference window of length 16 cells, the experimental results showed that the processor works properly with a processing speed up to 115.13MHz and processing time0.29 ┬Ás, thus meets real-time requirement for a typical radar system.

Floating-Point Scaling for BSS Gain Control

In Blind Source Separation (BSS) processing, taking advantage of scaling factor indetermination and based on the floatingpoint representation, we propose a scaling technique applied to the separation matrix, to avoid the saturation or the weakness in the recovered source signals. This technique performs an Automatic Gain Control (AGC) in an on-line BSS environment. We demonstrate the effectiveness of this technique by using the implementation of a division free BSS algorithm with two input, two output. This technique is computationally cheaper and efficient for a hardware implementation.

Efficient Hardware Architecture of the Direct 2- D Transform for the HEVC Standard

This paper presents the hardware design of a unified architecture to compute the 4x4, 8x8 and 16x16 efficient twodimensional (2-D) transform for the HEVC standard. This architecture is based on fast integer transform algorithms. It is designed only with adders and shifts in order to reduce the hardware cost significantly. The goal is to ensure the maximum circuit reuse during the computing while saving 40% for the number of operations. The architecture is developed using FIFOs to compute the second dimension. The proposed hardware was implemented in VHDL. The VHDL RTL code works at 240 MHZ in an Altera Stratix III FPGA. The number of cycles in this architecture varies from 33 in 4-point- 2D-DCT to 172 when the 16-point-2D-DCT is computed. Results show frequency improvements reaching 96% when compared to an architecture described as the direct transcription of the algorithm.

Efficient Pipelined Hardware Implementation of RIPEMD-160 Hash Function

In this paper an efficient implementation of Ripemd- 160 hash function is presented. Hash functions are a special family of cryptographic algorithms, which is used in technological applications with requirements for security, confidentiality and validity. Applications like PKI, IPSec, DSA, MAC-s incorporate hash functions and are used widely today. The Ripemd-160 is emanated from the necessity for existence of very strong algorithms in cryptanalysis. The proposed hardware implementation can be synthesized easily for a variety of FPGA and ASIC technologies. Simulation results, using commercial tools, verified the efficiency of the implementation in terms of performance and throughput. Special care has been taken so that the proposed implementation doesn-t introduce extra design complexity; while in parallel functionality was kept to the required levels.

Simulation Tools for Fixed Point DSP Algorithms and Architectures

This paper presents software tools that convert the C/Cµ floating point source code for a DSP algorithm into a fixedpoint simulation model that can be used to evaluate the numericalperformance of the algorithm on several different fixed pointplatforms including microprocessors, DSPs and FPGAs. The tools use a novel system for maintaining binary point informationso that the conversion from floating point to fixed point isautomated and the resulting fixed point algorithm achieves maximum possible precision. A configurable architecture is used during the simulation phase so that the algorithm can produce a bit-exact output for several different target devices.

A Novel Digital Implementation of AC Voltage Controller for Speed Control of Induction Motor

In this paper a novel, simple and reliable digital firing scheme has been implemented for speed control of three-phase induction motor using ac voltage controller. The system consists of three-phase supply connected to the three-phase induction motor via three triacs and its control circuit. The ac voltage controller has three modes of operation depending on the shape of supply current. The performance of the induction motor differs in each mode where the speed is directly proportional with firing angle in two modes and inversely in the third one. So, the control system has to detect the current mode of operation to choose the correct firing angle of triacs. Three sensors are used to feed the line currents to control system to detect the mode of operation. The control strategy is implemented using a low cost Xilinx Spartan-3E field programmable gate array (FPGA) device. Three PI-controllers are designed on FPGA to control the system in the three-modes. Simulation of the system is carried out using PSIM computer program. The simulation results show stable operation for different loading conditions especially in mode 2/3. The simulation results have been compared with the experimental results from laboratory prototype.

FPGA Implementation of the “PYRAMIDS“ Block Cipher

The “PYRAMIDS" Block Cipher is a symmetric encryption algorithm of a 64, 128, 256-bit length, that accepts a variable key length of 128, 192, 256 bits. The algorithm is an iterated cipher consisting of repeated applications of a simple round transformation with different operations and different sequence in each round. The algorithm was previously software implemented in Cµ code. In this paper, a hardware implementation of the algorithm, using Field Programmable Gate Arrays (FPGA), is presented. In this work, we discuss the algorithm, the implemented micro-architecture, and the simulation and implementation results. Moreover, we present a detailed comparison with other implemented standard algorithms. In addition, we include the floor plan as well as the circuit diagrams of the various micro-architecture modules.

Spacecraft Neural Network Control System Design using FPGA

Designing and implementing intelligent systems has become a crucial factor for the innovation and development of better products of space technologies. A neural network is a parallel system, capable of resolving paradigms that linear computing cannot. Field programmable gate array (FPGA) is a digital device that owns reprogrammable properties and robust flexibility. For the neural network based instrument prototype in real time application, conventional specific VLSI neural chip design suffers the limitation in time and cost. With low precision artificial neural network design, FPGAs have higher speed and smaller size for real time application than the VLSI and DSP chips. So, many researchers have made great efforts on the realization of neural network (NN) using FPGA technique. In this paper, an introduction of ANN and FPGA technique are briefly shown. Also, Hardware Description Language (VHDL) code has been proposed to implement ANNs as well as to present simulation results with floating point arithmetic. Synthesis results for ANN controller are developed using Precision RTL. Proposed VHDL implementation creates a flexible, fast method and high degree of parallelism for implementing ANN. The implementation of multi-layer NN using lookup table LUT reduces the resource utilization for implementation and time for execution.

Hardware Prototyping of an Efficient Encryption Engine

An approach to develop the FPGA of a flexible key RSA encryption engine that can be used as a standard device in the secured communication system is presented. The VHDL modeling of this RSA encryption engine has the unique characteristics of supporting multiple key sizes, thus can easily be fit into the systems that require different levels of security. A simple nested loop addition and subtraction have been used in order to implement the RSA operation. This has made the processing time faster and used comparatively smaller amount of space in the FPGA. The hardware design is targeted on Altera STRATIX II device and determined that the flexible key RSA encryption engine can be best suited in the device named EP2S30F484C3. The RSA encryption implementation has made use of 13,779 units of logic elements and achieved a clock frequency of 17.77MHz. It has been verified that this RSA encryption engine can perform 32-bit, 256-bit and 1024-bit encryption operation in less than 41.585us, 531.515us and 790.61us respectively.

FPGA Based Longitudinal and Lateral Controller Implementation for a Small UAV

This paper presents implementation of attitude controller for a small UAV using field programmable gate array (FPGA). Due to the small size constrain a miniature more compact and computationally extensive; autopilot platform is needed for such systems. More over UAV autopilot has to deal with extremely adverse situations in the shortest possible time, while accomplishing its mission. FPGAs in the recent past have rendered themselves as fast, parallel, real time, processing devices in a compact size. This work utilizes this fact and implements different attitude controllers for a small UAV in FPGA, using its parallel processing capabilities. Attitude controller is designed in MATLAB/Simulink environment. The discrete version of this controller is implemented using pipelining followed by retiming, to reduce the critical path and thereby clock period of the controller datapath. Pipelined, retimed, parallel PID controller implementation is done using rapidprototyping and testing efficient development tool of “system generator", which has been developed by Xilinx for FPGA implementation. The improved timing performance enables the controller to react abruptly to any changes made to the attitudes of UAV.

Position Control of an AC Servo Motor Using VHDL and FPGA

In this paper, a new method of controlling position of AC Servomotor using Field Programmable Gate Array (FPGA). FPGA controller is used to generate direction and the number of pulses required to rotate for a given angle. Pulses are sent as a square wave, the number of pulses determines the angle of rotation and frequency of square wave determines the speed of rotation. The proposed control scheme has been realized using XILINX FPGA SPARTAN XC3S400 and tested using MUMA012PIS model Alternating Current (AC) servomotor. Experimental results show that the position of the AC Servo motor can be controlled effectively. KeywordsAlternating Current (AC), Field Programmable Gate Array (FPGA), Liquid Crystal Display (LCD).

A Processor with Dynamically Reconfigurable Circuit for Floating-Point Arithmetic

This paper describes about dynamic reconfiguration to miniaturize arithmetic circuits in general-purpose processor. Dynamic reconfiguration is a technique to realize required functions by changing hardware construction during operation. The proposed arithmetic circuit performs floating-point arithmetic which is frequently used in science and technology. The data format is floating-point based on IEEE754. The proposed circuit is designed using VHDL, and verified the correct operation by simulations and experiments.

Hardware Implementations for the ISO/IEC 18033-4:2005 Standard for Stream Ciphers

In this paper the FPGA implementations for four stream ciphers are presented. The two stream ciphers, MUGI and SNOW 2.0 are recently adopted by the International Organization for Standardization ISO/IEC 18033-4:2005 standard. The other two stream ciphers, MICKEY 128 and TRIVIUM have been submitted and are under consideration for the eSTREAM, the ECRYPT (European Network of Excellence for Cryptology) Stream Cipher project. All ciphers were coded using VHDL language. For the hardware implementation, an FPGA device was used. The proposed implementations achieve throughputs range from 166 Mbps for MICKEY 128 to 6080 Mbps for MUGI.

A Reliable FPGA-based Real-time Optical-flow Estimation

Optical flow is a research topic of interest for many years. It has, until recently, been largely inapplicable to real-time applications due to its computationally expensive nature. This paper presents a new reliable flow technique which is combined with a motion detection algorithm, from stationary camera image streams, to allow flow-based analyses of moving entities, such as rigidity, in real-time. The combination of the optical flow analysis with motion detection technique greatly reduces the expensive computation of flow vectors as compared with standard approaches, rendering the method to be applicable in real-time implementation. This paper describes also the hardware implementation of a proposed pipelined system to estimate the flow vectors from image sequences in real time. This design can process 768 x 576 images at a very high frame rate that reaches to 156 fps in a single low cost FPGA chip, which is adequate for most real-time vision applications.

FPGA Implementation of Generalized Maximal Ratio Combining Receiver Diversity

In this paper, we study FPGA implementation of a novel supra-optimal receiver diversity combining technique, generalized maximal ratio combining (GMRC), for wireless transmission over fading channels in SIMO systems. Prior published results using ML-detected GMRC diversity signal driven by BPSK showed superior bit error rate performance to the widely used MRC combining scheme in an imperfect channel estimation (ICE) environment. Under perfect channel estimation conditions, the performance of GMRC and MRC were identical. The main drawback of the GMRC study was that it was theoretical, thus successful FPGA implementation of it using pipeline techniques is needed as a wireless communication test-bed for practical real-life situations. Simulation results showed that the hardware implementation was efficient both in terms of speed and area. Since diversity combining is especially effective in small femto- and picocells, internet-associated wireless peripheral systems are to benefit most from GMRC. As a result, many spinoff applications can be made to the hardware of IP-based 4th generation networks.

Implemented 5-bit 125-MS/s Successive Approximation Register ADC on FPGA

Implemented 5-bit 125-MS/s successive approximation register (SAR) analog to digital converter (ADC) on FPGA is presented in this paper.The design and modeling of a high performance SAR analog to digital converter are based on monotonic capacitor switching procedure algorithm .Spartan 3 FPGA is chosen for implementing SAR analog to digital converter algorithm. SAR VHDL program writes in Xilinx and modelsim uses for showing results.

A Virtual Simulation Environment for a Design and Verification of a GPGPU

When a small H/W IP is designed, we can develop an appropriate verification environment by observing the simulated signal waves, or using the serial test vectors for the fixed output. In the case of design and verification of a massive parallel processor with multiple IPs, it-s difficult to make a verification system with existing common verification environment, and to verify each partial IP. A TestDrive verification environment can build easy and reliable verification system that can produce highly intuitive results by applying Modelsim and SystemVerilog-s DPI. It shows many advantages, for example a high-level design of a GPGPU processor design can be migrate to FPGA board immediately.