Scholarly

A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

Year: 2020 Volume: 14 Issue: 12 Pages: 419 - 427

Authors:
Wei Zhang

Abstract: With the rapid development of deep learning, neural networks and deep learning algorithms play a significant role in many practical applications. Owing to their high accuracy and good performance, Convolutional Neural Networks (CNNs) in particular have become a research hot spot in the past few years. However, networks keep growing in size to meet the demands of practical applications, which makes a high-performance implementation of deep neural networks a significant challenge. Meanwhile, many of these application scenarios also place strict requirements on the performance and power consumption of hardware devices. It is therefore particularly critical to choose a suitable computing platform for hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various FPGA-based accelerator designs and implementations for different devices and network models are reviewed, and they are compared with their counterparts on Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) to present our own critical analysis and comments. Finally, we discuss these acceleration and optimization methods on FPGA platforms from different perspectives to further explore the opportunities and challenges for future research, and we give an outlook on the future development of FPGA-based accelerators.

Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Year: 2020 Volume: 14 Issue: 12 Pages: 486 - 493

Authors:
Jaeyoung Lee

Abstract: Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with time and season. Image segmentation using deep learning, which has evolved rapidly in recent years, provides stable, high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, so their performance degrades in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time, so as to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. The lack of information caused by fixing the number of channels is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation. Since the MMA structure is fixed, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride in order to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs that obtain high-quality contexts using the restored shape, without global average pooling paths, since that layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 x 480 resolution in 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
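The timing claim in this abstract can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the camera images are processed one after another on a single accelerator with no scheduling or data-transfer overhead; the abstract does not state how the six streams are scheduled, and the function names are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame period, in milliseconds."""
    return 1000.0 / target_fps


def total_time_ms(per_image_ms: float, num_cameras: int) -> float:
    """Time to process one image from every camera, assuming the images
    are handled sequentially with no overhead (an assumption)."""
    return per_image_ms * num_cameras


budget = frame_budget_ms(20)        # 50 ms per frame period at 20 FPS
needed = total_time_ms(6.67, 6)     # about 40 ms for six 640 x 480 images
fits = needed <= budget             # six cameras fit inside the budget
```

Under these assumptions, six images take roughly 40 ms against a 50 ms budget, which is consistent with the six-camera, 20 FPS figure quoted above.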

Assessing the Effect of the Position of the Cavities on the Inner Plate of the Steel Shear Wall under Time History Dynamic Analysis

Year: 2020 Volume: 14 Issue: 7 Pages: 221 - 225

Abstract: The seismic forces caused by waves generated deep in the earth during an earthquake strike a structure and cause the building to vibrate. Large seismic forces cause low-strength sections of the structure to suffer extensive surface damage. The use of new steel shear walls in steel structures increases the strength of the building and its main members (columns) by reducing and dissipating seismic forces during earthquakes. In the present study, a type of steel shear wall with regular holes in the inner plate was evaluated by finite element modeling in Abaqus software. The steel plate shear wall, measuring 6000 × 3000 mm (one story) with a thickness of 3 mm, was modeled with four different holes with a cross-sectional area. The shear wall was subjected to a 5-second time-history dynamic analysis under three accelerograms: El Centro, Imperial Valley and Kobe. The results showed that increasing the distance between the geometric center of the hole and the geometric center of the inner plate of the steel shear wall (increasing the RCS index) caused the total maximum acceleration to be transferred from the perimeter of the hole to the horizontal and vertical beams. The results also show that there is no direct relationship between the RCS index and the total acceleration in the steel shear wall, and that the RCS index is independent of the peak ground acceleration of the earthquake.

High Level Synthesis of Canny Edge Detection Algorithm on Zynq Platform

Year: 2015 Volume: 9 Issue: 1 Pages: 148 - 152

Abstract: Real-time image and video processing is in demand in many computer vision applications, e.g. video surveillance, traffic management and medical imaging. Processing in those video applications requires high computational power. Thus, the optimal solution is the collaboration of a CPU and hardware accelerators. In this paper, a Canny edge detection hardware accelerator is proposed. Edge detection is one of the basic building blocks of video and image processing applications. It is a common block in the pre-processing phase of the image and video processing pipeline. Our approach offloads the Canny edge detection algorithm from the processing system (PS) to the programmable logic (PL), taking advantage of the High Level Synthesis (HLS) tool flow to accelerate the implementation on the Zynq platform. The resulting implementation enables up to a 100x performance improvement through hardware acceleration. CPU utilization drops, and the frame rate rises to 60 fps for a 1080p full HD input video stream.

High Performance Fibre Reinforced Alkali Activated Slag Concrete

Year: 2014 Volume: 8 Issue: 12 Pages: 1288 - 1291

Abstract: The main objective of the study is to produce slag-based geopolymer concrete by adding an alkali activator. Test results indicated that the reaction of silicates in slag depends on the reaction potential of sodium hydroxide and the formation of alumino-silicates. The study also evaluates the efficiency of the polymer reaction in terms of the strength gain properties of different geopolymer mixtures. Geopolymer mixture proportions were designed for different binder to total aggregate ratios (0.3 and 0.45) and fine to coarse aggregate ratios (0.4 and 0.8). Geopolymer concrete specimens cast under normal curing conditions reported a maximum 28-day compressive strength of 54.75 MPa. The addition of glued steel fibres at 1.0% Vf to geopolymer concrete showed reasonable improvements in the compressive strength, split tensile strength and flexural properties of the different geopolymer mixtures. Further, a comparative assessment was made of the different geopolymer mixtures, and the reinforcing effects of steel fibres were investigated in different concrete matrices.

Development of the Algorithm for Detecting Falls during Daily Activity using 2 Tri-Axial Accelerometers

Year: 2012 Volume: 6 Issue: 1 Pages: 4 - 8

Abstract: Falls are the primary cause of accidents in people over the age of 65, and frequently lead to serious injuries. Since the early detection of falls is an important step in alerting and protecting the aging population, a variety of research on detecting falls has been carried out, including the use of accelerometers, gyroscopes and tilt sensors. In existing studies, falls were detected using a single accelerometer, with errors. In this study, the proposed method uses two accelerometers to reject false fall detections. As falls are accompanied by gravitational acceleration and rotational motion, the falls in this study were detected using the z-axis acceleration differences between two sites. The falls were detected by calculating the difference between the readings of accelerometers placed at two different positions on the chest of the subject. The maximum difference of accelerations (diff_Z) and the integration of accelerations in a defined region (Sum_diff_Z) were used as parameters to form the fall detection algorithm. Using the proposed parameters, falls and activities of daily living (ADL) could be distinguished without errors, in spite of the impact and the change in the positions of the accelerometers. By comparing each of the axial accelerations, the direction of a fall and the condition of the subject afterwards could be determined. In this study, falls were detected without errors using two accelerometers attached to two sites, which confirmed the usefulness of the proposed fall detection parameters, diff_Z and Sum_diff_Z.
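The two parameters named in this abstract can be illustrated with a short sketch. The abstract does not give exact formulas, so the definitions below are assumptions: diff_Z is taken as the largest absolute z-axis difference between the two sensors, and Sum_diff_Z as a rectangle-rule integral of that difference over a sample window. The function name, the window arguments and the sample period `dt` are hypothetical.

```python
from typing import Sequence, Tuple


def fall_features(
    z_upper: Sequence[float],
    z_lower: Sequence[float],
    start: int,
    stop: int,
    dt: float,
) -> Tuple[float, float]:
    """Return (diff_z, sum_diff_z) for two equally long z-axis recordings.

    diff_z     : maximum absolute difference between the two sensors
                 (assumed reading of the abstract's diff_Z).
    sum_diff_z : absolute difference integrated over samples
                 [start, stop) with sample period dt
                 (assumed reading of the abstract's Sum_diff_Z).
    """
    if len(z_upper) != len(z_lower):
        raise ValueError("both recordings must have the same length")
    if not z_upper:
        raise ValueError("recordings must not be empty")

    diffs = [abs(a - b) for a, b in zip(z_upper, z_lower)]
    diff_z = max(diffs)
    sum_diff_z = sum(diffs[start:stop]) * dt
    return diff_z, sum_diff_z


# Illustrative (made-up) samples: the sensors disagree only at index 2.
upper = [1.0, 1.0, 3.0, 1.0]
lower = [1.0, 1.0, 0.5, 1.0]
diff_z, sum_diff_z = fall_features(upper, lower, start=1, stop=3, dt=0.01)
```

A detector built on these features would compare both values against thresholds; the abstract does not report threshold values, so none are assumed here.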

An Efficient Architecture for Interleaved Modular Multiplication

Year: 2009 Volume: 3 Issue: 8 Pages: 1928 - 1932

Abstract: Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (on the order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators, developed on FPGAs or ASICs, have been the best choice for accelerating modular multiplication in embedded environments. In addition, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures that rely on carry save adders and look up tables. Look up tables must be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both the look up tables and the pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than the other architectures. It also has a better overall absolute time for a single operation.
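For readers unfamiliar with interleaved modular multiplication, the following is a minimal software sketch of the textbook bit-serial form of the algorithm. It is not the architecture proposed in the paper: it uses plain integer comparison and subtraction rather than carry save adders or sign detection, and the function name is hypothetical.

```python
def interleaved_modmul(x: int, y: int, m: int) -> int:
    """Compute (x * y) mod m by interleaving shifts, adds and reductions.

    The multiplier x is scanned from its most significant bit down.
    Each step doubles the partial result, conditionally adds y, and
    reduces, so the intermediate value never exceeds 3*m.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")

    y %= m          # ensure y < m so the bound below holds
    p = 0           # partial result, kept in [0, m) between steps
    for i in range(x.bit_length() - 1, -1, -1):
        p <<= 1                 # p = 2p  (now p < 2m)
        if (x >> i) & 1:
            p += y              # p = 2p + y  (now p < 3m)
        # At most two subtractions bring p back into [0, m).
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p


result = interleaved_modmul(7, 9, 13)   # 63 mod 13
```

The two comparisons against m in every iteration are the costly part in hardware; the abstract indicates that earlier designs addressed this with look up tables of pre-computed values, while the proposed one uses sign detection instead.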
