A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

With the rapid development of deep learning, neural network algorithms have come to play a significant role in a wide range of practical applications. Owing to their high accuracy and strong performance, Convolutional Neural Networks (CNNs) in particular have become a research hotspot in recent years. However, network sizes keep growing to meet the demands of these applications, which makes building high-performance implementations of deep neural networks a significant challenge. Meanwhile, many application scenarios also impose strict requirements on the performance and power consumption of the underlying hardware. It is therefore critical to choose a suitable computing platform for hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. We review accelerator designs and implementations across different FPGA devices and network models, compare them with counterparts on Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), and Digital Signal Processors (DSPs), and offer our own critical analysis and comments. Finally, we discuss these acceleration and optimization methods on FPGA platforms from several perspectives, explore the opportunities and challenges for future research, and outline prospects for the future development of FPGA-based accelerators.
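To make concrete what these accelerators compute, the listing below is a minimal C sketch of the standard six-loop convolution-layer nest that FPGA designs typically restructure through loop tiling, unrolling, and pipelining to expose parallelism and fit on-chip buffers. It is an illustration only, not the design of any particular accelerator surveyed here; the layer dimensions and names are assumptions chosen for readability.

/* Minimal illustrative sketch of a convolution layer in plain C.
   All sizes and names are assumptions for illustration, not taken
   from any specific accelerator. FPGA designs typically tile the
   output loops (m, r, c) and unroll/pipeline the inner loops. */
#include <stdio.h>

#define M 4   /* output feature maps (illustrative) */
#define N 3   /* input feature maps  (illustrative) */
#define R 6   /* output rows         (illustrative) */
#define C 6   /* output columns      (illustrative) */
#define K 3   /* kernel size         (illustrative) */

static float in[N][R + K - 1][C + K - 1];
static float w[M][N][K][K];
static float out[M][R][C];

static void conv_layer(void) {
    for (int m = 0; m < M; m++)             /* output maps: unroll candidate */
        for (int r = 0; r < R; r++)         /* output rows: tile candidate   */
            for (int c = 0; c < C; c++) {   /* output cols: tile candidate   */
                float acc = 0.0f;
                for (int n = 0; n < N; n++)             /* input maps   */
                    for (int i = 0; i < K; i++)         /* kernel rows  */
                        for (int j = 0; j < K; j++)     /* kernel cols  */
                            acc += w[m][n][i][j] * in[n][r + i][c + j];
                out[m][r][c] = acc;
            }
}

int main(void) {
    /* Constant inputs and weights make the result easy to check:
       every output equals N*K*K * 1.0 * 0.5 = 13.5. */
    for (int n = 0; n < N; n++)
        for (int r = 0; r < R + K - 1; r++)
            for (int c = 0; c < C + K - 1; c++)
                in[n][r][c] = 1.0f;
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    w[m][n][i][j] = 0.5f;
    conv_layer();
    printf("out[0][0][0] = %f\n", out[0][0][0]); /* expect 13.500000 */
    return 0;
}

Because the multiply-accumulate operations in the inner three loops are independent across output positions, a hardware design can trade on-chip buffer capacity for parallelism by choosing how far to tile and unroll this nest, which is the design space most of the surveyed FPGA accelerators explore.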
