# A Low-Area Fully-Reconfigurable Hardware Design of Fast Fourier Transform System for 3GPP-LTE Standard Xin-Yu Shih, Yue-Qu Liu, Hong-Ru Chou Abstract—This paper presents a low-area and fully-reconfigurable Fast Fourier Transform (FFT) hardware design for 3GPP-LTE communication standard. It can fully support 32 different FFT sizes, up to 2048 FFT points. Besides, a special processing element is developed for making reconfigurable computing characteristics possible, while first-in first-out (FIFO) scheduling scheme design technique is proposed for hardware-friendly FIFO resource arranging. In a synthesis chip realization via TSMC 40 nm CMOS technology, the hardware circuit only occupies core area of 0.2325 mm<sup>2</sup> and dissipates 233.5 mW at maximal operating frequency of 250 MHz. Keywords—Reconfigurable, fast Fourier transform, single-path delay feedback, 3GPP-LTE. # I. Introduction FTT is a commonly applied design scheme in the communication fields, transforming the received signals from time domain to frequency domain. Originally, a simple hardware-oriented direction is referred to single-path delay feedback (SDF) FFT [1], [2]. It is only used for radix-2 based hardware architecture. In the following research flavors, radix-4 [3], radix-8 [4], radix-2<sup>2</sup> [5], radix-2<sup>3</sup> [6], radix-2<sup>4</sup> [7], and radix-2<sup>k</sup> [8] FFT systems are developed in sequence to expand the analogous design concepts. It has already become an interesting main research road both in academic and modern industry. Furthermore, it moves crucial trends to apply the similar design concepts beyond the radix of power of 2, such as radix-3, radix-5, radix-6, and radix-12 FFT hardware circuits [9], [10]. In addition to supporting only one radix in an independent FFT system, there also exist dedicated works used to provide mixed-radix computing [11], [12]. In an individual FFT system, certain parts are used for one radix, whereas other parts are employed for another radix. Unfortunately, one specified processing element (PE) or FIFO storage is still responsible for only one-radix operation, resulting in less circuit flexibility and larger design efforts. Instead, we propose a systematic way to develop a fully-reconfigurable hardware design for supporting 32 different modes defined in 3GPP-LTE communication standard. In a synthesis chip implemented with TSMC 40 nm Xin-Yu Shih was a senior engineer in Mediatek, working for more than 4 years. He is now an Assistant Professor in Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan (e-mail: xyshih@mail.ee.nsysu.edu.tw). Yue-Qu Liu and Hong-Ru Chou are with National Sun Yat-sen University, 804 Kaohsiung Taiwan (e-mail: tom@snsd.ee.nsvsu.edu.tw. bob.chou@snsd.ee.nsvsu.edu.tw). CMOS technology, our work only occupies a core area of 0.2325 mm<sup>2</sup> and consumes 233.5 mW under maximal clock frequency of 250 MHz. The rest of this paper is organized as follows. Section II demonstrates the hardware architecture of our proposed fully-reconfigurable SDF FFT system for 3GPP-LTE communication standard. Also, two important composed parts are reconfigurable processing element (RPE) design and first-in first-out scheduling scheme (FIFO-SS) for 32-mode setting configuration. Section III shows ASIC chip implementation with synthesis results, using TSMC 40 nm CMOS technology. Finally, Section IV concludes our developed work. #### II. PROPOSED HARDWARE ARCHITECTURE The FFT specification defined in 3GPP-LTE communication standard can be easily represented as (1): $$N = 2^X * 3^Y \tag{1}$$ where the FFT size (N) varies from 4 to 2048, divided into mixed combination of power of 2 and power of 3. As shown in Fig. 1, the possible (X, Y) combination values are denoted as red circular points in the X-Y plane. Fig. 1 32-mode supporting of FFT system for 3GPP-LTE standard According to the different FFT sizes, we want to propose a fully-reconfigurable FFT hardware architecture. Instead of implementing all of the supported circuit system individually, our work can realize 32 operating modes in a single chip system. In addition, we would introduce two critical developed circuits, including RPE design and FIFO-SS. Fig. 2 The proposed fully-reconfigurable FFT hardware architecture Fig. 3 Block diagram of the proposed RPE ## A. Proposed FFT Hardware Architecture Fig. 2 depicts our proposed FFT hardware architecture. There are 2 intelligent fundamental portions, RPE (see Section II B) and FIFO bank (see Section II C), which are responsible for data processing and temporary storage, respectively. For data processing, one RPE is developed to manipulate different kinds/radixes of butterfly operations. One RPE is so-called one super stage of calculating unit. Once one RPE is completely designed, we can concatenate four copies of RPEs to build up necessary computing procedure. On the other hand, for temporary storage, all total length of FIFO is aggregated as a huge bank and suitably managed with the proposed first-in FIFO-SS. As dealing with various FFT sizes, we can distribute all of the FIFO resource to each super stage without any storing conflict situations. While RPE operates at different modes, FIFO distribution and reading/writing access arrangement also differ with each other. #### B. Reconfigurable Processing Element (RPE) As shown in Fig. 3, RPE is composed of 12 complex adders and four complex multipliers totally. All of processing circuit logics can be grouped with eight computing banks, including four adder banks, two multiplier banks, and two mixed banks. First, while operating at 2-stage radix-3 (2S-R3) mode, all of the eight banks are activated, achieving 100% of adder usage and multiplier usage both. As for other mode configuration, some of computing banks would be activated or un-activated, depending on the processing circuit logics on demand. For each operating mode, assume that j adders and k multipliers are in use. Also, the number of FIFO chains to or from large FIFO bank is considered as m, which is presented in Fig. 2. Therefore, the corresponding $\{j, k, m\}$ parameters are listed as follows. - 1) 1-stage radix-3 (1S-R3) mode: {6, 2, 2} - 2) 3-stage radix-2 (3S-R2) mode: {6, 3, 3} - 3) 2-stage radix-2 (2S-R2) mode: {4, 2, 2} - 4) 1-stage radix-2 (1S-R2) mode: {2, 1, 1} - 5) 1-stage radix-3 and radix-2 (R3-R2) mode: {8, 3, 3} Furthermore, we can take more insights into 2 mixed banks. Each mixed bank consists of 2 adders and 1 multiplier. Obviously, 2 adders are worked for dealing with 2S-R3, R3-R2, and 3S-R2 modes. But, this multiplier is enabled only for setting at 2S-R3, and R3- R2 modes. In other words, different operating modes need various additions and multiplications based on different butterfly operations. | Mode | FFT | X | Y | FIFO Length of Each Stage | | | | | | | | | | | | |-------|-------|----|---|---------------------------|----|-----|------|------|-------|-------|-------|-------|-------|------|--------| | Index | Point | A | Y | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 | Length | | 1 | 4 | 2 | 0 | 1 | 2 | | _ | | | | | | | | 3 | | 2 | 8 | 3 | 0 | 1 | 2 | 4 | | | | | | | | | 7 | | 3 | 12 | 2 | 1 | 1 | 2 | 4*2 | | _ | | | | | | | 11 | | 4 | 16 | 4 | 0 | 1 | 2 | 4 | 8 | | | | | | | | 15 | | 5 | 24 | 3 | 1 | 1 | 2 | 4 | 8*2 | | _ | | | | | | 23 | | 6 | 32 | 5 | 0 | 1 | 2 | 4 | 8 | 16 | | | | | | | 31 | | 7 | 36 | 2 | 2 | 1 | 2 | 4*2 | 12*2 | | | | | | | | 35 | | 8 | 48 | 4 | 1 | 1 | 2 | 4 | 8 | 16*2 | | | | | | | 47 | | 9 | 64 | 6 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | | | | | | 63 | | 10 | 72 | 3 | 2 | 1 | 2 | 4 | 8*2 | 24*2 | | _ | | | | | 71 | | 11 | 96 | 5 | 1 | 1 | 2 | 4 | 8 | 16 | 32*2 | | | | | | 95 | | 12 | 108 | 2 | 3 | 1 | 2 | 4*2 | 12*2 | 36*2 | | _ | | | | | 107 | | 13 | 128 | 7 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | | | | | 127 | | 14 | 144 | 4 | 2 | 1 | 2 | 4 | 8 | 16*2 | 48*2 | | | | | | 143 | | 15 | 192 | 6 | 1 | 1 | 2 | 4 | 8 | 16 | 32 | 64*2 | | | | | 191 | | 16 | 216 | 3 | 3 | 1 | 2 | 4 | 8*2 | 24*2 | 72*2 | | | _ | | | 215 | | 17 | 256 | 8 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | | | | 255 | | 18 | 288 | 5 | 2 | 1 | 2 | 4 | 8 | 16 | 32*2 | 96*2 | | | | | 287 | | 19 | 324 | 2 | 4 | 1 | 2 | 4*2 | 12*2 | 36*2 | 108*2 | | | _ | | | 323 | | 20 | 384 | 7 | 1 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128*2 | | | | 383 | | 21 | 432 | 4 | 3 | 1 | 2 | 4 | 8 | 16*2 | 48*2 | 144*2 | | | _ | | 431 | | 22 | 512 | 9 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | | | 511 | | 23 | 576 | 6 | 2 | 1 | 2 | 4 | 8 | 16 | 32 | 64*2 | 192*2 | | | | 575 | | 24 | 648 | 3 | 4 | 1 | 2 | 4 | 8*2 | 24*2 | 72*2 | 216*2 | | _ | | | 647 | | 25 | 768 | 8 | 1 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256*2 | | | 767 | | 26 | 864 | 5 | 3 | 1 | 2 | 4 | 8 | 16 | 32*2 | 96*2 | 288*2 | | | | 863 | | 27 | 972 | 2 | 5 | 1 | 2 | 4*2 | 12*2 | 36*2 | 108*2 | 324*2 | | _ | | | 971 | | 28 | 1024 | 10 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | | 1023 | | 29 | 1152 | 7 | 2 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128*2 | 384*2 | | | 1151 | | 30 | 1296 | 4 | 4 | 1 | 2 | 4 | 8 | 16*2 | 48*2 | 144*2 | 432*2 | | | | 1295 | | 31 | 1536 | 9 | 1 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512*2 | | 1535 | | 32 | 2048 | 11 | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2047 | Fig. 4 32-mode configuration of the proposed FIFO-SS ## C. First-In First-Out Scheduling Scheme (FIFO-SS) In addition to RPE, we also propose a systematic-storing scheduling scheme for FIFO distribution/arrangement as demonstrated in Fig. 4. All of 32 operating modes are listed with (X, Y) configuration, generating any FFT points defined in 3GPP-LTE communication standard. In order to support up to 2048 FFT points, the required FIFO length is entirely 2047. Accordingly, whole FIFO bank is divided into 11 groups (L1 ~ L11), including FIFO lengths of 1, 2, ..., and 1024. Moreover, if referring to mode index of 19, FFT size is configured as 324 and (X, Y) = (2, 4). The corresponding FIFO bank is separated as L1 ~ L6 sections. L1 and L2 have lengths of 1 and 2, respectively. Due to radix-3 butterfly processing, L3 has 2 sub-FIFOs, which each sub-FIFO owns a length of 4. In the analogous manner, L4 ~ L6 lengths can be deduced easily, such as 12\*2, 36\*2, and 108\*2. In accordance with the look-up table (LUT) established in Fig. 4, each of 32 modes can obtain the required FIFO arrangement. Once each super stage would be asserted as a specified operating mode, the FIFO bank has a corresponding access mapping relationship. In addition, the proposed system only needs to modify this LUT if we want to add one more mode setting or change current mode supporting. From the above design strategy, the 32-mode FFT system can easily achieve fully-reconfigurable characteristics by completely utilizing hardware reuse of PEs and data storage. #### III. SYNTHESIS RESULTS Our proposed hardware circuit is designed and implemented with Design Compiler and TSMC 40 nm CMOS technology. The maximum operating frequency is 250 MHz, occupying a design area of 0.2325 mm<sup>2</sup>, dissipating power of 233.5 mW, and delivering a system throughput of 250 MSamples/s. The corresponding synthesis area results are shown in Fig. 5. First, the FIFO bank is realized by D Flip-flops and SRAM. 2 SRAM 256\*32 are utilized to store the needed data, transferring to and from RPE super stage 3. However, SRAM 736\*32 and SRAM 1024\*32 are used to keep the required data, reading from and writing to RPE super stage 4. These SRAMs entirely occupy 5.6%, 5.6%, 11.7%, and 15.3% of total design area individually. Second, 4 super stages of RPE have a summed area of 0.0856 mm<sup>2</sup>, owning 36.8% of total design area. At last, twiddle factor and control unit only have area of 0.0113 mm<sup>2</sup> and 0.0037 mm<sup>2</sup>, respectively. #### IV. CONCLUSION For the applied FFT system in 3GPP-LTE communication standard, we propose a low-area and fully-reconfigurable hardware architecture regarding 32-mode configuration (32 different FFT sizes). In addition, RPE is developed to realize reconfigurable computing characteristics whereas FIFO-SS design technique is dedicated for hardware-friendly FIFO scheduling. By using TSMC 40 nm CMOS technology, our proposed design work with synthesis results has a core area of 0.2325 mm² only, consuming 233.5 mW at maximal clock frequency (250 MHz). In a word, this work provides a prototyping design for FFT system in 3GPP-LTE system. | | S-1 · · · · · · · · · · · · · · · · · · | Synthesis Area (um²) | | | | | | |--------------|-----------------------------------------|----------------------|--------|--------|--|--|--| | , | Sub-circuits | Value | Ratio | | | | | | | Super Stage 1 | 21404.6 | 9.2% | 36.8% | | | | | RPE | Super Stage 2 | 21403.3 | 9.2% | | | | | | KPE | Super Stage 3 | 21399.7 | 9.2% | 36.8% | | | | | | Super Stage 4 | 21393.6 | 9.2% | | | | | | | SRAM 256*32 | 13040.0 | 5.6% | 56.7% | | | | | | SRAM 256*32 | 13040.0 | 5.6% | | | | | | FIFO<br>Bank | SRAM 736*32 | 27310.5 | 11.7% | | | | | | Dunk | SRAM 1024*32 | 35564.6 | 15.3% | | | | | | | DFF | 42900.4 | 18.5% | 1 | | | | | T | widdle Factor | 11323.4 | 4.9% | 4.9% | | | | | ( | Control Unit | 3740.2 | 1.6% | 1.6% | | | | | | Total | 232520.3 | 100.0% | 100.0% | | | | Fig. 5 Synthesis results of the proposed hardware circuit #### ACKNOWLEDGMENT The authors would like to thank the National Chip Implementation Center (CIC) for all design technical supports. # REFERENCES - S. He, and M. Torkelson, "A new approach to pipeline FFT processor," in Parallel Processing Symposium, 1996., Proceedings of IPPS'96, The 10th International, apr 1996, pp. 766-770. - [2] Z. Qian and M. Margala, "Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, pp. 3008-3012, 2016. - [3] E. E. Swartzlander, W. K. Young, and S. J. Joseph, "A radix 4 delay commutator for fast Fourier transform processor implementation," IEEE Journal of solid-state circuits, vol. 19, pp. 702-709, 1984. [4] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, "A dynamic scaling FFT processor - [4] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, "A dynamic scaling FFT processor for DVB-T applications," IEEE Journal of Solid-State Circuits, vol. 39, pp. 2005-2013, 2004. - [5] N. H. Cuong, N. T. Lam, and N. D. Minh, "Multiplier-less based architecture for variable-length FFT hardware implementation," in Communications and Electronics (ICCE), 2012 Fourth International Conference on, 2012, pp. 489-494. - [6] D.-s. Kim and S.-y. Lee, "Dual input radix 2<sup>3</sup> SDF IFFT/FFT processor for wireless multi-channel real sound speakers using time division duplex scheme," IEEE Transactions on Consumer Electronics, vol. 55, pp. 2323-2328, 2009. - [7] M. Shin and H. Lee, "A high-speed four-parallel radix-2<sup>4</sup> FFT/IFFT processor for UWB applications," in 2008 IEEE International Symposium on Circuits and Systems, 2008, pp. 960-963. - [8] M. Garrido, J. Grajal, M. Sánchez, and O. Gustafsson, "Pipelined radix-2(k) feedforward FFT architectures," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, pp. 23-32, 2013. - [9] A. Karlsson, J. Sohl, and D. Liu, "Cost-efficient mapping of 3-and 5-point DFTs to general baseband processors," in 2015 IEEE International Conference on Digital Signal Processing (DSP), 2015, pp. 780-784. - [10] Y. Suzuki, T. Sone, and K. Kido, "A new FFT algorithm of radix 3, 6, and 12," IEEE transactions on acoustics, speech, and signal processing, vol. 34, pp. 380-383, 1986. - [11] W. Zheng and K. Li, "Split radix algorithm for 6<sup>m</sup> length DFT," IEEE signal processing Letters, vol. 20, pp. 713-716, 2013. - [12] J. Chen, J. Hu, S. Lee, and G. E. Sobelman, "Hardware efficient mixed Radix-25/16/9 FFT for LTE systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, pp. 221-229, 2015.