# A Single-Phase Register File with Complementary Pass-Transistor Adiabatic Logic

Jianping Hu and Xiaolei Sheng

Abstract—This paper introduces an adiabatic register file based on two-phase CPAL (complementary pass-transistor adiabatic logic) circuits with a power-gating scheme, which can operate on a single-phase power clock. A 32×32 single-phase adiabatic register file with the power-gating scheme has been implemented in TSMC 0.18μm CMOS technology. All the circuits except the storage cells employ two-phase CPAL circuits, and the storage cell is based on the conventional memory cell. A two-phase non-overlap power-clock generator with the power-gating scheme supplies the proposed adiabatic register file. Full-custom layouts are drawn, and the energy and functional simulations have been performed using the net-lists extracted from the layouts. HSPICE simulations show that the proposed adiabatic register file functions correctly and attains about 73% energy savings at 100 MHz compared with the traditional static CMOS register file.

*Keywords*—Low power, Register file, Complementary pass-transistor logic, Adiabatic logic, Single-phase power clock.

#### I. INTRODUCTION

WITH the rapid development of integrated circuits, the density and speed of CMOS chips continuously increase, and reducing power consumption has become a key factor in IC design. There are several low-power approaches, such as the reduction of supply voltages, node capacitances, and switching activities. However, as CMOS technology shrinks, these conventional low-power methods face several challenges: degraded voltage margins, increased speed, increased leakage currents, and increased soft error rates [1]. Adiabatic logic is a promising alternative low-power approach that uses AC voltage supplies (power clocks) to recycle the energy of circuits instead of dissipating it as heat [2]–[7].

The register file embedded in microprocessors is one of the most power-consuming blocks, since it contains large-capacitance buses and is frequently accessed. In recent years, several adiabatic memories using multi-phase power clocks have been reported. In [2]–[6], adiabatic circuits such as ECRL (efficient charge recovery logic), PAL-2N (pass-transistor adiabatic logic with NMOS pull-down configuration), CPAL (complementary pass-transistor adiabatic logic), and RERL (reversible energy recovery logic) are used to drive address decoders, bit-lines, and word-lines containing large node capacitances. However, these multi-phase clocking adiabatic circuits suffer from clock skew, complicated power-clock trees, and multiple power-clock generators, which result in extra area overhead and increase the complexity of layout place and route [7]–[9]. Recently, a two-phase CPAL has been reported [10], which uses two-phase non-overlap power clocks.

This paper focuses on the layout implementation of the single-phase CPAL register file. A single-phase power-clock scheme for two-phase CPAL circuits is proposed. A 32×32 adiabatic register file based on two-phase CPAL with a power-gating scheme is presented, which can operate on a single-phase power clock. A two-phase non-overlap power-clock generator with the power-gating scheme supplies the proposed adiabatic register file. For comparison, a 32×32 register file based on conventional CMOS with a similar structure is also implemented. Function verification and energy loss comparisons have been carried out.

Jianping Hu is with Faculty of Information Science and Technology, Ningbo University, Ningbo City, Zhejiang 315211, China. He is now a professor (corresponding author: 86-574-87665739; fax: 86-574-87600940; e-mail: nbhjp@yahoo.com.cn).

Xiaolei Sheng is with Faculty of Information Science and Technology, Ningbo University, Ningbo City, Zhejiang 315211.

## II. REVIEW OF TWO-PHASE CPAL GATES

A CPAL circuit using the two-phase power-clock scheme was reported in [10] and is shown in Fig. 1. It is composed of two main parts: the logic function circuit and the load-driving circuit. The logic circuit consists of four NMOS transistors (N5-N8) forming a complementary pass-transistor logic (CPL) function block. The load-driving circuit consists of a pair of transmission gates (N1, P1 and N2, P2). The clamp transistors (N3 and N4) ensure stable operation by preventing the output nodes from floating. The simulated waveforms and the two-phase power clocks are also shown in Fig. 1. A detailed description of two-phase CPAL circuits can be found in [10].

Two-phase CPAL circuits have no non-adiabatic energy loss on the output nodes [10]. Therefore, if they are used to drive the large node capacitances on the bit-lines and word-lines of the register file, the energy loss can be greatly reduced compared with other similar adiabatic logic and conventional CMOS circuits. Complex CPAL gates can be realized by replacing N5-N8 of the CPAL buffer with CPL function blocks. All basic two-input CPAL gates, such as the buffer/inverter, AND/NAND gate, multiplexer, OR/NOR gate, and XOR/XNOR gate, use the same topology, and only the inputs are permuted [10].

Fig. 1 CPAL buffer using two-phase scheme and its waveforms

Fig. 2 (a) shows the schematic of the two-input AND/NAND gate, where only the N-logic input blocks are shown and the other transistors (P1, P2, and N1-N4) are omitted for simplicity. Its layout is also shown in Fig. 2 (a). The schematic of the three-input AND/NAND gate and its layout are shown in Fig. 2 (b).

Fig. 2 CPAL gates: (a) two-input AND/NAND, and (b) three-input AND/NAND

# III. TWO-PHASE POWER-CLOCK GENERATOR & POWER-GATING SCHEME

Two-phase CPAL circuits must be supplied by two-phase power supplies (power clocks), so a two-phase power-clock generator is required.
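
What "two-phase non-overlap" means for a supply derived from one sinusoid can be pictured with a short behavioral sketch. It is not the circuit of Fig. 3: the steering rule (alternate source cycles routed to `pc1` and `pc2`), the raised-cosine shape, and the 1.8 V amplitude are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def two_phase_clocks(t, period, vdd=1.8):
    """Return (pc1, pc2) at time t for a sinusoidal source with the given period.

    Assumed behavior: the source swings 0 -> vdd -> 0 once per period, and
    alternate source cycles are steered to pc1 and pc2 so that the two
    outputs are never high at the same time.
    """
    cycle = int(t // period)        # index of the current source cycle
    phase = (t % period) / period   # position inside that cycle, 0..1
    level = 0.5 * vdd * (1.0 - math.cos(2.0 * math.pi * phase))
    # Even cycles go to pc1, odd cycles to pc2 (assumed steering rule).
    return (level, 0.0) if cycle % 2 == 0 else (0.0, level)


period = 10e-9  # 100 MHz source clock
samples = [two_phase_clocks(k * period / 8, period) for k in range(32)]
```

Under these assumptions each output repeats every two source periods, and at every sample at least one of the two outputs is at ground.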
The two-phase power-clock generator consists of a mod-2 counter and an adiabatic signal converter, as shown in Fig. 3 (a).

Fig. 3 Two-phase non-overlap power-clock generator and power-gating scheme: (a) schematic and (b) its layout

The two-phase power clocks can be generated from a single-phase sinusoidal power clock (pc), which can easily be produced by a general LC circuit [9], [11]. The signal converter converts the single-phase sinusoidal power clock into the two-phase power clocks. The mod-2 counter, built from a static CMOS flip-flop, synchronizes the signal converter. Simulated waveforms of the two-phase power-clock generator are shown in Fig. 4.

To reduce the energy loss of adiabatic logic blocks during idle periods, power gating can be introduced by switching off their power clocks. A power-gating scheme for four-phase CPAL circuits has been reported in [12]. The proposed power-gating scheme for single-phase CPAL circuits is shown in Fig. 3 (a). A transmission gate (TG) is used as the power-gating switch and is inserted between the single-phase power clock (clk) and the virtual power clock (pc). It disconnects the adiabatic logic block from the power clock during idle periods. A clamp NMOS transistor prevents the virtual power clock (pc) from floating. In active mode, the power-gating control signal (active) is high, so the virtual power clock (pc) follows the power clock (clk). In sleep mode, active is low and the virtual power clock (pc) is clamped to a low level, so that the power-clock generator and the power-gated adiabatic logic block are shut down to reduce energy losses during idle periods.

Fig. 4 Simulated waveforms of two-phase power-clock generator with power-gating scheme

The power-gating switch introduces an additional energy loss. In active mode, the power-gated logic block can be modeled by a capacitor $C_{\rm AL}$ and a resistor $R_{\rm AL}$ [12].
The energy loss per cycle in active mode introduced by the power-gating switch can be written as

$$E_{\text{active}} = \frac{\pi^2}{2} \left( \frac{RC_{\text{AL}}}{T} \right) C_{\text{AL}} V_{\text{DD}}^2 , \qquad (1)$$

where $T$ is the period of the power clock clk and $R$ is the turn-on resistance of the transmission gate (TG), which is inversely proportional to the channel width of the TG. In sleep mode, the TG is off, and $I_{\text{leakage}}$ is its average leakage current, which is proportional to its channel width. The energy loss per cycle in sleep mode introduced by the power-gating switch can be written as

$$E_{\text{sleep}} = I_{\text{leakage}} T . \qquad (2)$$

Additional energy is needed to turn the power-gating switch on and off between sleep and active modes, because the nodes active and activeb are charged from 0V to $V_{\rm DD}$. When the sleep time is long enough (larger than $20T$), the energy loss of turning the switch on and off can be ignored. The average energy loss per cycle over one power-gating cycle is then

$$\overline{E} \approx \alpha E_{\text{active}} + (1-\alpha) E_{\text{sleep}} , \qquad (3)$$

where $\alpha = T_{\text{active}}/(T_{\text{active}} + T_{\text{sleep}})$ is the active ratio, $T_{\text{active}}$ is the active time, and $T_{\text{sleep}}$ is the sleep time. The energy overhead of the power-gating switch can be minimized by choosing the optimal size of the TG according to (1)-(3) for a given power-gated adiabatic logic block and a given active ratio $\alpha$.

# IV. CPAL REGISTER FILE

A single-phase adiabatic 32×32 register file with the power-gating scheme has been implemented using the TSMC 0.18µm process. All the circuits except the storage-cell array are realized with two-phase CPAL circuits.

#### A. Storage Cell

The storage-cell structure is shown in Fig. 5 (a).
The cell consists of a cross-coupled inverter pair (N1, P1, and N2, P2) and two pairs of access transistors (N<sub>R</sub> and N<sub>Rb</sub>, N<sub>W</sub> and N<sub>Wb</sub>) that are enabled by *RWL* (read word-line) and *WWL* (write word-line) for read and write operations, respectively. The memory array is composed of many of these cells arranged horizontally and vertically. The *WWL* and *RWL* of a row run along the horizontal axis, while the read bit-lines (*RBL* and *RBLb*) and the write bit-lines (*WBL* and *WBLb*) are shared by a column.

The storage-cell layout is shown in Fig. 5 (b), and its area is $9.02\mu m \times 13.1\mu m$. The Metal 1 lines are placed horizontally and are used for the power supply ($V_{\rm DD}$), write word-line (WWL), read word-line (RWL), and ground (GND), respectively. The read bit-lines (RBL and RBLb) and the write bit-lines (WBL and WBLb) use Metal 2 lines that are placed vertically. This results in a regular layout for a large storage array.

Fig. 5 Storage cell: (a) schematic and (b) its layout

#### B. The Structure of CPAL Register File

The single-phase CPAL register file is shown in Fig. 6. It consists of a storage-cell array, address decoders, read/write word-line drivers, sense amplifiers, write bit-line and read data-line drivers, power-gating switches, and a two-phase non-overlap power-clock generator with the power-gating switch, which generates the two-phase power clocks ($pc_1$ to $pc_5$) that supply the whole circuit. $pc_3$ and $pc_5$ have the same phase as $pc_1$, while $pc_4$ has the same phase as $pc_2$. For simplicity, the power-clock generator with the power-gating switch is not shown in Fig. 6.

As shown in Fig. 6, address decoding is divided into two levels, implemented by the pre-decoder and two-input NAND gates. The address decoder with 5-bit addresses ($A_0 - A_4$) selects storage cells by charging the word lines.
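
Two-level decoding of a 5-bit address can be sketched in software as follows. The split into a 2-bit and a 3-bit pre-decoder is an assumption (the text does not state how $A_0 - A_4$ are grouped), the dual-rail AND/NAND gates are modeled as plain logical AND, and the function names are hypothetical.

```python
def predecode(bits):
    """One-hot decode a list of address bits (least significant bit first)."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return [1 if k == value else 0 for k in range(1 << len(bits))]


def word_lines(address, enable=1):
    """Return the 32 word-line levels for a 5-bit address.

    Level 1: two pre-decoders produce one-hot groups of 4 and 8 lines
             (assumed 2-bit / 3-bit split of the address).
    Level 2: each row combines one line from each group, qualified by
             the read or write enable signal.
    """
    bits = [(address >> i) & 1 for i in range(5)]
    low = predecode(bits[:2])    # A0-A1 -> 4 one-hot lines
    high = predecode(bits[2:])   # A2-A4 -> 8 one-hot lines
    return [low[row % 4] & high[row // 4] & enable for row in range(32)]


selected = word_lines(0b10110)  # address 22
```

With this grouping the second level needs only one two-input gate per row, whatever the word size, which is the usual motivation for pre-decoding.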
The word-line signals are produced by AND gates from the two outputs of the pre-decoding and the read/write enable signals (RE and WE). WWL (write word-line) is charged when $pc_3$ goes high, and RWL (read word-line) is charged when $pc_4$ goes high.

#### C. Operation Timing

The timing diagram is shown in Fig. 7. A write/read operation takes four cycles of the virtual power clock *pc*. During $T_1$, the address is pre-decoded. During $T_2$, address decoding is completed, and *RE* and *WE* (the read enable and write enable signals) are prepared. During $T_3$, the WWL (write word line) is selected, and the write operation is completed. During $T_4$, the RWL (read word line) is selected, and the RBL (read bit line) either follows the RWL or stays at ground level. During the next cycle, the read data (RD) is output.

Fig. 6 The structure of the CPAL register file core

Fig. 7 The operation timing of the CPAL register file

### V. POST-LAYOUT SIMULATIONS & ENERGY DISSIPATIONS

The single-phase adiabatic 32×32 register file based on CPAL circuits has been implemented. Considering the energy overhead of the power-gating switch and its area penalty, the channel widths of the NMOS and PMOS transistors of the TG are set to $7\mu m$ and $14\mu m$, respectively. For comparison, we also developed a $32\times32$ register file based on conventional CMOS circuits with a similar structure. Full-custom layouts are drawn using the Virtuoso<sup>TM</sup> Layout Editor. The layout of the single-phase adiabatic $32\times32$ register file is shown in Fig. 8, and its area is $282\mu m\times322\mu m$. The layout area of the conventional $32\times32$ register file is $279\mu m\times136\mu m$.

Full parasitic extraction is done for the two register files, and the post-layout simulations have been performed using the net-lists extracted from their layouts. The simulated waveforms of the adiabatic register file are shown in Fig. 9.
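
The trade-off behind the TG sizing can be explored numerically from (1)-(3). The sketch below is illustrative only: every component value is a placeholder rather than a figure from this design, and the sleep term is multiplied by $V_{\rm DD}$ so that it has units of energy, which is an assumption on top of (2) as printed. It also ignores the area penalty that the actual sizing takes into account.

```python
import math

VDD = 1.8        # supply amplitude in volts (assumed)
PERIOD = 10e-9   # power-clock period T at 100 MHz, seconds
C_AL = 1e-12     # placeholder capacitance of the gated block, farads
R_UNIT = 2000.0  # placeholder TG on-resistance at 1 um width, ohm*um
I_UNIT = 1e-9    # placeholder TG leakage per um of width, A/um


def active_energy(width_um):
    """Eq. (1), with R inversely proportional to the TG width."""
    r_on = R_UNIT / width_um
    return (math.pi ** 2 / 2) * (r_on * C_AL / PERIOD) * C_AL * VDD ** 2


def sleep_energy(width_um):
    """Eq. (2) with leakage proportional to width; the VDD factor is assumed."""
    return I_UNIT * width_um * VDD * PERIOD


def average_energy(width_um, alpha):
    """Eq. (3): weighted average over one power-gating cycle."""
    return alpha * active_energy(width_um) + (1 - alpha) * sleep_energy(width_um)


def optimal_width(alpha):
    """Width minimizing alpha*A/W + (1 - alpha)*B*W, valid for 0 < alpha < 1."""
    a = active_energy(1.0)  # A: active-loss coefficient at unit width
    b = sleep_energy(1.0)   # B: sleep-loss coefficient at unit width
    return math.sqrt(alpha * a / ((1 - alpha) * b))
```

Because the active loss falls as 1/W while the sleep loss grows as W, the model has a single minimum, and a lower active ratio moves that minimum toward a narrower switch.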
The energy consumption of the two register files is listed in Table I. The conventional CMOS register file is supplied by a DC power supply ($V_{\rm DDS}$). Based on post-layout simulations, its energy dissipation per cycle is about 26.4pJ at 100MHz. The single-phase CPAL register file is supplied by the power clocks $pc_1 - pc_4$ and a DC voltage source $V_{\rm DD}$, and its energy loss also includes the energy dissipation of the two-phase non-overlap power-clock generator and power-gating switch, which is denoted by *clk* in Table I. Therefore, the total energy consumption per cycle is the sum of these six terms. At 100MHz, the energy dissipation per cycle of the adiabatic register file is about 7.2pJ.

Fig. 8 The layout of the CPAL 32×32 register file

In the adiabatic register file, the power-gating switch is used to reduce energy loss during idle periods. Since read/write operations are not carried out continuously in the register file, its energy loss can be greatly reduced by power gating it in the standby state.

Fig. 9 Simulation waveforms of the adiabatic register file

TABLE I ENERGY CONSUMPTION PER CYCLE OF THE CPAL AND CONVENTIONAL REGISTER FILES AT 100MHz (pJ)

| Energy loss source | $pc_1$ | $pc_2$ | $pc_3$ | $pc_4$ | $V_{\rm DD}$ | clk | $V_{\rm DDS}$ |
|--------------------|--------|--------|--------|--------|--------------|------|---------------|
| Pre-layout | 0.07 | 0.7 | 0.7 | 1.2 | 0.3 | 0.43 | 14.4 |
| Post-layout | 0.38 | 0.9 | 2.9 | 1.7 | 0.8 | 0.52 | 26.4 |

The columns $pc_1$ to clk belong to the CPAL register file; $V_{\rm DDS}$ is the supply of the conventional CMOS register file.

#### VI. CONCLUSION

This paper presents an adiabatic register file based on two-phase CPAL with a power-gating scheme. It can operate on a single-phase power clock, and its power-clock generator is much simpler than those of multi-phase schemes.
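
The headline savings figure follows directly from the post-layout row of Table I. A short check, using only the values in the table (the variable names are illustrative):

```python
# Post-layout energy per cycle at 100 MHz, in pJ, copied from Table I.
post_layout_cpal = {
    "pc1": 0.38, "pc2": 0.9, "pc3": 2.9, "pc4": 1.7,
    "VDD": 0.8, "clk": 0.52,
}
post_layout_cmos = 26.4  # conventional register file, supplied by VDDS

cpal_total = sum(post_layout_cpal.values())       # sum of the six terms
savings = 1.0 - cpal_total / post_layout_cmos     # fraction of energy saved

print(f"CPAL total: {cpal_total:.1f} pJ, savings: {savings:.1%}")
# prints: CPAL total: 7.2 pJ, savings: 72.7%
```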
Based on post-layout simulations, the single-phase adiabatic register file attains about 73% energy savings at 100MHz compared with the conventional CMOS register file.

#### ACKNOWLEDGMENT

This project is supported by the National Natural Science Foundation of China (No. 60773071), the Zhejiang Science and Technology Project of China (No. 2007C11067), and the Ningbo Natural Science Foundation (No. 2009A610066).

#### REFERENCES

- [1] J. M. Rabaey and M. Pedram, "Low power design methodologies," Kluwer Academic Publishers, Boston, 1996, pp. 65-95.
- [2] Y. Moon and D. K. Jeong, "A 32 x 32-b adiabatic register file with supply clock generator," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, 1998, pp. 696-701.
- [3] K. W. Ng and K. T. Lau, "A novel adiabatic register file design," *Journal of Circuits, Systems, and Computers*, vol. 10, no. 1, 2000, pp. 67-76.
- [4] Jianping Hu, Tiefeng Xu, and Hong Li, "A lower-power register file based on complementary pass-transistor adiabatic logic," *IEICE Transactions on Information and Systems*, vol. E88-D, no. 7, 2005, pp. 1479-1485.
- [5] Jianping Hu, Binbin Liu, Xuanyan Hu, and Sheng Zhang, "A Test Chip for CPAL Register File Fabricated in Chartered 0.35μm CMOS Process," *IEEE Midwest Symposium on Circuits and Systems*, 10-13 Aug. 2008, pp. 434-437.
- [6] J.-H. Kwon, J. Lim, and S.-I. Chae, "Three-port nRERL register file for ultra-low-energy applications," *The International Symposium on Low Power Electronics and Design*, Digest of Technical Papers, 2000, pp. 161-166.
- [7] S. Kim, C. H. Ziesler, and M. C. Papaefthymiou, "A true single-phase energy-recovery multiplier," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 2, 2003, pp. 194-207.
- [8] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, and K. W. Current, "Clocked CMOS adiabatic logic with integrated single-phase power-clock supply," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 8, no. 4, 2000, pp. 460-463.
- [9] D. Maksimovic and V. G. Oklobdzija, "Integrated power clock generators for low energy logic," *IEEE Power Electronics Specialists Conference*, Atlanta, June 1995, pp. 61-67.
- [10] Jianping Hu, Tiefeng Xu, and Yinshui Xia, "Low-power adiabatic sequential circuits with complementary pass-transistor logic," *IEEE Midwest Symposium on Circuits and Systems*, USA, August 7-10, 2005, pp. 1398-1401.
- [11] H. Mahmoodi-Meimand and A. Afzali-Kusha, "Efficient power clock generation for adiabatic logic," *IEEE International Symposium on Circuits and Systems*, 2001, pp. 642-645.
- [12] Dong Zhou, Jianping Hu, and Ling Wang, "Adiabatic Flip-Flops for Power-Down Applications," *IEEE International Symposium on Integrated Circuits*, Singapore, 2007, pp. 493-496.