# Transceiver for Differential Wave Pipe-Lined Serial Interconnect with Surfing

Bhaskar M., Venkataramani B.

Abstract—In the literature, the surfing technique has been proposed for single-ended wave-pipelined serial interconnects to increase the data transfer rate. In this paper, a novel surfing technique is proposed for differential wave-pipelined serial interconnects, which uses a 'Controllable inverter pair' for surfing. To evaluate the efficiency of this technique, a transceiver comprising a transmitter, a receiver and a delay locked loop (DLL), along with 40 mm metal 4 interconnects using the proposed surfing technique, is implemented in UMC 180 nm technology and its performance is studied through post-layout simulations. From the study, it is observed that the proposed scheme permits a 1.875 times higher data transmission rate than the single-ended scheme, whose maximum data transfer rate is 1.33 Gb/s. The proposed scheme can also receive the correct data even with stuck-at-faults in the complementary line.

**Keywords**—Controllable inverter pair, differential interconnect, serial link, surfing, wave pipelining.

#### I. INTRODUCTION

As CMOS technology scales down, transistor sizes are reduced, which in turn increases the speed of the logic blocks [1]. The interconnects between the transistors, referred to as local interconnects, become shorter as technology scales down. However, global interconnects (interconnects used for routing signals between logic blocks) do not scale in length from one technology to another [2], and they limit the maximum data rate for on-chip communication. In deep submicron technologies, the delay through the global interconnects needs to be reduced in order to achieve high data transfer rates. For this purpose, techniques such as repeater insertion [3], wire sizing [4], low swing signaling [5] and pulsed wave interconnects [6], [7] have been proposed in the literature. However, even with these techniques, the time required to transmit data across the chip may be several clock periods or handshake cycles.

In [8], wave-pipelining is applied to the conventional repeater insertion technique to replace the global clocks with local clocks. Wave-pipelining enables multiple data waves to propagate through a uniformly buffered global interconnect. However, the data sent through wave-pipelined interconnects are not reliable. To overcome this problem, the surfing technique is proposed in [9] for single-ended interconnects. In this technique, a control signal denoted as 'req' is transmitted in a separate line along with each buffered wave-pipelined interconnect segment. This generates the surfing signal 'fast' that controls the propagation delay of each segment. When 'fast' is true, the delay of the buffer in that interconnect segment falls below its normal value. The circuit used in [9] to generate the signal 'fast' from 'req' needs a setup timing constraint of about one fourth of the clock period. The reliability of the transmitted data is ensured by the 'req' signal, but the reliability of the transmission of 'req' itself is not ensured.

Bhaskar M. is Associate Professor in Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirappalli, Tamilnadu, India (e-mail: bhaskar@nitt.edu).

Venkataramani B. is Professor in Department of Electronics and Communication Engineering & Dean(R&C), National Institute of Technology, Tiruchirappalli, Tamilnadu, India (e-mail: bvenki@nitt.edu).

In this paper, we propose a novel surfing technique for differential wave-pipelined interconnects to overcome the above constraints. In this technique, the complementary signal path is used to surf the true signal path and vice versa. A separate line is not required to propagate the control signal. It eliminates the setup time constraint and the data reliability is ensured both for true and complement signals. In this paper, the complete transceiver for differential wave pipe-lined serial interconnect with surfing for 40mm wire is designed and post layout simulations are carried out.

The paper is organized as follows: Section II describes the design of the differential wave-pipelined serial interconnect with surfing. In Section III, the details of the test chip design for the serial interconnect are presented. Section IV provides the post-layout simulation results and the observations. The concluding remarks are given in Section V.

#### II. DESIGN OF DIFFERENTIAL WAVE-PIPELINED SERIAL INTERCONNECT WITH SURFING

The schematic diagram of surfing circuit proposed for the wave-pipelined differential serial interconnect is shown in Fig. 1. It has true and complementary data wires connected between transmitter and receiver. These two interconnect wires are divided into 'n' equal segments and uniform size buffers are inserted between each segment along with its control circuitry for surfing. The data transmission will be robust if both true and complementary data are received by the receiver simultaneously.




Fig. 1 Schematic diagram of differential wave-pipelined serial interconnect with surfing and UR

A modified pair of buffers called the "Controllable Inverter pair" is proposed in this paper to ensure surfing along both true and complementary wires. The controllable inverter pair can vary the delay of the buffers when the control signals are activated, so that transmission rate can be made faster or slower i.e. the delay of the data lines can be varied whenever required.



Fig. 2 Block diagram of wave-pipelined segment with surfing

In the proposed scheme, each surfing segment of the wave-pipelined serial interconnect contains a controllable inverter pair, its control circuitry and an interconnect wire. The block diagram of a segment used for surfing is shown in Fig. 2. The timing constraints proposed in [9] for surfing the single-ended interconnect are extended to the differential interconnect, and the control circuitry proposed in this paper ensures that the surfing pulses are produced in accordance with the timing constraint given by (1).

$$\delta_{true}^{fast,max} \leq \delta_{comp}^{slow,min} \geq \delta_{comp}^{fast,max} \leq \delta_{true}^{slow,min} \tag{1}$$

where

$\delta_{true}^{fast,max}$ and $\delta_{comp}^{fast,max}$ denote the maximum delays of the true and complementary data paths, respectively, when 'fast' is asserted.

$\delta_{true}^{slow,min}$ and $\delta_{comp}^{slow,min}$ denote the minimum delays of the true and complementary data paths, respectively, when 'slow' (the complement of 'fast') is asserted.

These constraints ensure that events in the true and complementary data paths propagate together at the same speed. Note that in the proposed scheme the surfing pulses are produced only when there is a reliability issue in the transmission path, whereas in [9] the fast pulses are produced irrespective of the situation.
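As a minimal sketch of how constraint (1) might be checked for one segment, the Python snippet below reads (1) as three pairwise inequalities and tests them against a set of path delays. The function name and the delay values are illustrative assumptions, not figures from the paper.

```python
def satisfies_surfing_constraint(true_fast_max, comp_slow_min,
                                 comp_fast_max, true_slow_min):
    """Check constraint (1), read as three pairwise inequalities.

    All four arguments are delays in the same unit (for example ps).
    """
    return (true_fast_max <= comp_slow_min        # first relation in (1)
            and comp_slow_min >= comp_fast_max    # second relation in (1)
            and comp_fast_max <= true_slow_min)   # third relation in (1)


# Hypothetical segment delays in ps, chosen only for illustration.
ok = satisfies_surfing_constraint(90, 110, 92, 108)
violated = satisfies_surfing_constraint(120, 110, 92, 108)
print(ok, violated)
```

In the second call the fast true-path delay exceeds the slow complementary-path delay, so the first relation fails and the check returns `False`.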



Fig. 3 Controllable inverter pair (a) surfing circuit for true signal (b) surfing circuit for complement signal

#### A. Controllable Inverter Pair

The circuit diagram of the "Controllable inverter pair" is given in Fig. 3. In Fig. 3, T\_IN, T\_OUT, C\_IN and C\_OUT denote true input, true output, the complementary input and the complementary outputs respectively. One pair of inverters is used for true line (T\_IN) and another pair for the complementary line (C\_IN). The delay of the inverter pairs is controlled by the surfing signals F1 and F2. These two inverters must always be used as a pair because the control signals for surfing these pairs are generated from respective inputs of the pair. The control circuitry ensures that surfing signals are produced at every segment.



Fig. 4 Control circuitry to generate F1, F2 (a) pseudo-NOR (b) pseudo-NAND

Transistors M4 and M7 are ON when the surfing signal F2 is low, thereby providing additional pull-down strength at the output node T\_OUT and pull-up strength at the output node C\_OUT. Similarly, transistors M3 and M8 are ON when the surfing signal F1 is high, thereby providing additional pull-up strength at the output node T\_OUT and pull-down strength at the output node C\_OUT. These additional surfing transistors reduce the delay and increase the frequency of operation of the serial link. The surfing segments use fewer transistors than those in [9].

#### B. Control Circuitry

The control circuitry used for surfing the controllable inverter pair is shown in Fig. 4. The control signals F1 and F2 are obtained using pseudo-NOR and pseudo-NAND circuits respectively. The signals generated by the control circuitry are shown in Fig. 5.

The signal F1 becomes high when both T\_IN and C\_IN are less than VDD/2, and it speeds up the low-to-high transition of T\_OUT and the high-to-low transition of C\_OUT. Similarly, F2 becomes low when both T\_IN and C\_IN are greater than VDD/2, and it speeds up the high-to-low transition of T\_OUT and the low-to-high transition of C\_OUT. When neither F1 is high nor F2 is low, the surfing signals are not active and the signal propagates at the normal speed.



Fig. 5 Generation of F1 and F2 signals from T\_IN and C\_IN

#### C. Operation of Controllable Inverter Pair Circuit with Stuck-at-Faults

In addition to surfing, the Controllable Inverter Pair enables the correct logic level to be transmitted to the next segment even if there are any stuck-at-faults (SAF) in the complementary signal path. This is achieved using the signals F1 and F2 as follows:

The circuits for generating the signals F1 and F2 are given in Fig. 4. Assume that a stuck-at-zero fault occurs at the point C\_IN in Fig. 3. This makes the normal inverter formed by M5 and M6 drive C\_OUT to one at all times. The problem occurs when T\_IN is actually zero, which requires T\_OUT to be one and C\_OUT to be zero. Even in this case, the control circuitry ensures that the correct result is obtained: with T\_IN at zero and C\_IN also at zero due to the stuck-at fault, both F1 and F2 become one in Fig. 4, which drives the surfing inverter output to zero. Since the surfing inverter is stronger than the normal inverter, the output is restored to the correct logic level. Next, consider a stuck-at-one fault at C\_IN, which makes the normal inverter drive C\_OUT to zero at all times. In this case, when T\_IN is one, both F1 and F2 are zero, which makes the surfing inverter drive C\_OUT to logic one.
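The stuck-at-fault behaviour described above can be illustrated with a purely logic-level sketch in Python. It assumes ideal logic values and that the surfing transistors always overpower the normal inverters. The function names are hypothetical, and the model ignores all analog effects such as contention current and delay.

```python
def control_signals(t_in, c_in):
    """Logic-level view of Fig. 4: F1 = NOR(T_IN, C_IN), F2 = NAND(T_IN, C_IN)."""
    f1 = int(not (t_in or c_in))    # high only when both inputs are low
    f2 = int(not (t_in and c_in))   # low only when both inputs are high
    return f1, f2


def segment_outputs(t_in, c_in):
    """Return (T_OUT, C_OUT) for one controllable inverter pair.

    Assumption: when a surfing signal is active, the surfing transistors
    win against the normal inverter on the same output node.
    """
    f1, f2 = control_signals(t_in, c_in)
    if f1 == 1:     # both inputs low: pull-up on T_OUT, pull-down on C_OUT
        return 1, 0
    if f2 == 0:     # both inputs high: pull-down on T_OUT, pull-up on C_OUT
        return 0, 1
    return int(not t_in), int(not c_in)   # valid differential input: plain inversion


for t_in in (0, 1):
    healthy = segment_outputs(t_in, 1 - t_in)
    stuck_at_zero = segment_outputs(t_in, 0)   # C_IN stuck at 0
    stuck_at_one = segment_outputs(t_in, 1)    # C_IN stuck at 1
    print(t_in, healthy, stuck_at_zero, stuck_at_one)
```

In this model the outputs with C\_IN stuck at either value match the fault-free outputs for both values of T\_IN. A fault on the true line is not corrected by the same logic, which is in line with the text restricting the claim to the complementary path.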

#### III. DESIGN OF TEST CHIP FOR SERIAL INTERCONNECT

In order to test the proposed techniques, a complete differential wave-pipelined serial link has been designed in UMC 180 nm technology; its block diagram is shown in Fig. 6. The serial link consists of a transmitter, interconnect surfing segments, a receiver and delay locked loops (DLLs) for synchronization. The domino logic based multiplexer proposed in [10] is used as the transmitter in this paper. Two multiplexers are used, one for the true and another for the complementary data line. The receiver uses the improved voltage mode demultiplexing sense amplifier (IVDSA), derived from the sense amplifier of [12].

#### A. Domino Logic Based Transmitter

In this paper, the domino logic based 4:1 multiplexer proposed in [10] is used for the transmitter to reduce the power dissipation compared to that of a pseudo NMOS based multiplexer. The transmitter circuit for true data is shown in Fig. 7. The numbers in this figure denote the scale factors used for different transistors (the scale factor [11] is the ratio of the size of the transistor required to deliver a particular current to that of a unit size inverter delivering the same current). An inverting buffer (driver) is used at the output of the domino logic based multiplexer.



Fig. 6 Block diagram of differential wave-pipelined serial interconnect with surfing



Fig. 7 The transmitter circuit for true data



Fig. 8 The timing diagram of the multiplexing clock signals

The operation of the circuit is as follows: The clocks $\phi_{t0}$ and $\phi_{t2}$, and likewise $\phi_{t1}$ and $\phi_{t3}$, are out of phase with each other as shown in Fig. 8. The clocks $\phi_{t0}$ and $\phi_{t2}$ control the precharge/evaluation phase of the two least significant bits of data $(D_0, D_1)$ and the two most significant bits of data $(D_2, D_3)$ respectively. The clocks $\phi_{t1}$ and $\phi_{t3}$ are used for multiplexing all four bits of the data. When the multiplexer circuit corresponding to the two LSBs is in the pre-charge phase, the other portion of the circuit performs the multiplexing operation. When $\phi_{t0}$ is low, the node N1 is in the pre-charge state and is isolated from node N3 by transistors M11 & M12, which are used as a transmission gate. At this time, $\phi_{t2}$ is high and node N2 evaluates to either data $D_2$ or $D_3$ based on clocks $\phi_{t1}$ and $\phi_{t3}$.

Signal at node N2 is passed to node N3 through transistors M23 & M24. When  $D_2$  is high, the node N2 becomes 0 and in the same evaluation phase, if  $D_3$  is low, the node N2 must be pulled to logic 1. This is ensured using transistors M21 & M22. For the purpose of ensuring the load to be identical, the transistors M19 & M20 are used. Similarly data  $D_0$  and  $D_1$  are multiplexed.

Using the circuit similar to that of Fig. 7, the Complement signal  $(\overline{T}_x)$  is generated using the complemented data inputs  $(\overline{D}_0$ - $\overline{D}_3)$ .

#### B. Receiver

The improved voltage mode sense amplifier (IVSA) proposed in [12] is modified in this paper so that it senses the serial input data and also de-multiplexes it into 4-bit parallel data. The modified circuit is referred to as the improved voltage mode demultiplexing sense amplifier (IVDSA) and is shown in Fig. 9. The IVDSA consists of a differential input stage, a pair of cross coupled inverters and the nonoverlapped clock driven transistors for de-multiplexing. In the conventional voltage mode sense amplifier (CVSA) [13], the drains of the input transistors are connected to the sense nodes of the cross coupled inverters. In the IVDSA, the drains of the input transistors are directly connected to the outputs of the cross coupled inverters. This reduces the number of series transistors in the evaluation path and hence reduces the switching times. The additional transistors $M_{10}$-$M_{12}$ are used for de-multiplexing along with sensing the data signal. Four such IVDSAs are used to recover the data signals $D_0$-$D_3$ using the clocks ($\phi_{r0}$-$\phi_{r3}$) from the receiver DLL.

The operation of the circuit is as follows: The first IVDSA circuit is controlled by the non overlapping clocks $\phi_{r0}$ & $\phi_{r3}$ applied to the gates of transistors $M_7$-$M_{12}$, which makes the sense amplifier receive the data $D_0$ and its complement. During the low phase of the clock, the internal nodes x and y are pre-charged to logic high through $M_7$ & $M_8$ and $M_{10}$ & $M_{11}$, and the capacitances at the differential output nodes are charged to high values. During the overlapping times of the clocks $\phi_{r0}$ & $\phi_{r3}$, when both are 1, the transistors $M_9$ & $M_{12}$ are turned ON and provide the tail current. The voltages at the nodes x and y are then determined by the inputs $(R_x \text{ and } \overline{R}_x)$ driven by the interconnect segment. The regenerative action of the cross coupled inverters pulls one node to VDD and the other to GND according to the inputs. The receiving end of the interconnect is connected to the IVDSA, and the sensed data from the first IVDSA is fed to an SR latch.
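At the behavioural level, the transmitter and receiver together perform a 4:1 serialization followed by a 1:4 de-serialization on a differential pair. The Python sketch below illustrates only this data flow. The bit ordering ($D_0$ first), the function names and the sample words are assumptions, and no clocking or analog sensing is modelled.

```python
def serialize(words):
    """4:1 multiplex each word [D0, D1, D2, D3] onto true and complementary streams."""
    true_line = [bit for word in words for bit in word]
    comp_line = [1 - bit for bit in true_line]
    return true_line, comp_line


def deserialize(true_line, comp_line):
    """1:4 demultiplex: sense amplifier k samples every fourth bit slot from slot k."""
    lanes = []
    for k in range(4):
        lane = [1 if t > c else 0   # differential decision on each sampled slot
                for t, c in zip(true_line[k::4], comp_line[k::4])]
        lanes.append(lane)
    # Regroup the four lanes into 4-bit words.
    return [list(word) for word in zip(*lanes)]


words = [[1, 0, 1, 1], [0, 0, 1, 0]]   # hypothetical data words
tx_true, tx_comp = serialize(words)
recovered = deserialize(tx_true, tx_comp)
print(recovered == words)
```

Each of the four lanes plays the role of one IVDSA driven by its own receiver clock phase, which is why lane `k` only looks at slots `k`, `k + 4`, `k + 8` and so on.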

The delay equation for the voltage mode sense amplifier is obtained as follows:

The regenerative amplifier consists of two inverters connected back to back (cross coupled). The current through both inverters is the same and is given by

$$I_{x} = g_{m1} v_{2} = -g_{m2} v_{1} \tag{2}$$

The difference between the output voltages of the two inverters is given by

$$v_{x} = v_1 - v_2 \tag{3}$$

The differential resistance ($r_l$) and the inverter transconductance ($g_m$) of the regenerative amplifier are given by

$$r_l = v_x/I_x = -2/g_m, \qquad g_m = g_{m1} = g_{m2} \tag{4}$$

The time constant $\tau$ and the delay $T_{latch}$ of the inverter are given by

$$\tau = r_l \, c_l \tag{5}$$

$$T_{latch} = \tau_{latch} \ln \frac{\Delta v(t)}{\Delta v(0)} \tag{6}$$

The time constant $\tau_{latch}$ and the delay $T_{SA}$ of the regenerative load are given by

$$\tau_{latch} = c/g_m \tag{7}$$

$$T_{SA} = c_l v_{thp} / I_{dl} + T_{latch} \tag{8}$$

where,  $I_{dl}$  is the current flowing through one arm of sense amplifier and  $v_{thp}$  is the threshold voltage of PMOS transistor.
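To make the use of (6)-(8) concrete, the following Python sketch evaluates the sense amplifier delay for one set of component values. All numerical values are hypothetical and chosen only to show the arithmetic, and the capacitance $c$ in (7) is assumed here to equal $c_l$.

```python
import math


def latch_delay(c, g_m, dv_final, dv_initial):
    """Equations (6) and (7): T_latch = (c / g_m) * ln(dv_final / dv_initial)."""
    tau_latch = c / g_m
    return tau_latch * math.log(dv_final / dv_initial)


def sense_amp_delay(c_l, v_thp, i_dl, t_latch):
    """Equation (8): T_SA = c_l * v_thp / I_dl + T_latch."""
    return c_l * v_thp / i_dl + t_latch


# Hypothetical values, not taken from the paper.
C_L = 20e-15        # load capacitance in farads (also used as c in (7))
G_M = 1e-3          # inverter transconductance in siemens
V_THP = 0.5         # PMOS threshold voltage magnitude in volts
I_DL = 200e-6       # current through one arm in amperes
DV_INITIAL = 0.05   # initial differential voltage in volts
DV_FINAL = 1.8      # final differential voltage in volts

t_latch = latch_delay(C_L, G_M, DV_FINAL, DV_INITIAL)
t_sa = sense_amp_delay(C_L, V_THP, I_DL, t_latch)
print(f"T_latch = {t_latch * 1e12:.1f} ps, T_SA = {t_sa * 1e12:.1f} ps")
```

With these assumed values the first term of (8) contributes 50 ps and the regeneration term roughly 72 ps. The logarithm in (6) means that a larger initial differential voltage shortens the regeneration time only weakly.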



Fig. 9 Improved voltage mode demux sense amplifier (IVDSA)

#### C. Delay Locked Loop (DLL)

The Mixed DLL proposed in [14] for generating the four phase clock is used in this paper and its block diagram is shown in Fig. 10. It consists of three basic blocks: dynamic phase comparator, charge pump and voltage controlled delay line (VCDL). The phase comparator block compares the reference clock with the delayed output signal $\phi_3$ from the last stage of the VCDL. Depending on the difference in phase, UP and DOWN pulses are generated. If the reference clock is leading the output $\phi_3$, an UP pulse is generated; else, a DOWN pulse is generated. These pulses are given to the charge pump to generate the control voltage $V_{ctrl}$. The control voltage controls the delay of each stage in the VCDL, and hence the phase of the output clocks $(\phi_0 - \phi_3)$ is adjusted until the DLL is locked.
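The locking behaviour described above can be sketched with a simple discrete-time model in Python. This is only an illustration under stated assumptions: the VCDL delay is modelled as inversely proportional to the control voltage, an UP pulse is assumed to raise the control voltage, and the delay constant, loop gain and 1.6 ns reference period (four bit slots of 400 ps) are hypothetical.

```python
def simulate_dll(t_ref, v_ctrl=0.6, gain=0.5, steps=200):
    """Iterate the loop until the VCDL delay matches one reference period.

    Assumed VCDL model: delay = K_DELAY / v_ctrl, so a higher control
    voltage starves the inverters less and shortens the delay.
    """
    K_DELAY = 1.2e-9   # hypothetical delay constant in volt-seconds
    for _ in range(steps):
        delay = K_DELAY / v_ctrl
        # Positive error: the delayed clock arrives late, so the reference
        # leads and an UP pulse is produced; negative error gives DOWN.
        phase_error = (delay - t_ref) / t_ref
        # Charge pump: the pumped charge follows the pulse width.
        v_ctrl += gain * phase_error
    return v_ctrl, K_DELAY / v_ctrl


v_lock, delay_lock = simulate_dll(t_ref=1.6e-9)
print(f"V_ctrl = {v_lock:.3f} V, VCDL delay = {delay_lock * 1e9:.3f} ns")
```

In this model the loop settles where the VCDL delay equals the reference period, at which point the phase error, and hence the UP and DOWN pulse width, goes to zero.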

The design of the DLL circuit is carried out using method of logical effort [11]. The design parameters of the phase comparator and VCDL using [11] are shown in Fig. 10, where  $g_i$  - stage logical effort,  $p_i$  - stage parasitic delay, H - electrical effort along the path and B - path branching effort.
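For readers unfamiliar with the method of logical effort [11], the Python sketch below shows how the quantities named above combine: the path effort is $F = GBH$ with $G = \prod g_i$, and the path delay is minimised when each of the $N$ stages bears the effort $F^{1/N}$. The example path (three inverters driving 64 times the input capacitance) is hypothetical and is not the DLL design of Fig. 10.

```python
def min_path_delay(stage_efforts, parasitics, H, B=1.0):
    """Return (minimum path delay, optimum stage effort), both in units of tau.

    stage_efforts: logical effort g_i of each stage
    parasitics:    parasitic delay p_i of each stage
    H:             path electrical effort
    B:             path branching effort
    """
    n = len(stage_efforts)
    G = 1.0
    for g in stage_efforts:
        G *= g                      # path logical effort
    F = G * B * H                   # path effort
    stage_effort = F ** (1.0 / n)   # equal effort per stage minimises delay
    delay = n * stage_effort + sum(parasitics)
    return delay, stage_effort


# Hypothetical path: three inverters (g = 1, p = 1) with H = 64, no branching.
delay, f_opt = min_path_delay([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], H=64.0)
print(f"stage effort = {f_opt:.2f}, path delay = {delay:.2f} tau")
```

The optimum stage effort then fixes the input capacitance of each stage, working backwards from the load.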

The dynamic phase comparator (PC) circuit [15] compares the reference clock with the delayed output signal from the last stage of the VCDL. The PC can operate with less phase offset at high frequencies due to the symmetry in the circuit. The widths of UP and DOWN pulses depend on the phase shift between the inputs of the PC. In the phase frequency detector circuit proposed in [16], UP and DOWN pulses of fixed duration are produced even at the locked state of the DLL.

It is shown in [15] that the accuracy of the phase comparator is improved when the UP and DOWN pulse durations are made shorter. In this paper, the PC is designed (using the method of logical effort) such that it does not produce UP and DOWN pulses at the locked state, in order to have less jitter accumulated in the clock pulses. The waveforms of the dynamic phase comparator are shown in Fig. 11.

The charge pump (CP) circuit given in [14] is used. It takes the UP and DOWN pulses from the PC and pumps a proportional amount of charge to its output capacitance ($C_{CP}$). A critical design requirement is to ensure that the UP and DOWN charges are equal at the locked state. The charge imbalance is usually caused by transistor mismatches, capacitive charge injection and channel length modulation. The transistor mismatches and the capacitive charge injection are reduced by using two parallel connected current biasing transistors ($M_3$ & $M_4$, $M_5$ & $M_6$) and thereby increasing the pumping current. The current mismatch due to channel length modulation is reduced by proper design of the PC and CP.

Instead of applying a separate bias voltage to the tail current source of the CP, $V_{ctrl}$ of the CP is used as the bias voltage. This makes the loop bandwidth of the DLL a constant fraction of the input frequency and thereby ensures better stability.

The VCDL circuit shown in Fig. 10 uses the current starved inverters proposed in [17]. A segment of the VCDL has three current starved inverter cells followed by a normal inverter. In order to minimize the sensitivity to supply and substrate noise and to achieve a wide tuning range, the VCDL is built with the current starved inverter (buffer) topology. The voltage $V_{\text{ctrl}}$ is applied to series-connected elements, which can "current starve" an inverter. The voltage $V_{\text{ctrl}}$ controls the ON resistance of the pull-down transistor $M_{\text{NIb}}$ and, through a current mirror, the pull-up transistor $M_{\text{Pl}}$. The variable resistance controls the current available to charge or discharge the output capacitance of the VCDL segments. The phase difference provided by each segment of the VCDL depends on the delay of each inverter, which is determined by the input capacitance that each inverter presents to its predecessor and by the resistance between the output node capacitance and the supply rails. The design of the VCDL is carried out using the method of logical effort, taking the transmitter capacitance as its load.

In the dynamic phase comparator waveforms of Fig. 11, UP becomes high when the Clk pulse leads $\phi_3$ and DOWN becomes high when the Clk pulse lags $\phi_3$.



Fig. 10 Diagram of DLL (a) Circuit diagram of VCDL (one segment) (b) Block diagram of DLL (c) Circuit diagram of phase comparator (PC) (d) Circuit diagram of charge pump (CP)



Fig. 11 Dynamic phase comparator waveforms

For the test chip, the coplanar line given in [18] is used as the differential transmission line and its parameters are shown in Fig. 12. The process parameters for interconnect implemented using metal 4 layer in 180nm UMC CMOS are: width  $0.6\mu m$ , thickness  $0.58\mu m$ , pitch  $1.43 \mu m$ .

#### IV. SIMULATION RESULTS AND PERFORMANCE COMPARISON

The design of the differential wave-pipelined surfing interconnect is carried out for a 40 mm metal 4 interconnect in UMC 180 nm technology. The post-layout simulations are carried out using the Cadence Virtuoso tool. For the 40 mm interconnect, twenty identical surfed interconnect segments, each consisting of a 2 mm wire and a surfing circuit, are used.

Figs. 13 and 14 show the eye-diagrams of the True and Complementary input signals at the transmitter side of the differential serial interconnect and the respective outputs at the end of the 20th segment for a data pulse width of 400ps without jitter and with 110ps jitter respectively.



Fig. 12 Structure of Co-planar differential transmission line

From the waveforms, it is observed that both true and complementary data signals can be received at the same time at the end of the 20th segment and it may be noted that both the signals are received correctly by proper surfing in the signal paths.



Fig. 13 Eye-Diagram of the input and output with data pulse-width of 400ps without jitter



Fig. 14 Eye-Diagram of the input and output with data pulse-width of 400ps with jitter of 110ps

Figs. 15 and 16 show the eye-diagram of the true and complementary outputs at the 5th, 10th and 20th segments without surfing and with surfing respectively. From these waveforms, it is observed that with surfing the swing as well as the eye opening is good at the end of the receiver.

Figs. 17 and 18 show the eye-diagrams of the input and the output at the 20th segment with the 8th segment short-circuited to ground and with the 13th segment short-circuited to VDD respectively. From the waveforms, it is observed that even if there are stuck-at-faults in the complementary signal path, the scheme transmits the data signal reliably to the receiver.



Fig. 15 Eye-Diagram at the output of 5th, 10th and 20th segments without surfing for a data period of 400ps



Fig. 16 Eye-Diagram at the output of 5th, 10th and 20th segments with surfing for a data period of 400ps



Fig. 17 Eye-Diagram at the output of the receiver with short-circuit to ground at 8th segment



Fig. 18 Eye-Diagram at the output of the receiver with short-circuit to VDD at 13th segment

The layout of the differential wave pipelined surfing interconnect along with the magnified portion of the transmitter, receiver and its corresponding delay locked loop (DLL) are also shown in Fig. 19.

The simulation results of both single ended and differential wave pipelined serial interconnect with surfing schemes are presented in Table I and are compared. The simulations are carried out for all corners (slow-slow-ss, typical-typical-tt, fast-fast-ff, slow NMOS fast PMOS-snfp, fast NMOS slow PMOS -fnsp).

From Table I, it is observed that the minimum data period required for data transmission through all 20 differential interconnect segments is only 400 ps at the typical corner, which corresponds to a serial data rate of 2.5 Gb/s. This is 1.875 times the data rate obtained in [9], whose minimum data period is 750 ps. The differential scheme has the advantage of the true signal surfing the complementary signal and vice versa. Hence, a separate control signal is not required for surfing and there is no setup time constraint as in [9]. The maximum allowable jitter between the true and complementary signal paths is observed to be $\pm 110$ ps.
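The data rates quoted above follow directly from the minimum data periods listed in Table I, since one bit is sent per data period. The short Python sketch below reproduces that arithmetic for every corner; the variable and function names are illustrative only.

```python
# Minimum data periods in ps from Table I (proposed differential scheme).
MIN_PERIOD_PS = {"tt": 400, "ss": 470, "ff": 370, "snfp": 410, "fnsp": 400}
# Minimum data period of the single-ended scheme of [9], all corners except ss.
SINGLE_ENDED_PERIOD_PS = 750


def data_rate_gbps(period_ps):
    """One bit per data period: rate in Gb/s = 1000 / period in ps."""
    return 1000.0 / period_ps


rates = {corner: data_rate_gbps(p) for corner, p in MIN_PERIOD_PS.items()}
single_ended_rate = data_rate_gbps(SINGLE_ENDED_PERIOD_PS)
speedup = rates["tt"] / single_ended_rate

for corner, rate in rates.items():
    print(f"{corner}: {rate:.2f} Gb/s")
print(f"single-ended: {single_ended_rate:.2f} Gb/s, speedup at tt: {speedup:.3f}")
```

At the typical corner this gives 2.5 Gb/s against about 1.33 Gb/s for the single-ended scheme, a ratio of 1.875.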

TABLE I

PERFORMANCE COMPARISON OF THE PROPOSED SCHEME TO THAT OF SINGLE ENDED SCHEME

| Parameters | Results of surfing scheme proposed in [9] | Implementation results of the proposed differential interconnect surfing scheme |
|---|---|---|
| Length of each segment | 2 mm | 2 mm |
| Number of segments | 20 | 20 |
| Width of the interconnect | Double the min. width | 0.56 μm (min. width is 0.28 μm) |
| Minimum data period (at specified corners) | All corners: 1 ns; all except ss: 750 ps | tt: 400 ps; ss: 470 ps; ff: 370 ps; snfp: 410 ps; fnsp: 400 ps |
| Control signal and setup constraint | Yes | No |
| Maximum allowable jitter | ±40 ps | ±110 ps |
| Capability to function in the presence of SAF | No | Yes |
| Noise problem | Yes | No (since the scheme is differential) |




Fig. 19 The layout of the differential wave pipelined serial interconnect with surfing

#### V. CONCLUSION

In this paper, a transceiver for the wave-pipelined differential serial interconnect with surfing is implemented in UMC 180 nm technology and evaluated through post-layout simulations. A new circuit called the "Controllable Inverter Pair" is proposed for surfing the differential true and complementary data signal paths. The differential scheme has a higher data rate and tolerates higher jitter between the true and complementary signal paths than the single-ended scheme. It also has the additional advantage of transmitting the correct logic values to successive segments even if stuck-at-faults occur in the complementary line.

#### REFERENCES

- [1] International Technology Roadmap for Semiconductors (2001), Semiconductor Industry Association, 2001, Interconnects section, p. 4.
- [2] R. Ho, K. W. Mai, and M. A. Horowitz, "The future of wires," Proc. IEEE, vol. 89, no. 4, pp. 490-504, April 2001.
- [3] H.B. Bakoglu, and J.D. Meindl, "Optimal interconnection circuits for VLSI," IEEE Trans. Electron Devices ED-32 (5), pp. 903–909, 1985.
- [4] C.J. Alpert, A. Devgan, J.P. Fishburn, and S.T. Quay, "Interconnect synthesis without wire tapering," IEEE Trans. Computer-Aided Design Integrated Circuits and Systems 20 (1), pp. 90–104, 2001.
- [5] H. Zhang, V. George, and J. M. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 8, no. 3, June 2000.
- [6] P. Wang, G. Pei and E. chih-chuan Kan, "Pulsed wave interconnect," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5. May 2004.
- [7] P. Jose, G. Patounakis, and K. L. Shepard, "Pulsed current-mode signaling for nearly speed-of- light intra chip communication," IEEE Journal of Solid-State Circuits, vol. 41, pp. 772-780, April 2006.
- [8] J. Nyathi, R.R. Rydberg III and J.G. Delgado-Frias, "Wave-Pipelining the Global Interconnect to Reduce the Associated Delays," IEEE conference, 2006.
- [9] Greenstreet and Ren, "Surfing Interconnect," In Proceedings of the 12th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'06)

- [10] M. Bhaskar, D. Prasankumar and B. Venkataramani, "Design of Differential voltage mode Transmitter for On-chip serial link based on Method of Logical Effort", IEEE International conference ICCCNT, July 2012.
- [11] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann Publishers, Inc., 1998.
- [12] P. Murugeswari, G. Anusha, P. Venkateshwarlu, M. Bhaskar, and B. Venkataramani, "A Wide Band Voltage Mode Sense Amplifier Receiver for High Speed Interconnects," Proceedings of TENCON 2008, IEEE Region 10 conference.
- [13] P. Wijetunga and A.F.J. Levi, "3.3 GHz Sense-amplifier in 0.18  $\mu m$  CMOS technology," IEEE, ISCAS, pp. 764-765, 2002.
- [14] Karutharaja, V, M. Bhaskar and B. Venkataramani, "Synchronization of On-chip Serial Interconnect Transceivers using Delay Locked Loop (DLL)," Proceeding of 2011 IEEE International conference ICSCCN, 2011.
- [15] Y Moon, J Choi, K Lee, D-K Jeong, M-K Kim, "An All Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide range Operation and Low-Jitter Performance," IEEE Journal of Solid-State Circuits, Vol. 35, No.3, March 2000.
- [16] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi and H. K. Lim, "A 960-Mb/s/pin Interface for skew tolerant bus using low jitter PLL," IEEE J. Solid-State Circuits, vol. 32, pp. 691-700, May 1997.
- [17] Mark G. Johnson, and Edwin L. Hudson, "A Variable Delay Line PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, Vol. 23, No. 5, October, 1988.
- [18] H. Ito, J. Inoue, S. Gomi, H.Sugita, K. Okada and K. Masu, "On-chip Transmission line for Long Global Interconnects," IEEE, IEDM, 2004.

Bhaskar M. received the B.E. degree in Electronics and communication engineering from Bharathiar University, Coimbatore, India, in 1992 and the M.E. degree in Microwave and Optical Engineering from Madurai Kamaraj University, Madurai, India in 1995. He worked as project trainee in Indian Space Research Organization (ISRO), Bangalore, India from July 1994 to February 1995. He worked as faculty in Shanmuga College of Engineering (now Sastra University), Thanjavur, India from June 1995 to April 1997. Since 1997, he has been with the faculty of the National Institute of Technology, Trichy (formerly known as Regional Engineering College, Trichy). Currently he is Associate Professor in the Electronics and Communication Department. He has coauthored one book and has published numerous papers in national and international conferences, three of which have been selected for the best paper award. He has done many projects using programmable DSPs. His research interests include architecture and applications of DSPs, low power system on a single chip (SOC) design, and the design and performance analysis of high speed transceivers for on-chip interconnects.

Venkataramani B. received the B.E. degree in Electronics and communication engineering from Regional Engineering College, Tiruchirappalli, India, in 1979 and the M.Tech. and Ph.D. degrees in electrical engineering from Indian Institute of Technology, Kanpur, India, in 1984 and 1996, respectively. He worked as Deputy Engineer in Bharath Electronics, Ltd., Bangalore, India, and as a Research Engineer in the Indian Institute of Technology, each for approximately 3 years. Since 1987, he has been with the faculty of the National Institute of Technology, Trichy (formerly known as Regional Engineering College, Trichy). Currently he is Professor in the Electronics and Communication Department and Dean (Research and Consultancy). He has published two books and numerous papers in journals and international conferences. His current research interests include field-programmable gate array (FPGA) and system on a single chip (SOC)-based system design and performance analysis of high-speed interconnects.