# Optimal Distribution of High-Speed Clocks on Transceiver Chips 

Makar Chand Snai and Behzad Razavi<br>Department of Electrical and Computer Engineering, University of California at Los Angeles, Los Angeles, CA 90095-1594 USA<br>(e-mail: makarsnai9@g.ucla.edu; razavi@ee.ucla.edu).


#### Abstract

A methodology is proposed for the optimum design of on-chip transmission lines carrying high-speed clocks. Differential metal 9 or stacked metal 8/metal 9 geometries with no ground plane emerge as the best choices. Moreover, CMOS and CML signaling schemes are compared for clock distribution. It is shown that CMOS clocks are better suited for frequencies up to 20 GHz and line lengths up to 5 mm.


## I. Introduction

Wireline transceiver design must typically deal with the distribution of high-speed clocks across large chips. For example, the clock generated by a central phase-locked loop (PLL) must reach multiple lanes and then be separated into quadrature phases locally [1]. Similarly, the local oscillator (LO) in a wireless transceiver chip must travel a long distance to feed multiple radios [2]. Clock distribution has been studied extensively in the past [3]-[10], but there is a paucity of work on interconnect/circuit codesign. Today's wireline and wireless applications require optimization of the interconnects together with their interface circuits.

This paper describes analysis and design methods that enable optimal distribution of clocks across long distances on transceiver chips. Specifically, we address the following key questions. (1) What type of transmission line (T-line) structure is optimum? (2) How does one decide between CMOS and current-mode logic (CML) circuits for clock distribution?

Section II presents the analysis framework for our study, and Section III deals with the optimization of on-chip T-lines. Section IV studies the performance of T-lines with CMOS and CML interfaces and Section V analyzes the effect of termination resistance.

## II. Analysis Framework

This study is carried out in a 28-nm CMOS technology with nine metal layers. Both metal 8 (M8) and metal 9 (M9) exhibit a sheet resistivity of $23 \mathrm{~m} \Omega / \square$. Shown in Fig. 1 is the design environment with a clock frequency, $f_{C K}$, equal to 5 GHz, 10 GHz, or 20 GHz. A CMOS or CML driver delivers the clock to a differential T-line with a length, $l$, equal to 1 mm, 2 mm, or 5 mm. The line is terminated into equal loads $R_{L 1}$ and $R_{L 2}$, with their center tap, A, left floating for CMOS signaling or tied to $V_{D D}$ for CML signaling. The analysis methods presented here can be readily applied to other clock frequencies and longer interconnects as well. The T-line structures developed in this work are simulated as two-port networks in Cadence's EMX.

Fig. 1. Clock distribution link.

In order to achieve the "optimal" performance, we seek to generate rail-to-rail swings in $V_{\text {out }}$ (in both CMOS and CML implementations) while considering rise and fall times and power consumption as benchmarking metrics.

## III. T-Line Design

The T-line properties play a critical role in the overall performance of the clock link shown in Fig. 1. To deliver a large voltage swing to the load with minimum power consumption, one may opt to maximize the line's characteristic impedance, $Z_{0}$. But such a choice makes two tacit assumptions. (1) The load resistance, $R_{L}$, must be equal to $Z_{0}$, a point that holds in most, but not all, cases. We return to this issue in Section V. (2) The dc resistance, $R_{d c}$, of the T-line is negligible, which is decidedly untrue for on-chip structures. If the line is driven by a low impedance, this resistance limits the voltage gain to $R_{L} /\left(R_{L}+R_{d c}\right)$. We call this factor the line's "dc gain". For current-mode operation, on the other hand, we predict less signal loss due to $R_{d c}$. Thus, T-line optimization seeks to maximize the voltage swing delivered to the load, but it may also depend on the type of the driver used in Fig. 1.

For differential signaling, we can envision the line geometries depicted in Fig. 2. From the perspective of routing in a complex floor plan, we constrain the total width, $d_{2}$, to $10 \mu \mathrm{m}$. As the line width, $W$, increases, the dc resistance drops, but the fringe capacitance between CK and $\overline{C K}$ rises, lowering the characteristic impedance. We therefore expect that a certain value of the line spacing, $d_{1}$, yields the largest voltage swing at the load. The choice between the structures in Fig. 2(a) and Fig. 2(b) is governed by three considerations related to the ground plane: (1) this plane slightly lowers $Z_{0}$, (2) owing to the high sheet resistivity of metal 1, this plane raises the loss at very high frequencies, and (3) this plane reduces unwanted coupling from and to the substrate.

Fig. 2. On-chip transmission line structures.


Fig. 3. Characteristic impedance of different T-lines.

The problem of dc resistance can be alleviated through the use of stacked metal layers. Depicted in Fig. 2(c) is an example where this resistance is halved if M9 and M8 have the same sheet resistivity. The cost is a greater fringe capacitance between CK and $\overline{C K}$ and hence a lower $Z_{0}$. Stacking more metal layers further decreases both $R_{d c}$ and $Z_{0}$, yielding diminishing returns.
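The effect of stacking on $R_{d c}$ and on the dc gain $R_{L} /\left(R_{L}+R_{d c}\right)$ can be illustrated with a rough estimate. The sketch below assumes a uniform conductor of width $W=\left(d_{2}-d_{1}\right) / 2$ (our reading of the geometry, not a value stated in the text), treats stacked layers as ideal parallel sheets, and uses a placeholder load of $110 \Omega$; the function names are illustrative, and the numbers are not the EMX results of Fig. 4.

```python
def dc_resistance(r_sheet, length, width, n_layers=1):
    """DC resistance (ohms) of one uniform conductor.

    Stacked layers with equal sheet resistivity are modeled as ideal
    parallel sheets, so the resistance scales as 1/n_layers.
    """
    squares = length / width
    return r_sheet * squares / n_layers


def dc_gain(r_load, r_dc):
    """Low-frequency voltage gain of a line driven by a low impedance."""
    return r_load / (r_load + r_dc)


R_SHEET = 23e-3       # ohms per square for M8 and M9 (from the text)
D2 = 10e-6            # total width constraint, m (from the text)
D1 = 4e-6             # line spacing, m
LENGTH = 2e-3         # line length, m
R_LOAD = 110.0        # placeholder load resistance, ohms (assumption)

width = (D2 - D1) / 2  # assumed conductor width, m

r_m9 = dc_resistance(R_SHEET, LENGTH, width)
r_stacked = dc_resistance(R_SHEET, LENGTH, width, n_layers=2)

print(f"M9 only:    R_dc = {r_m9:.1f} ohm, dc gain = {dc_gain(R_LOAD, r_m9):.3f}")
print(f"M9/M8 stack: R_dc = {r_stacked:.1f} ohm, dc gain = {dc_gain(R_LOAD, r_stacked):.3f}")
```

Under these assumptions, halving $R_{d c}$ raises the dc gain by only a few percent for a 2-mm line, which is why the accompanying drop in $Z_{0}$ must be weighed against it.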

In order to determine the optimum line design, we construct two plots for the three geometries in Fig. 2 while $d_{2}=10 \mu \mathrm{m}$: (1) $Z_{0}$ versus $d_{1}$ and (2) $R_{d c}$ versus $d_{1}$ for a length of 2 mm. Shown in Figs. 3 and 4, the EMX results agree with our intuitive predictions, revealing that (a) the ground plane lowers $Z_{0}$ by only about $5 \%$ because the fringe capacitance between CK and $\overline{C K}$ is by far dominant, and (b) $R_{d c}$ becomes an appreciable fraction of $Z_{0}$ as $d_{1}$ decreases.

Fig. 4. DC resistance of different T-lines.

We are ultimately interested in the voltage swing and transition times across the load for a given voltage-mode (VM) or current-mode (CM) drive level. This requires that we examine the T-line's frequency response up to at least the third harmonic of the clock, i.e., 60 GHz for $f_{C K}=20 \mathrm{GHz}$. Figures 5(a) and (b) plot the simulated frequency response of the structures shown in Fig. 2 with and without ground planes for VM excitation. Driven by a zero source resistance and terminated by $R_{L}=Z_{0}$, the lines are 5 mm long and have a spacing of $4 \mu \mathrm{m}$ or $6 \mu \mathrm{m}$. These results merit several remarks. First, the M9 structure, which exhibits a $Z_{0}$ of $136 \Omega$ for $d_{1}=6 \mu \mathrm{m}$ and $110 \Omega$ for $d_{1}=4 \mu \mathrm{m}$, generally suffers from less loss for the latter line spacing. That is, the optimum choice here emerges as $d_{1}=4 \mu \mathrm{m}$. Second, in the absence of the ground plane, this geometry provides a relatively flat response up to 60 GHz. Third, the stacked M9/M8 T-line also prefers $d_{1}=4 \mu \mathrm{m}$ and performs better without a ground plane beyond roughly 30 GHz. Due to its lower dc loss, this structure may be preferred over its M9 counterpart.
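The emphasis on the third harmonic follows from the spectrum of a square wave. As a minimal sketch, assuming an ideal 50%-duty-cycle square wave with amplitude $\pm 1$ (an idealization; a real clock has finite edge rates), the odd harmonics fall at $n f_{C K}$ with amplitude $4 /(n \pi)$. The function name is illustrative.

```python
import math


def square_wave_harmonics(f_clk, n_max=5):
    """Odd harmonics of an ideal 50%-duty, +/-1 square wave.

    Returns a list of (frequency in Hz, Fourier amplitude) pairs.
    """
    return [(n * f_clk, 4.0 / (n * math.pi)) for n in range(1, n_max + 1, 2)]


for f_clk in (5e9, 10e9, 20e9):  # clock frequencies considered in the text
    fundamental, third = square_wave_harmonics(f_clk, n_max=3)
    ratio = third[1] / fundamental[1]
    print(f"f_CK = {f_clk / 1e9:.0f} GHz: third harmonic at "
          f"{third[0] / 1e9:.0f} GHz, {ratio:.2f} of the fundamental")
```

The third harmonic carries one third of the fundamental's amplitude, so attenuating it noticeably slows the received edges; this is why a flat response up to 60 GHz matters for a 20-GHz clock.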

Figures 5(c) and (d) repeat the foregoing study for current-mode excitations. Interestingly, for both M9 and stacked M9/M8 geometries, the optimum still occurs at $d_{1}=4 \mu \mathrm{m}$. Moreover, the ground plane proves more detrimental here than in the case of a voltage-mode input. We conclude that the T-line using $d_{1}=4 \mu \mathrm{m}$ and no ground plane is the optimum structure for clock distribution. The choice between the M9 and stacked M9/M8 geometries is revisited later.

## IV. CMOS and CML Signaling

The CMOS and CML implementations of the clock link are depicted as half circuits in Fig. 6. The design begins with a peak-to-peak amplitude of about 400 mV at node X so as to ensure nearly rail-to-rail swings in $V_{\text {out }}$. The driver inverter devices in Fig. 6(a) are then sized in conjunction with $R_{L 1}=Z_{0}$. Similarly, $I_{S S}$ is chosen in Fig. 6(b) for $I_{S S} R_{L 1}=400 \mathrm{mV}$. From the equivalent circuit shown in Fig. 6(c), we observe that the CMOS driver benefits from class-D action and consumes half as much power as does the CML counterpart.

Fig. 5. (a) Voltage-mode ac response of 5-mm T-lines with $d_{1}=4 \mu \mathrm{m}$, (b) voltage-mode ac response of 5-mm T-lines with $d_{1}=6 \mu \mathrm{m}$, (c) current-mode ac response of 5-mm T-lines with $d_{1}=4 \mu \mathrm{m}$, and (d) current-mode ac response of 5-mm T-lines with $d_{1}=6 \mu \mathrm{m}$.

The trends suggested by the ac responses shown in Fig. 5 must now be checked by large-signal transient simulations of CMOS and CML topologies. We have analyzed different T-line geometries with or without a ground plane, different lengths, and different clock frequencies. In these simulations, we monitor the voltage received across the load resistance. We present a subset of the results here.

We begin with $f_{C K}=20 \mathrm{GHz}$ and $l=2 \mathrm{~mm}$, and examine the load voltage for M9 and stacked M9/M8 T-lines with and without a ground plane and with $d_{1}=4 \mu \mathrm{~m}$. Shown in Fig. 7(a) for the CMOS realization, the differential waveforms reveal that the optimum occurs for the M9 or the stacked M9/M8 line, both without the ground plane. Similarly, for the CML topology, Fig. 7(b) nominates the same structures as the best choices. Nevertheless, the CML link suffers from slower transitions.

We next raise the length to 5 mm and, using the optimum T-lines, study the received CMOS and CML clocks, illustrated in Fig. 8. The CMOS link exhibits sharper transitions.

The foregoing analysis leads to several important conclusions. First, for $f_{C K}=20 \mathrm{GHz}$, the 28-nm CMOS technology considered here makes CMOS transmission more attractive for both $l=2 \mathrm{~mm}$ and $l=5 \mathrm{~mm}$. Second, in the optimum conditions, the CMOS and CML drivers draw 5 mW and 10 mW, respectively, and the self-biased inverter draws 3 mW.

## V. Termination Resistance Considerations

As circuit designers, we ask, beyond what T-line length is it necessary to have $R_{L}=Z_{0}$? In other words, under what conditions can we select $R_{L}>Z_{0}$ so as to increase the received voltage swings or reduce the power consumption? This point can be addressed from two different perspectives. First, for a periodic square-wave clock at $f_{C K}$, we surmise that the T-line need not be matched at the load if its length is much less than the wavelength of the third harmonic of the clock. For example, the T-lines considered here display a phase velocity of about $2 \times 10^{8} \mathrm{~m} / \mathrm{s}$, yielding a wavelength of around 3.3 mm for the third harmonic of $f_{C K}=20 \mathrm{GHz}$. Thus, line lengths up to a few hundred microns do not require $R_{L}=Z_{0}$.
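The wavelength estimate above can be reproduced with a few lines of arithmetic. In the sketch below, the phase velocity and clock frequency come from the text, whereas the one-tenth-wavelength criterion for "much less than the wavelength" is a common rule of thumb that we assume here, not a threshold given by the analysis.

```python
def wavelength(phase_velocity, frequency):
    """Wavelength in meters of a wave at the given frequency."""
    return phase_velocity / frequency


V_PHASE = 2e8                      # phase velocity, m/s (from the text)
F_CLK = 20e9                       # clock frequency, Hz (from the text)
ELECTRICALLY_SHORT_FRACTION = 0.1  # assumed lambda/10 rule of thumb

lambda_third = wavelength(V_PHASE, 3 * F_CLK)
max_unmatched_length = ELECTRICALLY_SHORT_FRACTION * lambda_third

print(f"Third-harmonic wavelength: {lambda_third * 1e3:.2f} mm")
print(f"Length below which matching may be relaxed: "
      f"{max_unmatched_length * 1e6:.0f} um")
```

With these inputs, the third-harmonic wavelength is about 3.3 mm and the assumed criterion gives roughly 330 µm, consistent with the "few hundred microns" stated above.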


Fig. 6. (a) Half circuit of CMOS clock link, (b) half circuit of CML clock link, and (c) equivalent circuit of differential CMOS driver.


Fig. 7. (a) CMOS periodic response with 2-mm T-line at $f_{C K}=20 \mathrm{GHz}$, (b) CML periodic response with 2-mm T-line at $f_{C K}=20 \mathrm{GHz}$.

The second perspective is more precise and examines the clock link's step response for different values of $R_{L}$, seeking to determine whether the reflections from the load and source impedances distort the received waveform. Fig. 9(a) plots the results for a CMOS link using a stacked M9/M8 T-line with no ground plane and $l=500 \mu \mathrm{m}$ or $l=1 \mathrm{~mm}$. We have $R_{L}=Z_{0}$ or $R_{L}=2 Z_{0}$, observing that only some distortion occurs for $l<500 \mu \mathrm{m}$. Figure 9(b) repeats the experiment for a CML link and leads to the same conclusion. We should remark that the CML driver's tail current is halved here. Thus, for lengths up to about $500 \mu \mathrm{m}$, these T-lines can tolerate a termination of about $2 Z_{0}$.

Fig. 8. Periodic response of CMOS and CML clock links for a 5-mm T-line at $f_{C K}=20 \mathrm{GHz}$.

Fig. 9. (a) CMOS step response with load impedance of $Z_{0}$ and $2 Z_{0}$, (b) CML step response with load impedance of $Z_{0}$ and $2 Z_{0}$.
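The tolerance to $R_{L}=2 Z_{0}$ can be related to two textbook quantities: the load reflection coefficient, $\Gamma=\left(R_{L}-Z_{0}\right) /\left(R_{L}+Z_{0}\right)$, and the round-trip delay of the line. The sketch below is a lossless, first-order illustration rather than a substitute for the simulated step responses of Fig. 9; the $110-\Omega$ value of $Z_{0}$ is a placeholder, the phase velocity is the value quoted earlier in this section, and the function names are illustrative.

```python
def reflection_coefficient(r_load, z0):
    """Voltage reflection coefficient at a resistive termination."""
    return (r_load - z0) / (r_load + z0)


def round_trip_delay(length, phase_velocity):
    """Time in seconds for a reflection to travel to the source and back."""
    return 2.0 * length / phase_velocity


Z0 = 110.0       # placeholder characteristic impedance, ohms (assumption)
V_PHASE = 2e8    # phase velocity, m/s (from the text)
F_CLK = 20e9     # clock frequency, Hz

gamma_matched = reflection_coefficient(Z0, Z0)
gamma_2z0 = reflection_coefficient(2 * Z0, Z0)
print(f"Gamma for R_L = Z0:  {gamma_matched:.3f}")
print(f"Gamma for R_L = 2Z0: {gamma_2z0:.3f}")

period_ps = 1e12 / F_CLK
for length in (500e-6, 1e-3):  # lengths examined in Fig. 9
    delay_ps = round_trip_delay(length, V_PHASE) * 1e12
    print(f"l = {length * 1e6:.0f} um: round trip = {delay_ps:.0f} ps "
          f"({delay_ps / period_ps:.0%} of the 20-GHz period)")
```

For $R_{L}=2 Z_{0}$, one third of the incident wave is reflected, and the reflection reappears at the load after a round trip of about 5 ps for $l=500 \mu \mathrm{m}$. How much of it is re-reflected depends on the driver's output impedance, which this sketch does not model.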

## VI. Conclusion

This paper offers a methodical approach to the design of high-speed on-chip clock links. Two key points emerging from this study are (1) the optimum T-line structure consists of M9 or stacked M9/M8 traces with no ground plane, and (2) for frequencies up to 20 GHz and line lengths up to 5 mm, CMOS signaling is advantageous with respect to CML designs.

## ACKNOWLEDGMENT

This work is supported by Realtek Semiconductor.

## REFERENCES

[1] S. Chen, L. Zhou, I. Zhuang, J. Im, D. Melek, J. Namkoong, M. Raj, J. Shin, Y. Frans, and K. Chang, "A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators for Multi-Standard I/Os in 7nm FinFET," IEEE ISSCC Dig. Tech. Papers, pp. 391-393, Jan./Feb. 2018.
[2] S. Mondal and J. Paramesh, "A Reconfigurable 28-/37-GHz MMSE-Adaptive Hybrid-Beamforming Receiver for Carrier Aggregation and Multi-Standard MIMO Communication," IEEE J. Solid-State Circuits, vol. 54, no. 5, May 2019.
[3] D. Chung, C. Ryu, H. Kim, C. Lee, J. Kim, K. Bae, J. Yu, and H. Yoo, "Chip-Package Hybrid Clock Distribution Network and DLL for Low Jitter Clock Delivery," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 274-286, Sept. 2006.
[4] T. Wu, F. Aryanfar, H.-C. Lee, J. Shen, T. Chin, C. Werner, and K. Chang, "Low-Skew Clock Distribution Using Zero-PhaseClock-Buffer DLLs," IEEE ISSCC Dig. Tech. Papers, pp. 176-178, Jan./Feb. 2010.
[5] M. Sasaki, "A High-Frequency Clock Distribution Network Using Inductively Loaded Standing-Wave Oscillators," IEEE J. Solid-State Circuits, vol. 44, no. 10, pp. 2800-2807, Oct. 2009.
[6] R. K. Nandwana, S. Saxena, A. Elkholy, M. Talegaonkar, J. Zhu, W.-S. Choi, A. Elmallah, and P. K. Hanumolu, "A 3-to-10Gb/s 5.75pJ/b Transceiver with Flexible Clocking in 65 nm CMOS," IEEE ISSCC Dig. Tech. Papers, pp. 492-495, Jan./Feb. 2017.
[7] K. A. Bowman, C. Tokunaga, T. Karnik, V. K. De, and J. W. Tschanz, "A 22 nm All-Digital Dynamically Adaptive Clock Distribution for Supply Voltage Droop Tolerance," IEEE J. Solid-State Circuits, vol. 48, no. 4, pp. 907-915, Apr. 2013.
[8] L. G. Salem and P. P. Mercier, "A 0.4-to-1V 1MHz-to-2GHz Switched-Capacitor Adiabatic Clock Driver Achieving 55.6\% Clock Power Reduction," IEEE ISSCC Dig. Tech. Papers, pp. 442-444, Jan./Feb. 2017.
[9] S. Shahramian, S. P. Voinigescu, and A. C. Carusone, "A 35-GS/s, 4-Bit Flash ADC With Active Data and Clock Distribution Trees," IEEE J. Solid-State Circuits, vol. 44, no. 6, pp. 1709-1720, Apr. 2009.
[10] D. Chung, C. Ryu, H. Kim, C. Lee, J. Kim, K. Bae, J. Yu, H. Yoo, and J. Kim, "Chip-Package Hybrid Clock Distribution Network and DLL for Low Jitter Clock Delivery," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 274-286, Jan. 2006.

