# An Inductorless 20-Gb/s CDR With High Jitter Tolerance

Long Kong, Member, IEEE, Yikun Chang, Student Member, IEEE, and Behzad Razavi, Fellow, IEEE


Abstract—A full-rate clock and data recovery loop employs a three-stage ring voltage-controlled oscillator, a master-slave passive sampler as both a phase detector and a filter, and a new flip-flop to achieve a loop bandwidth of 170 MHz. Implemented in 45-nm CMOS technology, the circuit occupies an area of 14  $\mu$ m  $\times$  26  $\mu$ m and exhibits a jitter tolerance of 2 UI at 5 MHz and a recovered clock jitter of 459 fs with  $2^{31} - 1$  pseudorandom bit sequence.

Index Terms-Clock and data recovery (CDR), flip-flop, jitter tolerance, master-slave passive sampler, phase detector (PD).

## I. INTRODUCTION

MULTI-LANE wireline transceivers must deal with the large footprint of their building blocks, and hence the problem of clock distribution over long interconnects. It is, therefore, desirable to avoid large structures, such as inductors and capacitors, in the design. One function often requiring both is clock and data recovery (CDR), as it relies on LC oscillators for low clock jitter and large loop filter capacitors for small pattern-dependent jitter. CDR circuits also face a stringent tradeoff between pattern-dependent jitter and jitter tolerance.

This paper introduces a CDR architecture [1] that: 1) achieves a wide bandwidth; 2) greatly suppresses the voltage-controlled oscillator (VCO) phase noise; 3) allows the use of a ring oscillator; 4) achieves, in principle, zero pattern-dependent jitter; 5) requires a very small loop filter; 6) provides a high jitter tolerance; 7) obviates the need for a frequency detector; and 8) consumes much less power in its phase detector (PD) than the prior art. In addition, we present a new flip-flop topology that can operate at high speeds with low power consumption. These new concepts are demonstrated in a 20-Gb/s prototype realized in 45-nm CMOS technology, achieving a loop bandwidth of 170 MHz and a jitter tolerance of 2 unit intervals (UIs).

## II. MOTIVATION

CDR design typically follows one of the two approaches: 1) a dedicated circuit, including a VCO and a phase detector [2], is used within each lane or 2) the transmit (TX) phase-locked loop (PLL) generates fixed quadrature phases that are distributed to the lanes and applied to local phase interpolators (PIs) and delay-locked loops (DLLs) [3], [4]. Depicted in Fig. 1(a), the former approach entails three issues, namely, the large CDR footprint, noise coupling from the TX output data to each CDR's VCO, and possible injection pulling among the VCOs themselves. The use of spiral inductors exacerbates all three issues. The latter approach, illustrated in Fig. 1(b), suffers from long interconnects for the TX PLL I and Q signals as well as a potentially high jitter due to the finite resolution of the PIs. Both approaches also face challenges if the system must support different data rates because the LC VCO in the CDRs or in the TX PLL must achieve a wide tuning range.

Fig. 1. (a) VCO-based CDR. (b) PI-based CDR.

The authors are with the Department of Electrical Engineering, University of California at Los Angeles, Los Angeles, CA 90095-1594 USA (e-mail: longkong@ucla.edu).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2019.2930899

The difficulties with the former approach are considerably eased if the CDRs can operate with ring oscillators since their footprint and unwanted coupling are smaller and their tuning range is much wider. This observation motivates us to seek a CDR architecture that achieves a wide loop bandwidth. We also expect that such a solution would afford a high jitter tolerance and a small loop filter.

## III. BACKGROUND

In its fundamental form, a CDR circuit locks a VCO output, CK, to the input random data, $D_{in}$. A non-return-to-zero (NRZ) data sequence at a bit rate of $1/T_b$ has a sinc<sup>2</sup> spectrum with nulls at integer multiples of $1/T_b$. Since the spectrum does not contain an impulse (a tone) at this frequency, it is not possible to phase-lock an oscillator directly to the data.

0018-9200 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Manuscript received February 12, 2019; revised June 2, 2019; accepted July 20, 2019. Date of publication August 12, 2019; date of current version September 24, 2019. This paper was approved by Associate Editor Azita Emami. This work was supported by Realtek Semiconductor. (Corresponding author: Long Kong.)





Fig. 2. (a) Edge detector waveform. (b) Autocorrelation function. (c) Conceptual CDR topology. (d) Bang–bang PD.

As the VCO requires a certain dc voltage at its control, we can say that the phase detector must generate a proper average value in response to the phase difference,  $\Delta\phi$ , between CK and  $D_{in}$ . Alternatively, we can observe that a PD must produce a periodic tone if the VCO frequency and the data rate are unequal. This latter view allows us to develop a new phase detector.

The foregoing perspective shows which PD structures function properly with random data. For example, a mixer that simply multiplies CK by $D_{in}$ fails because the spectrum of $D_{in}$ does not contain a tone at $1/T_b$, and neither does the spectrum of CK $\times D_{in}$. As a simple remedy, we can apply $D_{in}$ to an edge detector so as to create a tone at $1/T_b$ [Fig. 2(a)]. An edge detector is equivalent to a differentiator followed by a full-wave rectifier. To see why $D_{ed}$ provides such a tone, we first compute its autocorrelation, $R_{ed}(\tau)$, as

$$R_{\rm ed}(\tau) = \int_{-\infty}^{\infty} D_{\rm ed}(t) D_{\rm ed}(t-\tau) dt. \tag{1}$$

We observe that the integrand is zero except when $D_{\rm ed}(t)$ and $D_{\rm ed}(t - \tau)$ overlap, i.e., when $\tau$ is in the vicinity of an integer multiple of $T_b$. Thus, as shown in Fig. 2(b), $R_{\rm ed}(\tau)$ is periodic, and its Fourier transform (the spectrum of $D_{\rm ed}$) exhibits impulses at $1/T_b$, $2/T_b$, and so on.
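As a rough numerical illustration of this argument (a sketch under stated assumptions, not part of the original analysis), the following script evaluates the spectral component at $1/T_b$ for a random NRZ stream and for its edge-detected version. The oversampling ratio, the 2-sample pulse width, the sequence length, and the function name `tone_at_bit_rate` are all arbitrary choices made here for illustration.

```python
import cmath
import random

def tone_at_bit_rate(samples, samples_per_bit):
    """Magnitude of the DFT component at f = 1/T_b, normalized by length."""
    acc = 0j
    for n, x in enumerate(samples):
        # The phase repeats every bit period, so n can be reduced modulo one bit.
        phase = -2j * cmath.pi * (n % samples_per_bit) / samples_per_bit
        acc += x * cmath.exp(phase)
    return abs(acc) / len(samples)

random.seed(1)   # fixed seed so the run is repeatable
SPB = 8          # samples per bit (assumed oversampling ratio)
bits = [random.choice((-1.0, 1.0)) for _ in range(4000)]

# NRZ waveform: each bit is held for SPB samples.
nrz = [b for b in bits for _ in range(SPB)]

# Edge detector (differentiate, then full-wave rectify): a narrow pulse,
# here 2 samples wide, appears at every data transition.
edges = [0.0] * len(nrz)
for k in range(1, len(bits)):
    if bits[k] != bits[k - 1]:
        edges[k * SPB] = 1.0
        edges[k * SPB + 1] = 1.0

tone_nrz = tone_at_bit_rate(nrz, SPB)
tone_ed = tone_at_bit_rate(edges, SPB)
print(f"NRZ component at 1/Tb:           {tone_nrz:.2e}")
print(f"edge-detected component at 1/Tb: {tone_ed:.2e}")
```

Each NRZ bit integrates to zero over one period of the $1/T_b$ tone, so the first number is at the rounding-error level, whereas the edge pulses all add with the same phase and leave a clear tone.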

We can now mix  $V_{ed}$  with the VCO output, noting that the result,  $V_{PD}$ , contains a tone at  $|f_{CK} - 1/T_b|$  [see Fig. 2(c)], which modulates the VCO frequency and enables the loop to lock.

As another example, let us direct our analysis to a master–slave D flip-flop acting as a bang–bang PD [see Fig. 2(d)]. In this case, the rising (or falling) edges of the data sample the clock. Equivalently, $D_{in}$ is subjected to differentiation and half-wave rectification so as to provide a sampling signal. Using the autocorrelation approach described earlier, one can show that the spectrum of the differentiated and half-wave-rectified data contains a tone at $1/T_b$, which translates to another at $|f_{CK} - 1/T_b|$ in the flip-flop output.

Fig. 3. New model for bang-bang PD.

Fig. 4. (a) Master-slave sampler as PD and (b) its waveform.

For our architecture development, we introduce a new model for a bang–bang PD. Rather than considering the digital levels, we view the function as an analog master–slave sampler followed by a 1-bit quantizer (see Fig. 3). This representation allows an analog waveform at CK, while still producing digital levels in  $V_{\text{out}}$ , revealing that the bang–bang-induced CDR jitter fundamentally originates from the quantization noise of the quantizer. This explains why CDRs employing a D-flip-flop PD suffer from a high pattern-dependent jitter.

## IV. PROPOSED PHASE DETECTOR

### A. Basic Idea

Let us return to the bang-bang PD model shown in Fig. 3 and ask whether it is possible to avoid the quantization noise. In other words, we surmise that a master-slave analog sampler can also act as a PD [see Fig. 4(a)]. The circuit retains its data differentiation and half-wave rectification properties, generating a tone at the output if $f_{CK} \neq 1/T_b$. But another important attribute of this topology emerges when the phase difference between $V_{CK}$ and $D_{in}$ is constant: the output voltage can assume the constant value necessary for the VCO with no ripple, at least in principle. To appreciate this point, suppose the VCO requires a control voltage equal to $V_0$. As shown in Fig. 4(b), the CDR loop adjusts the phase difference such that the PD samples a value of exactly $V_0$. We will examine the details of the operation below, but the key point here is that the master-slave sampler acts as both a PD and a low-pass filter.

We should remark that, in [5], we have proposed a cascade consisting of an XOR gate and a master–slave sampler for the case of periodic signals (e.g., in RF synthesizers). The sampler circuit shown in Fig. 4(a), on the other hand, can operate with both periodic signals and random data.

### B. Proposed PD Analysis

We analyze the proposed PD in the time and frequency domains so as to illustrate its properties. Returning to Fig. 4, we choose the time constant associated with  $S_1$  and  $C_1$ ,  $\tau_1$ , approximately equal to  $0.19T_b$ . When  $D_{in}$  rises and  $S_1$  turns on,  $V_1$  begins to track  $V_{CK}$ , but sluggishly. When  $D_{in}$  falls at  $t = t_b$ , the instantaneous value of  $V_1$  is stored on  $C_1$ . At the same time,  $S_2$  turns on, impressing this voltage on  $C_2$ . For a constant phase difference, the voltages on  $C_1$  and  $C_2$  settle to that required by the VCO,  $V_0$ , and no charge sharing occurs thereafter.

When placed in a CDR loop, the above PD forces the phase difference,  $\Delta \phi$ , to reach a value that gives  $V_1 = V_0$  at  $t = t_b$ . We observe that a shorter time constant in the master sampler leads to a sharper change from  $t_a$  to  $t_b$ , yielding a higher gain for the PD, and vice versa. In practice, the choice of  $\tau_1$  is limited by the clock feedthrough and charge injection, imposing a minimum acceptable value for  $C_1$  (= 10 fF in this paper).

Now let us derive the PD gain. The voltage on  $C_1$  at  $t_b$  can be calculated from the first-order *RC* equation

$$V_{1} = V_{\text{DD}} \left[ 1 - \exp\left(-\frac{T_{b}}{2\tau_{1}}\right) \right] \exp\left(-\frac{\Delta\phi T_{b}}{2\pi\tau_{1}}\right) + V_{1} \exp\left(-\frac{T_{b}}{\tau_{1}}\right) \tag{2}$$

which can be manipulated as

$$V_1 = \frac{V_{\text{DD}}}{1 + \exp\left(-\frac{T_b}{2\tau_1}\right)} \exp\left(-\frac{\Delta\phi T_b}{2\pi\tau_1}\right). \tag{3}$$

The derivative of  $V_1$  with respect to  $\Delta \phi$  yields the gain

$$\begin{aligned}
\left|\frac{dV_1}{d\Delta\phi}\right| &= \frac{T_b}{2\pi\tau_1} \cdot \frac{V_{\text{DD}}}{1 + \exp\left(-\frac{T_b}{2\tau_1}\right)} \exp\left(-\frac{\Delta\phi T_b}{2\pi\tau_1}\right) \\
&= \frac{T_b}{2\pi\tau_1} \cdot V_1.
\end{aligned} \tag{4}$$

For a control voltage of 0.3 V, the PD gain amounts to around 0.26 V/rad.
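The chain from (2) to (4) can be checked numerically. The sketch below iterates the recursion in (2) until it settles, compares the result with the closed form (3), and evaluates the gain (4) at $V_1 = 0.3$ V. It assumes $V_{\text{DD}} = 1$ V and $\tau_1 = 0.19T_b$ as stated in the text; the function names are ours and purely illustrative.

```python
import math

V_DD = 1.0           # supply voltage in volts (assumed 1 V)
TAU_OVER_TB = 0.19   # tau_1 / T_b, as chosen in the text

# exp(-T_b / (2 tau_1)), which appears in both (2) and (3)
A = math.exp(-1.0 / (2.0 * TAU_OVER_TB))

def v1_closed_form(dphi):
    """Eq. (3): settled sampled voltage for a phase difference dphi (rad)."""
    return V_DD / (1.0 + A) * math.exp(-dphi / (2.0 * math.pi * TAU_OVER_TB))

def v1_iterated(dphi, cycles=50):
    """Apply the recursion of Eq. (2) repeatedly, starting from V1 = 0."""
    forced = V_DD * (1.0 - A) * math.exp(-dphi / (2.0 * math.pi * TAU_OVER_TB))
    v1 = 0.0
    for _ in range(cycles):
        v1 = forced + v1 * math.exp(-1.0 / TAU_OVER_TB)
    return v1

def pd_gain(v1):
    """Eq. (4): |dV1/d(dphi)| = T_b / (2 pi tau_1) * V1, in V/rad."""
    return v1 / (2.0 * math.pi * TAU_OVER_TB)

# Phase difference at which the PD samples 0.3 V, found by inverting Eq. (3).
dphi_lock = 2.0 * math.pi * TAU_OVER_TB * math.log(V_DD / ((1.0 + A) * 0.3))

v1_a = v1_closed_form(dphi_lock)
v1_b = v1_iterated(dphi_lock)
k_pd = pd_gain(v1_a)
print(f"V1 closed form = {v1_a:.4f} V, iterated = {v1_b:.4f} V")
print(f"PD gain at 0.3 V = {k_pd:.3f} V/rad")
```

With these inputs the gain evaluates to roughly 0.25 V/rad, in line with the quoted figure of around 0.26 V/rad.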

The choice of  $C_2$  in Fig. 4(a) entails a tradeoff between the PD's bandwidth and undesirable effects such as charge injection, clock feedthrough, and kT/C noise. We select  $C_2 = C_1$  and return to this point in Section IV-D.

The frequency-domain behavior of the proposed PD offers additional insights. In Fig. 4(a), as far as  $V_{CK}$  and  $V_{PD}$  are concerned, the topology acts as a zero-order hold (ZOH) (i.e., an ideal sample-and-hold circuit), except that the sampling of  $V_{CK}$  occurs randomly and according to the falling edges of  $D_{in}$ . For simplicity, we assume that  $V_{CK}$  resembles a sinusoid, arriving at the waveforms shown in Fig. 5(a). The ZOH PD output can be viewed as the convolution of two functions: first

$$x(t) = V_{\rm CK}(t) \sum_{m=-\infty}^{+\infty} a_m \delta[t - m(2T_b)] \tag{5}$$



Fig. 5. (a) Proposed PD waveforms. (b) x(t) and  $\Pi(t)$  waveforms. (c) x(t) waveform for a constant input phase difference. (d) Spectrum of PD output.

where  $a_m$  is equal to 1 on the falling edges of  $D_{in}$  and 0 otherwise, and second

$$\Pi(t) = \begin{cases} 1, & 0 < t < 2T_b \\ 0, & t < 0,\ t > 2T_b. \end{cases} \tag{6}$$

These functions are shown in Fig. 5(b). Let us now assume a constant phase difference between  $V_{\text{CK}}$  and  $D_{\text{in}}$ , noting that in

$$x(t) = \sum_{m=-\infty}^{+\infty} a_m V_{\rm CK}[t - m(2T_b)]\delta[t - m(2T_b)] \tag{7}$$

$V_{\text{CK}}[t - m(2T_b)]$ is constant (because the impulses sample the same point on $V_{\text{CK}}$) [see Fig. 5(c)]. The autocorrelation of this function is a periodic series of impulses with a period of $2T_b$. The spectrum, therefore, consists of impulses at integer multiples of $1/(2T_b)$. The convolution of x(t) with $\Pi(t)$ causes this spectrum to be multiplied by a sinc<sup>2</sup> envelope that exhibits nulls at $1/(2T_b)$, $2/(2T_b)$, and so on. Plotted in Fig. 5(d), the result contains only an impulse at f = 0, corresponding to the dc value at the PD output. This analysis confirms that, in the locked condition, the proposed PD generates no time-varying components and hence no ripple. We expect that this PD fundamentally alters the tradeoff between pattern-dependent jitter and jitter tolerance.
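The null placement can be confirmed with a few lines of arithmetic. The sketch below evaluates the magnitude of the hold pulse's Fourier transform at the frequencies $k/(2T_b)$ where the locked-condition impulses lie. It assumes $T_b = 50$ ps (20 Gb/s) and works with the magnitude rather than the sinc<sup>2</sup> power envelope; both share the same nulls. The name `zoh_magnitude` is ours.

```python
import math

T_B = 50e-12        # bit period at 20 Gb/s
HOLD = 2.0 * T_B    # width of the hold pulse Pi(t) in Eq. (6)

def zoh_magnitude(f):
    """|Fourier transform| of a rectangular pulse of width HOLD, normalized to 1 at dc."""
    x = math.pi * f * HOLD
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)

# Envelope evaluated at k / (2 T_b) for k = 0, 1, 2, 3, 4.
envelope = [zoh_magnitude(k / HOLD) for k in range(5)]
for k, e in enumerate(envelope):
    print(f"k = {k}: envelope = {e:.1e}")
```

Only the $k = 0$ entry survives; the others vanish to within floating-point rounding, which is the statement that only the dc impulse remains at the PD output.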

### C. Capture Range

As mentioned in Section III, a PD sensing different input frequencies must generate a tone at the frequency difference.



Fig. 6. Simulated (a) PD output with  $1/T_b = 20$  Gb/s and  $f_{CK} = 19$  GHz and (b) closed-loop control voltage.

The proposed PD delivers a *large* tone in such a case, thereby providing a wide capture range for the CDR and eliminating the need for frequency acquisition. To examine this property, we simulate the PD with  $1/T_b = 20$  Gb/s and  $f_{CK} = 19$  GHz in open-loop condition and obtain the output waveform plotted in Fig. 6(a). The beat component exhibits a peak-to-peak swing of 400 mV, which heavily modulates the VCO in a CDR environment and drives it toward lock. Fig. 6(b) plots the closed-loop control voltage while the VCO frequency begins at 19 GHz and locks at 20 GHz. In practice, our coarse tuning brings the VCO frequency to within 500 MHz of the desired value, hence requiring only a capture range commensurate with such an error. For this reason, our CDR does not require frequency detection.

### D. PD Transfer Function

Unlike conventional PDs, the topology in Fig. 4(a) provides low-pass filtering in addition to phase detection. For CDR loop analysis, we must compute the PD's transfer function, which signifies how slow or fast phase fluctuations in $D_{in}$ or $V_{CK}$ translate to a sampled value at $V_1$ and $V_{PD}$.

To this end, we apply a phase-modulated clock of the form

$$V_{\rm CK}(t) = (V_{\rm DD}/2) \cos [\omega_0 t + a \cos(\omega_m t)] + V_{\rm DD}/2 \tag{8}$$

while  $D_{in}$  has no phase modulation, and examine the change in  $V_{PD}$ . We expect that for low values of  $\omega_m$ , the transfer function is simply given by  $K_{PD}$ , where  $K_{PD}$  was found in Section IV-B to be 0.26 V/rad. As  $\omega_m$  increases, the change in  $V_{PD}$  begins to diminish due to the circuit's low-pass action.

The analysis of the actual PD circuit with a phase-modulated input proves difficult. We instead consider the operation from an intuitive angle so as to arrive at an empirical result. We make two observations. First, switch $S_1$ in Fig. 4(a) performs a mixing action, downconverting the phase-modulated clock. Second, the network consisting of $S_1$, $C_1$, and $S_2$ can be approximated by a continuous-time resistance, $R_{eq}$.



Fig. 7. Equivalent PD structure.



Fig. 8. Simulated (a) frequency response and (b) phase response of PD.

The value of $R_{eq}$ would be given by $1/(f_1C_1)$ if $S_1$ periodically turned on and off at a rate of $f_1$. In our case, $S_1$ is driven by a random data sequence with equal ONE and ZERO probabilities. The frequency is determined by the zero-to-one transition density, which is always around one quarter for different pseudorandom bit sequence (PRBS) patterns. Therefore, the effective sampling frequency is $1/(4T_b)$. Thus, for time scales much larger than $T_b$, $R_{eq} \approx [1/(4T_b)]^{-1}C_1^{-1}$. These thoughts lead to the equivalent structure shown in Fig. 7, where the phase-modulated clock is first downconverted and then subjected to a low-pass filter. (Switch $S_2$ performs mixing as well, but it also passes a fraction of $V_1$ without frequency translation.) We, therefore, predict a one-pole response with a -3-dB bandwidth given by $\omega_0 = 1/(R_{eq}C_2) = (4T_b)^{-1}(C_1/C_2)$. Interestingly, this bandwidth is independent of process, voltage, and temperature (PVT) conditions.

Based on these observations, we express the PD's transfer function as

$$\frac{V_{\text{out}}}{\phi_{\text{in}}}(s) = \frac{K_{\text{PD}}}{1 + \frac{C_2}{C_1} \cdot (4T_b) \cdot s}. \tag{9}$$

To verify the accuracy of this model, we simulate the PD with random data and a phase-modulated clock. We then plot the frequency response of the PD (see Fig. 8). Here, $D_{in}$ is a PRBS of $2^{11}-1$, the ON-resistance of $S_1$ and $S_2$ is 770 $\Omega$, and $C_1 = C_2 = 10$ fF. We observe the first-order behavior with a -3-dB bandwidth of 530 MHz, whereas $(4T_b)^{-1}(C_1/C_2)$ yields 796 MHz. The difference is due to the incomplete settling of $V_1$. The response is relatively independent of the data pattern so long as ONEs and ZEROs occur with equal probabilities.

Fig. 9. PD model with leakage currents.
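The 796-MHz estimate follows directly from the switched-capacitor picture. The short sketch below reproduces it and evaluates the magnitude of (9) with the simulated 530-MHz pole; it assumes $T_b = 50$ ps and $C_1 = C_2 = 10$ fF, and the helper `pd_mag` is a name introduced here for illustration.

```python
import math

T_B = 50e-12   # bit period at 20 Gb/s
C1 = 10e-15    # master sampling capacitor
C2 = 10e-15    # slave sampling capacitor

f_switch = 1.0 / (4.0 * T_B)    # effective sampling rate set by the transition density
r_eq = 1.0 / (f_switch * C1)    # switched-capacitor equivalent resistance
omega_0 = 1.0 / (r_eq * C2)     # predicted pole in rad/s
predicted_mhz = omega_0 / (2.0 * math.pi) / 1e6

def pd_mag(f_hz, k_pd=0.26, f_pole_hz=530e6):
    """Magnitude of the one-pole response in Eq. (9) for a given pole frequency."""
    return k_pd / math.sqrt(1.0 + (f_hz / f_pole_hz) ** 2)

print(f"R_eq = {r_eq / 1e3:.1f} kOhm")
print(f"predicted -3-dB bandwidth = {predicted_mhz:.0f} MHz")
print(f"|PD response| at 530 MHz = {pd_mag(530e6):.3f} V/rad")
```

Note that $R_{eq}$ (about 20 k$\Omega$) is far larger than the 770-$\Omega$ switch resistance, which is why the pole is set by the capacitor ratio and the data rate rather than by the switches.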

### E. Power Consumption

Due to its passive nature, the proposed PD consumes much less power than conventional topologies do. In the locked condition,  $C_1$  and  $C_2$  in Fig. 4(a) experience little voltage change, and, therefore,  $S_1$  and  $S_2$  consume negligible power. The principal power drain is associated with any buffers necessary to drive the gates of  $S_1$  and  $S_2$ . In this paper, the total gate capacitance of these switches including the layout parasitics is around 12 fF, translating to  $P = f C V_{\text{DD}}^2 = 0.12$  mW with  $f = 1/(2T_b)$  and  $V_{\text{DD}} = 1$  V. By comparison, even a single conventional flip-flop acting as a bang–bang PD would draw several milliwatts at these rates.
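The quoted drive power is a one-line calculation, reproduced here as a quick check with the values given above (12 fF, 1 V, and a toggle rate of $1/(2T_b)$ with $T_b = 50$ ps).

```python
T_B = 50e-12       # bit period at 20 Gb/s
C_GATE = 12e-15    # total switch gate capacitance including layout parasitics
V_DD = 1.0         # supply voltage in volts

f_toggle = 1.0 / (2.0 * T_B)            # 10 GHz
p_drive = f_toggle * C_GATE * V_DD**2   # P = f C V_DD^2, in watts
print(f"switch drive power = {p_drive * 1e3:.2f} mW")
```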

### F. Imperfections

The PD of Fig. 4(a) entails three imperfections that trade with the values of  $C_1$  and  $C_2$  and can potentially degrade the performance. First, the leakage currents associated with the master and slave samplers introduce jitter. As shown in Fig. 9, these leakages,  $I_{L1}$  and  $I_{L2}$ , cause  $V_{\text{cont}}$  to droop when  $\overline{D_{\text{in}}}$  is high, i.e., during a long sequence of consecutive zeros. In our CDR design,  $I_{L2}$  also includes the leakage of the varactors used in the VCO (Section VI-A). The droop in  $V_{\text{cont}}$  is given by

$$\Delta V_{\rm cont1} = \frac{I_{L1} + I_{L2}}{C_1 + C_2} t \tag{10}$$

and the resulting output phase drift by

$$\Delta \phi = K_{\text{VCO}} \int_0^{nT_b} \Delta V_{\text{cont1}} dt \tag{11}$$

where  $nT_b$  denotes the maximum run length. It follows that the output jitter is equal to:

$$\Delta t_1 = \frac{I_{L1} + I_{L2}}{C_1 + C_2} \frac{n^2 T_b^3 K_{\text{VCO}}}{4\pi}. \tag{12}$$

In this paper, $I_{L1} + I_{L2} = 20$ nA, $T_b = 50$ ps, and $K_{\rm VCO} = 800$ MHz/V, yielding $\Delta t_1 = 10^{-30} n^2 / (C_1 + C_2)$ (in seconds, with the capacitances in farads). For example, if $C_1 = C_2 = 10$ fF, then $\Delta t_1 = 45$ fs<sub>pp</sub> for a run length of 30 bits.
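As a numerical check of (12), the sketch below plugs in the values quoted above. The only interpretation added here is that $K_{\text{VCO}}$, given in MHz/V, is converted to rad/s/V before use; the function name `leakage_jitter` is ours.

```python
import math

def leakage_jitter(i_leak, c_total, n, t_b, k_vco_hz_per_v):
    """Eq. (12): peak-to-peak jitter caused by leakage over a run of n bits."""
    k_vco = 2.0 * math.pi * k_vco_hz_per_v   # convert Hz/V to rad/s/V
    return (i_leak / c_total) * n**2 * t_b**3 * k_vco / (4.0 * math.pi)

dt1 = leakage_jitter(i_leak=20e-9, c_total=20e-15, n=30,
                     t_b=50e-12, k_vco_hz_per_v=800e6)
print(f"leakage-induced jitter = {dt1 * 1e15:.0f} fs")
```

Because the result grows as $n^2$, doubling the run length quadruples this contribution.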

The second imperfection relates to the charge injection and clock feedthrough of $S_1$ and $S_2$. When $S_1$ turns off, $V_1$ experiences a pedestal and when $S_2$ turns on, this pedestal is partially removed. However, when $S_2$ turns off, a pedestal, $\Delta V_{\text{cont}2}$, appears at the output and causes the VCO to drift as long as $\overline{D_{\text{in}}}$ is low. In this case, the output jitter is given by

$$\Delta t_2 = \frac{\Delta V_{\text{cont}2} K_{\text{VCO}}(nT_b^2)}{2\pi}. \tag{13}$$



Fig. 10. CDR architecture with proposed PD.

According to simulations,  $\Delta V_{\text{cont2}} = 4 \text{ mV}$ , and hence,  $\Delta t_2 = 240 \text{ fs}_{pp}$  for n = 30.
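The same kind of check applies to (13). The sketch below uses the 4-mV pedestal from simulation together with $T_b = 50$ ps and $K_{\text{VCO}} = 800$ MHz/V (converted to rad/s/V); `pedestal_jitter` is an illustrative name.

```python
import math

def pedestal_jitter(dv_cont, n, t_b, k_vco_hz_per_v):
    """Eq. (13): peak-to-peak jitter from a constant control-voltage pedestal."""
    k_vco = 2.0 * math.pi * k_vco_hz_per_v   # convert Hz/V to rad/s/V
    return dv_cont * k_vco * n * t_b**2 / (2.0 * math.pi)

dt2 = pedestal_jitter(dv_cont=4e-3, n=30, t_b=50e-12, k_vco_hz_per_v=800e6)
print(f"pedestal-induced jitter = {dt2 * 1e15:.0f} fs")
```

Unlike the leakage term, this contribution grows only linearly with the run length $n$.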

The ripple introduced by the leakage and the switch charge injection and clock feedthrough can be suppressed by interposing a simple *RC* filter between the PD and the VCO. This paper does not include such a filter as  $\Delta t_1$  and  $\Delta t_2$  are sufficiently small.

The third imperfection in Fig. 4(a) is the kT/C noise. When  $S_1$  turns off,  $C_1$  holds an rms noise voltage equal to  $(kT/C_1)^{1/2}$ , which is then charge-shared with  $C_2$  when  $S_2$  turns on, creating a noise voltage of  $(kT/C_1)^{1/2}C_1/(C_1+C_2)$  on  $C_2$ . Upon turning off,  $S_2$  deposits its own noise on  $C_2$  in the form of  $(kT/C_2)^{1/2}$ . Squaring and summing these two values, we have

$$\overline{V_{\text{cont},n}^2} = \frac{kTC_1}{(C_1 + C_2)^2} + \frac{kT}{C_2}. \tag{14}$$

If the switches turned on and off periodically, we would spread this noise power across a bandwidth equal to the switching frequency. However, in our case, the switching is random.

Nevertheless, with a data rate of 20 Gb/s, we still expect that the kT/C noise is spread by a large factor. Simulations of the PD indicate that its output noise voltage has a spectral density of  $10^{-16}$  V<sup>2</sup>/Hz. The resulting phase noise at an offset of  $\Delta f$  is given by the product of this density and  $K_{\rm VCO}^2/(2\pi \Delta f)^2$ , e.g., -101.8 dBc/Hz at 1-MHz offset. This noise is suppressed within the CDR circuit's loop bandwidth, contributing negligible jitter.
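To put numbers on these expressions, the sketch below evaluates (14) for $C_1 = C_2 = 10$ fF and converts the simulated noise density into phase noise at a 1-MHz offset. The temperature of 300 K is our assumption (the text does not state one), and the function name is illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
TEMP = 300.0         # assumed temperature in kelvin
C1 = 10e-15
C2 = 10e-15

# Eq. (14): total sampled noise power on the control node, in V^2.
v_n2 = K_B * TEMP * C1 / (C1 + C2) ** 2 + K_B * TEMP / C2
v_n_rms = math.sqrt(v_n2)

def phase_noise_dbc(s_v, k_vco_hz_per_v, offset_hz):
    """Noise density times K_VCO^2 / (2 pi df)^2, expressed in dBc/Hz.

    With K_VCO in Hz/V the factors of 2 pi cancel, leaving (K_VCO / df)^2.
    """
    return 10.0 * math.log10(s_v * (k_vco_hz_per_v / offset_hz) ** 2)

pn_1mhz = phase_noise_dbc(s_v=1e-16, k_vco_hz_per_v=800e6, offset_hz=1e6)
print(f"rms kT/C noise = {v_n_rms * 1e3:.2f} mV")
print(f"phase noise at 1-MHz offset = {pn_1mhz:.1f} dBc/Hz")
```

With the density rounded to $10^{-16}$ V<sup>2</sup>/Hz, the result lands near -102 dBc/Hz, close to the quoted figure.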

## V. CDR ARCHITECTURE

The phase detector proposed in Section IV-A can form a simple and compact CDR circuit along with a VCO and a data-retiming flip-flop. Shown in Fig. 10 is such a system. Due to the PD's finite gain, this architecture exhibits a type-I transfer function and hence a finite phase offset.

### A. Loop Dynamics

Most of the advantages of the proposed architecture stem from its potentially wide bandwidth. Thus, we first analyze the dynamics of the CDR loop, seeking the maximum loop BW that it can provide. From (9), we obtain the open-loop transfer function as

$$\left.\frac{\phi_{\text{out}}}{\phi_{\text{in}}}\right|_{\text{open}} (s) = \frac{K_{\text{PD}}}{1 + \frac{s}{\omega_0}} \cdot \frac{K_{\text{VCO}}}{s}. \tag{15}$$



Fig. 11. Simulated control voltage for PRBS lengths of 11 and 23, respectively.



Fig. 12. Simulated jitter transfer.

To formulate the phase margin (PM), we equate the magnitude of (15) to unity, arriving at

$$2\omega_u^2 = -\omega_0^2 + \omega_0^2 \sqrt{1 + \frac{4K_{\rm PD}^2 K_{\rm VCO}^2}{\omega_0^2}}. \tag{16}$$

Since the second term in the radical is about 0.6 in this design, we can write  $(1 + \epsilon)^{1/2} \approx 1 + \epsilon/2$  and hence

$$\omega_u \approx K_{\rm PD} K_{\rm VCO} \approx 2\pi \,(210 \text{ MHz}). \tag{17}$$

The PM is estimated by finding the phase of (15) at  $\omega_u$ 

$$PM = \frac{\pi}{2} - \tan^{-1}\frac{\omega_u}{\omega_0} \tag{18}$$

which amounts to 68° for  $\omega_0 = 2\pi$  (530 MHz). Fig. 11 plots the simulated VCO control voltage as a function of time for PRBS lengths of 11 and 23, indicating well-behaved settling.
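The loop numbers above can be reproduced in a few lines. The sketch below solves (16) exactly, compares it with the approximation (17), and evaluates the phase margin (18) for both. It uses $K_{\text{PD}} = 0.26$ V/rad, $K_{\text{VCO}} = 800$ MHz/V, and the simulated PD pole of 530 MHz; the variable and function names are ours.

```python
import math

K_PD = 0.26                        # V/rad
K_VCO = 2.0 * math.pi * 800e6      # rad/s/V
OMEGA_0 = 2.0 * math.pi * 530e6    # PD pole in rad/s (simulated value)

k_loop = K_PD * K_VCO
eps = 4.0 * k_loop**2 / OMEGA_0**2   # second term under the radical in Eq. (16)

omega_u_exact = OMEGA_0 * math.sqrt((-1.0 + math.sqrt(1.0 + eps)) / 2.0)
omega_u_approx = k_loop              # Eq. (17)

def open_loop_mag(omega):
    """|Eq. (15)| evaluated at s = j*omega."""
    return k_loop / (omega * math.sqrt(1.0 + (omega / OMEGA_0) ** 2))

def phase_margin_deg(omega_u):
    """Eq. (18), converted to degrees."""
    return math.degrees(math.pi / 2.0 - math.atan(omega_u / OMEGA_0))

pm_approx = phase_margin_deg(omega_u_approx)
pm_exact = phase_margin_deg(omega_u_exact)
print(f"eps = {eps:.2f}")
print(f"f_u approx = {omega_u_approx / (2.0 * math.pi) / 1e6:.0f} MHz, "
      f"exact = {omega_u_exact / (2.0 * math.pi) / 1e6:.0f} MHz")
print(f"PM approx = {pm_approx:.1f} deg, exact = {pm_exact:.1f} deg")
```

Since $\epsilon \approx 0.6$ is not very small, the exact unity-gain frequency comes out somewhat below the approximation (about 195 MHz versus 208 MHz), and the phase margin moves by roughly one degree.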

With the loop parameters mentioned above, we can construct the CDR's jitter transfer as well. Here, we apply a phase-modulated PRBS and vary the modulation frequency. Fig. 12 plots the result, indicating a -3-dB bandwidth of about 230 MHz. According to simulations, this response changes negligibly for PRBS patterns from  $2^7 - 1$  to  $2^{23} - 1$ .



Fig. 13. Modified CDR architecture.



Fig. 14. Complete CDR architecture.

Due to its type-I nature, the CDR circuit of Fig. 10 exhibits a static phase offset, $\Delta t$, that varies with $V_{\text{cont}}$. This issue manifests itself in extreme PVT corners: to tune the oscillation frequency to 20 GHz, $V_{\text{cont}}$ must be relatively high in one corner and relatively low in another.

With the aid of the waveforms in Fig. 4(b), we can express  $\Delta t = [\Delta \phi/(2\pi)]T_b$  as

$$\Delta t = \tau_1 \ln \frac{V_{\rm DD}}{\left[1 + \exp\left(-\frac{T_b}{2\tau_1}\right)\right] V_{\rm cont}}. \tag{19}$$

If  $V_{\text{cont}}$  ranges from  $0.2V_{\text{DD}}$  to  $0.8V_{\text{DD}}$ , then  $\Delta t$  falls from 0.29 to 0.03 UI. This variation degrades the clock PM and the jitter tolerance. Let us recognize that the falling edge of  $V_{\text{CK}}$  in Fig. 4(b) always leads that of  $D_{\text{in}}$ . We wish to introduce an additional delay in the retimer's clock path to compensate for  $\Delta t$ . To achieve a well-defined delay, we note that the ring VCO provides three output phases and can readily establish a 120° (0.33 UI) phase difference between the clocks applied to the PD and the retimer. These thoughts lead to the modified architecture shown in Fig. 13, where B is delayed with respect to A by 0.33 UI. We also redesign the flip-flop so that it samples on the falling edge of B. As a result, the net phase offset seen by the FF is now equal to

$$\Delta t_{\rm FF} = \Delta t + 0.33 \,\,{\rm UI} - 0.5 \,\,{\rm UI} \tag{20}$$

which varies from -0.14 to 0.12 UI.
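The offsets quoted above follow from (19) and (20) with $\tau_1 = 0.19T_b$ (the value chosen in Section IV-B). The sketch below works in units of UI and of $V_{\text{DD}}$; the function names are introduced here for illustration only.

```python
import math

TAU_OVER_TB = 0.19   # tau_1 / T_b from Section IV-B

def static_offset_ui(v_cont_over_vdd):
    """Eq. (19): static phase offset in UI for a given V_cont / V_DD."""
    a = math.exp(-1.0 / (2.0 * TAU_OVER_TB))
    return TAU_OVER_TB * math.log(1.0 / ((1.0 + a) * v_cont_over_vdd))

def retimer_offset_ui(v_cont_over_vdd):
    """Eq. (20): net offset seen by the flip-flop, in UI."""
    return static_offset_ui(v_cont_over_vdd) + 0.33 - 0.5

for v in (0.2, 0.8):
    print(f"V_cont = {v:.1f} V_DD: dt = {static_offset_ui(v):.2f} UI, "
          f"dt_FF = {retimer_offset_ui(v):+.2f} UI")
```

The 0.33-UI clock phase shift combined with falling-edge sampling roughly centers the residual offset around zero across the control-voltage range.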

### B. Complete CDR Design

Fig. 14 shows the CDR circuit realized in this paper. The loop incorporates a three-stage ring oscillator with varactor tuning (and discrete, switched-capacitor tuning). To reduce the kickback noise from $D_{in}$ to the VCO through the retimer, a buffer follows node B. Also, a 1-to-2 demultiplexer (DMUX) and a divide-by-2 stage are added to reduce the output speed and facilitate testing.

Fig. 15. One stage of ring VCO.

The proposed CDR architecture offers the following advantages.

- 1) It fundamentally alters and eases the tradeoff between pattern-dependent jitter and jitter tolerance.
- 2) It achieves a bandwidth of 230 MHz and a settling time of about 10 ns (see Fig. 11).
- 3) It suppresses the VCO phase noise for offset frequencies as high as 200 MHz, allowing the use of a low-power ring oscillator (Section VI-A).
- 4) By virtue of the PD's inherent filtering, it does not require large capacitors, lending itself to a compact implementation.
- 5) It achieves a sufficiently wide capture range to obviate the need for a frequency detector.

According to simulations, the CDR still locks if directly driven by a channel that has a maximum loss of 8 dB at Nyquist. In this case, the capture range remains the same. In practice, the CDR is preceded by a linear equalizer to accommodate greater channel losses.

## VI. BUILDING BLOCKS

The performance of the proposed circuit is determined by the PD, the VCO, and the retiming flip-flop. In this section, we focus on the last two.

### A. VCO Design

The VCO is configured as a three-stage ring with varactor tuning for fine control and programmable capacitors for coarse control. Shown in Fig. 15 is one stage along with the device dimensions. As explained in [5], varactor tuning proves superior to other tuning techniques, such as starved inverters, as it negligibly degrades the phase noise across the tuning range. The VCO fine control has a gain of 800 MHz/V. The coarse control allows tuning from 18.6 to 21.6 GHz.

The VCO draws a power of 1.8 mW at 20 GHz. Fig. 16 plots the simulated free-running VCO phase noise together with the PD's contribution and the overall closed-loop phase noise. Due to its low power consumption, the VCO exhibits high phase noise, as high as -62 dBc/Hz at 1-MHz offset. Moreover, due to the small transistor dimensions, flicker noise upconversion dominates for offsets up to about 200 MHz. Nonetheless, the CDR bandwidth dramatically suppresses this noise, yielding an integrated rms jitter of 120 fs from 100 kHz to 1 GHz.



Fig. 16. Simulated free-running VCO, passive sampler, and overall CDR phase noise.



Fig. 17. (a) Conventional TSPC flip-flop and (b) simulated eye diagram of node X.

### B. Flip-Flop Design

The retiming flip-flop and the divide-by-2 circuit in Fig. 14 operate at 20 GHz, potentially consuming a high power in both the data path and the clock path. For example, a current-steering structure designed for this speed drains about 3 mW, plus several milliwatts for CMOS level conversion.

The retiming FF must deal with another issue related to setup times for the ZERO-to-ONE and ONE-to-ZERO transitions in the data input. If these setup times are unequal, the jitter tolerance degrades by their difference.

A low-power FF can be realized by the true single-phase clocking (TSPC) topology [6]. Shown in Fig. 17(a) is the TSPC flip-flop with "split" outputs [6], which introduces only two transistors in the clock path. Due to the degraded logical levels at X and Y, this circuit begins to fail above approximately 8 GHz in 45-nm CMOS technology. Fig. 17(b) plots the simulated eye diagram at X, revealing a degradation of about 400 mV in the low level that slows down the ZERO-to-ONE transition at Y considerably. We should remark that this effect also raises the circuit's phase noise, an issue that proves critical in applications such as RF synthesis.

The TSPC FF also fares poorly in terms of the second issue mentioned above: the design in Fig. 17(a) exhibits  $t_{s1} = 28$  ps and  $t_{s2} = 15$  ps, degrading the jitter tolerance by about 13 ps.



Fig. 18. (a) Proposed retimer and (b) simulated eye diagram of node X.



Fig. 19. Output frequency versus input frequency of the divide-by-2 circuit.

To overcome these drawbacks, we introduce the flip-flop shown in Fig. 18(a), where the clocked devices are realized as complementary switches driven by CK and  $\overline{CK}$ . With rail-to-rail swings at all of the nodes within the circuit, we expect a greater speed. Indeed, as the simulated eye diagram in Fig. 18(b) suggests, the proposed FF, with its layout parasitics, runs up to 25 GHz. This dramatic improvement accrues at the cost of one inverter that is necessary to generate  $\overline{CK}$ , but the power penalty is only 170  $\mu$ W at 20 GHz.

The proposed FF also benefits from nearly equal setup times:  $t_{s1} = 17$  ps and  $t_{s2} = 19$  ps. Thus, the jitter tolerance is negligibly affected.

The new FF proves useful in divider design as well. To evaluate the maximum speed, we plot in Fig. 19 the output frequency versus the input frequency of a divide-by-2 circuit using the TSPC and the proposed topologies. The latter achieves a fourfold speed improvement.

## VII. EXPERIMENTAL RESULTS

The proposed CDR circuit has been fabricated in TSMC's 45-nm CMOS technology. The die photograph is shown in Fig. 20 with an active area of 14  $\mu$ m × 26  $\mu$ m. This is about two orders of magnitude less than that of prior art. The prototype has been tested on a high-speed probe station with a 1-V supply. Excluding the DMUX and the divide-by-2 stage, the CDR circuit consumes 3 mW: 1.8 mW in the VCO, 0.22 mW in the retimer, and 1 mW in the PD and the buffers.



Fig. 20. Die micrograph.



Fig. 21. Test setup.



Fig. 22. (a) Input data eye (10 ps/div. and 63.7 mV/div.). (b) Measured recovered data eye (20 ps/div. and 69.5 mV/div.).

Fig. 21 shows the setup. An RF signal generator (the Agilent E8257D) applies a 10-GHz clock to a bit-error-rate tester (BERT) (the N4903B). The BERT delivers two 10-Gb/s data streams to a multiplexer (the M8061A) so as to generate data at 20 Gb/s. The prototype's output is then returned to the BERT for error characterization. The measured VCO tuning range is from 18 to 21 GHz. During each measurement, the capacitance code is predetermined such that the VCO reaches 20 GHz. This can be accomplished for two different coarse settings (due to overlap), and in both cases, the CDR locks. This indicates that the lock range is at least 500 MHz. There is no need for initializing $V_{\rm cont}$ before the loop locks.

Fig. 22 plots the measured input data eye at 20 Gb/s and the recovered data eye at 10 Gb/s. The quality of data recovery can be assessed only by the jitter tolerance, as presented in the following.

Fig. 23. (a) Measured recovered clock spectrum. (b) Eye diagram (10 ps/div. and 51.6 mV/div.).

Fig. 24. Measured recovered clock phase noise.

Fig. 23 plots the measured spectrum and eye diagram of the recovered clock. Most of the jitter observed in the latter arises from the oscilloscope's sampling and trigger heads.<sup>1</sup> To determine the actual jitter, we plot the phase noise in Fig. 24 and integrate from 100 Hz to 1 GHz, obtaining 459 fs<sub>rms</sub>. We should make two remarks here. First, due to our spectrum analyzer limitations, the recovered clock is applied to an off-chip divide-by-2 circuit for phase noise measurements. (This divide-by-2 circuit should not be confused with that in Fig. 14.) Thus, the plot in Fig. 24 should be raised by 6 dB to display the phase noise at 20 GHz. Second, these results are obtained with a PRBS of $2^{31}-1$, demonstrating that the CDR circuit can handle long CIDs.

<sup>1</sup>The non-50% duty cycle is due to poor pull-down strength of the open-drain PMOS output buffer.
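The conversion from a phase noise plot to rms jitter described above can be sketched as follows. The code integrates a single-sideband phase noise profile $\mathcal{L}(f)$ using $\sigma_t = \sqrt{2\int 10^{\mathcal{L}(f)/10}\,df}\,/\,(2\pi f_0)$. The flat $-120$-dBc/Hz profile is a hypothetical placeholder, not the measured data of Fig. 24, and the trapezoidal rule in the linear domain is a simplification that would need a finer grid for a realistic, sloped profile.

```python
import math


def rms_jitter_seconds(offsets_hz, pn_dbc_hz, carrier_hz):
    """Integrate SSB phase noise (dBc/Hz) over offset frequency to rms jitter."""
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        lin0 = 10.0 ** (pn_dbc_hz[i] / 10.0)
        lin1 = 10.0 ** (pn_dbc_hz[i + 1] / 10.0)
        # Trapezoidal rule on the linear-scale noise density.
        area += 0.5 * (lin0 + lin1) * (offsets_hz[i + 1] - offsets_hz[i])
    phase_rms_rad = math.sqrt(2.0 * area)   # factor 2 accounts for both sidebands
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


# Hypothetical flat profile over the integration band used in the text.
offsets = [100.0, 1e9]
pn_at_10ghz = [-120.0, -120.0]              # as seen after a divide-by-2 stage

# An ideal divide-by-2 lowers the phase noise by 20*log10(2), about 6 dB.
shift_db = 20.0 * math.log10(2.0)
pn_at_20ghz = [p + shift_db for p in pn_at_10ghz]

jitter_from_10ghz = rms_jitter_seconds(offsets, pn_at_10ghz, 10e9)
jitter_from_20ghz = rms_jitter_seconds(offsets, pn_at_20ghz, 20e9)
print(f"{jitter_from_10ghz * 1e15:.1f} fs, {jitter_from_20ghz * 1e15:.1f} fs")
```

Under the assumption of a noiseless divider, both computations give the same jitter in seconds: raising the plot by about 6 dB doubles the rms phase, and doubling the carrier frequency halves the time per radian.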

Fig. 25. Measured (a) jitter transfer and (b) jitter tolerance.

TABLE I Performance Summary

| Parameter                                      | [7]                     | [8]   | [9]                    | [10]     | This Work |
|------------------------------------------------|-------------------------|-------|------------------------|----------|-----------|
| Data Rate (Gb/s)                               | 25                      | 20    | 25                     | 28       | 20        |
| Oscillator Topology                            | LC                      | LC    | LC                     | N/A      | Ring      |
| Rec. Clock Jitter (ps)                         | 1.56                    | 0.407 | 0.254                  | N/A      | 0.459     |
| Jitter Tolerance @ 5 MHz (UI<sub>pp</sub>)     | 0.5                     | 0.9   | 0.52                   | 0.3      | 2.0       |
| Power (mW)                                     | 3.1 (excludes 1:2 DMUX) | 154   | 99 (excludes 2:5 DMUX) | 106.6    | 3.0       |
| Area (mm<sup>2</sup>)                          | 0.039                   | 0.854 | 2.47 \*                | 0.523 \*\* | 0.00036   |
| Technology (nm)                                | 65                      | 90    | 65                     | 28       | 45        |
| Supply (V)                                     | 1                       | 1.5   | 1.2                    | N/A      | 1         |

\* Includes two CDRs and one DMUX.

\*\* Includes pads.

The measured jitter transfer and jitter tolerance are plotted in Fig. 25. Both measurements are completed with a PRBS of $2^7-1$. The former indicates a loop bandwidth of around 170 MHz. The discrepancy with respect to the simulated value of 230 MHz is attributed to $K_{PD}$ variations. In the prototype, capacitor $C_1$ is programmable by a factor of 2 so as to adjust the loop bandwidth. But in our measurements, $C_1$ is always set to 10 fF. The jitter tolerance is around 8 UI<sub>pp</sub> at 1 MHz, 2 UI<sub>pp</sub> at 5 MHz, and 1 UI<sub>pp</sub> at 10 MHz. In this characterization, the input jitter amplitude is gradually increased to the point of a sharp rise in the bit error rate from $< 10^{-14}$ to $10^{-6}$.
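As a quick sanity check on the three jitter tolerance points quoted above, the following sketch computes the slope between adjacent points on log-log axes. It uses only the values stated in the text (the 1-MHz point is given as approximate); the helper name is ours.

```python
import math

# (jitter frequency in Hz, tolerated amplitude in UIpp), as quoted in the text.
jtol_points = [(1e6, 8.0), (5e6, 2.0), (10e6, 1.0)]


def slope_db_per_decade(p_low, p_high):
    """Slope of the tolerance curve between two points on log-log axes."""
    (f1, a1), (f2, a2) = p_low, p_high
    return 20.0 * math.log10(a2 / a1) / math.log10(f2 / f1)


slopes = [slope_db_per_decade(jtol_points[i], jtol_points[i + 1])
          for i in range(len(jtol_points) - 1)]

# Amplitude-frequency product in UI*MHz; a constant product means -20 dB/decade.
products = [f / 1e6 * a for f, a in jtol_points]

for s in slopes:
    print(f"slope: {s:.1f} dB/decade")
print("amplitude x frequency (UI*MHz):", products)
```

Between 5 and 10 MHz the quoted points fall at 20 dB/decade, and between 1 and 5 MHz at roughly 17 dB/decade, so the tolerated amplitude is approximately inversely proportional to the jitter frequency over this range.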

Table I compares the performance of our proposed CDR circuit with that of others in this range of data rates. We note that our jitter tolerance is four times that of [7] with a much smaller area and twice that of [8] with a much smaller area and much less power dissipation.
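The ratios cited above follow directly from the entries of Table I, as the sketch below shows. The energy-per-bit figure is a derived quantity that does not appear in the table, and the comparison is only indicative because, per the table notes, some power and area entries include or exclude different blocks.

```python
# Rows copied from Table I:
# (data rate in Gb/s, jitter tolerance at 5 MHz in UIpp, power in mW, area in mm^2).
# Caveats from the table notes: power for [7] and [9] excludes the DMUX,
# area for [9] includes two CDRs and one DMUX, area for [10] includes pads.
table = {
    "[7]": (25.0, 0.5, 3.1, 0.039),
    "[8]": (20.0, 0.9, 154.0, 0.854),
    "[9]": (25.0, 0.52, 99.0, 2.47),
    "[10]": (28.0, 0.3, 106.6, 0.523),
    "This work": (20.0, 2.0, 3.0, 0.00036),
}

ours_rate, ours_jtol, ours_power, ours_area = table["This work"]

ratios = {}
for name, (rate, jtol, power, area) in table.items():
    if name == "This work":
        continue
    ratios[name] = {
        "jtol": ours_jtol / jtol,      # how much more jitter we tolerate
        "power": power / ours_power,   # how much more power the other design draws
        "area": area / ours_area,      # how much more area the other design uses
    }

# mW divided by Gb/s gives pJ/bit (derived here, not stated in the paper).
energy_pj_per_bit = ours_power / ours_rate

for name, r in ratios.items():
    print(f"{name}: JTOL x{r['jtol']:.2f}, power x{r['power']:.1f}, area x{r['area']:.0f}")
print(f"this work: {energy_pj_per_bit:.2f} pJ/bit")
```

With these entries, the jitter tolerance is 4 times that of [7] and about 2.2 times that of [8], while the area ratio with respect to [7] is slightly above 100.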

#### VIII. CONCLUSION

A CDR architecture is proposed that breaks the tradeoff between the loop bandwidth and pattern-dependent jitter, affording the widest CDR bandwidth reported. As such, the circuit can incorporate a ring oscillator while achieving high jitter tolerance with low power consumption. Other advantages include fast locking, a small footprint, and a wide capture range. A new high-speed, low-power flip-flop is also introduced.

#### ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for chip fabrication.

#### REFERENCES

- [1] L. Kong, Y. Chang, and B. Razavi, "A 14 μm × 26 μm 20-Gb/s 3-mW CDR circuit with high jitter tolerance," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2018, pp. 271–272.
- [2] M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, "A reference-less single-loop half-rate binary CDR," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 2037–2047, Sep. 2015.
- [3] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, "A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2921–2929, Dec. 2006.
- [4] B. Abiri, R. Shivnaraine, A. Sheikholeslami, H. Tamura, and M. Kibune, "A 1-to-6Gb/s phase-interpolator-based burst-mode CDR in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. ISSCC Dig. Tech. Papers*, Feb. 2011, pp. 154–156.
- [5] L. Kong and B. Razavi, "A 2.4 GHz 4 mW integer-N inductorless RF synthesizer," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 626–635, Mar. 2016.
- [6] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. 24, no. 1, pp. 62–70, Feb. 1989.
- [7] J. W. Jung and B. Razavi, "A 25-Gb/s 5-mW CMOS CDR/deserializer," *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 684–697, Mar. 2013.
- [8] J. Lee and K.-C. Wu, "A 20-Gb/s full-rate linear clock and data recovery circuit with automatic frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3590–3602, Dec. 2009.
- [9] K.-C. Wu and J. Lee, "A 2×25-Gb/s receiver with 2:5 DMUX for 100-Gb/s Ethernet," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2421–2432, Nov. 2010.
- [10] J. Liang, A. Sheikholeslami, H. Tamura, Y. Ogata, and H. Yamaguchi, "A 28Gb/s digital CDR with adaptive loop gain for optimum jitter tolerance," in *IEEE Int. Solid-State Circuits Conf. ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 122–123.



Long Kong (S'15–M'16) received the B.E. degree in microelectronics from Shanghai Jiao Tong University, Shanghai, China, in 2011, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Los Angeles, Los Angeles, CA, USA, in 2013 and 2016, respectively.

In 2016, he joined Oracle, Santa Clara, CA, USA, as a Senior Hardware Engineer, where he was working on high-speed SerDes transceivers. He then joined Apple Inc., Cupertino, CA, USA, in 2017, as an RFIC Design Engineer. His research interests include frequency synthesizers, clock and data recovery for data communication systems, and wireless transceivers.

Dr. Kong was a recipient of the Qualcomm Innovation Fellowship from 2013 to 2014, the Analog Devices Outstanding Student Designer Award in 2015, and the Broadcom Fellowship from 2015 to 2016.



Yikun Chang (S'16) received the B.S. degree in microelectronics from Peking University, Beijing, China, in 2013, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Los Angeles, Los Angeles, CA, USA, in 2015 and 2018, respectively.

She is currently a SerDes Circuit Design Engineer with Apple Inc., Cupertino, CA, USA. Her research interest includes low-power techniques in wireline transceivers.

Dr. Chang was a recipient of the China National Scholarship in 2012 and the Analog Devices Outstanding Student Designer Award in 2016.



Behzad Razavi (S'87–M'90–SM'00–F'03) received the B.S.E.E. degree from the Sharif University of Technology, Tehran, Iran, in 1985 and the M.S.E.E. and Ph.D.E.E. degrees from Stanford University, Stanford, CA, USA, in 1988 and 1992, respectively.

He was with AT&T Bell Laboratories, Murray Hill, NJ, USA, and Hewlett-Packard Laboratories, Palo Alto, CA, USA, until 1996. Since 1996, he has been an Associate Professor and subsequently a Professor of electrical engineering with the University of California at Los Angeles, Los Angeles, CA, USA. He was an Adjunct Professor with Princeton University, Princeton, NJ, USA, from 1992 to 1994 and with Stanford University in 1995. He has authored the books *Principles of Data Conversion System Design* (IEEE Press, 1995), *RF Microelectronics* (Prentice Hall, 1998, 2012) (translated to Chinese, Japanese, and Korean), *Design of Analog CMOS Integrated Circuits* (McGraw-Hill, 2001 and 2016) (translated to Chinese, Japanese, and Korean), *Design of Integrated Circuits for Optical Communications* (McGraw-Hill, 2003; Wiley, 2012), and *Fundamentals of Microelectronics* (Wiley, 2006) (translated to Korean, Portuguese, and Turkish). His current research interests include wireless and wireline transceivers and data converters.

Dr. Razavi is a member of the U.S. National Academy of Engineering. He was a recipient of the Beatrice Winner Award for Editorial Excellence at the 1994 International Solid-State Circuits Conference (ISSCC), the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998, the McGraw-Hill First Edition of the Year Award in 2001, the Lockheed Martin Excellence in Teaching Award in 2006, the UCLA Faculty Senate Teaching Award in 2007, the CICC Best Invited Paper Award in 2009 and 2012, the 2012 Donald Pederson Award in Solid-State Circuits, the American Society for Engineering Education PSW Teaching Award in 2014, and the 2017 IEEE CAS John Choma Education Award. He was a co-recipient of both the Jack Kilby Outstanding Student Paper Award and the Beatrice Winner Award for Editorial Excellence at the 2001 ISSCC, the 2012 and the 2015 VLSI Circuits Symposium Best Student Paper Awards, and the 2013 CICC Best Paper Award. He was also recognized as one of the top ten authors in the 50-year history of ISSCC. He served on the Technical Program Committees for ISSCC from 1993 to 2002 and the VLSI Circuits Symposium from 1998 to 2002. He has also served as a Guest Editor and an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics. He currently serves as the Editor-in-Chief of the IEEE SOLID-STATE CIRCUITS LETTERS. He is an Editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (IEEE Press, 1996) and Phase-Locking in High-Performance Systems (IEEE Press, 2003). He has served as an IEEE Distinguished Lecturer.