## A 14 $\mu$m $\times$ 26 $\mu$m 20-Gb/s 3-mW CDR Circuit with High Jitter Tolerance

Long Kong, Yikun Chang, and Behzad Razavi

Electrical Engineering Department, University of California, Los Angeles, CA 90095, USA

longkong@ucla.edu

Abstract — A full-rate CDR loop employs a 3-stage ring VCO, a master-slave passive sampler as both a phase detector and a filter, and a new flipflop to achieve a bandwidth of 170 MHz. Implemented in 45-nm CMOS technology, the circuit exhibits a jitter tolerance of 2 UI at 5 MHz and a recovered clock jitter of 340 fs with  $2^7$ -1 PRBS.

Recent developments in wireline systems call for the use of tens of lanes on a single chip, and hence a small footprint for each transceiver. Clock and data recovery (CDR) circuits operating at tens of gigahertz present two issues in this regard: (1) they rely on LC oscillators to achieve a low jitter, and (2) they employ a loop filter with a large capacitor to minimize pattern-dependent jitter. The CDR footprint therefore tends to be large, e.g., 230 $\mu$m × 170 $\mu$m [1]. Another challenge in CDR design is the trade-off between pattern-dependent jitter and jitter tolerance, both of which increase as the loop bandwidth increases.

This paper introduces a compact, full-rate CDR architecture in 45-nm CMOS technology that allows the use of ring oscillators at 20 GHz while achieving the highest jitter tolerance reported for data rates of 20 Gb/s and above. Our approach, in principle, completely avoids pattern-dependent jitter, thus accommodating a wide loop bandwidth (> 100 MHz in this work). As a result, the ring VCO phase noise is greatly suppressed, the loop filter is significantly shrunk, and the jitter tolerance is considerably improved.

**Architecture** To arrive at the proposed architecture, we begin with the bang-bang loop shown in Fig. 1(a), where a flipflop, $FF_1$, acts as a phase detector and samples the VCO output on each rising (or falling) data transition, but generates substantial ripple on $V_{cont}$. We recognize that $FF_1$ is equivalent to a master-slave sampler plus a 1-bit quantizer, the latter producing the ripple. We propose to emulate the master-slave operation in the analog domain [Fig. 1(b)] by means of a passive sampler consisting of $S_1$, $S_2$, $C_1$, and $C_2$. Here, the VCO output is sampled onto $C_1$ when $D_{in}$ rises and the result is delivered to $C_2$ when $D_{in}$ falls. In theory, no quantization occurs and no filter need precede the VCO because the passive master-slave sampler serves as both a (linear) phase detector and a filter. The filtering property has been used in RF synthesis [2].

To see why our CDR loop is, theoretically, free from pattern-dependent jitter (and hence achieves a wide bandwidth), consider the waveforms shown in Fig. 1(b), noting that the time constant associated with the master sampler ($S_1$ and $C_1$) is around 0.15 UI. When $D_{in}$ goes high at $t = t_1$, $V_1$ begins to track $V_{CK}$, and when $D_{in}$ goes low at $t = t_2$, $V_1$ freezes at a certain value, $V_0$. Now, $S_2$ turns on, impressing $V_0$ upon $C_2$. The loop must lock so that $V_{cont} = V_0$ yields the desired VCO frequency, thereby requiring a phase offset, $\Delta t$. From another perspective, the loop slides the almost triangular waveform on $V_1$ to the left or to the right so as to provide $V_1 = V_0$ at $t_2$. The phase offset is reduced by minimizing the acquisition time constant, and hence sharpening the transitions in $V_1$, and by another technique described below. The clock feedthrough and charge injection of $S_1$ and $S_2$ ultimately place a lower bound on the ripple amplitude. This ripple can be suppressed by an RC filter preceding the VCO, but in our prototype the ripple is small enough not to demand such a filter. Note that the kT/C noise spectrum contributes negligible phase noise because of the high sampling rate provided by $D_{in}$.
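The pattern independence argued above can be illustrated with a minimal open-loop behavioral sketch. It is not the authors' simulation: it assumes ideal switches, jitter-free data edges, a unit-amplitude sinusoidal clock with a fixed phase, instantaneous charge sharing between $C_1$ and $C_2$, and a hypothetical capacitor ratio $C_2/C_1 = 4$ (the text does not give one). Only the 0.15-UI time constant and the 50-ps unit interval come from the text; all function and variable names are illustrative.

```python
import math
import random

UI = 50e-12          # unit interval at 20 Gb/s (from the text)
TAU = 0.15 * UI      # master-sampler time constant (from the text)
C1, C2 = 1.0, 4.0    # ASSUMED capacitor ratio, not given in the text
STEPS = 200          # time steps per UI (numerical choice)

def run_sampler(bits, phase):
    """Return V_cont (voltage on C2) after `bits` pass through the sampler."""
    dt = UI / STEPS
    alpha = 1.0 - math.exp(-dt / TAU)   # exact RC update for a held input
    v1 = v2 = 0.0
    prev = 0
    for k, bit in enumerate(bits):
        if prev == 1 and bit == 0:
            # Falling data edge: S1 opens, S2 closes, C1 and C2 share charge.
            v1 = v2 = (C1 * v1 + C2 * v2) / (C1 + C2)
        if bit == 1:
            # D_in high: S1 is on and V1 tracks the clock through the RC.
            for n in range(STEPS):
                t = (k + (n + 0.5) / STEPS) * UI
                v_ck = math.sin(2 * math.pi * t / UI + phase)
                v1 += (v_ck - v1) * alpha
        prev = bit
    return v2

def steady_state_sample(phase):
    """RC-filtered clock evaluated at a data edge (integer multiple of UI)."""
    wt = 2 * math.pi * TAU / UI
    return math.sin(phase - math.atan(wt)) / math.sqrt(1 + wt * wt)

PHASE = 0.3                              # arbitrary fixed clock phase, rad
rng = random.Random(1)
alternating = [1, 0] * 300
random_bits = [rng.randint(0, 1) for _ in range(600)]

v_alt = run_sampler(alternating, PHASE)
v_rand = run_sampler(random_bits, PHASE)
v_ref = steady_state_sample(PHASE)
print(f"alternating: {v_alt:.4f}  random: {v_rand:.4f}  analytic: {v_ref:.4f}")
```

In this idealized model both data patterns settle to the same value on $C_2$, because every falling edge samples the filtered clock at the same phase regardless of how long $D_{in}$ stayed high. Nonidealities such as charge injection and data jitter are deliberately left out.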

In addition to relaxing the trade-off between pattern-dependent jitter and jitter tolerance, the proposed architecture offers three advantages. First, it suppresses the VCO phase noise for frequency offsets as high as 130 MHz, affording the use of low-power ring oscillators. Second, $C_2$ in Fig. 1(b) holds the last sampled value for a long time, experiencing only a small droop due to the leakage of $S_2$ and the varactors used in the VCO. That is, the CDR circuit can handle long input runs with little increase in jitter. In this work, $V_{cont}$ droops by 2.3 mV for CID = 100 UI, causing 140 fs of peak-to-peak jitter. Third, the architecture exhibits a fast settling time, about 10 ns.
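To put the droop figures in perspective, the short calculation below converts them into a run duration, an average droop slope, and a jitter expressed in UI. It uses only the numbers quoted above; the variable names are illustrative, and the droop slope is an average that assumes nothing about the actual leakage profile.

```python
DATA_RATE = 20e9            # b/s, from the text
UI = 1.0 / DATA_RATE        # unit interval in seconds

cid_ui = 100                # run of consecutive identical digits, in UI
droop_v = 2.3e-3            # droop on V_cont over that run, in volts
jitter_pp = 140e-15         # resulting peak-to-peak jitter, in seconds

cid_time = cid_ui * UI                  # duration of the run
droop_rate = droop_v / cid_time         # average droop slope, V/s
jitter_ui = jitter_pp / UI              # jitter as a fraction of one UI

print(f"CID duration : {cid_time * 1e9:.2f} ns")
print(f"Average droop: {droop_rate * 1e-6:.2f} mV/ns")
print(f"Jitter       : {jitter_ui:.4f} UI")
```

The run lasts 5 ns, so the control voltage drifts at an average of about 0.46 mV/ns, and the resulting 140 fs corresponds to roughly 0.003 UI.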

Figure 2 shows the complete type-I CDR architecture. Here, a three-stage inverter-based ring oscillator with varactor tuning (and digital capacitor tuning) runs at 20 GHz while drawing 1.8 mW and exhibiting a free-running phase noise of -118.5 dBc/Hz at 100-MHz offset. The CDR output jitter due to the VCO is about 120 fs rms. Both VCO phases at A and B are buffered and utilized: the former for phase detection and the latter for retiming. This separation partially cancels the static phase offset mentioned above: B is ahead of A by 50 ps/3 $\approx$ 16.7 ps while the offset varies from 9 ps to 12 ps across process corners and T = 0 to 100 °C. Thus, the net offset is less than 4 ps. For ease of measurements, the full-rate retimed data is demultiplexed by a factor of 2.
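As a rough plausibility check on the VCO-induced jitter, the sketch below integrates a textbook phase-noise model. It assumes a pure $1/f^2$ free-running profile anchored at the quoted -118.5 dBc/Hz at 100 MHz, and a first-order high-pass loop response whose corner is taken to be the measured 170-MHz jitter-transfer bandwidth. Both are simplifying assumptions rather than the authors' analysis, and the function name is hypothetical.

```python
import math

F_CLK = 20e9      # VCO frequency in Hz (from the text)
L_DBC = -118.5    # free-running phase noise in dBc/Hz (from the text)
F_OFF = 100e6     # offset at which L_DBC is quoted, in Hz (from the text)
F_BW = 170e6      # ASSUMED high-pass corner: the measured transfer bandwidth

def vco_jitter_rms(l_dbc, f_off, f_bw, f_clk):
    """RMS jitter of a 1/f^2 VCO profile shaped by a first-order high-pass."""
    l_lin = 10 ** (l_dbc / 10)      # dBc/Hz to a linear ratio per Hz
    # Shaped profile: S(f) = l_lin * f_off^2 / (f^2 + f_bw^2).
    # Integrating over both sidebands, 2 * integral_0^inf S(f) df, gives:
    var_rad2 = math.pi * l_lin * f_off ** 2 / f_bw
    return math.sqrt(var_rad2) / (2 * math.pi * f_clk)

jitter = vco_jitter_rms(L_DBC, F_OFF, F_BW, F_CLK)
print(f"Estimated VCO contribution: {jitter * 1e15:.0f} fs rms")
```

Under these assumptions the estimate lands a little under 130 fs rms, in the same range as the roughly 120 fs quoted above. The result scales as $1/\sqrt{f_{BW}}$, which is why a wide loop bandwidth makes a ring oscillator viable.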

**Retimer** The data retimer $FF_0$ in Fig. 2 can potentially consume a high power. We propose a modified TSPC flipflop [3] that can operate at the requisite rate of 20 GHz in 45-nm CMOS technology. Shown in Fig. 3(a) is the FF topology. The original TSPC circuit utilizes only one clock phase and hence only $M_1$ and $M_2$ for clocking, suffering from degraded logical levels at X and Y. We add complementary devices $M_3$ and $M_4$ to restore these levels and improve the maximum speed from 8 GHz to 22 GHz. The retimer consumes 0.22 mW and the inverter generating $\overline{CK}$, 0.17 mW. That is, the departure from single-phase clocking is well justified.

**Experimental Results** The CDR circuit has been fabricated in 45-nm CMOS technology and tested with a 1-V supply on a probe station. Fig. 3(b) shows the die photograph; the active area is 14 $\mu$m × 26 $\mu$m. Excluding the DMUX and the $\div$2 circuit shown in Fig. 2, the prototype consumes 3.0 mW. Fig. 4 shows the measured recovered eye at 10 Gb/s and the recovered 20-GHz clock spectrum. Integrated from 100 Hz to 1 GHz, the rms jitter is 340 fs with 2<sup>7</sup>-1 PRBS and 459 fs with 2<sup>31</sup>-1 PRBS. Fig. 5 plots the measured jitter transfer and jitter tolerance. The former suggests a 3-dB transfer bandwidth of about 170 MHz and the latter demonstrates a tolerance of 2 UI at 5 MHz.
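The headline figures can be cross-checked with a few lines of arithmetic. The sketch below derives the active area in mm<sup>2</sup>, the energy per bit, and the recovered-clock jitter in UI from the measured values above. Energy per bit is a derived metric not stated in the text, and the variable names are illustrative.

```python
width_um, height_um = 14.0, 26.0   # active area dimensions (from the text)
power_w = 3.0e-3                   # power excluding DMUX and divider
data_rate = 20e9                   # b/s

area_um2 = width_um * height_um
area_mm2 = area_um2 * 1e-6                       # 1 mm^2 = 1e6 um^2
energy_pj_per_bit = power_w / data_rate * 1e12   # J/bit to pJ/bit

jitter_ui_prbs7 = 340e-15 * data_rate    # rms jitter with 2^7-1 PRBS, in UI
jitter_ui_prbs31 = 459e-15 * data_rate   # rms jitter with 2^31-1 PRBS, in UI

print(f"Area       : {area_um2:.0f} um^2 = {area_mm2:.6f} mm^2")
print(f"Energy/bit : {energy_pj_per_bit:.2f} pJ/bit")
print(f"Jitter     : {jitter_ui_prbs7:.4f} UI, {jitter_ui_prbs31:.4f} UI")
```

The area comes to 364 $\mu$m<sup>2</sup>, consistent with the 0.00036 mm<sup>2</sup> entry in Table I, and the power corresponds to 0.15 pJ/bit.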

Table I summarizes the performance obtained in this work and compares it with that of prior art for data rates in the range of 20 to 28 Gb/s.

**Acknowledgments** The authors thank the TSMC University Shuttle Program for chip fabrication.

## References

- [1] J. W. Jung and B. Razavi, IEEE JSSC, pp. 684-697, Mar. 2013.
- [2] L. Kong and B. Razavi, IEEE JSSC, pp. 626-635, Mar. 2016.
- [3] J. Yuan and C. Svensson, IEEE JSSC, pp. 62-70, Feb. 1989.
- [4] J. Lee and K. C. Wu, ISSCC Dig., pp. 366-367, Feb. 2009.
- [5] K. C. Wu and J. Lee, ISSCC Dig., pp. 374-375, Feb. 2010.
- [6] J. Liang et al., ISSCC Dig., pp. 122-123, Feb. 2017.



Fig. 1. (a) Conventional bang-bang CDR loop, and (b) proposed CDR concept along with its waveforms.



Fig. 3. (a) Proposed flipflop schematic, and (b) CDR die photograph.




Fig. 4. Measured eye diagram of recovered data and recovered clock spectrum.



TABLE I. Performance summary.

|                                   | [1]                           | [4]   | [5]                          | [6]      | This<br>Work |
|-----------------------------------|-------------------------------|-------|------------------------------|----------|--------------|
| Data Rate (Gb/s)                  | 25                            | 20    | 25                           | 28       | 20           |
| Oscillator Topology               | LC                            | LC    | LC                           | N/A      | Ring         |
| Rec. Clock Jitter (ps)            | 1.56                          | 0.407 | 0.254                        | N/A      | 0.459        |
| Jitter Tolerance<br>@ 5 MHz (UIpp) | 0.5                           | 0.9   | 0.52                         | 0.3      | 2.0          |
| Power (mW)                        | 3.1<br>(excludes<br>1:2 DMUX) | 154   | 99<br>(excludes<br>2:5 DMUX) | 106.6    | 3.0          |
| Area (mm<sup>2</sup>)             | 0.039                         | 0.854 | 2.47 *                       | 0.523 ** | 0.00036      |
| Technology (nm)                   | 65                            | 90    | 65                           | 28       | 45           |
| Supply (V)                        | 1                             | 1.5   | 1.2                          | N/A      | 1            |

\* Includes two CDRs and one DMUX

\*\* Includes pads