# A CMOS Clock Recovery Circuit for 2.5-Gb/s NRZ Data

Seema Butala Anand and Behzad Razavi, Member, IEEE

Abstract—This paper describes a phase-locked clock recovery circuit that operates at 2.5 Gb/s in a 0.4- $\mu$ m digital CMOS technology. To achieve a high speed with low power dissipation, a two-stage ring oscillator is introduced that employs an excess phase technique to operate reliably across a wide range. A sample-and-hold phase detector is also described that combines the advantages of linear and nonlinear phase detectors. The recovered clock exhibits an rms jitter of 10.8 ps for a PRBS sequence of length  $2^7 - 1$  and a phase noise of -80 dBc/Hz at a 5-MHz offset. The core circuit dissipates a total power of 33.5 mW from a 3.3-V supply and occupies an area of  $0.8 \times 0.4$  mm<sup>2</sup>.

*Index Terms*—Clock and data recovery, optical communication, oscillators, phase detectors, PLLs.

## I. INTRODUCTION

The rapid increase of real-time audio and video transport over the Internet has led to a global demand for high-speed serial-data communication networks. To accommodate the required bandwidth, an increasing number of wide-area networks (WANs) and local-area networks (LANs) are converting the transmission medium from copper wire to fiber. This trend motivates research on low-cost, low-power integrated fiber-optic receivers. A critical task in such receivers is the recovery of the clock embedded in the nonreturn-to-zero (NRZ) serial-data stream. The recovered clock both removes the jitter and distortion in the data and retimes it for further processing.

This paper describes the design and implementation of a phase-locked CMOS clock recovery circuit that employs a number of novel techniques to support a data rate of 2.5 Gb/s. A two-stage ring oscillator incorporating an excess phase shift method achieves a high speed, enabling reliable operation in a 0.4-$\mu$m CMOS technology. In addition, a phase detection technique based on an analog sample-and-hold circuit is introduced that exhibits a high speed while producing a small ripple on the control voltage of the oscillator. Measurements on the fabricated prototype indicate an rms jitter of 10.8 ps for a pseudorandom bit sequence (PRBS) of length $2^7 - 1$. The circuit dissipates 33.5 mW from a 3.3-V supply and occupies an area of $0.8 \times 0.4$ mm<sup>2</sup>.

The next section of the paper presents the clock recovery architecture and design issues. Section III describes the building blocks and Section IV summarizes the experimental results.

The authors are with the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA 90095–1594 USA (e-mail: razavi@icsl.ucla.edu).

Publisher Item Identifier S 0018-9200(01)01480-9.

## II. ARCHITECTURE

Two properties of NRZ data make clock recovery difficult. First, arbitrarily long consecutive sequences of 1s or 0s limit the capture range of the phase-locked loop and allow the oscillation frequency to drift. Second, NRZ data contains no spectral content at the bit rate, requiring edge detection. These properties impact the choice of the clock recovery architecture: 1) in the absence of data transitions in the input bit sequence, the phase detector must not generate any false phase comparisons and 2) the circuit requires a nonlinear operation to create a spectral line at the bit rate.
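The second property can be checked numerically. The following is a minimal Python sketch, not taken from the paper: it evaluates the discrete Fourier component at the bit rate for an ideal rectangular NRZ waveform and for the same data after a simple edge detector that emits a half-bit pulse at every transition. The bit pattern, the oversampling factor, and all function names are illustrative assumptions.

```python
import cmath

SAMPLES_PER_BIT = 8  # assumed oversampling factor (illustrative)

def nrz_waveform(bits):
    """Ideal rectangular NRZ waveform: +1 for a one, -1 for a zero."""
    return [1.0 if b else -1.0 for b in bits for _ in range(SAMPLES_PER_BIT)]

def edge_pulses(bits):
    """Half-bit-wide pulse after every data transition (a simple nonlinear edge detector)."""
    wave = []
    prev = bits[0]
    for b in bits:
        pulse_on = b != prev
        for k in range(SAMPLES_PER_BIT):
            wave.append(1.0 if pulse_on and k < SAMPLES_PER_BIT // 2 else 0.0)
        prev = b
    return wave

def bit_rate_component(wave):
    """Magnitude of the Fourier component at the bit rate, normalized by length."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * i / SAMPLES_PER_BIT)
              for i, x in enumerate(wave))
    return abs(acc) / len(wave)

# Arbitrary test pattern (hypothetical), repeated to lengthen the record.
pattern = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1] * 8

nrz_line = bit_rate_component(nrz_waveform(pattern))
edge_line = bit_rate_component(edge_pulses(pattern))
print(f"NRZ component at bit rate:           {nrz_line:.3e}")
print(f"Edge-detected component at bit rate: {edge_line:.3e}")
```

In this idealized model every full-width bit integrates to zero over one period of the bit-rate tone, so the NRZ component vanishes regardless of the data, whereas the transition pulses all add in phase.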

Clock recovery circuits designed for WANs such as SONET or SDH require a narrow-loop bandwidth to meet the jitter transfer specification. This in turn severely limits the capture range of a simple phase-locked loop. For this reason, a means of frequency detection is also necessary so as to guarantee lock in the presence of large oscillator frequency variations.

In addition to the above issues, various sources of jitter affect the design of the overall system as well. In clock recovery circuits, five sources of jitter can be identified: 1) jitter generated from a noisy input; 2) oscillator jitter due to device electronic noise; 3) supply and substrate noise; 4) disturbance of oscillator by leakage of data transitions through the phase detector; and 5) oscillator jitter due to ripple on the control line.

The effect of input jitter is reduced by a narrow-loop bandwidth and the oscillator jitter is lowered through the use of large swings and careful design with respect to phase noise. All of the building blocks are fully differential so as to minimize the effect of supply and common-mode noise. Moreover, a buffer isolates the oscillator from the data transitions coupled through the phase detector.

At low supply voltages, the VCO gain $K_{\rm VCO}$ required to achieve a given tuning range becomes quite large. As a result, the ripple on the control voltage due to the phase detector activity creates greater jitter at the output. The conflict between a wide tuning range and a low VCO sensitivity is resolved by a provision for two control inputs: a fine control driven by the main loop and a coarse control that will be driven by a frequency-locked loop (FLL) in future implementations. Since the FLL remains relatively quiet (or can be disabled) after phase lock, the high sensitivity of the coarse control does not lead to high jitter.
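The benefit of splitting the control path can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that the fine and coarse inputs span the same control-voltage swing and that the ripple amplitude is 1 mV; only the 50-MHz fine and 800-MHz coarse tuning ranges come from the measurements reported in Section IV.

```python
def frequency_deviation_hz(k_vco_hz_per_v, ripple_v):
    """Peak frequency deviation caused by a ripple voltage on a control input."""
    return k_vco_hz_per_v * ripple_v

CONTROL_SWING_V = 1.0   # hypothetical control-voltage swing, same for both inputs
RIPPLE_V = 1e-3         # hypothetical ripple amplitude on the control line

K_FINE = 50e6 / CONTROL_SWING_V     # fine tuning range reported in Section IV
K_COARSE = 800e6 / CONTROL_SWING_V  # coarse tuning range reported in Section IV

dev_fine = frequency_deviation_hz(K_FINE, RIPPLE_V)
dev_coarse = frequency_deviation_hz(K_COARSE, RIPPLE_V)
print(f"fine input:   {dev_fine / 1e3:.0f} kHz deviation")
print(f"coarse input: {dev_coarse / 1e3:.0f} kHz deviation")
print(f"sensitivity ratio: {dev_coarse / dev_fine:.0f}x")
```

Under the equal-swing assumption the ratio of the two sensitivities does not depend on the assumed swing or ripple, only on the ratio of the two tuning ranges.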

The architecture of the clock recovery circuit addressing the above issues is shown in Fig. 1. The loop consists of a phase detector (PD), a voltage-to-current (V/I) converter, a passive loop filter, and a ring-based voltage-controlled oscillator (VCO). The VCO provides the main output through a set of open-drain buffers to  $50-\Omega$  termination resistors.

Manuscript received July 20, 2000; revised October 18, 2000. This work was supported in part by the SRC under Contract 64.001 and by Cypress Semiconductor.



Fig. 1. PLL architecture.



Fig. 2. (a) Two-stage ring oscillator. (b) Simplified model of (a).

Designing the circuit to operate at 2.5 GHz in a 0.4-$\mu$m CMOS technology entails a number of challenges. The next section deals with the design of each building block.

## III. BUILDING BLOCKS

In this section, the transistor-level implementation of each building block is described, emphasizing the design constraints imposed by the technology limitations.

### A. VCO

Since the clock recovery circuit is designed with the provision of adding a frequency detector, the oscillator must generate quadrature outputs [1], [2]. For a differential ring oscillator, this observation dictates the need for an even number of stages. In the 0.4- $\mu$ m CMOS technology used here, the maximum oscillation frequency does not exceed 1.8 GHz for a four-stage ring (and 2.4 GHz for a three-stage ring). Thus, to achieve reliable operation at 2.5 GHz, a two-stage topology is necessary. However, two simple differential pairs in a loop [Fig. 2(a)] fail to oscillate because each stage contributes only one pole, yielding insufficient phase at unity gain. The simplified model shown in Fig. 2(b) illustrates this effect. The total frequency-dependent phase shift around the loop reaches 180° only at infinite frequency, where the loop gain drops to zero. Thus, excess phase must be introduced in each stage [3], [4].
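The shortfall of the simple two-stage ring can be made concrete with a small Python sketch. It assumes each stage in Fig. 2(b) behaves as an ideal single-pole amplifier with dc gain `a0`; the function names and the example gains are illustrative and do not come from the paper.

```python
import math

def single_pole_stage(a0, f_pole_hz, f_hz):
    """Return (gain magnitude, phase lag in degrees) of a0 / (1 + j f / f_pole)."""
    x = f_hz / f_pole_hz
    return a0 / math.sqrt(1.0 + x * x), math.degrees(math.atan(x))

def phase_lag_at_unity_gain(a0):
    """Phase lag in degrees of one single-pole stage at the frequency where |H| = 1."""
    if a0 <= 1.0:
        raise ValueError("the stage never reaches unity gain unless a0 > 1")
    # |H| = 1 at f / f_pole = sqrt(a0^2 - 1), so the lag is atan(sqrt(a0^2 - 1)) = acos(1 / a0).
    return math.degrees(math.acos(1.0 / a0))

for a0 in (2.0, 5.0, 20.0, 1000.0):
    lag = phase_lag_at_unity_gain(a0)
    print(f"a0 = {a0:7.1f}: lag at unity gain = {lag:6.2f} deg, shortfall = {90.0 - lag:5.2f} deg")
```

However large the dc gain, the lag per stage at unity gain stays below 90°, so two such stages cannot supply the 180° of frequency-dependent phase shift that the loop needs.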



Fig. 3. (a) Two-stage ring oscillator with a composite load. (b) Simplified model of (a).

In Fig. 3(a), the load resistors are replaced with a composite load consisting of  $R_1$ ,  $C_1$ , and a pMOS device. With the proper choice of parameters, such a load is inductive and can provide enough phase shift to allow oscillation. The model shown in Fig. 3(b) can be used to study the behavior of the oscillator. Note that, in addition to  $C_1$ , each stage contains a load capacitance  $C_L$ , which represents the drain junction capacitance of the MOS devices, the input capacitance of the next stage, and the input capacitances of the isolation buffers.

At this point, it is necessary to determine the parameters of the composite load so as to ensure oscillation. We obtain the transfer function for the half circuit equivalent of each differential pair as

$$\frac{V_{\text{out}}}{V_{\text{in}}} = \frac{-g_{m1}(1 + R_1C_1s)}{g_{m3} + (C_1 + C_L + R_1C_1/r_{o3})s + R_1C_1C_Ls^2}.$$



Fig. 4. Gain and phase response of each delay stage.

The circuit exhibits a zero at  $-1/R_1C_1$  and two poles whose sum is given by

$$\omega_{p1} + \omega_{p2} = \frac{C_1 + C_L}{R_1 C_1 C_L}$$

where channel-length modulation is neglected.
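As a numerical check of the pole-sum expression, the following Python sketch solves the denominator of the transfer function with channel-length modulation neglected. The component values are hypothetical placeholders chosen only to exercise the formula; they are not the values used in the prototype.

```python
import cmath

def composite_load_poles_zero(gm3, r1, c1, cl):
    """Zero and poles (rad/s) of the stage, with r_o3 taken as infinite."""
    a = r1 * c1 * cl      # coefficient of s^2
    b = c1 + cl           # coefficient of s
    c = gm3               # constant term
    disc = cmath.sqrt(b * b - 4.0 * a * c)
    p1 = (-b + disc) / (2.0 * a)
    p2 = (-b - disc) / (2.0 * a)
    zero = -1.0 / (r1 * c1)
    return zero, p1, p2

# Hypothetical element values, for illustration only.
GM3 = 2e-3      # S
R1 = 2e3        # ohm
C1 = 50e-15     # F
CL = 100e-15    # F

zero, p1, p2 = composite_load_poles_zero(GM3, R1, C1, CL)
pole_sum = -(p1 + p2).real
expected = (C1 + CL) / (R1 * C1 * CL)
print(f"zero at {zero:.3e} rad/s")
print(f"poles at {p1:.3e} and {p2:.3e} rad/s")
print(f"sum of pole frequencies: {pole_sum:.3e} rad/s (formula: {expected:.3e})")
```

The sum of the roots of a quadratic depends only on the ratio of its first two coefficients, which is why the expression contains neither $g_{m3}$ nor $g_{m1}$.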

Oscillation of the two-stage VCO depends on careful placement of the poles and the zero, with a requisite 90° phase shift at the unity-gain frequency  $\omega_u$  for each stage. This in turn mandates that each pole frequency be less than the frequency of the zero. Hence,

$$\frac{C_1 + C_L}{R_1 C_1 C_L} < \frac{2}{R_1 C_1}$$

and consequently

$$C_1 < C_L.$$

The above condition can be easily met by the proper choice of device dimensions.
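The step from the pole-zero requirement to the capacitor condition can be written out explicitly. This short derivation uses only the expressions given above; it shows a necessary consequence of placing both poles below the zero and is not by itself a sufficient condition for oscillation.

$$\omega_{p1} < \frac{1}{R_1 C_1},\quad \omega_{p2} < \frac{1}{R_1 C_1} \;\Rightarrow\; \omega_{p1} + \omega_{p2} = \frac{C_1 + C_L}{R_1 C_1 C_L} < \frac{2}{R_1 C_1} \;\Rightarrow\; C_1 + C_L < 2 C_L \;\Rightarrow\; C_1 < C_L$$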

Fig. 4 plots the simulated gain and phase of each stage as a function of frequency with and without the excess phase network. Two important points can be observed: 1) the unity-gain frequency is *higher* with the  $R_1$ - $C_1$  network due to the inductive behavior of the load and 2) the circuit satisfies Barkhausen's oscillation criteria at a single frequency ( $\approx 2.5$  GHz) in the presence of  $R_1$  and  $C_1$  but falls short by 13° (per stage) at  $\omega_u$  without the excess phase network. Note that  $R_1$  is actually a pMOS device operating in the deep triode region, with its gate bypassed to  $V_{DD}$  to minimize the effect of common-mode noise.

Fig. 5. (a) One stage of a VCO with noise due to $R_1$. (b) Output spectrum of a VCO with noise tone.

The phase noise due to the thermal noise of resistors $R_1$ in Fig. 3(b) is of concern. To quantify the phase noise, a small sinusoid is placed in series with one of the resistors [Fig. 5(a)], the oscillator is simulated in the time domain, and the output spectrum is examined. With proper normalization, the phase noise from the additional resistors can be determined for a given frequency offset. Fig. 5(b) shows the output spectrum for a 5-mV<sub>p</sub> sinusoidal source. This result indicates that, after normalization to the actual noise value $(4kTR_1)$, the total phase noise due to all four resistors in the VCO is roughly equal to -143.2 dBc/Hz at a 5-MHz offset, a value much less than the contribution of the other devices.
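The normalization step can be sketched in Python as follows. The sketch assumes a linear small-signal relation between the injected amplitude and the resulting sideband, treats the four resistors as equal and uncorrelated, and ignores any single-sideband versus double-sideband bookkeeping; the sideband level and resistance in the example are hypothetical, since the paper does not list them.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def resistor_phase_noise_dbc_hz(tone_sideband_dbc, tone_peak_v, r_ohm,
                                n_resistors=4, temp_k=300.0):
    """Scale a simulated tone sideband to the thermal noise of n equal resistors.

    tone_sideband_dbc: sideband level (dBc) produced by the injected sinusoid
    tone_peak_v:       peak amplitude of the injected sinusoid
    """
    tone_rms = tone_peak_v / math.sqrt(2.0)
    noise_rms_per_rt_hz = math.sqrt(4.0 * K_BOLTZMANN * temp_k * r_ohm)
    # The sideband power scales with the square of the injected voltage.
    scale_db = 20.0 * math.log10(noise_rms_per_rt_hz / tone_rms)
    # Uncorrelated sources add in power.
    return tone_sideband_dbc + scale_db + 10.0 * math.log10(n_resistors)

# Hypothetical inputs: -40 dBc sideband for a 5-mV peak tone, 1-kohm resistor.
estimate = resistor_phase_noise_dbc_hz(-40.0, 5e-3, 1e3)
print(f"estimated contribution: {estimate:.1f} dBc/Hz")
```

Because the scaling is logarithmic, the result is insensitive to moderate errors in the assumed resistance: a factor of two in $R_1$ shifts the estimate by about 3 dB.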

In order to vary the oscillation frequency, the VCO incorporates delay interpolation [5], providing a tuning range wide enough to encompass process and temperature variations [Fig. 6(a)]. In the transistor implementation of each delay stage, shown in Fig. 6(b), the fast path consists of one differential pair,  $M_1$ - $M_2$ , whereas the slow path consists of two differential pairs,  $M_5$ - $M_6$  and  $M_7$ - $M_8$ . Interpolation is accomplished by varying the tail currents of  $M_1$ - $M_2$  and  $M_7$ - $M_8$  in opposite directions.
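A first-order view of delay interpolation is sketched below in Python. It assumes the effective stage delay is a weighted average of the fast-path and slow-path delays and that an N-stage ring oscillates at $1/(2 N t_d)$; the weighting variable `alpha` and the delay values are hypothetical, and the linear weighting is only an approximation of the actual circuit behavior.

```python
def interpolated_frequency_hz(alpha, t_fast_s, t_slow_s, n_stages=2):
    """Oscillation frequency of an n-stage ring with interpolated stage delay.

    alpha = 1 steers all tail current to the fast path, alpha = 0 to the slow path.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t_stage = alpha * t_fast_s + (1.0 - alpha) * t_slow_s
    return 1.0 / (2.0 * n_stages * t_stage)

# Hypothetical path delays, for illustration only.
T_FAST = 100e-12
T_SLOW = 150e-12

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    f = interpolated_frequency_hz(alpha, T_FAST, T_SLOW)
    print(f"alpha = {alpha:4.2f}: f_osc = {f / 1e9:.3f} GHz")
```

In this simple model the frequency varies monotonically between the two extremes as the tail currents are steered from one path to the other.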

The fast and slow paths share the load consisting of $M_3$-$M_4$ and the $R_1$-$C_1$ networks. Since the limited voltage headroom makes it difficult to control the currents of $M_1$-$M_2$ and $M_7$-$M_8$ by another differential pair, transistors $M_9$ and $M_{10}$ are driven by a current folding circuit (Fig. 7). Setting the gain of both paths, $M_9$ and $M_{10}$ each in fact consist of two transistors, a narrow one and a wide one, so as to provide fine and coarse tuning.

An interesting phenomenon in the interpolating VCO topology used here is its relatively linear input–output characteristic. This property arises because two effects *cancel* each other: the nonlinearity associated with the voltage-to-current conversion in the pMOS differential pairs ($M_{1c}$-$M_{2c}$ and $M_{3c}$-$M_{4c}$ in Fig. 7) and the nonlinearity in the current-to-frequency conversion in the interpolation operation. Simulations indicate that the current-to-frequency conversion by interpolation is somewhat "expansive," thereby cancelling the compressive nonlinearity of the pMOS differential pairs. This observation is confirmed by experimental results.

Fig. 6. (a) One stage of the delay interpolation. (b) Implementation of one stage with modified load.

Fig. 7. Implementation of the coarse and fine control through the tail currents, $M_9$ and $M_{10}$.

### B. Phase Detector

The design of phase detectors for high-speed random NRZ data is a challenging task. In "linear" phase detectors such as that in [6], the output pulsewidth is linearly proportional to the input phase difference, resulting in a constant loop gain during lock transient and minimal charge pump activity after phase lock is achieved. The difficulty, however, lies in generating pulsewidths equal to a fraction of the clock period at speeds near the limits of the technology. By contrast, bang-bang PDs [5] employ simple flipflops for maximum speed but provide only two output states [Fig. 8(a)], creating significant ripple on the control line in the locked condition and hence producing great jitter at the VCO output.

Fig. 8. (a) D-flip-flop implementation of phase detector and output characteristic. (b) Sample-and-hold implementation of phase detector and output characteristic.

The phase detector used in this work combines the two methods so as to overcome the speed limitations of the former and avoid the high activity rate of the latter. Shown in Fig. 8(b), the PD is realized as a master–slave sample-and-hold circuit (an "analog D-flipflop"), whereby each rising data transition samples the instantaneous value of the VCO output. The circuit thus generates an output that is linearly proportional to the input phase difference in the vicinity of lock. The voltage then drives a V/I converter and the loop filter.
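The behavior described above can be modeled at the bit level with a short Python sketch. It assumes a sinusoidal VCO waveform of unit amplitude and represents the sampling instant by a per-bit phase error; the function name and the example data are illustrative and do not describe the transistor-level circuit of the paper.

```python
import math

def sample_and_hold_pd(bits, phase_errors_rad, amplitude=1.0):
    """Behavioral model: resample the VCO on each rising data edge, otherwise hold.

    bits:             received data, one entry per bit period
    phase_errors_rad: VCO phase error at the start of each bit period
    """
    out = []
    held = 0.0
    prev = bits[0]
    for b, phi in zip(bits, phase_errors_rad):
        if prev == 0 and b == 1:          # rising data transition
            held = amplitude * math.sin(phi)
        out.append(held)                  # no transition: output is held
        prev = b
    return out

bits = [0, 1, 1, 1, 0, 1]
phases = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
pd_out = sample_and_hold_pd(bits, phases)
print([round(v, 4) for v in pd_out])
```

Between rising edges the output simply keeps its last sample, so a run of identical bits produces no new phase comparison, and near lock the sampled value $\sin\phi \approx \phi$ is proportional to the phase error.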

The transistor implementation of the PD is depicted in Fig. 9. Each of the master and slave stages consists of a differential pair whose tail current and load devices turn off simultaneously, thereby storing the instantaneous value of $V_{\rm VCO}$ on the parasitic capacitances $C_{P1}$–$C_{P4}$. To allow operation from a low supply voltage, the tail current is controlled by a current mirror and a pMOS differential pair. Transistor $M_T$ is a narrow and long device so that when it is on, $M_3$ and $M_4$ (which are wide and short) are forced into the triode region. This obviates the need for common-mode feedback (CMFB).

Fig. 9. Circuit implementation of a master-slave sample-and-hold phase detector.

Fig. 10. Output characteristic of the phase detector.

In addition to a relatively linear behavior with respect to the input phase difference, the PD of Fig. 9 exhibits two other properties. First, the master–slave sample-and-hold circuit avoids a transparent path from  $D_{in}$  to  $V_{out}$ , producing a voltage proportional to the phase difference for most of the period. Second, the path with large switching transients, namely, from  $D_{in}$  to each stage, operates only at the data rate rather than the VCO rate. Consequently, the bandwidth of this path can be as low as  $0.7 \times 2.5 \text{ GHz} = 1.75 \text{ GHz}$ , allowing a low-power implementation in 0.4- $\mu$ m CMOS technology.

The simulated input–output characteristic of the phase detector is presented in Fig. 10. The behavior is relatively linear for phase differences as large as  $\pm 50^{\circ}$ .
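For a rough sense of what "relatively linear" means over this range, the Python sketch below compares a sinusoidal characteristic with its small-angle linear approximation. The sinusoidal shape is an assumption made for illustration; the simulated characteristic in Fig. 10 depends on the actual VCO waveform and may differ.

```python
import math

def sine_pd_linearity_error(phase_deg):
    """Fractional shortfall of sin(phi) relative to the linear estimate phi."""
    phi = math.radians(phase_deg)
    if phi == 0.0:
        return 0.0
    return 1.0 - math.sin(phi) / phi

for deg in (10, 30, 50, 70, 90):
    err = sine_pd_linearity_error(deg)
    print(f"{deg:3d} deg: deviation from linear = {100.0 * err:5.1f} %")
```

Under this assumption the deviation from a straight line is well under 1% at 10° and about 12% at 50°, growing rapidly beyond that.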

### C. V-I Converter and Loop Filter

Fig. 11 illustrates the differential V-I converter and loop filter. The current generated by $M_1$ and $M_2$ is folded up so as to produce an output common-mode level compatible with the VCO control path. A simple common-mode feedback (CMFB) network defines the output CM level. The loop filter is external in this design and consists of a simple lead-lag network.

Fig. 11. V–I converter and loop filter.

Fig. 12. (a) Differential control voltage for lock simulation. (b) Corresponding phase offset.

The mismatch between the differential output currents of the V-I converter translates to a static phase error between the data and the VCO output. The circuit therefore incorporates relatively large devices to minimize this error. Note that this stage runs at a frequency equal to the *difference* between the input data rate and the VCO frequency, allowing a relaxed tradeoff between speed, device dimensions, and power dissipation.

Fig. 13. Die photo of clock recovery circuit.

Fig. 14. Free-running characteristics of VCO.

Fig. 12 presents the simulated lock behavior of the overall clock recovery circuit (with random input data) at the transistor level. The differential control voltage of the oscillator is plotted in Fig. 12(a), suggesting a lock time of 650 ns. Fig. 12(b) shows the phase offset between the VCO output and the data transition edges. This plot is obtained by determining the zero crossings of the VCO output and the data and calculating the difference between the two. Ideally, the phase offset should converge to zero for perfect phase locking, but, due to the inherent skew between $D_{in}$ and $V_{VCO}$ in Fig. 9, a static offset of 15 ps arises.
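The zero-crossing procedure can be sketched in Python as follows. The example uses two synthetic 2.5-GHz sinusoids with a 15-ps skew in place of the simulated waveforms, which are not available here; the sampling step, record length, and function names are assumptions made for illustration.

```python
import math

def rising_zero_crossings(samples, dt):
    """Times of rising zero crossings, found by linear interpolation between samples."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            times.append((i - 1 + a / (a - b)) * dt)
    return times

def mean_phase_offset_s(ref_times, vco_times):
    """Average delay from each reference crossing to the next VCO crossing."""
    diffs = []
    for t_ref in ref_times:
        later = [t for t in vco_times if t >= t_ref]
        if later:
            diffs.append(later[0] - t_ref)
    return sum(diffs) / len(diffs)

F_CLK = 2.5e9     # clock frequency from the paper
SKEW = 15e-12     # static offset quoted in the text
DT = 1e-12        # assumed sampling step
N = 2000          # assumed record length (five clock periods)

ref = [math.sin(2.0 * math.pi * F_CLK * i * DT) for i in range(N)]
vco = [math.sin(2.0 * math.pi * F_CLK * (i * DT - SKEW)) for i in range(N)]

offset_s = mean_phase_offset_s(rising_zero_crossings(ref, DT),
                               rising_zero_crossings(vco, DT))
offset_deg = offset_s * F_CLK * 360.0
print(f"offset = {offset_s * 1e12:.2f} ps = {offset_deg:.2f} deg")
```

At 2.5 GHz one period is 400 ps, so the 15-ps static offset corresponds to 13.5° of phase.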

## IV. EXPERIMENTAL RESULTS

The clock recovery circuit has been fabricated in a digital 0.4- $\mu$ m CMOS technology. Shown in Fig. 13 is a photograph of the die, whose active area measures 0.8 mm  $\times$  0.4 mm. The circuit has been tested in a chip-on-board assembly while running from a 3.3-V power supply.

The free-running frequency characteristics of the VCO are shown in Fig. 14 for both coarse and fine tuning. Note the linear behavior across a wide range for both control inputs. The phase noise of the free-running VCO is  $-90 \, dBc/Hz$  at a 5-MHz offset.

TABLE I. PERFORMANCE SUMMARY

| Parameter                      | Value                                  |
|--------------------------------|----------------------------------------|
| Bit Rate                       | 2.5 Gb/s                               |
| Capture Range                  | 15 MHz                                 |
| Lock Range                     | 50 MHz                                 |
| Phase Noise at 5-MHz Offset    | -80 dBc/Hz                             |
| Jitter for PRBS $2^7 - 1$      | 10.8 ps, rms                           |
| Jitter for PRBS $2^{23} - 1$   | 17.4 ps, rms                           |
| Power Dissipation (Total)      | 33.5 mW                                |
| Power Dissipation (VCO)        | 10 mW                                  |
| Power Dissipation (VCO Buffer) | 13.5 mW                                |
| Power Dissipation (PD and V/I) | 10 mW                                  |
| Supply Voltage                 | 3.3 V                                  |
| Die Area                       | $0.8 \text{ mm} \times 0.4 \text{ mm}$ |
| Technology                     | 0.4-µm CMOS                            |

Fig. 15 shows the measured output and the jitter histogram in response to a 2.5-Gb/s PRBS sequence of length $2^7 - 1$, which is relevant for applications using 8b/10b coding. The rms and peak-to-peak jitters are equal to 10.8 ps and 90.8 ps, respectively. Fig. 16 illustrates the output for a sequence length of $2^{23} - 1$, exhibiting 17.4 ps and 167 ps of rms and peak-to-peak jitter, respectively. The increase in jitter can be explained as follows. The phase detector produces a current that is linearly proportional to the phase difference between clock and data. Since this current remains constant until the next data transition, a long sequence of 1s or 0s leads to greater jitter.
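The run-length argument can be illustrated with a small Python sketch that generates one period of a $2^7 - 1$ PRBS and measures its longest runs. The generator polynomial $x^7 + x^6 + 1$ is the one commonly used for this sequence length and is an assumption here, since the paper does not specify the pattern generator.

```python
def prbs7(seed=0x7F):
    """One period (127 bits) of a PRBS from a 7-bit LFSR with taps at stages 7 and 6."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range(127):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append((state >> 6) & 1)          # output the oldest bit
        state = ((state << 1) | new_bit) & 0x7F
    return bits

def longest_cyclic_run(bits, value):
    """Longest run of `value`, allowing the run to wrap around the period."""
    best = run = 0
    for b in bits + bits:                      # doubling handles wrap-around
        run = run + 1 if b == value else 0
        best = max(best, run)
    return min(best, len(bits))

sequence = prbs7()
print("ones in one period:", sum(sequence))
print("longest run of 1s: ", longest_cyclic_run(sequence, 1))
print("longest run of 0s: ", longest_cyclic_run(sequence, 0))
```

A maximal-length sequence of degree $n$ contains one run of $n$ ones and one run of $n - 1$ zeros, so the longest interval without a data transition grows from 7 bits for $2^7 - 1$ to 23 bits for $2^{23} - 1$, during which the control current is held at its last sampled value.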

Fig. 17 depicts the phase noise for a random data sequence of length $2^7 - 1$ with a resolution bandwidth of 1 kHz. The phase noise is equal to -80 dBc/Hz at a 5-MHz offset. The plot shows the effect of noise shaping by the loop.

Table I summarizes the measured performance of the clock recovery circuit. The capture range, which is a function of the loop filter bandwidth, is 15 MHz for this design. The lock range is determined by the fine tuning range, which corresponds to 50 MHz. The coarse tuning range is 800 MHz. The total power dissipation (excluding that of the 50-$\Omega$ drivers) is 33.5 mW from a 3.3-V power supply. The performance is comparable with that of a clock recovery circuit designed in a 30-GHz bipolar technology [7].



Fig. 15. (a) Time-domain waveform at 2.5 GHz. (b) Jitter histogram for a PRBS of  $2^7 - 1$ .



Fig. 16. (a) Time-domain waveform at 2.5 GHz. (b) Jitter histogram for a PRBS of  $2^{23} - 1$ .



Fig. 17. Phase noise measurement for a PRBS of length  $2^7 - 1$  (center frequency: 2.48 GHz; horizontal scale: 5 MHz/div.; vertical scale: 10 dB/div.; resolution bandwidth: 1 kHz).

From the single pole approximation in [8] and [9], the jitter transfer and jitter tolerance of the prototype can be estimated. Simulations using the parameters of the prototype indicate a jitter transfer function with a -3-dB bandwidth of 2 MHz and a jitter tolerance of 0.32 unit intervals (UI<sub>pp</sub>) at 1 MHz.
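Under the single-pole approximation, the jitter transfer magnitude follows a first-order low-pass response. The Python sketch below evaluates that response for the 2-MHz bandwidth quoted above; it is a generic first-order model rather than the simulation used in the paper, and it does not attempt to reproduce the jitter tolerance figure.

```python
import math

def jitter_transfer_db(f_hz, f3db_hz=2e6):
    """Magnitude in dB of a single-pole jitter transfer function 1 / (1 + j f / f3db)."""
    return -10.0 * math.log10(1.0 + (f_hz / f3db_hz) ** 2)

for f in (0.2e6, 1e6, 2e6, 5e6, 20e6):
    print(f"{f / 1e6:5.1f} MHz: {jitter_transfer_db(f):7.2f} dB")
```

Input jitter well below 2 MHz passes through almost unattenuated, while jitter a decade above the corner is attenuated by about 20 dB.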

## V. CONCLUSION

In this paper, we presented a low-power, integrated 2.5-Gb/s clock recovery circuit designed in a digital 0.4-$\mu$m CMOS technology. A method to resolve the conflict between a wide tuning range and low sensitivity is presented. Also, a two-stage ring-based VCO is introduced that utilizes an inductive load to provide excess phase shift, guaranteeing oscillation. Since the VCO consists of only two stages, it can reach speeds above 2.5 GHz while maintaining low power consumption. A high-speed phase detector that combines the speed advantages of a bang-bang phase detector with linear characteristics is also described.

## REFERENCES

- [1] F. M. Gardner, "Properties of frequency difference detectors," *IEEE Trans. Commun.*, vol. COM-33, pp. 131–138, Feb. 1985.
- [2] A. Pottbäcker, U. Langmann, and H. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," *IEEE J. Solid-State Circuits*, vol. 27, pp. 1747–1751, Dec. 1992.
- [3] H. Djahanshahi and C. Salama, "Robust two-stage current-controlled oscillator in sub-micrometer CMOS," *Electron. Lett.*, vol. 35, no. 21, pp. 1837–1839, Oct. 1999.
- [4] Y. Moon and K. Yoon, "A 3.3-V high-speed CMOS PLL with a two-stage self-feedback ring oscillator," in *IEEE AP-ASIC Dig. Tech. Papers*, 1999, pp. 287–290.
- [5] B. Lai and R. C. Walker, "A monolithic 622 Mb/s clock extraction data retiming circuit," in *ISSCC Dig. Tech. Papers*, Feb. 1991, pp. 144–145.
- [6] C. Hogge, "A self correcting clock recovery circuit," *J. Lightwave Technol.*, vol. LT-3, pp. 1312–1314, Dec. 1985.
- [7] M. Soyuer, "A monolithic 2.3-Gb/s 100-mW clock and data recovery circuit in silicon bipolar technology," *IEEE J. Solid-State Circuits*, vol. 28, pp. 1310–1313, Dec. 1993.
- [8] Y. Greshishchev and P. Schvan, "SiGe clock and data recovery IC with linear-type PLL for 10-Gb/s SONET application," *IEEE J. Solid-State Circuits*, vol. 35, pp. 1353–1359, Sept. 2000.
- [9] L. De Vito, "A versatile clock recovery architecture and monolithic implementation," in *Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design*, B. Razavi, Ed. New York: IEEE Press, 1996, pp. 405–442.



**Seema Butala Anand** was born in Modasa, India, in 1970. She received the B.S. degree in electrical engineering from the University of California, Los Angeles (UCLA), in 1992 and the M.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1994. She is currently working toward the Ph.D. degree in electrical engineering at UCLA.

From 1994 to 1997, she was a Member of Technical Staff at Hewlett-Packard Laboratories, Palo Alto, CA, where her research involved integrated circuit design for wireless communication systems. Her current interests include broadband data communications as well as low-voltage and low-power circuits.



**Behzad Razavi** (S'87–M'90) received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1985, and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1992, respectively.

He was with AT&T Bell Laboratories, Holmdel, NJ, and subsequently with Hewlett-Packard Laboratories, Palo Alto, CA. Since September 1996, he has been an Associate Professor of electrical engineering at the University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He is a member of the Technical Program Committees of the Symposium on VLSI Circuits and the International Solid-State Circuits Conference (ISSCC), in which he is the chair of the Analog Subcommittee. He is an IEEE Distinguished Lecturer and the author of *Principles of Data Conversion System Design* (New York: IEEE Press, 1995), *RF Microelectronics* (Englewood Cliffs, NJ: Prentice-Hall, 1998), and *Design of Analog CMOS Integrated Circuits* (New York: McGraw-Hill, 2000), and the editor of *Monolithic Phase-Locked Loops and Clock Recovery Circuits* (New York: IEEE Press, 1996).

Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics.