# An 8-Bit 150-MHz CMOS A/D Converter

Yun-Ti Wang and Behzad Razavi, Member, IEEE

Abstract—This paper describes an 8-bit 5-stage pipelined and interleaved analog-to-digital converter that performs analog processing only by means of open-loop circuits such as differential pairs and source followers to achieve a high conversion rate. The concept of sliding interpolation is proposed to obviate the need for a large number of comparators or interstage digital-to-analog converters and residue amplifiers. The pipelining scheme incorporates distributed sampling between the stages so as to relax the linearity-speed tradeoffs in the sample-and-hold circuits. A clock edge reassignment technique is also introduced that suppresses timing mismatches in interleaved systems, and a punctured interpolation method is proposed that reduces the integral nonlinearity error with negligible speed or power penalty. Fabricated in a 0.6-µm CMOS technology, the converter achieves differential and integral nonlinearities of 0.62 and 1.24 LSB, respectively, and a signal-to-(noise + distortion) ratio of 43.7 dB at a sampling rate of 150 MHz. The circuit draws 395 mW from a 3.3-V supply and occupies an area of  $1.2 \times 1.5$  mm<sup>2</sup>.

*Index Terms*—A/D converters, interpolation, pipelining, sample-and-hold circuits.

## I. INTRODUCTION

HIGH-SPEED analog-to-digital converters (ADC's) are widely used in communications, instrumentation, and consumer electronics. ADC's achieving a resolution of approximately 8 bits and sampling rates well above 100 MHz find application in the twisted-pair interface of Gigabit Ethernet and in code conversion for flat-panel displays. In addition to performance, cost and integrability in VLSI technologies are also important concerns in these applications, making CMOS implementations attractive.

This paper presents the design of an 8-bit 150-MHz ADC implemented in a 0.6-$\mu$m CMOS technology. With a 6.7-ns clock period and rise and fall times on the order of 0.5 ns, the timing budget for sampling and quantization is extremely tight, prohibiting the use of high-precision feedback stages in the analog signal path. This work introduces a number of architecture and circuit techniques that obviate the need for closed-loop circuits in multistage ADC's, thereby relaxing the speed–precision tradeoff. Using such techniques and an open-loop front-end sample-and-hold amplifier (SHA), the converter performs pipelining with no interstage digital-to-analog converters (DAC's), subtractors, or precision charge transfer.



Fig. 1. Traditional active 2x interpolation architecture.

Section II introduces the ADC architecture, presenting techniques such as sliding interpolation, embedded pipelining and interleaving, clock edge reassignment, and punctured interpolation. Section III describes the design of the building blocks and various tradeoffs at the circuit level and the architecture level. Section IV presents the experimental results obtained for the prototype.

## II. PROPOSED ADC ARCHITECTURE

In this section, we describe the ADC architecture and introduce the following techniques:

- 1) *sliding interpolation* to avoid the exponential growth of power and area;
- 2) *interstage distributed sampling* to perform pipelining without op-amps;
- 3) *dual-channel interleaving* to increase the conversion rate;
- 4) *clock edge reassignment* to suppress timing mismatches in interleaving;
- 5) *punctured interpolation* to reduce integral nonlinearity (INL).

The ADC is fully differential but, for the sake of brevity, most of the concepts are illustrated in single-ended form.

## A. Sliding Interpolation

Amplitude quantization can be viewed as a collection of zero crossings. As shown in Fig. 1, the differential outputs of the preamplifiers in a flash stage cross zero as $V_{in}$ crosses $V_{R,j}$, $V_{R,j+1}$, etc. The front-end preamplifiers can be followed by differential pairs to perform 2x interpolation [1], [2], thereby creating additional zero crossings and hence increasing the resolution.

Manuscript received July 16, 1999; revised October 21, 1999.

Y.-T. Wang was with the University of California, Los Angeles, CA 90095 USA. He is now with Silicon Bridge, San Jose, CA 95134 USA (e-mail: yt-wang@siliconbridge.com).

B. Razavi is with the University of California, Los Angeles, CA 90095 USA (e-mail: razavi@ee.ucla.edu).

Publisher Item Identifier S 0018-9200(00)00560-6.

Fig. 2. (a) Sliding interpolation architecture and (b) addition of overlap.

Interpolation relaxes a number of tradeoffs in the design of the front end. The preamplifiers typically sustain the most stringent requirements in terms of input common-mode range, input capacitance, power dissipation, overdrive recovery speed, voltage gain, and capacitive feedthrough to the reference ladder. Thus, it is desirable to reduce the number of preamplifiers through the use of interpolation. Another important aspect of interpolation is that it does not require a precise gain in any of the stages because only the zero crossings carry information. Consequently, the interpolating stages can be realized by differential pairs, greatly simplifying the design in submicrometer, low-voltage technologies. Interpolation also reduces the differential nonlinearity (DNL) [1] (but not the INL).
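The insensitivity to gain can be checked numerically. The following is a minimal sketch, assuming tanh-shaped differential pairs with identical gain (an illustrative characteristic, not the paper's circuit); the names `preamp` and `interpolated_crossing` and the reference levels are hypothetical. It locates the zero of the interpolated signal for several gains.

```python
import math

def preamp(vin, vref, gain):
    """Assumed tanh-shaped differential-pair characteristic (illustrative only)."""
    return math.tanh(gain * (vin - vref))

def interpolated_crossing(vr_lo, vr_hi, gain, iters=60):
    """Bisect for the zero of the 2x-interpolated signal (A_j + A_{j+1}) / 2."""
    def f(v):
        return 0.5 * (preamp(v, vr_lo, gain) + preamp(v, vr_hi, gain))
    lo, hi = vr_lo, vr_hi          # f(lo) < 0 < f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Same reference pair, three very different gains.
crossings = [interpolated_crossing(1.0, 1.1, g) for g in (2.0, 10.0, 50.0)]
```

Under these assumptions every entry of `crossings` lands at the midpoint of the two reference levels, which is why only matching between the two signals, rather than an accurate absolute gain, matters for the interpolated zero crossing.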

The principal drawback of the interpolation scheme depicted in Fig. 1 is the exponential growth of power and hardware with the resolution. However, we recognize that if  $V_{in}$  lies between  $V_{R,j}$  and  $V_{R,j+1}$ , then only the outputs of  $A_j$  and  $A_{j+1}$  are of interest, and the remaining preamplifiers do not provide any additional information. Thus, the subsequent stages need not interpolate the outputs of all of the preamplifiers.

The above observation leads to the concept of sliding interpolation. Illustrated in Fig. 2(a), the idea is to use a simple, fast sub-ADC to determine which preamplifier outputs must be interpolated and route these outputs to the next rank of interpolating differential pairs by a multiplexer (MUX). The rest of the preamplifier outputs are discarded. In a sense, the interpolating stage "slides" up and down according to the decision of the sub-ADC. The architecture of Fig. 2(a) in principle requires only three differential pairs in the sliding stage to perform interpolation. In practice, however, the comparators used in the sub-ADC suffer from offsets, mandating some overlap to ensure that proper signals are selected for further interpolation. This point will become clearer in the overall architecture described below.

Fig. 3. Detailed block diagram of multistage ADC with sliding interpolation.

Fig. 4. Pipelined sliding interpolation ADC architecture.

The concept of sliding interpolation can be repeatedly applied to cascaded stages, leading to a multistage ADC whose power and hardware grow linearly with the resolution. This is illustrated in Fig. 2(b). The first stage employs 16 preamplifiers to generate 16 zero crossings. If the analog input lies between $V_{R,j}$ and $V_{R,j+1}$, then a 4-bit coarse ADC and a 16-to-4 MUX route the outputs of the preamplifiers sensing $V_{R,j-1}, \ldots, V_{R,j+2}$ to the next interpolating stage. Note that amplification increases the spacing between the zero-crossing points. Since only 2x-interpolation is used, each stage (excluding the first one) generates a total of seven outputs. Also, a sub-ADC detects two more bits in each stage, one of which is used for subsequent digital error correction. Thus, for a resolution of 8 bits, a total of five stages are necessary. Stages 2–5 are identical, simplifying the design and layout.
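The selection and 2x interpolation performed between the first and second stages can be sketched behaviorally. The model below is a simplified illustration, assuming ideal linear preamplifiers, an offset-free coarse decision, and 16 equally spaced reference levels normalized to the range 0 to 1; the function names and the gain value are hypothetical.

```python
from bisect import bisect_right

def preamp_outputs(vin, refs, gain=4.0):
    """Idealized linear preamplifiers: output j crosses zero at refs[j]."""
    return [gain * (vin - r) for r in refs]

def slide_and_interpolate(vin, refs, gain=4.0):
    outputs = preamp_outputs(vin, refs, gain)
    # Coarse decision: vin lies between refs[j] and refs[j + 1].
    j = bisect_right(refs, vin) - 1
    # Clamp so that the window j-1 .. j+2 stays inside the preamplifier bank.
    j = min(max(j, 1), len(refs) - 3)
    selected = outputs[j - 1 : j + 3]          # 16-to-4 MUX with one tap of overlap per side
    interpolated = []
    for a, b in zip(selected, selected[1:]):
        interpolated += [a, 0.5 * (a + b)]     # original signal, then its 2x-interpolated neighbor
    interpolated.append(selected[-1])
    return j, interpolated

refs = [k / 15 for k in range(16)]             # 16 reference levels (normalized, assumed)
j, stage2_signals = slide_and_interpolate(0.52, refs)
```

Four selected signals plus three interpolated ones give the seven outputs per stage mentioned above, and the sign pattern of `stage2_signals` locates the input within the selected window with twice the resolution of the first stage.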

Fig. 5. (a) Addition of interleaving and (b) feedforward of signal by replica SHA.

Further details of the architecture are shown in Fig. 3. The first stage incorporates 16 preamplifiers while each of the following interpolative stages requires seven amplifiers. By virtue of sliding interpolation, the total number of differential pairs reduces from roughly 500 to 50. The five sub-ADC's employ a total of 28 comparators. (Each of stages 2–5 requires three comparators.) In addition to saving power and hardware, sliding interpolation readily lends itself to pipelining while requiring no DAC's, subtractors, or other high-precision interstage circuits.
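The hardware figures quoted above can be cross-checked with simple counting. This is one plausible way to arrive at them, assuming that a conventional design doubles the number of amplifiers in every interpolating rank and that the first sub-ADC uses 16 comparators (inferred from 28 minus four stages of three); the paper's exact bookkeeping may differ.

```python
preamps = 16                      # first-stage preamplifiers
later_stages = 4                  # stages 2-5
amps_per_sliding_stage = 7
comparators_stage1 = 16           # inferred: 28 total minus 4 x 3
comparators_per_later_stage = 3

# Conventional 2x interpolation: 16, 32, 64, 128, 256 amplifiers per rank.
full_interpolation = sum(preamps * 2**k for k in range(later_stages + 1))

# Sliding interpolation: a fixed number of amplifiers per stage.
sliding_interpolation = preamps + later_stages * amps_per_sliding_stage

comparators = comparators_stage1 + later_stages * comparators_per_later_stage

# Stage 1 resolves 4 bits; each later stage detects 2 bits, one of them redundant.
resolution_bits = 4 + later_stages * (2 - 1)
```

The counts come out at 496 versus 44 amplifiers, consistent with the "roughly 500 to 50" estimate, together with 28 comparators and 8 bits of resolution.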

## B. Embedded Pipelining

The multistage architecture of Fig. 3 can achieve a much higher conversion rate through the use of pipelining. Since each interpolating stage contains only two analog blocks (a MUX and an amplifier bank), pipelining can be applied at only one of two points: at the input or output of the MUX. As shown in Fig. 4, the interface between the multiplexer and the amplifier bank is chosen for pipelining for two reasons. First, the multiplex switches can also function as sampling devices, significantly reducing the delay because now only one switch appears in the signal path between two consecutive stages. Second, the interconnect wires between the multiplexers and the interpolating amplifiers exhibit a significant amount of parasitic capacitance, which can now be utilized as the sample-and-hold capacitors. This distributed sample-and-hold system is similar to that reported in [3] except that it is performed in conjunction with multiplexing.



Fig. 6. Clock edge reassignment in interleaved architecture.

Note that each stage in the pipeline operates in the sample mode for half of the clock period and in the hold mode for the other half. On the other hand, the sub-ADC in each stage operates only during the hold mode, raising the possibility of adding interleaving to further increase the throughput rate.

## C. Addition of Interleaving

Even though the maximum path "length" between consecutive samplers in the pipeline of Fig. 4 corresponds to roughly two differential pairs, the settling requirements still limit the conversion speed. As shown in Fig. 5(a), the converter employs two identical interleaved channels to increase the speed. The multiplexers, distributed sampling circuits, and 2x-interpolation amplifiers are duplicated for the even and the odd channels whereas the front-end buffer, the preamplifiers, and all of the sub-ADC's are shared between the two channels. The timing is such that when one stage in the odd channel is in the sampling mode, the corresponding stage in the even channel is in the hold/amplification mode and vice versa. When the SHA in the odd channel is sampling the analog input, the SHA in the even channel is holding and applying the previous analog sample to the preamplifiers through the buffer. The sub-ADC in stage 1 then generates the four-bit digital code and commands the MUX in the even channel of stage 2 to redirect the selected preamplifier outputs to the interpolation amplifiers.

The first sub-ADC in Fig. 5(a) still poses some difficulties. First, owing to the finite impedance seen at the preamplifier outputs, the kickback noise generated by the sub-ADC significantly disturbs the analog signals at the inputs of the multiplexers, mandating a long settling time after the sub-ADC is strobed. Second, the sub-ADC cannot begin its conversion until the front-end SHA, the buffer, and the preamplifier outputs have settled. Since the buffer drives a relatively large capacitance, the settling in this path is quite slow. Third, since the sub-ADC appears in the critical path—that is, the preamplifier outputs must remain idle until the sub-ADC makes a decision—the throughput rate is still limited.

Fig. 5(b) illustrates a modification that alleviates the above issues. A replica front-end SHA is added, and its output directly drives the first sub-ADC. Scaled down in device dimensions and current levels by a factor of two with respect to the main SHA, the replica prevents the large kickback noise of the sub-ADC from corrupting the output of the preamplifiers. Also, the replica signal experiences a shorter delay than that in the main path because of the much smaller load capacitance seen by the replica buffer. Thus, the sub-ADC can be strobed much earlier than before. Note that one bit of overlap and digital correction suppress errors due to mismatches between the main path and the replica path.

## D. Clock Edge Reassignment

Fig. 7. Punctured interpolation: (a) implementation and (b) error plot.

Fig. 8. Nonlinearity-induced error in 2x interpolation.

Timing mismatches severely limit the precision of high-speed interleaved ADC's [4], [5]. As shown in Fig. 6(a), two interleaved samplers SHA<sub>1</sub> and SHA<sub>2</sub> require two corresponding clocks $CK_1$ and $CK_2$, which are typically generated by a frequency divider. In the ideal case, each sampling edge of $CK_1$ is placed precisely midway between the sampling edges of $CK_2$ such that SHA<sub>1</sub> and SHA<sub>2</sub> sample the analog signal at evenly spaced points in time. In reality, however, the devices in the frequency divider suffer from substantial mismatches, especially at high speeds, introducing large timing errors between $CK_1$ and $CK_2$. Since an 8-bit ADC sampling a 75-MHz signal cannot tolerate timing mismatches greater than roughly 12 ps, frequency division in CMOS technology does not provide the accuracy required in this design.
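The order of magnitude of this tolerance can be reproduced with a rough estimate. The sketch below is not the authors' derivation: it assumes a full-scale sine input, two channels whose sampling instants deviate by plus and minus half the skew, and a limit set where the resulting error tone equals the ideal quantization noise of the converter. The function name is hypothetical.

```python
import math

def max_timing_skew(bits, f_in):
    """Skew at which the interleaving error tone equals the ideal quantization SNR.

    With sampling instants off by +/- skew/2, a sine of frequency f_in picks up
    an error tone of relative amplitude pi * f_in * skew (first-order estimate).
    """
    sqnr_db = 6.02 * bits + 1.76
    return 10 ** (-sqnr_db / 20) / (math.pi * f_in)

skew = max_timing_skew(8, 75e6)    # seconds
```

Under these assumptions the limit evaluates to about 13.5 ps, the same order as the figure of roughly 12 ps quoted above.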

The problem of timing mismatch can be considerably relaxed if a single clock drives both SHA's. Since the duty cycle of the clock may deviate from 50%, only one of the edges must be used for the sampling command in both circuits. Fig. 6(b) illustrates how this is accomplished by clock edge reassignment. Two switches  $S_1$  and  $S_2$  and two "predictive" control signals  $V_{odd}$  and  $V_{even}$  are added to the system. A master clock  $CK_{master}$  with a frequency twice the sampling rate is applied to the two channels through the two switches. The predictive signals alternately enable one of the switches  $S_1$  or  $S_2$ , routing the falling edge of  $CK_{master}$  to either of the SHA's. The timing mismatch is now equal to the delay mismatch between  $S_1$  and  $S_2$  and between the two switches inside SHA<sub>1</sub> and SHA<sub>2</sub>, an error that can be maintained well below 10 ps even with 20% mismatch between the sizes of the switches. The timing of  $V_{odd}$  and  $V_{even}$  is quite relaxed so long as their high level contains the falling edge of  $CK_{master}$  with enough margin. Thus, they can be produced by a simple nonoverlapping clock generator.
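The benefit of using a single edge of one clock can be illustrated with an idealized timing model. The sketch below is a simplification under stated assumptions: the master clock period is taken equal to the overall sampling period, so each of the two channels receives every other falling edge; the 30-ps divider skew and the 30% duty cycle are hypothetical values chosen only for illustration.

```python
def instants_divider(n, period, skew):
    """Two divided clocks: the edges of the second channel are displaced by a fixed skew."""
    return [k * period + (skew if k % 2 else 0.0) for k in range(n)]

def instants_reassigned(n, period, duty, switch_delay=(0.0, 0.0)):
    """Every sample uses a falling edge of the same master clock.

    The falling edge of cycle k occurs at (k + duty) * period and is routed to
    channel k % 2 through a switch with its own (small) delay.
    """
    return [(k + duty) * period + switch_delay[k % 2] for k in range(n)]

def spacing_error(instants, period):
    """Largest deviation of consecutive sample spacing from the ideal period."""
    return max(abs((b - a) - period) for a, b in zip(instants, instants[1:]))

T = 1 / 150e6
err_divider = spacing_error(instants_divider(8, T, 30e-12), T)
err_reassigned = spacing_error(instants_reassigned(8, T, duty=0.3), T)
```

In this model the divider skew appears directly as a sample-spacing error, whereas with edge reassignment the spacing is independent of the duty cycle and only a mismatch between the two entries of `switch_delay` would disturb it.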

In reality, each SHA requires both a rising edge and a falling edge to perform the sample and hold operations. As shown in Fig. 6(c), the falling edges of $CK_{1x}$ and the rising edges of $CK_{2x}$ are alternately applied to the SHA's, while the rising edges of $CK_{1x}$ and the falling edges of $CK_{2x}$ are discarded. The actual sequence of operation is as follows. First, the falling edge of $CK_{1x}$ is routed to SHA<sub>1</sub> and the rising edge of $CK_{2x}$ to SHA<sub>2</sub>. Next, the states of $CK_1$ and $CK_2$ are stored. Subsequently, the falling edge of $CK_{1x}$ is rerouted to SHA<sub>2</sub> and the rising edge of $CK_{2x}$ to SHA<sub>1</sub>. This concept can be easily extended from two channels to three or more channels. The front-end sample-and-hold circuit used in this work incorporates three channels.

## E. Punctured Interpolation

An important benefit of interpolation is the reduction of the differential nonlinearity resulting from the offset of the preamplifiers. However, integral nonlinearity still remains uncorrected, demanding large input devices. To alleviate the problem, a modification is introduced here. As depicted in Fig. 7(a), the original outputs ( $V_A$ 's) produced by the preamplifiers are fed into another bank of interpolation amplifiers to generate a second set of interpolated outputs ( $V_B$ 's), which, though different from  $V_A$ 's, contain sufficient information to represent the original analog input signal. If the offset components of the adjacent  $V_A$ 's are uncorrelated, the standard deviation of the offsets of the corresponding  $V_B$ 's is reduced by a factor of the square root of two

$$B_1 = \frac{A_1 + A_2}{2} \Rightarrow \sigma_{B_1} = \frac{\sqrt{\sigma_{A_1}^2 + \sigma_{A_2}^2}}{2} = \frac{\sigma_{\text{original}}}{\sqrt{2}}. \tag{1}$$

Shown in Fig. 7(b),  $INL_A$  and  $INL_B$  are defined as the maximum error in the zero crossings of  $V_A$ 's and  $V_B$ 's, respectively. If only the *interpolated* zero crossings are sensed by the following stages and the original zero crossings are discarded, then the overall INL is reduced by approximately 30%. Monte Carlo simulations confirm this result. Since some of the outputs are discarded, this method is called punctured interpolation.
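The offset averaging expressed by (1) is easy to verify with a small Monte Carlo experiment. The sketch below checks only the standard deviation of the interpolated offsets, assuming independent Gaussian preamplifier offsets of unit standard deviation; it is not a reproduction of the INL simulations mentioned above, and the sample size and seed are arbitrary.

```python
import random
import statistics

random.seed(0)
sigma_a = 1.0                     # preamplifier offset std dev, arbitrary units (assumed)
n = 50_000

# Independent offsets of adjacent preamplifier outputs V_A.
a = [random.gauss(0.0, sigma_a) for _ in range(n + 1)]

# Offsets of the interpolated outputs V_B: B_i = (A_i + A_{i+1}) / 2.
b = [0.5 * (a[i] + a[i + 1]) for i in range(n)]

ratio = statistics.pstdev(b) / statistics.pstdev(a)
```

The ratio comes out close to $1/\sqrt{2} \approx 0.71$, i.e., a reduction of about 30% in the spread of the zero-crossing errors.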

The reduction of the INL translates into a higher tolerance of offsets in the preamplifiers, allowing smaller input devices and a two-fold reduction in the capacitance seen by the buffer driving the first stage. Note that the redundancy associated with punctured interpolation is necessary only in the first stage of the pipeline, where the cumulative gain is still low; in stages 2–5, all zero crossings are utilized. Thus, punctured interpolation is obtained at the cost of a few additional differential pairs.

Fig. 9. Realization of a slice of the signal path in the first stage.

Fig. 10. Dual-channel interleaved SHA.

## F. Effect of Nonlinearity in Interpolation

While the 2x interpolation stage is quite insensitive to the nonlinearity of differential pairs [6], the subsequent interpolation operations do require some linearity. Fig. 8 illustrates the effect. Curves A and B are the original characteristics with the zero-crossing points at $V_0$ and $V_2$. After the first 2x interpolation, curve C is generated with a zero crossing at $V_1$ and a slope equal to one-half of the slope of the original characteristic. If another 2x interpolation is performed between curves B and C, the resulting zero crossing must ideally fall midway between $V_1$ and $V_2$, i.e., at $V_{id}$. In practice, however, the actual zero crossing $V_{act}$ deviates from $V_{id}$ because B and C exhibit different slopes. The difference between $V_{act}$ and $V_{id}$ is denoted by $\delta$.

In the worst case, curve A is flat for $V_{in} > V_1$, and the slope of curve C is equal to one-half of that of B. It can be shown that $\delta = (V_2 - V_1)/6$ and hence curve D suffers from a DNL of 1/3 LSB. In order to further increase the resolution by 2x interpolation, the linear portion of curves A or B must be extended accordingly. Since after the first stage, four more bits must be detected, a linearity of about 4 bits is required of curves A and B between $V_0$ and $V_2$. For the following stages, fewer bits remain to be determined but the cumulative gain is higher. If the gain of each stage is about two, then all of the stages can use approximately the same amount of source degeneration to ensure a small DNL.
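The worst-case value of $\delta$ can be confirmed numerically. The sketch below assumes piecewise-linear characteristics with unit slope and equally spaced zero crossings at 0, 1, and 2 (arbitrary units); curve A saturates abruptly at $V_1$, as in the worst case described above. The helper `zero_crossing` is a hypothetical bisection routine.

```python
def zero_crossing(f, lo, hi, iters=80):
    """Bisection for the zero of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

V0, V1, V2 = 0.0, 1.0, 2.0        # equally spaced zero crossings (assumed units)
k = 1.0                           # slope of the original characteristics

def curve_a(v):
    return k * (min(v, V1) - V0)  # zero at V0, flat for v > V1 (worst case)

def curve_b(v):
    return k * (v - V2)           # zero at V2, linear

def curve_c(v):
    return 0.5 * (curve_a(v) + curve_b(v))   # first 2x interpolation, zero at V1

def curve_d(v):
    return 0.5 * (curve_b(v) + curve_c(v))   # second 2x interpolation

v_act = zero_crossing(curve_d, V1, V2)
v_id = 0.5 * (V1 + V2)
delta = v_act - v_id
dnl_lsb = delta / (v_id - V1)     # one LSB at this level is (V2 - V1) / 2
```

With these values `delta` evaluates to $(V_2 - V_1)/6$ and `dnl_lsb` to 1/3, in agreement with the result stated above.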

Fig. 9 shows the realization of a slice of the signal path in the first stage. The actual design is fully differential. It is important to note that the converter requires no floating capacitors and can therefore utilize native metal-sandwich structures in digital CMOS technologies.

## III. CIRCUIT DESIGN AND LAYOUT CONSIDERATIONS

## A. Front-End Sample-and-Hold Circuit

The front-end SHA plays a critical role in the dynamic behavior of the converter. In order to achieve fast settling, this circuit incorporates a simple top-plate sampling method and a PMOS source follower as shown in Fig. 10. The n-well of the source follower is tied to its source to suppress nonlinearity and gain error due to body effect. Simulations indicate that two such followers operating differentially achieve a linearity of about 10 bits. Interleaving is realized in the sampling network by alternately connecting $C_1$ and $C_2$ to $V_{in}$. Since the source follower is shared between the two channels, gain and offset mismatches arise primarily from the charge injection mismatches of $S_1$–$S_4$. These errors are maintained well below 1 LSB by proper choice of device dimensions and careful layout.

Fig. 11. Timing and circuit diagrams for a triple-channel interleaved SHA.

The input-dependent charge injection of  $S_1$  and  $S_3$  does introduce nonlinearity but it is partially cancelled by the charge absorbed by  $S_2$  and  $S_4$ . Also, differential operation as well as large sampling capacitors (1 pF) improve the overall linearity to about 9 bits.

The finite input capacitance of the source follower results in an equivalent resistor connected between the outputs of the two channels, yielding a gain rolloff at high frequencies. From another perspective, the capacitance seen at node X and switches  $S_2$  and  $S_4$  form a switched-capacitor low-pass filter. With proper design, this rolloff is limited to 1 dB at an input frequency of 75 MHz.

Fig. 12. Comparator used in the first stage (CMP\_A).

In the actual design, the front-end SHA is realized with triple-channel interleaving. This is because the sampling phase is considerably faster than the hold/quantization/multiplexing phase, thereby requiring a clock duty cycle of about 30% [Fig. 11(a)]. Since the duty cycle deviates substantially from 50%, it is difficult to employ dual-channel interleaving without any "dead" time. To resolve this issue, the clock period is divided into three equal time slots: one for front-end sampling, one for coarse quantization by the sub-ADC, and one for multiplexing [Fig. 11(b)]. Three clock phases are then used to interleave the three sampling capacitors. (To generate the time slots with reasonable accuracy, the 150-MHz clock is divided by three on the chip.) The actual triple-channel interleaved SHA circuit is shown in Fig. 11(c). For each channel in the main SHA, the operation sequence is: 1) sample, 2) hold, and 3) hold and connect the held sample to the follower. On the other hand, the replica operates in a slightly different sequence: 1) sample, 2) hold and connect the held sample to the follower (whose output is then sensed by the first sub-ADC), and 3) hold.
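The two operation sequences can be laid out as a slot-by-slot schedule. The sketch below assumes that the three channels run the same sequence staggered by one time slot each, which is one natural reading of the three clock phases described above; the state labels and the function name are hypothetical.

```python
MAIN_SEQUENCE = ("sample", "hold", "hold+connect")
REPLICA_SEQUENCE = ("sample", "hold+connect", "hold")

def schedule(sequence, n_slots, n_channels=3):
    """State of each channel in each time slot; channel c lags channel 0 by c slots."""
    return [
        [sequence[(slot - c) % len(sequence)] for c in range(n_channels)]
        for slot in range(n_slots)
    ]

main = schedule(MAIN_SEQUENCE, 6)        # main[slot][channel]
replica = schedule(REPLICA_SEQUENCE, 6)
```

In every slot exactly one capacitor samples, one holds, and one drives the follower, so no dead time occurs. A sample taken by channel 0 in slot 0 reaches the replica follower in slot 1 and the main follower in slot 2, which gives the first sub-ADC one slot of lead over the main path.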



Fig. 13. Layout floorplan.

## B. Comparator

The design of the comparators used in the sub-ADC's directly affects the speed and power dissipation of the overall converter. Shown in Fig. 12 is the high-speed comparator utilized in the first sub-ADC. When CK is low, $S_{b1}$ and $S_{b2}$ are off, $S_1$–$S_4$ are on, and nodes P, Q, X, and Y are precharged to $V_{DD}$, placing the comparator in the reset mode. When CK goes high, $S_{b1}$ and $S_{b2}$ turn on and $M_1$–$M_4$ compare the positive input voltage $V_{in}^+$ with the positive reference voltage $V_r^+$ and the negative input voltage $V_{in}^-$ with the negative reference voltage $V_r^-$. Since $M_5$–$M_8$ are initially off, the resulting differential current first flows through the total capacitance seen at nodes X and Y, creating a differential voltage at these nodes by the time $M_7$ and $M_8$ turn on. After the cross-coupled devices turn on, the circuit regeneratively amplifies the voltage, producing rail-to-rail swings at P and Q.

The comparator of Fig. 12 offers three important properties that make it attractive for high-speed design. First, the static power dissipation is zero. Second, the circuit requires only a single-phase clock, greatly simplifying the routing across the chip. Third, the input offset is dominated by that of the differential pairs rather than by the offset of the cross-coupled devices. To explain this property, we reexamine the comparator in the amplification mode. After CK goes high, the input difference is amplified by $M_1$–$M_4$ and the parasitic capacitances at nodes X and Y until $V_X$ and $V_Y$ drop below $V_{\rm DD}$ by $V_{\rm THN}$. At this point, $M_7$ and $M_8$ turn on but contribute only a small regenerative gain until $M_5$ and $M_6$ turn on and initiate the final regeneration. The key point here is that the input is amplified substantially before $M_5$–$M_8$ turn on.



Fig. 14. Die photo.

Using SPICE, it is possible to calculate the contribution of  $M_5-M_6$  and  $M_7-M_8$  to the input-referred offset. With the device dimensions chosen in this design, simulations suggest that the offset voltage of  $M_5-M_6$  is divided by a factor of 20 and that of  $M_7$  and  $M_8$  by a factor of two. Since the channel area of  $M_7$  and  $M_8$  is about one-fourth of that of the input devices, they contribute roughly equal amounts of input-referred offset. Simulations and measurements on individual comparators reveal an overall input offset of about 10 mV.
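The statement that $M_7$–$M_8$ and the input devices contribute roughly equally can be checked with a short calculation. The sketch below assumes that offset voltage scales inversely with the square root of channel area (the usual area-scaling rule for device mismatch) with the same matching coefficient for both pairs, and it normalizes the input-pair offset to unity; the function name is hypothetical.

```python
import math

def referred_offset(sigma_input, area_ratio, attenuation):
    """Input-referred offset of a pair whose channel area is area_ratio times that
    of the input devices, assuming sigma scales as 1 / sqrt(area)."""
    return sigma_input / math.sqrt(area_ratio) / attenuation

sigma_in = 1.0                                    # input-pair offset (normalized)
sigma_m78 = referred_offset(sigma_in, 0.25, 2)    # quarter area, divided by two
total = math.hypot(sigma_in, sigma_m78)           # uncorrelated contributions add in power
```

A quarter of the area doubles the raw offset, and the division by two brings it back to the level of the input pair, so the two contributions combine to about $\sqrt{2}$ times the input-pair offset under these assumptions.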

Another important phenomenon in the comparator is the large kickback noise produced at the beginning of reset and regeneration modes. This effect is particularly critical in the first stage and can introduce significant dynamic offsets, saturating the second stage and creating nonlinearity. Adding a pair of cross-coupled capacitors equal to 8 fF at the input reduces the kickback noise. The effect of the capacitors is somewhat process-dependent, but complete cancellation is not necessary because the first sub-ADC is fed by the replica SHA.

Fig. 15. DNL and INL at $f_{in} = 1.8$ MHz and $f_{sample} = 150$ MHz.

Fig. 16. FFT at $f_{in} = 1.76$ MHz.

The comparators in stages 2–5 are essentially the same as that in the first stage, except that their input network includes multiplexing switches for interleaving. Due to the cumulative gain after stage 1, larger comparator offsets can be tolerated in stages 2–5.

## C. Floor Plan and Layout Considerations

The floor plan and layout of the ADC must deal with issues such as routing of critical paths, power and ground isolation, noise coupling from the digital sections to the analog sections, etc. Due to the nature of sliding interpolation, the high-speed digital control signals must travel through the analog sections. Also, the sub-ADC's in stages 2–5 must be embedded with the interpolating stages. These issues underscore the importance of careful layout to suppress various noise coupling effects.

Fig. 13 shows the floorplan of the ADC. In order to reduce the wiring capacitance in the critical path, the front-end building blocks in the first stage (CMP\_A, reference ladder, preamplifiers, MUX, and the distributed sample-and-hold) are folded into a U shape. The front-end SHA output and the reference ladder are routed between the comparator bank and the preamplifier bank. The reference ladder is made of silicide poly resistors with a length of two squares (about 8 $\Omega$) between consecutive taps. Each preamplifier provides an empty strip so that the digital control signals from CMP\_A to MUX can run through it without interfering with the analog signal path. The digital signals have also been shielded on both sides with analog ground along the entire path. The ROM generates the four corresponding digital bits in the first stage.

Fig. 17. SNDR and SFDR at $f_{\text{sample}} = 150$ MHz.

TABLE I MEASUREMENT SUMMARY

| Parameter                           | Value                        |
|-------------------------------------|------------------------------|
| Technology                          | 0.6-µm, 1-poly, 3-metal CMOS |
| Resolution                          | 8 bits                       |
| DNL                                 | 0.62 LSB                     |
| INL                                 | 1.24 LSB                     |
| Sampling Rate                       | 150 MHz                      |
| SNDR @ f<sub>in</sub> = 1.8 MHz     | 43.7 dB                      |
| SNDR @ f<sub>in</sub> = 70 MHz      | 40 dB                        |
| Analog Input Swing                  | 1.6 V<sub>p-p</sub>          |
| Input Capacitance                   | 1.5 pF                       |
| Active Chip Area                    | $1.2 \text{ mm}^2$           |
| Supply Voltage                      | 3.3 V                        |
| Power Consumption: Analog           | 330 mW                       |
| Power Consumption: Digital          | 53 mW                        |
| Power Consumption: Reference Ladder | 12 mW                        |
| Power Consumption: Total            | 395 mW                       |

Fig. 14 shows the die photo. The chip size is  $1.5 \times 1.2 \text{ mm}^2$  with the active area about  $1.2 \text{ mm}^2$ . The differential analog input signals enter from the left side of the chip and are shielded with a common  $V_{\text{DD}}$  in metal 2. Digital outputs leave the chip from the lower and the right sides of the chip. Three different power lines are used in the layout, one for the analog section, one for the digital section, and one for the first sub-ADC.

The front-end SHA is placed at the top-left corner, directly above the reference ladder, so that its outputs readily reach the preamplifiers and the first sub-ADC. The high-speed (300-MHz) input clocks and the clock generator are placed on the top of the chip.

The modularity of the design can be seen in stages 2–5. The resulting layout is quite compact and relatively easy to handle in transistor-level simulations.

## IV. EXPERIMENTAL RESULTS

The ADC has been fabricated in a 0.6- $\mu$ m single-poly triple-metal CMOS technology. The circuit is tested with a 3.3-V supply with differential input swings of 1.6 V<sub>pp</sub> and a sampling rate of 150 MHz.

Fig. 15 shows the measured DNL and INL profiles obtained from code density (histogram) tests. The maximum values of DNL and INL are 0.62 and 1.24 LSB, respectively.

The dynamic performance of the converter is measured in the frequency domain. Fig. 16 depicts the spectrum of the reconstructed signal at 1.76 MHz, exhibiting harmonics 50 dB below the fundamental and a signal-to-(noise + distortion) ratio (SNDR) of 43.7 dB, which implies an effective number of bits (ENOB) of approximately 7.
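The ENOB figure follows from the standard relation between SNDR and resolution for a full-scale sinusoidal input. A minimal sketch, with a hypothetical function name:

```python
def enob(sndr_db):
    """Effective number of bits from SNDR, assuming a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02

enob_low_freq = enob(43.7)    # SNDR measured at an input frequency of 1.76 MHz
```

For the measured 43.7 dB this evaluates to about 6.97 bits.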

The spurious-free dynamic range (SFDR) and SNDR as a function of the analog input frequency are plotted in Fig. 17. SFDR starts from around 50 dB at low frequencies and reaches about 44 dB at high frequencies. SNDR is about 43.7 dB at low frequencies and about 40 dB (ENOB = 6.5) for frequencies above 40 MHz.

The consistent SNDR performance in the high-frequency range up to the Nyquist rate indicates that the clock edge reassignment technique indeed minimizes timing mismatches in the interleaved system.

Table I summarizes the overall performance. The analog power consumption is higher than expected mainly due to the discrepancy between the target resistance values and the actual values. Since the sheet resistance of the fabricated poly resistors is about 30% higher than that used in the simulations, more power consumption is required to improve the large-signal slewing behavior of the differential pairs.

## V. CONCLUSION

An 8-bit 150-MHz CMOS ADC has been described that incorporates sliding interpolation, distributed sampling, interleaving, clock edge reassignment, and punctured interpolation. The converter utilizes only open-loop circuits to achieve a high speed. A triple-channel interleaved topology and a front-end open-loop sample-and-hold circuit are adopted to reduce the settling time in the critical path. The clock edge reassignment technique relaxes the timing-mismatch problem in multichannel interleaved systems, enhancing the dynamic performance. Sliding interpolation eliminates the need for interstage D/A converters, subtractors, and residue amplifiers in traditional pipelined structures. Parasitic wiring capacitance is used for the distributed sampling scheme. Since the converter requires no floating capacitors, it is well suited to digital CMOS processes. The ADC employs only source followers and differential pairs, avoiding the use of op-amps. Thus, it suffers much less from tradeoffs among gain, supply voltage, and speed. The prototype is functional even with a 2.5-V supply, though it was designed for a 3.3-V system. The simple and modular design of the pipelining structure results in a compact layout, requiring a core area of 1.2 mm<sup>2</sup> in a 0.6- $\mu$ m CMOS process.

## REFERENCES

- [1] C. Lane, "A 10-Bit, 60-MS/s flash ADC," in *Proc. BCTM*, Sept. 1989, pp. 44–47.
- [2] H. Kimura et al., "A 10-b 300-MHz interpolated parallel A/D converter," IEEE J. Solid-State Circuits, vol. 28, pp. 438–446, Apr. 1993.
- [3] A. G. W. Venes and R. J. van de Plassche, "An 80-MHz, 80-mW, 8-b CMOS folding A/D converter with distributed track-and-hold preprocessing," *IEEE J. Solid-State Circuits*, vol. 31, pp. 1846–1853, Dec. 1996.
- [4] W. Black and D. A. Hodges, "Time interleaved converter arrays," *IEEE J. Solid-State Circuits*, vol. SC-15, pp. 1022–1029, Dec. 1980.
- [5] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals: Fundamentals and high-speed waveform digitizers," *IEEE Trans. Instrum. Meas.*, vol. 37, pp. 245–251, June 1988.
- [6] B. Razavi, *Principles of Data Conversion System Design*. New York: IEEE Press, 1995.



**Yun-Ti Wang** was born in Taiwan, R.O.C. He received the B.S. degree from National Taiwan University in 1984 and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Los Angeles, in 1989 and 1999, respectively.

He worked for Cirrus Logic from 1993 to 1995. Since 1999, he has been with Silicon Bridge. His main interests are in high-speed mixed-signal integrated circuit designs, including active filters, A/D and D/A converters, PLL's, and high-speed transceivers.



**Behzad Razavi** (S'87–M'90) received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1985 and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1992, respectively.

He was with AT&T Bell Laboratories, Holmdel, NJ, and subsequently Hewlett-Packard Laboratories, Palo Alto, CA. Since September 1996, he has been an Associate Professor of electrical engineering at the University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He is a member of the Technical Program Committees of the Symposium on VLSI Circuits and the International Solid-State Circuits Conference (ISSCC), in which he is the chair of the Analog Subcommittee. He is the author of *Principles of Data Conversion System Design* (New York: IEEE Press, 1995), *RF Microelectronics* (Englewood Cliffs, NJ: Prentice-Hall, 1998), and *Design of Analog CMOS Integrated Circuits* (New York: McGraw-Hill, 2000), and the editor of *Monolithic Phase-Locked Loops and Clock Recovery Circuits* (New York: IEEE Press, 1996).

Prof. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and *International Journal of High Speed Electronics*.