# An 8 Bit 4 GS/s 120 mW CMOS ADC

Hegong Wei, Member, IEEE, Peng Zhang, Bibhu Datta Sahoo, and Behzad Razavi, Fellow, IEEE

*Abstract*—A time-interleaved ADC employs four pipelined time-interleaved channels along with a new timing mismatch detection algorithm and a high-resolution variable delay line. The digital background calibration technique suppresses the interchannel timing mismatches, achieving an SNDR of 44.4 dB and a figure of merit of 219 fJ/conversion-step in 65 nm CMOS technology.

*Index Terms*—Analog-to-digital conversion, interleaving, pipelined analog-to-digital converter (ADC), time error detection and correction, timing calibration, variable delay lines.

## I. INTRODUCTION

For a given resolution, the power consumption of analog-to-digital converters (ADCs) rises linearly with the speed up to some point and then begins to ascend at an increasingly higher rate. Consequently, the figure of merit (FOM) remains relatively constant for slower designs and tends to degrade for faster converters. With time interleaving, on the other hand, each channel is granted a longer conversion cycle, thus returning to the linear power-speed region. For example, [1]–[3] employ interleaving to reach a sampling rate of several gigahertz with a resolution of 10 to 12 bits, but they rely on careful layout to minimize interchannel mismatches.

This paper presents an 8 bit 4 GS/s interleaved ADC incorporating a new timing mismatch calibration technique [4]. The proposed technique does not require digital multiplication and, therefore, lends itself to a low-power, low-complexity implementation. A low-jitter, high-precision timing correction method is also introduced. With four interleaved pipelined channels, the ADC achieves an FOM of 219 fJ/conversion-step in 65 nm CMOS technology.

Section II provides the background for this work, dealing with interleaving issues and the tolerable imperfections. Section III describes the proposed timing mismatch calibration technique, and Section IV presents the ADC implementation. Section V summarizes the experimental results.

Manuscript received December 07, 2013; revised March 05, 2014; accepted March 06, 2014. Date of publication April 08, 2014; date of current version July 21, 2014. This paper was recommended by Guest Editor Ken Suyama. This work was supported by the DARPA HEALICS Program, Realtek Semiconductor, and Pullman Lane Productions.

H. Wei and B. Razavi are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095-1594 USA (e-mail: weihegong@gmail.com; razavi@ee.ucla.edu).

P. Zhang was with Tsinghua University, Beijing 100036, China. He is now with Marvell Semiconductor, Ltd., Beijing 100084, China (e-mail: 13488862166@qq.com).

B. D. Sahoo is with Amrita University, Amritapuri, Kerala, India (e-mail: bsahoo@gmail.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2014.2313571

## II. BACKGROUND

### A. Interleaving Considerations

ADC architectures generally entail "timing overheads" that only weakly scale with power dissipation [5]. In a pipelined system, for example, the sub-ADC response, the digital-to-analog converter (DAC) settling, the nonoverlap time of the clocks, and the rise and fall times of the clocks are ultimately dictated by the technology, placing a lower bound on the conversion cycle even if power dissipation is unimportant. For example, the 1 GHz ADC design described in [6] and used here for each channel exhibits the following values in 65 nm technology: a sub-ADC response of 180 ps, a DAC time constant of 45 ps, a nonoverlap time of 50 ps, and a clock transition of 20 ps. Thus, with about six time constants necessary for the DAC settling, it becomes exceedingly difficult to accommodate conversion times well below 500 ps even if the residue amplifiers in each stage bear a linear power-speed tradeoff.

This situation naturally calls for interleaving, ideally by a factor sufficiently large to make the weakly scalable timing overheads only a small fraction of the per-channel cycle. Such a choice would allow operation in the linear power–speed region, thereby affording the lowest FOM. However, several factors oppose increasing the number of channels: 1) a direct area penalty; 2) a proportionally higher input capacitance, which may demand a power-hungry buffer [5]; and 3) additional mismatches due to the routing of the analog input and clock phases to the channels. A compromise is therefore necessary. This work employs four channels.

With interleaving come interchannel mismatches, demanding calibration techniques [7]–[16]. The correction of offset and gain mismatches is fairly straightforward [17], [18], but the timing mismatch presents a greater challenge and is the focus of this work.

### B. Tolerable Imperfections

Before developing a calibration algorithm for timing mismatches, we must determine the maximum tolerable imperfections that remain after the system is calibrated. Specifically, we must decide: 1) how much mismatch is acceptable and 2) how much jitter the timing correction, if performed in the analog domain, can contribute.

To address the first point, we compute the signal-to-noise ratio (SNR) penalty resulting from timing mismatches. It can be shown that, for four M-bit interleaved ADCs sensing a sinusoidal input of frequency $f_{\rm in}$ [5], we have

$$SNR = \frac{1}{\frac{1}{6} \left(\frac{2}{2^M}\right)^2 + \frac{9\pi^2 \Delta T^2 f_{\rm in}^2}{4}} \tag{1}$$





Fig. 1. Maximum tolerable timing mismatch for different SNR penalties for  $f_{\rm in} = 2$  GHz.

where $\Delta T$ represents the rms mismatch of the second, third, and fourth channels with respect to the first. Fig. 1 plots the maximum tolerable $\Delta T$ for different SNR penalties if M = 8 and $f_{\rm in} = 2$ GHz. We observe that a 1 dB penalty translates to $\Delta T < 180$ fs. In practice, we aim for even smaller residual mismatches because other imperfections such as clock jitter and ADC electronic noise also demand their own budget in the denominator of (1).

For the second point, namely the jitter [5], we also write

$$SNR = \frac{1}{\frac{1}{6} \left(\frac{2}{2^{M}}\right)^{2} + 4\pi^{2} f_{\rm in}^{2} \sigma_{t}^{2}} \tag{2}$$

where  $\sigma_t$  is the rms jitter. In this case, a 1 dB penalty dictates  $\sigma_t < 130$  fs if M = 8 and  $f_{in} = 2$  GHz. For the same reason as above, the jitter produced by the correction circuit must fall well below this value.
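The two bounds quoted above can be reproduced numerically. The following is a minimal sketch that solves (1) and (2) for the mismatch and jitter that exhaust a given SNR penalty, assuming the penalty is measured against the quantization-limited SNR; the function names are illustrative and not part of the original design flow.

```python
import math

def quantization_term(bits):
    """Quantization noise relative to a full-scale sinusoid: first denominator term of (1) and (2)."""
    return (1.0 / 6.0) * (2.0 / 2 ** bits) ** 2

def noise_budget(bits, penalty_db):
    """Extra noise-to-signal ratio that lowers the SNR by penalty_db below the quantization limit."""
    return quantization_term(bits) * (10.0 ** (penalty_db / 10.0) - 1.0)

def max_timing_mismatch(bits, f_in, penalty_db):
    """Solve 9*pi^2*dT^2*f_in^2/4 = budget for dT (seconds), per (1)."""
    return math.sqrt(4.0 * noise_budget(bits, penalty_db) / (9.0 * math.pi ** 2 * f_in ** 2))

def max_rms_jitter(bits, f_in, penalty_db):
    """Solve 4*pi^2*f_in^2*sigma_t^2 = budget for sigma_t (seconds), per (2)."""
    return math.sqrt(noise_budget(bits, penalty_db) / (4.0 * math.pi ** 2 * f_in ** 2))

dt_max = max_timing_mismatch(bits=8, f_in=2e9, penalty_db=1.0)
sigma_max = max_rms_jitter(bits=8, f_in=2e9, penalty_db=1.0)
print(f"max mismatch: {dt_max * 1e15:.0f} fs, max jitter: {sigma_max * 1e15:.0f} fs")
```

For M = 8, $f_{\rm in} = 2$ GHz, and a 1 dB penalty, the sketch yields roughly 170 fs and 130 fs, in line with the rounded bounds quoted in the text.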

## III. PROPOSED TIMING-MISMATCH CALIBRATION

Numerous timing-mismatch calibration techniques have been proposed for interleaved ADCs [5], [7], [9]–[11], [13], [16], [19]. Among these, those in [9] and [16] require an extra channel for calibration, that in [19] has a limited input bandwidth, those in [5], [9]–[11] demand digital multipliers, and those in [7] and [13] employ long FIR filters.

As with other types of errors, the timing mismatch can be removed by performing two functions, namely, detection and correction, with the former lending itself better to the digital domain. For the latter, we can choose: 1) the digital domain and hence a sufficiently long high-speed FIR filter in the output data path [7], which can consume high power, or 2) the analog domain and



Fig. 2. Two-channel ADC: (a) waveform showing effect of timing error and (b) timing mismatch detection block diagram.

hence a variable delay line (VDL), which can add jitter. In this work, the detection and correction are realized in the digital and analog domains, respectively, and operate in the background.

### A. Timing-Mismatch Detection

The proposed detection method incorporates only registers and digital adders. We first describe the idea for two interleaved channels. Suppose, as shown in Fig. 2(a), channel 1 samples the analog input x(t) at  $t = t_1$  and  $t_3$  and channel 2 at  $t = t_2$ , where  $t_2$  is offset from its ideal value by  $\Delta T$ . This means that the time difference between samples  $x_1$  and  $x_2$  is  $2\Delta T$  seconds greater (or less) than that between  $x_2$  and  $x_3$ . Let us now form two differences  $x_2 - x_1$  and  $x_3 - x_2$ , and note intuitively that they would exhibit equal averages if  $\Delta T$  were zero. In other words, we surmise that the average value of  $|x_2 - x_1| - |x_3 - x_2|$  is proportional to  $\Delta T$ .<sup>1</sup> It is difficult to prove this conjecture directly, but if we approximate the absolute value operation by a squaring function, we can develop some insight.

Our objective is to prove that the average difference between  $(x_2 - x_1)^2$  and  $(x_3 - x_2)^2$  is proportional to  $\Delta T$ . We write the expectation of  $(x_2 - x_1)^2$  as

$$E[(x_2 - x_1)^2] = E[x_2^2] + E[x_1^2] - 2E[x_2x_1] \tag{3}$$

$$\phantom{E[(x_2 - x_1)^2]} = \sigma_{x_2}^2 + \sigma_{x_1}^2 - 2E[x(t_1 + T_S + \Delta T)x(t_1)] \tag{4}$$

where  $T_S$  denotes the nominal sampling period and  $\sigma^2$  is the average power. Since the expectation on the right-hand side of (4) is in fact the autocorrelation of x(t),  $R(\tau)$ , evaluated at  $T_S + \Delta T$ , we have

$$E[(x_2 - x_1)^2] = 2\sigma_x^2 - 2R(T_S + \Delta T). \tag{5}$$

<sup>1</sup>The absolute values are necessary to ensure consecutive samples do not cancel.

Similarly, the average value of  $(x_3 - x_2)^2$  is equal to

$$E[(x_3 - x_2)^2] = 2\sigma_x^2 - 2R(T_S - \Delta T). \tag{6}$$

For a small  $\Delta T$ ,  $R(T_S \pm \Delta T) \approx R(T_S) \pm \Delta T dR/d\tau$ , yielding the difference between the averages as

$$E[(x_2 - x_1)^2] - E[(x_3 - x_2)^2] \approx -4\Delta T \frac{dR}{d\tau}. \tag{7}$$

The difference therefore reveals the magnitude and sign of the timing mismatch if $dR/d\tau$ does not vanish at $\tau = T_S$. We prove in the Appendix that, for a signal whose bandwidth is limited to $f_S/2$, the autocorrelation's derivative cannot be zero at $\tau = T_S$. The foregoing analysis suggests that the timing mismatch between two channels can be obtained by performing four operations: 1) delay $x_1$ and $x_2$ by $T_S$ seconds; 2) subtract the results from $x_2$ and $x_3$, respectively; 3) calculate the absolute value of each difference; and 4) take the average of the difference between these two absolute values. Fig. 2(b) depicts the high-level implementation.
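The four operations listed above can be sketched behaviorally as follows. This is a minimal illustration, not the implementation of Fig. 2(b): the sinusoidal test input, its frequency, the record length, and the function names are assumptions chosen so that the average is easy to interpret.

```python
import math

def sample_two_channels(f_in, f_s, delta_t, n, phase=0.3):
    """Interleaved samples of a unit sinusoid; odd-indexed samples (channel 2) are late by delta_t."""
    t_s = 1.0 / f_s
    samples = []
    for k in range(n):
        t = k * t_s + (delta_t if k % 2 else 0.0)
        samples.append(math.sin(2.0 * math.pi * f_in * t + phase))
    return samples

def two_channel_error(samples):
    """Average of |x2 - x1| - |x3 - x2| over consecutive (ch1, ch2, ch1) triplets."""
    total = 0.0
    count = 0
    for k in range(0, len(samples) - 2, 2):
        x1, x2, x3 = samples[k], samples[k + 1], samples[k + 2]
        total += abs(x2 - x1) - abs(x3 - x2)
        count += 1
    return total / count

F_S, F_IN, N = 4e9, 1.23e9, 4002   # hypothetical test conditions
for dt in (-1e-12, 0.0, 1e-12):
    err = two_channel_error(sample_two_channels(F_IN, F_S, dt, N))
    print(f"delta_t = {dt * 1e12:+.0f} ps -> error = {err:+.2e}")
```

For this input, the error changes sign with the mismatch and is close to zero when the mismatch is zero, which is the property a calibration loop needs.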

We now extend the above concepts to four interleaved channels. To this end, we consider the waveform shown in Fig. 3(a) and view the first channel's sampling times  $t_1$  and  $t_5$  as ideal. We must then compute the timing mismatch of channels 2, 3, and 4 with respect to channel 1. This calculation proceeds in two steps: 1) detect and correct the mismatch between channel 3 and channel 1, making the third channel ideal and 2) detect and remove the other two mismatches while relying on the corrected channel 3. The first step evaluates  $|x_3 - x_1| - |x_5 - x_3|$  and the second evaluates  $|x_2 - x_1| - |x_3 - x_2|$  and  $|x_4 - x_3| - |x_5 - x_4|$ . Shown in Fig. 3(b), the implementation produces the three errors as  $e_{2,1}$ ,  $e_{3,1}$ , and  $e_{4,1}$ . (As explained below, initially  $e_{3,1}$  returns to the correction circuit to remove the mismatch between channels 1 and 3 while  $e_{2,1}$  and  $e_{4,1}$  remain idle.)

In summary, the proposed detection algorithm operates in the digital domain, requires only adders and registers, needs no redundancy in the analog domain, and affords low-cost, low-power background calibration (Section IV).
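The four-channel extension can be sketched in the same behavioral style. The code below forms $e_{2,1}$, $e_{3,1}$, and $e_{4,1}$ from frames of five consecutive samples, as described for Fig. 3(b); the test tone (about 431 MHz), the record length, and all names are assumptions made for illustration.

```python
import math

def sample_four_channels(f_in, f_s, offsets, n, phase=0.3):
    """Interleaved samples of a unit sinusoid; offsets[i] is the timing error (s) of channel i+1."""
    t_s = 1.0 / f_s
    return [math.sin(2.0 * math.pi * f_in * (k * t_s + offsets[k % 4]) + phase)
            for k in range(n)]

def four_channel_errors(samples):
    """Return (e21, e31, e41) averaged over frames (x1, x2, x3, x4, x5); x5 is channel 1's next sample."""
    e21 = e31 = e41 = 0.0
    frames = 0
    for k in range(0, len(samples) - 4, 4):
        x1, x2, x3, x4, x5 = samples[k:k + 5]
        e31 += abs(x3 - x1) - abs(x5 - x3)   # channel 3 against channel 1
        e21 += abs(x2 - x1) - abs(x3 - x2)   # channel 2 against channels 1 and 3
        e41 += abs(x4 - x3) - abs(x5 - x4)   # channel 4 against channels 3 and 1
        frames += 1
    return e21 / frames, e31 / frames, e41 / frames

F_S = 4e9
F_IN = F_S * 1723 / 16000   # hypothetical tone, chosen so the frames cover many distinct phases
only_ch3_late = [0.0, 0.0, 1e-12, 0.0]
e21, e31, e41 = four_channel_errors(sample_four_channels(F_IN, F_S, only_ch3_late, 16001))
print(f"e21 = {e21:+.2e}, e31 = {e31:+.2e}, e41 = {e41:+.2e}")
```

With only channel 3 skewed, $e_{3,1}$ reports the skew, but $e_{2,1}$ and $e_{4,1}$ are also nonzero because both use channel 3 as a reference. This illustrates why channel 3 is corrected first.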

### B. Simulation Results

The mismatch detection technique can be verified with different analog inputs. Fig. 4 shows, as an example,  $e_{2,1}$  as a function of the timing mismatch between channels 2 and 1 for a sinusoidal and a random, band-limited input. We observe that the error varies monotonically and crosses zero at  $\Delta T = 0$ .

In order to ensure convergence of the calibration loop, we construct a MATLAB behavioral model consisting of four mismatched sampling channels, the mismatch detector of Fig. 3(b), and VDLs for clock phase adjustment. We then apply a multitone or random input and examine the control of the VDLs as a function of time and the overall output spectrum before and after calibration. Fig. 5(a) shows, for a three-tone input, the time behavior of one of the VDL controls, and Fig. 5(b) and (c) shows the corresponding spectra. The sampling rate per channel is 625 MHz, and each cycle collects 16 000 points. We observe that the loop settles in about 15 cycles and the spurs fall below



Fig. 3. Four-channel ADC: (a) waveform showing effect of timing error and (b) timing mismatch detection block diagram.



Fig. 4. Simulated error as a function of timing mismatch with different types of inputs (arbitrary vertical scale).

the noise floor. Fig. 6 repeats the simulation with a random input whose bandwidth is limited to 100 MHz so as to illustrate the effect of the mismatches clearly. The effect of timing mismatch manifests itself as several local peaks in the spectrum, vanishing after calibration.
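A much-simplified counterpart of such a behavioral model can be written in a few lines. The sketch below is not the MATLAB model used here; it assumes a single test tone of about 431 MHz, a sign-based update of one VDL step per calibration cycle, signed VDL codes, and hypothetical channel skews, and it follows the two-step order of correcting channel 3 before channels 2 and 4.

```python
import math

F_S = 4e9                      # aggregate sampling rate (Hz)
T_S = 1.0 / F_S
F_IN = F_S * 1723 / 16000      # ~431 MHz test tone (hypothetical choice)
STEP = 30e-15                  # VDL step size (s)
POINTS = 16000                 # samples collected per calibration cycle

def measure_errors(offsets):
    """Return (e21, e31, e41) for the given per-channel timing errors (s)."""
    x = [math.sin(2.0 * math.pi * F_IN * (k * T_S + offsets[k % 4]))
         for k in range(POINTS + 1)]
    e21 = e31 = e41 = 0.0
    for k in range(0, POINTS, 4):
        x1, x2, x3, x4, x5 = x[k:k + 5]
        e21 += abs(x2 - x1) - abs(x3 - x2)
        e31 += abs(x3 - x1) - abs(x5 - x3)
        e41 += abs(x4 - x3) - abs(x5 - x4)
    frames = POINTS // 4
    return e21 / frames, e31 / frames, e41 / frames

def sign(value):
    return (value > 0) - (value < 0)

def calibrate(skews, cycles_ch3=14, cycles_ch24=20):
    """Sign-based loop: trim channel 3 first, then channels 2 and 4 (one VDL step per cycle)."""
    codes = [0, 0, 0, 0]       # signed VDL codes; channel 1 is the reference and stays at 0
    history = []
    for cycle in range(cycles_ch3 + cycles_ch24):
        offsets = [skews[i] + codes[i] * STEP for i in range(4)]
        e21, e31, e41 = measure_errors(offsets)
        if cycle < cycles_ch3:
            codes[2] -= sign(e31)
        else:
            codes[1] -= sign(e21)
            codes[3] -= sign(e41)
        history.append(tuple(codes))
    residuals = [skews[i] + codes[i] * STEP for i in range(4)]
    return codes, residuals, history

SKEWS = [0.0, 0.44e-12, -0.31e-12, -0.20e-12]   # hypothetical mismatches of channels 1..4
codes, residuals, history = calibrate(SKEWS)
print("final codes:", codes)
print("residual mismatch (fs):", [round(r * 1e15) for r in residuals])
```

Under these assumptions, the residual mismatch of every channel settles to within a step or two of zero and then dithers. The error polarity used in the update holds for this particular tone; a different input or search strategy would need its own check.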

Also of interest is the performance of the mismatch calibration algorithm in the presence of clock jitter. We recognize that



Fig. 5. Simulated (a) VDL convergence, and ADC output spectrum (b) before and (c) after calibration for a multitone input.

the clock jitter experiences low-pass shaping and subsequently manifests itself in the delay adjustment. As mentioned above, the mismatch is measured over 16 000 points, thus averaging out jitter components that vary sufficiently fast in this time interval. With a sampling rate of 4 GHz, this averaging is roughly equivalent to applying a low-pass filter to jitter with a corner frequency of $1/(4\ \mu\text{s}) = 250$ kHz. Since the bandwidth of the phase-locked loop generating the 1 GHz clock is typically much greater, the jitter energy in this bandwidth is negligible.
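The corner frequency follows directly from the averaging window. A minimal sketch of the arithmetic, assuming the corner is taken simply as the reciprocal of the window length (the function name is illustrative):

```python
def averaging_corner_hz(points, f_s):
    """Approximate corner of the averaging action: reciprocal of the window points / f_s."""
    window_s = points / f_s
    return 1.0 / window_s

window_us = 16000 / 4e9 * 1e6
corner_khz = averaging_corner_hz(16000, 4e9) / 1e3
print(f"window = {window_us:.1f} us, corner = {corner_khz:.0f} kHz")
```

With 16 000 points at 4 GHz, the window spans 4 µs and the corner lands at 250 kHz.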

### C. Misdetection Considerations

Timing mismatch calibration techniques can be prone to misdetection and/or divergence in the presence of certain inputs. For example, the detection method described in [7] generates an incorrect dc value in response to two input tones at $f_S/4$ and $3f_S/4$, thus prohibiting convergence. It is therefore important to study such issues for the proposed approach.

The mismatch detection scheme illustrated in Fig. 3 has a "singularity" at  $f_{in} = f_S/2$ , i.e., it generates a zero error if



Fig. 6. Simulated (a) VDL convergence, and ADC output spectrum (b) before, and (c) after calibration for a random input.



Fig. 7. Sampling at  $f_{in} = f_S/2$ .

the input contains only a tone at  $f_S/2$ . This can be explained by noting that the proposed technique requires at least four unequal consecutive samples to provide a measure of  $\Delta T$ , but, as shown in Fig. 7, the case of  $f_{\rm in} = f_S/2$  yields only two such samples.

While the zero error for a single tonal input at  $f_S/2$  implies that the calibration loop fails, it also means that if the input contains additional frequency components, then the loop converges.



Fig. 8. Simulated VDL convergence for (a) single tone at  $f_S/2$  and (b) such a tone plus a random signal.



Fig. 9. ADC architecture.

For example, we surmise that such a tone along with a band-limited signal creates a meaningful output error. This scenario is more realistic as an ADC typically digitizes a random signal while possibly sensing some leakage at  $f_S/2$  as well. Fig. 8 repeats the plot in Fig. 6 for the two cases, namely, a single tone at  $f_S/2$  and such a tone plus a random band-limited signal. We observe that the former does not converge but the latter does. This is another important advantage of the proposed detection technique over multiplication-based algorithms.
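The singularity can be illustrated with a two-channel version of the detector. In the sketch below, which is only an illustration, the "random band-limited signal" is replaced by a second tone at 1.23 GHz and the $f_S/2$ component is treated as a small leakage; these choices, the amplitudes, the phases, and the function name are all assumptions.

```python
import math

def detector_output(tones, f_s, delta_t, n=4002):
    """Two-channel error for a sum of (amplitude, frequency, phase) tones; channel 2 is late by delta_t."""
    t_s = 1.0 / f_s
    x = []
    for k in range(n):
        t = k * t_s + (delta_t if k % 2 else 0.0)
        x.append(sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in tones))
    pairs = range(0, n - 2, 2)
    return sum(abs(x[k + 1] - x[k]) - abs(x[k + 2] - x[k + 1]) for k in pairs) / len(pairs)

F_S = 4e9
DELTA_T = 1e-12
tone_only = [(1.0, F_S / 2, 0.7)]                               # single tone at f_S/2
tone_plus_signal = [(0.1, F_S / 2, 0.7), (1.0, 1.23e9, 0.3)]    # leakage at f_S/2 plus a signal tone
e_tone = detector_output(tone_only, F_S, DELTA_T)
e_mixed = detector_output(tone_plus_signal, F_S, DELTA_T)
print(f"tone at f_S/2 only: {e_tone:+.2e}")
print(f"leakage plus signal: {e_mixed:+.2e}")
```

The first case returns an error that is zero to within floating-point rounding despite the 1 ps mismatch, whereas the second returns a clearly nonzero error that a loop can act upon.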



Fig. 10. Front-end bootstrapping circuit for each channel.



Fig. 11. Simulated (a) THD and (b) attenuation of input sampler.

## IV. ADC IMPLEMENTATION

### A. ADC Architecture

Shown in Fig. 9, the 65 nm CMOS ADC prototype consists of four pipelined channels, a phase generator, and a phase-correction circuit. The pipeline is based on the design in [6], incorporating a 4 bit first stage, seven 1.5 bit stages, and a 2 bit last stage. The multiplexed and downsampled outputs are sent off-chip for three different calibration tasks: 1) per-channel calibration to remove the gain error of the pipelined stages due to capacitor mismatch and the finite gain of the residue amplifiers [6]; 2) interchannel offset and gain mismatch correction; and 3) timing mismatch detection as proposed in Fig. 3(b). The results created by this detection travel back to the chip on a serial bus and drive the phase correction circuit so as to suppress the timing mismatches.

The maximum analog input frequency that can be digitized by an interleaved system is ultimately limited by the front-end sampling circuit in each channel. Based on the ADC in [6], this design allocates 25% of the 1 GHz per-channel clock period to



Fig. 12. Implementation of (a) phase generator, (b) latch used in the divider, (c) 25% duty cycle logic, and (d) retiming logic.

sampling and 75% to conversion. Thus, the sampler must acquire in 250 ps a 2 GHz full-scale input signal with sufficient linearity *and* acceptable attenuation. Fortunately, bootstrapping affords such a performance in 65 nm CMOS technology. Fig. 10 shows the bootstrapping circuit (adopted from [20]) used in each channel, and Fig. 11 plots the simulated total harmonic distortion (THD) and voltage gain of the sampler as a function of the input frequency for a sampling rate of 1 GHz. We observe a performance exceeding 10 bits for frequencies up to 2 GHz.

### B. Phase Correction and Detection

The interleaved system requires four 1 GHz clock phases, each having a 25% duty cycle so as to allow 250 ps for sampling and 750 ps for conversion in each channel. As shown in Fig. 12(a), a 4 GHz input clock is divided by two twice and the 1 GHz phases are logically combined to generate outputs  $\phi_0-\phi_{270}$  with a duty cycle of 25%. Fig. 12(b) shows the latch topology used in each  $\div$ 2 circuit and Fig. 12(c) the 25% duty cycle logic. The overall phase generator consumes 17 mW at full rate.

The above chain is roughly equivalent to a cascade of 11 gates, accumulating significant jitter. According to simulations, the falling edges of $\phi_0$ exhibit a total rms jitter of 53 fs. As explained in Section II, a smaller jitter is desirable so as to minimize the SNR penalty. In addition, the second $\div 2$ circuit and the duty cycle logic in Fig. 12(a) contribute substantial phase mismatches. Both of these effects can be suppressed through the use of retiming [Fig. 12(d)]. Gating $\phi_0$-$\phi_{270}$, the falling edge of the 2 GHz clock now defines the sampling points created by $\phi'_0$-$\phi'_{270}$, removing the above jitter and mismatch components. The jitter observed in the retimed phases is about 31 fs.

The phase-correction circuit employs analog VDLs and appears in all four clock paths to avoid systematic skews. This circuit must provide: 1) a delay tuning range wide enough to accommodate the maximum anticipated mismatch,  $\Delta T_{\text{max}}$ , and 2) a sufficiently fine step size,  $\Delta T_{\text{min}}$ , to minimize the SNR penalty. From floor plan considerations, we select  $\Delta T_{\text{max}} = 3$  ps and from Section II, we target  $\Delta T_{\text{min}} = 30$  fs, arriving at a resolution of about 7 bits.
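The resolution figure follows from the ratio of the tuning range to the step size. A minimal sketch of the arithmetic, assuming a binary-coded control (the function name is illustrative):

```python
import math

def vdl_resolution_bits(tuning_range_s, step_s):
    """Smallest number of binary control bits whose codes span tuning_range_s in steps of step_s."""
    return math.ceil(math.log2(tuning_range_s / step_s))

levels = 3e-12 / 30e-15
bits = vdl_resolution_bits(3e-12, 30e-15)
print(f"about {levels:.0f} levels -> {bits} bits")
```

A 3 ps range in 30 fs steps requires about 100 levels, which rounds up to 7 bits.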

In addition to  $\Delta T_{\rm max}$  and  $\Delta T_{\rm min}$ , two other factors govern the design of the VDL. First, the jitter must remain well below the value of 130 fs computed in Section II, calling for a *short* delay line. Second, the delay control must be somewhat linear to avoid sharp changes, especially at the ends of the characteristic. For example, the starved inverter shown in Fig. 13(a) exhibits a slow rise in its delay as  $V_{\rm cont}$  decreases from  $V_{DD}$ , but a fast change as  $V_{\rm cont}$  approaches the threshold voltage of  $M_3 (\approx 300 \text{ mV})$ . With process, temperature, and supply variations, it is difficult to obtain a wide range and yet avoid the very nonlinear region.

In order to linearize the characteristics of starved inverters and achieve a fine resolution, we incorporate one transistor that is always on in parallel with another device whose on-resistance is controlled. Depicted in Fig. 13(b), the circuit provides a maximum delay bounded by $(W/L)_4$ (when $M_3$ is off). Fig. 13(c) plots the simulated delay of the original and the modified inverters as a function of the control voltage. The new circuit's delay range, however, falls short of the 3 ps target. We now extend the idea to the retiming NAND gate as shown in Fig. 13(d), where a one-bit coarse control shifts the previous characteristic up or down by 2 ps. The fine delay adjustment is realized by $M_3$, whose gate voltage can be varied from $V_1 \approx V_{\rm TH}$ to $V_2$ in 64 steps. With $W_4 = 2.25W_3$ and $L_4 = L_3$, this scheme provides a $\Delta T_{\min}$ of 30 fs. Fig. 14 plots the simulated delay as a function of the control code for different process corners, displaying a variation of about 0.5 ps. (The discontinuity at code 63 results from the overlap between the coarse and fine sections, a precaution necessary to avoid "dead zones" in the delay characteristic. Since the search begins with the coarse bit, this non-monotonicity does not prohibit convergence.) Simulations also reveal a total delay variation of 0.7 ps as the temperature varies from 0 °C to 80 °C and 150 fs as the supply varies by ±50 mV.

Fig. 13. Implementation of (a) starved inverter, (b) modified inverter, (c) simulated delays, and (d) proposed VDL.

Fig. 14. Simulated delay versus VDL code under different process corners.

It is possible to observe the response of the calibration loop to a change in the supply of the delay line. Since the mismatch between the clock paths slightly varies with the supply, the system must settle to new codes. Fig. 15 plots as an example the code for the second delay line as the supply steps by 100 mV at calibration cycle 6. In this transient simulation, the back end collects 16 000 points for each cycle and accordingly adjusts the VDL so as to minimize $e_{2,1}$.

Fig. 15. Simulated calibration code convergence with a supply drop.

Fig. 16. Prototype ADC micrograph.

### C. Logic Complexity and Power Consumption

Though realized off-chip in MATLAB, the calibration logic has also been investigated in 65 nm technology so as to estimate its complexity and power consumption. The detection scheme of Fig. 3 requires four registers (delay elements), nine subtractors, six absolute value operations,<sup>2</sup> and three averaging blocks, all with a word length of 12 bits. These functions translate to approximately 800 gates. To estimate the power consumption, we assume an average fanout of 3 and hence a load capacitance of about 7.5 fF for each gate. If all 800 gates toggle at 1 GHz, the logic draws 8.6 mW from a 1.2 V supply. As proposed in [5], the detection need not be active at all times and can operate in short, infrequent bursts while tracking temperature variations. Such a timing would further reduce the power.
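The power estimate can be reproduced with the usual dynamic-power expression $N C V_{DD}^2 f$. The sketch below assumes that every gate charges its full load once per clock cycle, which is the assumption that matches the quoted figure; the function name is illustrative.

```python
def dynamic_power_w(gates, cap_per_gate_f, vdd_v, toggle_hz):
    """Dynamic power N * C * VDD^2 * f with every gate switching its load once per cycle."""
    return gates * cap_per_gate_f * vdd_v ** 2 * toggle_hz

power_w = dynamic_power_w(gates=800, cap_per_gate_f=7.5e-15, vdd_v=1.2, toggle_hz=1e9)
print(f"estimated logic power: {power_w * 1e3:.2f} mW")
```

With 800 gates, 7.5 fF per gate, a 1.2 V supply, and a 1 GHz toggle rate, the estimate is about 8.6 mW.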

<sup>2</sup>The absolute value is calculated by inverting the sign bit.



Fig. 17. Measured DNL and INL at a clock rate of 4 GS/s, (a) before and (b) after per-channel calibration.



Fig. 18. Measured (a) timing calibration code convergence and (b) SNDR during convergence of timing calibration.

## V. EXPERIMENTAL RESULTS

The four-channel ADC, including the phase generation and correction circuits, has been fabricated in TSMC's 65 nm digital CMOS technology. Shown in Fig. 16 is a photograph of the die, whose active area measures 900  $\mu$ m × 1500  $\mu$ m. The four ADCs are stacked, with the analog input and the clock entering from the center left and traveling to the four channels.

In order to facilitate testing and characterization, the outputs of the channels are multiplexed and downsampled by a factor of 625 on the chip. The ADC is mounted directly on a printed-circuit board and tested with a 1.2 V supply. All measurement results are reported for a sampling rate of 4 GHz.



Fig. 19. Measured output spectrum (a) before, and (b) after timing mismatch calibration (decimated by a factor of 625).



Fig. 20. Measured SNDR as a function of  $f_{in}$  at 4 GS/s.

Plotted in Fig. 17 are the overall differential nonlinearity (DNL) and integral nonlinearity (INL)<sup>3</sup> before and after per-channel calibration. The peak DNL drops to -0.75 LSB

<sup>3</sup>The INL errors at 4-GHz sampling rate arise primarily from the incomplete settling of the reference voltages in the first stage of the pipeline.



Fig. 21. Multiplication of spectra.

TABLE I

COMPARISON WITH STATE-OF-THE-ART DESIGNS

|                         | This Work | JSSC'12 [1] | ISSCC'13 [2] | ISSCC'11 [3] |
|-------------------------|-----------|-------------|--------------|--------------|
| f<sub>S</sub> (GS/s)    | 4         | 3           | 3.6          | 2.6          |
| SNDR @ Nyq. (dB)        | 44.4      | 49          | 47.5         | 48.5         |
| Supply (V)              | 1.2/1.4   | 2.5         | 1.2/2.5      | 1.2/1.3/1.6  |
| Power (mW)              | 120       | 500         | 795          | 480          |
| FOM (fJ/conversion-step)| 219       | 724         | 1140         | 1000         |
| Tech. (nm)              | 65        | 40          | 65           | 65           |
| Area (mm²)              | 1.35      | 0.4         | 7.4          | 5.12         |

and the peak INL to -1.5 LSB. Fig. 18(a) plots as a function of time the measured calibration codes driving the VDLs in channels 2–4. This test is performed with a full-scale sinusoid at 2 GHz. We observe that first channel 3 converges and then channels 2 and 4. From these codes, we estimate the following timing mismatches as an example:  $\Delta T_{1,2} = 3$  ps,  $\Delta T_{1,3} = 0$  ps,  $\Delta T_{1,4} = 1.28$  ps.

Fig. 19 displays the measured output spectra with a 1.89-GHz input before and after calibration. The spurs resulting from timing mismatches fall to about -60 dB and the SNDR rises from 38.4 dB to 44.4 dB. Fig. 20 plots the measured SNDR as a function of the analog input frequency, indicating 2 dB of degradation near the Nyquist rate. Fig. 18(b) plots the measured SNDR during this convergence, revealing significant degradation at the beginning due to the uncorrected timing mismatches. The system was allowed to calibrate the timing mismatches in the background for each input frequency. With a random input, the calibration code would settle to an intermediate value so as to minimize the errors  $e_{2,1}$ ,  $e_{3,1}$ , and  $e_{4,1}$  in Fig. 3(b).

The ADC draws 120 mW: 57 mW in the analog section, 46 mW in the digital section, and 16 mW in the four reference ladders used in the first sub-ADCs of the pipelined channels.

Table I compares our prototype's performance to that of recent gigahertz ADCs with an SNDR range of 44 to 49 dB.
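The FOM entries can be cross-checked from the power, SNDR, and sampling rate. The sketch below assumes the common definition FOM $= P/(2^{\rm ENOB} f_S)$ with ENOB $= ({\rm SNDR} - 1.76)/6.02$; the text does not spell out the expression used for Table I, so this definition and the function names are assumptions.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

def fom_per_conversion_step(power_w, sndr_db, f_s):
    """Energy per conversion step in joules: P / (2^ENOB * f_S)."""
    return power_w / (2.0 ** enob(sndr_db) * f_s)

fom = fom_per_conversion_step(power_w=0.120, sndr_db=44.4, f_s=4e9)
print(f"ENOB = {enob(44.4):.2f} bits, FOM = {fom * 1e15:.0f} fJ/conversion-step")
```

For 120 mW, 44.4 dB, and 4 GS/s, this definition gives a value within about 1% of the reported 219 fJ/conversion-step.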

## VI. CONCLUSION

This paper proposes an efficient digital background timing calibration algorithm and a high-resolution delay adjustment circuit. Avoiding redundancy or digital multipliers, the proposed difference-based method affords precise calibration for different types of input signals and covers a wide frequency band. Using these concepts, a 4 GS/s time-interleaved ADC exhibits an SNDR of 44.4 dB and an FOM of 219 fJ/conversion-step at the Nyquist rate.

## APPENDIX

Here, we prove that, if the bandwidth of a signal is limited to  $f_S/2$ , then the derivative of its autocorrelation is not zero at  $\tau = T_S(=1/f_S)$ .

It can be shown [21] that the derivative of the autocorrelation is given by

$$\frac{dR}{d\tau}(\tau) = j \int_{-\infty}^{+\infty} (2\pi f) S_x(f) e^{j2\pi f\tau} df \tag{8}$$

where  $S_x(f)$  denotes the signal spectrum. We write  $\exp(j2\pi f\tau) = \cos(2\pi f\tau) + j\sin(2\pi f\tau)$  and replace  $\tau$  with  $T_S$ . It follows that

$$\frac{dR}{d\tau}(\tau = T_S) = j \int_{-\infty}^{+\infty} (2\pi f) S_x(f) \cos(2\pi f T_S)\, df - \int_{-\infty}^{+\infty} (2\pi f) S_x(f) \sin(2\pi f T_S)\, df. \tag{9}$$

The first integral is equal to zero because $(2\pi f)S_x(f)\cos(2\pi fT_S)$ is an odd function. To investigate the second integral, we recognize from Fig. 21(a) that $(2\pi f)S_x(f)$ is an odd function with a bandwidth confined to $\pm f_S/2$. We now multiply this function by $\sin(2\pi fT_S)$ and integrate the result. As shown in Fig. 21(b), both functions are odd and share the same sign everywhere within $\pm f_S/2$, so their product is even and nonnegative; it therefore has a finite, positive area and hence $dR/d\tau(\tau = T_S) \neq 0$.
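The sign argument can be checked numerically for a simple case. The sketch below evaluates the second integral of (9) with a midpoint rule, assuming a flat spectrum $S_x(f) = 1$ over $|f| \le B$ and a normalized sampling rate $f_S = 1$; the spectrum shape, the normalization, and the function name are assumptions made for illustration.

```python
import math

def autocorr_slope_at_ts(f_s, bandwidth, n=20000):
    """dR/dtau at tau = 1/f_s for a flat spectrum S_x(f) = 1 over |f| <= bandwidth (midpoint rule)."""
    t_s = 1.0 / f_s
    df = 2.0 * bandwidth / n
    total = 0.0
    for i in range(n):
        f = -bandwidth + (i + 0.5) * df
        total += 2.0 * math.pi * f * math.sin(2.0 * math.pi * f * t_s) * df
    return -total   # only the second (real) integral in (9) survives

for b in (0.1, 0.25, 0.5):
    print(f"B = {b:.2f} f_S -> dR/dtau = {autocorr_slope_at_ts(1.0, b):+.4f}")
```

For every bandwidth up to $f_S/2$, the computed slope is strictly negative, consistent with the claim that the derivative does not vanish at $\tau = T_S$.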

## ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for chip fabrication.

## REFERENCES

- [1] C. Chen et al., "A 12-bit 3 GS/s pipeline ADC with 0.4 mm<sup>2</sup> and 500 mW in 40 nm digital CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 1013–1021, Apr. 2012.
- [2] E. Janssen et al., "An 11b 3.6GS/s time-interleaved SAR ADC in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 464–465.
- [3] K. Doris et al., "A 480 mW 2.6 GS/s 10b 65 nm CMOS time-interleaved ADC with 48.5 dB SNDR up to Nyquist," in *IEEE ISSCC Dig. Tech. Papers*, 2011, pp. 180–181.
- [4] H. Wei, P. Zhang, B. Sahoo, and B. Razavi, "An 8-bit 4-GS/s 120-mW CMOS ADC," in *Proc. CICC*, Sep. 2013, pp. 1–4.
- [5] B. Razavi, "Design considerations for interleaved ADCs," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1806–1817, Aug. 2013.
- [6] B. D. Sahoo and B. Razavi, "A 10-b 1-GHz 33-mW CMOS ADC," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1442–1452, Jun. 2013.
- [7] S. M. Jamal et al., "A 10-b 120-Msample/s time-interleaved analog-to-digital converter with digital background calibration," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1618–1627, Dec. 2002.
- [8] C.-C. Huang, C.-Y. Wang, and J.-T. Wu, "A CMOS 6-bit 16-GS/s time-interleaved ADC using digital background calibration techniques," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 848–858, Apr. 2011.
- [9] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing skew calibration," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 838–847, Apr. 2011.
- [10] J. A. McNeill *et al.*, "Split ADC calibration for all-digital correction of time-interleaved ADC errors," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 56, no. 5, pp. 344–348, May 2009.
- [11] A. Haftbaradaran and K. W. Martin, "A sample-time error compensation technique for time-interleaved ADC systems," *Proc. CICC*, pp. 341–344, Sep. 2007.
- [12] J. Elbornsson, F. Gustafsson, and J.-E. Eklund, "Blind equalization of time errors in a time-interleaved ADC system," *IEEE Trans. Signal Process.*, vol. 53, no. 4, pp. 1413–1424, Apr. 2005.
- [13] S. Huang and C. Levy, "Adaptive blind calibration of timing offset and gain mismatch for two-channel time-interleaved ADCs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 6, pp. 1278–1288, Jun. 2006.
- [14] V. Divi and G. W. Wornell, "Blind calibration of timing skew in time-interleaved analog-to-digital converters," *IEEE J. Sel. Topics Signal Process.*, vol. 3, no. 6, pp. 509–522, Jun. 2009.
- [15] C.-Y. Wang and J.-T. Wu, "A background timing-skew calibration technique for time-interleaved analog-to-digital converters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 4, pp. 299–303, Apr. 2006.
- [16] D. Stepanovic and B. Nikolic, "A 2.8-GS/s 44.6-mW time-interleaved ADC achieving 50.9 dB SNDR and 3-dB effective resolution bandwidth of 1.5 GHz in 65-nm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp. 84–85.
- [17] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals: Fundamentals and high-speed waveform digitizers," *IEEE Trans. Instrum. Meas.*, vol. 37, no. 6, pp. 245–251, Jun. 1988.
- [18] N. Kurosawa *et al.*, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 48, no. 3, pp. 261–271, Mar. 2001.

- [19] H. Jin, "A digital-background calibration technique for minimizing timing-error effects in time-interleaved ADCs," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 47, no. 7, pp. 603–613, Jul. 2000.
- [20] A. M. Abo and P. R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 599–606, May 1999.
- [21] J. S. Bendat and A. G. Piersol, *Random Data: Analysis and Measurement Procedures*, 4th ed. Hoboken, NJ, USA: Wiley, 2010.



**Hegong Wei** (S'05–M'10) received the B.Sc., M.Sc., and Ph.D. degrees (with honors) in electrical and electronics engineering from the University of Macau, Macao SAR, China, in 2006, 2008, and 2011, respectively.

He was a Project Leader with the State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau. He is currently a Postdoctoral Fellow with the Circuit Communication Laboratory, University of California, Los Angeles, CA, USA. His research interests include high-speed and high-performance data converters and mixed-signal circuit design, and he has authored or coauthored over 20 technical journal and conference papers in this field.

Dr. Wei was the recipient of the Silk-Road Award at ISSCC 2011.



**Peng Zhang** received the B.S. degree in electronic engineering from the University of Science and Technology, Beijing, China, in 2009, and the M.S. degree from the Institute of Microelectronics, Tsinghua University, Beijing, China, in 2012.

He is currently an Analog IC Design Engineer with Marvell Technology Group, Ltd., Beijing, China, where he works on integrated power management circuits and data converters.



**Bibhu Datta Sahoo** received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kharagpur, India, in 1998, the M.S.E.E. degree from the University of Minnesota, Minneapolis, MN, USA, in 2000, and the Ph.D.E.E. degree from the University of California, Los Angeles, CA, USA, in 2009.

From 2000 to 2006, he was with DSP Microelectronics Group, Broadcom Corporation, Irvine, CA, USA, where he designed analog and digital integrated circuits for signal-processing applications.

From December 2008 to February 2010, he was with Maxlinear Inc., Carlsbad, CA, USA, where he was involved in designing integrated circuits for CMOS TV tuners. From March 2010 to November 2010, he was a Post-Doctoral Researcher with the University of California, Los Angeles, CA, USA. From December 2010 to December 2011, he was an Assistant Professor with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India. Since December 2011, he has been an Associate Professor with the Department of Electronics and Communication Engineering, Amrita University, Amritapuri, India. His research interests include data conversion, signal processing, and analog circuit design.

Dr. Sahoo was the recipient of the 2008 Analog Devices Outstanding Student Designer Award.



**Behzad Razavi** (F'03) received the B.S.E.E. degree from Sharif University of Technology, Tehran, Iran, in 1985, and the M.S.E.E. and Ph.D.E.E. degrees from Stanford University, Stanford, CA, USA, in 1988 and 1992, respectively.

He was with AT&T Bell Laboratories and Hewlett-Packard Laboratories until 1996. Since 1996, he has been Associate Professor and subsequently Professor of electrical engineering with the University of California, Los Angeles, CA, USA. He was an Adjunct Professor with Princeton University, Princeton, NJ, USA, from 1992 to 1994, and with Stanford University, Stanford, CA, USA, in 1995. He is the author of *Principles of Data Conversion System Design* (IEEE, 1995), *RF Microelectronics* (Prentice-Hall, 1998, 2012, translated to Chinese, Japanese, and Korean), *Design of Analog CMOS Integrated Circuits* (McGraw-Hill, 2001, translated to Chinese, Japanese, and Korean), *Design of Integrated Circuits for Optical Communications* (McGraw-Hill, 2003, and Wiley, 2012), and *Fundamentals of Microelectronics* (Wiley, 2006, translated to Korean and Portuguese) and the editor of *Monolithic Phase-Locked Loops and Clock Recovery Circuits* (IEEE Press, 1996) and *Phase-Locking in High-Performance Systems* (IEEE, 2003). His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters.

Prof. Razavi was the recipient of the Beatrice Winner Award for Editorial Excellence at the 1994 IEEE International Solid-State Circuits Conference (ISSCC), the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998, and the McGraw-Hill First Edition of the Year Award in 2001. He was the co-recipient of both the Jack Kilby Outstanding Student Paper Award and the Beatrice Winner Award for Editorial Excellence at the 2001 ISSCC. He received the Lockheed Martin Excellence in Teaching Award in 2006, the UCLA Faculty Senate Teaching Award in 2007, and the CICC Best Invited Paper Award in 2009 and in 2012. He was also recognized as one of the top 10 authors in the 50-year history of ISSCC. He received the 2012 Donald Pederson Award in Solid-State Circuits and was the co-recipient of the 2012 VLSI Circuits Symposium Best Student Paper Award. He has served as an IEEE Distinguished Lecturer. He served on the Technical Program Committees of the ISSCC from 1993 to 2002 and the VLSI Circuits Symposium from 1998 to 2002. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics.