

# The Design of a Phase Interpolator

THE ANALOG MIND

Phase interpolators (PIs) find application in beamforming wireless systems and in wireline transceivers. The latter typically employ PIs within their clock and data recovery (CDR) loops.

In this article, we design a PI for a 56-Gb/s receiver (RX), targeting the following specifications:

- operation frequency: 28 GHz
- phase resolution: 0.4 ps
- jitter: <100 fs<sub>rms</sub>
- power consumption: <2 mW.

The circuit is realized in a 28-nm CMOS technology with a supply voltage of 0.95 V and simulated in the slow–slow corner at 75 °C. All transistor drawn lengths are equal to 30 nm. The reader is referred to [1], [2], [3], [4], and [5] for background information.

# **PI Environment**

Shown in Figure 1 is a 56-Gb/s RX example where the in-phase (I) and quadrature (Q) components of a 28-GHz clock arrive from the transmit side. The PI generates a clock phase from the I and Q waveforms and drives the half-rate CDR loop. Based on the information provided by its phase detector, this loop commands the PI to adjust its output phase until optimum sampling of  $D_{in}$  is achieved.

Digital Object Identifier 10.1109/MSSC.2023.3315653. Date of current version: 14 November 2023.

The performance of the PI is characterized primarily by its phase resolution and random jitter, both of which cause the sampling instants to depart from their ideal points in time. Given that $D_{in}$ in Figure 1 has a bit period (also called the *unit interval*) equal to $T_b = 17.9$ ps, we aim for a PI resolution of 0.4 ps and a root mean-square (rms) jitter of <100 fs<sub>rms</sub> so that the overall timing error remains below 1 ps most of the time. Moreover, since two or four PIs are in practice necessary for delivering complementary and/or quadrature clocks to the CDR, we constrain the power consumption of a single PI to 2 mW.
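The timing targets above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the way the error is budgeted (one full PI step plus three standard deviations of jitter) is an assumption made here, not a rule stated in the text, and the function names are hypothetical.

```python
DATA_RATE = 56e9       # b/s, receiver data rate from the specifications
PI_STEP = 0.4e-12      # s, target phase resolution
RMS_JITTER = 100e-15   # s, upper bound on the rms jitter


def bit_period(data_rate):
    """Unit interval in seconds."""
    return 1.0 / data_rate


def timing_error_bound(step, rms_jitter, n_sigma=3.0):
    """Assumed budget: one full PI step plus n_sigma of random jitter."""
    return step + n_sigma * rms_jitter


t_b = bit_period(DATA_RATE)
err = timing_error_bound(PI_STEP, RMS_JITTER)
print(f"T_b = {t_b * 1e12:.1f} ps")
print(f"assumed error bound = {err * 1e12:.2f} ps ({100 * err / t_b:.1f}% of T_b)")
```

Under these assumptions the bound stays below 1 ps, which is a small fraction of the unit interval.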





# **General PI Concepts**

Figure 2(a) illustrates, as an example, interpolation by a factor of 2 between the quadrature inputs, $V_I$ and $V_Q$. Ideally, we have $\overline{V_{\text{out}}} = (V_I + V_Q)/2$. This function can be realized by means of current-mode logic or CMOS (rail-to-rail) topologies. For the sake of simplicity, we pursue the latter. As shown in Figure 2(b), two identical inverters can perform interpolation by virtue of their finite output impedances. We note that at $t = t_{12}$ the NMOS transistor in $Inv_1$ and the PMOS device in $Inv_2$ are heavily on, fighting each other. If these two devices have equal strengths (if the N/P ratio is chosen properly), then $\overline{V_{\text{out}}} \approx (V_I + V_Q)/2$. The output is denoted by $\overline{V_{\text{out}}}$ to include the inversion.

Phase interpolation fundamentally requires that the input transitions be sufficiently slow. As depicted in Figure 2(c), if the input edge spacing,  $t_2 - t_1$ , is greater than the transition times of  $V_I$  and  $V_Q$ , then  $\overline{V_{out}}$  suffers from a "kink," incurring a greater jitter.
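The kink condition can be illustrated with idealized edges. The sketch below assumes linear ramps and a perfect average of the two inputs (no inversion, no device nonlinearity); the function names and the time values are hypothetical and in arbitrary units.

```python
def ramp(t, t_start, t_rise):
    """Normalized edge: 0 before t_start, 1 after t_start + t_rise, linear between."""
    x = (t - t_start) / t_rise
    return min(1.0, max(0.0, x))


def interpolate_2x(t, t1, t2, t_rise):
    """Ideal average of two identical edges that start at t1 and t2."""
    return 0.5 * (ramp(t, t1, t_rise) + ramp(t, t2, t_rise))


def kink_duration(t1, t2, t_rise):
    """Time for which the averaged edge sits flat at mid-swing (0 if no kink)."""
    return max(0.0, (t2 - t1) - t_rise)


# Slow edges (rise time > spacing): the two ramps overlap, so there is no flat region.
print(kink_duration(0.0, 10.0, 15.0))
# Fast edges (rise time < spacing): the first edge finishes before the second starts.
print(kink_duration(0.0, 10.0, 6.0))
```

In the second case the average stalls at mid-swing between the end of the first edge and the start of the second, which is where noise translates into the largest timing uncertainty.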

The interpolation network of Figure 2(b) faces two issues. First, the exact value of $t_{12}$ depends on the relative strengths of the NMOS and PMOS transistors, thus varying across process corners. Second, due to their nonlinear output resistance, the transistors introduce additional timing errors as we raise the resolution.

FIGURE 2: (a) Interpolation by a factor of 2, (b) its simple implementation, and (c) the kink problem.

As an example, consider the structure shown in Figure 3(a), where the inputs can be either  $V_I$  or  $V_Q$ . If  $V_1 = \cdots = V_4 = V_I$ , then  $V_{out}$  simply represents the *I* phase [Figure 3(b)]. If  $V_1 = V_Q$  and  $V_2 = V_3 = V_4 = V_I$ , then  $V_{out}$  rotates by 22.5°, etc.

We first simulate the circuit with the NMOS and PMOS widths equal to $W_N = 200 \text{ nm}$ and $W_P = 400 \text{ nm}$, respectively, arriving at the results plotted in Figure 4(a). All of the interpolated waveforms exhibit large errors. Since the 2X-interpolated transition rises earlier than desired, we conclude that the PMOS device should be weaker. Choosing $W_P = 300$ nm, we note from Figure 4(b) that this transition is corrected but the other two interpolated edges still deviate considerably from their ideal instants.

**FIGURE 3:** (a) and (b) Interpolation by a factor of 4.

The foregoing issues are alleviated if we allow linear resistors to participate in the output summation. Illustrated in Figure 5, this method, in principle, yields uniformly spaced output transitions if R is much greater than the output resistance of the inverters,  $R_{inv}$ . With this assumption and using superposition, we have:

$$V_{\text{out}} \approx \frac{\frac{R}{3}}{R + \frac{R}{3}} \overline{V_1} + \dots + \frac{\frac{R}{3}}{R + \frac{R}{3}} \overline{V_4} \quad (1)$$
$$\approx \frac{1}{4} (\overline{V_1} + \dots + \overline{V_4}). \quad (2)$$

For example, if $V_1 = V_2 = V_3 = V_I$ and $V_4 = V_Q$, then $V_{out} \approx (3/4)\overline{V_I} + (1/4)\overline{V_Q}$, providing linear interpolation.
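The superposition result can be verified numerically. The following sketch treats the inverters as ideal voltage sources (the $R \gg R_{inv}$ assumption of the text) and uses hypothetical instantaneous output values; `summing_node` is an illustrative helper, not part of the design.

```python
def summing_node(branch_voltages, branch_resistances):
    """Open-circuit voltage of a node fed by ideal sources through resistors:
    sum(V_k / R_k) / sum(1 / R_k)."""
    conductances = [1.0 / r for r in branch_resistances]
    weighted = sum(v * g for v, g in zip(branch_voltages, conductances))
    return weighted / sum(conductances)


R = 1e3                       # ohms; any equal value yields the same weights
V_I_BAR, V_Q_BAR = 1.0, 0.0   # hypothetical instantaneous inverter outputs

# Weight of one branch as written in Equation (1): (R/3) / (R + R/3).
single_weight = (R / 3) / (R + R / 3)

# Three branches driven by V_I and one by V_Q.
v_out = summing_node([V_I_BAR, V_I_BAR, V_I_BAR, V_Q_BAR], [R] * 4)
print(single_weight, v_out)
```

With equal resistors every branch carries a weight of 1/4, so three $V_I$ branches and one $V_Q$ branch produce the 3/4 and 1/4 mix given above.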

In practice, several factors make it difficult to ensure that R is much greater than $R_{inv}$. First, a high value for R translates to a large time constant at the output as well as a large footprint for the overall PI. Second, if wide transistors are chosen to lower $R_{inv}$, then the inverters occupy a large area, yield a high input capacitance, and draw significant power. Third, most importantly, a high resistance permits fast transitions at the inverters' outputs, thus introducing kinks in the interpolated waveforms. Figure 6(a) and (b) present the results for $R = 10 \text{ k}\Omega \gg R_{\text{inv}}$ and $R = 1 \text{ k}\Omega \approx R_{\text{inv}}$, respectively. The latter case causes the output of one inverter to be "pulled" by others, creating longer transitions. Figure 6(c) illustrates this point by plotting $\overline{V_1}$ for $R = 10 \text{ k}\Omega$ and $R = 1 \text{ k}\Omega$. While avoiding kinks, a low value for R still leads to substantial nonuniformity. In other words, no value of R offers acceptable performance.

We continue with nearly minimum-width transistors, select $R = 1 \text{ k}\Omega$, and seek other methods of improving the linearity.

# **Use of a Virtual Ground**

It is possible to improve the phase uniformity by performing the output summation at a virtual-ground node. Depicted in Figure 7(a), this approach allows more linear voltage-to-current conversion. Figure 7(b) plots the interpolation results, displaying greater linearity.

One can omit the 1-k $\Omega$  resistors in Figure 7(a) and still obtain similar waveforms, but the direct fight that thus ensues among the inverters draws large currents from the supply voltage. As explained below, the resistors also permit "predistortion."

Unlike typical virtual grounds, node X in Figure 7(a) swings by hundreds of millivolts due to the large feedback resistor. This is necessary for two reasons: first, it guarantees rail-to-rail swings in $V_{out}$, and second, it ensures that $Inv_X$ rapidly travels through its high-gain region, thereby producing minimal jitter and consuming little power.



**FIGURE 4:** Output waveforms of the 4X interpolation network for (a) $W_N = 200$ nm and $W_P = 400$ nm, and (b) $W_N = 200$ nm and $W_P = 300$ nm.


**FIGURE 5:** Use of resistors in the interpolation network.



FIGURE 6: (a) Output waveforms for  $R = 10 \text{ k}\Omega$ , (b) output waveforms for  $R = 1 \text{ k}\Omega$ , and (c) one inverter output for the two cases.



FIGURE 7: (a) Interpolation using a virtual ground and (b) the resulting waveforms.



FIGURE 8: (a) The 16X interpolation network, (b) the illustration of the first quadrant, and (c) the resulting waveforms.

# **Single-Quadrant PI**

To achieve a phase resolution of 0.4 ps, we divide the time difference between the quadrature input phases by this value: $(17.4 \text{ ps}/4)/(0.4 \text{ ps}) \approx 11$. We select an interpolation factor of 16 as a conservative measure. Illustrated in Figure 8(a), the circuit employs 16 inverter/resistor branches that must be driven by $V_I$ or $V_Q$, hence the need for the 2-to-1 multiplexers (MUXes). The MUXes receive a thermometer code that determines how many inverters sense $V_I$ and how many sense $V_Q$. For example, a code with 15 ONEs and one ZERO translates to $V_{out} \propto 15 V_I + V_Q$ and, therefore, a rotation of $\tan^{-1}(1/15) = 3.8^{\circ}$. Operating with only $V_I$ and $V_Q$, this circuit interpolates in the first quadrant [Figure 8(b)].

The interpolating inverters' transistors have nearly minimum widths so as to achieve an acceptable power consumption and input capacitance. Even though such small dimensions introduce large mismatches, the thermometric coding of the inverters guarantees that the phase varies monotonically.
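The monotonicity argument can be checked with a simple model. The sketch below assumes an ideal vector sum of the $V_I$ and $V_Q$ contributions and a hypothetical branch-strength mismatch of up to ±50%; the function name, the mismatch range, and the order in which branches are switched are all assumptions made for illustration.

```python
import math
import random


def phases_deg(weights):
    """Ideal output angle for each thermometer code.

    For code k, branches 0..k-1 sense V_Q and the remaining ones sense V_I;
    `weights` holds the (mismatched) strength of each branch.
    """
    n = len(weights)
    return [
        math.degrees(math.atan2(sum(weights[:k]), sum(weights[k:])))
        for k in range(n + 1)
    ]


random.seed(0)  # fixed seed so the example is repeatable
weights = [random.uniform(0.5, 1.5) for _ in range(16)]  # hypothetical mismatch
angles = phases_deg(weights)
is_monotonic = all(b > a for a, b in zip(angles, angles[1:]))
print(is_monotonic)
```

Because each code step only moves one positive weight from the $V_I$ side to the $V_Q$ side, the angle can only increase, whatever the individual branch strengths are. The step sizes, however, become unequal.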

Plotted in Figure 8(c) are the resulting waveforms. The phase spacing is not quite uniform and varies from 385 to 690 fs. To understand the reason, suppose the thermometer code contains *m* ONEs and 16 - m ZEROs, generating  $V_{\text{out}} \propto mV_I + (16 - m)V_Q$ . The interpolation angle is then equal to:

$$\theta(m) = \tan^{-1} \frac{16-m}{m}. \quad (3)$$

The nonlinear dependence of  $\theta$  upon *m* causes larger increments if *m* is near 8 and smaller ones if *m* is near zero or 16. Figure 9 displays this behavior.
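Equation (3) can be tabulated to see how uneven the ideal steps are. The sketch below evaluates the angle for every code and converts the increments to time, assuming that 360° corresponds to one period of the 28-GHz clock; the function names are hypothetical, and the model ignores all circuit nonidealities.

```python
import math

N = 16         # interpolation factor
F_CLK = 28e9   # Hz, clock frequency from the specifications


def theta_deg(m, n=N):
    """Equation (3): angle when m branches sense V_I and n - m sense V_Q."""
    return math.degrees(math.atan2(n - m, m))


def step_sizes_ps(n=N, f_clk=F_CLK):
    """Ideal phase increments between adjacent codes, converted to picoseconds."""
    ps_per_degree = 1e12 / f_clk / 360.0
    angles = [theta_deg(m, n) for m in range(n, -1, -1)]  # m = 16 down to 0
    return [(b - a) * ps_per_degree for a, b in zip(angles, angles[1:])]


steps = step_sizes_ps()
print(f"smallest step: {min(steps):.3f} ps, largest step: {max(steps):.3f} ps")
```

In this idealized model the steps range from roughly 0.38 ps at the ends of the code range to roughly 0.71 ps near midscale, which is in the same range as the simulated spread of 385 fs to 690 fs.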

# **Phase Correction by Predistortion**

Equation (3) and Figure 9 give us an idea: the interpolating inverters near the midscale can be made weaker so as to obtain more uniform phase increments. Alternatively, those at the top and bottom of the array in Figure 8(a) can be made stronger, a more practical remedy in view of the small widths that we have chosen. We say the array is *predistorted*. But with the series resistors present, we benefit from additional flexibility here:  $R_1-R_{16}$  permit predistortion and obviate the need for nonuniform inverters.

Resistor predistortion must be carried out while bearing in mind the resistance "spread," i.e., the maximum-to-minimum ratio, that it requires. We assume a 500-$\Omega$ unit resistor and set an upper bound of 3 k$\Omega$.

To increase the increments near m = 0 and m = 16, we select $R_1 = R_2 = R_{15} = R_{16} = 500 \Omega$. Similarly, to decrease the phase steps in the vicinity of m = 8, we have $R_7 = R_8 = R_9 = R_{10} = 3 \text{ k}\Omega$. Finally, we opt for $R_3 = R_4 = R_{13} = R_{14} = 1 \text{ k}\Omega$ and $R_5 = R_6 = R_{11} = R_{12} = 2 \text{ k}\Omega$.

Figure 10 plots the results, revealing minimum and maximum steps equal to 420 fs and 636 fs, respectively. Predistortion has improved the linearity but we still do not meet our 400-fs resolution target.

# **Raising the Resolution**

We wish to raise the PI resolution by a factor of 2. Rather than double the number of inverter/resistor branches, which would double the input capacitance, we surmise that a fine increment can be created by adding a "half-strength" branch. Illustrated in Figure 11(a), such a branch consists of $Inv_0$ and $R_0$ and contributes a step equal to 0.5 least-significant bit (LSB). $Inv_0$ employs transistors whose lengths are doubled by series stacking. The choice of $R_0$, however, entails a compromise in view of predistortion. If $R_0$ is twice $R_1$, then the 0.5-LSB branch yields a proper step near m = 0 and m = 16, but an excessively large increment in the vicinity of m = 8. For this reason, we choose $R_0 = 2 \text{ k}\Omega$, arriving at the waveforms shown in Figure 12. The phase steps range from 156 fs to 362 fs.



**FIGURE 9:** The output phase versus the input thermometer code.

# **Four-Quadrant PI**

It is desirable for the PI output to rotate, seamlessly, from 0° to 360° [Figure 13(a)]. This requires that the circuit interpolate between  $V_I$  and  $V_Q$ ,  $V_Q$  and  $\overline{V_I}$ , etc. We then modify the design as shown in Figure 13(b), where another rank of MUXes is inserted to select  $V_I$  or  $\overline{V_I}$  and  $V_Q$  or  $\overline{V_Q}$ . Two additional control bits are necessary for this operation.
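One way to picture the seamless rotation is to model the two extra control bits as signs applied to the I and Q inputs. The sketch below is an idealized illustration: the sign table, the alternating count direction of the thermometer code, and all names are assumptions made here, not details taken from Figure 13.

```python
import math

N = 16  # unit branches, as in the 16X network

# (sign applied to I, sign applied to Q, thermometer count direction).
# The count direction in each quadrant is an assumption of this sketch.
QUADRANTS = [
    (+1, +1, "down"),  # from V_I toward V_Q
    (-1, +1, "up"),    # from V_Q toward inverted V_I
    (-1, -1, "down"),  # from inverted V_I toward inverted V_Q
    (+1, -1, "up"),    # from inverted V_Q back toward V_I
]


def output_angle_deg(sign_i, sign_q, m):
    """Ideal angle when m branches sense the I-type input and N - m the Q-type input."""
    x = sign_i * m
    y = sign_q * (N - m)
    return math.degrees(math.atan2(y, x)) % 360.0


def full_rotation():
    """Angles visited over one revolution, skipping duplicated quadrant boundaries."""
    angles = []
    for sign_i, sign_q, direction in QUADRANTS:
        codes = range(N, 0, -1) if direction == "down" else range(0, N)
        angles.extend(output_angle_deg(sign_i, sign_q, m) for m in codes)
    return angles


angles = full_rotation()
print(len(angles), angles[0], angles[16], angles[32], angles[48])
```

In this model the thermometer count must alternately decrease and increase from one quadrant to the next so that the phase keeps advancing at each quadrant boundary.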

The 2-to-1 MUXes in Figure 13(b) are realized by transmission gates [Figure 13(c)]. The cascaded switches raise the input capacitance by about a factor of 4, presenting a total load of roughly $(0.5 \,\mu\text{m} \times 4) \times 8 \times (1 \,\text{fF}/\mu\text{m}) = 16$ fF to $V_I$ and to $V_Q$ for operation in the first quadrant. Two inverters driving these loads would draw about 0.9 mW at 28 GHz. The cascaded transistors also attenuate the input by about 15%.
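The load and power figures can be reproduced approximately. The sketch below assumes a gate capacitance of about 1 fF per micron of width and a dynamic power of $f C V_{DD}^2$ per driven load; it counts only the MUX load, not the drivers' own capacitance, and the function and parameter names are hypothetical.

```python
F_CLK = 28e9         # Hz, clock frequency
V_DD = 0.95          # V, supply voltage
CAP_PER_UM = 1e-15   # F per micron of gate width (assumed rule of thumb)


def mux_load(width_um=0.5, cap_multiplier=4, count=8, cap_per_um=CAP_PER_UM):
    """Load seen by one clock phase, using the factors quoted in the text."""
    return width_um * cap_multiplier * count * cap_per_um


def dynamic_power(c_load, f_clk=F_CLK, v_dd=V_DD):
    """Assumed estimate: one full charge/discharge of c_load per clock cycle."""
    return f_clk * c_load * v_dd ** 2


c_load = mux_load()
p_total = 2 * dynamic_power(c_load)  # one driver for V_I and one for V_Q
print(f"load = {c_load * 1e15:.0f} fF, drive power = {p_total * 1e3:.2f} mW")
```

This estimate gives about 0.8 mW for the two loads alone, in line with the figure of about 0.9 mW quoted above.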



FIGURE 10: The output waveforms after predistortion.



FIGURE 11: Addition of a 0.5-LSB branch to double the resolution.



FIGURE 12: Output waveforms of 32X interpolation network.



FIGURE 13: (a) Illustration of four-quadrant interpolation, (b) its implementation, and (c) the realization of the two MUX stages.

FIGURE 14: PI output jitter.

In the last step of our effort, we compute the PI output jitter by running a transient noise simulation with minimum and maximum noise frequencies of 1 MHz and 200 GHz, respectively. Plotting the output eye diagram and examining one of the edges (Figure 14), we note a peak-to-peak jitter of about 60 fs. For a Gaussian distribution, we estimate the rms value (i.e., the standard deviation) to be about one-sixth of the peak-to-peak swing. Thus, the rms jitter is about 10 fs.
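The one-sixth rule can be made explicit. The sketch below treats the observed peak-to-peak value as spanning ±3 standard deviations of a Gaussian distribution, which is the assumption behind the estimate; the helper names are hypothetical.

```python
import math


def rms_from_peak_to_peak(jitter_pp, n_sigma_span=6.0):
    """Standard deviation if the peak-to-peak value spans n_sigma_span sigmas."""
    return jitter_pp / n_sigma_span


def gaussian_coverage(n_sigma_span=6.0):
    """Fraction of Gaussian samples within +/- n_sigma_span / 2 standard deviations."""
    return math.erf((n_sigma_span / 2.0) / math.sqrt(2.0))


rms = rms_from_peak_to_peak(60e-15)
print(f"rms jitter = {rms * 1e15:.0f} fs, covering {100 * gaussian_coverage():.2f}% of edges")
```

A ±3σ window contains about 99.7% of the samples, so a 60-fs peak-to-peak reading corresponds to roughly 10 fs rms, an order of magnitude below the 100-fs target.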

# **References**

- [1] M.-S. Chen, A. Hafez, and C.-K. K. Yang, "A 0.1-1.5 GHz 8-bit inverter-based digital-to-phase converter using harmonic rejection," IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2681-2692, Nov. 2013, doi: 10.1109/JSSC.2013.2274892.
- [2] J. Lee, G. Jung, S. Kim, and M. Lee, "An 8-bit 1.24 mW sub-1ps DNL sub-1V supply inverter-based phase interpolator using a PVT-tracking adaptive-bias circuit," IEEE Trans. Circuits Syst., II, Exp. Briefs, vol. 70, no. 8, pp. 2749-2753, Aug. 2023, doi: 10.1109/TCSII.2023.3247595.
- [3] J. Hu, X. Wang, and Z. Zhu, "A 50-ps gated VCRO-based TDC with compact phase interpolators for flash LiDAR," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 69, no. 12, pp. 5096-5117, Dec. 2022, doi: 10.1109/TCSI.2022.3200944.
- [4] Y. Huang and B. Chen, "An 8b injection-locked phase rotator with dynamic multiphase injection for 28/56/112Gb/s SerDes application," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2019, pp. 486-488, doi: 10.1109/ISSCC.2019.8662292.
- [5] G. Souliotis, A. Tsimpos, and S. Vlassis, "Phase interpolator-based clock and data recovery with jitter optimization," IEEE Open J. Circuits Syst., vol. 4, pp. 203-217, Aug. 2023, doi: 10.1109/OJCAS.2023.3295649.
