

# Fifty Applications of the CMOS Inverter—Part 2

As explained in the first part of this article [1], this series deals with interesting and useful applications of the CMOS inverter, a versatile building block in today's designs. In addition to studying the operation of such circuits, we also quantify their performance by simulations in 28-nm CMOS technology. Simulations are performed in the slow-slow corner at $T = 75^{\circ}$C with $V_{DD} = 0.95$ V. An estimate of the layout parasitic capacitances is also included. Unless otherwise stated, $L = 30$ nm for all transistors.

Digital Object Identifier: 10.1109/MSSC.2024.3473737; date of current version: 15 November 2024.

## The Ring Oscillator

Inverter-based ring oscillators can serve as a compact solution with a wide tuning range. While suffering from high phase noise (PN), rings have become common in die-to-die serial links ("chiplets") because a significant fraction of their PN is removed by "clock forwarding" [2].

Consider the ring shown in Figure 1(a). The oscillation frequency, $f_0$, is equal to $1/(6T_D)$, where $T_D$ denotes the gate delay. (We neglect the additional loading presented by the buffer.) The oscillator core power consumption, $P$, is given by $3f_0 C_0 V_{DD}^2$, where $C_0$ is the total capacitance seen from each node to ground. For small transistors, the PN is dominated by flicker noise and can be expressed as [3]

$$S_{\phi}(f) = \frac{f_0^2}{2MI_D^2 f^2} [S_N(f) + S_P(f)]. \quad (1)$$

Here, $M$ is the number of stages, $I_D$ is the transistor current under the conditions shown in Figure 1(b), and $S_N(f)$ and $S_P(f)$, respectively, denote the flicker-noise drain-current spectra of the NMOS and PMOS devices under the same condition. This equation offers a wealth of insight into the ring's PN behavior.

FIGURE 1: (a) A simple ring oscillator, (b) condition under which $I_D$ and $S_N(f)$ are measured, (c) oscillation waveform at X, and (d) PN profile.

Let us first simulate the circuit with $(W/L)_N = 0.5\ \mu m/30\ nm$ and $(W/L)_P = 1\ \mu m/30\ nm$. Depicted in Figure 1(c), the waveform at *X* suggests that $f_0 \approx 46\ GHz$. We also have $P = 340\ \mu A \times 0.95\ V \approx 320\ \mu W$; both results are attractive. The profile in Figure 1(d), however, reveals a PN of -48 dBc/Hz at a 1-MHz offset, an exceedingly high value. The 30-dB/decade slope of the PN profile signifies flicker-noise upconversion for frequency offsets as high as several hundred megahertz.
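As a quick consistency check, the two relations above can be inverted to estimate the gate delay and node capacitance implied by the simulated $f_0 \approx 46$ GHz and $P \approx 320\ \mu W$. The sketch below is a minimal Python illustration under stated assumptions: it generalizes to $M$ stages as $f_0 = 1/(2MT_D)$ and $P = Mf_0C_0V_{DD}^2$, treats all of $P$ as capacitive switching power (ignoring crowbar current and the buffer), and uses hypothetical function names, so the resulting $C_0$ is indicative only.

```python
def gate_delay(f0_hz: float, stages: int = 3) -> float:
    """Per-stage delay implied by f0 = 1/(2*M*T_D); M = 3 gives 1/(6*T_D)."""
    return 1.0 / (2 * stages * f0_hz)


def node_capacitance(power_w: float, f0_hz: float, vdd: float, stages: int = 3) -> float:
    """Node capacitance implied by P = M*f0*C0*VDD^2 (switching power only, an assumption)."""
    return power_w / (stages * f0_hz * vdd ** 2)


if __name__ == "__main__":
    f0, p, vdd = 46e9, 320e-6, 0.95   # simulated values quoted in the text
    print(f"T_D ~ {gate_delay(f0) * 1e12:.2f} ps")
    print(f"C0  ~ {node_capacitance(p, f0, vdd) * 1e15:.2f} fF")
```

With the quoted values, this gives $T_D \approx 3.6$ ps and $C_0 \approx 2.6$ fF.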

In practice, ring oscillators usually run at lower frequencies. This holds, for example, in a chiplet environment because high-frequency clocks are difficult to *distribute*. We then ask: How can we lower $f_0$ in Figure 1(a)? If we simply increase the capacitances at the three nodes by a factor of $n$, $f_0$ drops by the same factor, $P$ remains constant, and $S_{\phi}$ falls by $n^2$. On the other hand, if we raise the number of stages to $nM$, we still obtain $f_0^2/n^2$ in the numerator of (1) but also another factor of $n$ in its denominator, so $S_{\phi}$ falls by $n^3$. The power is unchanged because $n$ times as many nodes now switch at $1/n$ the frequency. This method is thus advantageous.

FIGURE 2: A stage with continuous and discrete tuning.

The larger number of stages does lead to substantial complexity if we target a practical voltage-controlled oscillator (VCO). Consider the stage shown in Figure 2, where the varactor, $C_{var}$, provides continuous tuning, and unit capacitors $C_1, \ldots, C_k$ implement discrete tuning. Since every stage requires such a tuning network, this topology exemplifies the complexity that arises if $f_0$ is scaled by simply increasing the number of stages. We now wish to see whether the $1/n^3$ PN reduction can be realized without raising $M$. This is possible if both the width and the length of the transistors are increased by a factor of $\sqrt{n}$. Consequently, $I_D$ in Figure 1(b) remains constant (because $W/L$ is unchanged), and $S_N(f)$ and $S_P(f)$ fall by $n$ because they are inversely proportional to the transistors' channel areas. Since the node capacitances rise by approximately a factor of $n$, $f_0$ falls by the same factor while $P$ remains constant, causing $S_{\phi}$ to fall by $n^3$. We therefore prefer this approach.

Shown in Figure 3(a) is an example using $(W/L)_N = 1\,\mu m/100$ nm and $(W/L)_P = 2\,\mu m/100$ nm. Due to drain junction capacitances, $W$ and $L$ are scaled somewhat differently, with the goal of lowering $f_0$ by a desired factor (three in this case) and keeping $P$ constant. We now have $f_0 = 15.7$ GHz and the PN profile plotted in Figure 3(b). The PN at a 1-MHz offset has decreased to -62 dBc/Hz, i.e., by 14 dB. We could obtain the same reduction by tripling the number of stages but at the cost of greater complexity. We call this design the "reference oscillator." For lower PN values, we can trade power consumption; if all of the oscillator's transistor widths are scaled up by $m$, $f_0$ remains constant, $S_{\phi}(f)$ in (1) drops by $m$, and $P$ rises by the same factor.
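The scaling arguments above reduce to ratios in (1) together with $P = Mf_0C_0V_{DD}^2$. The sketch below tabulates them for the three ways of lowering $f_0$ by a factor $n$ discussed so far. It is an idealized model under stated assumptions: $I_D$ is held fixed, the node capacitance is taken to scale by exactly $n$ (ignoring the junction-capacitance effects mentioned above), and the function and strategy names are hypothetical.

```python
import math


def scale_ring(n: float, strategy: str) -> dict:
    """Relative f0, P, and S_phi after lowering f0 by a factor n.

    Idealized ratios from (1), S_phi ~ f0^2 (S_N + S_P) / (M I_D^2),
    and from P = M f0 C0 VDD^2, with I_D held constant throughout.
    """
    if strategy == "capacitance":    # C0 -> n*C0; M, S_N, S_P unchanged
        f0, stages, noise, c0 = 1 / n, 1.0, 1.0, n
    elif strategy == "stages":       # M -> n*M; each stage unchanged
        f0, stages, noise, c0 = 1 / n, n, 1.0, 1.0
    elif strategy == "area":         # W, L -> sqrt(n)*W, sqrt(n)*L; W/L fixed
        # S_N, S_P scale as 1/(WL) = 1/n; node capacitance taken as n*C0
        f0, stages, noise, c0 = 1 / n, 1.0, 1 / n, n
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return {
        "f0": f0,
        "P": stages * f0 * c0,
        "S_phi": f0 ** 2 * noise / stages,
    }


def db(ratio: float) -> float:
    return 10 * math.log10(ratio)


for name in ("capacitance", "stages", "area"):
    r = scale_ring(3, name)
    print(f"{name:12s} f0 x{r['f0']:.2f}  P x{r['P']:.2f}  PN {db(r['S_phi']):+.1f} dB")
```

For $n = 3$, the capacitance-only option improves the PN by about 9.5 dB, whereas the other two give $10\log 27 \approx 14.3$ dB at unchanged power, consistent with the simulated 14-dB improvement.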

When comparing the PN of different oscillators, we should normalize the values to  $f_0^2$  and *P* because  $S_{\phi}(f) \propto f_0^2/P$ . Suppose  $f_0$  changes to  $\alpha f_0$  and *P* changes to  $\beta P$ . If the change in  $S_{\phi}(f)$  in dB is equal to  $10\log \alpha^2 + 10\log \beta^{-1}$ , we say the performance has not changed.
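This normalization can be wrapped in a small helper that predicts how much an oscillator's PN should drop relative to the reference oscillator from its $f_0$ and $P$ alone; the shortfall between this prediction and the simulated drop then measures the cost of a given topology. In the sketch below, the helper name is hypothetical, the reference values (15.7 GHz, 340 $\mu$W) are the ones used in the comparisons of the following sections, and the simulated drops are read from the PN values quoted there (-62 dBc/Hz for the reference).

```python
import math

# Reference oscillator of Figure 3(a), with the power value used in the text's comparisons.
REF_F0_GHZ = 15.7
REF_P_UW = 340.0
REF_PN_DBC = -62.0   # PN at a 1-MHz offset


def expected_pn_drop_db(f0_ghz, p_uw, f0_ref_ghz=REF_F0_GHZ, p_ref_uw=REF_P_UW):
    """PN improvement (dB) predicted by S_phi proportional to f0^2 / P (hypothetical helper)."""
    alpha = f0_ghz / f0_ref_ghz   # frequency ratio
    beta = p_uw / p_ref_uw        # power ratio
    return 10 * math.log10(beta) - 10 * math.log10(alpha ** 2)


# (name, f0 in GHz, P in uW, simulated PN at 1 MHz in dBc/Hz), as quoted in the text
CASES = [
    ("differential", 14.1, 660.0, -66.0),
    ("quadrature", 9.8, 810.0, -67.0),
    ("CCO", 11.1, 380.0, -64.0),
]

for name, f0, p, pn in CASES:
    predicted = expected_pn_drop_db(f0, p)
    simulated = REF_PN_DBC - pn
    print(f"{name:12s} predicted {predicted:4.1f} dB, simulated {simulated:4.1f} dB, "
          f"penalty {predicted - simulated:+.1f} dB")
```

The predicted drops (about 3.8, 7.9, and 3.5 dB) agree with the figures quoted in the text to within about 0.2 dB. A negative penalty, as for the differential ring, means the topology does slightly better than the normalization predicts.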

## The Differential Ring Oscillator

It is possible to couple two single-ended rings so as to guarantee that they oscillate with opposite phases. Illustrated in Figure 4(a), the idea is to inject the output of each oscillator into the other through an inverter. Since $Inv_1$ and $Inv_2$ prefer to sustain complementary inputs and outputs, the waveforms at X and Y develop a phase difference of $180^{\circ}$. If excessively strong, however, these two inverters can create latch-up by virtue of positive feedback. For this reason, we typically scale $Inv_1$ and $Inv_2$ down by a factor of four with respect to the main inverters.

FIGURE 3: (a) A ring oscillator with larger transistors and (b) its PN profile.

Figure 4(b) plots the waveforms at *X* and *Y*, revealing an oscillation frequency of 14.1 GHz, slightly less than that of the single-ended counterpart shown in Figure 3(a) owing to the additional capacitances introduced by the cross-coupled inverters. We also have $P = 690\ \mu A \times 0.95\ V \approx 660\ \mu W$. Presented in Figure 4(c), the PN profile suggests a value of -66 dBc/Hz at a 1-MHz offset, about 4 dB lower than that in Figure 3(b). This is to be expected because $10\log(660\ \mu W/340\ \mu W) + 10\log(15.7\ GHz/14.1\ GHz)^2 = 3.7$ dB.

## The Quadrature Ring Oscillator

We can envision placing four inverters in a loop so as to generate quadrature phases [see Figure 5(a)]. This circuit, however, tends to latch up and settle into the "degenerate" state $A = B = 0$ and $X = Y = V_{DD}$ (or the other way around). We know from the previous section that cross-coupled inverters, e.g., $Inv_1$ and $Inv_2$ in Figure 4(a), impose complementary values at their two terminals and can therefore avoid the degenerate conditions. As shown in Figure 5(b), we attach such pairs between *A* and *B* and between *X* and *Y* [4]. Plotted in Figure 5(c) are the quadrature waveforms, exhibiting a frequency of 9.8 GHz. From Figure 5(d), we measure a PN of -67 dBc/Hz at a 1-MHz offset, 5 dB lower than that of our reference oscillator. With $P = 850\ \mu A \times 0.95\ V \approx 810\ \mu W$, we recognize that, ideally, the PN should be lower than that of the reference oscillator by $10\log(810\ \mu W/340\ \mu W) + 10\log(15.7\ GHz/9.8\ GHz)^2 = 7.7$ dB. That is, quadrature operation costs about 2.7 dB in performance.

## The Current-Controlled Oscillator

The high supply sensitivity of the ring oscillators studied in the previous sections demands on-chip voltage regulators with low output noise. An alternative that alleviates this issue feeds the ring from a *current source* [Figure 6(a)], assuming that $I_1$ has little dependence on $V_{DD}$ [5]. Capacitor $C_1$ further suppresses the supply noise. Since the oscillation frequency depends on $V_X$ and since $V_X$ can be adjusted by $I_1$, the circuit is called a "current-controlled oscillator" (CCO).

FIGURE 4: (a) A differential ring, (b) the X and Y waveforms, and (c) the PN profile.

FIGURE 5: (a) A four-stage ring, (b) the addition of cross-coupled inverters, (c) the ring's waveforms, and (d) the PN profile.

It is helpful to estimate the small-signal resistance seen at node *X* [Figure 6(b)]. Each inverter can be modeled as depicted in Figure 6(c), which represents a capacitance, $C_L$, switching between two nodes periodically. Such a structure displays an average resistance equal to $1/(f_0 C_L)$; with three inverters in parallel, $R_X \approx 1/(3f_0 C_L)$. Thus, the pole at *X* in Figure 6(a) is located at $\omega_X \approx 3f_0 C_L/C_1$, demanding a large $C_1$ if low-frequency supply noise must be suppressed.
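To get a feel for the numbers, the switched-capacitor estimate can be evaluated directly. The sketch below computes $R_X$, converts $\omega_X$ from rad/s to Hz, and solves for the $C_1$ that places the pole at a chosen frequency. Only $f_0 = 11.1$ GHz, the CCO's simulated frequency quoted below, comes from the text; $C_L = 5$ fF, $C_1 = 2$ pF, the 1-MHz target, and the function names are hypothetical, chosen purely for illustration.

```python
import math


def ring_resistance(f0_hz: float, c_load_f: float, stages: int = 3) -> float:
    """R_X: `stages` switched capacitors, each ~1/(f0*C_L), in parallel."""
    return 1.0 / (stages * f0_hz * c_load_f)


def supply_pole_hz(f0_hz: float, c_load_f: float, c1_f: float, stages: int = 3) -> float:
    """Pole formed by R_X and C1, converted from rad/s to Hz."""
    omega_x = 1.0 / (ring_resistance(f0_hz, c_load_f, stages) * c1_f)
    return omega_x / (2 * math.pi)


def c1_for_pole(f0_hz: float, c_load_f: float, pole_hz: float, stages: int = 3) -> float:
    """C1 that places the pole at pole_hz."""
    return stages * f0_hz * c_load_f / (2 * math.pi * pole_hz)


# f0 from the CCO simulation; C_L = 5 fF and C1 = 2 pF are hypothetical values.
f0, c_l, c1 = 11.1e9, 5e-15, 2e-12
print(f"R_X ~ {ring_resistance(f0, c_l):.0f} ohm")
print(f"pole with C1 = 2 pF ~ {supply_pole_hz(f0, c_l, c1) / 1e6:.1f} MHz")
print(f"C1 for a 1-MHz pole ~ {c1_for_pole(f0, c_l, 1e6) * 1e12:.1f} pF")
```

With these assumed values, $R_X \approx 6$ k$\Omega$; $C_1 = 2$ pF puts the pole near 13 MHz, and moving it down to 1 MHz would take roughly 26.5 pF.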

One advantage of the CCO is its very wide frequency tuning range; even if $I_1$ is very small, $V_X$ settles at a level just sufficient for the inverters to operate, and the ring still oscillates. The principal drawback of the CCO is its high sensitivity to $I_1$, which converts the noise of this current source to PN. This effect is particularly acute if $I_1$ carries a great deal of flicker noise. In the arrangement shown in Figure 7(a), for example, both $M_a$ and $M_b$ introduce flicker noise in the "control" path, thereby modulating the oscillation frequency and phase. To avoid amplifying the noise of $M_a$, we select $(W/L)_a = (W/L)_b = 10\ \mu m/100\ nm$, burning as much power in the reference current path as in the oscillator. We obtain $f_0 = 11.1$ GHz. From Figure 7(b), we observe a PN of -64 dBc/Hz at a 1-MHz offset. Given $P = 400\ \mu A \times 0.95\ V = 380\ \mu W$, we expect a PN drop of $10\log(380\ \mu W/340\ \mu W) + 10\log(15.7\ GHz/11.1\ GHz)^2 = 3.4$ dB. But the actual PN is only 2 dB lower than that of our reference oscillator.

FIGURE 6: (a) A CCO, (b) the circuit's core for resistance calculation, and (c) the equivalent circuit of one inverter.

## The Digitally Controlled Oscillator

Digital phase-locked loops incorporate digitally controlled oscillators (DCOs) rather than VCOs. For this purpose, we can simply rely on the discrete tuning scheme depicted in Figure 2. Alternatively, we can turn to the topology shown in Figure 8(a) [6], where the main ring consisting of  $Inv_1$ - $Inv_3$  is accompanied by additional inverters that can be enabled or disabled by a thermometer code. With only the main ring enabled, the circuit oscillates at its *lowest* frequency because of the parasitic capacitances introduced by the disabled inverters. If we now activate, e.g.,  $Inv_4$ , it provides a greater drive strength at node *Y* while negligibly raising the total capacitance at *X*. The oscillation frequency thus increases.

Employing our reference oscillator of Figure 3(a) in this environment with a total of nine inverters, we arrive at the tuning characteristic presented in Figure 8(b). The frequency rises from 4.2 to 11.6 GHz as the number of active inverters varies from three to nine. Of course, smaller frequency steps require a larger number of inverters.

Recall from our analysis of the basic ring oscillator that simply adding capacitances to the internal nodes does not reduce the PN as much as increasing the transistor dimensions does. We therefore predict that the DCO of Figure 8(a) exhibits a less favorable PN-power tradeoff at its lowest oscillation frequency. On the other hand, the circuit reduces to $m$ rings in parallel at its maximum frequency ($m = 3$ in the nine-inverter example), reducing the PN by a factor of $m$ with respect to that of the reference oscillator while consuming $m$ times the power.

## The Crystal Oscillator

A single inverter along with two capacitors can provide a negative resistance and hence the possibility of oscillation if it is connected to a resonator. Illustrated in Figure 9(a), such a structure displays the following impedance:

$$Z_X = \frac{1}{C_1 s} + \frac{1}{C_2 s} + \frac{g_{mN} + g_{mP}}{C_1 C_2 s^2}, \quad (2)$$

where channel-length modulation is neglected and $g_{mN}$ and $g_{mP}$ denote the transconductances of the inverter's NMOS and PMOS transistors, respectively. For $s = j\omega$, the third term emerges as $-(g_{mN} + g_{mP})/(C_1 C_2 \omega^2)$, signifying a frequency-dependent negative resistance. We now construct the oscillator shown in Figure 9(b), where $R_F$ ensures that $Inv_1$ begins in its high-gain region. $Inv_2$ sharpens the waveform's edges.

FIGURE 7: (a) A CCO using a current mirror and (b) its PN.

The crystal model of Figure 9(c) reveals both series and parallel resonances occurring at $\omega_s = 1/\sqrt{L_1 C_s}$ and $\omega_p = 1/\sqrt{L_1 C_s C_p/(C_s + C_p)}$, respectively. The topology of Figure 9(b) operates at the latter.

We select a 25-MHz crystal model from [7] with the following parameter values: $L_1 = 12.6$ mH, $C_s = 3.4$ fF, $C_p = 1.2$ pF, and $R_s = 20\ \Omega$. Note that $C_p \gg C_s$, yielding $\omega_s \approx \omega_p$. To minimize the flicker noise of the inverters, we resort to large transistors: $(W/L)_N = 20\ \mu m/100$ nm and $(W/L)_P = 40\ \mu m/100$ nm. We also select $R_F = 1\ M\Omega$ to minimize its noise contribution. With $C_1 = C_2 = 10$ pF and $g_{mN} + g_{mP} = 1/(100\ \Omega)$, the negative resistance at 25 MHz amounts to about $-4$ k$\Omega$, more than adequate to cancel the parallel equivalent resistance of the crystal ($= L_1^2 \omega^2 / R_s \approx 2 \times 10^{11}\ \Omega$).
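The figures in this paragraph can be reproduced by evaluating (2) at $s = j\omega$ and the crystal-model expressions with the quoted values. The sketch below does so in Python; it inherits the assumptions of (2) (no channel-length modulation, no loading by $R_F$ or $Inv_2$), and it computes the ratio $\omega_p/\omega_s$ to show how close the two resonances are. The variable and function names are illustrative.

```python
import math

# Crystal parameters quoted in the text [7] and the chosen loading/transconductance.
L1, CS, CP, RS = 12.6e-3, 3.4e-15, 1.2e-12, 20.0   # H, F, F, ohm
C1 = C2 = 10e-12                                   # F
GM = 1 / 100.0                                     # g_mN + g_mP, in S
OMEGA = 2 * math.pi * 25e6                         # rad/s at 25 MHz


def z_x(omega: float) -> complex:
    """Impedance of (2) evaluated at s = j*omega."""
    s = 1j * omega
    return 1 / (C1 * s) + 1 / (C2 * s) + GM / (C1 * C2 * s ** 2)


omega_s = 1 / math.sqrt(L1 * CS)                       # series resonance
omega_p = 1 / math.sqrt(L1 * CS * CP / (CS + CP))      # parallel resonance
r_parallel = (L1 * OMEGA) ** 2 / RS                    # parallel equivalent of R_s

print(f"Re(Z_X) at 25 MHz: {z_x(OMEGA).real / 1e3:.2f} kohm")
print(f"R_parallel: {r_parallel:.2e} ohm")
print(f"omega_p / omega_s: {omega_p / omega_s:.4f}")
```

It yields $\mathrm{Re}\{Z_X\} \approx -4.05$ k$\Omega$, a parallel resistance of about $1.96 \times 10^{11}\ \Omega$, and $\omega_p/\omega_s \approx 1.0014$.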

Figure 10(a) plots the voltages at nodes *A* and *B* in Figure 9(b), revealing an oscillation startup time of about 300  $\mu$ s. Plotted in Figure 10(b) are the PN profiles at node *B* and at the output of Inv<sub>2</sub>. We recognize that Inv<sub>2</sub> raises the PN at low offset frequencies due to the slow edges at *B*.

## The Burst-Mode Clock and Data Recovery Circuit

"Burst-mode" optical communication systems require that their clock and data recovery (CDR) circuits phase-lock to the incoming data, $D_{in}$, in a very short time. Let us consider a ring oscillator under injection by $D_{in}$ [Figure 11(a)]. When $D_{in}$ is low, the circuit oscillates at its natural frequency, and when $D_{in}$ is high, node X is pulled to near zero and $V_{out}$ remains high. This output is not a periodic waveform and cannot serve as a clock. But if two such rings are driven by $D_{in}$ and $\overline{D_{in}}$, and their outputs are applied to a NAND gate [Figure 11(b)] [8], we can produce a recovered clock. The beauty of this technique is that it relies on ring oscillators for their ability to recover almost instantaneously from a reset state.

FIGURE 9: (a) A circuit exhibiting a negative resistance, (b) the basic crystal oscillator, and (c) a crystal model.

One can argue that the rings in Figure 11(b) do not actually phase-lock to $D_{in}$ because, when enabled, they operate at their natural frequency, $f_0$. This perspective reveals that any difference between $f_0$ and the input data rate translates to phase error accumulation between the output clock and the input data. This effect proves serious if $D_{in}$ contains long sequences of consecutive ones or zeros.
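The severity of this drift is easy to estimate: while a ring free-runs during a run of identical bits, its timing error grows by $|f_0 - R|/R$ unit intervals (UI) per bit, where $R$ is the data rate. The sketch below assumes a full-rate clock (one clock cycle per UI); the 1% frequency error, the 0.1-UI budget, and the function names are hypothetical, not simulation results.

```python
import math


def drift_ui(run_length_bits: int, f0_hz: float, bit_rate_bps: float) -> float:
    """Timing error (UI) accumulated by a free-running ring over a run of identical bits.

    Assumes a full-rate clock, so one clock cycle corresponds to one UI.
    """
    duration_s = run_length_bits / bit_rate_bps
    return duration_s * abs(f0_hz - bit_rate_bps)   # surplus or missing cycles = UI


def max_run_length(budget_ui: float, f0_hz: float, bit_rate_bps: float) -> float:
    """Longest run of identical bits that keeps the drift within budget_ui."""
    mismatch_hz = abs(f0_hz - bit_rate_bps)
    if mismatch_hz == 0:
        return math.inf
    return math.floor(budget_ui * bit_rate_bps / mismatch_hz)


# Hypothetical example: 15-Gb/s data, ring running 1% fast, 0.1-UI drift budget.
print(f"{drift_ui(20, 15.15e9, 15e9):.2f} UI after 20 identical bits")
print(f"max run length: {max_run_length(0.1, 15.15e9, 15e9)} bits")
```

In this example, a 1% frequency error consumes a 0.1-UI budget after only ten identical bits, which is why long runs are the critical case.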

The burst-mode CDR entails an issue that determines how the strength of $M_a$ and $M_b$ in Figure 11(b) must be chosen. Suppose $M_a$ is strong enough to cause $V_X \approx 0$. When $M_a$ turns off, E resides at a low level, causing X to be charged through a PMOS device whose gate-source voltage is equal to $V_{DD}$ [Figure 11(c)]. This yields a certain rise time, $t_{r1}$. In the oscillation mode, however, nodes E and X bear a phase difference of 120°, producing a different rise time at X, $t_{r2}$. This difference translates to output jitter when the input makes transitions. For this reason, we make $M_a$ and $M_b$ just strong enough to bring $V_X$ or $V_Y$ down to a few hundred millivolts.

FIGURE 10: The crystal oscillator's (a) waveforms and (b) PN profile.

Another jitter mechanism arises if the two inputs of the NAND gate in Figure 11(b) experience different delays. To resolve this issue, we employ two such gates with their inputs swapped, as shown in Figure 11(d).

Beginning with our reference oscillator of Figure 3(a), we implement the CDR for operation at 15 GHz. We have  $(W/L)_{a,b} = 1.5 \,\mu\text{m}/30$  nm. Figure 12(a) presents the waveforms at *X* and *Y*, and Figure 12(b) shows the recovered clock. The peak-to-peak jitter is about 200 fs.

## The Feedforward Frequency Divider

Recall from the first part of this article series [1] that a dynamic latch can be realized as a transmission gate followed by an inverter. We have seen in [1] that a $\div 2$ circuit using this latch operates up to an input frequency of 45 GHz. This bound can be raised by means of "feedforward." Illustrated in Figure 13(a) [9], the idea is to add $Inv_4$ so that the waveform at *X* finds a faster path to *Y*. The divider can thus run at higher frequencies. However, it now faces a lower bound because $Inv_4$ overwhelms the main path from *X* to *Y* at low clock rates. These bounds are studied in [9].

FIGURE 11: (a) A ring enabled and disabled by random data, (b) a burst-mode CDR circuit, (c) an illustration of oscillator activation, and (d) two crisscrossed NAND gates for improving propagation symmetry.

Figure 13(b) plots the waveform at *Y* for an input frequency of 52 GHz, indicating correct operation. As depicted in Figure 13(c), the circuit operates properly from 23 to 54 GHz.

## The Frequency Divider With Quadrature Outputs

Wireless and wireline applications often require in-phase (*I*) and quadrature (*Q*) components of periodic waveforms. To generate these phases by a frequency divider, we return to the quadrature oscillator of Figure 5(b) and surmise that it can act as a $\div 2$ circuit if $Inv_1$–$Inv_4$ are controlled by the input clock. Shown in Figure 14(a), the result can be viewed as an injection-locked oscillator or as a loop employing "clocked CMOS" (C<sup>2</sup>MOS) latches.

The device dimensions in this divider must be chosen carefully. The clocked transistors are twice as wide as the devices that they enable so as to maintain reasonable pull-down and pull-up strengths for the latches. Moreover, the cross-coupled inverters are half as strong as the main inverters to avoid latch-up. With these dimensions, the circuit generates the *I* and *Q* outputs plotted in Figure 14(b) for an input frequency of 60 GHz. The characteristic of Figure 14(c) suggests correct operation up to 62 GHz.

FIGURE 12: The CDR circuit's (a) internal waveforms and (b) output clock.

FIGURE 13: (a) A divide-by-two circuit with feedforward, (b) the waveform at Y, and (c) its frequency characteristic.

FIGURE 14: (a) A divide-by-two circuit providing quadrature outputs, (b) its output waveforms, and (c) its frequency characteristic.

## The Phase Interpolator

In some applications, we generate the I and Q phases of a clock and then interpolate between them so as to create finer phase spacings. Inverters can serve this purpose at fairly high speeds.

Consider the arrangement shown in Figure 15(a), where quadrature inputs, $V_I$ and $V_Q$, ideally yield $\overline{V_{out}} = (V_I + V_Q)/2$. At $t = t_{12}$, the NMOS device in $Inv_1$ and the PMOS device in $Inv_2$ are heavily on, fighting each other. With equal strengths, therefore, these two transistors produce $\overline{V_{out}} = (V_I + V_Q)/2$. Of course, the NMOS/PMOS strength ratio varies with process corners, making the interpolation accuracy corner-dependent.

Phase interpolation requires sufficiently *slow* transitions at $V_I$ and $V_Q$ so that they avoid any "nonoverlap" time. As illustrated in Figure 15(b), if $V_I$ and $V_Q$ reside at high and low levels, respectively, at the same time, a "kink" appears in the output, causing considerable jitter.

To lessen the effect of process corners, we can add resistors to the interpolation network. Shown in Figure 16(a) is an example for interpolation by a factor of 16 at 28 GHz [10]. We use a virtual ground, node *X*, for the summation of the currents produced by the inverters and the resistors. The multiplexers receive a thermometer code that determines how many inverters sense $V_I$ and how many sense $V_Q$. For example, a code with 15 ones and one zero translates to $V_{out} \propto 15V_I + V_Q$ and hence a rotation of $\tan^{-1}(1/15) = 3.8^{\circ}$ [Figure 16(b)]. Plotted in Figure 16(c) are the interpolated waveforms, displaying some nonlinearity; the phase spacing varies from 385 to 690 fs. This issue can be alleviated through the use of predistortion [10].

FIGURE 15: (a) Two inverters interpolating between quadrature inputs and (b) the kink problem.

FIGURE 16: (a) A 16× phase interpolator, (b) its phasor diagram, and (c) its interpolated waveforms.
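Much of this nonlinearity can be anticipated from a simple phasor model. Assuming sinusoidal $V_I$ and $V_Q$ and ideal linear weighting by the thermometer code, the output phase with $k$ of the 16 slices sensing $V_Q$ is $\tan^{-1}[k/(16-k)]$, so equal code steps do not produce equal phase steps. The sketch below converts these steps to femtoseconds at 28 GHz; the function names are hypothetical, and real inverter waveforms are not pure sinusoids.

```python
import math

F_CLK_HZ = 28e9   # clock frequency of the 16x interpolator
N_SLICES = 16     # interpolation factor


def ideal_phase_deg(k: int, n: int = N_SLICES) -> float:
    """Phase of (n - k)*V_I + k*V_Q for sinusoidal quadrature inputs."""
    return math.degrees(math.atan2(k, n - k))


def phase_steps_fs(n: int = N_SLICES, f_clk_hz: float = F_CLK_HZ) -> list:
    """Time spacing between adjacent codes across one quadrant, in fs."""
    period_fs = 1e15 / f_clk_hz
    phases = [ideal_phase_deg(k, n) for k in range(n + 1)]
    return [(b - a) / 360.0 * period_fs for a, b in zip(phases, phases[1:])]


steps = phase_steps_fs()
print(f"first rotation: {ideal_phase_deg(1):.1f} deg")
print(f"steps: {min(steps):.0f} to {max(steps):.0f} fs "
      f"(uniform spacing would be {1e15 / F_CLK_HZ / 4 / N_SLICES:.0f} fs)")
```

The model gives steps from about 378 fs at the ends of the quadrant to about 707 fs in the middle, versus a uniform 558 fs, a spread comparable to the simulated 385 to 690 fs.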

## References

- [1] B. Razavi, "Fifty applications of the CMOS inverter—Part 1," *IEEE Solid-State Circuits Mag.*, vol. 16, no. 3, pp. 7–14, Summer 2024, doi: 10.1109/MSSC.2024.3419528.
- [2] B. Casper et al., "A 20Gb/s forwarded clock transceiver in 90nm CMOS," in *Proc. IEEE Int. Solid State Circuits Conf. – Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2006, pp. 87–88, doi: 10.1109/ISSCC.2006.1696056.
- [3] A. Homayoun and B. Razavi, "Relation between delay line phase noise and ring oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 384–391, Feb. 2014, doi: 10.1109/JSSC.2013.2289893.
- [4] L. Sun and A. Kwasniewski, "A 1.25-GHz 0.35-μm monolithic CMOS PLL based on a multiphase ring oscillator," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 910–916, June 2001, doi: 10.1109/4.924853.
- [5] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW@1.35 V CMOS PLL for microprocessor clock generation," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1715–1722, Nov. 1996, doi: 10.1109/JSSC.1996.542316.
- [6] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 42–51, Jan. 2008, doi: 10.1109/JSSC.2007.910966.
- [7] "Crystal Oscillator and Crystal Selection for the CC13xx Family of Wireless MCUs," Application Note, pp. 1–18, Apr. 2024 [Online]. Available: https://www.ti.com/lit/pdf/swra945
- [8] M. Banu and A. Dunlop, "A 660 Mb/s CMOS clock recovery circuit with instantaneous locking for NRZ data and burst-mode transmission," in *Proc. IEEE Int. Solid State Circuits Conf. – Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 1993, pp. 102–103, doi: 10.1109/ISSCC.1993.280066.
- [9] O. Memioglu, Y. Zhao, and B. Razavi, "A 300-GHz 52-mW CMOS receiver with on-chip LO generation," *IEEE J. Solid-State Circuits*, vol. 58, no. 8, pp. 2141–2156, Aug. 2023, doi: 10.1109/JSSC.2023.3257820.
- [10] B. Razavi, "The design of a phase interpolator," *IEEE Solid-State Circuits Mag.*, vol. 15, no. 4, pp. 6–10, Fall 2023, doi: 10.1109/MSSC.2023.3315653.