# A Digital RF Transmitter With Background Nonlinearity Correction

Seyed-Mehrdad Babamir, Student Member, IEEE, and Behzad Razavi, Fellow, IEEE

Abstract—This article describes a new digital transmitter architecture that automatically corrects static and dynamic nonlinearities with no need for digital predistortion or adaptation. We draw upon the Newton–Raphson method of solving equations and show that it leads to  $\Delta\Sigma$  modulation as a special case and to a compact, efficient transmitter in the general case. A complete transmitter realized in 28-nm CMOS technology achieves an overall efficiency of 50% while delivering +24 dBm with an adjacent channel power ratio of -35.4 dB and a receive-band noise of -150 dBc/Hz.

*Index Terms*—Digital power amplifier, digital predistortion, digital transmitter, Newton–Raphson algorithm, wideband code-division multiple access (WCDMA).

## I. INTRODUCTION

Digital radio-frequency (RF) transmitters (TXs) have gained popularity in recent years. The realization of transmit functions in the digital domain offers many advantages; for example, analog blocks such as variable-gain amplifiers, offset-cancellation digital-to-analog converters (DACs), and predrivers can be omitted.

The greatest challenge facing RF transmitters, analog or digital, is the tradeoff between linearity and efficiency, which in turn has led to many linearization techniques. Since the die temperature varies considerably with the TX output power, the linearization must continue in real time; i.e., foreground calibration techniques prove inadequate if they attempt to correct a highly nonlinear output stage.

This article introduces a new approach to TX linearization that corrects for both static and dynamic nonlinearity in the background. The correction's efficacy allows designing the DAC for maximum efficiency with almost arbitrary integral nonlinearity (INL). Targeting the wideband code-division multiple access (WCDMA) standard as an example, the simple, compact architecture affords the highest efficiency reported to date. Realized in 28-nm standard CMOS technology, the transmitter delivers +24.1 dBm with an adjacent channel power ratio (ACPR) of -35.4 dB and an overall efficiency of 50%.

Manuscript received June 7, 2019; revised October 18, 2019 and December 13, 2019; accepted January 13, 2020. Date of current version May 27, 2020. This article was approved by Associate Editor Waleed Khalil. This research was supported by Realtek Semiconductor. (*Corresponding author: Seyed Mehrdad Babamir.*)

The authors are with the Department of Electrical and Computer Engineering, University of California at Los Angeles, Los Angeles, CA 90095-1594 USA (e-mail: smbabamir@ucla.edu; razavi@ee.ucla.edu).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2020.2968796


Fig. 1. (a) Simplified TX architecture in [11]. (b) DAC unit cell.

Section II provides a brief background on nonlinearity calibration, and Section III deals with the performance requirements. Section IV describes the basic idea, and Section V presents the evolution of the TX architecture. Sections VI and VII describe the design of the building blocks, and Section VIII summarizes the experimental results.

## II. BACKGROUND

Extensive work has been dedicated to transmitters and their building blocks [1]–[10]. As expected, in digital architectures, the RF DAC has been the focus of these developments since it limits the overall TX performance. Fig. 1(a) shows an architecture example [11], where the baseband quadrature signals, $I_{in}$ and $Q_{in}$, are applied to a digital predistortion (DPD) block before reaching the RF DACs. We assume a current-switching DAC architecture. The DPD can be viewed as the inverse of the DAC's characteristic. The DACs are clocked by the local oscillator (LO) phases, $LO_I$ and $LO_Q$, so as to upconvert the baseband signals. Fig. 1(b) depicts a unit cell of the DACs, where $D_i$ denotes the cell's digital input [11]. In order to maximize the power efficiency, the transistors in the cell act as switches with a low ON-resistance. The two DACs thus present a low output resistance and load each other, so that each corrupts the other's signal. This interaction between the I and Q DACs requires a 2-D polynomial correction [11]–[14], hence the need for the 2-D lookup table (LUT) in Fig. 1(a). The situation becomes even more challenging in the presence of dynamic nonlinearities, calling for "complex predistortion" (delayed polynomials) [15].

0018-9200 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

The work in [16] uses a voltage-mode switched-capacitor (SC) PA whose output impedance does not change with the input code. The design reported in [17] applies load modulation to the DAC, improving the efficiency at power backoff; the modulation is implemented using an on-chip transformer. Jin et al. [18], [19] and Yoo [20] reported voltage-mode transmitters: [18] proposes I/Q sharing with 25% duty-cycle LO phases, [20] uses I/Q sharing in a class-G SC topology, and [21] employs a Doherty architecture in a class-G SC PA to improve the efficiency, with the memoryless linearization performed in MATLAB. The work reported in [22] uses a multiphase SC PA to improve the efficiency and does not require DPD for its operation. The work in [23] employs mixed-domain filtering techniques to notch the noise floor at a certain frequency offset, with 2-D DPD performed in MATLAB. We should remark that the work in [24] employs an on-chip matching network and achieves an average PAE of 12.6% at an average power of 7.8 dBm.

In addition to the complexity of the lookup tables, conventional digital transmitters also suffer from the drift of the DAC nonlinearity with temperature and antenna impedance. The antenna impedance can change significantly depending on the proximity to the user's hand, for example, from 50 $\Omega$ to 10 + j70 $\Omega$ [25]. To quantify the impact of such a change, suppose that our DAC (Section VII) is preceded by a polynomial predistorter designed to meet the WCDMA ACPR<sub>1</sub> of 33 dB with a 50-$\Omega$ impedance. If the impedance changes to 35 + j0 $\Omega$, simulations show that ACPR<sub>1</sub> degrades by 5 dB.

Prior work on background calibration of analog transmitters relies on FPGA-based or MATLAB-based correction with off-the-shelf components [26], [27]. Moreover, the system in [27] takes half a second to adapt to new conditions and corrects only static nonlinearity. To our knowledge, no background nonlinearity correction has been reported for digital transmitters.

## III. PERFORMANCE REQUIREMENTS

Before presenting our work, we summarize the performance requirements that a generic digital transmitter would need to satisfy. As an example, we consider the WCDMA specifications [28]: chip rate: 3.84 Mcps; channel bandwidth: 5 MHz; carrier frequency: 1920–1980 MHz (band I); maximum average output power: +24 dBm (for "class C" as defined by the standard); ACPR at this power level: 33 dB; ACPR for the alternate adjacent channel (ACPR<sub>2</sub>): 43 dB; output noise in the receive band: -125 dBm/Hz or, equivalently, -149 dBc/Hz. While the emphasis of this work is the linearization technique rather than design for a particular standard, we have chosen WCDMA as an example for its extremely stringent ACPR and receive-band noise requirements. This standard poses other spurious emission constraints that are not considered in this work.

Fig. 2. (a) Nonlinear DAC. (b) Linearization by predistortion. (c) Linearization by $\Delta\Sigma$ loop.

The RF DAC output quality is dictated by the ACPR [or the error vector magnitude (EVM)] and the receive-band noise (RXBN): the former imposes a maximum INL of 5% [29], and the latter requires a resolution of 13.5 bits and an output thermal noise floor below -125 dBm/Hz. (The necessary DAC resolution is estimated by noting that an average power of +24 dBm across a bandwidth of 3.84 MHz, along with a noise floor of -125 dBm/Hz, translates to a signal-to-noise ratio of 83 dB, i.e., 13.5 bits. This value is relaxed by the oversampling ratio of the DAC.) These bounds must be met with acceptably small differential nonlinearity (DNL) and output glitches, both of which tend to raise the noise floor.
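The resolution estimate in the parenthetical above can be reproduced in a few lines. This is a sketch under the usual ideal-quantizer rule of thumb (SNR = 6.02N + 1.76 dB); the helper names are hypothetical, and `oversampled_bits` assumes plain, unshaped oversampling, which is only one way the oversampling ratio could relax the requirement.

```python
import math


def required_snr_db(p_avg_dbm: float, noise_dbm_per_hz: float, bw_hz: float) -> float:
    """SNR implied by an average power and a flat noise floor integrated over bw_hz."""
    noise_dbm = noise_dbm_per_hz + 10 * math.log10(bw_hz)  # total noise in the band
    return p_avg_dbm - noise_dbm


def snr_to_bits(snr_db: float) -> float:
    """Ideal-quantizer rule of thumb: SNR = 6.02 N + 1.76 dB for a full-scale sine."""
    return (snr_db - 1.76) / 6.02


def oversampled_bits(snr_db: float, osr: float) -> float:
    """Bits needed when unshaped quantization noise is spread over osr times the band.

    Assumption: plain oversampling without noise shaping, which gains
    10*log10(osr) dB, i.e., about half a bit per doubling of the sampling rate.
    """
    return snr_to_bits(snr_db - 10 * math.log10(osr))


snr = required_snr_db(p_avg_dbm=24.0, noise_dbm_per_hz=-125.0, bw_hz=3.84e6)
print(f"SNR = {snr:.1f} dB, i.e., {snr_to_bits(snr):.1f} bits")
print(f"noise floor relative to carrier = {-125.0 - 24.0:.0f} dBc/Hz")
print(f"with OSR = 4: {oversampled_bits(snr, 4):.1f} bits")
```

The first two lines reproduce the 83-dB / 13.5-bit figure and the -149 dBc/Hz equivalence quoted in Section III.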

A digital TX employing background calibration can downconvert the RF DAC output to baseband signals, digitize these signals, and provide real-time feedback. The analog-to-digital converters (ADCs) necessary in the loop must also satisfy certain conditions so as to negligibly corrupt the downconverted signals. We expect that the necessary INL and resolution of the ADCs are similar to those of the RF DAC.

## IV. BASIC IDEA

Fig. 2(a) shows a DAC that, for now, is assumed to have only static nonlinearity, exhibiting an input–output characteristic f(x). To linearize the DAC, we wish to precede it with a block whose behavior is to be determined [Fig. 2(b)]. Here, w is the main input and eventually represents the TX baseband data. In conventional predistortion, the first block approximates $f^{-1}(\cdot)$ by a polynomial of the form $a_1w + a_2w^2 + \cdots + a_nw^n$, with the coefficients $a_j$ selected so as to make y = f(x) a faithful replica of w. In other words, $f^{-1}(\cdot)$ maps each value of w uniquely and statically to a value of x once the coefficients are frozen.

Let us approach the problem from a different perspective by focusing on x. For y = f(x) in Fig. 2(b) to become equal to w, we must force f(x) - w to zero, where both $f(\cdot)$ and w are known. We denote this difference (the "error function") by g(x). For every value of the input, w, we wish to choose x so that $g(x) \rightarrow 0$, i.e., so that x is a root of g(x). To ensure that x satisfies g(x) = 0, we can utilize any equation solver, e.g., the Newton–Raphson technique. That is, to solve g(x) = 0, we iteratively select

$$x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}. \tag{1}$$
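To make (1) concrete, the sketch below solves f(x) = w for a static nonlinearity. The cubic characteristic is a hypothetical stand-in, not the measured DAC, and the function names are illustrative. For comparison, the same loop is also run with the derivative replaced by a fixed step of 1 (g' ≈ 1), the coarse approximation the text adopts next; both reach the same root, but Newton–Raphson needs fewer updates.

```python
from typing import Callable


def solve_for_code(f: Callable[[float], float], w: float,
                   step: Callable[[float, float], float],
                   x0: float, tol: float = 1e-12, max_iter: int = 100) -> tuple[float, int]:
    """Iterate x <- x - step(x, g(x)) with g(x) = f(x) - w until |g| < tol.

    Returns the root and the number of updates that were applied.
    """
    x = x0
    for n in range(max_iter):
        g = f(x) - w                  # error function g(x_n)
        if abs(g) < tol:
            return x, n
        x -= step(x, g)
    raise RuntimeError("iteration did not converge")


# Hypothetical compressive DAC characteristic (normalized); not the measured one.
def f(x: float) -> float:
    return x - 0.1 * x**3


def df(x: float) -> float:
    return 1 - 0.3 * x**2


w = 0.5
x_nr, n_nr = solve_for_code(f, w, lambda x, g: g / df(x), x0=w)   # Newton-Raphson, (1)
x_unit, n_unit = solve_for_code(f, w, lambda x, g: g, x0=w)        # g' approximated by 1
print(f"Newton-Raphson: x = {x_nr:.4f} after {n_nr} updates")
print(f"g' = 1:         x = {x_unit:.4f} after {n_unit} updates")
```

Because f is compressive, the solved code x lies slightly above w, which is exactly the "pre-expansion" a predistorter would otherwise have to store.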



Fig. 3. Conceptual TX architecture.

In contrast to static nonlinearity correction, the iteration relies on the past values of  $x_n$  and hence can correct for dynamic nonlinearity as well. In our case,  $g'(x_n) \approx f'(x_n)$  if w changes slowly with time.

Before reducing these seemingly abstract concepts to practice, we make two remarks. First, the Newton–Raphson iteration must occur fast enough to keep up with the dynamics of the baseband signal, w. Second, the computation of  $g(x_n)/g'(x_n)$  must be managed such that it can be realized efficiently. In this regard, let us, for now, make a rather coarse approximation and write  $g'(x_n) \approx 1$ . Substituting this value for  $g'(x_n)$  in (1) and replacing  $g(x_n)$  with  $f(x_n) - w_n$ , we have

$$x_{n+1} = x_n - [f(x_n) - w_n] = x_n - (y_n - w_n). \tag{2}$$

This result implies that the next value of x can be obtained by subtracting the present error,  $y_n - w_n$ , from the present value of x, leading to the implementation shown in Fig. 2(c). Here, the one-cycle delay,  $z^{-1}$ , senses  $x_n + (w_n - y_n)$  and generates  $x_{n+1}$ . Interestingly, the function within the dashed box is simply an integrator, thereby revealing that the overall system acts as a first-order  $\Delta \Sigma$  modulator (DSM). That is, the DSM is a "poor man's" realization of the Newton–Raphson technique.
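The loop of (2) can be simulated in a few lines to see how it cancels a static DAC nonlinearity. This is a behavioral sketch under stated assumptions: a constant input w, a hypothetical compressive 8-bit DAC with no dynamics, a real-valued integrator state rounded to the DAC code, and a plain time average standing in for the band-limiting that selects the signal of interest.

```python
def first_order_loop(dac, w: float, n_steps: int, n_codes: int = 256) -> list[float]:
    """Simulate (2): the integrator state v accumulates w - y; the DAC sees round(v)."""
    v = 0.0
    outputs = []
    for _ in range(n_steps):
        code = min(max(round(v), 0), n_codes - 1)   # integer DAC code, clipped to full scale
        y = dac(code)                               # nonlinear DAC output
        outputs.append(y)
        v += w - y                                  # x_{n+1} = x_n - (y_n - w_n)
    return outputs


def hypothetical_dac(code: int) -> float:
    """Illustrative compressive 8-bit DAC in LSB units; output is 70% of ideal at full scale."""
    return code * (1 - 0.3 * code / 255)


w = 100.0
ys = first_order_loop(hypothetical_dac, w, n_steps=10_000)
print(f"open loop (code = w): {hypothetical_dac(round(w)):.1f}")
print(f"closed-loop average:  {sum(ys) / len(ys):.2f}")
```

Since the integrator accumulates w - y, the average error over N cycles equals the change in the bounded state v divided by N, so the average of y converges to w even though no inverse of the DAC characteristic is ever computed.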

In retrospect, we could have directly conceived this idea: by placing the transmitter in a  $\Delta \Sigma$  loop, we can suppress the static and dynamic imperfections of the RF DAC. If the DAC errors are viewed as components appearing at its output, the high loop gain provided by the integrator substantially reduces them within some bandwidth. Nevertheless, our original idea, (1), in fact proves more powerful and allows us to further refine the transmitter architecture. Specifically, we will approximate  $1/g'(x_n)$  by a function that readily lends itself to hardware implementation, thus reducing the receive-band noise (Section V).

## V. PROPOSED TRANSMITTER ARCHITECTURE

The developments in the previous section lead to the conceptual TX architecture depicted in Fig. 3. Since a $\Delta\Sigma$ loop is realized as a negative-feedback system including at least one integrator, this architecture can also be viewed as a Cartesian feedback loop but with an integrator preceding the RF DAC. Here, the *I* and *Q* paths and their integrators form two $\Delta \Sigma$ modulator loops. The TX output is downconverted, digitized, and subtracted from the input baseband signals, $I_{in}$ and $Q_{in}$. Unlike digital predistortion techniques, this architecture requires no lookup tables, digital multipliers, or finite impulse response (FIR) filters.

Fig. 4. (a) Uncorrected uniformly sized DAC constellation. (b) Nonuniformly sized DAC constellation. (c) Overall TX constellation.

It can be seen from Fig. 3 that the nonlinearity due to the interaction between the *I* and *Q* DACs is also suppressed by the DSM action: since the high loop gain ensures that $I_F \approx I_{\rm in}$ and $Q_F \approx Q_{\rm in}$, the RF signal contains only the $I_{\rm in}$ and $Q_{\rm in}$ information. To assess the efficacy of the DSM loops and the 1/g' correction (Section V-C), we plot in Fig. 4(a)–(c) the simulated output signal constellations for the uncorrected uniformly sized DACs, the nonuniformly sized DACs, and the overall TX, respectively.

Fig. 5. (a) *I* branch showing the thermometer-code register. (b) Use of register as integrator. (c) Simplified architecture [$I_{in}$ is a digital quantity and must be converted to analog form (Section VI)].

#### A. Architecture Evolution

In this section, we describe a series of techniques that, step by step, simplify this transmitter, eventually leading to a compact, efficient architecture. The final signal processing machine preceding the RF DAC contains only 512 flip-flops.

In the first step of architecture evolution, we note that the DACs in Fig. 3 must be configured as a segmented topology [30], [31] (using nominally equal units) and be driven by a thermometer code. Segmentation proves essential here as it minimizes both the DNL and the output glitch energy [32]. We must then decide whether, in Fig. 3, binary-to-thermometer decoders should be interposed between the integrators and the DACs or the integrators themselves should be realized so as to generate a thermometer-code input. Fig. 5(a) depicts the situation for the I path; the Q path is similar.

Fig. 6. 1-bit oversampled integration of the error signal.

An important observation comes to our aid here: a simple shift register holding a thermometer code can, in fact, act as an integrator if it receives a 1-bit input. When the input is +1, we shift the code up by one and insert another 1 at the bottom, and when the input is 0 (equivalently, -1), we shift the code down by one. Thus, the explicit integrator in Fig. 5(a) can be omitted if the data is available in 1-bit form, i.e., as an oversampled sequence.
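The shift-register integrator described above can be modeled behaviorally as follows. This is a sketch, not the circuit: the class name is hypothetical, bit 0 is taken as the bottom of the register, a logical 0 input is treated as -1, and the register simply saturates when it is full or empty.

```python
class ThermometerRegister:
    """Shift register holding a thermometer code; driven by 1-bit data it integrates.

    bits[0] is the bottom of the register. A 1 input is +1 and a 0 input is
    treated as -1 (an assumption for this sketch). The code saturates at full
    scale and at zero, like a DAC whose input range is exhausted.
    """

    def __init__(self, length: int, initial_ones: int = 0) -> None:
        self.bits = [1] * initial_ones + [0] * (length - initial_ones)

    def step(self, bit: int) -> None:
        if bit:
            self.bits = [1] + self.bits[:-1]    # shift up, insert a 1 at the bottom
        else:
            self.bits = self.bits[1:] + [0]     # shift down, a 0 enters at the top

    @property
    def value(self) -> int:
        return sum(self.bits)                   # number of DAC units turned on


reg = ThermometerRegister(length=8, initial_ones=4)
for b in [1, 1, 0, 1]:
    reg.step(b)
print(reg.value, reg.bits)  # 6 [1, 1, 1, 1, 1, 1, 0, 0]
```

Each clock moves the code by exactly one unit, so the register output is the running sum of the ±1 input and no binary adder or binary-to-thermometer decoder is needed.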

These thoughts lead to the conceptual arrangement shown in Fig. 5(b), where the baseband data is applied to a parallel-to-serial converter (e.g., a multiplexer) and the feedback signal is digitized by means of an oversampling ADC. We now have 1-bit representations in both paths. Note that the overall TX feedback loop still operates as a $\Delta \Sigma$ modulator.

Fig. 6 illustrates the ADC/integrator operation from a different perspective. Considering both blocks as oversampling functions, we suppose the ADC input is constant and equal to 10 mV, causing its output to be high for 1 clock cycle out of every 100. The integrator thus increments by 1 every 100 cycles. If the ADC input rises to 20 mV, its output is high for 2 clock cycles out of every 100, yielding an increment of 2 at the integrator output. Thus, the integrator increment step is proportional to the ADC input.

Further simplification is possible if we recognize that the parallel-to-serial converter need not be a memoryless multiplexer and can alternatively be realized as an oversampling converter. For example, a digital $\Delta \Sigma$ modulator can convert the multibit data at $I_{in}$ to a 1-bit sequence. We can therefore "factor out" the oversampling modulators from the input and feedback paths and employ only one after the subtractor [Fig. 5(c)].
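The pulse-density behavior in the Fig. 6 example can be mimicked with a first-order accumulate-and-overflow modulator, which is also the simplest form of the digital $\Delta\Sigma$ modulator mentioned for converting multibit data to a 1-bit sequence. This is a sketch that assumes a unipolar input and a hypothetical 1-V (1000-mV) full scale, chosen so that 10 mV produces one pulse per 100 cycles; integer arithmetic keeps the pulse counts exact.

```python
def one_bit_stream(level: int, full_scale: int, n_cycles: int) -> list[int]:
    """First-order 1-bit modulator for a unipolar input, in integer arithmetic.

    Each cycle the input level is added to an accumulator; when it reaches
    full scale, a 1 is emitted and full scale is subtracted.
    """
    acc = 0
    out = []
    for _ in range(n_cycles):
        acc += level
        if acc >= full_scale:
            acc -= full_scale
            out.append(1)
        else:
            out.append(0)
    return out


# Hypothetical 1-V (1000-mV) full scale, so 10 mV gives one pulse per 100 cycles.
for level_mv in (10, 20):
    pulses = one_bit_stream(level_mv, full_scale=1000, n_cycles=100)
    print(f"{level_mv} mV -> integrator increment of {sum(pulses)} per 100 cycles")
```

Counting the pulses plays the role of the thermometer-register integrator: the increment over a fixed window is proportional to the input level.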

It is important not to confuse the oversampling ADC in Fig. 5(c) with the digital  $\Delta \Sigma$  action operating on the overall transmitter loop. The former is a simple 8-GHz digitizer (Section VI) while the latter traverses the input subtractor, this ADC, the shift register, the output DAC, and the feedback path. With a clock frequency of 8 GHz, the  $\Delta \Sigma$  loop can process input signal bandwidths as high as 40 MHz.

The architecture illustrated in Fig. 5(c) merits five remarks. First, the system requires no adaptation or training. After an initial settling time (about 50 ns in our work), the negative-feedback loop automatically corrects the DAC imperfections. Second, the oversampling converter digitizes the *difference* between $I_{in}$ and $I_F$, and hence its dynamic range can be much narrower than that of the ADCs in Fig. 3. In our design, this transformation is equivalent to relaxing the ADC resolution by about 7 bits. Third, since $I_F$ is an analog signal, $I_{in}$ must also be analog, requiring proper implementation (Section VII). Fourth, the loop oversampling ratio must be high enough to suppress the DAC imperfections in the adjacent channels and in the RX band. Similarly, the ADC resolution and oversampling ratio must be chosen so as to ensure acceptable quantization noise. Fifth, in more demanding applications, the system in Fig. 5(c) can accommodate digital predistortion in the form of a lookup table interposed between the register and the DAC.



Fig. 7. Output spectra for different values of  $f_{CK}$ .

One issue in the TX architecture of Fig. 5(c) is the low loop gain, an effect that can substantially degrade the performance. In Section VII, we explain the cause of this low gain and introduce a simple method of compensating it.

As seen in subsequent sections, the analog circuitry in the transmit baseband path consists of only a compact 8-bit binary-weighted resistor DAC and four comparators for I,  $\overline{I}$ , Q, and  $\overline{Q}$  signals, with no need for an explicit subtractor. Note that a completely digital implementation would require a subtractor running at 8 GHz.

We should make a remark about the downconversion mixers. With a maximum single-ended RF output swing of 3.5 $V_{pp}$, the passive mixer is preceded by a voltage divider having an attenuation factor of 5. Also, a 2.5-k$\Omega$ resistor in series with the mixer improves its linearity. The resistance of the switch itself varies from 85 to 140 $\Omega$ for a 0.5-$V_{pp}$ input swing, a variation that is small relative to the 2.5-k$\Omega$ series resistance. As a result, the ACPR at the mixer output is kept above 45 dB. The mixer noise referred to the TX output is -131 dBm/Hz.

#### B. Choice of Loop Oversampling Ratio

We expect tradeoffs among various parameters in the architecture of Fig. 5(c), e.g., the DAC resolution and raw nonlinearity, the loop oversampling clock rate, $f_{CK}$, and the ADC's oversampling ratio. Based on practical DAC design issues (Section VII), we assume a DAC resolution of 8 bits and the INL profile presented in Section VII. Higher resolutions lead to excessive complexity in the TX layout, and design for higher linearity degrades the power efficiency. We wish to determine the minimum acceptable $f_{CK}$. Our analysis employs a transistor-level model for the DAC and is performed in Agilent's ADS for its efficient frequency-domain analysis. For now, we assume that the oversampling ADC has infinite resolution.

Fig. 7 plots the simulated TX output spectra for both open-loop and closed-loop operation. We observe that the latter exhibits an adjacent channel power of -29, -31, and -33 dB as  $f_{CK}$  rises from 1 GHz to 2 GHz to 4 GHz, respectively. From simulations, we also arrive at the corresponding results for the RX band noise, obtaining -118, -121, and -124 dBm/Hz, respectively.

At the extreme peaks of the output signal, the integrator output momentarily exceeds the full scale of the DAC, equivalently clipping the data. This instantaneous effect is not corrected by the $\Delta \Sigma$ loop and becomes the limiting factor in the improvements observed in Fig. 7.

This analysis suggests that even with $f_{CK} = 4$ GHz, we do not meet the WCDMA specification of RXBN = -125 dBm/Hz. We must also revisit these results after we include the oversampling ADC's nonidealities.

#### C. Architecture Refinement

In this section, we introduce a refinement to the TX architecture of Fig. 5(c) that substantially improves the performance, thus easing the design of the building blocks. Our focus is on the ACPR and the RX-band noise. Let us return to the Newton–Raphson method expressed by (1) and ask whether the approximation for  $g'(x_n)$  can be improved from a constant value to a function that still lends itself to efficient implementation. The Newton–Raphson method requires that the error, g(x) = f(x) - w, be multiplied by 1/g' in each iteration, leading to the architecture depicted in Fig. 8(a). We factor out this coefficient and insert it in the input and output paths [Fig. 8(b)]. The output is now equal to y/g' rather than the desired output, y, but if we multiply the main input by g', the output changes back to y. That is, the 1/g' factor after w should be removed.

We now turn to the 1/g' factor following the DAC and seek a hardware-efficient implementation for it. Specifically, we explore the possibility of merging the two. Denoting the transfer characteristic of the cascade by P(x), we make two observations: (1) a 1-LSB increase in x produces a change of P(x + 1 LSB) - P(x) at the TX output, which can be considered the derivative of P with respect to x; and (2) according to simulations, the derivative of P(x) behaves as shown in Fig. 8(c). [For a given DAC design, P(x) = f(x)/f'(x) is known.] We should point out that x represents the digital baseband I (or Q) component. Owing to the interaction between the I and Q paths of the RF DAC, P'(x) displays different trajectories for different Q values. It is unclear at this point how P'(x) solves the 1/g' problem, but we show below that P'(x) prescribes the manner in which the DAC cells must be sized.

Let us see how the 1/g' block can be absorbed by the DAC. The first observation made above must apply to the new DAC as well: in response to a 1-LSB increment in x, its output must change by an amount equal to P'(x). This change is created by turning on one more DAC unit. With the P' shape depicted in Fig. 8(c), we predict that the DAC output increment should be smaller for low x values and larger for high x values. Correspondingly, the DAC unit cells should be "weaker" for low x values and "stronger" for high x values. In principle, we can taper the units according to the shape of P'(x), but the DAC design is greatly simplified if we approximate P'(x) by a staircase function [Fig. 8(c)]. Specifically, we choose scaling factors equal to 0.25, 0.375, 0.5, 0.75, 1, 2, and 4 for $1 \le x \le 64$, $64 < x \le 80$, $80 < x \le 160$, $160 < x \le 192$, $192 < x \le 208$, $208 < x \le 240$, and $240 < x \le 256$, respectively.
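The staircase sizing can be tabulated directly from the scaling factors listed above. This sketch gives only the ideal static transfer of the nonuniform DAC in units of a weight-1 cell (a normalization we assume here); it ignores the I/Q interaction and switch resistance that shape the real P'(x), and the helper names are hypothetical.

```python
# Staircase approximation of P'(x): (last code in segment, relative cell strength)
SEGMENTS = [(64, 0.25), (80, 0.375), (160, 0.5), (192, 0.75),
            (208, 1.0), (240, 2.0), (256, 4.0)]


def cell_weight(x: int) -> float:
    """Strength of the unit cell switched on when the code goes from x - 1 to x."""
    if not 1 <= x <= 256:
        raise ValueError("code out of range")
    for last, weight in SEGMENTS:
        if x <= last:
            return weight
    raise AssertionError("unreachable")


def dac_output(code: int) -> float:
    """Ideal static output of the nonuniformly sized DAC, in units of a weight-1 cell."""
    return sum(cell_weight(x) for x in range(1, code + 1))


print(dac_output(256))  # 230.0
```

The output increment per LSB grows sixteenfold from the bottom segment to the top one, which is the "weak cells at low codes, strong cells at high codes" behavior the staircase is meant to capture.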



Fig. 8. (a) Inclusion of 1/g'(x) to improve the accuracy of Newton–Raphson technique. (b) Transformation showing that 1/g'(x) can be included in the DAC. (c) Behavior of P'(x) and its staircase approximation. (d) Simulated output spectra showing reduction of ACPR.

While this approximation appears rather coarse, it is selected so as to avoid significant complexity in the DAC layout and the routing of the signals.

Derived from the Newton–Raphson technique, this free modification of the RF DAC reduces the ACPR by 4 dB and the RX-band noise by 3 dB. Fig. 8(d) plots the TX output spectrum before and after nonuniform sizing of the DAC units. In this simulation, the DAC output delivers an average power of +24 dBm, exercising the RF DACs' full scale. To appreciate the significance of this "no-cost" improvement, we note that a 4-dB reduction in ACPR is equivalent to about a factor of $10^{4/20} \approx 1.6$ decrease in the INL, which would otherwise be difficult to obtain without compromising the DAC efficiency. It is possible to approximate the P'(x) curves in Fig. 8(c) by different and smoother functions, but at the cost of a much more complex layout. The arrangement in Fig. 8(b) is the final architecture, except that the baseband input, w, must be converted to analog form (Section VI) and the 1/g' block at the input is omitted.

It is important to distinguish between the proposed architecture and the Cartesian feedback topology [33]. First, to our knowledge, the latter has not employed $\Delta \Sigma$ modulation to provide a very high loop gain near the carrier while maintaining stability. Second, it is the Newton–Raphson perspective that eventually leads to the architecture refinement shown in Fig. 8(b); Cartesian feedback simply would not predict this concept. Third, even if Cartesian feedback were to include an integrator, the digital realization would face the issues outlined in Section V. For example, the high oversampling ratio in the digital domain would translate to a substantial power penalty. Our proposed solutions in Fig. 5 address these issues.

It is worth mentioning that the nonuniform sizing introduced here is different from that in [24], which implements an $f^{-1}$ function to approach an overall linear characteristic in a static system. In our case, on the other hand, the Newton–Raphson technique suggests sizing according to P'(x) = d[f(x)/f'(x)]/dx.

## VI. OVERSAMPLING ADC

#### A. $\Delta$ Modulator as ADC

The oversampling ADC in Fig. 5(c) plays a critical role in the overall TX performance in terms of ACPR and the receive-band noise. We propose the use of a highly oversampled  $\Delta$  modulator as an ADC. As explained below, the simplicity of the circuit, along with several new ideas, affords an efficient solution.

Fig. 9(a) shows a simple  $\Delta$  modulator, where the high gain of the comparator ensures that the output's running average, produced by the feedback RC network, tracks the analog input. For our subsequent design work, we must formulate the quantization noise spectrum of the output. The comparator itself acts as a 1-bit quantizer, exhibiting a total quantization noise power equal to  $\Delta^2/12$ , where  $\Delta = V_{\text{DD}}$  (Appendix).

The low-pass feedback loop around the comparator creates a high-pass shaping function for the noise. Fig. 9(b) depicts a linear model for the modulator, indicating that the quantization noise, Q, is shaped by

$$\frac{V_{\text{out}}}{Q} = \frac{1}{1 + \frac{A_0}{R_1 C_1 s + 1}} = \frac{R_1 C_1 s + 1}{R_1 C_1 s + 1 + A_0}. \tag{3}$$

We thus expect a suppression factor of  $1 + A_0$  for noise frequencies below  $1/(2\pi R_1 C_1)$ .
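The shaping predicted by (3) can be evaluated numerically. In this sketch the values are illustrative assumptions: a 250-MHz feedback corner and $A_0 = 2.5$, close to the conservative estimate $R_1C_1f_{CK} \approx 2.5$ obtained later in this section for a 4-GHz clock.

```python
import math


def ntf(f_hz: float, a0: float, r1c1: float) -> complex:
    """Quantization-noise transfer V_out/Q of (3), evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    return (r1c1 * s + 1) / (r1c1 * s + 1 + a0)


r1c1 = 1 / (2 * math.pi * 250e6)   # illustrative: feedback corner at 250 MHz
a0 = 2.5                           # illustrative comparator gain
for f_hz in (1e6, 250e6, 2e9):
    gain_db = 20 * math.log10(abs(ntf(f_hz, a0, r1c1)))
    print(f"{f_hz / 1e6:7.0f} MHz: {gain_db:6.1f} dB")
```

Well below the corner, the magnitude settles at $1/(1 + A_0)$; well above it, the noise passes essentially unattenuated.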



Fig. 9. (a) Simple  $\Delta$  modulator. (b) Quantization noise model. (c) Circuit's waveforms.

We must now address two questions: 1) how large is $A_0$? and 2) how do we select $R_1 C_1$? For the former, we first note that the gain of a 1-bit quantizer depends on its input amplitude. Let us observe that the high loop gain produces only a small difference between $V_{in}$ and the running average that appears in $V_F$ in Fig. 9(a) if the input frequency is sufficiently smaller than $f_{CK}$. That is, the comparator does not see a significant differential voltage related to $V_{in}$. However, $V_F$ still experiences moderate changes due to the output rail-to-rail swings [Fig. 9(c)]. The triangular waveform at $V_F$ exhibits a peak swing of approximately $[V_{DD}/(4R_1 C_1)]T_{CK}$, which we assume to be much greater than the difference seen by the comparator due to the analog input. To find the gain, we view the comparator as an amplifier that senses this triangular waveform and generates an output whose first-harmonic amplitude is $2V_{\rm DD}/\pi$. Noting that the first harmonic of a triangular waveform is $8/\pi^2$ times its peak swing, we define the gain as the ratio of the output and input fundamental amplitudes:

$$A_{0} \approx \frac{2V_{\rm DD}/\pi}{(8/\pi^{2})V_{\rm DD}/(4R_{1}C_{1}f_{\rm CK})} \approx \pi R_{1}C_{1}f_{\rm CK}. \tag{4}$$

While intuitive, the foregoing calculation of  $A_0$  tends to overestimate the gain. In the Appendix, we formulate  $A_0$  using a different approach and obtain values between  $R_1C_1f_{CK}$  and  $2R_1C_1f_{CK}$ , depending on the input. In practice,  $A_0$  depends on the input signal statistics and lies in this range. We hereafter conservatively assume that  $A_0 \approx R_1C_1f_{CK}$ .
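The describing-function argument behind (4) can be checked numerically. The sketch below (a hypothetical helper, not the authors' procedure) samples one $2T_{CK}$ period of the idle comparator output, a $\pm V_{DD}/2$ square wave, and of the triangle at $V_F$ with peak $[V_{DD}/(4R_1C_1)]T_{CK}$, extracts both fundamentals with a plain DFT, and compares their ratio with $\pi R_1C_1f_{CK}$. The 250-MHz corner and 4-GHz clock are assumed example values.

```python
import cmath
import math

def fundamental_amplitude(samples):
    """Peak amplitude of the first harmonic of one period of a sampled waveform."""
    n = len(samples)
    x1 = sum(v * cmath.exp(-2j * math.pi * k / n) for k, v in enumerate(samples))
    return 2 * abs(x1) / n

def describing_function_gain(r1c1, f_ck, vdd=1.0, n=4000):
    """Ratio of output to input fundamentals for the idle pattern of Fig. 9(c)."""
    v_peak = vdd / (4 * r1c1 * f_ck)  # peak swing at V_F: [V_DD/(4 R1 C1)] T_CK
    square, triangle = [], []
    for k in range(n):
        p = (k + 0.5) / n  # position within one 2*T_CK period (midpoint sampling)
        if p < 0.5:        # comparator output high: V_F ramps up
            square.append(vdd / 2)
            triangle.append(-v_peak + 4 * v_peak * p)
        else:              # comparator output low: V_F ramps down
            square.append(-vdd / 2)
            triangle.append(3 * v_peak - 4 * v_peak * p)
    return fundamental_amplitude(square) / fundamental_amplitude(triangle)

r1c1, f_ck = 1 / (2 * math.pi * 250e6), 4e9  # assumed example values
a0_df = describing_function_gain(r1c1, f_ck)
print(f"describing function: {a0_df:.3f} (pi*R1C1*fCK = {math.pi * r1c1 * f_ck:.3f})")
print(f"Appendix, V_in = 0:  {2 * r1c1 * f_ck:.3f}")
print(f"conservative value:  {r1c1 * f_ck:.3f}")
```

The three printed values show the spread between (4), the correlation-based result of the Appendix, and the conservative estimate adopted in the text.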

The comparator's quantization noise spectrum,  $\Delta^2/(12 f_{\text{CK}})$ , is divided by  $(1 + A_0)^2$  up to a frequency of  $1/(2\pi R_1 C_1)$ , emerging as

$$S_Q(f) \approx \frac{V_{\text{DD}}^2}{12R_1^2 C_1^2 f_{\text{CK}}^3} \quad \text{for } f < \frac{1}{2\pi R_1 C_1}. \tag{5}$$

For example, if  $1/(2\pi R_1 C_1) = 250$  MHz and  $f_{CK} = 4$  GHz, we have  $S_Q(f) \approx 2.2 \times 10^{-12} \text{ V}^2/\text{Hz} \equiv -87$  dBm/Hz at  $R = 1 \Omega$ . This is the quantization noise in the  $\Delta$  modulator output. To refer this noise to the TX output in Fig. 5(c), we must divide it by the gain from Y to E through the feedback path. We return to this point in Section VI-C.
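The dBm/Hz figures in this section follow the stated convention of delivering the voltage spectral density into $R = 1\ \Omega$. A minimal conversion helper (the function name is illustrative) reproduces the quoted value:

```python
import math

def v2_per_hz_to_dbm_per_hz(s_v2_hz, r_ohm=1.0):
    """Convert a voltage PSD in V^2/Hz to dBm/Hz delivered into r_ohm (1 ohm in the text)."""
    watts_per_hz = s_v2_hz / r_ohm
    return 10 * math.log10(watts_per_hz / 1e-3)

print(round(v2_per_hz_to_dbm_per_hz(2.2e-12), 1))  # -86.6, i.e., about -87 dBm/Hz
```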

Equation (5) makes it desirable to maximize the value of  $R_1 C_1$ , but an excessively low corner frequency in the feedback path attenuates the signal of interest in  $V_F$  (the running average), affecting the information carried by  $V_{out}$ . Since in our TX environment, both the baseband signal and the RX-band noise are of interest, we choose  $1/(2\pi R_1 C_1) \approx 250$  MHz.

### B. Circuit Refinements

The architecture and circuit developments in the previous sections have assumed an 8-bit RF DAC with an INL of about 40% and a 4-GHz  $\Delta$  modulator acting as the oversampling ADC in Fig. 5(c). To meet the performance specifications described in Section III, the TX loop must reduce the DAC INL to about 5%. Moreover, the  $\Delta$  modulator quantization noise spectral density must be lowered by about another 6 dB. In this section, we introduce several new circuit techniques that dramatically improve the performance. The quantization noise reductions presented here apply to both the in-band and the RX-band noise, with the spectrum assumed flat across this range.

In order to reduce the  $\Delta$  modulator quantization noise, we double the effective oversampling rate by interleaving two  $\Delta$  modulators. The circuit in Fig. 10(a) employs two StrongArm comparators [34] while running with  $f_{CK} = 4$  GHz. Each comparator draws only 250  $\mu$W. Doubling the sampling rate in this manner is not equivalent to doubling the clock rate of a single  $\Delta$  modulator: it halves the comparator's quantization noise floor,  $\Delta^2/(12 f_{\rm CK})$ , but the input swing of the comparator remains the same, and hence  $A_0$  does not change. Interleaving is therefore expected to lower  $S_Q$  by only 3 dB. Simulations show that the quantization noise drops by 1.2 dB. This discrepancy arises from the inaccurate assumption of white quantization noise for a 1-bit comparator in an oversampling ADC [35]. In fact, the quantization noise floor of the comparators in Figs. 9(a) and 10(a) is not completely flat, and therefore, the  $S_Q$  reduction is not 3 dB.
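The 3-dB expectation follows from (5), where $S_Q$ scales as the comparator noise floor divided by $A_0^2$ (the large-$A_0$ approximation of $(1+A_0)^2$). The sketch below contrasts doubling the clock of a single modulator, which halves the floor and doubles $A_0 \approx R_1C_1f_{CK}$, with interleaving, which halves the floor but leaves $A_0$ unchanged. It captures only this first-order scaling and, as the simulations above show, overestimates the benefit of interleaving.

```python
import math

def s_q_reduction_db(floor_factor, a0_factor):
    """dB drop of S_Q ~ floor / A0^2 when the comparator noise floor is divided by
    floor_factor and A0 is multiplied by a0_factor (large-A0 approximation)."""
    return 10 * math.log10(floor_factor * a0_factor ** 2)

# Single modulator clocked at 2*f_CK: floor halves and A0 = R1*C1*f_CK doubles.
print(f"{s_q_reduction_db(2, 2):.1f} dB")  # 9.0 dB
# Two interleaved modulators: floor halves, comparator input swing (hence A0) unchanged.
print(f"{s_q_reduction_db(2, 1):.1f} dB")  # 3.0 dB
```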

As observed in the derivation leading to (4), the open-loop gain of the  $\Delta$  modulator can potentially increase if we attenuate the clock swing sensed by the comparator. We now present three methods for this purpose. In the interleaved circuit of Fig. 10(a), we recognize that the two comparator outputs carry the first clock harmonic with opposite signs and the signal of interest with the same sign. We therefore feed the output of each comparator to the input of the other [Fig. 10(b)], thereby reducing the clock swings at their inputs. This lower input swing results in a higher gain ( $A_0$ ).

It is interesting to compare the performance improvement afforded by Fig. 10(a) and (b). In the former, the clock-induced swing returning to the comparator input corresponds to the first harmonic of  $f_{CK}$  and, in the latter, to the second. Since the second harmonic is further attenuated (by 6 dB), the comparator gain is about 6 dB higher in Fig. 10(b). With  $R_1 = \cdots = R_4$ , simulations indicate that the quantization noise falls by another 5.3 dB. We also interpose a passive notch filter between the feedback network and each comparator's input, with the notch frequency chosen equal to 4 GHz [Fig. 10(c)]. With this change, the input swing of the comparator due to the 4-GHz clock decreases and the  $\Delta$  modulator's quantization noise drops by another 0.8 dB, reaching  $4.1 \times 10^{-13} \text{ V}^2/\text{Hz} \equiv -94 \text{ dBm/Hz}$  at  $R = 1 \Omega$ .

Fig. 10. (a) Time-interleaved  $\Delta$  modulators. (b) Addition of  $R_3$  and  $R_4$  to attenuate clock swings. (c) Use of notch filter to attenuate clock swings.
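The simulated improvements can be tallied against the single-modulator example after (5). The sketch below simply applies the three quoted reductions (1.2, 5.3, and 0.8 dB) to $2.2 \times 10^{-12}$ V²/Hz and converts the result to dBm/Hz at 1 Ω; it adds no new data, it only confirms that the quoted numbers are mutually consistent.

```python
import math

s_q = 2.2e-12  # V^2/Hz, single Delta-modulator example following eq. (5)
steps_db = {
    "interleaving, Fig. 10(a)": 1.2,
    "cross-coupled R3/R4, Fig. 10(b)": 5.3,
    "notch filter, Fig. 10(c)": 0.8,
}
for name, db in steps_db.items():
    s_q *= 10 ** (-db / 10)
    print(f"after {name}: {s_q:.2e} V^2/Hz")

dbm_per_hz = 10 * math.log10(s_q / 1e-3)  # delivered into 1 ohm
print(f"total {sum(steps_db.values()):.1f} dB -> {dbm_per_hz:.1f} dBm/Hz")
```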

### C. $\Delta$ Modulator With High Gain

The next modification of the  $\Delta$  modulator deals with the low loop gain of the overall TX architecture. We first describe the cause of the low gain. To measure the loop transmission, we break the loop at the mixer output in Fig. 5(c), apply a 1-mV step to the subtractor input, and examine the DAC output voltage sensed by the mixer. For simplicity, we assume a mixer conversion gain of unity. The oversampling ADC (the  $\Delta$  modulator) generates a periodic sequence consisting of one pulse of height  $V_{DD}$  (= 1 V) and 999 low levels so as to deliver an average value equal to 1 mV. The register thus increments by 1 LSB every 1000 clock cycles, producing at the DAC output a staircase voltage with a slope of  $V_{\text{LSB,DAC}}/(1000T_{\text{CK}})$ , where  $V_{\text{LSB,DAC}}$  denotes the DAC output voltage LSB size. More generally, for a step of  $\Delta V$  at the subtractor input in Fig. 3, the DAC output has a slope of  $(V_{\rm LSB,DAC}/V_{\rm DD})(\Delta V/T_{\rm CK})$ . The discrete-time loop transmission is therefore given by  $(V_{\text{LSB,DAC}}/V_{\text{DD}})[z^{-1}/(1-z^{-1})]$ , implying a gain of  $V_{\text{LSB,DAC}}/V_{\text{DD}}$  for the integrator. If the DAC output full-scale voltage is comparable to  $V_{DD}$ , then this factor is around 1/256 for an 8-bit DAC, degrading the TX loop's ability to correct the DAC distortion.
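The staircase argument can be reproduced with an idealized model. In the sketch below, an ideal first-order 1-bit modulator with a 0/$V_{DD}$ output stands in for the $\Delta$ modulator (an assumption; the actual circuit is a continuous-time loop), the register counts one LSB per output pulse, and $V_{\text{LSB,DAC}} = V_{DD}/256$ assumes a DAC full scale equal to $V_{DD}$. The clock period of 1/(4 GHz) is an illustrative choice. Exact fractions keep the accumulator from drifting.

```python
from fractions import Fraction

def first_order_dm_bits(x_over_vdd, n_cycles):
    """Ideal first-order 1-bit modulator (0 or V_DD output) whose average tracks x_over_vdd."""
    acc, bits = Fraction(0), []
    for _ in range(n_cycles):
        acc += x_over_vdd
        if acc >= 1:
            bits.append(1)
            acc -= 1
        else:
            bits.append(0)
    return bits

vdd = 1.0
delta_v = Fraction(1, 1000)           # 1-mV step at the subtractor input; V_DD = 1 V
bits = first_order_dm_bits(delta_v, 10_000)
register = sum(bits)                  # register advances one LSB per pulse
v_lsb_dac = vdd / 256                 # assumed: 8-bit DAC with full scale near V_DD
t_ck = 1 / 4e9                        # illustrative clock period

slope = register * v_lsb_dac / (len(bits) * t_ck)          # DAC output slope (V/s)
predicted = (v_lsb_dac / vdd) * (float(delta_v) / t_ck)    # (V_LSB/V_DD)(dV/T_CK)
print(register, slope, predicted, v_lsb_dac / vdd)         # integrator gain 1/256
```

One pulse appears every 1000 cycles, and the resulting slope matches $(V_{\text{LSB,DAC}}/V_{\text{DD}})(\Delta V/T_{\text{CK}})$, exposing the $1/256$ integrator gain.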

Since the overall loop consists of the  $\Delta$  modulator, the register, the DAC, and the downconversion mixer, we have few options for introducing a gain of 200–300 to compensate for the integrator loss. If realized by a conventional amplifier, such a high gain would entail severe nonlinearity and noise issues.



Fig. 11. (a)  $\Delta$  modulator having a closed-loop gain of  $1 + R_1/R_M$ . (b) Simulated output spectrum showing the gain.

We thus propose a new amplification method that simply draws upon the  $\Delta$  modulator's comparator.

Illustrated in Fig. 11(a), the idea is to view the comparator as a high-gain amplifier and place a resistive network around it to obtain a low-frequency closed-loop gain of  $1 + R_1/R_M$ . We select  $(R_1||R_M)C_1$  according to the desired corner frequency and  $R_1/R_M \approx 200$  to achieve a high closed-loop gain.

The topology resembles a high-gain feedback amplifier except that the comparator acts as a discrete-time circuit running at a high oversampling ratio. Fig. 11(b) plots the simulated input and output spectra of the high-gain  $\Delta$  modulator with  $V_{in} = (2 \text{ mV}) \cos(2\pi \times 15.875 \text{ MHz} \times t)$ ,  $R_1 = 300 \text{ k}\Omega$ ,  $R_M = 1.5 \text{ k}\Omega$ , and  $C_1 = 1 \text{ pF}$ . We observe a gain of about 46 dB. Simulations also indicate little change in the harmonic distortion at the  $\Delta$  modulator output when the dc gain is raised from 1 to 200.
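With the Fig. 11(b) values, the ideal closed-loop gain $1 + R_1/R_M$ can be compared with the simulated 46 dB. The snippet below (an illustrative helper) assumes the comparator loop gain is high enough for the feedback ratio alone to set the gain.

```python
import math

def closed_loop_gain(r1, rm):
    """Low-frequency closed-loop gain 1 + R1/RM of the Delta modulator in Fig. 11(a)."""
    return 1 + r1 / rm

g = closed_loop_gain(300e3, 1.5e3)        # R1 and RM from the Fig. 11(b) simulation
print(g, f"{20 * math.log10(g):.1f} dB")  # 201.0 46.1 dB
print(f"2-mV input -> about {2e-3 * g * 1e3:.0f} mV output amplitude")
```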

We should mention that the comparator offset,  $V_{OS}$ , is compensated by the overall TX loop because  $V_{OS}$  appears before the integrator. For the integrator output to remain bounded, the feedback path in Fig. 5(c) must bring to the subtractor an offset exactly equal to  $V_{OS}$ .

The foregoing modifications dramatically reduce the  $\Delta$  modulator's quantization noise as it is referred to the TX output. With a gain of 200 and a mixer loss of 15 dB, the -94-dBm/Hz noise calculated in the previous section is attenuated by approximately 31 dB when referred to the TX output, well exceeding the WCDMA specification of -125 dBm/Hz.

### D. $\Delta$ Modulator With Baseband Inputs

We must still address an issue raised in Section V: in Fig. 5(c), we must somehow subtract the analog output of the mixer from the digital baseband data. To this end, we convert node X in Fig. 11(a) to a virtual ground by grounding the other input of the comparator, utilize a binary-weighted resistor DAC to convert  $I_{in}$  to an analog current, and inject the result into X [Fig. 12(a)]. Similarly, the mixer output,  $V_{mix}$ , is summed with  $I_{in}$  in the current domain. In this work, the unit resistor,  $R = 2.9 \text{ k}\Omega$ , is chosen large enough that resistor mismatches still allow monotonic behavior for the DAC. The unit resistor used in the DAC has a width of 0.4  $\mu$m and a length of 4.26  $\mu$m, which, according to Monte Carlo simulations, yields an INL of about 0.3 LSB. These DACs occupy an area of 1000  $\mu$m<sup>2</sup>, about 1% of the total chip area.

Fig. 12. (a) Conversion of digital baseband signal,  $I_{in}$ , to analog form. (b) Differential interleaved  $\Delta$  modulators. (c) Simplified topology. (d) Connection to the rest of the transmitter.

Note that  $R ||(2R)|| \cdots ||(2^6R)||(2^7R)||(2^7R)||R_{\text{mix}}$  acts as  $R_M$  in Fig. 11(a) and, along with  $R_1$ , defines the closed-loop gain. With a Thevenin equivalent value of 1.5 k $\Omega$ , these resistors contribute a thermal noise density of -133 dBm/Hz at the TX output. Similarly,  $R_1$  and  $R_M$  contribute -156 dBm/Hz.
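The parallel combination of the DAC branches can be evaluated directly. In the sketch below, the binary-weighted branches $R, 2R, \ldots, 2^7R$ plus the extra $2^7R$ have conductances that sum to exactly $2/R$, giving $R/2 = 1.45$ kΩ. The mixer-path resistor $R_{\text{mix}}$ is not specified in this section, so `total_rm` takes it as a hypothetical parameter; adding it can only lower the total toward the quoted Thevenin value.

```python
def parallel(*resistors):
    """Equivalent resistance of resistors connected in parallel."""
    return 1 / sum(1 / r for r in resistors)

R = 2.9e3  # unit resistor of the baseband resistor DAC

# Binary-weighted branches R, 2R, ..., 2^7 R plus the second 2^7 R listed in the text.
dac_branches = [R * 2 ** k for k in range(8)] + [R * 2 ** 7]
r_dac = parallel(*dac_branches)
print(f"DAC branches alone: {r_dac:.1f} ohm")  # conductances sum to 2/R, so R/2

def total_rm(r_mix):
    """R_M including the mixer-path resistor; r_mix is a hypothetical input value."""
    return parallel(*dac_branches, r_mix)
```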

We now describe another simplification in the  $\Delta$  modulator design. The  $\Delta$  modulator/subtractor studied thus far has single-ended inputs. Shown in Fig. 12(b) is the fully differential, interleaved circuit. For simplicity, the baseband input is denoted by BB and its DAC by a single resistor in a dashed box. An interesting question that arises here is whether we can short the virtual ground nodes  $X_1$  and  $X_2$ . These nodes carry the desired signal with the same polarity and the clock waveform with opposite polarities. Thus, shorting  $X_1$  to  $X_2$  and  $Y_1$  to  $Y_2$  removes the odd harmonics of the clock, reduces the clock swings at these nodes, and hence increases the open-loop gain of the comparators. According to simulations, this method raises the signal-to-quantization-noise ratio (SQNR) at the output by 6 dB. The topology can be further simplified if the multiplexer is inserted within the feedback loop, as shown in Fig. 12(c). The interface between this circuit and the rest of the transmitter chain is illustrated in Fig. 12(d).
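Why shorting the virtual grounds cancels the odd clock harmonics can be seen with a toy waveform. The sketch below models complementary clocking as a half-period time shift (an assumption), uses a hypothetical ripple containing first, second, and third harmonics, and assumes equal source impedances so that the shorted node settles at the average of the two. The common-mode signal survives, the odd harmonics vanish, and the second harmonic remains.

```python
import cmath
import math

def harmonic_amplitude(samples, k):
    """Peak amplitude of harmonic k of one period of a sampled waveform."""
    n = len(samples)
    xk = sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(samples))
    return 2 * abs(xk) / n

def clock_ripple(phase):
    """Hypothetical clock-induced ripple with first, second, and third harmonics."""
    return math.sin(phase) + 0.5 * math.sin(2 * phase) + 0.3 * math.sin(3 * phase)

N = 64
SIGNAL = 0.2  # desired signal, same polarity at both nodes (held constant for simplicity)
phases = [2 * math.pi * m / N for m in range(N)]
x1 = [SIGNAL + clock_ripple(p) for p in phases]            # node X1
x2 = [SIGNAL + clock_ripple(p + math.pi) for p in phases]  # node X2, complementary clock
shorted = [(a + b) / 2 for a, b in zip(x1, x2)]            # equal impedances assumed

print(round(sum(shorted) / N, 6))                                     # 0.2
print([round(harmonic_amplitude(shorted, k), 6) for k in (1, 2, 3)])  # [0.0, 0.5, 0.0]
```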

## VII. RF DAC DESIGN

As the most power-hungry block in a transceiver, the RF DAC merits extensive design iteration and optimization. In contrast to conventional PA design, our methodology maximizes the efficiency for the peak desired output power with no emphasis on the DAC nonlinearity; the  $\Delta \Sigma$  modulator loop effectively removes the resulting static and dynamic distortion.

Each DAC cell in the I and Q paths must translate the baseband data to RF and convert the resulting voltage to current. Fig. 13(a) depicts a simple implementation where the two paths operate with 25%-duty-cycle LO waveforms and meet in the current domain. But we can also view the output current combining operation as an OR function and, since only one output transistor is on at a time, we move this function to the digital domain [Fig. 13(b)] [18], [19], [36]–[39]. This merging of I and Q DACs halves the area and the output capacitance.
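The merging step relies on the 25% LO phases never overlapping. The sketch below encodes an assumed, simplified timing (the I-path LO high only in quarter 0, the Q-path LO only in quarter 1, complementary-polarity quarters omitted) and checks, over every input and time slot, that adding the currents of separate I and Q transistors gives the same drive as one transistor fed by the digital OR.

```python
from itertools import product

# Assumed LO timing (illustrative): one LO period split into four quarters. The 25%-duty
# I-path LO is high only in quarter 0 and the Q-path LO only in quarter 1; the quarters
# that drive the complementary polarity are omitted.
def lo_i(quarter):
    return quarter == 0

def lo_q(quarter):
    return quarter == 1

def summed_drive(i_bit, q_bit, quarter):
    """Separate I and Q output transistors whose currents add (Fig. 13(a) view)."""
    return int(bool(i_bit) and lo_i(quarter)) + int(bool(q_bit) and lo_q(quarter))

def merged_drive(i_bit, q_bit, quarter):
    """One shared output transistor driven by the digital OR (Fig. 13(b) view)."""
    return int((bool(i_bit) and lo_i(quarter)) or (bool(q_bit) and lo_q(quarter)))

# Non-overlapping LO phases mean at most one path is active, so the sum equals the OR.
agree = all(summed_drive(i, q, t) == merged_drive(i, q, t)
            for i, q, t in product((0, 1), (0, 1), range(4)))
print(agree)  # True
```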

The merged DAC consists of 256 differential cells, each implemented as shown in Fig. 13(c). To maximize the efficiency, the DAC output stage and the off-chip matching network are designed for class-E operation, but at the cost of a single-ended drain voltage swing of  $3.5\ V_{pp}$ . Thus, the unit cells employ triple cascodes, with  $M_3$  and  $M_6$  realized by thick-oxide transistors. Note that the 1.8-V supply tied to the gates need not deliver any dc current and can be generated on chip by a charge pump. (Our experimental prototype uses an external 1.8-V supply.) In the CMOS technology used here, the drain-bulk voltage of the thick-oxide devices is allowed to reach  $2V_{DD} = 3.6$  V. According to simulations, the peak voltage is 3.5 V.

Beyond class-E operation, the series resistance of  $M_1$–$M_3$  and  $M_4$–$M_6$  in Fig. 13(c) determines the efficiency, as these devices must act as switches rather than current sources. On the other hand, wider transistors translate to greater input and output capacitances, with the former directly reducing the efficiency. Viewing the three NAND gates preceding  $M_1$  (and  $M_4$ ) as a "predriver," we note that their power dissipation, given by  $f C V_{DD}^2$ , rises with  $W_1$ . In this work, the total width of the input transistor on each side is about 5.7 mm, leading to a total predriver power of about 12 mW at 2 GHz.

Fig. 13. (a) Conceptual slice of RF DAC cell. (b) Merging of output transistors. (c) Complete RF DAC cell. (d) Simulated DAC INL as a function of input.
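Inverting $P = f C V_{DD}^2$ gives the effective capacitance the predriver must switch. The sketch below assumes the 1-V supply stated in Section VIII and treats the quoted 12 mW as pure dynamic power at 2 GHz; the result lumps gate, wiring, and internal capacitances into one figure.

```python
def switched_capacitance(power_w, f_hz, vdd):
    """Invert P = f * C * V_DD^2 for the effective switched capacitance."""
    return power_w / (f_hz * vdd ** 2)

c_eff = switched_capacitance(12e-3, 2e9, 1.0)  # 12 mW at 2 GHz, 1-V supply (assumed)
print(f"{c_eff * 1e12:.1f} pF")                # 6.0 pF
```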

Fig. 13(d) plots the simulated INL with uniform sizing, revealing an INL of about 45%. Recall from Section V that, to alleviate the situation, the DAC units are scaled by factors of 0.25, 0.375, 0.5, etc. A width of 6.25  $\mu$m is chosen for the scaling factor of 0.25 and used in cells 1 to 64. Fig. 14 plots the simulated drain voltage waveforms of  $M_3$  and  $M_1$  in Fig. 13(c).

We should point out that the RF DAC introduces only two artifacts: spectral replicas, which are spaced by 8 GHz, and power in the adjacent channels and the receive band. The DAC does not produce quantization noise beyond that generated by the  $\Delta$  modulator and suppressed by the digital integrator.

## VIII. EXPERIMENTAL RESULTS

The complete transmitter has been fabricated in TSMC's 28-nm CMOS technology. It consists of four  $\Delta$  modulators; two registers, each containing 256 flip-flops; the merged RF DAC; two downconversion mixers; and I/Q clock generation with 25% duty cycle. Receiving 8-bit baseband digital I and Q inputs, the TX operates with a 1-V supply, except for the gates of the thick-oxide devices in Fig. 13(c), which are tied to 1.8 V. Fig. 15 shows the die photograph with an active area of 0.35 mm × 0.32 mm. The power drawn by the building blocks is as follows: 12 mW by all of the NAND gates immediately preceding the RF DAC cells, 3.1 mW by the first rank of NAND gates, 1 mW by the  $\Delta$  modulators, and 1 mW by the shift register.

Fig. 14. Drain voltage waveform of  $M_3$  and  $M_1$  in Fig. 13(c).

Fig. 16. Output matching network.

The TX generates differential outputs, which travel through multiple bond wires and tapered transmission lines on the printed-circuit board for matching and differential-to-single-ended conversion. The on-chip and off-chip matching components are shown in Fig. 16. A binomially tapered [40] transmission line in the form of stacked metal layers consists of a first section with  $Z_0 = 4.5 \ \Omega$  and a second section with  $Z_0 = 22.4 \ \Omega$ . Both sections have a length of 90° at 2 GHz. Each output passes through three bond wires, whose net inductance is included as part of the matching network. While the DAC output pulses can have a 25% or 50% duty cycle, the matching network is tuned to the latter because the output power is higher when the duty cycle is 50%. Also, owing to the QPSK constellation, the duty cycle is more frequently 50% than 25%. The chip is mounted on a grounded copper patch on the PC board so as to reduce the thermal resistance. The TX is measured with digital baseband data that has been subjected to root-raised-cosine filtering in an FPGA. All of the clock phases are generated on-chip by means of frequency dividers and logic from an external 8-GHz input. The clock frequencies and phases used in this work are as follows. The shift register is driven by complementary 8-GHz clocks, the  $\Delta$  modulators by complementary 4-GHz clocks. The last two sets are derived from the 8-GHz clock by means of on-chip dividers.

Fig. 17. Measured (a) output spectrum, ACPR, RXBN, and (b) constellation.
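The two-section taper can be sanity-checked with ideal lossless ABCD matrices. The sketch below assumes a 50-Ω termination on the 22.4-Ω section (the load is not stated here) and ignores bond wires, losses, and the differential-to-single-ended conversion, so it only indicates the order of the impedance presented toward the chip. At 2 GHz, both quarter-wave sections together transform 50 Ω down by $(4.5/22.4)^2$, to roughly 2 Ω.

```python
import math

def tline_abcd(z0, theta):
    """ABCD matrix of a lossless line with impedance z0 and electrical length theta (rad)."""
    return ((math.cos(theta), 1j * z0 * math.sin(theta)),
            (1j * math.sin(theta) / z0, math.cos(theta)))

def cascade(m1, m2):
    """Product of two 2x2 ABCD matrices (m1 nearer the source)."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def input_impedance(f_hz, z_load=50.0, f0=2e9):
    """Impedance seen from the chip side: 4.5-ohm then 22.4-ohm sections, each 90 deg at f0.
    z_load = 50 ohm is an assumption."""
    theta = (math.pi / 2) * f_hz / f0
    (a, b), (c, d) = cascade(tline_abcd(4.5, theta), tline_abcd(22.4, theta))
    return (a * z_load + b) / (c * z_load + d)

print(input_impedance(2e9))  # about 2.02 ohm, essentially real
```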

The results reported here are in the context of WCDMA. This prototype achieves a loop gain of 25 dB at  $\pm 20$  MHz around the carrier and can potentially accommodate wider bandwidths. However, our baseband DACs are followed by a low-pass filter to ensure that their quantization noise does not raise the RX-band noise. This limits our ability to test the prototype with higher data rates. Simulations show that in the absence of this filter, the EVM for a 10-MHz 16-QAM signal is 1.5% and the ACPRs are -42 and -52 dB.

Fig. 17(a) and (b) show the measured output spectrum and constellation. The output power is 24.1 dBm (including 2.7 dB loss due to cables, connectors, and bond wires). The ACPRs in the adjacent and alternate adjacent channels are equal to -35.4 and -44.6 dB, respectively, exceeding the WCDMA specifications. The receive-band noise is -150 dBc/Hz at 130-MHz offset. Under these conditions, the efficiency is 50%. As mentioned in Section III, this standard poses other spurious emission constraints that are not considered in this article.

Fig. 18. Measured (a) EVM, (b) ACPR, and (c) efficiency as a function of output power.

Fig. 18(a)–(c) plot the EVM, ACPR, and efficiency, respectively, as a function of the output power for a WCDMA input signal with QPSK modulation, 3.84-MHz bandwidth, and a PAPR of 3.4 dB. The EVM remains well below the WCDMA specification,<sup>1</sup> and ACPR<sub>1</sub> and ACPR<sub>2</sub> reach a minimum of -53 and -58 dB, respectively. The EVM is limited by the filtering necessary to reduce the baseband DACs' quantization noise. The present prototype operates with a digital input rate of about 250 MHz (limited by the FPGA providing the data). To reduce the quantization noise and allow less filtering, the digital input rate (i.e., the upsampling ratio) can be increased. The efficiency starts at about 4% at  $P_{out} = +5 \text{ dBm}$  and rises sharply to 50%.<sup>2</sup> The output power in Fig. 18(c) is the average value with an actual WCDMA signal rather than the peak value with an unmodulated tone; that is, the TX delivers a WCDMA signal with an average power of +24.1 dBm. The efficiency is based on the overall TX power consumption and is, therefore, the system efficiency. This TX delivers a peak power of 27.8 dBm with a peak efficiency of 69%. For output levels below +5 dBm, the ACPR is dominated by the baseband DAC quantization noise because the output power is directly controlled by the digital baseband inputs.

<sup>1</sup> As a result of the root-raised-cosine filtering required by WCDMA, the QPSK signal experiences intersymbol interference, which, due to the TX nonlinearity, manifests itself in the EVM.

TABLE I Performance Summary and Comparison to Prior Art

| Reference              | [41]                    | [42]                | [43]           | [44]             | This Work              |
|------------------------|-------------------------|---------------------|----------------|------------------|------------------------|
| Frequency (GHz)        | 1.95                    | 1.98                | 1.88           | 0.8              | 2                      |
| P <sub>out</sub> (dBm) | 28                      | 28                  | 29.8           | 22.9             | 24.1                   |
| PAE (%)                | 39 <sup>1</sup>         | 35.1                | 40.8           | 26.1             | <b>50</b> <sup>5</sup> |
| Supply (V)             | 1.2/3.7                 | 3.4                 | 3.4            | 1.2/2.4          | 1/1.8                  |
| Output Matching        | ext. tuner <sup>2</sup> | off-chip            | IPD die        | on-chip          | off-chip               |
| Standard               | WCDMA                   | WCDMA               | WCDMA          | WiFi             | WCDMA                  |
| ACPR (dBc)             | -41/-52                 | -42/NA              | -36/-50        | N/A              | -35/-45                |
| EVM (%)                | 7 <sup>3</sup>          | N/A                 | 3.5            | 5.4 <sup>4</sup> | 4.7                    |
| RXBN (dBc/Hz)          | N/A                     | N/A                 | N/A            | N/A              | -150                   |
| Technology             | 90-nm<br>CMOS           | CMOS/GaAs<br>hybrid | 153-nm<br>CMOS | 55-nm<br>CMOS    | 28-nm<br>CMOS          |
| Calibration            | None                    | None                | None           | Foreground       | Background             |

1 PAE is obtained without output matching loss

2 [41] uses an on-chip balun but an external tuner as the load

3 For 20-MHz 16-QAM LTE uplink signal

4 For 20-MHz 64-QAM WLAN, using off-chip predistortion

5 System efficiency including power of all the building blocks

TABLE II Performance Summary and Comparison to Prior Art

| Reference               | [45]                        | [46]                        | [24]                    | [47]       | This Work                   |
|-------------------------|-----------------------------|-----------------------------|-------------------------|------------|-----------------------------|
| Frequency (GHz)         | 2                           | 2.2                         | 2                       | 2.35       | 2                           |
| P <sub>out</sub> (dBm)  | 14.5                        | 24                          | 7.8                     | 20.6       | 20.9 <sup>3</sup>           |
| PAE (%)                 | 12.2                        | 16                          | 12.6                    | 16.7       | 33.2 <sup>3</sup>           |
| ACPR (dB)               | -30/-33 <sup>1</sup>        | N/A                         | -41/-50                 | -28/-29    | -51/-57 <sup>3</sup>        |
| Modulation              | 64 QAM                      | 64 QAM                      | 64 QAM                  | 64 QAM     | QPSK                        |
| EVM (%)                 | <b>4</b> <sup>1</sup>       | 3.5 <sup>1</sup>            | 1.8                     | 1.6        | <b>3</b> <sup>3</sup>       |
| Output Matching         | on-chip                     | on-chip                     | on-chip                 | on-chip    | off-chip                    |
| CMOS Technology         | 65 nm                       | 28 nm                       | 40 nm                   | 65 nm      | 28 nm                       |
| Noise Floor<br>(dBc/Hz) | -123 @<br>140 MHz<br>offset | -149 @<br>140 MHz<br>offset | N/A                     | N/A        | -150 @<br>130 MHz<br>offset |
| Calibration             | Foreground                  | Foreground                  | Foreground <sup>2</sup> | Foreground | Background                  |

1 Requires off-chip 2-dimensional predistortion

2 Foreground calibration in case of load variation

3 Refer to Fig. 18

Table I summarizes the performance of our TX and compares it with prior-art transmitters in the range of 0.8–2 GHz. Our prototype achieves the highest efficiency among these designs.

Table II compares our RX-band noise with those of transmitters in the vicinity of 2 GHz. A far higher efficiency can be observed for our prototype. Note that only our work corrects the nonlinearity in the background.

## IX. CONCLUSION

A simplified implementation of the Newton–Raphson equation solver leads to a TX embedded in a  $\Delta \Sigma$  modulator loop. Moreover, the high-speed feedback ADCs necessary for background calibration can be realized as 1-bit  $\Delta$  modulators. With other simplifications, a compact, highly efficient architecture emerges that can serve in multiple standards.

## APPENDIX

In this Appendix, we describe another approach to computing the gain of the comparator in the  $\Delta$  modulator of Fig. 9(a). The gain can be defined so as to ensure a zero correlation between the output quantization noise, Q, and the comparator input differential voltage, X [48]. Expressing the comparator output as

$$Y = A_0 X + Q \tag{6}$$

we find the correlation by writing

$$E[XQ] = E[X(Y - A_0X)] = E[XY] - A_0E[X^2]. \tag{7}$$

Thus, E[XQ] = 0 if

$$A_0 = \frac{E[XY]}{E[X^2]}. \tag{8}$$

For simplicity, let us assume that the comparator output swings between  $-V_{\text{DD}}/2$  and  $+V_{\text{DD}}/2$ . With  $V_{\text{in}} = 0$  in Fig. 9(a), *Y* toggles at a rate of  $f_{\text{CK}}/2$ , creating a triangular wave at the other input with a peak amplitude of  $[V_{\text{DD}}/(4R_1C_1)]T_{\text{CK}}$ . Note that  $X = V_{\text{in}} - V_F = -V_F$ . Since the comparator acts as a discrete-time circuit, the correlation between its input and output must be calculated at the sampling points, namely, at the positive and negative peaks of the triangular wave. We thus have

$$E[XY] = \frac{1}{2}\left[\frac{V_{\rm DD}}{4R_1C_1}T_{\rm CK}\cdot\frac{V_{\rm DD}}{2} + \left(-\frac{V_{\rm DD}}{4R_1C_1}T_{\rm CK}\right)\left(-\frac{V_{\rm DD}}{2}\right)\right]. \tag{9}$$

Also,

$$E[X^{2}] = \frac{1}{2}\left[\left(\frac{V_{\rm DD}}{4R_{1}C_{1}}T_{\rm CK}\right)^{2} + \left(-\frac{V_{\rm DD}}{4R_{1}C_{1}}T_{\rm CK}\right)^{2}\right]. \tag{10}$$

It follows that

$$A_0 = 2R_1 C_1 f_{\rm CK}. \tag{11}$$

The foregoing calculation assumes that  $V_{in} = 0$  in Fig. 9(a). If, for example,  $V_{in} = V_{DD}/4$ , then  $A_0 \approx R_1 C_1 f_{CK}$ . With a time-varying input, and depending on its statistics,  $A_0$  has an average value between  $R_1C_1f_{CK}$  and  $2R_1C_1f_{CK}$ .
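For the $V_{in} = 0$ case, the correlation-based definition (8) can be evaluated directly from the comparator's sampling instants. The sketch below builds the alternating samples described above (triangle peaks paired with the corresponding $\pm V_{DD}/2$ outputs) and confirms that the ratio equals $2R_1C_1f_{CK}$; the 250-MHz corner and 4-GHz clock are illustrative assumptions, and the helper name is hypothetical.

```python
import math

def correlation_gain(x_samples, y_samples):
    """A0 = E[XY] / E[X^2], eq. (8), estimated from comparator sampling instants."""
    n = len(x_samples)
    exy = sum(x * y for x, y in zip(x_samples, y_samples)) / n
    ex2 = sum(x * x for x in x_samples) / n
    return exy / ex2

vdd, f_ck = 1.0, 4e9                  # illustrative values
r1c1 = 1 / (2 * math.pi * 250e6)
v_peak = vdd / (4 * r1c1 * f_ck)      # triangle peak, [V_DD/(4 R1 C1)] T_CK

# V_in = 0: the comparator samples the triangle at its peaks while Y alternates.
x = [v_peak, -v_peak] * 100
y = [vdd / 2, -vdd / 2] * 100
a0 = correlation_gain(x, y)
print(a0, 2 * r1c1 * f_ck)            # both equal 2 R1 C1 f_CK
```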

According to [49], the quantization noise of a 1-bit quantizer can be approximated by  $\Delta^2/12$ , where  $\Delta$  denotes the total height of the quantizer's bang-bang characteristic. In our case,  $\Delta = V_{\text{DD}}$ , yielding  $\Delta^2/12 = V_{\text{DD}}^2/12$ .

## ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for chip fabrication.

<sup>2</sup> It is possible to increase the output power range by raising the resolution of the baseband DACs.

## REFERENCES

- [1] S. Luschas, R. Schreier, and H.-S. Lee, "Radio frequency digital-to-analog converter," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1462–1467, Sep. 2004.
- [2] R. Staszewski *et al.*, "All-digital PLL and transmitter for mobile phones," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [3] A. Jerng and C. G. Sodini, "A wideband ΔΣ digital-RF modulator for high data rate transmitters," *IEEE J. Solid-State Circuits*, vol. 42, no. 8, pp. 1710–1722, Aug. 2007.
- [4] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, "A digitally modulated polar CMOS power amplifier with a 20-MHz channel bandwidth," *IEEE J. Solid-State Circuits*, vol. 43, no. 10, pp. 2251–2258, Oct. 2008.
- [5] M. Collados, P. T. M. van Zeijl, and N. Pavlovic, "High-power digital envelope modulator for a polar transmitter in 65 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2008, pp. 733–736.
- [6] C. D. Presti, F. Carrara, G. Palmisano, and A. Scuderi, "A high-resolution 24-dBm digitally-controlled CMOS PA for multi-standard RF polar transmitters," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2008, pp. 482–485.
- [7] D. Chowdhury, L. Ye, E. Alon, and A. M. Niknejad, "A 2.4 GHz mixed-signal polar power amplifier with low-power integrated filtering in 65 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2010, pp. 1–4.
- [8] K. H. Seah and M. Y. W. Chia, "A digital polar amplifier for ultrawideband with dynamic element matching," in *Proc. IEEE Int. Conf. Ultra-Wideband*, Sep. 2010, pp. 1–4.
- [9] D. Chowdhury, L. Ye, E. Alon, and A. M. Niknejad, "An efficient mixed-signal 2.4-GHz polar power amplifier in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1796–1809, Aug. 2011.
- [10] W. Khalil, J. McCue, B. Dupaix, W. Gaber, S. Smaili, and Y. Massoud, "On the design of RF-DACs for random acquisition based reconfigurable receivers," in *Proc. IEEE Int. Symp. Circuits Syst.*, Jun. 2014, pp. 610–613.
- [11] C. Lu et al., "A 24.7 dBm all-digital RF transmitter for multimode broadband applications in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 332–333.
- [12] W. M. Gaber, P. Wambacq, J. Craninckx, and M. Ingels, "A CMOS IQ direct digital RF modulator with embedded RF FIR-based quantization noise filter," in *Proc. ESSCIRC*, Sep. 2011, pp. 139–142.
- [13] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, "A wideband 2×13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [14] W. M. Gaber, P. Wambacq, J. Craninckx, and M. Ingels, "A 21-dBm *I/Q* digital transmitter using stacked output stage in 28-nm bulk CMOS technology," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 11, pp. 4744–4757, Nov. 2017.
- [15] F. Roger, "A 200 mW 100 MHz-to-4 GHz 11<sup>th</sup>-order complex analog memory polynomial predistorter for wireless infrastructure RF amplifiers," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 94–95.
- [16] S.-M. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, "A switched-capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2977–2987, Dec. 2011.
- [17] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, "Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
- [18] H. Jin *et al.*, "Efficient digital quadrature transmitter based on IQ cell sharing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 168–169.
- [19] H. Jin, D. Kim, and B. Kim, "Efficient digital quadrature transmitter based on IQ cell sharing," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1345–1357, May 2017.
- [20] S. W. Yoo, S. C. Hung, and S. M. Yoo, "A 1W quadrature class-G switched-capacitor power amplifier with merged cell switching and linearization techniques," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, Jun. 2018, pp. 124–127.
- [21] V. Vorapipat, C. S. Levy, and P. M. Asbeck, "A class-G voltage-mode Doherty power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3348–3360, Dec. 2017.
- [22] Z. Bai, E. Yuan, A. Azam, and J. S. Walling, "A multiphase interpolating digital power amplifier for TX Beamforming in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 78–79.

- [23] R. Bhat, J. Zhou, and H. Krishnaswamy, "Wideband mixed-domain multi-tap finite-impulse response filtering of out-of-band noise floor in watt-class digital transmitters," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3405–3420, Dec. 2017.
- [24] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. De Vreede, "An intrinsically linear wideband polar digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3312–3328, Dec. 2017.
- [25] K. R. Boyle, Y. Yuan, and L. P. Ligthart, "Analysis of mobile phone antenna impedance variations with user proximity," *IEEE Trans. Antennas Propag.*, vol. 55, no. 2, pp. 364–372, Feb. 2007.
- [26] Y. Guo, C. Yu, and A. Zhu, "Power adaptive digital predistortion for wideband RF power amplifiers with dynamic power transmission," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 11, pp. 3595–3607, Nov. 2015.
- [27] S. Chung and J. L. Dawson, "Digital predistortion using quadrature  $\Delta\Sigma$  modulation with fast adaptation for WLAN power amplifiers," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2011, pp. 1–4.
- [28] User Equipment (UE) Radio Transmission and Reception (FDD) (Release 16), document 3GPP TS 25.101 V16.0.0, 3rd Generation Partnership Project; Technical Specification Group Radio Access Network, 1999.
- [29] S. M. Babamir and B. Razavi, "Relation between ACPR and INL in digital RF transmitters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, to be published.
- [30] C.-H. Lin and K. Bult, "A 10-b, 500-MSample/s CMOS DAC in 0.6 mm<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 1948–1958, Dec. 1998.
- [31] L. Duncan *et al.*, "A 10-bit DC-20-GHz multiple-return-to-zero DAC with >48-dB SFDR," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3262–3275, Dec. 2017.
- [32] B. Razavi, *Principles of Data Conversion System Design*. Piscataway, NJ, USA: IEEE Press, 1995.
- [33] J. Dawson and T. Lee, "Automatic phase alignment for a fully integrated Cartesian feedback power amplifier system," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2269–2279, Dec. 2003.
- [34] T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, and O. Watanabe, "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," in *IEEE VLSI Circuits Symp. Dig. Tech. Papers*, Jun. 1992, pp. 28–29.
- [35] R. M. Gray, "Quantization noise spectra," *IEEE Trans. Inf. Theory*, vol. 36, no. 6, pp. 1220–1244, Nov. 1990.
- [36] Z. Deng *et al.*, "A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan./Feb. 2016, pp. 172–173.
- [37] M. Mehrpoo, M. Hashemi, Y. Shen, R. van Leuken, M. S. Alavi, and L. C. N. de Vreede, "A wideband linear direct digital RF modulator using harmonic rejection and I/Q-interleaving RF DACs," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, Jun. 2017, pp. 188–191.
- [38] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, "A wideband linear I/Q-interleaving DDRM," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1361–1373, May 2018.
- [39] M. Mehrpoo, L. C. N. de Vreede, and M. S. Alavi, "Digitally-intensive transmitter having wideband, linear, direct-digital RF modulator," European Patent 2018 990, Dec. 7, 2018.
- [40] D. M. Pozar, *Microwave Engineering*. Hoboken, NJ, USA: Wiley, 2012.
- [41] K. Oishi *et al.*, "A 1.95 GHz fully integrated envelope elimination and restoration CMOS power amplifier using timing alignment technique for WCDMA and LTE," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 2915–2924, Dec. 2014.
- [42] K. Yamamoto *et al.*, "A WCDMA multiband power amplifier module with Si-CMOS/GaAs-HBT hybrid power-stage configuration," *IEEE Trans. Microw. Theory Techn.*, vol. 64, no. 3, pp. 810–825, Mar. 2016.
- [43] J. Ko *et al.*, "A high-efficiency multiband class-F power amplifier in 0.153μm bulk CMOS for WCDMA/LTE applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 40–41.
- [44] Y. Yin, L. Xiong, Y. Zhu, B. Chen, H. Min, and H. Xu, "A compact dual-band digital Doherty power amplifier using parallel-combining transformer for cellular NB-IoT applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 408–410.
- [45] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, "A quadrature switched capacitor power amplifier in 65 nm CMOS," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, May 2015, pp. 135–138.
- [46] R. Bhat, J. Zhou, and H. Krishnaswamy, "A >1W 2.2GHz switched-capacitor digital power amplifier with wideband mixed-domain multi-tap FIR filtering of OOB noise floor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 234–235.
- [47] J. S. Park, S. Hu, Y. Wang, and H. Wang, "A highly linear dual-band mixed-mode polar power amplifier in CMOS with an ultracompact output network," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1756–1770, Aug. 2016.
- [48] S. H. Ardalan and J. J. Paulos, "An analysis of nonlinear behavior in delta-sigma modulators," *IEEE Trans. Circuits Syst.*, vol. CAS-34, no. 6, pp. 593–603, Jun. 1987.
- [49] R. Schreier and G. C. Temes, *Understanding Delta-Sigma Data Converters*. Piscataway, NJ, USA: IEEE Press, 2005.



**Behzad Razavi** (Fellow, IEEE) received the B.S.E.E. degree from the Sharif University of Technology, Tehran, Iran, in 1985, and the M.S.E.E. and Ph.D.E.E. degrees from Stanford University, Stanford, CA, USA, in 1988 and 1992, respectively.

He was with AT&T Bell Laboratories, Middletown, NJ, USA, and Hewlett-Packard Laboratories, Palo Alto, CA, USA, until 1996. Since 1996, he has been an Associate Professor and subsequently Professor of electrical engineering with the University of California at Los Angeles, Los Angeles, CA, USA. He was an Adjunct Professor with Princeton University, Princeton, NJ, USA, from 1992 to 1994, and at Stanford University in 1995. He is the author of *Principles of Data Conversion System Design* (IEEE Press, 1995), *RF Microelectronics* (Prentice Hall, 1998, 2012) (translated to Chinese, Japanese, and Korean), *Design of Analog CMOS Integrated Circuits* (McGraw-Hill, 2001, 2016) (translated to Chinese, Japanese, and Korean), *Design of CMOS Phase-Locked Loops* (Cambridge University Press, 2020), and *Fundamentals of Microelectronics* (Wiley, 2006) (translated to Korean, Portuguese, and Turkish), and the Editor of *Monolithic Phase-Locking in High-Performance Systems* (IEEE Press, 2003). His current research includes wireless and wireline transceivers and data converters.

Dr. Razavi is a member of the U.S. National Academy of Engineering. He received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998, and the McGraw-Hill First Edition of the Year Award in 2001. He also received the Lockheed Martin Excellence in Teaching Award in 2006, the UCLA Faculty Senate Teaching Award in 2007, the CICC Best Invited Paper Award in 2009 and in 2012, and the 2012 Donald Pederson Award in Solid-State Circuits. He was also a recipient of the American Society for Engineering Education PSW Teaching Award in 2014. He received the 2017 IEEE CAS John Choma Education Award. He was a co-recipient of both the Jack Kilby Outstanding Student Paper Award and the Beatrice Winner Award for Editorial Excellence at the 2001 ISSCC. He was also a co-recipient of the 2012 and the 2015 VLSI Circuits Symposium Best Student Paper Awards and the 2013 CICC Best Paper Award. He was also recognized as one of the top ten authors in the 50-year history of ISSCC. He served on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC) from 1993 to 2002 and VLSI Circuits Symposium from 1998 to 2002. He has also served as a Guest Editor and an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and International Journal of High Speed Electronics. He presently serves as the Editor-in-Chief of the IEEE SOLID-STATE CIRCUITS LETTERS. He has served as an IEEE Distinguished Lecturer.



Seyed-Mehrdad Babamir (Student Member, IEEE) received the B.Sc. and M.Sc. degrees from the Sharif University of Technology, Tehran, Iran, in 2013 and 2015, respectively, and the Ph.D. degree from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 2019, all in electrical engineering.

He held a postdoctoral scholar position with the Communication Circuits Laboratory, UCLA. He has been with Broadcom Inc., San Diego, CA, USA, since 2019. His research interests include analog, RF, and millimeter-wave integrated circuit design for wireless transceivers and frequency synthesizers.

Dr. Babamir was a recipient of the Broadcom Foundation Fellowship from 2017 to 2018.