# A 0.009 mm<sup>2</sup> Wide-Tuning Range Automatically Placed-and-Routed ADPLL in 14-nm FinFET CMOS

David M. Moore, *Student Member, IEEE*, Thucydides Xanthopoulos, *Member, IEEE*, Scott Meninger, and David D. Wentzloff, *Senior Member, IEEE*

Abstract: An automatically placed-and-routed all-digital PLL for high-performance clock generation and distribution in many-core processors is presented. The proposed design leverages a phase domain architecture, and features an embedded TDC constructed from standard cells. TDC resolution is enhanced through the use of a phase interpolator. The standard cell library is extended with custom oscillator cells designed to achieve optimized power, performance, and area. Timing closure and potentially metastable paths are also handled by the digital design flow. The clock generator has been fabricated in a 14-nm FinFET process, and occupies an area of 0.009 mm². The design features a tuning range from 1.0 to 5.5 GHz (80%). Output period jitter of 1.29 ps is achieved with a power consumption of 9.7 mW.

Index Terms: All-digital phase-locked loop (ADPLL), place-and-route, resolution-enhancement, small area, synthesizable, wide tuning range.

#### I. INTRODUCTION

All-digital phase-locked loops (ADPLLs) have gained widespread usage as clock generators in modern digital systems, due to the many advantages they offer over their analog counterparts when implemented in advanced process nodes. Recently, design approaches using fully synthesized ADPLLs have been demonstrated, allowing for faster implementation and integration by using automated digital place-and-route design flows [1]-[5]. These designs leverage cells designed to fit into the standard cell grid to construct a digitally controlled oscillator (DCO) and time-to-digital converter (TDC). These cells may come from a standard cell library or be custom designed, and are assembled using the digital automatic place-and-route (APR) flow.
However, many of the designs presented to date have functioned over limited tuning ranges, which do not extend over the frequency ranges commonly employed in modern processor designs. Specifically, multiple designs use an injection-locking architecture, which reduces area and noise at the expense of frequency range [2]-[4]. Additionally, most have been designed in mature processes, and have not had to cope with the increased routing concerns introduced by FinFET processes, which feature very high routing parasitic resistance [1]-[5].

In this letter, we present an automatically placed-and-routed ADPLL in a 14-nm FinFET process, in which all steps of the physical design are scripted, and therefore fully automated and portable to other processes. This ADPLL also features a phase-interpolated embedded TDC to improve resolution. The ADPLL is designed to meet the challenges of modern many-core processor clocking, including multi-GHz output frequency, balanced duty cycle, low period jitter, and a wide tuning range for dynamic frequency scaling. It is also designed around FinFET DRC requirements, such as quantized device widths and a fixed device length. Rather than injection locking, the design is based around a phase domain architecture [6].

Manuscript received January 19, 2018; revised March 12, 2018; accepted March 21, 2018. Date of publication April 17, 2018; date of current version May 4, 2018. This paper was approved by Associate Editor Samuel Palermo. (Corresponding author: David M. Moore.)

- D. M. Moore and D. D. Wentzloff are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: mooreday@umich.edu).
- T. Xanthopoulos and S. Meninger are with Cavium Inc., Marlborough, MA 01752 USA.

Digital Object Identifier 10.1109/LSSC.2018.2827880

Fig. 1. Conventional phase domain architecture.
By using optimized driver and capacitor cells to construct the DCO, and a synthesized resolution-enhancing phase interpolator, the proposed ADPLL achieves the highest output frequency and widest tuning range of any synthesized PLL to date, and the smallest area for a > 1 GHz clock generator.

# II. SYNTHESIZABLE ADPLL DESIGN

A conventional phase-domain ADPLL is shown in Fig. 1. Phase information is measured using the combination of a counter and a TDC. For each reference cycle, the counter provides the integer number of DCO cycles which have occurred, while the TDC provides the fractional number of DCO cycles. The DCO phase measurement is then used in the feedback loop to drive the oscillator phase to its desired value. However, in this configuration, in-band jitter performance is limited by TDC resolution and linearity, making it difficult to implement using APR.

The proposed PLL block diagram is shown in Fig. 2. All of the design is implemented using placed-and-routed standard cells, with the exception of the DCO, which is implemented using placed-and-routed custom DCO cells. Phase detection is achieved using an embedded TDC [7]. In an embedded TDC, flip-flops sample the internal phase signals of a ring oscillator, and the resulting digital value is decoded to produce a number representing the location of the propagating edge inside the oscillator. This in turn directly corresponds to the phase of the oscillator. Rather than having a specific time resolution, the TDC resolution is tied to the oscillator frequency. Because the embedded TDC obtains phase information from the oscillator stages, no gain calibration is required, and PVT variations are also inherently tracked. This helps limit INL in the TDC.

To enhance TDC resolution, the existing oscillator phases are interpolated as shown in Fig. 3. Inverters from each stage have their outputs shorted together to produce an edge that occurs between the two input edges.
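The decoding step described above, together with the effect of interpolation, can be sketched behaviorally. This is a minimal idealized model rather than the authors' RTL: the eight-phase ring, the perfectly spaced 50% duty-cycle taps, and the names `sample_taps`, `decode_edge`, and `dco_phase` are all assumptions made for illustration, and bubble errors and metastability are ignored.

```python
def sample_taps(theta, n_phases):
    """Idealized sampler (assumption): tap k is a 50% duty-cycle square wave
    delayed by k/n_phases of a DCO period; theta is the true fractional
    DCO phase in [0, 1) at the sampling instant."""
    return [1 if ((theta - k / n_phases) % 1.0) < 0.5 else 0
            for k in range(n_phases)]


def decode_edge(taps):
    """Return the index of the most recently risen tap, found as the single
    cyclic 1 -> 0 boundary in the sampled word."""
    n = len(taps)
    for k in range(n):
        if taps[k] == 1 and taps[(k + 1) % n] == 0:
            return k
    raise ValueError("no propagating edge found in sampled word")


def dco_phase(counter, taps):
    """Combine the integer cycle count with the TDC fraction (in DCO cycles)."""
    return counter + decode_edge(taps) / len(taps)


# Hypothetical example: 12 whole cycles counted, true fraction 0.37.
print(dco_phase(12, sample_taps(0.37, 8)))    # 8 native phases -> 12.25
print(dco_phase(12, sample_taps(0.37, 16)))   # 16 interpolated phases -> 12.3125
```

In this model, doubling the number of sampled phases halves the quantization step from 1/8 to 1/16 of a DCO period, which is the sense in which interpolation improves resolution; the step is a fraction of the oscillator period rather than a fixed time.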
Buffers for noninterpolated edges use the same gates as interpolated edges in order to ensure matched drive strength, improving edge alignment. Both the TDC samplers and the interpolating inverters were implemented using standard cells in the normal digital flow. The individual gates were placed inside or between the oscillator stages that they were sampling, making the wiring delay to each sampler negligible. Clock routing to the sampler flip-flops was performed using the digital clock tree synthesis tool with additional skew constraints, allowing for simultaneous skew optimization across corners. All of these customizations were performed using capabilities built into the digital backend tools, and are therefore directly portable to other designs.

Fig. 2. Block diagram of the proposed PLL. Clock domains are highlighted as specified during synthesis and timing.

Fig. 3. Segment of phase interpolated embedded TDC. Placement group constraint for driving automated placement is highlighted.

The PLL is designed to operate up to 5 GHz, which is twice the desired frequency of 2.5 GHz, so that a divider can be used to provide a clean 50% duty cycle output, required for many memory circuits. Additionally, the PLL is tunable down to 1 GHz and features a wide bandwidth to enable processor DVFS.

Operation at 5 GHz places stringent timing requirements on the feedback counter, so the signal is predivided by 4 before being fed into the counter. The divide-by-2 and divide-by-4 signals are then concatenated with the counter output, recovering the lost edge information, as shown in Fig. 4. The associated timing issues arising from having a digital signal with three separate driving clock sources are identified and handled by the place-and-route tool. This scheme enables the digital tool to perform the counter clock routing with relaxed constraints, but still maintain the full counter resolution. The digital tools account for the skew between original and generated clocks.
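The predivider scheme above can be illustrated with a small behavioral model showing why concatenating the divider states with the slow counter recovers the full-rate edge count. This is only a sketch under assumptions not stated in the letter: the dividers are modeled as toggle flops that ripple on falling transitions, the bit ordering is `{counter, div4, div2}`, and the function names and the 8-bit counter width are hypothetical.

```python
def run_prescaled_counter(n_edges, counter_bits=8):
    """Apply n_edges DCO rising edges to a divide-by-2 / divide-by-4 chain
    feeding a slow counter. Only the slow counter runs at 1/4 of the DCO rate."""
    div2 = div4 = counter = 0
    for _ in range(n_edges):
        div2 ^= 1                      # toggles on every DCO edge
        if div2 == 0:                  # div2 fell: clock the next stage
            div4 ^= 1
            if div4 == 0:              # div4 fell: clock the slow counter
                counter = (counter + 1) % (1 << counter_bits)
    return counter, div4, div2


def full_resolution_count(counter, div4, div2):
    """Concatenate the slow counter with the divider states as the two LSBs."""
    return (counter << 2) | (div4 << 1) | div2


counter, div4, div2 = run_prescaled_counter(11)
print(counter, div4, div2)                         # 2 1 1
print(full_resolution_count(counter, div4, div2))  # 11
```

In the fabricated design the three fields are driven by three related clocks, so their alignment is a timing-closure problem that the letter leaves to the place-and-route tool; the model here assumes all three are sampled coherently.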
The original full speed clock is used for reference retiming, but is made available at the output to extend the frequency range if the duty cycle requirements of the PLL are not of primary concern. The retiming circuit is a TDC-based path-selection type, implemented in RTL.

Fig. 4. DCO counter predivider detail. The digital tool inserts hold fixing to handle clock alignment.

Fig. 5. Architecture of the synthesized DCO.

The DCO architecture is shown in Fig. 5. The oscillator is based on a switchable pseudo-differential cell optimized for the FinFET process. This cell provides coarse tuning of the oscillator over the 1–5 GHz range. Additional capacitor cells provide fine tuning of the oscillator frequency. These cells are illustrated in Fig. 6. Two capacitors were placed in each cell in order to improve matching in the differential nodes. In order to achieve the desired frequency resolution at very high frequencies, the voltage to the source and drain terminals is switched between VDD and ground, rather than disconnecting the capacitors from the circuit entirely. Using this method, a delay resolution of 300 fs was achieved.

Fig. 6. Custom oscillator cells. Coarse driver cell and fine capacitor cell.

Fig. 7. Block diagram indicating the functionality of the on-chip data capture system.

All cells were designed with SLVT devices, which feature lower variation in this process. Symmetrical cell layout is used to minimize mismatch due to double patterning. Layout-dependent effects have a strong impact on device strength in this process, so the diffusion shape inside the cells was optimized to maintain a constant width across multiple neighboring transistors in order to reduce mismatch after physical implementation. Additional dummy fill was included on both sides of the cell to avoid influences from adjacent logic cells. Coarse and fine frequency steps were optimized to meet range and resolution requirements with the minimum total cell area.
As a result, the full tuning range is covered by 128 coarse and 128 fine cells.

In order to achieve the desired oscillator frequency while taking advantage of the digital place-and-route flow, additional layout constraints were applied in the digital flow. First, an automated method was used to apply placement guides to each stage of the oscillator, ensuring that cells from adjacent stages would be placed near each other. Additionally, the digital router was configured to use wider routing and include additional spacing between phase signals. These two constraints were fully scripted and automated, using capabilities built into the digital backend tools. This substantially reduced parasitic resistance and capacitance in the oscillator phase routing, enabling the oscillator to function at the frequencies of interest.

Fig. 8. Measured TDC nonlinearity with 5 GHz oscillator frequency, divided to 2.5 GHz, measured using the embedded oscilloscope.

Using an all-digital architecture and leveraging the digital flow also greatly eases implementation of design-for-test features. To take advantage of this, an on-chip test-measurement system was implemented. This fully synthesized system functions like a multichannel digital oscilloscope embedded in the controller. It includes a memory, and is able to select and capture multiplexed digital signals based on a selectable trigger signal with a programmable delay. After the arrival of the trigger signal, which may be the chip reset, a delay counter counts a specified number of reference cycles. After this delay, the signal(s) selected for measurement are recorded into the memory on the rising edge of the reference clock each cycle for a fixed number of cycles. This oscilloscope enables measurement of TDC linearity, lock time, and in-band phase noise (as measured by the error signal), without external test equipment. This functionality is highlighted in Fig. 7.
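The trigger, delay, and record sequence described above can be sketched as a short behavioral model, with one list element per reference cycle. The function name `capture`, its arguments, and the convention that recording starts exactly `delay_cycles` reference cycles after the trigger cycle are assumptions for illustration; the letter does not give the register interface or the exact cycle alignment.

```python
def capture(signal, trigger, delay_cycles, depth):
    """Record `depth` samples of `signal`, starting `delay_cycles` reference
    cycles after the first cycle in which `trigger` is asserted.

    signal  : value of the selected (multiplexed) signal in each reference cycle
    trigger : truthy in the cycle where the trigger (e.g. reset) arrives
    """
    memory = []
    remaining_delay = None            # None until the trigger has been seen
    for value, trig in zip(signal, trigger):
        if remaining_delay is None:
            if not trig:
                continue              # still waiting for the trigger
            remaining_delay = delay_cycles
        if remaining_delay > 0:
            remaining_delay -= 1      # delay counter counts reference cycles
            continue
        if len(memory) < depth:
            memory.append(value)      # one sample per reference clock edge
    return memory


# Hypothetical stimulus: signal value equals its cycle index, trigger in cycle 3.
cycles = list(range(20))
trigger = [c == 3 for c in cycles]
print(capture(cycles, trigger, delay_cycles=5, depth=4))   # [8, 9, 10, 11]
```

With the phase error selected as the signal and the chip reset as the trigger, a capture of this kind would contain the locking transient, from which lock time could be read off directly.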
This system allows verification of all in-band behavior of the PLL without the use of an analog tester, in addition to its use during debug.

### III. MEASUREMENT RESULTS

The ADPLL was fabricated in a 14-nm FinFET process. The block occupies 0.0195 mm², of which more than half is used by the integrated oscilloscope. The area of the PLL itself is 0.009 mm². The free-running DCO consumes 7.6 mW from a 0.95-V supply, while the controller consumes 2.1 mW. The ADPLL output frequency tunes from 1.0 GHz to 5.5 GHz across process corners.

TDC nonlinearity was measured using the on-chip measurement system. Measurements of TDC INL are shown in Fig. 8. Due to the inherent relationship between the DCO phases and the TDC value in an embedded TDC, the INL values remain small despite the usage of APR for physical implementation. The maximum DNL value is 0.84 LSBs. The phase noise spectrum is shown in Fig. 9.

Fig. 9. Measured phase noise and output spectrum with 5 GHz oscillator frequency, divided to 2.5 GHz.

TABLE I. PERFORMANCE COMPARISON

| | This Work | Deng, ISSCC 2015 [2] | Kong, VLSI 2016 [10] | Jang, ISSCC 2017 [8] | Cho, ISSCC 2017 [9] |
|---|---|---|---|---|---|
| Process | 14nm FinFET | 65nm | 45nm | 28nm SOI | 28nm |
| Freq. [GHz] (Freq. Span) | 1.0-5.5 (4.5) | 0.8-1.7 (0.9) | 2.3-2.6 (0.3) | 0.8-3.2 (2.4) | 0.25-1.0 (0.75) |
| Ref. [MHz] | 50 | 50-400 | 22.6 | 50 | 250 |
| Power [mW] | 9.7 | 3.0 | 6.4 | 5.0 | 15.2 |
| Area [mm<sup>2</sup>] | 0.009 | 0.048 | 0.03 | 0.049 | 0.0047 |
| Period Jitter [ps] | 1.29 | N/A | N/A | N/A | N/A |
| Integ. Jitter [ps] | 4.71 | 3.6 | 1.68 | 2.52 | 3.3 |
| FoM [dB] | -216.8 | -224.2 | -227.4 | -226.5 | -218 |
| Synthesized? | Yes | Yes | No | No | Yes |
| Type | Int-N | Frac-N | Frac-N | Int-N | Frac-N |
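The FoM row of Table I can be cross-checked from the integrated jitter and power rows using the jitter-power definition $10\log_{10}[(\sigma_t/1~\mathrm{s})^2 (P_{DC}/1~\mathrm{mW})]$ that this letter uses. The sketch below is only a sanity check on rounded table entries; the function name `jitter_power_fom_db` is hypothetical.

```python
import math


def jitter_power_fom_db(sigma_t_s, power_mw):
    """Jitter-power figure of merit in dB: 10*log10[(sigma_t / 1 s)^2 * (P / 1 mW)]."""
    return 10.0 * math.log10((sigma_t_s / 1.0) ** 2 * (power_mw / 1.0))


# (integrated jitter in seconds, power in mW), taken from the rounded table rows
rows = {
    "This work": (4.71e-12, 9.7),
    "Kong, VLSI 2016": (1.68e-12, 6.4),
    "Cho, ISSCC 2017": (3.3e-12, 15.2),
}
for name, (sigma_t, power) in rows.items():
    print(f"{name}: {jitter_power_fom_db(sigma_t, power):.1f} dB")
```

For this work the rounded inputs give about -216.7 dB, slightly different from the reported -216.8 dB. The letter does not explain the gap; rounding of the inputs or a slightly different operating point are possible causes, but that is a guess.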
An intermittent logic error degraded phase noise at low frequencies. Nevertheless, integrated jitter at the maximum frequency was measured to be 4.71 ps, while period jitter was measured at 1.29 ps. The phase noise is $-101~\mathrm{dBc/Hz}$ at 10 MHz offset, which agrees with simulated values for in-band noise in the absence of the logic error. Measurements were taken with a 50 MHz reference. The PLL achieves an FoM of -216.84 dB at 2.5 GHz (divided) output frequency, where the FoM is defined as $10\log_{10}\left[(\sigma_t/1~\mathrm{s})^2\,(P_{DC}/1~\mathrm{mW})\right]$.

Table I summarizes the ADPLL performance and compares it against similar works. The tuning range and maximum frequency of the proposed design exceed the specifications of many published ring oscillators, while maintaining an extremely small area and specifications suitable for processor clocking.

Fig. 10. Partial die photograph showing PLL dimensions.

## IV. CONCLUSION

In this letter, we have demonstrated an area-optimized fully synthesized ADPLL targeting multi-GHz clocking applications. A prototype featuring an integrated measurement system was fabricated in 14-nm FinFET CMOS. The test chip achieved 1.29 ps of period jitter with an FoM of -216.8 dB. The ADPLL achieves the smallest area among PLLs with > 1 GHz output. A die photograph is shown in Fig. 10. This design represents a step forward for synthesized PLLs, demonstrating that they can be implemented at practical frequencies in the advanced processes where design time savings will be the most significant.

#### REFERENCES

- [1] Y. Park and D. D. Wentzloff, "An all-digital PLL synthesized from a digital standard cell library in 65nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, USA, 2011, pp. 1–4.
- [2] W. Deng et al., "A 0.048mm<sup>2</sup> 3mW synthesizable fractional-N PLL with a soft injection-locking technique," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2015, pp. 252–253.
- [3] W. Deng et al., "A 0.0066mm<sup>2</sup> 780μW fully synthesizable PLL with a current-output DAC and an interpolative phase-coupled oscillator using edge-injection technique," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 266–267.
- [4] W. Deng et al., "A 0.022mm<sup>2</sup> 970μW dual-loop injection-locked PLL with -243dB FOM using synthesizable all-digital PVT calibration circuits," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2013, pp. 248–249.
- [5] S. Kim et al., "A 2 GHz synthesized fractional-N ADPLL with dual-referenced interpolating TDC," *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 391–400, Feb. 2016.
- [6] R. B. Staszewski and P. T. Balsara, "Phase-domain all-digital phase-locked loop," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 52, no. 3, pp. 159–163, Mar. 2005.
- [7] M. S.-W. Chen, D. Su, and S. Mehta, "A calibration-free 800 MHz fractional-N digital PLL with embedded TDC," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2819–2827, Dec. 2010.
- [8] T. Jang et al., "A 2.5ps 0.8-to-3.2GHz bang-bang phase- and frequency-detector-based all-digital PLL with noise self-adjustment," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2017, pp. 148–150.
- [9] H. Cho et al., "A 0.0047mm<sup>2</sup> highly synthesizable TDC- and DCO-less fractional-N PLL with a seamless lock range of f<sub>REF</sub> to 1GHz," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2017, pp. 154–155.
- [10] L. Kong and B. Razavi, "A 2.4-GHz 6.4-mW fractional-N inductorless RF synthesizer," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Honolulu, HI, USA, 2016, pp. 9–10.