Non Uniform On Chip Power Delivery Network Synthesis Methodology

Similar documents
A Machine Learning Framework for Register Placement Optimization in Digital Circuit Design

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Low-Power Technology for Image-Processing LSIs

Department of Electrical and Computer Engineering, University of Rochester, Computer Studies Building,

Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h)

Adaptive Power Blurring Techniques to Calculate IC Temperature Profile under Large Temperature Variations

8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing

Fast, Accurate A Priori Routing Delay Estimation

Cluster-based approach eases clock tree synthesis

Receiver Modeling for Static Functional Crosstalk Analysis

Multi-Voltage Domain Clock Mesh Design

SYNTHESIS FOR ADVANCED NODES

CHAPTER 1 INTRODUCTION

Multilayer Routing on Multichip Modules

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation

Crosstalk Aware Static Timing Analysis Environment

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

Physical Design Closure

A Transistor-level Symmetrical Layout Generation for Analog Device

FPGA Power Management and Modeling Techniques

EEM870 Embedded System and Experiment Lecture 2: Introduction to SoC Design

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs

Comprehensive Place-and-Route Platform Olympus-SoC

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Regularity for Reduced Variability

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS)

A Study of IR-drop Noise Issues in 3D ICs with Through-Silicon-Vias

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

CAD Algorithms. Placement and Floorplanning

Planning for Local Net Congestion in Global Routing

Experience in Critical Path Selection For Deep Sub-Micron Delay Test and Timing Validation

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

CMOS Logic Gate Performance Variability Related to Transistor Network Arrangements

Advanced Modeling and Simulation Strategies for Power Integrity in High-Speed Designs

Three-Dimensional Integrated Circuits: Performance, Design Methodology, and CAD Tools

Exploring Logic Block Granularity for Regular Fabrics

Full-chip Multilevel Routing for Power and Signal Integrity

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

Routability-Driven Bump Assignment for Chip-Package Co-Design

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

On Using Machine Learning for Logic BIST

Congestion-Aware Power Grid. and CMOS Decoupling Capacitors. Pingqiang Zhou Karthikk Sridharan Sachin S. Sapatnekar

On GPU Bus Power Reduction with 3D IC Technologies

ECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS

940 IEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 62, NO. 3, MARCH 2015

Effects of FPGA Architecture on FPGA Routing

Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

Testability Optimizations for A Time Multiplexed CPLD Implemented on Structured ASIC Technology

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

ASIC design flow considering lithography-induced effects

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Hardware-Software Codesign. 1. Introduction

EE 434 ASIC & Digital Systems

EE582 Physical Design Automation of VLSI Circuits and Systems

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

Placement Algorithm for FPGA Circuits

Minimum Implant Area-Aware Placement and Threshold Voltage Refinement

Embedded SRAM Technology for High-End Processors

Tree Structure and Algorithms for Physical Design

Power Optimization in FPGA Designs

The Design and Implementation of a Low-Latency On-Chip Network

Diagonal Routing in High Performance Microprocessor Design

A Framework for Systematic Evaluation and Exploration of Design Rules

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

Post-Route Gate Sizing for Crosstalk Noise Reduction

Rimon IKENO, Takashi MARUYAMA, Tetsuya IIZUKA, Satoshi KOMATSU, Makoto IKEDA, and Kunihiro ASADA

By Charvi Dhoot*, Vincent J. Mooney &,

Investigation and Comparison of Thermal Distribution in Synchronous and Asynchronous 3D ICs Abstract -This paper presents an analysis and comparison

On Increasing Signal Integrity with Minimal Decap Insertion in Area-Array SoC Floorplan Design

RTL2GDS Low Power Convergence for Chip-Package-System Designs. Aveek Sarkar VP, Technology Evangelism, ANSYS Inc.

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

Hierarchical Simulation and Optimization of 3D Power Delivery Network

10. Interconnects in CMOS Technology

HS3-DPG: Hierarchical Simulation for 3-D P/G Network

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.

How Much Logic Should Go in an FPGA Logic Block?

Wirelength Estimation based on Rent Exponents of Partitioning and Placement Λ

VERY large scale integration (VLSI) design for power

Iterative-Constructive Standard Cell Placer for High Speed and Low Power

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.

Basic Idea. The routing problem is typically solved using a twostep

Interaction and Balance of Mask Write Time and Design RET Strategies

Fast Computation of Temperature Profiles of VLSI ICs with High Spatial Resolution

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

Constructive floorplanning with a yield objective

Full Chip False Timing Path Identification: Applications to the PowerPC TM Microprocessors

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

An Interconnect-Centric Design Flow for Nanometer Technologies

Metal-Density Driven Placement for CMP Variation and Routability

Buffer Design and Assignment for Structured ASIC *

A General Sign Bit Error Correction Scheme for Approximate Adders

Transcription:

Non Uniform On Chip Power Delivery Network Synthesis Methodology Patrick Benediktsson Institution of Telecomunications, University of Lisbon, Portugal Jon A. Flandrin Institution of Telecomunications, University of Lisbon, Portugal Chen Zheng Intel Corp. USA Abstract In this paper, we proposed a non-uniform power delivery network (PDN) synthesis methodology. It first constructs initial PDN using uniform approach. Then preliminary power integrity analysis is performed to derive IR-safe candidate window. Congestion map is obtained based global route congestion estimation. A self-adaptive non-uniform PDN synthesis is then performed to globally and locally optimize PDN over selected regions. The PDN synthesis is congestion-driven and IRguarded. Experimental results show significant timing important in trade-off small PDN length reduction with no EM/IR impact. We further explored potential power savings using our non-uniform PDN synthesis methodology. Keywords power delivery network, nonuniform, power integrity, congestion node. Failure to satisfy power integrity will lead to reliability issues or chip malfunction [2]. Meanwhile, wire RC delay has become increasingly significant and dominants the propagation delay of a large portion of critical paths [3]; therefore, either more and more routing layers are added to increase routing resource or PDN resource needs to be carefully assigned to save space for signal routing. Thus, efficient PDN design and optimization is the key to deliver successful higher performance chip in current and future technology nodes. In this paper we propose a non-uniform PDN design methodology that optimizes PDN both globally and locally that occupies less routing resource compared to traditional PDN design. The saved routing space helps to improve signal routing in congested area and achieve better timing results. Further exploration shows that potential power saving can be obtained by leakage reduction through cell threshold voltage optimization. 1. Introduction Morden semiconductor integrated circuit design has been scaling drastically into nanometer feature size range. To achieve better performance, higher power density and more routing resource is required. Both post challenges to on-chip power delivery network (PDN) design, which needs to deliver power supply to every transistor on die. The PDN needs to meet electromigration (EM) and IR limit to guarantee power integrity of a chip [1]. EM limit requires current density through each PDN wire segment and via to be lower than a threshold value, while IR limit requires the voltage drop when reaching transistor source 2. Motivation In traditional PDN design, designers first estimate the chip power density, current density and IR drop based on technology information, such as cell power, sheet resistance, etc. Then a uniform PDN is designed to meet the power integrity requirement based on estimation. A few iterations are performed to modify the initial version to eventually close EM/IR limit [4]. PDN design and optimization technique has been widely investigated by researchers. In [5], S. Kose proposed a power grid design based on effective resistance. Z. S. Zheng provided a tradeoff optimization with voltage regulation in [6]. While

in [7], T. Hayashi investigated power grid optimization algorithm based on manufacturing cost restriction. However, there are a few limitations to previous conventional approach: (1) since cell density is not uniformly distributed, the power density estimation is usually over pessimistic to cover possible local power hotspots; (2) since PDN is designed first before place and route happens, PDN designers have no knowledge how the actual power distribution look like, thus it is impossible to apply non-uniform style PDN to optimize for different design blocks. Figure 1 shows a typical IR drop distribution and how PDN designed for high percentile power density could be over-designed for low percentile power density. Often, temperature and variation effects are included in the analysis to obtain more realistic results [9]. The two limitations implies inevitable over-design of traditional style PDN. To overcome this shortage, we propose an EM/IRaware and congestion-driven non-uniform PDN design methodology that globally meets power integrity requirement and locally optimizes signal routing resource. To our best knowledge, this is the first systematic work to address the nonuniform style PDN synthesis. low percentile over-designed Fig 1. Typical IR distribution high percentile The non-uniform PDN methodology starts with traditional uniform PDN construction. The uniform PDN is fed into regular place, clock synthesis and post-clock optimization. Then a preliminary EM/IR analysis is run on a pre-route database. Based on the analysis results, we identify design regions that are IR safe as potential non-uniform PDN modification candidate. A routing congestion map based on global route engine is used to drive the actual non-uniform PDN synthesis. In later sections, we discuss in detail the candidate selection, nonuniform PDN synthesis and timing/power benefits using our proposed methodology. 3. Candidate Selection In our approach, the entire design is divided into unit size windows (e.g. 20um x 20um). Preliminary EM/IR result and routing congestion map for each window is derived and used as basic metrics for candidate selection. The reasoning for running a preliminary EM/IR analysis on a pre-route database is that cell placement are largely determined by the time post-clock optimization is done, so the result correlates well with the final sign-off EM/IR analysis. It is critical to avoid reduce PDN over EM/IR critical regions as reliability is a key factor to guarantee quality design [8].Besides, although non-uniform PDN synthesis can be introduced after route, during engineering-change-order (ECO) stage, it will be best to interfere before route stage for router to best leverage its benefits. Later in this section, we will discuss how fast IR analysis and congestion estimation is done for each unit size window. 3.1 Fast IR analysis, guard-band window To determining if a window is IR-safe, we use single hotspot filtering and average margin threshold mechanism. Due to the continuity of voltage drop distribution [10], it is not sufficient to claim a window is IRsafe even if all instances voltage drop is below required limit. For example, if window A is IRsafe, while window B adjacent to window A is potentially IR-sensitive, then modifying PDN over window A may affect window B and cause IR failure in it. To address this concern, we introduce guard-band window, which is a larger window that extends beyond the original unit size window. In our experiment, we found a guardband length that is same as unit step serves the purpose of estimating IR-drop margin well. Figure 2 illustrate how guard-band window and unit size window work together to identify IRsafe candidate. 3.2 Congestion Estimation Since our target is to reduce PDN occupancy and give back resource to signal routing, it is

unit size window guard-band Fig 2. Guard-band window important to estimate the routing congestion within a given unit size window. Global router will start with a minimum spanning tree (MST) and converted it to a steiner tree [11]. Depending on the horizontal/vertical traffic and capacity across all layers, routing congestion is estimated. Figure 3 illustrates the global route congestion estimation model. High pin density usually causes routing congestion hotspot and result in overflow issue. Detailed router either needs to detour or to end up with high density routing that introduces more parasitic loads and causes cross-talk noise issue. Reducing PDN occupancy over those regions will help improve the routing results. 4. Non-Uniform PDN Synthesis The non-uniform PDN synthesis is a self-adaptive process. It works on the selected candidate windows, and calculates the ratio of PDN reduction that needs to be applied to the given candidate window. We define a target function F for the non-uniform PDN synthesis:! 1 F = α congestion + β (0 F 50%) (1) IR drop To avoid over optimistic PDN reduction, we set the maximum allowed reduction rate to be 50%. It is necessary to leave certain margin as variation effect can degrade EM/IR results significantly [12]. In equation (1), α! is the coefficient for congestion score and β! is the weight for IR-drop. Equation (1) implies the congestion-driven and IR-drop guarded characteristic of the proposed non-uniform PDN synthesis. Figure 4 illustrates how the non-uniform PDN synthesis works on a candidate window. Due to the redundancy property of PDN, we are able to maintain the power integrity with non-uniform PDN synthesis [13]. It should be noted that the PDN reduction should not impact regular signal wires as their EM-sensitivity is low [14]. In [15], a systematic solution to achieve EM reliable design is also provided. 33% PDN reduction on candidate Fig 3. Global route congestion In our experiments, we rely on the global router congestion map to select the candidate for nonuniform PDN synthesis for most benefits gain. Fig 4. Non-uniform PDN synthesis

5. Experimental Results We used two blocks from the open source design or1200_fpu_fcmp and or1200_genpc [16]. Physical design is implemented using Synopsis ICC2 [17]. We used RedHawk [18] for preliminary EM/IR analysis. True worst case current analysis [19] is also included to account for EM analysis. After non-uniform PDN synthesis, layout aware analysis [20] is performed to ensure power integrity is not affected. Figure 5 shows the IR distribution before and after the non-uniform PDN synthesis. It shows that IR impact from non-uniform PDN synthesis is minimal and the outlier part remains the same, thus no extra effort is introduced on designer s side to close IR. It also implies EM reliability is satisfied as voltage drop and current density is closely correlated [21]. Table 1 shows the PDN length reduction on some of the metal layers. The larger amount of PDN length reduction on vertical layers indicate that the design the vertically congested. Table 2 shows that timing improvement after route between baseline and the proposed non-uniform PDN synthesis. Notable improvements are marked with gray. It can be observed that with small portion of PDN reduction, timing analysis results can improve significantly. This is because the congestiondriven non-uniform PDN synthesis targets at the critical windows which give the most benefits gain. Besides timing benefits, the non-uniform PDN synthesis can also provide potentially power benefits. One such example is to remove the congestion-driven feature, but brute-force apply the non-uniform PDN synthesis. By doing this, some positive paths can have chance to swap certain cells to higher threshold voltage, thus saving leakage power. Table 3 shows cell count for different threshold voltages and overall power savings. 6. Conclusions In this paper, we discuss the limitations of traditional uniform PDN synthesis. The uniform PDN is usually pessimistic and over-designed. We propose a self-adjust non-uniform PDN synthesis methodology. It uses preliminary IR analysis and congestion map as candidate selection criteria. (a) or1200_fpu_fcmp baseline (b) or1200_fpu_fcmp non-uniform PDN (c) or1200_genpc (d) or1200_genpc non-uniform PDN Fig 5. Comparison of non-uniform PDN synthesis

Table 1. PDN length comparisons or1200_fpu_fcmp or1200_genpc PDN length base non-uniform delta base non-uniform delta M2 7357.763 7317.763 0.00% 649286.5775 597906.5775 7.91% M3 336815.783 315505.463 6.33% 264770.718 238230.718 10.02% M4 193545.941 190465.941 1.42% 159921.1195 157821.1195 1.31% M5 139799.8615 130459.8615 5.99% 114523.972 105023.972 8.30% M6 214127.1025 211027.1025 1.43% 160059.527 157959.527 1.31% M7 167574.516 158114.516 6.03% 117514.9265 107894.9265 8.19% Guard-band window is introduced to minimize IR impact. Target function is defined to selfadaptively reduce PDN over candidate windows. Results show promising timing improvement with zero IR impact. We further explored potential power saving with brute-force PDN reduction. Results show average of 2.95% total power saving. As power deliver network has become more and more challenge to design and close power integrity as well as save sufficient space for routing, we believe there are more optimization space for non-uniform PDN design that can be explored further. This technique can also combine with other existing power grid optimization Table 2. Comparison summary fpu base fpu exp delta pc base pc exp delta TNS 93.147 78.916 15.28% 423.535 384.095 9.31% TNS (REG) 37.675 26.202 30.45% 274.671 247.898 9.75% MAX (IR) 64.4 64.3 57.7 57.9 MEDIAN 24.1 24.8 22.3 22.5 PTILE90 35.0 35.5 31.2 31.4 Table 3. Power comparisons fpu base fpu exp pc base pc exp SVT 498872 598298 240943 282149 LVT 309843 323078 118274 135539 ULVT 573954 535897 259384 224948 ELVT 15493 13049 10394 6525 Power 1784.3 1712.9 432.7 408.9

techniques [22-25] to further improve PDN design. References [1] L. Yang, J. H. Kim, D. Oh, H. Lan and R. Schmitt, "Power integrity characterization and correlation of 3D package systems using on-chip measurements," 19th Topical Meeting on Electrical Performance of Electronic Packaging and Systems, Austin, TX, 2010, pp. 221-224. [2] D. Li, et al. Variation-aware electromigration analysis of power/ground networks. Proceedings of the International Conference on Computer-Aided Design, 2011. [3] R. Gupta, B. Tutuianu and L. T. Pileggi, "The Elmore delay as a bound for RC trees with generalized input signals," in IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, vol. 16, no. 1, pp. 95-104, Jan 1997. [4] R. Panda, D. Blaauw, R. Chaudhry, V. Zolotov, B. Young and R. Ramaraju, "Model and analysis for combined package and on-chip power grid simulation," Low Power Electronics and Design, 2000. ISLPED '00. Proceedings of the 2000 International Symposium on, Rapallo, Italy, 2000, pp. 179-184. [5] S. Köse and E. G. Friedman, "Fast algorithms for power grid analysis based on effective resistance," Proceedings of 2010 IEEE International Symposium on Circuits and Systems, Paris, 2010, pp. 3661-3664. [6] Z. Zeng, X. Ye, Z. Feng and P. Li, "Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation," Design Automation Conference, Anaheim, CA, 2010, pp. 831-836. [7] T. Hayashi, M. Fukui and S. Tsukiyama, "A new power grid optimization algorithm based on manufacturing cost restriction," 2009 European Conference on Circuit Theory and Design, Antalya, 2009, pp. 703-706. [8] R. Reis. Circuit Design for Reliability, Springer, New York, 2015. [9] D. A. Li, M. Marek-Sadowska and S. R. Nassif, "T- VEMA: A Temperature- and Variation-Aware Electromigration Power Grid Analysis Tool," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2327-2331, Oct. 2015. [10] J. Wang, X. A. Xiong and X. B. Zheng, "Deterministic Random Walk A New Preconditioner for Power Grid Analysis," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 11, pp. 2606-2616, Nov. 2015. [11] T. Taghavi, et al. Tutorial on congestion prediction. Proceedings of the international workshop on System level interconnect prediction, 2007. [12] D. A. Li, M. Marek-Sadowska and S. R. Nassif, "A Method for Improving Power Grid Resilience to Electromigration-Caused via Failures," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 1, pp. 118-130, Jan. 2015. [13] D. Li, et al. Variation-aware electromigration analysis of power/ground networks. Proceedings of the International Conference on Computer-Aided Design, 2011. [14] D.Li, et al.on-chip em-sensitive interconnect structures. Proceedings of the international workshop on system level interconnect prediction, 2010. [15] D. Li. CAD Solutions for Preventing Electromigration on Power Grid Interconnects. https:// www.alexandria.ucsb.edu/lib/ark:/48907/f3z31wrh, 2013. [16] https://github.com/openrisc/or1200 [17] https://www.synopsys.com/implementation-andsignoff/physical-implementation/ic-compiler-ii.html [18] http://www.ansys.com/electronics/products/ semiconductors/ansys-redhawk [19] D. Li and M. Marek-Sadowska, "Estimating true worst currents for power grid electromigration analysis," Fifteenth International Symposium on Quality Electronic Design, Santa Clara, CA, 2014, pp. 708-714. [20] D. Li, et al. Layout Aware Electromigration Analysis of Power/Ground Networks. Circuit Design for Reliability, 2015. [21] F. Chen et al., "Investigation of emerging middleof-line poly gate-to-diffusion contact reliability issues," 2012 IEEE International Reliability Physics Symposium (IRPS), Anaheim, CA, 2012, pp. 6A. 4.1-6A.4.9. [22] J. N. Kozhaya, S. R. Nassif and F. N. Najm, "A multigrid-like technique for power grid analysis," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 10, pp. 1148-1160, Oct 2002. [23] C. Zhuo, J. Hu, M. Zhao and K. Chen, "Power Grid Analysis and Optimization Using Algebraic Multigrid," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 4, pp. 738-751, April 2008. [24] Z. Feng and P. Li, "Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms," 2008 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, 2008, pp. 647-654. [25] H. Qian and S. S. Sapatnekar, "Hierarchical random-walk algorithms for power grid analysis," ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No. 04EX753), 2004, pp. 499-504.