Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas

Similar documents
CSE241 VLSI Digital Circuits UC San Diego

ECE260B CSE241A Winter Logic Synthesis

L2: Design Representations

ECE260B CSE241A Winter Logic Synthesis

ECE 3060 VLSI and Advanced Digital Design

VLSI CAD Flow: Logic Synthesis, Placement and Routing Lecture 5. Guest Lecture by Srini Devadas

TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS

1/28/2013. Synthesis. The Y-diagram Revisited. Structural Behavioral. More abstract designs Physical. CAD for VLSI 2

Don t expect to be able to write and debug your code during the lab session.

Delay Estimation for Technology Independent Synthesis

ECE 5745 Complex Digital ASIC Design Topic 12: Synthesis Algorithms

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

IMPLEMENTATION DESIGN FLOW

Lab 3 Verilog Simulation Mapping

COE 561 Digital System Design & Synthesis Introduction

Advances In Industrial Logic Synthesis

ESE535: Electronic Design Automation. Today. LUT Mapping. Simplifying Structure. Preclass: Cover in 4-LUT? Preclass: Cover in 4-LUT?

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Synthesis. Other key files. Standard cell (NAND, NOR, Flip-Flop, etc.) FPGA CLB

Unit 4: Formal Verification

Field Programmable Gate Arrays

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

What is Verilog HDL? Lecture 1: Verilog HDL Introduction. Basic Design Methodology. What is VHDL? Requirements

Electronic Design Automation Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Programmable Logic Devices II

Digital Design with FPGAs. By Neeraj Kulkarni

ECE 4514 Digital Design II. Spring Lecture 20: Timing Analysis and Timed Simulation

ECE 565: VLSI CAD Flow + Brief Intro to HLS, Logic Optimization & Technology Mapping. Shantanu Dutt ECE Dept., UIC

Exploring Logic Block Granularity for Regular Fabrics

problem maximum score 1 10pts 2 8pts 3 10pts 4 12pts 5 7pts 6 7pts 7 7pts 8 17pts 9 22pts total 100pts

Lecture 3: Modeling in VHDL. EE 3610 Digital Systems

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

CS137: Electronic Design Automation

3. Implementing Logic in CMOS

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

BUILDING BLOCKS OF A BASIC MICROPROCESSOR. Part 1 PowerPoint Format of Lecture 3 of Book

ICS 252 Introduction to Computer Design

Digital VLSI Testing Prof. Santanu Chattopadhyay Department of Electronics and EC Engineering India Institute of Technology, Kharagpur.

CAD for VLSI Design - I. Lecture 21 V. Kamakoti and Shankar Balachandran

CS211 Digital Systems/Lab. Introduction to VHDL. Hyotaek Shim, Computer Architecture Laboratory

Linking Layout to Logic Synthesis: A Unification-Based Approach

Review. EECS Components and Design Techniques for Digital Systems. Lec 05 Boolean Logic 9/4-04. Seq. Circuit Behavior. Outline.

More Course Information

HIGH-LEVEL SYNTHESIS

Verilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Spiral 2-8. Cell Layout

EECS150 - Digital Design Lecture 10 Logic Synthesis

problem maximum score 1 8pts 2 6pts 3 10pts 4 15pts 5 12pts 6 10pts 7 24pts 8 16pts 9 19pts Total 120pts

Chapter 1 Overview of Digital Systems Design

TSEA44 - Design for FPGAs

Optimal Latch Mapping and Retiming within a Tree

PINE TRAINING ACADEMY

DIGITAL CIRCUIT LOGIC UNIT 9: MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

Eliminating Routing Congestion Issues with Logic Synthesis

INTRODUCTION TO FPGA ARCHITECTURE

Graphics: Alexandra Nolte, Gesine Marwedel, Universität Dortmund. RTL Synthesis

101-1 Under-Graduate Project Digital IC Design Flow

N-input EX-NOR gate. N-output inverter. N-input NOR gate

Why Should I Learn This Language? VLSI HDL. Verilog-2

Speaker: Kayting Adviser: Prof. An-Yeu Wu Date: 2009/11/23

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Schematic design. Gate level design. 0 EDA (Electronic Design Assistance) 0 Classical design. 0 Computer based language

scription of the circuit using a high-level design language (HDL) such as VHDL or Verilog. The behavior of this description is checked using simulatio

Synthesis and Optimization of Digital Circuits

University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences. Spring 2010 May 10, 2010

EECS150 - Digital Design Lecture 10 Logic Synthesis

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

EECS150 - Digital Design Lecture 8 - Hardware Description Languages

Overview of EECS 244:

ABC basics (compilation from different articles)

TOPIC : Verilog Synthesis examples. Module 4.3 : Verilog synthesis

Verilog Tutorial. Introduction. T. A.: Hsueh-Yi Lin. 2008/3/12 VLSI Digital Signal Processing 2

Gate-Level Minimization

Design Methodologies

CONTENTS CHAPTER 1: NUMBER SYSTEM. Foreword...(vii) Preface... (ix) Acknowledgement... (xi) About the Author...(xxiii)

Synthesis at different abstraction levels

CS429: Computer Organization and Architecture

Logic Synthesis. EECS150 - Digital Design Lecture 6 - Synthesis

Programmable Memory Blocks Supporting Content-Addressable Memory

Based on slides/material by. Topic Design Methodologies and Tools. Outline. Digital IC Implementation Approaches

Music. Numbers correspond to course weeks EULA ESE150 Spring click OK Based on slides DeHon 1. !

Question Total Possible Test Score Total 100

SP3Q.3. What makes it a good idea to put CRC computation and error-correcting code computation into custom hardware?

Injntu.com Injntu.com Injntu.com R16

VLSI Design 13. Introduction to Verilog

Verilog for High Performance

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

EE-382M VLSI II. Early Design Planning: Front End

Digital Design Methodology

Institute of Engineering & Management

Topics of this Slideset. CS429: Computer Organization and Architecture. Digital Signals. Truth Tables. Logic Design

Lecture 1: Introduction Course arrangements Recap of basic digital design concepts EDA tool demonstration

6.004 Computation Structures Spring 2009

RTL Synthesis using Design Compiler. Dr Basel Halak

SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I

VLSI Test Technology and Reliability (ET4076)

ARM 64-bit Register File

Lecture 15: System Modeling and Verilog

Transcription:

Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist logic optimization a b 0 1 s d clk q netlist a b 0 1 d q physical design layout s clk Kurt Keutzer 1

Logic Optimization netlist a b 0 1 s d clk q Perform a variety of transformations and optimizations Library logic optimization netlist a b 0 1 s d clk q Structural graph transformations Boolean transformations Mapping into a physical library smaller, faster less power Kurt Keutzer 3 Combinational Logic Optimization Input: Output: Initial Boolean network Timing characterization for the module - input arrival times and drive factors - output loading factors Optimization goals - output required times Target library description Minimum-area net-list of library gates which meets timing constraints A very difficult optimization problem! Kurt Keutzer 4

Modern Approach to Logic Optimization Divide logic optimization into two subproblems: Technology-independent optimization - determine overall logic structure - estimate costs (mostly) independent of technology - simplified cost modeling Technology-dependent optimization (technology mapping) - binding onto the gates in the library - detailed technology-specific cost model Orchestration of various optimization/transformation techniques for each subproblem Kurt Keutzer 5 Logic Optimization -level Logic opt netlist Library logic optimization tech independent tech dependent multilevel Logic opt netlist Library Timing Constraints Kurt Keutzer 6 3

Closed Book Technology Library A standard cell technology or library may contain many hundreds of cells Typical cells are NAND, NOR, NOT, AOI (ANDor-Invert), OAI (Or-And-Invert) etc. A B A A C A A C AB+C B Kurt Keutzer 7 Library Contains for each cell: Functional information: cell = a *b * c Timing information: function of input slew intrinsic delay output capacitance non-linear models used in tabular approach Physical footprint (area) Power characteristics Wire-load models - function of Library Block size Wiring Kurt Keutzer 8 4

Elements of a library - 1 Element/Area Cost INVERTER NAND 3 NAND3 4 NAND4 5 Kurt Keutzer 9 Elements of a library - AOI1 4 Element/Area Cost AOI 5 Kurt Keutzer 10 5

Reasonable Library Inverter, Buffer ND-ND4; NOR-NOR4; AND- AND4; AOI1 - AOI333; OAI1 - OAI333 XOR, XNOR MUX, Full Adder Neg-Edge Triggered D-Flip-Flop Pos-Edge Triggered D-FF J-K FF Above with various clears, enables Scan versions of each of the above Most of the above in 6 different power sizes: 1x, x, 4x, 6x, 8x, 16x Kurt Keutzer 11 Input Circuit Netlist ``subject DAG Kurt Keutzer 1 6

Problem statement Find an ``optimal (in area, delay, power) mapping of a circuit into the technology library (simple example below): Kurt Keutzer 13 Is there a problem? Trivial Covering #1 subject DAG 7 NAND (3) = 1 5 INV () = 10 Area cost 31 Kurt Keutzer 14 7

Covering # INV = 4 NAND = 6 1 NAND3 = 4 1 NAND4 = 5 Area cost 19 Kurt Keutzer 15 Covering #3 Costs: 31, 19, 17 Yes, there s a problem! 1 INV = 1 NAND = 3 NAND3 = 8 1 AOI1 = 4 Area Cost 17 Kurt Keutzer 16 8

History of the Problem - 1 Technology mapping in 1986 was a big problem Almost every design group (e.g. AT&T) had their own library ASIC 400 cells Microprocessor/DSP 00 base cells Government 00+ cells Every group had their own approach to mapping ``Do what you have to do! handcrafted mappers tied to particular libraries and optimization tools ``Rule-based systems e.g. GE Socrates very slow ``expert systems that made no guarantee on final quality of result Kurt Keutzer 17 History of the Problem - Yes, there are two problems: Technology mapping can significant affect the area, speed, and power dissipation of a circuit There are over 00 different semiconductors each with multiple internal libraries how to create a tool that can utilize a diverse set of libraries?? Kurt Keutzer 18 9

A similar problem code generation Example of code generation in compilers using tree-covering Handles complex instruction sets Handles complex libraries Easily portable to other instruction sets Easily portable to Kurt Keutzer 19 Problem Formulation: DAG Covering Represent input netlist in normal form subject DAG Represent each library gate with normal forms for the logic function primitive DAGs Each primitive DAG has a cost Goal: Find a minimum cost covering of the subject DAG by the primitive DAGs Normal form: -input NAND gates and inverters K. Keutzer, DAGON: Technology Binding and Local Optimization by DAG Matching, in Proceedings of the 4th Design Automation Conference, 1987 and 5 Years of Design Automation Kurt Keutzer 0 10

Step 1: Extract Combinational Logic B Flip-flops inputs Combinational Logic outputs Since FF s don t need to be optimized with surrounding combinational logic we can partition them out Kurt Keutzer 1 Step : Normalize Circuit Netlist ``subject DAG Reduce the netlist into ND gates Kurt Keutzer 11

Step 3a: Normalize library Element/Area Cost Tree Representation (normal form) INVERTER NAND 3 NAND3 4 NAND4 5 Kurt Keutzer 3 Step 3b: Normalize library Element/Area Cost Tree Representation (normal form) AOI1 4 AOI 5 Kurt Keutzer 4 1

Step 4: DAG Covering Sound Algorithmic approach NP-hard optimization problem K. Keutzer, D. Richards, Computation Complexity of Logic Synthesis and Optimization, in Proceedings of the International Workshop on Logic Synthesis, 1989 multiple fanout Tree covering heuristic: If subject and primitive DAGs are trees, efficient algorithm can find optimum cover dynamic programming formulation Kurt Keutzer 5 Solution formulation 1) Partition input netlist into forest of trees ) Solve each tree optimally using tree covering 3) Stitch trees back together Kurt Keutzer 6 13

Resulting Trees Break at multiple fanout points Kurt Keutzer 7 For each tree - Dynamic Programming Principle of optimality: Optimal cover for a tree consists of a best match at the root of the tree plus the optimal cover for the sub-trees starting at each input of the match K. Keutzer, DAGON: Technology Binding and Local Optimization by DAG Matching, in Proceedings of the 4th Design Automation Conference, 1987 y x Kurt Keutzer 8 p z Best cover for this match uses best covers for p, z Best cover for this match uses best covers for x, y, z Choose least cost tree-cover at root 14

Example of Optimal Tree Covering INV 11 + = 13 AOI1 4 + 3 = 7 NAND + 6 + 3 = 11 INV NAND 3 + 3 = 6 NAND 3 NAND 3 Kurt Keutzer 9 DAG covering in detail 1) partition DAG into a forest of trees ) normalize netlist 3) optimally cover each tree a) generate all candidate matches b) find the optimal match using dynamic programming Kurt Keutzer 30 15

Partition DAG into Forest of trees Each gate with fanout >1 becomes root of a new tree Kurt Keutzer 31 Normalize netlist Re-express netlist into -input Nand gates and Inverters Make each tree left-oriented Kurt Keutzer 3 16

Generate candidate matches - 1 subject tree At the end of this segment each gate in the subject tree is annotated with every possible library cell that could be rooted at that gate What are some ways we can generate matches? Kurt Keutzer 33 Generating candidate matches - Naïve approach - try to match each cell in the library with each node of the tree (libraries can be large! - beware of large constants!!) Better approach build tables such that only potential candidate matches are checked Best approach fancy string matching - pp. 86-869 Introduction to Algorithms, T. Cormen, C. Lesierson, R. Rivest, The MIT Press, Second Printing, 1996. - pp. 86-869 What s the complexity of each approach? Kurt Keutzer 34 17

Optimal tree covering - 1 3 3 ``subject tree Kurt Keutzer 35 Optimal tree covering - 3 8 3 5 ``subject tree Kurt Keutzer 36 18

Optimal tree covering - 3 3 8 13 3 5 ``subject tree Cover with ND or ND3? 1 NAND 3 + subtree 5 Area cost 8 1 NAND3 = 4 Kurt Keutzer 37 Optimal tree covering 3b 3 8 13 3 5 4 ``subject tree Label the root of the sub-tree with optimal match and cost Kurt Keutzer 38 19

Optimal tree covering 4a 3 Cover with INV or AO1? 8 13 5 4 ``subject tree 1 Inverter + subtree 13 Area cost 15 1 AO1 4 + subtree 1 3 + subtree Area cost 9 Kurt Keutzer 39 Optimal tree covering 4b 3 8 13 9 5 4 ``subject tree Label the root of the sub-tree with optimal match and cost Kurt Keutzer 40 0

Optimal tree covering - 5 9 Cover with ND or ND3? 8 4 ``subject tree NAND subtree 1 9 subtree 4 1 NAND 3 Area cost 16 subtree 1 8 subtree subtree 3 4 1 NAND3 4 Area cost 18 NAND3 Kurt Keutzer 41 Optimal tree covering 5b 9 8 16 4 ``subject tree Label the root of the sub-tree with optimal match and cost Kurt Keutzer 4 1

Optimal tree covering - 6 Cover with INV or AOI1? 13 16 5 ``subject tree INV subtree 1 16 1 INV AOI1 subtree 1 13 subtree 5 1 AOI1 4 Area cost 18 Area cost Kurt Keutzer 43 Optimal tree covering 6b 13 18 16 5 ``subject tree Label the root of the sub-tree with optimal match and cost Kurt Keutzer 44

Optimal tree covering - 7 Cover with ND or ND3 or ND4? ``subject tree Kurt Keutzer 45 Cover 1 - NAND Cover with ND? 9 18 16 4 ``subject tree subtree 1 18 subtree 0 1 NAND 3 Area cost 1 Kurt Keutzer 46 3

Cover - NAND3 Cover with ND3? 9 4 ``subject tree subtree 1 9 subtree 4 subtree 3 0 1 NAND3 4 Area cost 17 Kurt Keutzer 47 Cover - 3 Cover with ND4? 8 4 ``subject tree subtree 1 8 subtree subtree 3 4 subtree 4 0 1 NAND4 5 Area cost 19 Kurt Keutzer 48 4

Optimal Cover was Cover ND AOI1 Cover with ND3? ND3 INV ND3 ``subject tree Clear that greedy doesn t work well What s the complexity? INV ND 3 ND3 8 AOI1 4 Area cost 17 Kurt Keutzer 49 Computational Complexity To determine the optimal cover for a tree we only need to consider a best cost match at the root of the tree This is constant-time in the number of matched cells Plus the optimal cover for the sub-trees starting at each input of the match This is constant-time in the indegree/fan-in of each match What s the complexity? O(n) - amazing! y x Kurt Keutzer 50 p z Best cover for this match uses best covers for p, z Best cover for this match uses best covers for x, y, z Choose least cost tree-cover at root 5

Enhancements to DAG covering Many enhancements incorporated over the last decade Timing optimization incorporating load-dependent delays Rudell - UCB Optimization for low power Application to FPGAs J. Rose - Chortle J. Cong - Flowmap Optimal direct DAG covering without tree covering approximation (didn t net much) Kurt Keutzer 51 Summary of Technology Mapping DAG covering formulation Separated library issues from mapping algorithm Heuristics based on tree covering for area and delay surprisingly efficient final result - for technology/library dependent reasons Very efficient linear time Very flexible approach applicable to wide range of libraries (standard cell, gate array) and technologies (FPGAS) Best enhancement is integration of technology decomposition Also requires ``follow up rule based approaches for best final circuit efficiency Kurt Keutzer 5 6

Why does this approximation work well? Each gate with fanout >1 becomes root of a new tree Kurt Keutzer 53 Why does this approximation work well? Few non-tree cells XOR, MUX one-level deep Kurt Keutzer 54 7

Why does this approximation work well? Non-tree matching usually requires duplication rarely a benefit for area Kurt Keutzer 55 Kurt Keutzer 56 8

Retrospective DAG covering by tree-covering is effective for four reasons separates library definition and characterization from mapping algorithm Duplication of logic not a win in terms of area optimization. Advantage of duplication of logic for timing is very (physical) context dependent provided an efficient mapping in what appears to be a relatively flat solution space Very computationally efficient so suitable to VLSI scale (millions of gates) netlist Principal weaknesses Problems handling multiplexor-trees, full-adders, other DAG patterns Problems in performing performance optimization tricks in tight pipelined logic Kurt Keutzer 57 Extra Slides Kurt Keutzer 58 9

Typical library costs 3 4 3 3 7 Kurt Keutzer 59 But what if? 3 4 3 3 4 Kurt Keutzer 60 30

Strong (or Boolean) Division Given a function f to be strong divided by g Add an extra input to f corresponding to g, namely G and obtain function h as follows h DC = G g + G g h ON = f ON h DC h OFF = f ON + h DC Minimize h using two-level minimizer Kurt Keutzer 61 Typical library costs 3 4 3 3 7 Kurt Keutzer 6 31

But what if? 3 4 3 3 4 Kurt Keutzer 63 3