Clock Routing

Problem Formulation. Clock and power nets have strict routing specifications, so they require specialized algorithms. It is better to develop specialized routers for these nets than to over-complicate the general router. In many designs, both kinds of net are routed manually. Sophisticated and accurate clock routing tools are a must for high-performance designs. CAD for VLSI 2

Clock Routing. Clock synchronization is one of the most critical considerations in designing high-performance VLSI circuits. Data transfer between functional elements is synchronized by the clock, and it is desirable to design a circuit with the fastest possible clock. The clock signal is typically generated external to the chip and provided to the chip through the clock pin.

Clock Routing (contd.). Each functional unit that needs the clock is connected to the clock pin by the clock net. Ideally, the clock must arrive at all functional units at precisely the same time. In practice, clock skew exists: the maximum difference in the arrival time of the clock at two different components. Skew forces the designer to be conservative and use a larger period between clock pulses, i.e. a lower clock frequency.

Clocking Schemes. The clock is a simple pulsating signal alternating between 0 and 1. [Figure: CLK waveform with the clock period t marked.] Digital systems use a number of clocking schemes:
1. Single-phase clocking with latches
2. Single-phase clocking with flip-flops
3. Two-phase clocking

Single-phase Clocking with Latches. The latch opens when the clock goes high, accepts data continuously while the clock is high, and closes when the clock goes low. This scheme is not commonly used because of its complicated timing requirements, although some high-performance circuits use it. [Figure: latch L with inputs D and CLK and output Q.]

Single-phase Clocking with Flip-flops. Data are accepted only on the rising or falling edge of the clock. [Figure: flip-flop FF with inputs D and CLK and output Q.]

Two-phase Clocking. Uses two latches, one called the master and the other the slave. [Figure: MASTER latch (D1, Q1), block CL, and SLAVE latch (D2, Q2), with clock signals CLK and CLK' and a delay Δ.]

Clocking Schemes (contd.). As a rule of thumb, most systems cannot tolerate a clock skew of more than 10% of the system clock period. A good clock distribution strategy is therefore necessary, and it is a requirement for designing high-performance circuits.
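
The skew definition and the 10% rule of thumb can be checked numerically. The following is a minimal sketch; the function names, the arrival times and the clock periods are illustrative assumptions, not values from the slides.

```python
def clock_skew(arrival_times):
    """Maximum difference in clock arrival time between any two components."""
    return max(arrival_times) - min(arrival_times)


def within_skew_budget(arrival_times, period, fraction=0.10):
    """True if the skew does not exceed `fraction` of the clock period.

    The default of 0.10 is the 10% rule of thumb; it is a guideline,
    not a hard limit.
    """
    return clock_skew(arrival_times) <= fraction * period


# Hypothetical arrival times (ns) at four clocked components.
arrivals_ns = [1.20, 1.35, 1.10, 1.28]

print(clock_skew(arrivals_ns))               # skew in ns
print(within_skew_budget(arrivals_ns, 5.0))  # 5 ns period
print(within_skew_budget(arrivals_ns, 2.0))  # 2 ns period
```

With these numbers the same clock tree fits a 5 ns period but not a 2 ns period, which is how skew ends up limiting the clock frequency.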

Clock Buffering Mechanisms. The clock signal is global in nature, so clock lines are typically very long. Long wires have large capacitances, which limit the performance of the system, and RC delay becomes a major factor. RC delay cannot be reduced by making the wires wider: resistance falls, but capacitance rises. Buffers are used instead. They reduce the delay significantly and also help to preserve the clock waveform, but they may occupy as much as 5% of the total chip area.
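
A small numerical sketch can illustrate both claims. It assumes a simplified model that is not stated in the slides: a uniform wire with Elmore delay 0.5·R·C, resistance scaling as 1/width and capacitance scaling as width (fringing capacitance ignored), and ideal buffers with a fixed delay and no input capacitance or output resistance. All parameter values are hypothetical.

```python
def wire_delay(r_per_um, c_per_um, length_um):
    """Elmore estimate for a uniform distributed RC wire: 0.5 * R * C."""
    return 0.5 * (r_per_um * length_um) * (c_per_um * length_um)


def widened(r_per_um, c_per_um, k):
    """Assumed scaling: a wire k times wider has R / k and C * k."""
    return r_per_um / k, c_per_um * k


def buffered_delay(r_per_um, c_per_um, length_um, n_segments, t_buf):
    """Delay of n equal wire segments separated by n - 1 ideal buffers."""
    segment = wire_delay(r_per_um, c_per_um, length_um / n_segments)
    return n_segments * segment + (n_segments - 1) * t_buf


# Hypothetical wire: 0.1 ohm/um, 0.2 fF/um, 10 mm long.
r, c, length = 0.1, 0.2e-15, 10000.0

base = wire_delay(r, c, length)
wide = wire_delay(*widened(r, c, 2.0), length)
split = buffered_delay(r, c, length, n_segments=4, t_buf=20e-12)

print(base, wide, split)
```

Under these assumptions the widened wire has the same delay as the original, while splitting the wire into four buffered segments cuts the quadratic wire term by a factor of four at the cost of three buffer delays.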

Clock Buffering: Approach 1. Use a big, centralized buffer. This is better from the point of view of skew minimization.

Clock Buffering: Approach 2. Distribute buffers in the branches of the clock tree. Use identical buffers so that the delay introduced by the buffers is equal in all branches. A regular layout of the clock tree and equalization of the buffer loads help to reduce clock skew.

Clock Routing Algorithms. How can skew be minimized? Distribute the clock signal so that the interconnections carrying it to the functional sub-blocks are equal in length. Several clock routing algorithms try to achieve this goal: the H-tree based algorithm, the X-tree based algorithm, the MMM algorithm, the weighted center algorithm, and zero clock skew routing.

H-tree based Algorithm. Assume that all clock terminals are arranged in a symmetrical fashion, as in the case of gate arrays.

H-tree based Algorithm (contd.). In (a), all points are exactly 7 units from the point P0, and hence the skew is zero. This ensures minimum-delay routing as well. For example, P0 and P3 are at a rectilinear distance of 7. The construction can be generalized to n points, where n is a power of 4.
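
The equal-length property can be verified with a short recursive construction. This is a sketch under assumed geometry (each level halves the arm length, and the dimensions are arbitrary), so it does not reproduce the 7-unit figure referred to above; the function name is hypothetical.

```python
def h_tree_leaves(cx, cy, half, levels, dist=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree.

    Each level draws one 'H' centred on (cx, cy): a horizontal bar of
    half-length `half` and, at each end, a vertical bar of half-length
    `half`. The four tips of the 'H' become the centres of the next level.
    """
    if levels == 0:
        return [(cx, cy, dist)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            # Rectilinear path from the centre to a tip: |dx| + |dy|.
            leaves += h_tree_leaves(cx + dx, cy + dy, half / 2,
                                    levels - 1, dist + 2 * half)
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=2)
lengths = {round(d, 9) for _, _, d in leaves}
print(len(leaves), lengths)
```

Two levels give 16 leaves (a power of 4), and the set of root-to-leaf wire lengths contains a single value, so the length-based skew is zero.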

X-tree based Algorithm. An alternative tree structure with a smaller delay, assuming non-rectilinear routing is possible. Although apparently better than the H-tree, it may cause crosstalk because wires lie in close proximity. Like the H-tree, it applies only to very special structures and not in general.

Method of Means & Medians (MMM). Follows a strategy very similar to the H-tree algorithm. It recursively partitions the circuit into two equal parts and connects the center of mass of the whole circuit to the centers of mass of the two partitioned sub-circuits.

MMM (contd.). How is the partitioning done? Let L_x denote the list of clock points sorted by x-coordinate, and let P_x be the median of L_x. Assign the points to the left of P_x in the list to P_L and the remaining points to P_R. Next comes a horizontal partition, which splits a set of points by y-coordinate into two sets P_B and P_T. The process is repeated on each resulting subset, alternating between the two directions.

MMM (contd.). The basic algorithm ignores blockages and produces a non-rectilinear tree, and some wires may intersect. In a second phase, each wire is converted so that it consists only of rectilinear segments and avoids blockages.
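
The first phase of MMM described above can be sketched as follows. This is a simplified illustration, not the published algorithm in full: it alternates x and y median cuts, returns straight (non-rectilinear) centroid-to-centroid edges, and ignores blockages. The function names and the sample points are assumptions.

```python
def centroid(points):
    """Center of mass of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm(points, split_x=True):
    """Return clock-tree edges as (parent_xy, child_xy) pairs.

    The points are sorted along the current axis and cut at the median;
    the centroid of the whole set is connected to the centroid of each
    half, and each half is processed with the other axis.
    """
    if len(points) <= 1:
        return []
    axis = 0 if split_x else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    root = centroid(points)
    edges = []
    for half in (ordered[:mid], ordered[mid:]):
        edges.append((root, centroid(half)))
        edges += mmm(half, not split_x)
    return edges


terminals = [(0, 0), (4, 0), (0, 4), (4, 4)]
for parent, child in mmm(terminals):
    print(parent, "->", child)
```

For the four corner terminals the root sits at (2, 2), the first cut produces branch points at (0, 2) and (4, 2), and the second cut connects each branch point to its two terminals.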

Zero Skew Clock Routing. Based on the Elmore delay model. The resistance and capacitance of an edge are proportional to its length, and the delay along a path is defined recursively from these values. The point set is recursively partitioned into two subsets, and trees are constructed bottom-up. Assume, inductively, that every sub-tree has achieved zero skew. Given two zero-skew sub-trees, merge them with an edge so that the new tree also has zero skew. This requires deciding the position of the connecting point (the tap) on that edge, which is computed using the Elmore delay model.
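
The merge step can be made concrete by equating the Elmore delays of the two branches and solving for the tap position. The sketch below assumes a connecting wire of fixed length with per-unit resistance r and capacitance c, and sub-trees summarized by their internal delay t and total capacitance C; these symbols and the sample values are assumptions introduced here. When the balanced tap falls outside the wire, the wire would have to be lengthened, which this sketch does not attempt.

```python
def tap_fraction(t1, c1, t2, c2, length, r, c):
    """Fraction x of the wire, measured from sub-tree 1, where the tap goes.

    Solves r*x*L*(c*x*L/2 + c1) + t1 == r*(1-x)*L*(c*(1-x)*L/2 + c2) + t2.
    """
    R = r * length  # total wire resistance
    C = c * length  # total wire capacitance
    x = (t2 - t1 + R * (c2 + C / 2)) / (R * (C + c1 + c2))
    if not 0.0 <= x <= 1.0:
        raise ValueError("tap lies outside the wire; elongation needed")
    return x


def branch_delay(t_sub, c_sub, wire_len, r, c):
    """Elmore delay from the tap through a wire into a zero-skew sub-tree."""
    R = r * wire_len
    C = c * wire_len
    return R * (C / 2 + c_sub) + t_sub


# Hypothetical, unit-free values for two zero-skew sub-trees.
t1, c1, t2, c2 = 3.0, 1.0, 4.0, 2.0
length, r, c = 10.0, 0.1, 0.2

x = tap_fraction(t1, c1, t2, c2, length, r, c)
d1 = branch_delay(t1, c1, x * length, r, c)
d2 = branch_delay(t2, c2, (1 - x) * length, r, c)
print(x, d1, d2)
```

The tap lands closer to the slower, more heavily loaded sub-tree (x = 0.8 here), and both branch delays come out equal, so the merged tree again has zero skew.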

Power and Ground Routing

Basic Problem. In a design, almost all blocks require power and ground connections. Power and ground nets are usually laid out entirely on the metal layer(s) of the chip because metal has a lower resistivity. A planar, single-layer implementation is desirable, since contacts (vias) also add significantly to the parasitics. Routing of the power (VDD) and ground (GND) nets consists of two main tasks: constructing the interconnection topology, and determining the widths of the various segments.

Basic Problem (contd.). Requirement: find two non-intersecting interconnection trees. The width of a tree at any particular point must be proportional to the amount of current drawn by the points in that sub-tree.
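
The width requirement amounts to summing the current drawn below each segment of the tree. The following sketch assumes a tree given as a parent-to-children dictionary, currents in mA at the connection points, and a hypothetical proportionality constant in um per mA; none of these values come from the slides.

```python
def subtree_currents(children, draw, node, out=None):
    """Current carried by the segment feeding `node`.

    It equals the current drawn at `node` itself plus the current of
    every sub-tree hanging below it.
    """
    if out is None:
        out = {}
    total = draw.get(node, 0.0)
    for child in children.get(node, []):
        subtree_currents(children, draw, child, out)
        total += out[child]
    out[node] = total
    return out


# Hypothetical VDD tree rooted at the power pad.
children = {"pad": ["a", "b"], "a": ["a1", "a2"]}
draw_ma = {"a1": 10.0, "a2": 5.0, "b": 20.0}

UM_PER_MA = 0.5  # assumed width needed per mA of current
currents = subtree_currents(children, draw_ma, "pad")
widths = {node: UM_PER_MA * i for node, i in currents.items()}
print(widths)
```

The trunk leaving the pad carries the full 35 mA and is the widest segment, while the branches narrow toward the leaves.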

Approach 1: Grid Structure. Several rows of horizontal wires for both VDD and GND run parallel to each other on one metal layer. Vertical wires run on another metal layer and connect the horizontal wires. A block simply connects to the nearest VDD and GND wires.
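
The connection rule for the grid structure is a nearest-neighbour lookup. As a minimal sketch, assume the horizontal rails are described only by their y-coordinates and a block by the y-coordinate of its supply pin; the function name and coordinates are illustrative.

```python
def nearest_rails(block_y, vdd_rows, gnd_rows):
    """Return the y-coordinates of the closest VDD rail and GND rail."""
    vdd = min(vdd_rows, key=lambda y: abs(y - block_y))
    gnd = min(gnd_rows, key=lambda y: abs(y - block_y))
    return vdd, gnd


# Hypothetical alternating rails, 10 units apart.
vdd_rows = [0, 20, 40]
gnd_rows = [10, 30, 50]
print(nearest_rails(23, vdd_rows, gnd_rows))
```

A block at y = 23 attaches to the VDD rail at 20 and the GND rail at 30.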

Approach 2: Using Interdigitated Trees. The two nets are routed in an interdigitated fashion: one net extends from the left edge of the chip and the other from the right. The routing order of the connecting points is determined by their horizontal distances from the edge of the chip.

Approach 2 (contd.). The nets are routed by a combined Lee and line search algorithm. Points of the left net that lie in the left half of the chip are routed using a fast line search algorithm, and similarly for points of the right net in the right half of the chip. All other points of the two nets are then routed by Lee's algorithm.
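
For reference, Lee's algorithm is a breadth-first wave expansion on a routing grid followed by a backtrace. The sketch below is a generic single-layer version, not the combined Lee and line search router described above; it stores a predecessor per cell instead of wave numbers, which yields the same shortest path. The grid and the function name are assumptions.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Return a shortest rectilinear path from src to dst, or None.

    grid[r][c] == 1 marks a blocked cell; cells are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # also serves as the visited set
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in prev):
                prev[nxt] = cell
                frontier.append(nxt)
    if dst not in prev:
        return None
    path = []
    cell = dst
    while cell is not None:     # backtrace from target to source
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(lee_route(grid, (0, 0), (2, 0)))
```

The wave has to go around the blocked row, so the route visits nine cells. Lee's algorithm always finds a path if one exists, at the cost of exploring many cells, which is why the faster line search is tried first for the easy connections.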

Summary. Clock routing is one of the factors that determine the throughput of a chip. Power and ground routing needs special attention because wire widths are non-uniform and the wires must be sized carefully. Routing of power and ground nets is often given first priority, and these nets are usually laid out entirely on metal layer(s). Signal nets may share the metal layer(s) with power and ground, but they change layers whenever a power or ground wire is encountered. Choice of layer: aluminium is the most widely used; superconductivity and optical interconnects are candidates for future high-performance chips.

Over-The-Cell Routing

Introduction. Over-the-cell (OTC) routing is used in sophisticated channel routers in standard-cell based designs. Basic idea: use the area outside the channel to reduce the channel height. Routing over the cell rows is possible because the cells make only limited use of the second and third metal layers.

Basic Steps in OTC Routing.
Step 1: Net decomposition. Each multi-terminal net is partitioned into a set of 2-terminal nets (defined based on the x-coordinates of their left ends).
Step 2: Net classification. Each net is classified into one of the following types. Type 1: there is a vacant terminal directly opposite one of the terminals of the net. Type 2: there is a vacant terminal between the two terminals of the net. Type 3: none of the above.

Basic Steps in OTC Routing (contd.).
Step 3: Vacant terminal assignment. Vacant terminals are assigned to each net depending on its type and weight. The weight of a net is defined as the improvement in channel congestion that is possible if the net can be routed over the cell.
Step 4: Over-the-cell routing. The selected nets are assigned exact geometric paths for routing in the area over the cells.
Step 5: Channel segment assignment and routing. The best net segments to be routed in the channel are selected and routed using any available channel router.
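
Steps 1 and 2 can be sketched in code. The representation is an assumption made for illustration: the channel is two lists of net numbers indexed by column, with 0 marking a vacant terminal. The slides do not say on which row a "between" vacant terminal must lie, so this sketch accepts either row, and it pairs column-adjacent terminals of a net in Step 1.

```python
VACANT = 0


def decompose(top, bottom):
    """Step 1: split each net into 2-terminal nets of adjacent terminals."""
    terminals = {}
    for side, row in (("T", top), ("B", bottom)):
        for col, net in enumerate(row):
            if net != VACANT:
                terminals.setdefault(net, []).append((col, side))
    two_terminal = []
    for net, terms in terminals.items():
        terms.sort()  # order by column (x-coordinate)
        two_terminal += [(net, a, b) for a, b in zip(terms, terms[1:])]
    return two_terminal


def classify(top, bottom, a, b):
    """Step 2: return 1, 2 or 3 for a 2-terminal net with ends a and b."""
    rows = {"T": top, "B": bottom}
    opposite = {"T": "B", "B": "T"}
    for col, side in (a, b):
        if rows[opposite[side]][col] == VACANT:
            return 1  # vacant terminal directly opposite an end
    lo, hi = sorted((a[0], b[0]))
    for col in range(lo + 1, hi):
        if top[col] == VACANT or bottom[col] == VACANT:
            return 2  # vacant terminal between the two ends
    return 3


# Hypothetical channel with six columns.
top = [1, 1, 2, 0, 2, 3]
bottom = [4, 4, 5, 5, 3, 0]

types = {net: classify(top, bottom, a, b)
         for net, a, b in decompose(top, bottom)}
print(types)
```

In this channel nets 3 and 5 each face a vacant terminal, net 2 spans the vacant terminal in column 3, and nets 1 and 4 have no vacant terminal available.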

Example. [Figure: result of a greedy channel router and result of OTC routing.]