Retiming & Pipelining over Global Interconnects

1 Retiming & Pipelining over Global Interconnects. Jason Cong, Computer Science Department, University of California, Los Angeles. Joint work with C. C. Chang, D. Pan*, and X. Yuan (* IBM Research).

2 Motivation: How Far Can We Go in Each Clock Cycle? NTRS um Tech, 5 GHz across-chip clock, die area 620 mm^2 (24.9 mm x 24.9 mm). IPEM BIWS estimations with buffer size 100x and driver/receiver size 100x. From corner to corner: 7 clock cycles. [Figure: die map showing the regions reachable in 1 to 7 clock cycles; axis in mm.]

3 Solutions. Fully asynchronous designs; GALS (globally asynchronous, locally synchronous) designs; latency-insensitive designs; synchronous designs with multi-cycle communications. The synchronous option is much better understood and is supported by the current tool set. More energy efficient?

4 Interconnect-Centric IC Design Flow Under Development at UCLA. Architecture/conceptual-level design: design specification, HDM. Interconnect planning: physical hierarchy generation for multi-cycle communication; interconnect architecture planning. Interconnect performance estimation models (IPEM): OWS, SDWS, BISWS. Views: structure view, functional view, physical view, timing view. Synthesis and placement under physical hierarchy. Interconnect synthesis: topology generation and wire sizing for delay; wire ordering and spacing for noise control. Interconnect layout: route planning, abstraction, point-to-point gridless routing. Interconnect optimization (TRIO): topology optimization with buffer insertion; wire sizing and spacing; simultaneous buffer insertion and wire sizing; simultaneous topology construction with buffer insertion and wire sizing. Final layout.

5 Physical Hierarchy Generation. Problem formulation: map the logical hierarchy to a physical hierarchy, where physical hierarchy = placement bins + module locations. Modules are hard IPs or soft modules (in the figure, modules of the same logic hierarchy share a color). Assigning modules to the physical hierarchy defines the global interconnects. Optimization objectives: wire length minimization; routing congestion minimization; clock period, latency, and performance (with consideration of multi-cycle communication).

6 Need of Considering Retiming/Pipelining during Placement. Retiming/pipelining on global interconnects: multiple clock cycles are needed to cross the chip, and proper placement allows retiming to hide global interconnect delays. Example with d(v) = 1 and d(e) proportional to WL. Placement 1 (nodes drawn as a, b, c, d): WL = 6; before retiming, φ = 5.0; after retiming, φ = 3.0. Placement 2 (nodes drawn as a, d, b, c): WL = 6; before retiming, φ = 4.0, so it is the better initial placement.

7 Need of Considering Retiming during Placement. Retiming/pipelining on global interconnects: multiple clock cycles are needed to cross the chip, and proper placement allows retiming to hide global interconnect delays. Example with d(v) = 1 and d(e) proportional to WL. Placement 1 (nodes drawn as a, b, c, d): WL = 6; before retiming, φ = 5.0; after retiming, φ = 3.0. Placement 2 (nodes drawn as a, d, b, c): WL = 6; before retiming, φ = 4.0 (the better initial placement); after retiming, φ = 4.0. Placement 1, although worse before retiming, therefore yields the shorter clock period after retiming.

8 Difficulties. How to consider retiming/pipelining over global interconnects? Flip-flop boundaries are not fixed during placement, which makes static timing analysis difficult. Answer: use the concepts of c-retiming and sequential timing analysis (Seq-TA). How to handle the high complexity of the combined problem? Answer: use the multi-level optimization technique.

9 Simultaneous Coarse Placement with Retiming on Interconnects. Our solution: compute the labels of all nodes under c-retiming for a given placement solution and perform sequential timing analysis (Seq-TA); then minimize the longest sequential path by improving the placement solution. Alternative solution [Brayton, et al]: enforce all loop constraints during placement.

10 Static Timing Analysis (STA). Sequential circuit example with PIs a, b and PO g. Suppose d(v) = 1, d(e) = 2, and clock cycle φ = 11. Transform the circuit into a DAG for static timing analysis. Topological order: a, b, g, f, c, d, e. The arrival time (AT) and required time (RT) of each node are computed in linear time. [Figure: sequential circuit over nodes a to g and the derived DAG, annotated with AT and RT values.]
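
The AT/RT pass described on this slide can be sketched as follows. This is a minimal illustration, not the authors' tool: the function name `static_timing`, the small DAG in the usage note, and the convention that AT and RT are measured at a node's output are all assumptions made for the example. Only the uniform delays d(v) = 1, d(e) = 2 and φ = 11 come from the slide.

```python
from collections import defaultdict, deque


def static_timing(nodes, edges, node_delay, edge_delay, phi):
    """Return (arrival, required) times for a DAG, both measured at node outputs.

    Hypothetical helper: uniform node and edge delays, sinks required at phi.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    indegree = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indegree[v] += 1

    # Kahn's algorithm gives a topological order in O(V + E).
    order = []
    queue = deque(v for v in nodes if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; STA needs a DAG")

    # Forward pass: latest arrival over all fan-ins, plus the node's own delay.
    arrival = {}
    for v in order:
        arrival[v] = node_delay + max(
            (arrival[u] + edge_delay for u in pred[v]), default=0
        )

    # Backward pass: earliest requirement imposed by any fan-out.
    required = {}
    for v in reversed(order):
        required[v] = min(
            (required[s] - node_delay - edge_delay for s in succ[v]), default=phi
        )
    return arrival, required
```

For a hypothetical DAG with edges a→c, b→c, c→d, calling `static_timing(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")], 1, 2, 11)` yields an arrival time of 7 at d and a slack (RT minus AT) of 4 on every node.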

11 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT). Definition [Pan et al, TCAD98]: given a clock period φ, transform circuit C into an edge-weighted, vertex-weighted graph G and label each vertex v with $l(v)$ = the weight of the longest path from the PIs to v, that is, $l(v) = \max_{(u,v) \in E} \{\, l(u) - \phi\, w(u,v) + d(u,v) + d(v) \,\}$; $l(v)$ is also called SAT(v). In G, each edge carries the weight $w_l(u,v) = d(e_{(u,v)}) - \phi\, w(u,v)$. Theorem: C can be retimed to $\phi + \max\{d(v)\}$ iff $l(\mathrm{POs}) \le \phi$. Relation to retiming: $r(v) = \lceil l(v)/\phi \rceil - 1$. Complexity is O(VE). Example: $w(a,c) = 1$, $w(b,c) = 0$, $l(a) = 7$, $l(b) = 3$, $d(a) = d(b) = 1$, $d(a,c) = d(b,c) = 2$, $\phi = 5$; then $l(c) = \max\{7 - 5 + 2 + 1,\ 3 + 2 + 1\} = 6$.
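
The labeling rule on this slide can be sketched as a Bellman-Ford-style longest-path relaxation, which also matches the stated O(VE) bound. The sketch below is an illustration under assumptions, not the published algorithm: the function names, the choice of l(PI) = 0, and the four-node circuit (`NODES`, `EDGES`, `DELAY`) are hypothetical. Labels that still grow after |V| rounds indicate a positive-weight loop, so the period is reported as infeasible.

```python
import math


def update_label(fanins, phi, d_v):
    """One application of l(v) = max{l(u) - phi*w(u,v) + d(u,v)} + d(v).

    fanins is a list of (l_u, w_uv, d_uv) tuples.
    """
    return max(l_u - phi * w + d_e for l_u, w, d_e in fanins) + d_v


def c_retiming_labels(nodes, edges, delay, pis, phi, eps=1e-9):
    """Iterate the label update; return the labels, or None if they diverge.

    edges is a list of (u, v, w_uv, d_uv). Assumption: PIs start at label 0.
    """
    label = {v: -math.inf for v in nodes}
    for p in pis:
        label[p] = 0.0
    for _ in range(len(nodes)):
        changed = False
        for u, v, w, d_e in edges:
            if label[u] == -math.inf:
                continue  # u is not yet reachable from a PI
            candidate = label[u] - phi * w + d_e + delay[v]
            if candidate > label[v] + eps:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    return None  # still changing after |V| rounds: positive-weight loop


def period_is_feasible(nodes, edges, delay, pis, pos, phi):
    """Check the slide's condition l(POs) <= phi."""
    label = c_retiming_labels(nodes, edges, delay, pis, phi)
    return label is not None and all(label[p] <= phi + 1e-9 for p in pos)


# Hypothetical circuit: pi -> x -> y -> po, with a one-register loop y -> x.
NODES = ["pi", "x", "y", "po"]
DELAY = {"pi": 0.0, "x": 1.0, "y": 1.0, "po": 0.0}
EDGES = [("pi", "x", 0, 2.0), ("x", "y", 0, 2.0), ("y", "x", 1, 2.0), ("y", "po", 0, 2.0)]
```

`update_label([(7, 1, 2), (3, 0, 2)], phi=5, d_v=1)` reproduces the slide's l(c) = 6. On the hypothetical circuit, φ = 8 passes the check, φ = 7 fails because l(po) = 8 exceeds φ, and φ = 5 fails because the loop (delay 6, one register) makes the labels diverge.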

12 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT). Sequential circuit over nodes a to g with d(v) = 1 and d(e) = 2. Is φ = 4.5 possible? The retiming graph is not a DAG; its edges carry weight 2 or -2.5. [Table: labels of a to g per iteration.] Cycle time 4.5 is possible because l(g) ≤ 4.5. [Figure: sequential circuit, retiming graph, and retimed circuit.]

13 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) (cont'd). Same sequential circuit over nodes a to g with d(v) = 1 and d(e) = 2. Is φ = 2.5 feasible? [Table: labels of a to g per iteration.] Cycle time 2.5 is not feasible because l(g) > 2.5. [Figure: sequential circuit and retiming graph (not a DAG).]

14 Multi-Level Optimization Framework. [Figure: levels, coarsening toward smaller problem sizes, then uncoarsening and refinement (optimization).] Multi-level coarsening generates smaller problem sizes for the top levels, which gives faster optimization on the top levels. It may explore different aspects of the solution space at different levels. Gradual refinement of good solutions from coarser levels is very efficient. Successful in many applications: originally developed for PDEs; recent success in VLSI CAD (partitioning, placement, routing).

15 Challenges. Previous Seq-TA can only handle single-output gates. In reality multi-output modules exist: IP blocks, MUXes, adders, and clusters in the multi-level optimization process. How to integrate Seq-TA into multi-level coarse placement efficiently? Need to consider congestion and routability.

16 Generalize c-retiming for Complex Combinational Modules. A complex module is combinational logic with multiple outputs and non-uniform propagation delays. $l_1$-value labeling, one label per vertex: $l_1(v)$ = weight of the longest path from the PIs to v using $d(v)$ as a uniform gate delay, where the non-uniform gate delay is reduced to a uniform gate delay by taking the maximum internal delay, $d(v) = \max_{i,j} \{ d(v_{(i,j)}) \}$. This is an upper bound of the labeling. $l_2$-value labeling, one label per output of a vertex: $l_2(vo_t)$ = weight of the longest path from the PIs to output $o_t$, obtained by flattening/decomposing the complex module and treating each pin of the module as a vertex with zero delay. This is a lower bound of the labeling of v. [Figure: module v with input pins $v_{I0}$, $v_{I1}$, $v_{I2}$ and output pins $v_{O0}$, $v_{O1}$; the uniform-delay version with d(v) = 11 and the flattened pin-level version.]

17 Properties of Generalized c-retiming for Complex Combinational Modules. Theorem: if there is a PO t with $l_2(PO_t) > \Phi$, then the circuit cannot be retimed to a clock period of Φ. Theorem: if for every $PO_i$, $l_1(PO_i) \le \Phi$, then the circuit can be retimed to a clock period less than $\Phi + k$, where k is the maximum input-output delay of all gates. Theorem: for any module v and its out-pin $vo_t$, $l_2(vo_t) \le l_1(v)$. Theorem: given a circuit C whose minimum clock period achieved by retiming is Φ, if $C_c$ is derived from C by clustering and the minimum clock period achieved by retiming on $C_c$ is $\Phi_c$, then $\Phi \le \Phi_c$.

18 Integrate Seq-TA with a Multi-level SA-based Coarse Placement. In the coarsening phase (level $L_0$ up to level $L_n$), FFs can only be clustered after a certain level k. From level $L_n$ to $L_{k+1}$, where FFs are clustered, perform static timing analysis. From level $L_k$ to $L_0$, where FFs are not clustered, perform Seq-TA. The initial placement is made at level $L_n$, followed by refinement with timing-driven SA-based coarse placement.

19 Area Density Problems in Multi-level Coarse Placement. Traditional area density control: cell area in each bin < bin area utilization, with a small percentage of overflow. This does not work when cluster sizes vary significantly and may be bigger than a bin. How about using different grid sizes for different levels of clustering? It is hard to find fixed percentages that work, and the placement cost jumps significantly when switching grid sizes.

20 Hierarchical Area Density Control. Use the same grid structure for placement at all clustering levels, and impose a hierarchy on the bin structure for area density control. Each cluster move must satisfy the area constraints on each level in the bin hierarchy. Area constraint for moving a cell of size A: the allowed overflow on each level in the bin hierarchy is kA, where k is a small constant (usually 1 or 2). This works well in the multi-level framework: area constraints are gradually tightened during optimization.
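
One way to read the move rule on this slide is sketched below. It is an interpretation, not the authors' code: the quadtree-style hierarchy (each level merges 2 x 2 bins), the `usage` dictionary keyed by `(level, bin_x, bin_y)`, and the function name `move_allowed` are assumptions. Because the allowed overflow kA scales with the moved cell, large clusters get a loose constraint at coarse levels and small cells a tight one at fine levels.

```python
def move_allowed(usage, bin_capacity, levels, target, area, k=1):
    """Check a move of a cell with the given area into the finest-level bin `target`.

    usage maps (level, bin_x, bin_y) to the area already placed in that bin.
    Level 0 is the finest grid; each higher level merges 2 x 2 bins (assumption).
    """
    x, y = target
    for level in range(levels):
        key = (level, x >> level, y >> level)
        capacity = bin_capacity * (4 ** level)  # a level-l bin covers 4^l finest bins
        # The move may overflow the enclosing bin by at most k * area on every level.
        if usage.get(key, 0.0) + area > capacity + k * area:
            return False
    return True
```

With finest bins of capacity 10 and two levels, moving a cell of area 4 into a bin that already holds 8 passes at level 0 (12 ≤ 14), while the same move into a bin holding 10.5 is rejected (14.5 > 14). A cluster of area 12, bigger than a bin, can still be moved into an empty bin because its own allowance is 12.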

21 Fast Incremental A-tree Routing for Multi-pin Nets. Simple incremental A-tree, shown for the first quadrant with the root at the source pin: recursively quad-partition the grid, and let each pin recursively connect to the lower-left corner of each level of the partition. For a net with bounding box length B, at most 2 log B edge updates are needed for each pin move, except for the root. Each edge is routed by the LZ-router.

22 Fast LZ-routing for Two-pin Connections. Decide between HVH and VHV by selecting the less congested layer. Then binary search on the V-stem (or H-stem): start with a left region and a right region that cover the bounding box, and repeat: query the wire usage of both regions and select the region with less congestion. A wire usage query can be done in O(log grid_size). [Figure: HVH and VHV routes with the left and right regions.]
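
The binary search on the V-stem can be sketched as follows. This is a simplified illustration with assumed names (`build_prefix`, `region_usage`, `pick_v_stem`) and an assumed `v_usage[row][column]` grid of vertical-layer wire usage. Note one deliberate substitution: region queries here use a 2-D prefix-sum table (constant time per query on a static grid), not the O(log grid_size) query structure the slide mentions, which would be needed once usages are updated incrementally.

```python
def build_prefix(usage):
    """2-D prefix sums so that any rectangle sum is four lookups."""
    rows, cols = len(usage), len(usage[0])
    pre = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            pre[r + 1][c + 1] = usage[r][c] + pre[r][c + 1] + pre[r + 1][c] - pre[r][c]
    return pre


def region_usage(pre, r0, r1, c0, c1):
    """Total usage in rows r0..r1 and columns c0..c1 (inclusive)."""
    return pre[r1 + 1][c1 + 1] - pre[r0][c1 + 1] - pre[r1 + 1][c0] + pre[r0][c0]


def pick_v_stem(v_usage, p, q):
    """Choose the column of the vertical stem of an HVH route between pins p and q.

    Pins are (x, y) grid coordinates; v_usage[y][x] is vertical wire usage.
    """
    (x1, y1), (x2, y2) = p, q
    lo, hi = min(x1, x2), max(x1, x2)
    r0, r1 = min(y1, y2), max(y1, y2)
    pre = build_prefix(v_usage)
    while lo < hi:
        mid = (lo + hi) // 2
        left = region_usage(pre, r0, r1, lo, mid)
        right = region_usage(pre, r0, r1, mid + 1, hi)
        # Keep the half of the bounding box with less vertical congestion.
        if left <= right:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The search is greedy: it follows the less congested half at every step, so it finds a lightly used column quickly but is not guaranteed to return the globally least congested one.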

23 Placement Cost Functions. Wire length driven: summation of the bounding boxes of all nets. Congestion driven: wire usages are estimated from the fast global router, and cost = summation of the squares of the wire usages in all bins. For a fixed wire width, this cost is equivalent to a summation of weighted wire length, where the weight on a bin is the wire usage of the bin. A congestion-driven run only turns on the congestion-driven cost at the finest placement level. Example for a 3 x 3 grid of bins with usages $W_1, \dots, W_9$: congestion cost $= W_1^2 + W_2^2 + \dots + W_9^2$.
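
A small sketch of the congestion cost on this slide, assuming the per-bin wire usages are already available as a 2-D list (the function name `congestion_cost` and the sample grids are illustrative only):

```python
def congestion_cost(usage):
    """Sum of squared wire usages over all bins of a 2-D grid."""
    return sum(w * w for row in usage for w in row)


# Two hypothetical 3 x 3 grids with the same total wire usage (9 units).
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
hotspot = [[9, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Squaring is what makes the cost congestion-aware: both grids carry the same total usage, yet `congestion_cost(uniform)` is 9 while `congestion_cost(hotspot)` is 81, so concentrating wires in one bin is penalized.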

24 Experimental Results on Wire Length Minimization. Multi-level simulated annealing coarse placement. Wire length comparison with GORDIAN-L: our engine only turns on wire length optimization, and results are legalized by DOMINO for the wire length comparison. [Charts: mpg+dom / gor+dom wire length comparison and mpg+dom / gor+dom CPU time comparison, per circuit size class (20k-50k, 50k-100k, 100k-210k).] 20k-50k test cases: avqlarge, avqsmall, ibm04, ibm07. 50k-100k test cases: ibm09, ibm10. 100k-210k test cases: ibm14, ibm15, ibm16, ibm17, ibm18. Our multi-level engine performs well for big circuits.

25 Experimental Results on Congestion Control. [Table: BBOX WL, routed WL, max boundary congestion, total overflow, and CPU for mpg, mpg-cg.rd, and mpg-cg.] Test cases: ibm01, ibm04, ibm07, ibm11, ibm13, ibm15. mpg: wire length driven mode. mpg-cg: congestion driven at the finest clustering level. mpg-cg.rd: alternating congestion-driven and wire-length-driven modes at the finest clustering level.

26 Initial Experimental Result on Impact of Simultaneous Retiming and Placement. [Table: for each circuit, #gates, grid size, delay under WL-driven placement (before retiming and after retiming), and delay under simultaneous retiming and placement; the last row is the average.]

27 Limitation of Exploring Multi-cycle Interconnect Communication during Logic Synthesis. The minimum clock period that can be achieved by logic optimization is bounded by the maximum delay-to-register (DR) ratio of the loops in the circuit. Example: a loop with 4 logic cells and 2 registers, cell delay = 1 ns, interconnect delay = 1 ns. DR ratio $= (D_{logic} + D_{int}) / \#\text{Registers} = (4 + 4)/2 = 4$ ns, so the clock cycle is >= 4 ns. This requires consideration of multi-cycle communication during architecture and behavior synthesis.
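
The DR-ratio bound from this slide, written out as a short sketch. The function names and the loop representation (lists of cell and interconnect delays plus a register count) are assumptions for illustration; the numbers in the usage note are the slide's own.

```python
def dr_ratio(cell_delays_ns, wire_delays_ns, num_registers):
    """Delay-to-register ratio of one loop: (D_logic + D_int) / #registers."""
    if num_registers <= 0:
        raise ValueError("a loop in a synchronous circuit needs at least one register")
    return (sum(cell_delays_ns) + sum(wire_delays_ns)) / num_registers


def clock_period_lower_bound(loops):
    """The largest DR ratio over all loops bounds the clock period from below."""
    return max(dr_ratio(cells, wires, regs) for cells, wires, regs in loops)
```

For the slide's loop, `dr_ratio([1, 1, 1, 1], [1, 1, 1, 1], 2)` gives 4.0 ns. Retiming can move the two registers around the loop but cannot add a third, which is why the bound holds regardless of how logic synthesis redistributes them.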

28 Regular Distributed Register Architecture (1). The chip is an array of islands connected by global interconnect. Each island, of width $W_i$ and height $H_i$, holds a function unit cluster (FUC; for example DIV, MUX, ADD; a cluster with area constraint) and a register file. Timing: $D_{intra\text{-}island} = D_{logic} + D_{opt\text{-}int} \le D_{logic} + D_{opt\text{-}int}(2W_i + 2H_i) \le T$. Registers are distributed to each island. Local computation and communication in each island can be done in a single clock cycle. But registers may need to be inserted along global interconnects for multi-cycle communication (less regular).

29 Regular Distributed Register Architecture (2). The chip is an array of islands connected by global interconnect. Each island, of width $W_i$ and height $H_i$, holds a function unit cluster (FUC; for example DIV, MUX, ADD; a cluster with area constraint) and a register file split into 1-cycle, 2-cycle, ..., k-cycle banks. Timing: $D_{intra\text{-}island} = D_{logic} + D_{opt\text{-}int} \le D_{logic} + D_{opt\text{-}int}(2W_i + 2H_i) \le T$. Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication in each island. Highly regular.

30 Example: Regular Distributed Register Architecture for 70nm Technology. NTRS 97, 70 nm technology. Chip dimension: 620 mm^2 (24.9 mm x 24.9 mm). 5 GHz across-chip clock. A wire can travel up to 7.52 mm within 1 clock cycle under interconnect optimization, so 7 clock cycles are needed to cross the chip. Each island has base dimension $W_i = H_i = 2.08$ mm = critical length (the longest length that a wire can run without buffer insertion), estimated by IPEM BIWS estimations assuming buffer size 2x and driver/receiver size 2x; 1/3 of the distance a wire can travel in 1 clock cycle. Logic volume: 6.76M min-size 2-NAND gates. 12 x 12 island-base array. Local registers are partitioned into 7 banks.
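
The array size and bank count on this slide can be reproduced with a few lines of arithmetic. The sketch assumes that "crossing the chip" means a corner-to-corner Manhattan route (2 x 24.9 mm); that assumption is not stated on the slide, but it is the reading that yields 7 cycles. The function name `rdr_dimensions` is illustrative.

```python
import math


def rdr_dimensions(chip_side_mm, reach_per_cycle_mm, island_side_mm):
    """Return (islands per side, cycles needed corner to corner)."""
    islands_per_side = math.ceil(chip_side_mm / island_side_mm)
    # Assumption: corner-to-corner distance is measured along a Manhattan route.
    corner_to_corner_mm = 2 * chip_side_mm
    cycles = math.ceil(corner_to_corner_mm / reach_per_cycle_mm)
    return islands_per_side, cycles
```

`rdr_dimensions(24.9, 7.52, 2.08)` returns `(12, 7)`: 24.9 / 2.08 ≈ 11.97 rounds up to a 12 x 12 array, and 49.8 / 7.52 ≈ 6.62 rounds up to 7 cycles, which matches the 7 register banks.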

31 Example: Impact of Interconnect on Scheduling. Data flow graph extracted from the discrete cosine transform (DCT), with operations numbered 1 to 12. The delay of a * operation is 2 ns; the delay of + and - operations is 1 ns. The resources available are 2 multipliers and 2 ALUs. Nodes with the same color are assigned to the same functional unit: Mul1: 4, 8, 11; Mul2: 3, 7, 12; Alu1: 1, 5, 10; Alu2: 2, 6, 9. Long interconnect delay is 2 ns; short interconnect delay is 1 ns. The FUCs are placed by wirelength-driven placement. [Figure: data flow graph and FUC placement with long and short interconnects marked.]

32 Single-cycle vs. Multi-cycle Interconnect Communication. Single-cycle interconnect communication: scheduled in 6 clock cycles; clock period is 4 ns; total latency is 24 ns. Multi-cycle interconnect communication: scheduled in 9 clock cycles; clock period is 2 ns; total latency is 18 ns. [Figure: the two schedules, cycle by cycle, with registers marked.]

33 Enhancement 1: Simultaneous Placement and Scheduling for Performance Optimization. Binding: Mul1: 4, 8, 11; Mul2: 3, 7, 12; Alu1: 1, 5, 10; Alu2: 2, 6, 9. With placement integrated with scheduling, the critical path is reduced. The DFG can be scheduled in 8 clock cycles with a clock period of 2 ns. The total latency is 16 ns. [Figure: schedule over cycles 1 to 8 and the FUC placement.]

34 Enhancement 2: Simultaneous Placement, Scheduling and Binding for Performance Optimization. Binding: Mul1: 4, 8, 12; Mul2: 3, 7, 11; Alu1: 1, 5, 10; Alu2: 2, 6, 9. With placement integrated with scheduling and binding, the critical path is further reduced. The DFG can be scheduled in 7 clock cycles with a clock period of 2 ns. The total latency is 14 ns. [Figure: schedule over cycles 1 to 7 and the FUC placement.]
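
The latency figures quoted on slides 32 to 34 all follow from total latency = number of cycles x clock period. The short sketch below tabulates them; the dictionary `SCHEDULES` and the function name are illustrative, while the cycle counts and clock periods are the ones given on the slides.

```python
def total_latency_ns(cycles, clock_period_ns):
    """Total latency of a schedule in nanoseconds."""
    return cycles * clock_period_ns


# (cycles, clock period in ns) as reported on slides 32 to 34.
SCHEDULES = {
    "single-cycle communication": (6, 4),
    "multi-cycle communication": (9, 2),
    "placement + scheduling": (8, 2),
    "placement + scheduling + binding": (7, 2),
}

LATENCIES = {name: total_latency_ns(*s) for name, s in SCHEDULES.items()}
```

The table makes the trade-off visible: multi-cycle communication needs more cycles than the single-cycle schedule, yet halving the clock period cuts latency from 24 ns to 18 ns, and the two enhancements then remove one cycle each (16 ns, 14 ns).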

35 Example: Multicluster Architectures of DEC Alpha. Source: "The Multicluster Architecture: Reducing Cycle Time Through Partitioning" by Keith I. Farkas, et al.

36 Conclusions. Multi-cycle communication is needed for gigahertz designs. Sequential timing analysis + multilevel optimization enables efficient retiming/pipelining over global interconnects. The regular distributed register (RDR) fabric provides regularity to support multi-cycle communication. Integrated resource binding, scheduling, and physical planning.

Regular Fabrics for Retiming & Pipelining over Global Interconnects

Regular Fabrics for Retiming & Pipelining over Global Interconnects Regular Fabrics for Retiming & Pipelining over Global Interconnects Jason Cong Computer Science Department University of California, Los Angeles cong@cs cs.ucla.edu http://cadlab cadlab.cs.ucla.edu/~cong

More information

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

Pilot: A Platform-based HW/SW Synthesis System

Pilot: A Platform-based HW/SW Synthesis System Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The

More information

Multilevel Global Placement With Congestion Control

Multilevel Global Placement With Congestion Control IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 4, APRIL 2003 395 Multilevel Global Placement With Congestion Control Chin-Chih Chang, Jason Cong, Fellow, IEEE,

More information

Interconnect Delay and Area Estimation for Multiple-Pin Nets

Interconnect Delay and Area Estimation for Multiple-Pin Nets Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Professor Jason Cong UCLA Computer Science Department Los Angeles, CA 90095 http://cadlab.cs.ucla.edu/~ /~cong

More information

Thermal-Aware 3D IC Physical Design and Architecture Exploration

Thermal-Aware 3D IC Physical Design and Architecture Exploration Thermal-Aware 3D IC Physical Design and Architecture Exploration Jason Cong & Guojie Luo UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Supported by DARPA Outline Thermal-Aware

More information

NANOMETER process technologies allow billions of transistors

NANOMETER process technologies allow billions of transistors 550 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004 Architecture and Synthesis for On-Chip Multicycle Communication Jason Cong, Fellow, IEEE, Yiping

More information

Architecture and Synthesis for Multi-Cycle Communication

Architecture and Synthesis for Multi-Cycle Communication Architecture and Synthesis for Multi-Cycle Communication Jason Cong, Yiping Fan, Xun Yang, Zhiru Zhang Computer Science Department University of California, Los Angeles Los Angeles CA 90095 USA {cong,

More information

An Interconnect-Centric Design Flow for Nanometer. Technologies

An Interconnect-Centric Design Flow for Nanometer. Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled

More information

Architecture-Level Synthesis for Automatic Interconnect Pipelining

Architecture-Level Synthesis for Automatic Interconnect Pipelining Architecture-Level Synthesis for Automatic Interconnect Pipelining Jason Cong, Yiping Fan, Zhiru Zhang Computer Science Department University of California, Los Angeles, CA 90095 {cong, fanyp, zhiruz}@cs.ucla.edu

More information

DpRouter: A Fast and Accurate Dynamic- Pattern-Based Global Routing Algorithm

DpRouter: A Fast and Accurate Dynamic- Pattern-Based Global Routing Algorithm DpRouter: A Fast and Accurate Dynamic- Pattern-Based Global Routing Algorithm Zhen Cao 1,Tong Jing 1, 2, Jinjun Xiong 2, Yu Hu 2, Lei He 2, Xianlong Hong 1 1 Tsinghua University 2 University of California,

More information

Large Scale Circuit Placement: Gap and Promise

Large Scale Circuit Placement: Gap and Promise Large Scale Circuit Placement: Gap and Promise Jason Cong 1, Tim Kong 2, Joseph R. Shinnerl 1, Min Xie 1 and Xin Yuan 1 UCLA VLSI CAD LAB 1 Magma Design Automation 2 Outline Introduction Gap Analysis of

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

L14 - Placement and Routing

L14 - Placement and Routing L14 - Placement and Routing Ajay Joshi Massachusetts Institute of Technology RTL design flow HDL RTL Synthesis manual design Library/ module generators netlist Logic optimization a b 0 1 s d clk q netlist

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

Basic Idea. The routing problem is typically solved using a twostep

Basic Idea. The routing problem is typically solved using a twostep Global Routing Basic Idea The routing problem is typically solved using a twostep approach: Global Routing Define the routing regions. Generate a tentative route for each net. Each net is assigned to a

More information

Planning for Local Net Congestion in Global Routing

Planning for Local Net Congestion in Global Routing Planning for Local Net Congestion in Global Routing Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer Engineering *Department of Industrial and Systems Engineering

More information

Retiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.

Retiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming. Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

THE continuous increase of the problem size of IC routing

THE continuous increase of the problem size of IC routing 382 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 3, MARCH 2005 MARS A Multilevel Full-Chip Gridless Routing System Jason Cong, Fellow, IEEE, Jie Fang, Min

More information

Linking Layout to Logic Synthesis: A Unification-Based Approach

Linking Layout to Logic Synthesis: A Unification-Based Approach Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

Global Clustering-Based Performance-Driven Circuit Partitioning

Global Clustering-Based Performance-Driven Circuit Partitioning Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California at Los Angeles Los Angeles, CA 90095 cong@cs.ucla.edu Chang Wu Aplus Design Technologies, Inc. Los Angeles,

More information

Graph Models for Global Routing: Grid Graph

Graph Models for Global Routing: Grid Graph Graph Models for Global Routing: Grid Graph Each cell is represented by a vertex. Two vertices are joined by an edge if the corresponding cells are adjacent to each other. The occupied cells are represented

More information

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues

More information

CAD Algorithms. Placement and Floorplanning

CAD Algorithms. Placement and Floorplanning CAD Algorithms Placement Mohammad Tehranipoor ECE Department 4 November 2008 1 Placement and Floorplanning Layout maps the structural representation of circuit into a physical representation Physical representation:

More information

Metal-Density Driven Placement for CMP Variation and Routability

Metal-Density Driven Placement for CMP Variation and Routability Metal-Density Driven Placement for CMP Variation and Routability ISPD-2008 Tung-Chieh Chen 1, Minsik Cho 2, David Z. Pan 2, and Yao-Wen Chang 1 1 Dept. of EE, National Taiwan University 2 Dept. of ECE,

More information

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5)

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5) EN2911X: Lecture 13: Design Flow: Physical Synthesis (5) Prof. Sherief Reda Division of Engineering, rown University http://scale.engin.brown.edu Fall 09 Summary of the last few lectures System Specification

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

An Efficient Computation of Statistically Critical Sequential Paths Under Retiming

An Efficient Computation of Statistically Critical Sequential Paths Under Retiming An Efficient Computation of Statistically Critical Sequential Paths Under Retiming Mongkol Ekpanyapong, Xin Zhao, and Sung Kyu Lim Intel Corporation, Folsom, California, USA Georgia Institute of Technology,

More information

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,

More information

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR An Introduction to FPGA Placement Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR

More information

ECE260B CSE241A Winter Logic Synthesis

ECE260B CSE241A Winter Logic Synthesis ECE260B CSE241A Winter 2007 Logic Synthesis Website: /courses/ece260b-w07 ECE 260B CSE 241A Static Timing Analysis 1 Slides courtesy of Dr. Cho Moon Introduction Why logic synthesis? Ubiquitous used almost

More information

UCLA 3D research started in 2002 under DARPA with CFDRC

UCLA 3D research started in 2002 under DARPA with CFDRC Coping with Vertical Interconnect Bottleneck Jason Cong UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/ cs edu/~cong Outline Lessons learned Research challenges and opportunities

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN029 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 09: Routing Introduction to Routing Global Routing Detailed Routing 2

More information

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mingxing Tan, Steve Dai, Udit Gupta, Zhiru Zhang School of Electrical and Computer Engineering Cornell University High-Level Synthesis (HLS) for

More information

ECE260B CSE241A Winter Placement

ECE260B CSE241A Winter Placement ECE260B CSE241A Winter 2005 Placement Website: / courses/ ece260b- w05 ECE260B CSE241A Placement.1 Slides courtesy of Prof. Andrew B. Slides courtesy of Prof. Andrew B. Kahng VLSI Design Flow and Physical

More information

Synthesis at different abstraction levels

Synthesis at different abstraction levels Synthesis at different abstraction levels System Level Synthesis Clustering. Communication synthesis. High-Level Synthesis Resource or time constrained scheduling Resource allocation. Binding Register-Transfer

More information

Lecture 41: Introduction to Reconfigurable Computing

Lecture 41: Introduction to Reconfigurable Computing inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following

More information
