Retiming & Pipelining over Global Interconnects
|
|
- Beatrix Lane
- 5 years ago
- Views:
Transcription
1 Retiming & Pipelining over Global Interconnects Jason Cong Computer Science Department University of California, Los Angeles Joint work with C. C. Chang, D. Pan*, and X. Yuan * IBM Research
2 Motivation: How Far Can We Go in Each Clock Cycle 7 clock NTRS um Tech 6 clock 5 clock 5 G Hz across-chip clock 620 mm 2 (24.9mm x 24.9mm) IPEM BIWS estimations Buffer size: 100x Driver/receiver size: 100x From corner to corner: 7 clock cycles 4 clock 1 clock 2 clock 3 clock (mm)
3 Solutions Fully Fully asynchronous designs GALS GALS (global asynchronous locally synchronous designs) Latency-insensitive designs Synchronous designs, with multi-cycle communications Much better understood Supported by the current tool set More energy efficient?
4 Interconnect-Centric IC Design Flow Under Development at UCLA Architecture/Conceptual-level Design Design Specification HDM Interconnect Planning Physical Hierarchy Generation for Multi-Cycle Comm. Physical Hierarchy Generation for Multi-Cycle Comm. Interconnect Architecture Planning Interconnect Performance Estimation Models (IPEM) OWS, SDWS, BISWS Structure view Functional view Physical view Timing view Synthesis and Placement under Physical Hierarchy Interconnect Synthesis Topology genration & wiresizng for delay Wire ordering & spacing for noise control Interconnect Layout Route Planning abstraction Interconnect Optimization (TRIO) Topology Optimization with Buffer Insertion Wire sizing and spacing Simultaneous Buffer Insertion and Wire Sizing Simultaneous Topology Construction with Buffer Insertion and Wire Sizing Point-to-Point Gridless Routing Final Layout
5 Physical Hierarchy Generation Physical Hierarchy Generation Problem Formulation Logical Hierarchy Physical Hierarchy = Placement bins + module locations Hard IP Soft module Same color for modules of the same logic hierarchy Assign modules to physical hierarchy Defines global interconnects Optimization objectives: wire length minimization routing congestion minimization clock period, latency, performance (with consideration of multi-cycle comm.)
6 Need of Considering Retiming/Pipelining during Placement - Retiming/pipelining on global interconnects Multiple clock cycles are needed to cross the chip Proper placement allows retiming to hide global interconnect delays. Placement 1 Placement 2 a b c d a d b c d(v)=1, WL=6, d(e) WL Before retiming, φ = 5.0 After retiming, φ = 3.0 d(v)=1, WL=6, d(e) WL Before retiming, φ = 4.0 Better Initial Placement!!
7 Need of Considering Retiming during Placement - Retiming/pipelining on global interconnects Multiple clock cycles are needed to cross the chip Proper placement allows retiming to hide global interconnect delays. Placement 1 Placement 2 a b c d a d b c d(v)=1, WL=6, d(e) WL Before retiming, φ = 5.0 After retiming, φ = 3.0 d(v)=1, WL=6, d(e) WL Before retiming, φ = 4.0 Better Initial Placement!! After retiming, φ = 4.0
8 Difficulties How to consider retiming/pipelining over global interconnects Flip-flop boundaries are not fixed during placement, difficult to do static timing analysis Answer: Use of the concepts of c-retiming and sequential timing analysis (Seq-TA) How to handle the high complexity of the combined problem Answer: Use the multi-level optimization technique
9 Simultaneous Coarse Placement with Retiming on Interconnects Our Our solution Compute the labels of all nodes under c-retiming c for a given placement solution and perform sequential timing analysis (Seq( Seq- TA) Minimize the longest sequential path by improving the placement solution Alternative solution [Brayton[ Brayton,, et al] Enforcing all loop constraints during placement
10 Static Timing Analysis (STA) a Sequential circuit example: PI: a, b. PO: g. c d e g b f a a c d e g Suppose d(v)=1, d(e)=2 a b g f c d e AT: Suppose clock cycle φ =11 RT: f Transform the circuit into a DAG for static timing analysis Topological order: a,b,g,f,c,d,e Compute arrival time (AT) and required time (RT) of each node are computed in linear time.
11 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) Definition [Pan et al, TCAD98] Given a clock period φ, transfer circuit C into an edge-weighted vertex weighted graph G, Label vertex v as l(v) l ) = the weight of longest path from PIs to v = max{l(u) - φ w(u,v) ) + d(u,v) ) + d(v)}, l(v) ) is also called SAT(v). Theorem: C can be retimed to φ + max{d(v)} iff l(pos) φ Relation to retiming: r(v) ) = l(v) ) / φ - 1 Complexity is O(VE) a b w(a,c)=1 w(b.c)=0 c l(a) = 7 l(b) = 3 d(a) a b d(b) w l (a,c)= d(e (a,c) )-φ w(a,c) d(c) c d(a)=d(b) = 1, d(a,c) = d(b,c)= 2, φ = 5 l(c) = max{ , 3+2+1} = 6 w l (b,c)= d(e (b,c) )-φ w(b,c)
12 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) a a b b c c Sequential circuit f f d Retimed circuit d e g d(v)=1, d(e)=2 Is φ = 4.5 possible? e g a b Retiming graph (not a DAG) 2 d -2.5 c e f Iter# a b c d e f g Cycle time 4.5 is possible because l(g) 4.5 g
13 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) (cont d) Sequential circuit a d c e g a Retiming graph (not a DAG) d c e g b f d(v)=1, d(e)=2 Is φ = 2.5 feasible? b f Iter# a b c d e f g Cycle time 2.5 is not feasible because l(g) > 2.5
14 Multi-Level Optimization Framework Levels Coarsening Problem sizes Uncoarsening & Refinement (optimization) Multi-level coarsening generates smaller problem sizes for top levels faster optimization on top levels May explore different aspects of the solution space at different levels Gradual refinement on good solutions from coarser levels is very efficient Successful in many applications Originally developed for PDE Recent success in VLSICAD: partitioning, placement, routing
15 Challenges Previous Previous Seq-TA can only handle single-output gate In reality multi-output modules exist IP block, MUX, adders Clusters in the multi-level level optimization process How How to integrate Seq-TA into multi-level level coarse placement efficiently Need Need to consider congestion and routability
16 Generalize c-retiming c for Complex Combinational Modules l 1 -value labeling for each vertex l 1 (v)=weight of the longest path from PIs to v using d (v) d as uniform gate delay Each vertex has a l 1 -value label. Upper bound of the labeling Reduce the non-uniformed gate delay to uniform gate delay by taking the max. Internal delay as the gate delay d (v) = max { d(v (i, j) ) } v I0 4 v O0 v I v v O1 I2 complex module (combinational logic) with multi-output and non-uniform propagation delay Flatten/Decompose the complex module by treating each pin of the module as vertex with zero delay. v I0 v I1 d (v)=11 v O0 v v O1 I2 v I0 4 v I v I2 3 v O0 v O1 l 2 -value labeling for each output of a vertex l 2 (vo t )=weight of the longest path from PIs to output o t Each output of a vertex has a l 2 -value label. Lower bound of the labeling of v
17 Properties of Generalized c-retiming c for Complex Combinational Modules Theorem: If a PO t with l 2 (PO t ) > Φ, then the circuit can not be retimed to a clock period of Φ. Theorem: If for every POi, l 1 (PO i ) Φ, then the circuit can be retimed to a clock period less than Φ+k, where k is max. input-output delay of all gates. Theorem: For any module v and its out-pin vo t, l 2 (vo t ) l 1 (v). Theorem: Given a circuit C, Φ is the min. clock period achieved by retiming on circuit C, if C c is derived from C by performing clustering,and the min. clock period achieved by retiming on C c is Φ c, then Φ Φ c.
18 Integrate Seq-TA with a Multi-level level SA-based Coarse Placement In coarsening phase, FFs can only be clustered after a certain level k Level L 0. From level L n to L k+1 perform static timing analysis (where FFs are clusterd) From level L k to L 0 perform Seq-TA (where FFs are not clustered). Level L k Level L n. Initial Placement. Refinement by timing-driven SA-based coarse placement
19 Area Density Problems in Multi-level level Coarse Placement Traditional area density control: Cell area in each bin < bin area utilization with a small percentage of overflow Does not work when cluster sizes may have significant variations and may be bigger than a bin How about use different grid sizes for different levels of clustering? Hard to find fixed percentages that works Significant placement cost jump when switch grid sizes
20 Hierarchical Area Density Control Use the same grid structure for placement for all clustering levels Impose hierarchy on bin structure for area density control Each cluster move must satisfy the area constraints on each level in the bin hierarchy Area constraint for moving a cell of size A Allowed overflow on each level in the bin hierarchy = ka, k is a small constant (usually 1 or 2) Work well in multi-level framework: Area constraints gradually tightened during optimization
21 Fast Incremental A-tree A Routing for Multi-pin Nets Root(source pin) Simple incremental A-tree Recursively Quad-partition grids Each pin recursively connects to lower left corner of each level of partition For net with bounding box length B, at most 2 *log B edge updates for each pin move, except the root. Each edge routed by LZ-router First Quadrant
22 Fast LZ-routing for Two-pin Connections HVH Left region VHV Right region Decide HVH or VHV: Select the less congested layer Binary search on V-stem (or H-stem) Initial left region and right region to cover bounding box Repeat Query wire usage on both regions Select region with less congestion Wire usage query can be done in O(log grid_size)
23 Placement Cost Functions Wire length driven: Summation of net bounding boxes of all nets Congestion driven: Wire usages estimated from the fast global router Cost = Summation of square of wire usages in all bins For fixed wire width cost equivalent to summation of weighted wire length, weight on a bin = wire usage of the bin For congestion driven run: only turns on congestion driven cost at the finest placement level W1 W2 W3 W4 W5 W6 Congestion cost = W1 2 + W W9 2 W7 W8 W9
24 Experimental Results on Wire Length Minimization Multi-level simulated annealing coarse placement Wire length comparison with GORDIAN-L: Our engine only turns on wire length optimization Legalized by DOMINO for wire length comparison mpg+dom/gor+dom Wire Length Comparison mpg+dom/gor+dom CPU Time Comparison 100% 99% 98% 97% 96% 95% 94% 97% 100% 96% 90% 80% 70% 60% 50% 40% 30% 20% 10% 81% 43% 22% 93% 0% 20k-50k k 100k-210k 20k-50k k 100k-210k 20k-50k test cases: avqlarge, avqsmall, ibm04, ibm07 50k-100k test cases: ibm09, ibm10 100k-210k test cases: ibm14, ibm15, ibm16, ibm17, ibm18 Our multi-level engine performs well for big circuits
25 Experimental Results on Congestion Control BBOX WL Routed WL Max boundary congestion Total overflow CPU mpg mpg-cg.rd mpg-cg Test cases: ibm01, ibm04, ibm07, ibm11, ibm13, ibm15 mpg: wire length driven mode mpg-cg: congestion driven at finest clustering level mpg-cg.rd: alternative congestion driven + wire length driven at fines clustering level
26 Initial Experimental Result on Impact of Simultaneous Retiming and Placement circuit #gates Grid size WL-driven placement Simultaneous retiming and placement dly dly dly (before retiming) (after retiming) S x Ind x Ind x Ind x Ind x Avg
27 Limitation of Exploring Multi-cycle Interconnect Communication during Logic Synthesis Minimum Minimum clock period can be achieved by logic optimization is bounded by max. delay-to to-register (DR) ratio of the loops in the circuits In a loop, 4 logic cells, 2 registers Cell delay =1ns Interconnect delay=1ns DR ratio = (D logic +D int )/#Registers = (4+4)/2=4ns Clock cycle >= 4ns Require Require consideration of multi-cycle communication during architecture & behavior synthesis
28 Regular Distributed Register Architecture (1) FUC FUC FUC Island Register File. DIV MUX ADD Cluster with area constraint FUC Global Interconnect FUC FUC Function Unit Cluster (FUC) W i H i D intra island = Dlog ic + Dopt int Dlog ic + Dopt int(2w i + 2Hi ) T Distribute registers to each island Local computation and communication in each island can be done in a single clock cycle But registers may need to be inserted along global interconnects for multi-cycle communication (less regular)
29 Regular Distributed Register Architecture (2) FUC FUC FUC 1 cycle Island Register File 2 cycle. k cycle DIV MUX ADD Cluster with area constraint Global Interconnect Function Unit Cluster (FUC) H i FUC FUC FUC W i D intra island = Dlog ic + Dopt int Dlog ic + Dopt int(2w i + 2Hi ) Registers in each island are partitioned to k banks for 1 cycle, 2 cycle, k cycle interconnect communication in each island Highly regular T
30 Example : Regular Distributed Register Architecture for 70nm Technology NTRS 97 70nm Tech Chip dimension: 620 mm 2 (24.9mm x 24.9mm) 5 G Hz across-chip clock Wire can travel up to 7.52mm within 1 clock cycle under interconnect optimization Need 7 clock cycles to cross the chip Each island base dimension Wi = Hi=2.08mm = critical length (longest length that a wire can run without buffer insertion) estimated by IPEM BIWS estimations assuming buffer size: 2x, driver/receiver size: 2x 1/3 of distance a wire can travel in 1 clock cycle Logic volume: 6.76M min-size 2-NAND gates 12X12 island-base array Local registers are partitioned to 7 banks
31 Example: Impact of Interconnect on Scheduling Data flow graph extracted from discrete cosine transformation (DCT) The delay of * operation is 2ns, the delay of + and operation is 1ns. The resources available are 2 multipliers and 2 ALUs. The nodes with the same color are assigned to the same functional unit * 3 * Mul2 3,7,12 Alu1 1,5,10 Alu2 2,6,9 * 7 * 8-9 * 11 * Mul1 4,8,11 FUC Represents long Interconnect delay. The long interconnect delay is 2ns. Represents short Interconnect delay. Short Interconnect delay is 1ns. Wirelength-driven Placement
32 Single-cycle vs. Multi-cycle Interconnect Communication Represents registers. + 2 Cycle Cycle 1-1 Cycle2 * 3 * 4 Cycle2 * 3 * 4 Cycle Cycle Cycle 4 Cycle5 * 11 * 8 Cycle 4 * 7 * 11 Cycle6 * 7 * 12 Cycle5 * 8 * 12 Cycle Cycle6-10 Cycle8-9 Cycle9 Single-cycle interconnect communication Scheduled in 6 clock cycles Clock period is 4ns Total latency is 24ns Multi-cycle interconnect communication Scheduled in 9 clock cycles Clock period is 2ns Total latency is 18ns
33 Enhancement 1: Simultaneous Placement and Scheduling for Performance Optimization Cycle1 * 3 * 4 Cycle2 Mul2 3,7,12 Alu1 1,5, Cycle3 * 7 * 8 Cycle4 Cycle5 * 11 Cycle6 * 12 Mul1 4,8,11 Alu2 2,6,9-9 Cycle7-10 Cycle8 Simultaneous Placement and Scheduling With placement integrated with scheduling, critical path is reduced. The DFG can be scheduled in 8 clock cycles, with clock period of 2ns. The total latency is 16ns.
34 Enhancement 2: Simultaneous Placement, Scheduling and Binding for Performance Optimization * 3 * 4 Cycle1 Cycle2 Mul2 3,7,11 Alu1 1,5, Cycle3 Cycle4 * 7 * 12 Cycle5 Mul1 4,8,12 Alu2 2,6,9 * 8 * 11 Cycle Cycle7 Simultaneous Placement, Scheduling and Binding With placement integrated with scheduling and binding, the critical path is further reduced. The DFG can be scheduled in 7 clock cycles, with clock period of 2ns. The total latency is 14ns
35 Example: Multicluster Architectures of DEC Alpha Source: The Multicluster Architecture: Reducing Cycle Time Through Partitioning by Keith I. Farkas, et al
36 Conclusions Multi-cycle communication is needed for gigahertz designs Sequential timing analysis + multilevel optimization enables efficient retiming/pipelining over global interconnects Regular Regular distributed register (RDR) fabric provides regularity to support Multicycle communication Integrated resource binding, scheduling, and physical planning
Regular Fabrics for Retiming & Pipelining over Global Interconnects
Regular Fabrics for Retiming & Pipelining over Global Interconnects Jason Cong Computer Science Department University of California, Los Angeles cong@cs cs.ucla.edu http://cadlab cadlab.cs.ucla.edu/~cong
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies. Outline
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device
More informationPilot: A Platform-based HW/SW Synthesis System
Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The
More informationMultilevel Global Placement With Congestion Control
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 4, APRIL 2003 395 Multilevel Global Placement With Congestion Control Chin-Chih Chang, Jason Cong, Fellow, IEEE,
More informationInterconnect Delay and Area Estimation for Multiple-Pin Nets
Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Professor Jason Cong UCLA Computer Science Department Los Angeles, CA 90095 http://cadlab.cs.ucla.edu/~ /~cong
More informationThermal-Aware 3D IC Physical Design and Architecture Exploration
Thermal-Aware 3D IC Physical Design and Architecture Exploration Jason Cong & Guojie Luo UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Supported by DARPA Outline Thermal-Aware
More informationNANOMETER process technologies allow billions of transistors
550 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004 Architecture and Synthesis for On-Chip Multicycle Communication Jason Cong, Fellow, IEEE, Yiping
More informationArchitecture and Synthesis for Multi-Cycle Communication
Architecture and Synthesis for Multi-Cycle Communication Jason Cong, Yiping Fan, Xun Yang, Zhiru Zhang Computer Science Department University of California, Los Angeles Los Angeles CA 90095 USA {cong,
More informationAn Interconnect-Centric Design Flow for Nanometer. Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled
More informationArchitecture-Level Synthesis for Automatic Interconnect Pipelining
Architecture-Level Synthesis for Automatic Interconnect Pipelining Jason Cong, Yiping Fan, Zhiru Zhang Computer Science Department University of California, Los Angeles, CA 90095 {cong, fanyp, zhiruz}@cs.ucla.edu
More informationDpRouter: A Fast and Accurate Dynamic- Pattern-Based Global Routing Algorithm
DpRouter: A Fast and Accurate Dynamic- Pattern-Based Global Routing Algorithm Zhen Cao 1,Tong Jing 1, 2, Jinjun Xiong 2, Yu Hu 2, Lei He 2, Xianlong Hong 1 1 Tsinghua University 2 University of California,
More informationLarge Scale Circuit Placement: Gap and Promise
Large Scale Circuit Placement: Gap and Promise Jason Cong 1, Tim Kong 2, Joseph R. Shinnerl 1, Min Xie 1 and Xin Yuan 1 UCLA VLSI CAD LAB 1 Magma Design Automation 2 Outline Introduction Gap Analysis of
More informationS 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d
Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu
More informationL14 - Placement and Routing
L14 - Placement and Routing Ajay Joshi Massachusetts Institute of Technology RTL design flow HDL RTL Synthesis manual design Library/ module generators netlist Logic optimization a b 0 1 s d clk q netlist
More informationSymmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment
Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network
More informationBasic Idea. The routing problem is typically solved using a twostep
Global Routing Basic Idea The routing problem is typically solved using a twostep approach: Global Routing Define the routing regions. Generate a tentative route for each net. Each net is assigned to a
More informationPlanning for Local Net Congestion in Global Routing
Planning for Local Net Congestion in Global Routing Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer Engineering *Department of Industrial and Systems Engineering
More informationRetiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.
Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum
More informationCluster-based approach eases clock tree synthesis
Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network
More informationTHE continuous increase of the problem size of IC routing
382 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 3, MARCH 2005 MARS A Multilevel Full-Chip Gridless Routing System Jason Cong, Fellow, IEEE, Jie Fang, Min
More informationLinking Layout to Logic Synthesis: A Unification-Based Approach
Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationGlobal Clustering-Based Performance-Driven Circuit Partitioning
Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California at Los Angeles Los Angeles, CA 90095 cong@cs.ucla.edu Chang Wu Aplus Design Technologies, Inc. Los Angeles,
More informationGraph Models for Global Routing: Grid Graph
Graph Models for Global Routing: Grid Graph Each cell is represented by a vertex. Two vertices are joined by an edge if the corresponding cells are adjacent to each other. The occupied cells are represented
More informationImplementing Tile-based Chip Multiprocessors with GALS Clocking Styles
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues
More informationCAD Algorithms. Placement and Floorplanning
CAD Algorithms Placement Mohammad Tehranipoor ECE Department 4 November 2008 1 Placement and Floorplanning Layout maps the structural representation of circuit into a physical representation Physical representation:
More informationMetal-Density Driven Placement for CMP Variation and Routability
Metal-Density Driven Placement for CMP Variation and Routability ISPD-2008 Tung-Chieh Chen 1, Minsik Cho 2, David Z. Pan 2, and Yao-Wen Chang 1 1 Dept. of EE, National Taiwan University 2 Dept. of ECE,
More informationEN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5)
EN2911X: Lecture 13: Design Flow: Physical Synthesis (5) Prof. Sherief Reda Division of Engineering, rown University http://scale.engin.brown.edu Fall 09 Summary of the last few lectures System Specification
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationAn Efficient Computation of Statistically Critical Sequential Paths Under Retiming
An Efficient Computation of Statistically Critical Sequential Paths Under Retiming Mongkol Ekpanyapong, Xin Zhao, and Sung Kyu Lim Intel Corporation, Folsom, California, USA Georgia Institute of Technology,
More informationFloorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence
Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,
More informationAn Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid
RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR An Introduction to FPGA Placement Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR
More informationECE260B CSE241A Winter Logic Synthesis
ECE260B CSE241A Winter 2007 Logic Synthesis Website: /courses/ece260b-w07 ECE 260B CSE 241A Static Timing Analysis 1 Slides courtesy of Dr. Cho Moon Introduction Why logic synthesis? Ubiquitous used almost
More informationUCLA 3D research started in 2002 under DARPA with CFDRC
Coping with Vertical Interconnect Bottleneck Jason Cong UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/ cs edu/~cong Outline Lessons learned Research challenges and opportunities
More informationPhysical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006
Physical Design of Digital Integrated Circuits (EN029 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 09: Routing Introduction to Routing Global Routing Detailed Routing 2
More informationMapping-Aware Constrained Scheduling for LUT-Based FPGAs
Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mingxing Tan, Steve Dai, Udit Gupta, Zhiru Zhang School of Electrical and Computer Engineering Cornell University High-Level Synthesis (HLS) for
More informationECE260B CSE241A Winter Placement
ECE260B CSE241A Winter 2005 Placement Website: / courses/ ece260b- w05 ECE260B CSE241A Placement.1 Slides courtesy of Prof. Andrew B. Slides courtesy of Prof. Andrew B. Kahng VLSI Design Flow and Physical
More informationSynthesis at different abstraction levels
Synthesis at different abstraction levels System Level Synthesis Clustering. Communication synthesis. High-Level Synthesis Resource or time constrained scheduling Resource allocation. Binding Register-Transfer
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationTemperature-Aware Routing in 3D ICs
Temperature-Aware Routing in 3D ICs Tianpei Zhang, Yong Zhan and Sachin S. Sapatnekar Department of Electrical and Computer Engineering University of Minnesota 1 Outline Temperature-aware 3D global routing
More informationOpenAccess In 3D IC Physical Design
OpenAccess In 3D IC Physical Design Jason Cong, Jie Wei,, Yan Zhang VLSI CAD Lab Computer Science Department University of California, Los Angeles Supported by DARPA and CFD Research Corp Outline 3D IC
More informationConstraint Driven I/O Planning and Placement for Chip-package Co-design
Constraint Driven I/O Planning and Placement for Chip-package Co-design Jinjun Xiong, Yiuchung Wong, Egino Sarto, Lei He University of California, Los Angeles Rio Design Automation, Inc. Agenda Motivation
More informationProblem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.
Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.
More informationPushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University
PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -
More informationReference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering
FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 Logic Design Process Combinational logic networks Functionality. Other requirements: Size. Power. Primary inputs Performance.
More informationNiyati Shah Department of ECE University of Toronto
Niyati Shah Department of ECE University of Toronto shahniya@eecg.utoronto.ca Jonathan Rose Department of ECE University of Toronto jayar@eecg.utoronto.ca 1 Involves connecting output pins of logic blocks
More informationHigh-Level Synthesis (HLS)
Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationEECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007
EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What
More informationBeyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs
Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this
More informationLecture 21: Combinational Circuits. Integrated Circuits. Integrated Circuits, cont. Integrated Circuits Combinational Circuits
Lecture 21: Combinational Circuits Integrated Circuits Combinational Circuits Multiplexer Demultiplexer Decoder Adders ALU Integrated Circuits Circuits use modules that contain multiple gates packaged
More informationFPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1
FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital
More informationAn Exact Algorithm for the Statistical Shortest Path Problem
An Exact Algorithm for the Statistical Shortest Path Problem Liang Deng and Martin D. F. Wong Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Outline Motivation
More informationEE582 Physical Design Automation of VLSI Circuits and Systems
EE582 Prof. Dae Hyun Kim School of Electrical Engineering and Computer Science Washington State University Preliminaries Table of Contents Semiconductor manufacturing Problems to solve Algorithm complexity
More informationPlace and Route for FPGAs
Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: January 2, 2018 at 11:23 CS429 Slideset 5: 1 Topics of this Slideset
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationTechnology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas
Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist
More informationEvolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic
ECE 42/52 Rapid Prototyping with FPGAs Dr. Charlie Wang Department of Electrical and Computer Engineering University of Colorado at Colorado Springs Evolution of Implementation Technologies Discrete devices:
More informationPhysical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006
Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 1 Lecture 10: Repeater (Buffer) Insertion Introduction to Buffering Buffer Insertion
More informationARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs
ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs Vaughn Betz Jonathan Rose Alexander Marquardt
More informationICS 252 Introduction to Computer Design
ICS 252 Introduction to Computer Design Lecture 16 Eli Bozorgzadeh Computer Science Department-UCI References and Copyright Textbooks referred (none required) [Mic94] G. De Micheli Synthesis and Optimization
More informationVdd Programmable and Variation Tolerant FPGA Circuits and Architectures
Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance
More informationCSE241 VLSI Digital Circuits UC San Diego
CSE241 VLSI Digital Circuits UC San Diego Winter 2003 Lecture 05: Logic Synthesis Cho Moon Cadence Design Systems January 21, 2003 CSE241 L5 Synthesis.1 Kahng & Cichy, UCSD 2003 Outline Introduction Two-level
More informationFPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationASIC Physical Design Top-Level Chip Layout
ASIC Physical Design Top-Level Chip Layout References: M. Smith, Application Specific Integrated Circuits, Chap. 16 Cadence Virtuoso User Manual Top-level IC design process Typically done before individual
More informationECE260B CSE241A Winter Routing
ECE260B CSE241A Winter 2005 Routing Website: / courses/ ece260bw05 ECE 260B CSE 241A Routing 1 Slides courtesy of Prof. Andrew B. Kahng Physical Design Flow Input Floorplanning Read Netlist Floorplanning
More informationIntroduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface.
Placement Introduction A very important step in physical design cycle. A poor placement requires larger area. Also results in performance degradation. It is the process of arranging a set of modules on
More informationProcessing Rate Optimization by Sequential System Floorplanning
Processing Rate Optimization by Sequential System Floorplanning Jia Wang Ping-Chih Wu Hai Zhou EECS Department Northwestern University Evanston, IL 60208, U.S.A. {jwa112, haizhou}@ece.northwestern.edu
More informationChapter 5 Global Routing
Chapter 5 Global Routing 5. Introduction 5.2 Terminology and Definitions 5.3 Optimization Goals 5. Representations of Routing Regions 5.5 The Global Routing Flow 5.6 Single-Net Routing 5.6. Rectilinear
More informationECE 5745 Complex Digital ASIC Design Topic 13: Physical Design Automation Algorithms
ECE 7 Complex Digital ASIC Design Topic : Physical Design Automation Algorithms Christopher atten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece7
More informationSUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.
SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA
More informationGraphics: Alexandra Nolte, Gesine Marwedel, Universität Dortmund. RTL Synthesis
Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund RTL Synthesis Purpose of HDLs Purpose of Hardware Description Languages: Capture design in Register Transfer Language form i.e. All
More informationComputer Architecture
Computer Architecture Lecture 1: Digital logic circuits The digital computer is a digital system that performs various computational tasks. Digital computers use the binary number system, which has two
More informationMemory, Area and Power Optimization of Digital Circuits
Memory, Area and Power Optimization of Digital Circuits Laxmi Gupta Electronics and Communication Department Jaypee Institute of Information Technology Noida, Uttar Pradesh, India Ankita Bharti Electronics
More informationDesign of a Low Density Parity Check Iterative Decoder
1 Design of a Low Density Parity Check Iterative Decoder Jean Nguyen, Computer Engineer, University of Wisconsin Madison Dr. Borivoje Nikolic, Faculty Advisor, Electrical Engineer, University of California,
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationAbacus: Fast Legalization of Standard Cell Circuits with Minimal Movement
EDA Institute for Electronic Design Automation Prof. Ulf Schlichtmann Abacus: Fast Legalization of Standard Cell Circuits with Minimal Movement Peter Spindler, Ulf Schlichtmann and Frank M. Johannes Technische
More informationAn overview of standard cell based digital VLSI design
An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased
More informationHow Much Logic Should Go in an FPGA Logic Block?
How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca
More informationVdd Programmability to Reduce FPGA Interconnect Power
Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important
More informationAdditional Slides to De Micheli Book
Additional Slides to De Micheli Book Sungho Kang Yonsei University Design Style - Decomposition 08 3$9 0 Behavioral Synthesis Resource allocation; Pipelining; Control flow parallelization; Communicating
More informationA Framework for Systematic Evaluation and Exploration of Design Rules
A Framework for Systematic Evaluation and Exploration of Design Rules Rani S. Ghaida* and Prof. Puneet Gupta EE Dept., University of California, Los Angeles (rani@ee.ucla.edu), (puneet@ee.ucla.edu) Work
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationFast Dual-V dd Buffering Based on Interconnect Prediction and Sampling
Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),
More informationClock Tree Resynthesis for Multi-corner Multi-mode Timing Closure
Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin
More informationIntroduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation
Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,
More informationAn Overview of Standard Cell Based Digital VLSI Design
An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,
More informationMulti-level Quadratic Placement for Standard Cell Designs
CS258f Project Report Kenton Sze Kevin Chen 06.10.02 Prof Cong Multi-level Quadratic Placement for Standard Cell Designs Project Description/Objectives: The goal of this project was to provide an algorithm
More informationSequential/Parallel Global Routing Algorithms for VLSI Standard. Cells
Sequential/Parallel Global Routing Algorithms for VLSI Standard Cells A Thesis Presented to The Faculty of Graduate Studies of The University of Guelph by HAO SUN In partial fulfilment of requirements
More informationMemory System Design. Outline
Memory System Design Chapter 16 S. Dandamudi Outline Introduction A simple memory block Memory design with D flip flops Problems with the design Techniques to connect to a bus Using multiplexers Using
More informationHIERARCHICAL DESIGN. RTL Hardware Design by P. Chu. Chapter 13 1
HIERARCHICAL DESIGN Chapter 13 1 Outline 1. Introduction 2. Components 3. Generics 4. Configuration 5. Other supporting constructs Chapter 13 2 1. Introduction How to deal with 1M gates or more? Hierarchical
More informationOutline HIERARCHICAL DESIGN. 1. Introduction. Benefits of hierarchical design
Outline HIERARCHICAL DESIGN 1. Introduction 2. Components 3. Generics 4. Configuration 5. Other supporting constructs Chapter 13 1 Chapter 13 2 1. Introduction How to deal with 1M gates or more? Hierarchical
More informationOn GPU Bus Power Reduction with 3D IC Technologies
On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The
More informationOverview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions
Logic Synthesis Overview Design flow Principles of logic synthesis Logic Synthesis with the common tools Conclusions 2 System Design Flow Electronic System Level (ESL) flow System C TLM, Verification,
More informationHardware Design with VHDL PLDs IV ECE 443
Embedded Processor Cores (Hard and Soft) Electronic design can be realized in hardware (logic gates/registers) or software (instructions executed on a microprocessor). The trade-off is determined by how
More information160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp
Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing
More informationFull Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing
Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing Umadevi.S #1, Vigneswaran.T #2 # Assistant Professor [Sr], School of Electronics Engineering, VIT University, Vandalur-
More information