Regular Fabrics for Retiming & Pipelining over Global Interconnects
1 Regular Fabrics for Retiming & Pipelining over Global Interconnects. Jason Cong, Computer Science Department, University of California, Los Angeles. cs.ucla.edu, cadlab.cs.ucla.edu/~cong. FCRP Interconnect Workshop, June 28, 2002. DUSD(Labs)
2 Overarching GSRC Research Emphasis [Jan Rabaey, June 2002]. A broadened focus on application-oriented embedded systems under tight cost, PDA, and time-to-market constraints. Founded on one basic principle: from ad-hoc system-on-a-chip design to disciplined, platform-based design.
3 The Discipline of Platform-Based Design. [Figure: the platform stack. Layers: Application; Programming Model (Models/Estimators, Kernels/Benchmarks); Architecture(s); Architectural Platform; Microarchitecture(s); Circuit Fabric(s); Silicon Implementation Platform; Manufacturing Interface; Silicon Implementation. Interface annotations: cycle-speed, power, area; functional blocks, interconnect; delay, variation, SPICE models; basic device & interconnect structures.]
4 The Discipline of Platform-Based Design. [Figure: the same platform stack as slide 3, annotated with the research themes Programmable Systems; Comp and Comm Based Design; Constructive Fabrics; Test, Verification, Energy & Power; Calibrating Achievable Design.]
5 From Architecture to Silicon Implementation Platform. Different targets employ different intermediate platforms, hence different layers of regularity and design-space constraints. Design space may actually be smaller than with large steps! Large-step predictions/abstractions may misguide the optimizations. [Figure: Architecture; Logic Regularity; Component Regularity and Reuse; Regular Fabrics; Geometrical Regularity; Silicon Implementation; Constructive Fabrics Th] [Source: Larry Pileggi]
6 Sample Work from the GSRC Fabric Theme. Bob Brayton: Topologically Constrained Logic Synthesis. Malgorzata Marek-Sadowska: Interconnecting Regular Fabrics. Wojtek Maly: Geometrical Regularity. Herman Schmit: Regular Communication Fabrics. Jason Cong: Regular Fabrics for Retiming and Pipelining over Global Interconnects.
7 Motivation: How Far Can We Go in Each Clock Cycle. NTRS 70 nm technology, 5 GHz across-chip clock, 620 mm^2 chip (24.9 mm x 24.9 mm). IPEM BIWS estimations with buffer size 100x and driver/receiver size 100x. From corner to corner: 7 clock cycles. [Figure: chip map with contours of the distance (mm) reachable in 1, 2, 3, 4, 5, 6, and 7 clock cycles.]
8 Solutions. Fully asynchronous designs. GALS (globally asynchronous, locally synchronous) designs. Latency-insensitive designs. Synchronous designs with multi-cycle communication: much better understood; supported by the current tool set; more energy efficient?
9-10 Need of Considering Retiming during Placement: retiming/pipelining on global interconnects. Multiple clock cycles are needed to cross the chip. Proper placement allows retiming to hide global interconnect delays. [Figure: two placements of cells a, b, c, d, each with d(v)=1, WL=6, d(e) ∝ WL. Placement 1 (order a b c d): before retiming φ = 5.0, after retiming φ = 3.0. Placement 2 (order a d b c): before retiming φ = 4.0 (better initial placement!!), after retiming φ = 4.0.]
11 Difficulties. How to consider retiming/pipelining over global interconnects: flip-flop boundaries are not fixed during placement, which makes static timing analysis difficult to apply; use the concepts of c-retiming and sequential timing analysis (Seq-TA). How to handle the high complexity of the combined problem: use the multi-level optimization technique.
12 Static Timing Analysis (STA). Sequential circuit example: PIs: a, b; PO: g. Suppose d(v)=1, d(e)=2, and clock cycle φ = 11. Transform the circuit into a DAG for static timing analysis. Topological order: a, b, g, f, c, d, e. The arrival time (AT) and required time (RT) of each node are computed in linear time. [Figure: the sequential circuit on nodes a to g, the derived DAG, and the AT/RT values per node.]
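The AT/RT computation on a DAG can be sketched as follows. This is a minimal illustration, not the slide's circuit: the four-node DAG, the function name `static_timing`, and the conventions AT(source) = 0 and RT(sink) = φ are assumptions made for the example; only d(v)=1, d(e)=2, and φ=11 come from the slide.

```python
from collections import deque

def static_timing(edges, d_v, d_e, phi):
    """Return (AT, RT) for a DAG given as (u, v) edges.

    Assumed conventions (not from the slides): sources have AT = 0,
    sinks have RT = phi, every node has delay d_v, every edge d_e.
    """
    nodes = {n for e in edges for n in e}
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Kahn's algorithm gives a topological order in linear time.
    indeg = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Forward pass: arrival time is the longest path from any source.
    at = {}
    for v in order:
        at[v] = max((at[u] + d_e + d_v for u in pred[v]), default=0.0)

    # Backward pass: required time is the tightest constraint from any sink.
    rt = {}
    for u in reversed(order):
        rt[u] = min((rt[v] - d_v - d_e for v in succ[u]), default=phi)
    return at, rt

# Hypothetical DAG: a -> c, b -> c, c -> d, b -> d.
at, rt = static_timing([("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")],
                       d_v=1, d_e=2, phi=11)
slack = {n: rt[n] - at[n] for n in at}
print(at["d"], rt["b"], slack["c"])  # 6 5 5
```

Each node and edge is visited once in each pass, which is where the linear-time bound comes from.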
13 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT). Definition [Pan et al., TCAD98]: given a clock period φ, transform circuit C into an edge-weighted, vertex-weighted graph G and label each vertex v with $l(v)$ = the weight of the longest path from the PIs to v, that is, $l(v) = \max_{(u,v)} \{\, l(u) - \phi\, w(u,v) + d(u,v) + d(v) \,\}$. $l(v)$ is also called SAT(v). Theorem: C can be retimed to $\phi + \max\{d(v)\}$ iff $l(\mathrm{POs}) \le \phi$. Relation to retiming: $r(v) = \lceil l(v)/\phi \rceil - 1$. Complexity is O(VE). Example: vertex c has fanins a and b with $w(a,c)=1$, $w(b,c)=0$, $l(a)=7$, $l(b)=3$, $d(a)=d(b)=1$, $d(a,c)=d(b,c)=2$, $\phi=5$. The retiming-graph edge weights are $w_l(a,c) = d(e_{(a,c)}) - \phi\, w(a,c)$ and $w_l(b,c) = d(e_{(b,c)}) - \phi\, w(b,c)$, so with $d(c)=1$, $l(c) = \max\{7-5+2+1,\ 3+2+1\} = 6$.
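The label update in this example can be checked with a few lines of Python. The helper name `sat_label` and the tuple layout are illustrative choices; the numbers are the ones on the slide, with d(c)=1 as implied by the term 3+2+1.

```python
def sat_label(fanins, d_v, phi):
    """l(v) = max over fanin edges (u, v) of l(u) - phi*w(u,v) + d(u,v) + d(v).

    fanins: list of (l_u, w_uv, d_uv) tuples, one per incoming edge.
    """
    return max(l_u - phi * w + d_edge + d_v for (l_u, w, d_edge) in fanins)

# Slide values: l(a)=7, w(a,c)=1; l(b)=3, w(b,c)=0; d(e)=2; d(c)=1; phi=5.
l_c = sat_label([(7, 1, 2), (3, 0, 2)], d_v=1, phi=5)
print(l_c)  # 6
```

The register on edge (a, c) subtracts one full clock period from that path, so the path through b dominates even though l(a) > l(b).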
14 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT). Is φ = 4.5 possible, with d(v)=1, d(e)=2? [Figure: the sequential circuit on nodes a to g, its retimed version, the retiming graph (not a DAG) with edge weights such as 2 and -2.5, and a table of l-values for a to g per iteration.] Cycle time 4.5 is possible because l(g) ≤ 4.5.
15 Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) (cont'd). Is φ = 2.5 feasible, with d(v)=1, d(e)=2? [Figure: the sequential circuit on nodes a to g, its retiming graph (not a DAG), and a table of l-values for a to g per iteration.] Cycle time 2.5 is not feasible because l(g) > 2.5.
16 Sequential Timing Analysis (Seq-TA). With loops the problem is difficult: a topological order does not exist! Start with a minimum l-value for each node and iteratively improve it. Convergence is guaranteed in O(n) iterations if the circuit can be retimed to the target cycle time. Outline of Seq-TA: binary search for the minimum feasible clock period. Given a clock period φ, check whether φ is feasible: set l(PI) = 0 and l(others) = -∞; relax one vertex at a time and update the l-values; if any l(PO) > φ, φ is not feasible; if the relaxation converges, φ is feasible. Complexity is O(VE).
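The outline above can be sketched as a Bellman-Ford-style longest-path relaxation wrapped in a binary search. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the edge-by-edge relaxation order, the tolerances, and the small test circuit (a PI a, a loop x -> y -> x with one register, and a registered edge y -> g to the PO) are all hypothetical.

```python
import math

def seq_ta_feasible(vertices, edges, d, pis, pos, phi):
    """Check whether clock period phi is feasible.

    edges: list of (u, v, w, d_e) with w registers and wire delay d_e.
    d: dict of vertex delays. l(PI) = 0, all other labels start at -inf.
    """
    l = {v: -math.inf for v in vertices}
    for p in pis:
        l[p] = 0.0
    # Without a positive cycle, labels settle within |V| - 1 rounds;
    # one extra round confirms that nothing changes any more.
    for _ in range(len(vertices) + 1):
        changed = False
        for u, v, w, d_e in edges:
            if l[u] == -math.inf:
                continue
            cand = l[u] - phi * w + d_e + d[v]
            if cand > l[v] + 1e-12:
                l[v] = cand
                changed = True
        if any(l[p] > phi + 1e-9 for p in pos):
            return False          # some l(PO) > phi
        if not changed:
            return True           # relaxation converged
    return False                  # still growing: a loop is too slow for phi

def min_feasible_period(vertices, edges, d, pis, pos, tol=1e-4):
    """Binary search; assumes the total delay is a feasible upper bound."""
    lo, hi = 0.0, sum(d_e + d[v] for _, v, _, d_e in edges)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if seq_ta_feasible(vertices, edges, d, pis, pos, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical circuit with d(v)=1 and d(e)=2 everywhere.
V = ["a", "x", "y", "g"]
E = [("a", "x", 0, 2), ("x", "y", 0, 2), ("y", "x", 1, 2), ("y", "g", 1, 2)]
D = {v: 1.0 for v in V}
ok_6 = seq_ta_feasible(V, E, D, ["a"], ["g"], 6.0)
ok_5 = seq_ta_feasible(V, E, D, ["a"], ["g"], 5.0)
best = min_feasible_period(V, E, D, ["a"], ["g"])
print(ok_6, ok_5, round(best, 3))  # True False 6.0
```

In this circuit the loop x -> y -> x has total delay 6 and one register, so any period below 6 makes the labels grow without bound, which the check reports as infeasible.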
17 Multi-Level Optimization Framework. [Figure: levels versus problem sizes, with coarsening followed by uncoarsening & refinement (optimization).] Multi-level coarsening generates smaller problem sizes for the top levels, which gives faster optimization on the top levels. May explore different aspects of the solution space at different levels. Gradual refinement on good solutions from coarser levels is very efficient. Successful in many applications: originally developed for PDEs; recent success in VLSI CAD (partitioning, placement, routing).
18 Challenges. Previous Seq-TA can only handle single-output gates. In reality multi-output modules exist: IP blocks, MUXes, adders, and clusters in the multi-level optimization process. How to integrate Seq-TA into multi-level coarse placement efficiently.
19 Generalize c-retiming for Complex Combinational Modules. A complex module is combinational logic with multiple outputs and non-uniform propagation delays. $l_1$-value labeling for each vertex: $l_1(v)$ = weight of the longest path from the PIs to v using $d'(v)$ as the uniform gate delay. Each vertex has one $l_1$-value label; it is an upper bound of the labeling. The non-uniform gate delay is reduced to a uniform gate delay by taking the maximum internal delay as the gate delay: $d'(v) = \max_{i,j} \{ d(v_{(i,j)}) \}$. $l_2$-value labeling for each output of a vertex: decompose the complex module by treating each pin of the module as a vertex with zero delay; $l_2(v_{o_t})$ = weight of the longest path from the PIs to output $o_t$. Each output of a vertex has an $l_2$-value label; it is a lower bound of the labeling of v. [Figure: a module v with input pins $v_{I0}$, $v_{I1}$, $v_{I2}$ and output pins $v_{O0}$, $v_{O1}$, shown as a single vertex with $d'(v) = 11$ and as its pin-level decomposition.]
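The two labelings can be compared on a single module. The sketch below is illustrative only: the pin names, the pin-to-pin delays (apart from a maximum of 11, echoing the figure), and the input labels are made up, and the input labels are assumed to already include edge delays and register weights.

```python
def l1_label(input_labels, pin_delays):
    """One label per module: worst input label plus the uniform delay d'(v),
    where d'(v) is the maximum pin-to-pin delay."""
    d_uniform = max(pin_delays.values())
    return max(input_labels.values()) + d_uniform

def l2_labels(input_labels, pin_delays):
    """One label per output pin, using the actual pin-to-pin delays."""
    out = {}
    for (i, o), delay in pin_delays.items():
        out[o] = max(out.get(o, float("-inf")), input_labels[i] + delay)
    return out

# Hypothetical module: 3 inputs, 2 outputs, non-uniform delays.
labels_in = {"I0": 2, "I1": 5, "I2": 1}
delays = {("I0", "O0"): 4, ("I1", "O0"): 3, ("I1", "O1"): 2, ("I2", "O1"): 11}

l1 = l1_label(labels_in, delays)
l2 = l2_labels(labels_in, delays)
print(l1, l2)  # 16 {'O0': 8, 'O1': 12}
```

Every per-output value stays at or below the single uniform-delay value, which matches the upper-bound and lower-bound roles the slide assigns to the two labelings.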
20 Integrate Seq-TA with a Multi-level SA-based Coarse Placement. In the coarsening phase, FFs can only be clustered after a certain level k. From level $L_n$ to $L_{k+1}$, perform static timing analysis (where FFs are clustered). From level $L_k$ to $L_0$, perform Seq-TA (where FFs are not clustered). [Figure: levels $L_0$, ..., $L_k$, ..., $L_n$, with initial placement and refinement by timing-driven SA-based coarse placement.]
21 Initial Experimental Result on Impact of Simultaneous Retiming and Placement. [Table: columns are circuit, #gates, grid size, and delay (dly) under WL-driven placement (before retiming, after retiming) and under simultaneous retiming and placement; rows are one S circuit, four Ind circuits, and Avg.]
22 Limitation of Exploring Multi-cycle Interconnect Communication during Logic Synthesis. The minimum clock period that can be achieved by logic optimization is bounded by the maximum delay-to-register (DR) ratio of the loops in the circuit. Example: a loop with 4 logic cells and 2 registers, cell delay = 1 ns, interconnect delay = 1 ns. DR ratio = $(D_{logic} + D_{int}) / \#\mathrm{Registers} = (4+4)/2 = 4$ ns, so the clock cycle is >= 4 ns. This requires consideration of multi-cycle communication during architecture & behavior synthesis.
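The DR-ratio bound is simple arithmetic; a short sketch makes the dependence on register count explicit. The function name and the choice of one interconnect segment per cell are assumptions for illustration; the numbers are the slide's.

```python
def dr_ratio(cell_delays_ns, wire_delays_ns, num_registers):
    """Delay-to-register ratio of one loop: total delay per register."""
    return (sum(cell_delays_ns) + sum(wire_delays_ns)) / num_registers

# Slide example: 4 cells at 1 ns, 4 wire segments at 1 ns, 2 registers.
bound_ns = dr_ratio([1.0] * 4, [1.0] * 4, num_registers=2)
print(bound_ns)  # 4.0
```

Because retiming moves registers but never changes how many sit on a loop, this ratio cannot be improved by logic-level optimization alone.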
23 Regular Distributed Register Architecture. [Figure: an array of islands of width $W_i$ and height $H_i$ connected by global interconnect. Each island holds a Function Unit Cluster (FUC; e.g. DIV, MUX, ADD; a cluster with an area constraint) and a register file with banks for 1 cycle, 2 cycle, ..., k cycle.] $D_{\text{intra-island}} = D_{\text{logic}} + D_{\text{opt-int}} \le D_{\text{logic}} + D_{\text{opt-int}}(2W_i + 2H_i) \le T$. Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication. Highly regular.
24 Example: Regular Distributed Register Architecture for 70nm Technology. NTRS'97 70 nm technology. Chip dimension: 620 mm^2 (24.9 mm x 24.9 mm). 5 GHz across-chip clock. A wire can travel up to 7.52 mm within 1 clock cycle under interconnect optimization, so 7 clock cycles are needed to cross the chip. Each island has base dimension $W_i = H_i = 2.08$ mm = the critical length (the longest length that a wire can run without buffer insertion), estimated by IPEM BIWS assuming buffer size 2x and driver/receiver size 2x; 1/3 of the distance a wire can travel in 1 clock cycle. Logic volume: 6.76M min-size 2-NAND gates. 12 x 12 island-base array. Local registers are partitioned into 7 banks.
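The cycle count and array size on this slide follow from the stated dimensions. The sketch below assumes Manhattan (horizontal plus vertical) routing for the corner-to-corner distance; that routing model and the variable names are assumptions, while the three input numbers come from the slide.

```python
import math

chip_side_mm = 24.9        # chip is 24.9 mm x 24.9 mm
reach_per_cycle_mm = 7.52  # wire distance covered in one 5 GHz clock cycle
island_side_mm = 2.08      # island base dimension W_i = H_i

area_mm2 = chip_side_mm ** 2
corner_to_corner_mm = 2 * chip_side_mm  # Manhattan distance (assumed)
cycles_to_cross = math.ceil(corner_to_corner_mm / reach_per_cycle_mm)
islands_per_side = math.ceil(chip_side_mm / island_side_mm)

print(round(area_mm2), cycles_to_cross, islands_per_side)  # 620 7 12
```

The 7 cycles match the number of register banks per island, and the 12 islands per side match the 12 x 12 array.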
25 Example: Impact of Interconnect on Scheduling. Data flow graph extracted from the discrete cosine transform (DCT). The delay of a * operation is 2 ns; the delay of + and - operations is 1 ns. The resources available are 2 multipliers and 2 ALUs. Nodes with the same color are assigned to the same functional unit: Mul1: 4, 8, 11; Mul2: 3, 7, 12; Alu1: 1, 5, 10; Alu2: 2, 6, 9. The long interconnect delay is 2 ns; the short interconnect delay is 1 ns. [Figure: the DFG and a wirelength-driven placement of the four functional units in FUCs, with long and short interconnects marked.]
26 Single-cycle vs. Multi-cycle Interconnect Communication. Single-cycle interconnect communication: scheduled in 6 clock cycles; clock period is 4 ns; total latency is 24 ns. Multi-cycle interconnect communication: scheduled in 9 clock cycles; clock period is 2 ns; total latency is 18 ns. [Figure: the two cycle-by-cycle schedules of the DFG, with registers marked.]
27 Enhancement 1: Simultaneous Placement and Scheduling for Performance Optimization. With placement integrated with scheduling, the critical path is reduced. The DFG can be scheduled in 8 clock cycles, with a clock period of 2 ns. The total latency is 16 ns. [Figure: the 8-cycle schedule and the placement, with Mul1: 4, 8, 11; Mul2: 3, 7, 12; Alu1: 1, 5, ...; Alu2: 2, 6, 9.]
28 Enhancement 2: Simultaneous Placement, Scheduling and Binding for Performance Optimization. With placement integrated with scheduling and binding, the critical path is further reduced. The DFG can be scheduled in 7 clock cycles, with a clock period of 2 ns. The total latency is 14 ns. [Figure: the 7-cycle schedule and the placement, with Mul1: 4, 8, 12; Mul2: 3, 7, 11; Alu1: 1, 5, ...; Alu2: 2, 6, 9.]
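The latencies quoted on slides 26 to 28 are the product of cycle count and clock period. A small comparison, with dictionary keys chosen here for illustration, puts the four schedules side by side.

```python
# (clock cycles, clock period in ns) for each approach, as stated on the slides.
schedules = {
    "single-cycle communication": (6, 4),
    "multi-cycle communication": (9, 2),
    "placement + scheduling": (8, 2),
    "placement + scheduling + binding": (7, 2),
}

latency_ns = {name: cycles * period for name, (cycles, period) in schedules.items()}
for name, total in latency_ns.items():
    print(f"{name}: {total} ns")
# single-cycle communication: 24 ns
# multi-cycle communication: 18 ns
# placement + scheduling: 16 ns
# placement + scheduling + binding: 14 ns
```

The multi-cycle schedules use more cycles than the single-cycle one, but the halved clock period more than compensates.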
29 Example: Multicluster Architectures of DEC Alpha. Source: "The Multicluster Architecture: Reducing Cycle Time Through Partitioning" by Keith I. Farkas, et al.
30 Conclusions. Multi-cycle communication is needed for gigahertz designs. Sequential timing analysis + multilevel optimization enables efficient retiming/pipelining over global interconnects. The regular distributed register (RDR) fabric provides regularity to support multi-cycle communication and integrated resource binding, scheduling, and physical planning.
Retiming & Pipelining over Global Interconnects
Retiming & Pipelining over Global Interconnects Jason Cong Computer Science Department University of California, Los Angeles cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Joint work with C. C. Chang,
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies. Outline
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects
More informationPilot: A Platform-based HW/SW Synthesis System
Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The
More informationNANOMETER process technologies allow billions of transistors
550 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004 Architecture and Synthesis for On-Chip Multicycle Communication Jason Cong, Fellow, IEEE, Yiping
More informationArchitecture and Synthesis for Multi-Cycle Communication
Architecture and Synthesis for Multi-Cycle Communication Jason Cong, Yiping Fan, Xun Yang, Zhiru Zhang Computer Science Department University of California, Los Angeles Los Angeles CA 90095 USA {cong,
More informationArchitecture-Level Synthesis for Automatic Interconnect Pipelining
Architecture-Level Synthesis for Automatic Interconnect Pipelining Jason Cong, Yiping Fan, Zhiru Zhang Computer Science Department University of California, Los Angeles, CA 90095 {cong, fanyp, zhiruz}@cs.ucla.edu
More informationAn Interconnect-Centric Design Flow for Nanometer Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Professor Jason Cong UCLA Computer Science Department Los Angeles, CA 90095 http://cadlab.cs.ucla.edu/~ /~cong
More informationInterconnect Delay and Area Estimation for Multiple-Pin Nets
Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation
More informationRetiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.
Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum
More informationThermal-Aware 3D IC Physical Design and Architecture Exploration
Thermal-Aware 3D IC Physical Design and Architecture Exploration Jason Cong & Guojie Luo UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Supported by DARPA Outline Thermal-Aware
More informationSymmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment
Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network
More informationImplementing Tile-based Chip Multiprocessors with GALS Clocking Styles
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationGlobal Clustering-Based Performance-Driven Circuit Partitioning
Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California at Los Angeles Los Angeles, CA 90095 cong@cs.ucla.edu Chang Wu Aplus Design Technologies, Inc. Los Angeles,
More informationSynthesis at different abstraction levels
Synthesis at different abstraction levels System Level Synthesis Clustering. Communication synthesis. High-Level Synthesis Resource or time constrained scheduling Resource allocation. Binding Register-Transfer
More informationAn overview of standard cell based digital VLSI design
An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased
More informationCalibrating Achievable Design GSRC Annual Review June 9, 2002
Calibrating Achievable Design GSRC Annual Review June 9, 2002 Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly,, Igor Markov, Herman Schmit, Dennis Sylvester DUSD(Labs) Calibrating Achievable Design
More informationAn Efficient Computation of Statistically Critical Sequential Paths Under Retiming
An Efficient Computation of Statistically Critical Sequential Paths Under Retiming Mongkol Ekpanyapong, Xin Zhao, and Sung Kyu Lim Intel Corporation, Folsom, California, USA Georgia Institute of Technology,
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationMapping-Aware Constrained Scheduling for LUT-Based FPGAs
Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mingxing Tan, Steve Dai, Udit Gupta, Zhiru Zhang School of Electrical and Computer Engineering Cornell University High-Level Synthesis (HLS) for
More informationHigh-Level Synthesis (HLS)
Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationIntroduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation
Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,
More informationCluster-based approach eases clock tree synthesis
Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network
More informationS 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d
Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu
More informationDelay and Power Optimization of Sequential Circuits through DJP Algorithm
Delay and Power Optimization of Sequential Circuits through DJP Algorithm S. Nireekshan Kumar*, J. Grace Jency Gnannamal** Abstract Delay Minimization and Power Minimization are two important objectives
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationEN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5)
EN2911X: Lecture 13: Design Flow: Physical Synthesis (5) Prof. Sherief Reda Division of Engineering, rown University http://scale.engin.brown.edu Fall 09 Summary of the last few lectures System Specification
More informationAn Overview of Standard Cell Based Digital VLSI Design
An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,
More informationAn Exact Algorithm for the Statistical Shortest Path Problem
An Exact Algorithm for the Statistical Shortest Path Problem Liang Deng and Martin D. F. Wong Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Outline Motivation
More informationECE260B CSE241A Winter Logic Synthesis
ECE260B CSE241A Winter 2007 Logic Synthesis Website: /courses/ece260b-w07 ECE 260B CSE 241A Static Timing Analysis 1 Slides courtesy of Dr. Cho Moon Introduction Why logic synthesis? Ubiquitous used almost
More informationAdditional Slides to De Micheli Book
Additional Slides to De Micheli Book Sungho Kang Yonsei University Design Style - Decomposition 08 3$9 0 Behavioral Synthesis Resource allocation; Pipelining; Control flow parallelization; Communicating
More informationPhysical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006
Physical Design of Digital Integrated Circuits (EN029 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 09: Routing Introduction to Routing Global Routing Detailed Routing 2
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationOverview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions
Logic Synthesis Overview Design flow Principles of logic synthesis Logic Synthesis with the common tools Conclusions 2 System Design Flow Electronic System Level (ESL) flow System C TLM, Verification,
More informationMetal-Density Driven Placement for CMP Variation and Routability
Metal-Density Driven Placement for CMP Variation and Routability ISPD-2008 Tung-Chieh Chen 1, Minsik Cho 2, David Z. Pan 2, and Yao-Wen Chang 1 1 Dept. of EE, National Taiwan University 2 Dept. of ECE,
More informationHIGH-LEVEL SYNTHESIS
HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms
More informationVdd Programmable and Variation Tolerant FPGA Circuits and Architectures
Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance
More informationLecture 21: Combinational Circuits. Integrated Circuits. Integrated Circuits, cont. Integrated Circuits Combinational Circuits
Lecture 21: Combinational Circuits Integrated Circuits Combinational Circuits Multiplexer Demultiplexer Decoder Adders ALU Integrated Circuits Circuits use modules that contain multiple gates packaged
More informationVdd Programmability to Reduce FPGA Interconnect Power
Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important
More informationLecture 20: High-level Synthesis (1)
Lecture 20: High-level Synthesis (1) Slides courtesy of Deming Chen Some slides are from Prof. S. Levitan of U. of Pittsburgh Outline High-level synthesis introduction High-level synthesis operations Scheduling
More informationMemory, Area and Power Optimization of Digital Circuits
Memory, Area and Power Optimization of Digital Circuits Laxmi Gupta Electronics and Communication Department Jaypee Institute of Information Technology Noida, Uttar Pradesh, India Ankita Bharti Electronics
More informationArchitectural Synthesis Integrated with Global Placement for Multi-Cycle Communication *
Architectural Synthesis Integrated with Global Placement for Multi-Cycle Communication Jason Cong, Yiping Fan, Guoling Han, Xun Yang, Zhiru Zhang Computer Science Department, University of California,
More informationProcessing Rate Optimization by Sequential System Floorplanning
Processing Rate Optimization by Sequential System Floorplanning Jia Wang Ping-Chih Wu Hai Zhou EECS Department Northwestern University Evanston, IL 60208, U.S.A. {jwa112, haizhou}@ece.northwestern.edu
More informationHIERARCHICAL DESIGN. RTL Hardware Design by P. Chu. Chapter 13 1
HIERARCHICAL DESIGN Chapter 13 1 Outline 1. Introduction 2. Components 3. Generics 4. Configuration 5. Other supporting constructs Chapter 13 2 1. Introduction How to deal with 1M gates or more? Hierarchical
More informationOutline HIERARCHICAL DESIGN. 1. Introduction. Benefits of hierarchical design
Outline HIERARCHICAL DESIGN 1. Introduction 2. Components 3. Generics 4. Configuration 5. Other supporting constructs Chapter 13 1 Chapter 13 2 1. Introduction How to deal with 1M gates or more? Hierarchical
More informationSynthesizable FPGA Fabrics Targetable by the VTR CAD Tool
Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design
More informationExploring Logic Block Granularity for Regular Fabrics
1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu
More informationCSE241 VLSI Digital Circuits UC San Diego
CSE241 VLSI Digital Circuits UC San Diego Winter 2003 Lecture 05: Logic Synthesis Cho Moon Cadence Design Systems January 21, 2003 CSE241 L5 Synthesis.1 Kahng & Cichy, UCSD 2003 Outline Introduction Two-level
More informationLinking Layout to Logic Synthesis: A Unification-Based Approach
Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and
More informationEECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007
EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationDesign of a Low Density Parity Check Iterative Decoder
1 Design of a Low Density Parity Check Iterative Decoder Jean Nguyen, Computer Engineer, University of Wisconsin Madison Dr. Borivoje Nikolic, Faculty Advisor, Electrical Engineer, University of California,
More informationVHDL simulation and synthesis
VHDL simulation and synthesis How we treat VHDL in this course You will not become an expert in VHDL after taking this course The goal is that you should learn how VHDL can be used for simulation and synthesis
More informationGraphics: Alexandra Nolte, Gesine Marwedel, Universität Dortmund. RTL Synthesis
Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund RTL Synthesis Purpose of HDLs Purpose of Hardware Description Languages: Capture design in Register Transfer Language form i.e. All
More informationFILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas
FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given
More informationComputer Architecture
Computer Architecture Lecture 1: Digital logic circuits The digital computer is a digital system that performs various computational tasks. Digital computers use the binary number system, which has two
More informationDesign and Synthesis for Test
TDTS 80 Lecture 6 Design and Synthesis for Test Zebo Peng Embedded Systems Laboratory IDA, Linköping University Testing and its Current Practice To meet user s quality requirements. Testing aims at the
More informationTopics. Verilog. Verilog vs. VHDL (2) Verilog vs. VHDL (1)
Topics Verilog Hardware modeling and simulation Event-driven simulation Basics of register-transfer design: data paths and controllers; ASM charts. High-level synthesis Initially a proprietary language,
More informationFloorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence
Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,
More informationTECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS
TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS Zoltan Baruch E-mail: Zoltan.Baruch@cs.utcluj.ro Octavian Creţ E-mail: Octavian.Cret@cs.utcluj.ro Kalman Pusztai E-mail: Kalman.Pusztai@cs.utcluj.ro Computer
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What
More informationTHE continuous increase of the problem size of IC routing
382 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 3, MARCH 2005 MARS A Multilevel Full-Chip Gridless Routing System Jason Cong, Fellow, IEEE, Jie Fang, Min
More informationSilicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design
Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design Wei-Jin Dai, Dennis Huang, Chin-Chih Chang, Michel Courtoy Cadence Design Systems, Inc. Abstract A design methodology for the implementation
More informationVery Large Scale Integration (VLSI)
Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects
More informationFast Dual-V dd Buffering Based on Interconnect Prediction and Sampling
Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),
More informationWorst-Case Performance Prediction Under Supply Voltage and Temperature Noise
Worst-Case Performance Prediction Under Supply Voltage and Temperature Noise Chung-Kuan Cheng, Andrew B. Kahng, Kambiz Samadi and Amirali Shayan June 13, 2010 CSE and ECE Departments University of California,
PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University
PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -
Sequential Logic Synthesis
Sequential Logic Synthesis Logic Circuits Design Seminars WS2010/2011, Lecture 9 Ing. Petr Fišer, Ph.D. Department of Digital Design Faculty of Information Technology Czech Technical University in Prague
Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas
Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist
An Interconnect-Centric Design Flow for Nanometer Technologies
An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled
SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.
SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA
Programmable Logic Devices II
São José February 2015 Prof. Hoeller, Prof. Moecke (http://www.sj.ifsc.edu.br) 1 / 28 Lecture 01: Complexity Management and the Design of Complex Digital Systems Prof. Arliones Hoeller arliones.hoeller@ifsc.edu.br
How Much Logic Should Go in an FPGA Logic Block?
How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecg.utoronto.ca
Simultaneous Resource Binding and Interconnection Optimization Based on a Distributed Register-File Microarchitecture
Simultaneous Resource Binding and Interconnection Optimization Based on a Distributed Register-File Microarchitecture JASON CONG University of California, Los Angeles YIPING FAN AutoESL Inc. and JUNJUAN
Planning for Local Net Congestion in Global Routing
Planning for Local Net Congestion in Global Routing Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer Engineering *Department of Industrial and Systems Engineering
Basic Idea. The routing problem is typically solved using a two-step
Global Routing Basic Idea The routing problem is typically solved using a two-step approach: Global Routing Define the routing regions. Generate a tentative route for each net. Each net is assigned to a
Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic
ECE 4211/5211 Rapid Prototyping with FPGAs Dr. Charlie Wang Department of Electrical and Computer Engineering University of Colorado at Colorado Springs Evolution of Implementation Technologies Discrete devices:
Principles of Digital Techniques PDT (17320) Assignment No State advantages of digital system over analog system.
Assignment No. 1 1. State advantages of digital system over analog system. 2. Convert following numbers a. $(138.56)_{10} = (?)_2 = (?)_8 = (?)_{16}$ b. $(1110011.011)_2 = (?)_{10} = (?)_8 = (?)_{16}$ c. (3004.06)
A Framework for Systematic Evaluation and Exploration of Design Rules
A Framework for Systematic Evaluation and Exploration of Design Rules Rani S. Ghaida* and Prof. Puneet Gupta EE Dept., University of California, Los Angeles (rani@ee.ucla.edu), (puneet@ee.ucla.edu) Work
Hardware Design with VHDL PLDs IV ECE 443
Embedded Processor Cores (Hard and Soft) Electronic design can be realized in hardware (logic gates/registers) or software (instructions executed on a microprocessor). The trade-off is determined by how
ELCT201: DIGITAL LOGIC DESIGN
ELCT201: DIGITAL LOGIC DESIGN Dr. Eng. Haitham Omran, haitham.omran@guc.edu.eg Dr. Eng. Wassim Alexan, wassim.joseph@guc.edu.eg Lecture 3 Following the slides of Dr. Ahmed H. Madian Dhu al-Hijjah 1438 AH Winter
L2: Design Representations
CS250 VLSI Systems Design L2: Design Representations John Wawrzynek, Krste Asanovic, with John Lazzaro and Yunsup Lee (TA) Engineering Challenge Application Gap usually too large to bridge in one step,
Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28th, 2007
Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28th, 2007 Introduction Outline Conditional
ECE 4514 Digital Design II. Spring Lecture 20: Timing Analysis and Timed Simulation
ECE 4514 Digital Design II Lecture 20: Timing Analysis and Timed Simulation A Tools/Methods Lecture Topics Static and Dynamic Timing Analysis Static Timing Analysis Delay Model Path Delay False Paths Timing
Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006
Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 08: Interconnect Trees Introduction to Graphs and Trees Minimum Spanning
L14 - Placement and Routing
L14 - Placement and Routing Ajay Joshi Massachusetts Institute of Technology RTL design flow HDL RTL Synthesis manual design Library/ module generators netlist Logic optimization a b 0 1 s d clk q netlist
Fast Timing Closure by Interconnect Criticality Driven Delay Relaxation
Fast Timing Closure by Interconnect Criticality Driven Delay Relaxation Love Singhal and Elaheh Bozorgzadeh Donald Bren School of Information and Computer Sciences University of California, Irvine, California
FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1
FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital
FPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure
Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin
Graph Models for Global Routing: Grid Graph
Graph Models for Global Routing: Grid Graph Each cell is represented by a vertex. Two vertices are joined by an edge if the corresponding cells are adjacent to each other. The occupied cells are represented
Introduction. Sungho Kang. Yonsei University
Introduction Sungho Kang Yonsei University Outline VLSI Design Styles Overview of Optimal Logic Synthesis Model Graph Algorithm and Complexity Asymptotic Complexity Brief Summary of MOS Device Behavior
OpenAccess In 3D IC Physical Design
OpenAccess In 3D IC Physical Design Jason Cong, Jie Wei, Yan Zhang VLSI CAD Lab Computer Science Department University of California, Los Angeles Supported by DARPA and CFD Research Corp Outline 3D IC
The Design of the KiloCore Chip
The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory
High-Level Synthesis
High-Level Synthesis 1 High-Level Synthesis 1. Basic definition 2. A typical HLS process 3. Scheduling techniques 4. Allocation and binding techniques 5. Advanced issues High-Level Synthesis 2 Introduction
Architecture Evaluation for
Architecture Evaluation for Power-efficient FPGAs Fei Li*, Deming Chen +, Lei He*, Jason Cong + * EE Department, UCLA + CS Department, UCLA Partially supported by NSF and SRC Outline Introduction Evaluation
NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods
1 NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods V. Venkatraman, A. Laffely, J. Jang, H. Kukkamalla, Z. Zhu & W. Burleson Interconnect Circuit
Mark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs
EE 352 Unit 8 HW Constructs Logic Circuits Combinational logic Perform a specific function (mapping of 2 n input combinations to desired output combinations) No internal state or feedback Given a set of
CMOS VLSI Design. MIPS Processor Example. Outline
CMOS VLSI Design MIPS Processor Example Outline Design Partitioning MIPS Processor Example Architecture Microarchitecture Logic Design Circuit Design Physical Design Fabrication, Packaging, Testing Slide 2