HIGH-LEVEL SYNTHESIS

Similar documents
High-Level Synthesis (HLS)

Unit 2: High-Level Synthesis

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

High-Level Synthesis

COE 561 Digital System Design & Synthesis Introduction

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

High Level Synthesis

Synthesis at different abstraction levels

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca

Topics. Verilog. Verilog vs. VHDL (2) Verilog vs. VHDL (1)

Lecture 20: High-level Synthesis (1)

4DM4 Lab. #1 A: Introduction to VHDL and FPGAs B: An Unbuffered Crossbar Switch (posted Thursday, Sept 19, 2013)

ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III

CPE 426/526 Chapter 12 - Synthesis Algorithms for Design Automation (Supplement) Dr. Rhonda Kay Gaede UAH

High-Level Synthesis Creating Custom Circuits from High-Level Code

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Digital Design with FPGAs. By Neeraj Kulkarni

System Level Design For Low Power. Yard. Doç. Dr. Berna Örs Yalçın

EE382M.20: System-on-Chip (SoC) Design

Lecture Compiler Backend

Based on slides/material by. Topic Design Methodologies and Tools. Outline. Digital IC Implementation Approaches

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links

ECE 669 Parallel Computer Architecture

Computer Architecture

CS250 VLSI Systems Design Lecture 8: Introduction to Hardware Design Patterns

VHDL simulation and synthesis

Chapter 4. The Processor

Writing Circuit Descriptions 8

: : (91-44) (Office) (91-44) (Residence)

ECE 5775 (Fall 17) High-Level Digital Design Automation. More Binding Pipelining

Verilog for High Performance

FPGA: What? Why? Marco D. Santambrogio

CHAPTER - 2 : DESIGN OF ARITHMETIC CIRCUITS

Copyright 2007 Society of Photo-Optical Instrumentation Engineers. This paper was published in Proceedings of SPIE (Proc. SPIE Vol.

Chapter 4. The Processor

The Need of Datapath or Register Transfer Logic. Number 1 Number 2 Number 3 Number 4. Numbers from 1 to million. Register

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Chapter 4. The Processor

Lecture 8: Synthesis, Implementation Constraints and High-Level Planning

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

COSC 243. Computer Architecture 1. COSC 243 (Computer Architecture) Lecture 6 - Computer Architecture 1 1

EE 3170 Microcontroller Applications

Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?

BEHAVIORAL SYNTHESIS: AN OVERVIEW

Previous Exam Questions System-on-a-Chip (SoC) Design

Sequential Circuits. inputs Comb FFs. Outputs. Comb CLK. Sequential logic examples. ! Another way to understand setup/hold/propagation time

CS294-48: Hardware Design Patterns Berkeley Hardware Pattern Language Version 0.5. Krste Asanovic UC Berkeley Fall 2009

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Digital VLSI Testing Prof. Santanu Chattopadhyay Department of Electronics and EC Engineering India Institute of Technology, Kharagpur.

Chapter 9. Design for Testability

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

Luleå University of Technology Kurskod SMD098 Datum Skrivtid

RTL Power Estimation and Optimization

University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences. Spring 2010 May 10, 2010

Unit 5F: Layout Compaction

Unit 3: Layout Compaction

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

Controller FSM Design Issues : Correctness and Efficiency. Lecture Notes # 10. CAD Based Logic Design ECE 368

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis*

L2: Design Representations

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Topics. ! PLAs.! Memories: ! Datapaths.! Floor Planning ! ROM;! SRAM;! DRAM. Modern VLSI Design 2e: Chapter 6. Copyright 1994, 1998 Prentice Hall

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007

Chapter 2 Logic Gates and Introduction to Computer Architecture

High Level Synthesis. Shankar Balachandran Assistant Professor, Dept. of CSE IIT Madras

HW/SW Co-design. Design of Embedded Systems Jaap Hofstede Version 3, September 1999

EKT 422/4 COMPUTER ARCHITECTURE. MINI PROJECT : Design of an Arithmetic Logic Unit

CAD of VLSI Systems. d Virendra Singh Indian Institute of Science Bangalore E0-285: CAD of VLSI Systems

Introduction to CMOS VLSI Design (E158) Lecture 7: Synthesis and Floorplanning

Design of Transport Triggered Architecture Processor for Discrete Cosine Transform

Outline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers

Lecture #1: Introduction

Lecture 12 VHDL Synthesis

Design Methodologies and Tools. Full-Custom Design

Control and Datapath 8

Additional Slides to De Micheli Book

Announcements. This week: no lab, no quiz, just midterm

System Level Design Flow

Lab 16: Data Busses, Tri-State Outputs and Memory

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

Philadelphia University Department of Computer Science. By Dareen Hamoudeh

Field Programmable Gate Array (FPGA)

VLSI Test Technology and Reliability (ET4076)

Folding. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall,

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Constraint Analysis and Heuristic Scheduling Methods

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

REGISTER TRANSFER LANGUAGE

Computer Organization

VHDL for Synthesis. Course Description. Course Duration. Goals

Levels in Processor Design

Dec Hex Bin ORG ; ZERO. Introduction To Computing

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

SoC Design for the New Millennium Daniel D. Gajski

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are

Transcription:

HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms Processors Register transfers ALU s, RAM, etc. Logic Gates, flip-flops, etc. Transfer Transistors functions PHYSICAL D. Transistor layout Cell layout Module layout Floor plans Physical partitions High-level synthesis visualized on Gajski s Y- chart

HIGH-LEVEL SYNTHESIS Page 2 HARDWARE MODELS FOR HIGH-LEVEL SYNTHESIS All HLS systems need to restrict the target hardware. The search space is too large, otherwise. All synthesis systems have their own peculiarities; but most systems generate synchronous hardware and build it with the following parts: * functional units: they can perform one or more computations, e.g. addition, multiplication, comparison, ALU. i 1 i 2 f o f (i 1, i 2 )

HIGH-LEVEL SYNTHESIS Page 3 HARDWARE MODELS (Continued) * registers: they store inputs, intermediate results and outputs; sometimes several registers are taken together to form a register file. * multiplexers: from several inputs, one is passed to the output. i 0 i 1 i 2 i 3 c 1 c 0 o i k, k 2c 1 c 0

HIGH-LEVEL SYNTHESIS Page 4 HARDWARE MODELS (Continued) * busses: a connection shared between several hardware elements, such that only one element can write data at a specific time. * three-state (tri-state) drivers control the exclusive writing on the bus. i enable o

HIGH-LEVEL SYNTHESIS Page 5 HARDWARE MODELS (Continued) Parameters defining the hardware model for the synthesis problem: * clocking strategy: e.g. one- or two-phase clocks. * interconnect: e.g. allowing or disallowing buses. * clocking of functional units: allowing or disallowing of: + multicycle operations; + chaninig; + pipelined units.

HIGH-LEVEL SYNTHESIS Page 6 HARDWARE CONCEPTS: DATA PATH + CONTROL Hardware is normally partitioned into two parts: * the data path: a network of functional units, registers, multiplexers and buses. The actual computation takes place in the data path. * control: the part of the hardware that takes care of having the data present at the right place at a specific time, of presenting the right instructions to a programmable unit, etc. Often high-level synthesis concentrates on datapath synthesis. The control part is then realized as a finite state machine or in microcode.

HIGH-LEVEL SYNTHESIS Page 7 INPUT FORMAT The algorithm, that is the input for a high-level synthesis system, is often provided in textual form either: * in a conventional programming language, such as Pascal, or * in a hardware description language, which is more suitable to express the parallelism present in hardware. The description has to be parsed and transformed into an internal representation; here conventional compiler techniques can be used.

HIGH-LEVEL SYNTHESIS Page 8 INTERNAL REPRESENTATION Most systems use some form of a data-flow graph. A DFG may or may not contain information on control flow. A data-flow graph is built from: * vertices (nodes): representing computation, and * edges: representing precedence relations.

HIGH-LEVEL SYNTHESIS Page 9 DATA FLOW x := a * b; y := c + d; z := x + y; a b c d x y z

HIGH-LEVEL SYNTHESIS Page 10 TOKEN FLOW IN A DFG * A node in a DFG fires when tokens are present at its inputs. * The input tokens are consumed and an output token is produced. a b c d a b c d x y x y z a b c d z a b c d x y x y z z

HIGH-LEVEL SYNTHESIS Page 11 CONDITIONAL DATA FLOW By means of two special nodes: * selector T selector F * distributor distributor T F

HIGH-LEVEL SYNTHESIS Page 12 EXPLICIT ITERATIVE DATA FLOW Selector and distributor nodes can be used to describe iteration. a b T sel. F T distr. F a while a > b do a := a b;

HIGH-LEVEL SYNTHESIS Page 13 Loops require careful placement of initial tokens on edges.

HIGH-LEVEL SYNTHESIS Page 14 IMPLICIT ITERATIVE DATA FLOW * Iteration implied by stream of input tokens arriving at regular points in time. * Initial tokens act as buffers. a b a b a b a[0] a[1] a[0] b[1] a[1] c[1] c c c * Delay elements instead of initial tokens. a b T 0 c

HIGH-LEVEL SYNTHESIS Page 15 ITERATIVE DFG EXAMPLE + m 1 d 1 m 2 c 8 o 1 T 0 + c 2 c 4 c 6 + c 7 m 3 m 4 T 0 d 2 i 1 + c 1 c 3 c 5 A second-order filter section.

HIGH-LEVEL SYNTHESIS Page 16 HIGH-LEVEL TRANSFORMATIONS Restructuring data and control flow graphs prior to the actual mapping onto hardware. Examples: * Replacing division by 2 by shift. * Loop unrolling. * Replacing chain of adders by a tree. x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 y + + + + + + + + x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 + + + + + + + y

HIGH-LEVEL SYNTHESIS Page 17 TERMINOLOGY Subtasks in high-level synthesis: * Scheduling: determine for each operation the time at which it should be performed such that no precedence constraint is violated. For an edge (v i, v j ): t j t i i * Allocation: specify the hardware resources that will be necessary. * Assignment: provide a mapping from each operation to a specific functional unit and from each variable to a register. Remarks: * The subproblems are strongly interrelated; they are, however, often solved separately. * Scheduling (except for a few versions) is NPcomplete heuristics have to be used.

HIGH-LEVEL SYNTHESIS Page 18 EXAMPLE The second-order filter section made acyclic: + o 2 (d 1 ) c 8 o 1 m 1 m 2 + c 2 c 4 i 2 (d 1 ) c 6 + c 7 m 3 o 3 (d 2 ) m 4 i 1 + c 1 c 3 i 3 (d 2 ) c 5 The schedule and operation assignment with an allocation of one adder and one multiplier: c 3 c 4 c 5 c 6 + c 1 c 2 c 7 c 8 0 5 10

HIGH-LEVEL SYNTHESIS Page 19 EXAMPLE (Continued) The resulting data path after register assignment: ROM d 1 i d 1 r 1 r 2 r 3 r 4 2 + o 1

HIGH-LEVEL SYNTHESIS Page 20 OPTIMIZATION CRITERIA Not surprisingly, the objective function to be optimized is similar to the one for the design of VLSI circuits in general. It is a combination of: * speed: how long does it take to perform the intended computation, * area: the addition of the areas of all functional units and registers, sometimes taking the area of interconnection into account. Often optimization is constrained: * Optimize area when the minimum speed is given time-constrained synthesis. * Optimize speed when maximum area is given area-constrained synthesis. * Optimize speed when a maximum for each resource type is given resource-constrained synthesis.

HIGH-LEVEL SYNTHESIS Page 21 SCHEDULING METHODS * As soon as possible (ASAP): + an operation can be scheduled when all its predecessors have been scheduled. + very simple, as data-flow graph need only to be traversed from inputs(s) to output(s). * As late as possible (ALAP): + similar to ASAP, but scheduling is now performed from output(s) to input(s). * List scheduling: + similar to ASAP; however, now an extra criterion is used for choosing between all operations that have all their predecessors scheduled.

HIGH-LEVEL SYNTHESIS Page 22 SCHEDULING METHODS (Continued) * Critical-path list scheduling: + sorting criterion is the length of the longest path from operation to output. + gives good results in practice! * Freedom-based scheduling: + compute both ASAP and ALAP schedules (longest-path computations from inputs to operation and from operation to outputs). + the difference in scheduling position gives the freedom or mobility of an operation (operations in the critical path have mobility zero). + take advantage of mobility to find a good position within scheduling range.

HIGH-LEVEL SYNTHESIS Page 23 FORCE-DIRECTED SCHEDULING * extension of freedom-based methods. * computes so-called distribution graph to model the global effects on hardware resources. a 1 + a 2 + a 1 a 2 + a 3 a 3 * a partial schedule ^, is a schedule where some transfers have a fixed time positions and others have a scheduling range. * the resource utilization for resource r of a partial schedule ^ at time t is given by the distribution function ^r(^, t).

HIGH-LEVEL SYNTHESIS Page 24 FORCE-DIRECTED SCHEDULING (Ctd.) * When fixing an operation s time, one makes a transition from partial schedule ^ to partial schedule ^. * For every operation a force F r (^, ^), is computed for every possible position within an operation s range. * Basic force function: F r (^, ^) trange(op) ^ r(^, t)(^ r(^, t) ^r(^, t)) * The lowest force determines which operation is scheduled and at which position.

HIGH-LEVEL SYNTHESIS Page 25 THE ASSIGNMENT PROBLEM Subtasks in assignment: * operation-to-fu assignment * value grouping * value-to-register assignment * transfer-to-wire assignment * wire to FU-port assignment In general: task-to-agent assignment Assumption: assignment follows scheduling. * Then the claim of a task on an agent is an interval minimum resource utilization can be found by left-edge algorithm. * In case of iterative algorithm, interval graph becomes circular-arc graph optimization is NP-complete.

HIGH-LEVEL SYNTHESIS Page 26 ASSIGNMENT BY CLIQUE PARTITIONING * Tasks are compatible when they can be executed on the same agent simultaneously compatiblity graph. * A clique is a maximal complete subgraph. * Clique partitioning divides the vertices of a graph into complete subgraphs. v 1 v 1 v 0 v 4 v 5 v 0 v 4 v 5 C 3 C 1 C 4 v 2 v 7 C 0 v 7 v 2 v 3 v 6 v3 C 2 v 6 (a) (b) A graph and its cliques

HIGH-LEVEL SYNTHESIS Page 27 TSENG AND SIEWOREK S CLIQUE-PARTITIONING ALGORITHM k := 0; G k c (V c k ;Ek c ) := G c(v c ;E c ); while Ec k 6= ; do begin end \nd v i ;v j with largest set of common neighbors"; N := \set of common neighbors of v i and v j "; s := i [ j; V k+1 c := V k c [fv sgnfv i ;v j g; Ec k+1 := ;; for each (v m ;v n ) 2 Ec k do if v m 6= v i ^ v m 6= v j ^ v n 6= v i ^ v n 6= v j then Ec k+1 := Ec k+1 [f(v m ;v n )g; for each v n 2 N do Ec k+1 := Ec k+1 [f(v n ;v s )g; k := k +1

HIGH-LEVEL SYNTHESIS Page 28 CLIQUE-PARTITIONING EXAMPLE v 1 v 0,1 v 0 v 4 v 5 v 4 v 5 v 2 v 7 v 2 v 7 v 3 v 6 v 3 v 6 (a) (b) v 0,1,2 v 0,1,2 v 4 v 5 v 4 v 5,6 v 7 v 7 v 3 v 6 v 3 (c) (d) v 0,1,2,3 v 0,1,2,3 v 4 v 5,6 v 4 v 5,6,7 v 7 (e) (f)

HIGH-LEVEL SYNTHESIS Page 29 HIGH-LEVEL TRANSFORMATIONS Restructuring data and control flow graphs prior to the actual mapping onto hardware. Examples: * Replacing division by 2 by shift. * Loop unrolling. * Replacing chain of adders by a tree. x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 + + + + + + + + y x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 + + + + + + + y