High-Level Synthesis (HLS)

Unit 11: High-Level Synthesis

Course contents
- Hardware modeling
- Data flow
- Scheduling/allocation/assignment

Reading: Chapter 11

High-Level Synthesis (HLS)

Hardware-description language (HDL) synthesis:
- Starts from a register-transfer level (RTL) description; circuit behavior in each clock cycle is fixed.
- Uses logic synthesis techniques to optimize the design.
- Generates a netlist.

High-level synthesis (also called architectural synthesis):
- Starts from an abstract behavioral description.
- Generates an RTL description.

Hardware Models for High-Level Synthesis

- All HLS systems need to restrict the target hardware; otherwise the search space is too large.
- All synthesis systems have their own peculiarities, but most systems generate synchronous hardware and build it from the following components:
  - Functional units: a functional unit can perform one or more computations, e.g. addition, multiplication, comparison, ALU.
  - Registers: they store inputs, intermediate results and outputs; sometimes several registers are taken together to form a register file.
  - Multiplexers: from several inputs, one is passed to the output.

Hardware Models (cont'd)

- Buses: a connection shared between several hardware elements, such that only one element can write data at a specific time.
- Three-state (tri-state) drivers control the exclusive writing on the bus.

Parameters defining the hardware model for the synthesis problem:
- Clocking strategy: e.g. single or multiple phase clocks.
- Interconnect: e.g. allowing or disallowing buses.
- Clocking of functional units: allowing or disallowing
  - multicycle operations,
  - chaining,
  - pipelined units.

Example of an HLS Hardware Model (figure)

Hardware Concepts: Data Path + Control

Hardware is normally partitioned into two parts:
- Data path: a network of functional units, registers, multiplexers and buses. The actual computation takes place in the data path.
- Control: the part of the hardware that takes care of having the data present at the right place at a specific time, of presenting the right instructions to a programmable unit, etc.

Often high-level synthesis concentrates on data-path synthesis. The control part is then realized as a finite state machine or in microcode.

Input Format

- The algorithm that serves as input to a high-level synthesis system is often provided in textual form, either in a conventional programming language such as C, or in a hardware description language (HDL), which is more suitable for expressing the parallelism present in hardware.
- The description has to be parsed and transformed into an internal representation, so conventional compiler techniques can be used.

Internal Representation

- Most systems use some form of a data-flow graph (DFG).
- A DFG may or may not contain information on control flow.
- A data-flow graph is built from:
  - Vertices (nodes): representing computation, and
  - Edges: representing precedence relations.

Token Flow in a DFG

- A node in a DFG fires when tokens are present at its inputs.
- The input tokens are consumed and an output token is produced.
- Figure labels: a token; firing a node; a token generated after firing.

Conditional Data Flow

Conditional data flow is expressed by means of two special nodes, the selector and the distributor (figure).
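
The firing rule above can be illustrated with a small sketch. This is only one possible encoding, not taken from the slides: edges are modeled as FIFO queues, and the names `fire_all`, `nodes` and `edges` are hypothetical. It assumes every node has at least one input edge, since a node without inputs would always be enabled.

```python
from collections import deque
import operator

def fire_all(nodes, edges):
    """Fire enabled nodes until no node is enabled any more.

    nodes: list of (name, func, input_edge_names, output_edge_name)
    edges: dict mapping edge name -> deque of tokens (FIFO)
    Returns the node names in the order in which they fired.
    Assumption: every node has at least one input edge.
    """
    fired = []
    progress = True
    while progress:
        progress = False
        for name, func, ins, out in nodes:
            if all(edges[e] for e in ins):                # a token on every input
                args = [edges[e].popleft() for e in ins]  # consume input tokens
                edges[out].append(func(*args))            # produce one output token
                fired.append(name)
                progress = True
    return fired

# DFG for y = (a + b) * c; the multiplier is listed first on purpose.
edges = {"a": deque([2]), "b": deque([3]), "c": deque([4]),
         "s": deque(), "y": deque()}
nodes = [("mul", operator.mul, ["s", "c"], "y"),
         ("add", operator.add, ["a", "b"], "s")]
order = fire_all(nodes, edges)
print(order, list(edges["y"]))
```

Although the multiplier appears first in the node list, it cannot fire until the adder has produced a token on edge `s`, so the firing order follows the data dependencies rather than the textual order.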

Explicit Iterative Data Flow

- Selector and distributor nodes can be used to describe iterations.
- Example: while (a > b) a ← a − b;
- Loops require careful placement of initial tokens on edges.

Implicit Iterative Data Flow

- Iteration is implied by a regular input stream of tokens.
- Initial tokens act as buffers.
- Delay elements can be used instead of initial tokens.

Iterative DFG Example

A second-order filter section (figure).

HLS Subtasks

Subtasks in high-level synthesis:
- Scheduling: determine for each operation the time at which it should be performed such that no precedence constraint is violated.
- Allocation: specify the hardware resources that will be necessary.
- Assignment: provide a mapping from each operation to a specific functional unit and from each variable to a register.

Remarks:
- Though the subproblems are strongly interrelated, they are often solved separately.
- Most scheduling problems are NP-complete, so heuristics are used.

Example Data-Flow Graph

The second-order filter section made acyclic (figure).

Example Scheduling

The schedule and operation assignment with an allocation of one adder and one multiplier (figure).

Example Data Path

- The resulting data path after register assignment (figure).
- The specification of a controller completes the design.

Optimization Criteria

- Typically: speed, area, and power consumption.
- Often the optimization is constrained:
  - Optimize area when the minimum speed is given (time-constrained synthesis).
  - Optimize speed when a maximum for each resource type is given (resource-constrained synthesis).

Problem Formulation

- Input consists of a DFG $G(V, E)$ and a library of resource types $R$.
- There is a fixed mapping from each $v \in V$ to some $r \in R$; the execution delay $\delta(v)$ for each operation is therefore known.
- The problem is time-constrained; the available execution times are in the set $T = \{0, 1, \ldots, T_0 - 1\}$.
- A schedule $\sigma$ maps each operation to its starting time; for each edge $(v_i, v_j) \in E$, a schedule should respect $\sigma(v_j) \ge \sigma(v_i) + \delta(v_i)$.
- Given the resource type cost $\omega(r)$ and the requirement function $N_r(\sigma)$, the cost of a schedule $\sigma$ is given by $\sum_{r \in R} \omega(r)\, N_r(\sigma)$.

ASAP Scheduling

- As soon as possible (ASAP) scheduling maps an operation to the earliest possible starting time not violating the precedence constraints.
- Properties:
  - It is easy to compute by finding the longest paths in a directed acyclic graph.
  - It does not make any attempt to optimize the resource cost.
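
As an illustration of the longest-path view of ASAP scheduling, here is a minimal sketch. The function name `asap_schedule`, the example operations and their delays are all made up for this example; the DFG is assumed to be acyclic and given as a list of precedence edges.

```python
def asap_schedule(delay, edges):
    """Return the ASAP start time of every operation.

    delay: dict mapping operation -> execution delay
    edges: list of (u, v) pairs, meaning u must finish before v starts
    The start time of v is the longest path (in delays) from any source to v.
    """
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    indeg = {v: len(preds[v]) for v in delay}
    ready = [v for v in delay if indeg[v] == 0]   # operations without predecessors
    start = {}
    while ready:
        v = ready.pop()
        # Earliest start: all predecessors must have finished.
        start[v] = max((start[u] + delay[u] for u in preds[v]), default=0)
        for w in succs[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(start) != len(delay):
        raise ValueError("the graph contains a cycle")
    return start

# Hypothetical DFG: three multiplications (delay 2) feeding two additions (delay 1).
delay = {"m1": 2, "m2": 2, "m3": 2, "a1": 1, "a2": 1}
edges = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
asap = asap_schedule(delay, edges)
print(asap)
```

In this example all three multiplications start at time 0, so the ASAP schedule would need three multipliers at once. This matches the remark that ASAP makes no attempt to optimize the resource cost.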

Graph for ASAP Scheduling (figure)

Mobility-Based Scheduling

- Compute both the ASAP and ALAP (as late as possible) schedules $\sigma_S$ and $\sigma_L$.
- For each $v \in V$, determine the scheduling range $[\sigma_S(v), \sigma_L(v)]$.
- $\sigma_L(v) - \sigma_S(v)$ is called the mobility of $v$.
- Mobility-based scheduling tries to find the best position within its scheduling range for each operation.
- A partial schedule assigns a scheduling range to each operation.
- Finding a schedule can be seen as the generation of a sequence of partial schedules.
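
The scheduling ranges and mobilities can be computed with two passes over a topological order. The sketch below is illustrative only: the function name `mobility`, the example graph and the `deadline` parameter (the time by which all operations must have finished) are assumptions, and it relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def mobility(delay, edges, deadline):
    """Return {op: (asap, alap, mobility)} for an acyclic DFG.

    delay: dict mapping operation -> execution delay
    edges: list of (u, v) pairs, meaning u must finish before v starts
    deadline: time by which every operation must have finished
    """
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Predecessors come before successors in this order.
    order = list(TopologicalSorter(preds).static_order())

    asap = {}
    for v in order:                      # forward pass: earliest start
        asap[v] = max((asap[u] + delay[u] for u in preds[v]), default=0)

    alap = {}
    for v in reversed(order):            # backward pass: latest start
        latest_finish = min((alap[w] for w in succs[v]), default=deadline)
        alap[v] = latest_finish - delay[v]

    if any(alap[v] < asap[v] for v in delay):
        raise ValueError("deadline is shorter than the critical path")
    return {v: (asap[v], alap[v], alap[v] - asap[v]) for v in delay}

# Hypothetical DFG: multiplications take 2 time units, additions 1.
delay = {"m1": 2, "m2": 2, "m3": 2, "a1": 1, "a2": 1}
edges = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
ranges = mobility(delay, edges, deadline=4)
print(ranges)
```

With a deadline equal to the critical-path length, only `m3` has nonzero mobility here; operations on the critical path have mobility zero and leave the scheduler no freedom.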

Simple Mobility-Based Scheduling

- A partial schedule assigns a scheduling range to each operation.
- Finding a schedule can be seen as the generation of a sequence of partial schedules.

List Scheduling

- A resource-constrained scheduling method.
- Start at time zero and increase time until all operations have been scheduled.
- Consider the precedence constraints.
- The ready list $L_t$ contains all operations that can start their execution at time $t$ or later.
- If more operations are ready than there are resources available, use some priority function to choose, e.g. the longest path to the output node (critical-path list scheduling).
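
A compact sketch of critical-path list scheduling is given below. It is not the algorithm as printed on the slides, only one plausible reading of it: the ready list at time `t` is taken to be the unscheduled operations whose predecessors have all finished by `t`, every delay is assumed to be at least 1 (no chaining), and units are assumed to be non-pipelined. All names and the example data are hypothetical.

```python
def list_schedule(ops, edges, delay, available):
    """Resource-constrained list scheduling with a critical-path priority.

    ops: dict mapping operation -> resource type
    edges: list of (u, v) pairs, meaning u must finish before v starts
    delay: dict mapping resource type -> execution delay (assumed >= 1)
    available: dict mapping resource type -> number of units
    Returns a dict mapping operation -> start time.
    """
    preds = {v: [] for v in ops}
    succs = {v: [] for v in ops}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    prio = {}
    def longest(v):
        # Priority: longest path from v to an output, including v's own delay.
        if v not in prio:
            prio[v] = delay[ops[v]] + max((longest(w) for w in succs[v]), default=0)
        return prio[v]
    for v in ops:
        longest(v)

    horizon = sum(delay[ops[v]] for v in ops)   # a fully serial schedule fits in this
    start, finish, t = {}, {}, 0
    while len(start) < len(ops):
        if t > horizon:
            raise ValueError("cannot schedule: missing resource type or cyclic graph")
        for rtype, count in available.items():
            busy = sum(1 for v in start if ops[v] == rtype and finish[v] > t)
            ready = [v for v in ops
                     if v not in start and ops[v] == rtype
                     and all(u in start and finish[u] <= t for u in preds[v])]
            ready.sort(key=lambda v: -prio[v])  # highest priority first
            for v in ready[:max(0, count - busy)]:
                start[v] = t
                finish[v] = t + delay[rtype]
        t += 1
    return start

# Hypothetical DFG scheduled on one multiplier (delay 2) and one adder (delay 1).
ops = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
edges = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
schedule = list_schedule(ops, edges, {"mul": 2, "add": 1}, {"mul": 1, "add": 1})
print(schedule)
```

With a single multiplier the three multiplications are serialized, and `m3` is scheduled last because its path to the output is shorter than those of `m1` and `m2`.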

List Scheduling Example (figure)

The Assignment Problem

Subtasks in assignment:
- operation-to-FU assignment
- value grouping
- value-to-register assignment
- transfer-to-wire assignment
- wire-to-FU-port assignment

In general: task-to-agent assignment.

Compatibility and Conflict Graphs

- Clique partitioning gives an assignment in a compatibility graph.
- Graph coloring gives an assignment in the complementary conflict graph.

The Assignment Problem

- Assumption: assignment follows scheduling.
- The claim of a task on an agent is an interval, so the minimum resource utilization can be found by the left-edge algorithm.
- In the case of an iterative algorithm, the interval graph becomes a circular-arc graph, and the optimization is NP-complete.
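
The interval case can be sketched as follows for value-to-register assignment. This is a first-fit formulation of the left-edge idea (process intervals by increasing left edge and reuse the first free register), not necessarily the exact variant the lecture has in mind. The function name, the half-open lifetime convention `[start, end)` and the example lifetimes are assumptions.

```python
def left_edge(lifetimes):
    """Assign values to registers so that overlapping lifetimes never share one.

    lifetimes: dict mapping value name -> (start, end), read as half-open [start, end)
    Returns a dict mapping value name -> register index.
    """
    assignment = {}
    free_at = []   # free_at[r] = end of the last lifetime placed in register r
    # Process values by increasing left edge (start time).
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for reg, free in enumerate(free_at):
            if free <= start:            # register is free again: reuse it
                free_at[reg] = end
                assignment[name] = reg
                break
        else:                            # no register is free: allocate a new one
            free_at.append(end)
            assignment[name] = len(free_at) - 1
    return assignment

# Hypothetical value lifetimes taken from some schedule.
lifetimes = {"v1": (0, 2), "v2": (1, 3), "v3": (2, 4), "v4": (3, 5)}
registers = left_edge(lifetimes)
print(registers)
```

At most two lifetimes overlap at any time in this example, and the sketch uses exactly two registers. For circular-arc graphs, where lifetimes wrap around the iteration boundary, this greedy argument no longer holds.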

Tseng and Siewiorek's Algorithm (figure)

Clique-Partitioning Example (figure)

High-Level Transformations

- Restructuring data and control flow graphs prior to the actual mapping onto hardware.
- Examples:
  - Replacing division by 2 by a shift.
  - Loop unrolling.
  - Replacing a chain of adders by a tree.

Example High-Level Optimization

Apply the distributivity law to reduce the resource requirement (figure).
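
To make the adder-chain example concrete, the sketch below compares the number of adder levels on the longest path for a chain and for a balanced tree. Both functions are hypothetical helpers written for this illustration, and they assume a non-empty list of operands.

```python
def chain_sum(values):
    """Sum with a chain of adders: ((x0 + x1) + x2) + ...

    Returns (result, depth), where depth is the number of adders in series.
    Assumes a non-empty list.
    """
    total, depth = values[0], 0
    for x in values[1:]:
        total, depth = total + x, depth + 1
    return total, depth

def tree_sum(values):
    """Sum with a balanced adder tree, pairing neighbours level by level.

    Returns (result, depth), where depth is the number of tree levels.
    Assumes a non-empty list.
    """
    level, depth = list(values), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # an odd operand passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

operands = [1, 2, 3, 4, 5, 6, 7, 8]
chain = chain_sum(operands)
tree = tree_sum(operands)
print(chain, tree)
```

Both structures use the same number of additions (seven for eight operands), but the tree shortens the longest path from seven adders to three. The distributivity example works in the other direction: rewriting a·b + a·c as a·(b + c) leaves the depth unchanged and saves one multiplication.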

High-Level Optimization: Behavior Retiming

By moving registers through logic and hierarchical boundaries, behavior retiming reduces the clock period with minimum area impact.