HDL. Operations and dependencies. FSMs Logic functions HDL. Interconnected logic blocks HDL BEHAVIORAL VIEW LOGIC LEVEL ARCHITECTURAL LEVEL

Similar documents
Architectural-Level Synthesis. Giovanni De Micheli Integrated Systems Centre EPF Lausanne

High Level Synthesis. Shankar Balachandran Assistant Professor, Dept. of CSE IIT Madras

Unit 2: High-Level Synthesis

ICS 252 Introduction to Computer Design

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it.

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

COE 561 Digital System Design & Synthesis Introduction

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

High-Level Synthesis Creating Custom Circuits from High-Level Code

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Writing Circuit Descriptions 8

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

SPARK: A Parallelizing High-Level Synthesis Framework

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institution of Technology, Delhi

EE382M.20: System-on-Chip (SoC) Design

Synthesis at different abstraction levels

Administrivia. What is Synthesis? What is architectural synthesis? Approximately 20 1-hour lectures An assessed mini project.

Compiler Code Generation COMP360

Advanced Design System 1.5. DSP Synthesis

High Level Synthesis

EEL 4783: HDL in Digital System Design

High-Level Synthesis

RTL Power Estimation and Optimization

A Simple Syntax-Directed Translator

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

High-Level Synthesis (HLS)

Advanced Design System DSP Synthesis

Synthesis and Optimization of Digital Circuits

Compiler Optimization

Lecture 20: High-level Synthesis (1)

VHDL for Synthesis. Course Description. Course Duration. Goals

Introduction to Compilers

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

System Level Design For Low Power. Yard. Doç. Dr. Berna Örs Yalçın

EEL 4783: HDL in Digital System Design

COP5621 Exam 4 - Spring 2005

SCHEDULING II Giovanni De Micheli Stanford University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Digital Design with FPGAs. By Neeraj Kulkarni

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

EN2911X: Reconfigurable Computing Topic 03: Reconfigurable Computing Design Methodologies

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

Linking Layout to Logic Synthesis: A Unification-Based Approach

Lecture 2 Hardware Description Language (HDL): VHSIC HDL (VHDL)

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

LECTURE 3. Compiler Phases

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

CMSC 611: Advanced Computer Architecture

HIGH-LEVEL SYNTHESIS

MIDTERM EXAM (Solutions)

Lecture Notes on Loop Optimizations

Nikhil Gupta. FPGA Challenge Takneek 2012

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Programming Languages, Summary CSC419; Odelia Schwartz

HW/SW Co-design. Design of Embedded Systems Jaap Hofstede Version 3, September 1999

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Compiler, Assembler, and Linker

Analysis of Different Multiplication Algorithms & FPGA Implementation

Sunburst Design - Verilog-2001 Design & Best Coding Practices by Recognized Verilog & SystemVerilog Guru, Cliff Cummings of Sunburst Design, Inc.

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Introduction to Field Programmable Gate Arrays

Languages and Compiler Design II IR Code Optimization

CAD for VLSI Design - I. Lecture 21 V. Kamakoti and Shankar Balachandran

Control Flow Analysis

CST-402(T): Language Processors

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

Principles of Compiler Construction ( )

Verilog for High Performance

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Binary Adders. Ripple-Carry Adder

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Software-Level Power Estimation

Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ACADEMIC YEAR / EVEN SEMESTER

81920**slide. 1Developing the Accelerator Using HLS

Announcements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project

Design of Embedded DSP Processors

Principles of Compiler Construction ( )

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Code Optimization. Code Optimization

Register Transfer Level in Verilog: Part I

ECE 3060 VLSI and Advanced Digital Design

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

V1 - VHDL Language. FPGA Programming with VHDL and Simulation (through the training Xilinx, Lattice or Actel FPGA are targeted) Objectives

An introduction to CoCentric

MARKET demands urge embedded systems to incorporate

UNIT IV INTERMEDIATE CODE GENERATION

4. An interpreter is a program that

Compilers and Interpreters

A High-Speed FPGA Implementation of an RSD- Based ECC Processor

AC59/AT59 OPERATING SYSTEMS & SYSTEMS SOFTWARE DECEMBER 2012

Module 4c: Pipelining

1/28/2013. Synthesis. The Y-diagram Revisited. Structural Behavioral. More abstract designs Physical. CAD for VLSI 2

Transcription:

ARCHITECTURAL-LEVEL SYNTHESIS Motivation. Outline cgiovanni De Micheli Stanford University Compiling language models into abstract models. Behavioral-level optimization and program-level transformations. Architectural synthesis: an overview. Synthesis Transform behavioral into structural view. Synthesis and optimization Architectural-level synthesis: LANGUAGE MODELS ABSTRACT MODELS { Architectural abstraction level. { Determine macroscopic structure. { Example: major building blocks. Logic-level synthesis: { Logic abstraction level. BEHAVIORAL VIEW STRUCTURAL VIEW HDL HDL HDL compilation compilation translation Operations and dependencies (Data flow & sequencing graphs) architectural synthesis & optimization FSMs Logic functions (State diagrams & logic networks) logic synthesis & optimization Interconnected logic blocks (Logic networks) LOGIC LEVEL ARCHITECTURAL LEVEL { Determine microscopic structure. { Example: logic gate interconnection.

Example dieq f read (x y u dx a) repeat f xl = x dx ul = u ; (3 x u dx) ; (3 y dx) yl = y u dx c = x < a x = xl u = ul y = yl g until ( c ) write (y) g Example of structures ALU ALU ALU STEERING & MEMORY STEERING & MEMORY CONTROL UNIT CONTROL UNIT Example Architectural-level synthesis motivation Area 15 13 12 10 8 7 5 (2,2) (2,1) (1,2) (1,1) Raise input abstraction level. { Reduce specication of details. { Extend designer base. { Self-documenting design specications. { Ease modications and extensions. Latency Reduce design time. 1 2 3 4 5 6 7 8 Explore and optimize macroscopic structure: { Series/parallel execution of operations.

Architectural-level synthesis Compilation and behavioral optimization Translate HDL models into sequencing graphs. Behavioral-level optimization: { Optimize abstract models independently from the implementation parameters. Software compilation: { Compile program into intermediate form. { Optimize intermediate form. { Generate target code for an architecture. Architectural synthesis and optimization: { Create macroscopic structure: data-path and control-unit. { Consider area and delay information of the implementation. Hardware compilation: { Compile HDL model into sequencing graph. { Optimize sequencing graph. { Generate gate-level interconnection for a cell library. Compilation Hardware and software compilation. Front-end: front end intermediate form back end { Lexical and syntax analysis. lex parse optimization codegen { Parse-tree generation. (a) { Macro-expansion. front end intermediate form back end { Expansion of meta-variables. lex parse behavioral optimization a synthesis l synthesis l binding Semantic analysis: (b) { Data-ow and control-ow analysis. { Type checking. { Resolve arithmetic and relational operators.

Parse tree example a = p q r Behavioral-level optimization Semantic-preserving transformations aiming at simplifying the model. assignment = identifier a identifier p expression expression Applied to parse-trees or during their generation. identifier q identifier r Taxonomy: { Data-ow based transformations. { Control-ow based transformations. Data-ow based transformations Tree-height reduction Tree-height reduction. Applied to arithmetic expressions. Constant and variable propagation. Common subexpression elimination. Goal: { Split into two-operand expressions to exploit hardware parallelism at best. Dead-code elimination. Operator-strength reduction. Code motion. Techniques: { Balance the expression tree. { Exploit commutativity, associativity and distributivity.

Example of tree-height reduction using commutativity and associativity Example of tree-height reduction using distributivity a b c d a d b c a b c d e a b c d a e (a) (b) (a) (b) x = a b c d ) x = (a d) b c x = a (b c d e) ) x = a b c d a e Examples of propagation Constant propagation: { a = 0 b = a 1 c = 2 b { a = 0 b = 1 c = 2 Variable propagation: { a = x b = a 1 c = 2 a { a = x b = x 1 c = 2 x Sub-expression elimination Logic expressions: { Performed by logic optimization. { Kernel-based methods. Arithmetic expressions: { Search isomorphic patterns in the parse trees. { Example: a = x y b = a 1 c = x y a = x y b = a 1 c = a

Examples of other transformations Dead-code elimination: { a = x b = x 1 c = 2 x { a = x can be removed if not referenced. Operator-strength reduction: { a = x 2 b = 3 x { a = x x t = x << 1 b = x t Control-ow based transformations Model expansion. Conditional expansion. Loop expansion. Block-level transformations. Code motion: { for (i = 1 i a b)f g { t = a b for (i = 1 i t)f g Model expansion Conditional expansion Expand subroutine { atten hierarchy. Useful to expand scope of other optimization techniques. Problematic when routine is called more than once. Example: { x = a b y = a b z = foo(x y) { foo(p q)ft = q ; p return(t) g { By expanding foo: { x = a b y = a b z = y ; x Transform conditional into parallel execution with test at the end. Useful when test depends on late signals. May preclude hardware sharing. Always useful for logic expressions. Example: { y = ab if (a) fx = b d g else fx = bd g { can be expanded to: x = a(b d) a 0 bd { and simplied as: y = ab x = y d(a b)

Loop expansion Applicable to loops with data-independent exit conditions. Useful to expand scope of other optimization techniques. Problematic when loop has many iterations. Architectural synthesis and optimization Synthesize macroscopic structure in terms of building-blocks. Explore area/performance trade-o: { maximum performance implementations subject to area constraints. Example: { x = 0 for (i = 1 i 3 i ) fx = x 1 g { minimum area implementations subject to performance constraints. Determine an optimal implementation. Expanded to: { x = 0 x = x 1 x = x 2 x = x 3 Create logic model for data-path and control. Design space and objectives Design evaluation space Design space: { Set of all feasible implementations. Area Area Cycle time Implementation parameters: { Area. Area Area Max { Performance: Cycle-time. Latency. Latency Cycle time Latency Throughput (for pipelined implementations). { Power consumption Latency Max Latency

Circuit behavior: Hardware modeling Functional resources: Resources { Perform operations on data. { Sequencing graphs. { Example: arithmetic and logic blocks. Building blocks: { Resources. Constraints: { Timing and resource usage. Memory resources: { Store data. { Example: memory and registers. Interface resources: { Example: busses and ports. Functional resources Resources and circuit families Standard resources: { Existing macro-cells. { Well characterized (area/delay). { Example: adders, multipliers,... Resource-dominated circuits. { Area and performance depend on few, well-characterized blocks. { Example: DSP circuits. Application-specic resources: { Circuits for specic tasks. { Yet to be synthesized. { Example: instruction decoder. Non resource-dominated circuits. { Area and performance are strongly inuenced by sparse logic, control and wiring. { Example: some ASIC circuits.

Implementation constraints Timing constraints: { Cycle-time. { Latency of a set of operations. { Time spacing between operation pairs. Resource constraints: { Resource usage (or allocation). { Partial binding. Synthesis in the temporal domain Scheduling: { Associate a start-time with each operation. { Determine latency and parallelism of the implementation. Scheduled sequencing graph: { Sequencing graph with start-time annotation. Example Synthesis in the spatial domain Binding: 0 NOP { Associate a resource with each operation with the same type. TIME 1 1 2 6 8 10 { Determine area of the implementation. TIME 2 3 7 9 < 11 Sharing: TIME 3 4 { Bind a resource to more than one operation. TIME 4 5 { Operations must not execute concurrently. NOP n Bound sequencing graph: { Sequencing graph with resource annotation.

Example Binding specication 0 NOP (1,1) (1,2) (1,3) (1,4) 1 2 6 TIME 1 8 (2,2) 10 Mapping from the vertex set to the set of resource instances, for each given type. TIME 2 TIME 3 (2,1) 3 4 7 9 < 11 Partial binding: { Partialmapping, given as design constraint. TIME 4 5 Compatible binding: n NOP { Binding satisfying the constraints of the partial binding. Example Estimation Resource-dominated circuits. 0 NOP { Area = sum of the area of the resources bound to the operations. TIME 1 TIME 2 1 2 3 7 6 8 < 10 11 Determined by binding. { Latency = start time of the sink operation (minus start time of the source operation). TIME 3 4 9 Determined by scheduling TIME 4 5 NOP n Non resource-dominated circuits. { Area also aected by: registers, steering logic, wiring and control. { Cycle-time also aected by: steering logic, wiring and (possibly) control.

Approaches to architectural optimization Approaches to architectural optimization Multiple-criteria optimization problem: { area, latency, cycle-time. Area/latency trade-o, { for some values of the cycle-time. Determine Pareto optimal points: { Implementations such that no other has all parameters with inferior values. Draw trade-o curves: { discontinuous and highly nonlinear. Cycle-time/latency trade-o, { for some binding (area). Area/cycle-time trade-o, { for some schedules (latency). Area/latency trade-o Rationale: Area-latency trade-o { Cycle-time dictated by system constraints. Area 20 (2,2) (2,1) Resource-dominated circuits: 18 17 15 (3,2) (3,1) (1,2) (1,1) { Area is determined by resource usage. 13 12 10 8 7 (2,1) Cycle time 40 Approaches: 5 1 2 3 4 5 6 7 8 30 Latency { Schedule for minimum latency under resource constraints { Schedule for minimum resource usage under latency constraints for varying constraints.

Summary Behavioral optimization: { Create abstract models from HDL models. { Optimize models without considering implementation parameters. Architectural synthesis and optimization. { Consider resource parameters. { Multiple-criteria optimization problem: area, latency, cycle-time.