Hardware/ Software Partitioning

Similar documents
HW SW Partitioning. Reading. Hardware/software partitioning. Hardware/Software Codesign. CS4272: HW SW Codesign

Standard Optimization Techniques

Mapping of Applications to Multi-Processor Systems

Standard Optimization Techniques

Evaluation and Validation

Evaluation and Validation

Middleware. Peter Marwedel TU Dortmund, Informatik 12 Germany. Graphics: Alexandra Nolte, Gesine

Optimizations - Compilation for Embedded Processors -

Mapping of Applications to Multi-Processor Systems

Specifications and Modeling

Optimizations - Compilation for Embedded Processors -

FSMs & message passing: SDL

Hardware-Software Codesign

SDL. Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 年 10 月 18 日. technische universität dortmund

Multi-valued logic and standard IEEE 1164

technische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10

Discrete Event Models

Optimizations - Compilation for Embedded Processors -

An Algorithm for Hardware/Software Partitioning. Using Mixed Integer Linear Programming

Hardware/Software Partitioning using Integer Programming. Ralf Niemann, Peter Marwedel. University of Dortmund. D Dortmund, Germany

WCET-Aware C Compiler: WCC

Discrete Event Models

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08

Imperative model of computation

System Design and Methodology/ Embedded Systems Design (Modeling and Design of Embedded Systems)

Co-synthesis and Accelerator based Embedded System Design

Hardware-Software Codesign. 1. Introduction

Comparison of models. Peter Marwedel Informatik 12, TU Dortmund, Germany 2010/11/07. technische universität dortmund

Spanning and weighted spanning trees. A different kind of optimization. (graph theory is cool.)

Optimizations - Compilation for Embedded Processors -

Embedded & Real-time Operating Systems Communication Libraries

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web.

EE382V: System-on-a-Chip (SoC) Design

MPSoC Design Space Exploration Framework

LEGO mindstorm robots

Hardware/Software Co-Design/Co-Verification

Computational Intelligence Winter Term 2017/18

Discrete Event Models

fakultät für informatik informatik 12 technische universität dortmund Modeling levels Peter Marwedel TU Dortmund, Informatik /11/07

Memory-Intensive Computations. Optimized On-Chip Pipelining of. on the CELL BE. Christoph Kessler Jörg Keller

Imperative model of computation

HW/SW Co-design. Design of Embedded Systems Jaap Hofstede Version 3, September 1999

Hardware/Software Codesign

fakultät für informatik informatik 12 technische universität dortmund Specifications Peter Marwedel TU Dortmund, Informatik /11/15

Karthik Narayanan, Santosh Madiraju EEL Embedded Systems Seminar 1/41 1

Hardware/Software Codesign

Hardware-Software Codesign. 1. Introduction

Hardware/Software Partitioning of Digital Systems

Additional compiler optimizations

Hardware/Software Partitioning and Minimizing Memory Interface Traffic

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Lecture 7: Introduction to Co-synthesis Algorithms

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture

VLSI System Design Part II : Logic Synthesis (1) Oct Feb.2007

Partitioning for Better Synthesis Results

Specifications Part 3

Built-in Chaining: Introducing Complex Components into Architectural Synthesis

Design Issues in Hardware/Software Co-Design

Hardware Software Codesign of Embedded Systems

II (Sorting and) Order Statistics

Early design phases. Peter Marwedel TU Dortmund, Informatik /10/11. technische universität dortmund. fakultät für informatik informatik 12

Specifications and Modeling

Communication Protocols

Hardware Design and Simulation for Verification

TODAY, new applications, e.g., multimedia or advanced

Models of computation

Partitioning Methods. Outline

High-Level Synthesis

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter

Floating-point to Fixed-point Conversion. Digital Signal Processing Programs (Short Version for FPGA DSP)

ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS

Specifications Part 1

LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS

1.3. Conditional expressions To express case distinctions like

Design Space Exploration for Hardware/Software Codesign of Multiprocessor Systems

Scalable Performance Scheduling for Hardware-Software Cosynthesis

Hardware Software Codesign of Embedded System

Hardware/Software Co-Design Using Bayesian Belief Networks

Design of Transport Triggered Architecture Processor for Discrete Cosine Transform

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Anand Raghunathan

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid

Foundations of Computing

Hardware/Software Co-design

Fast algorithm for generating ascending compositions

Paths, Flowers and Vertex Cover

Information Systems (Informationssysteme)

Flight Computer: Managing the Complexity

Sparse Hypercube 3-Spanners

Unit 7 Number System and Bases. 7.1 Number System. 7.2 Binary Numbers. 7.3 Adding and Subtracting Binary Numbers. 7.4 Multiplying Binary Numbers

Informed search. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2016

Software Timing Analysis Using HW/SW Cosimulation and Instruction Set Simulator

CSCE 314 Programming Languages

HIERARCHICAL DESIGN. RTL Hardware Design by P. Chu. Chapter 13 1

Outline HIERARCHICAL DESIGN. 1. Introduction. Benefits of hierarchical design

Design Space Exploration in System Level Synthesis under Memory Constraints

Recursion defining an object (or function, algorithm, etc.) in terms of itself. Recursion can be used to define sequences

RETARGETABLE CODE GENERATION FOR DIGITAL SIGNAL PROCESSORS

Transcription:

Hardware/ Software Partitioning Peter Marwedel TU Dortmund, Informatik 12 Germany Marwedel, 2003 Graphics: Alexandra Nolte, Gesine 2011 年 12 月 09 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

Structure of this course Application Knowledge 2: Specification Design repository Design 3: 8: ES-hardware 6: Application Test mapping 4: system software (RTOS, middleware, ) 7: Optimization 5: Evaluation & validation (energy, cost, performance, ) Numbers denote sequence of chapters - 2 -

Hardware/software partitioning No need to consider special hardware in the future? 100 % Hardware Software 2010 2012 2014 2016 2018 2020 t Correct for fixed functionality, but wrong in general: By the time MPEG-n can be implemented in software, MPEG-n+1 has been invented [de Man] Functionality to be implemented in software or in hardware? - 3 -

Functionality to be implemented in software or in hardware? Decision based on hardware/ software partitioning, a special case of hardware/ software codesign. - 4 -

Codesign Tool (COOL) as an example of HW/SW partitioning Inputs to COOL: 1.Target technology 2.Design constraints 3.Required behavior - 5 -

Hardware/software codesign: approach p Specification Mapping Processor P1 Processor P2 Hardware [Niemann, Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (Comprehensive mathematical model)] - 6 -

Steps of the COOL partitioning algorithm (1) 1. Translation of the behavior into an internal graph model 2. Translation of the behavior of each node from VHDL into C 3. Compilation All C programs compiled for the target processor, Computation of the resulting program size, estimation of the resulting execution time (simulation input data might be required) 4. Synthesis of hardware components: leaf nodes, application-specific hardware is synthesized. High-level synthesis sufficiently fast. - 7 -

Steps of the COOL partitioning algorithm (2) 5. Flattening of the hierarchy: Granularity used by the designer is maintained. Cost and performance information added to the nodes. Precise information required for partitioning is precomputed 6. Generating and solving a mathematical model of the optimization problem: Integer linear programming ILP model for optimization. Optimal with respect to the cost function (approximates communication time) - 8 -

Steps of the COOL partitioning algorithm (3) 7. Iterative improvements: Adjacent nodes mapped to the same hardware component are now merged. - 9 -

Steps of the COOL partitioning algorithm (4) 8. Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created. - 10 -

An integer linear programming model for HW/SW partitioning Notation: Index set V denotes task graph nodes. Index set L denotes task graph node types e.g. square root, DCT or FFT Index set M denotes hardware component types. e.g. hardware components for the DCT or the FFT. Index set J of hardware component instances Index set KP denotes processors. All processors are assumed to be of the same type - 11 -

An ILP model for HW/SW partitioning X v,m : =1 if node v is mapped to hardware component type m M and 0 otherwise. Y v,k : =1 if node v is mapped to processor k KP and 0 otherwise. NY l,k =1 if at least one node of type l is mapped to processor k KP and 0 otherwise. Type is a mapping from task graph nodes to their types: Type : V L The cost function accumulates the cost of hardware units: C = cost(processors) + cost(memories) + cost(application specific hardware) - 12 -

Constraints Operation assignment constraints v V = 1 : X v, m + Yv, k m M k KP All task graph nodes have to be mapped either in software or in hardware. Variables are assumed to be integers. Additional constraints to guarantee they are either 0 or 1: v V v V : m M :, 1 X v m : k KP :, 1 Y v k - 13 -

Operation assignment constraints (2) l L, v:type(v)=c l, k KP : NY l,k Y v,k For all types l of operations and for all nodes v of this type: if v is mapped to some processor k, then that processor must implement the functionality of l. Decision variables must also be 0/1 variables: l L, k KP : NY l,k 1. - 14 -

Resource & design constraints m M, the cost (area) for components of type m is = sum of the costs of the components of that type. This cost should not exceed its maximum. k KP, the cost for associated data storage area should not exceed its maximum. k KP the cost for storing instructions should not exceed its maximum. The total cost (Σ m M ) of HW components should not exceed its maximum The total cost of data memories (Σ k KP ) should not exceed its maximum The total cost instruction memories (Σ k KP ) should not exceed its maximum - 15 -

Scheduling v 1 v 2 v 3 v 4 e 3 e 4 Processor FIR 1 FIR 2 p 1 ASIC h 1 v 5 v 6 v 7 v 8 Communication channel c 1 v 9 v 10 FIR 2 on h 1 p 1 c 1 v 11... v... 3 v 4... v... 7 v 8... e 3... e 4 or... v 4... v 3 t or... v 8... v 7 t or...... e 4 e 3 t - 16 -

Scheduling / precedence constraints For all nodes v i1 and v i2 that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable b i1,i2 with b i1,i2 =1 if v i1 is executed before v i2 and = 0 otherwise. Define constraints of the type (end-time of v i1 ) (start time of v i2 ) if b i1,i2 =1 and (end-time of v i2 ) (start time of v i1 ) if b i1,i2 =0 Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph. Approach fixes the order of execution - 17 -

Other constraints Timing constraints These constraints can be used to guarantee that certain time constraints are met. Some less important constraints omitted.. - 18 -

Example HW types H1, H2 and H3 with costs of 20, 25, and 30. Processors of type P. Tasks T1 to T5. Execution times: T H1 H2 H3 P 1 20 100 2 20 100 3 12 10 4 12 10 5 20 100-19 -

Operation assignment constraints (1) T H1 H2 H3 P 1 20 100 2 20 100 3 12 10 4 12 10 5 20 100 v V = 1 : X v, m + Yv, k m KM k KP X 1,1 +Y 1,1 =1 (task 1 mapped to H1 or to P) X 2,2 +Y 2,1 =1 X 3,3 +Y 3,1 =1 X 4,3 +Y 4,1 =1 X 5,1 +Y 5,1 =1-20 -

Operation assignment constraints (2) Assume types of tasks are l =1, 2, 3, 3, and 1. l L, v:type(v)=c l, k KP : NY l,k Y v,k Functionality 3 to be implemented on processor if node 4 is mapped to it. - 21 -

Other equations Time constraints leading to: Application specific hardware required for time constraints 100 time units. T H1 H2 H3 P 1 20 100 2 20 100 3 12 10 4 12 10 5 20 100 Cost function: C=20 #(H1) + 25 #(H2) + 30 # (H3) + cost(processor) + cost(memory) - 22 -

Result For a time constraint of 100 time units and cost(p)<cost(h3): T H1 H2 H3 P 1 20 100 2 20 100 3 12 10 4 12 10 5 20 100 Solution (educated guessing) : T1 H1 T2 H2 T3 P T4 P T5 H1-23 -

Separation of scheduling and partitioning Combined scheduling/partitioning very complex; Heuristic: Compute estimated schedule Perform partitioning for estimated schedule Perform final scheduling If final schedule does not meet time constraint, go to 1 using a reduced overall timing constraint. Actual execution time specification approx. execution time t 1 st Iteration 2 nd Iteration t Actual execution time New specification approx. execution time - 24 -

Application example Audio lab (mixer, fader, echo, equalizer, balance units); slow SPARC processor 1µ ASIC library Allowable delay of 22.675 µs (~ 44.1 khz) SPARC processor ASIC (Compass, 1 µ) External memory Outdated technology; just a proof of concept. - 25 -

Running time for COOL optimization Only simple models can be solved optimally. - 26 -

Deviation from optimal design Hardly any loss in design quality. - 27 -

Running time for heuristic - 28 -

Design space for audio lab Everything in software: 72.9 µs, 0 λ 2 Everything in hardware: 3.06 µs, 457.9x10 6 λ 2 Lowest cost for given sample rate: 18.6 µs, 78.4x10 6 λ 2, - 29 -

Positioning of COOL COOL approach: shows that a formal model of hardware/sw codesign is beneficial; IP modeling can lead to useful implementation even if optimal result is available only for small designs. Other approaches for HW/SW partitioning: starting with everything mapped to hardware; gradually moving to software as long as timing constraint is met. starting with everything mapped to software; gradually moving to hardware until timing constraint is met. Binary search. - 30 -

HW/SW partitioning in the context of mapping applications to processors Handling of heterogeneous systems Handling of task dependencies Considers of communication (at least in COOL) Considers memory sizes etc (at least in COOL) For COOL: just homogeneous processors No link to scheduling theory - 31 -