Hardware/Software Codesign

Hardware/Software Codesign: 3. Partitioning
Marco Platzner, Lothar Thiele
(by the authors)

Overview
- A Model for System Synthesis
- The Partitioning Problem
- General Partitioning Methods
- HW/SW-Partitioning Methods
- Case Studies

A Model for System Synthesis
- Synthesis = Allocation + Binding + Scheduling
- on the system level: Partitioning = Allocation + Binding
- graph models
  - problem graph
    - vertices: functional and communication objects
    - edges: dependencies
  - architecture graph
    - vertices: functional and communication resources
    - edges: directed communication channels
  - specification graph
    - problem graph + architecture graph + possible mappings

Problem Graph
[Figure: a task graph with tasks 1 to 4 and the derived problem graph, in which communication vertices 5, 6, and 7 are added.]

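Architecture Graph
[Figure: target architecture (a RISC processor, the hardware modules HWM1 and HWM2, a bus, and a point-to-point link) and the corresponding architecture graph with vertices v_RISC, v_bus, v_HWM1, v_ptp, v_HWM2.]

Specification Graph
[Figure: problem graph (vertices 1 to 7) and architecture graph (v_RISC, v_bus, v_HWM1, v_ptp, v_HWM2), joined by mapping edges.]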

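Allocation, Binding
[Figure: the specification graph (problem vertices 1 to 7; resources v_RISC, v_bus, v_HWM1, v_ptp, v_HWM2) with an allocation and a binding marked.]

Example: Homogeneous Multiprocessor
- the allocation is given
- find a binding and a schedule with
  - minimal latency, or
  - guaranteed deadlines
[Figure: three processing elements PE1, PE2, PE3, each with a memory M, connected by a bus; architecture graph with vertices v_PE1, v_PE2, v_PE3, v_bus.]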
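The specification graph, allocation, and binding can be sketched as plain data structures. The following is a minimal sketch, not the authors' model: the concrete edges and mapping sets are hypothetical (the figures did not survive extraction), and the feasibility rule used here (each object bound to an allocated resource along a mapping edge, and each dependency either staying on one resource or following an architecture edge) is an assumption about how a binding is checked.

```python
# Hypothetical problem graph: tasks 1-4, communication vertices 5-7.
problem_edges = [(1, 5), (5, 3), (2, 6), (6, 4), (3, 7), (7, 4)]

# Hypothetical architecture graph: directed communication channels.
arch_edges = {
    ("RISC", "bus"), ("bus", "RISC"),
    ("HWM1", "bus"), ("bus", "HWM1"),
    ("HWM2", "bus"), ("bus", "HWM2"),
    ("HWM1", "ptp"), ("ptp", "HWM2"),
}

# Hypothetical mapping edges: which resources may implement which object.
mappings = {
    1: {"RISC"}, 2: {"RISC", "HWM1"}, 3: {"RISC", "HWM1"}, 4: {"RISC", "HWM2"},
    5: {"bus"}, 6: {"bus"}, 7: {"bus", "ptp"},
}


def is_feasible(allocation, binding):
    """Check a binding against an allocation (assumed feasibility rule)."""
    if set(binding) != set(mappings):
        return False  # every object must be bound exactly once
    for obj, res in binding.items():
        if res not in allocation or res not in mappings[obj]:
            return False  # resource not allocated or no mapping edge
    for u, v in problem_edges:
        ru, rv = binding[u], binding[v]
        if ru != rv and (ru, rv) not in arch_edges:
            return False  # dependency has no channel to travel on
    return True


allocation = {"RISC", "bus", "HWM1", "ptp", "HWM2"}
binding = {1: "RISC", 2: "RISC", 3: "HWM1", 4: "HWM2",
           5: "bus", 6: "bus", 7: "ptp"}

print(is_feasible(allocation, binding))
print(is_feasible(allocation - {"ptp"}, binding))
```

Dropping the point-to-point link from the allocation makes the same binding infeasible, because vertex 7 is then bound to a resource that was not allocated.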

Example: HW/SW Partitioning
- in its simplest form only two blocks: SW and HW (bi-partitioning)
[Figure: a processor and an ASIC connected by a bus; architecture graph with vertices v_processor, v_bus, v_ASIC.]

Partitioning: Levels of Abstraction
- structural partitioning: at the register-transfer level (RTL) or at the netlist level
  - split a digital circuit and map it to several devices (FPGAs, ASICs)
  - system parameters are relatively well known (area, delay)
  - a comparison of design alternatives is no longer possible
- functional partitioning: at the system level
  - a comparison of design alternatives is possible (design space exploration)
  - system parameters are unknown and must be estimated (analysis, simulation, rapid prototyping)

The Partitioning Problem
Definition: The partitioning problem is to assign n objects $O = \{o_1, \ldots, o_n\}$ to m blocks (also called partitions) $P = \{p_1, \ldots, p_m\}$ such that
- $p_1 \cup p_2 \cup \ldots \cup p_m = O$
- $p_i \cap p_j = \{\} \quad \forall i, j: i \neq j$
- and the cost $c(P)$ is minimized.
The general partitioning problem is NP-complete.

Cost Metrics - Example
- cost function:

$$f(C, L, P) = k_1 \cdot h_C(C, \bar{C}) + k_2 \cdot h_L(L, \bar{L}) + k_3 \cdot h_P(P, \bar{P})$$

- $C$: system cost in [$]
- $L$: execution time in [sec]
- $P$: power consumption in [W]
- $h_C, h_L, h_P$: functions that determine how much $C, L, P$ violate the design constraints $\bar{C}, \bar{L}, \bar{P}$ (penalty)
- $k_1, k_2, k_3$: weighting and normalization
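The two partition conditions and the penalty-style cost function can be illustrated with a short sketch. The penalty function used here (relative violation of a limit, zero when the limit is met) is an assumption; the slides do not specify the form of h_C, h_L, h_P. All numbers are made up.

```python
def is_partition(blocks, objects):
    """True if the blocks are pairwise disjoint and together cover all objects."""
    seen = set()
    for block in blocks:
        if seen & block:
            return False  # p_i and p_j overlap
        seen |= block
    return seen == set(objects)


def penalty(value, limit):
    """Assumed penalty: relative amount by which value exceeds its limit."""
    return max(0.0, (value - limit) / limit)


def cost(C, L, P, C_max, L_max, P_max, k=(1.0, 1.0, 1.0)):
    """f(C, L, P) = k1*h_C + k2*h_L + k3*h_P with the assumed penalty above."""
    return (k[0] * penalty(C, C_max)
            + k[1] * penalty(L, L_max)
            + k[2] * penalty(P, P_max))


print(is_partition([{1, 2}, {3}], [1, 2, 3]))
print(is_partition([{1, 2}, {2, 3}], [1, 2, 3]))
# System cost 120 $ against a 100 $ limit; time and power meet their limits.
print(cost(120, 0.5, 2.0, C_max=100, L_max=1.0, P_max=2.0))
```

Only the violated constraint contributes to the cost in the last call; the weights k would normally be tuned so that the three terms are comparable.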

General Partitioning Methods
- exact methods
  - enumeration
  - Integer Linear Programs (ILP)
- heuristic methods
  - constructive methods
    - random mapping
    - hierarchical clustering
  - iterative methods
    - Kernighan-Lin algorithm
    - Simulated Annealing
    - Evolutionary Algorithms (EA)

Integer Linear Programs (1)
- binary variables $x_{i,k}$
  - $x_{i,k} = 1$: object $o_i$ in block $p_k$
  - $x_{i,k} = 0$: object $o_i$ not in block $p_k$
- cost $c_{i,k}$ if object $o_i$ is in block $p_k$
- integer linear program:

$$x_{i,k} \in \{0, 1\}, \quad 1 \le i \le n,\ 1 \le k \le m$$

$$\sum_{k=1}^{m} x_{i,k} = 1, \quad 1 \le i \le n$$

$$\text{minimize} \quad \sum_{k=1}^{m} \sum_{i=1}^{n} x_{i,k} \cdot c_{i,k}, \quad 1 \le k \le m,\ 1 \le i \le n$$

Integer Linear Programs (2)
- additional constraints
  - example: maximum number of $h_k$ objects in block k

$$\sum_{i=1}^{n} x_{i,k} \le h_k, \quad 1 \le k \le m$$

- ILP is NP-complete
  - exponential runtime in the worst case
  - solved by branch-and-bound algorithms
  - the formulation gets difficult when constraints are non-linear

Constructive Methods
- random mapping
  - each object is assigned to a block randomly
- hierarchical clustering
  - stepwise grouping of objects
  - a closeness function determines how desirable it is to group two objects
- constructive methods
  - are often used to generate a starting partition for iterative methods
  - show the difficulty of finding proper closeness functions
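For very small instances, the integer linear program from the two ILP slides above (assignment constraint, cost objective, and the capacity constraint h_k) can be solved by plain enumeration, the other exact method listed. This is a sketch of exhaustive search, not a branch-and-bound ILP solver; the cost matrix and capacities are invented for illustration.

```python
from itertools import product


def solve_by_enumeration(c, h):
    """Try every assignment of n objects to m blocks.

    c[i][k] is the cost of putting object i into block k,
    h[k] is the maximum number of objects allowed in block k.
    Assigning each object to exactly one block enforces sum_k x[i][k] = 1.
    """
    n, m = len(c), len(h)
    best_cost, best_assign = None, None
    for assign in product(range(m), repeat=n):  # m**n candidates
        if any(assign.count(k) > h[k] for k in range(m)):
            continue  # capacity constraint violated
        total = sum(c[i][assign[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_assign = total, assign
    return best_cost, best_assign


# Hypothetical instance: 4 objects, 2 blocks, block 0 holds at most 2 objects.
c = [[1, 5], [2, 4], [3, 3], [6, 1]]
h = [2, 3]
print(solve_by_enumeration(c, h))
```

The search space grows as m^n, which is why enumeration is only usable for tiny problems and heuristics take over for realistic sizes.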

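Hierarchical Clustering - Example (1)
- step 1: v5 = v1 ∪ v3
- closeness function: arithmetic mean of weights
[Figure: weighted graph over v1, v2, v3, v4 (edge weights 20, 10, 10, 8, 6, 4) and the graph after merging v1 and v3 into v5 (edge weights 10, 7, 4).]

Hierarchical Clustering - Example (2)
- step 2: v6 = v2 ∪ v5
[Figure: graph over v2, v4, v5 (edge weights 10, 7, 4) and the graph after merging v2 and v5 into v6 (edge weight 5.5 between v6 and v4).]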

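Hierarchical Clustering - Example (3)
- step 3: v7 = v6 ∪ v4
[Figure: v6 and v4 (edge weight 5.5) merged into v7.]

Hierarchical Clustering - Example (4)
- step 1: v5 = v1 ∪ v3
- step 2: v6 = v2 ∪ v5
- step 3: v7 = v6 ∪ v4
[Figure: cluster tree over the leaves v1, v2, v3, v4; cut lines through the tree define the partitions.]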
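The clustering example can be replayed with a few lines of code. This is a minimal sketch: the assignment of the weights to individual edges is read off the garbled figure and is therefore an assumption, and the rule for a missing edge (average only over existing edges) is a choice made here, not one stated on the slides.

```python
def hierarchical_clustering(weights):
    """Repeatedly merge the closest pair; closeness to a merged node is the
    arithmetic mean of the closeness values of its two parts."""
    w = {frozenset(pair): value for pair, value in weights.items()}
    nodes = set().union(*w)
    next_id = len(nodes) + 1
    merges = []
    while w:
        pair = max(w, key=w.get)          # most desirable pair to group
        a, b = sorted(pair)
        new = f"v{next_id}"
        next_id += 1
        merges.append((new, a, b, w.pop(pair)))
        nodes -= {a, b}
        for other in nodes:
            old = [w.pop(frozenset({x, other}))
                   for x in (a, b) if frozenset({x, other}) in w]
            if old:
                w[frozenset({new, other})] = sum(old) / len(old)
        nodes.add(new)
    return merges


# Edge weights as read from the example figure (assumed assignment).
weights = {
    ("v1", "v2"): 10, ("v2", "v3"): 10, ("v1", "v3"): 20,
    ("v1", "v4"): 8, ("v3", "v4"): 6, ("v2", "v4"): 4,
}
for new, a, b, closeness in hierarchical_clustering(weights):
    print(f"{new} = {a} + {b} (closeness {closeness})")
```

With these weights the merge order is v1 with v3, then v2 with v5, then v4 with v6, and the last merge happens at closeness 5.5, matching the steps listed in the example.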

Iterative Methods - Kernighan-Lin (1)
- generation of bi-partitions: re-group the object that leads to the largest gain in cost
- example: cost = number of edges crossing the partitions
[Figure: graph with vertices v1 to v9 split into two blocks.]

Iterative Methods - Kernighan-Lin (2)
- extensions
  - re-group the object that leads to the largest gain in cost or the smallest loss in cost
    - as long as a better partition is found: from all n objects, virtually re-group the best one, then from the remaining n-1 objects again the best one, etc., until all objects have been re-grouped; from these n partitions take the one with the smallest cost and actually perform the corresponding re-group operations
    - escapes from local minima
    - asymptotic complexity $O(n^3)$
  - partitioning into m blocks: $O(m n^3)$
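The pass structure described above (tentatively re-group every object once, then keep the best intermediate partition) can be sketched as follows. This follows the single-object variant on the slide rather than the pairwise swaps of the original Kernighan-Lin paper. The minimum block size `min_block` is an assumption added here so that the trivial solution "everything in one block" is excluded; the example graph is invented, and the sketch makes no attempt to reach the stated complexity bound.

```python
def cut_size(edges, side):
    """Cost = number of edges crossing the two blocks."""
    return sum(1 for u, v in edges if side[u] != side[v])


def kl_pass(edges, side, min_block):
    """Virtually move each object once; return the best partition seen."""
    trial = dict(side)
    unlocked = set(trial)
    best_cost, best_side = cut_size(edges, trial), dict(trial)
    while unlocked:
        candidates = []
        for obj in sorted(unlocked):
            block_size = sum(1 for s in trial.values() if s == trial[obj])
            if block_size <= min_block:
                continue  # assumed balance rule: do not shrink this block
            trial[obj] = 1 - trial[obj]
            candidates.append((cut_size(edges, trial), obj))
            trial[obj] = 1 - trial[obj]
        if not candidates:
            break
        cost, obj = min(candidates)  # largest gain or smallest loss
        trial[obj] = 1 - trial[obj]
        unlocked.remove(obj)
        if cost < best_cost:
            best_cost, best_side = cost, dict(trial)
    return best_cost, best_side


def kernighan_lin(edges, side, min_block=1):
    """Repeat passes as long as a better partition is found."""
    cost = cut_size(edges, side)
    while True:
        new_cost, new_side = kl_pass(edges, side, min_block)
        if new_cost >= cost:
            return cost, side
        cost, side = new_cost, new_side


# Two triangles joined by one edge, starting from a poor partition.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
print(kernighan_lin(edges, start, min_block=2))
```

Because losing moves are accepted inside a pass and only the best prefix is committed, a pass can climb out of a partition where no single move improves the cost.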

Iterative Methods - Simulated Annealing (1)
- from physics:
  - metal and gas take on a minimal-energy state during cooling down (under certain constraints):
    - at each temperature, the system reaches a thermodynamic equilibrium
    - the temperature is decreased sufficiently slowly
  - probability that a particle jumps to a higher-energy state:

$$P(e_i, e_j, T) = e^{\frac{e_i - e_j}{k_B T}}$$

- application to combinatorial optimization
  - energy = cost of a solution (partition)
  - the cost decreases with the temperature; sometimes (with a certain probability) increases in cost are accepted

Iterative Methods - Simulated Annealing (2)

    temp = temp_start;
    cost = c(P);
    while (Frozen() == FALSE) {
        while (Equilibrium() == FALSE) {
            P' = RandomMove(P);
            cost' = c(P');
            deltacost = cost' - cost;
            if (Accept(deltacost, temp) > random[0,1)) {
                P = P';
                cost = cost';
            }
        }
        temp = DecreaseTemp(temp);
    }

$$\text{Accept}() = \min\left(1,\ e^{-\frac{\text{deltacost}}{k \cdot \text{temp}}}\right)$$
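The annealing loop above translates almost line by line into runnable code. In this sketch, Frozen(), Equilibrium(), and DecreaseTemp() are filled in with simple assumed choices (a temperature threshold, a fixed number of moves per temperature, and geometric cooling). Remembering the best state seen is an addition that is not part of the pseudocode. The toy problem, a balanced min-cut on a made-up six-vertex graph, is also hypothetical.

```python
import math
import random


def simulated_annealing(state, cost_fn, random_move, temp_start=1.0,
                        temp_min=1e-3, alpha=0.9, moves_per_temp=50,
                        k=1.0, rng=None):
    rng = rng or random.Random(0)
    temp = temp_start
    cost = cost_fn(state)
    best_state, best_cost = state, cost          # addition: remember the best
    while temp >= temp_min:                      # assumed Frozen()
        for _ in range(moves_per_temp):          # assumed Equilibrium()
            candidate = random_move(state, rng)
            cand_cost = cost_fn(candidate)
            delta = cand_cost - cost
            # Accept() = min(1, exp(-delta / (k * temp))), written without
            # evaluating exp() for improving moves to avoid overflow.
            accept = 1.0 if delta <= 0 else math.exp(-delta / (k * temp))
            if accept > rng.random():
                state, cost = candidate, cand_cost
                if cost < best_cost:
                    best_state, best_cost = state, cost
        temp *= alpha                            # assumed DecreaseTemp()
    return best_state, best_cost


# Toy problem: split 6 vertices into two blocks, few crossing edges, balanced.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def cost_fn(state):
    cut = sum(1 for u, v in EDGES if state[u] != state[v])
    imbalance = abs(len(state) - 2 * sum(state))
    return cut + imbalance


def flip_one(state, rng):
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


start = (0, 1, 0, 1, 0, 1)
best_state, best_cost = simulated_annealing(start, cost_fn, flip_one)
print(best_state, best_cost)
```

The parameters `alpha`, `temp_min`, and `moves_per_temp` trade runtime against result quality; slower cooling and more moves per temperature generally give better partitions.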

Iterative Methods - Simulated Annealing (3)
- cooling down: DecreaseTemp(), Frozen()
  - temp_start = 1.0
  - temp = α · temp (typical: 0.8 ≤ α ≤ 0.99)
  - terminate when temp < temp_min or when there is no more improvement
- equilibrium: Equilibrium()
  - after a defined number of iterations or when there is no more improvement
- complexity
  - from exponential to constant, depending on the implementation of the functions Equilibrium(), DecreaseTemp(), and Frozen()
  - the longer the runtime, the better the quality of the results
  - typical: construct the functions to get polynomial runtimes

Iterative Methods - EA (1)
- principles of evolution: selection, cross-over, mutation
[Figure: illustration of selection, cross-over, and mutation.]

Iterative Methods - EA (2)
- example: minimize g(x) = x²
[Figure: EA cycle over a population in which each individual is one solution: fitness calculation (for example, fitness = 9), selection, cross-over, mutation, next generation.]

Application Domain of EA
- the problem is diffuse and complex
  - examples:
    - system synthesis
    - route planning in robotics
    - container loading
- multi-criteria optimization
  - multiple criteria that are conflicting
    - example: performance vs. cost vs. power consumption
  - EAs find Pareto fronts (sets of Pareto points)
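The EA cycle for the example g(x) = x² can be sketched in a few lines. The encoding (6-bit strings decoded to integers from -32 to 31), binary tournament selection, one-point cross-over, bit-flip mutation, and keeping the best individual (elitism) are all assumptions chosen for brevity; the slides only name the generic steps.

```python
import random

BITS = 6  # assumed encoding: 6-bit string, decoded to x in [-32, 31]


def decode(bits):
    return int("".join(map(str, bits)), 2) - 2 ** (BITS - 1)


def g(x):
    return x * x


def fitness(individual):
    """Fitness to minimize: g applied to the decoded solution."""
    return g(decode(individual))


def evolve(pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    initial_best = min(fitness(ind) for ind in pop)

    def select():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        next_gen = [min(pop, key=fitness)]       # elitism (assumption)
        while len(next_gen) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randint(1, BITS - 1)       # one-point cross-over
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit
                     for bit in child]           # bit-flip mutation
            next_gen.append(child)
        pop = next_gen
    best = min(pop, key=fitness)
    return decode(best), initial_best


x, initial_best = evolve()
print("best x:", x, "g(x):", g(x), "initial best g:", initial_best)
```

With elitism the best fitness can never get worse from one generation to the next, which makes the run easy to sanity-check.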

Dominance, Pareto Points (1)
Definition: A (design) point $J_i$ is dominated by $J_k$ if $J_k$ is better than or equal to $J_i$ in each criterion. Notation: $J_i \succ J_k$.
Definition: A point is Pareto-optimal, or a Pareto point, if it is not dominated.

Dominance, Pareto Points (2)
[Figure: six design points, numbered 1 to 6, plotted as execution time over cost.]
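Dominance and the Pareto set are easy to compute directly from the definitions. In this sketch both criteria are minimized, and dominance additionally requires being strictly better in at least one criterion, the usual convention, so that a point never dominates itself or an identical copy. The coordinates are hypothetical; the figure's values did not survive, so they were chosen to give the dominator counts 0, 1, 2, 0, 0, 5 for points 1 to 6.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def dominator_counts(points):
    """For each point, count how many other points dominate it."""
    return {
        i: sum(dominates(q, p) for j, q in points.items() if j != i)
        for i, p in points.items()
    }


# Hypothetical (cost, execution time) pairs for six design points.
points = {
    1: (1, 6), 2: (2, 7), 3: (4, 6.5),
    4: (3, 3), 5: (6, 1), 6: (7, 8),
}

counts = dominator_counts(points)
pareto_points = sorted(i for i, c in counts.items() if c == 0)
print(counts)
print("Pareto points:", pareto_points)
```

Points with a count of zero form the Pareto set; a count-based value like this can serve directly as a fitness to minimize in a multi-criteria EA.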

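Pareto-Ranking
- fitness function:

$$F(J_i) = \sum_{j = 1..N,\ j \neq i} \begin{cases} 1 & : J_j \prec J_i \\ 0 & : \text{else} \end{cases}$$

  where $J_j \prec J_i$ means that $J_j$ dominates $J_i$
- example: F(1) = 0, F(2) = 1, F(3) = 2, F(4) = 0, F(5) = 0, F(6) = 5
[Figure: the six design points plotted as execution time over cost.]

EA - Case Study (1)
[Figure]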

EA - Case Study (2)
[Figure]

EA - Case Study (3)
[Figure: architecture with the components input module, frame memory, dual-ported frame memory, block matching module, subtract/add module, DCT/IDCT module, Huffman encoder, output module.]

HW/SW Partitioning
- simplest case: bi-partitioning (processor-ASIC system), P = {p_SW, p_HW}
- software-oriented approach: P = {O, {}}
  - in software we can realize all functions
  - but the performance may be unacceptably low
  - migrate objects into hardware to improve performance
- hardware-oriented approach: P = {{}, O}
  - in hardware the performance is sufficient
  - but the cost might be too high
  - migrate objects into software to lower cost

Greedy Algorithms
- migration of objects into the other block (HW/SW) until there is no more improvement

    repeat {
        P_old = P;
        for i = 1 to n {
            if (f(Move(P, o_i)) < f(P)) {
                P = Move(P, o_i);
            }
        }
    } until (P == P_old)

- f: cost function
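The greedy migration loop can be written out as a small runnable sketch. The partition is represented by the set of objects currently in hardware, and Move() toggles one object between the blocks. The cost function and all timing and area numbers are hypothetical: hardware area plus a weighted penalty for missing a deadline, with purely sequential execution assumed.

```python
def greedy_partition(objects, f, start_in_hw=False):
    """Move single objects to the other block while the cost f improves.

    start_in_hw=False is the software-oriented start P = {O, {}},
    start_in_hw=True the hardware-oriented start P = {{}, O}.
    """
    hw = set(objects) if start_in_hw else set()
    while True:
        old = set(hw)
        for obj in objects:
            moved = hw ^ {obj}          # Move(P, o_i): toggle the block
            if f(moved) < f(hw):
                hw = moved
        if hw == old:                   # until (P == P_old)
            return hw


# Hypothetical per-object estimates.
SW_TIME = {"a": 10, "b": 4, "c": 8, "d": 2}
HW_TIME = {"a": 2, "b": 3, "c": 2, "d": 1}
HW_AREA = {"a": 5, "b": 4, "c": 6, "d": 3}
DEADLINE = 14
PENALTY_WEIGHT = 10


def f(hw):
    """Assumed cost: hardware area plus a penalty for exceeding the deadline."""
    time = sum(HW_TIME[o] if o in hw else SW_TIME[o] for o in SW_TIME)
    area = sum(HW_AREA[o] for o in hw)
    return area + PENALTY_WEIGHT * max(0, time - DEADLINE)


objects = ["a", "b", "c", "d"]
hw = greedy_partition(objects, f)
print(sorted(hw), f(hw))
```

Starting from the all-software partition, the loop first pulls objects into hardware until the deadline is met and then moves one back to save area; like any greedy method, it stops at the first partition that no single move improves.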

Yorktown Silicon Compiler (YSC)
- functional partitioning of hardware
  - input: functional description at the level of arithmetic and logic expressions
  - partitioning into functional units of a datapath (ALUs, registers)
  - method: hierarchical clustering
  - closeness function:

$$\text{Closeness}(p_i, p_j) = \left(\frac{\text{sharedwires}(p_i, p_j)}{\text{maxwires}(P)}\right)^{c_2} \cdot \left(\frac{\text{maxsize}}{\min\{\text{size}(p_i), \text{size}(p_j)\}}\right)^{c_3} \cdot \frac{\text{maxsize}}{\text{size}(p_i) + \text{size}(p_j)}$$

Hw/Sw Partitioning - Vulcan
- input: program in HardwareC
  - C extended by a process concept and inter-process communication
  - specification with constraints (min/max times and rates)
- target architecture: single processor / single ASIC
  - one global bus, one global memory
  - the processor is the bus master
- abstraction level: basic blocks and operations
  - deterministic computation times
  - internal/external non-deterministic computation times
- method: HW-oriented greedy algorithm
  - the cost function includes HW cost, memory requirement, performance, and synchronization effort

Hw/Sw Partitioning - Cosyma
- input: program in C^x
  - C extended by a process concept and inter-process communication
  - specification with min/max times
- target architecture: processor + coprocessor
  - coupled by shared memory
  - computations on the processor and on the coprocessor may not overlap
- abstraction level: basic blocks
- method: SW-oriented, 2 loops:
  - inner loop: Simulated Annealing with a cost function that measures the gain in computation time for a hardware realization of a block
  - outer loop: synthesis to get estimations for the inner loop