Heuristic Optimisation Methods for System Partitioning in HW/SW Co-Design

Similar documents
Restricted Range Exhaustive Search: A New Heuristic for HW/SW Partitioning of Task Graphs

Evolutionary Algorithm for Embedded System Topology Optimization. Supervisor: Prof. Dr. Martin Radetzki Author: Haowei Wang

Hardware/Software Codesign

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

EXTENDING THE GCLP ALGORITHM FOR HW/SW PARTITIONING: A DETAILED PLATFORM MODEL AND PERFORMANCE IMPROVEMENTS

Static Code Analysis of Functional Descriptions in SystemC

Optimization Techniques for Design Space Exploration

Automatic Design Techniques for Embedded Systems

Hardware Software Partitioning of Multifunction Systems

Hardware-Software Codesign

EE382V: System-on-a-Chip (SoC) Design

Research Incubator: Combinatorial Optimization. Dr. Lixin Tao December 9, 2003

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Design Space Exploration

Co-synthesis and Accelerator based Embedded System Design

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Generic Topology Mapping Strategies for Large-scale Parallel Architectures

Research Article Accounting for Recent Changes of Gain in Dealing with Ties in Iterative Methods for Circuit Partitioning

A Recursive Coalescing Method for Bisecting Graphs

Partitioning. Course contents: Readings. Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic. Chapter 7.5.

Minimizing Clock Domain Crossing in Network on Chip Interconnect

CAD Algorithms. Circuit Partitioning

Genetic Algorithm for Circuit Partitioning

IMPROVEMENT OF THE QUALITY OF VLSI CIRCUIT PARTITIONING PROBLEM USING GENETIC ALGORITHM

Unit 5A: Circuit Partitioning

ARTIFICIAL INTELLIGENCE (CSCU9YE ) LECTURE 5: EVOLUTIONARY ALGORITHMS

Optimization Technique for Maximization Problem in Evolutionary Programming of Genetic Algorithm in Data Mining

Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach

Hardware/Software Codesign

Efficient Design Methods for Embedded Communication Systems

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

HETEROGENEOUS MULTIPROCESSOR MAPPING FOR REAL-TIME STREAMING SYSTEMS

A Consistent Design Methodology to Meet SDR Challenges

VLSI Physical Design: From Graph Partitioning to Timing Closure

Hardware Design and Simulation for Verification

EE382V: System-on-a-Chip (SoC) Design

Heuristic Optimisation Methods for System Partitioning in HW/SW Co-Design

Applications to MPSoCs

Hardware/Software Co-design

Non-convex Multi-objective Optimization

MULTI-OBJECTIVE DESIGN SPACE EXPLORATION OF EMBEDDED SYSTEM PLATFORMS

N-Model Tests for VLSI Circuits

SDR Forum Technical Conference 2007

ACO and other (meta)heuristics for CO

Evolutionary Algorithms: Lecture 4. Department of Cybernetics, CTU Prague.

Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,

Single Chip Heterogeneous Multiprocessor Design

European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105

Modelling, Analysis and Scheduling with Dataflow Models

NOVEL CONSTRAINED SEARCH-TACTIC FOR OPTIMAL DYNAMIC ECONOMIC DISPATCH USING MODERN META-HEURISTIC OPTIMIZATION ALGORITHMS

Search Space Reduction for E/E-Architecture Partitioning

INTERACTIVE MULTI-OBJECTIVE GENETIC ALGORITHMS FOR THE BUS DRIVER SCHEDULING PROBLEM

CHAPTER 1 INTRODUCTION

Outline of the talk. Local search meta-heuristics for combinatorial problems. Constraint Satisfaction Problems. The n-queens problem

Technical Recommendation S. 10/07: Source Encoding of High Definition Mobile TV Services

Synthesis and Optimization of Digital Circuits

FPGA-Based Embedded Systems for Testing and Rapid Prototyping

Handling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization

An Evolutionary Algorithm Approach to Generate Distinct Sets of Non-Dominated Solutions for Wicked Problems

Preclass Warmup. ESE535: Electronic Design Automation. Motivation (1) Today. Bisection Width. Motivation (2)

Handling Constraints in Multi-Objective GA for Embedded System Design

Tabu Search for Constraint Solving and Its Applications. Jin-Kao Hao LERIA University of Angers 2 Boulevard Lavoisier Angers Cedex 01 - France

Dimensionality Reduction in Multiobjective Optimization: The Minimum Objective Subset Problem

Using CODEQ to Train Feed-forward Neural Networks

Combined System Synthesis and Communication Architecture Exploration for MPSoCs

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Place and Route for FPGAs

Hardware-Software Codesign. 1. Introduction

Hardware Software Codesign of Embedded Systems

EE382N: Embedded System Design and Modeling

Simple mechanisms for escaping from local optima:

Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures

Testing Embedded Cores Using Partial Isolation Rings

Systems Development Tools for Embedded Systems and SOC s

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116

Design Space Exploration Using Parameterized Cores

SPATIAL OPTIMIZATION METHODS

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

Optimization of Turning Process during Machining of Al-SiCp Using Genetic Algorithm

Cell-to-switch assignment in. cellular networks. barebones particle swarm optimization

Scan-Based BIST Diagnosis Using an Embedded Processor

Tabu search and genetic algorithms: a comparative study between pure and hybrid agents in an A-teams approach

HEURISTICS FOR THE NETWORK DESIGN PROBLEM

Evolutionary Non-Linear Great Deluge for University Course Timetabling

Computational Intelligence

Iterative Learning of Single Individual Haplotypes from High-Throughput DNA Sequencing Data

Placement Algorithm for FPGA Circuits

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Optimizing the Sailing Route for Fixed Groundfish Survey Stations

SENSITIVITY ANALYSIS OF CRITICAL PATH AND ITS VISUALIZATION IN JOB SHOP SCHEDULING

CHAPTER 6 ORTHOGONAL PARTICLE SWARM OPTIMIZATION

GT HEURISTIC FOR SOLVING MULTI OBJECTIVE JOB SHOP SCHEDULING PROBLEMS

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

Hardware-Software Codesign. 1. Introduction

An Integrated Design Algorithm for Detailed Layouts Based on the Contour Distance

Multi-objective Optimization Algorithm based on Magnetotactic Bacterium

Construction of Minimum-Weight Spanners Mikkel Sigurd Martin Zachariasen

Transcription:

Heuristic Optimisation Methods for System Partitioning in HW/SW Co-Design Univ.Prof. Dipl.-Ing. Dr.techn. Markus Rupp Vienna University of Technology Institute of Communications and Radio-Frequency Engineering Dr. Bastian Knerr, Dr. Martin Holzer Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms

Outline Embedded system design System partitioning Heterogeneous platforms Mapping system graphs to platforms Heuristic optimisation for platform mapping Multi-objective optimization (Area, Timing) Conclusions 2/30

Embedded system design An embedded system is a computing device which is in general subject to a specific purpose. whose implementation is predominantly determined by this purpose. which entails a complete encapsulation into the environment where the purpose is located at. Phones/PDAs Automotive Transceiver (WIFI, WLAN, xdsl,...) 3

Embedded system design 4

Outline Embedded system design System partitioning Heterogeneous platforms Mapping system graphs to platforms Heuristic optimisation for platform mapping Multi-objective optimization (Area, Timing) Conclusions 5/30

Heterogeneous platforms Classical HW/SW co-design platform First published in 1992 Served well to get a first grip on (bi-)partitioning Recent work still uses this classical model (Kalavade 2002, Wiangtong 2002) This model does not sufficiently represent state-of-the-art platforms in embedded systems 6

Heterogeneous platforms Modern system-on-chip platform UMTS baseband transceiver chip (2003) DSP+Microcontroller ASICs Busses and bridges RAM and registers Interfaces 7

Heterogeneous platforms Contribution: Design of a modular platform library Contribution: arbitrary platform compositions Processor DSP FPGA ASIC Communication Bus FIFO Direct Memory B. Knerr, M. Holzer, M. Rupp, Extending the GCLP Algorithm for HW/SW Partitioning: A Detailed Platform Model and Performance Improvements, IEEE Austrochip, Vienna, Austria, July 2006. M. Holzer, B. Knerr, M. Rupp, Efficient Design Methods for Embedded Communications Systems, EURASIP Journal on Embedded Systems, 2006. 8

Outline Embedded system design System partitioning Heterogeneous platforms Mapping system graphs to platforms Heuristic optimisation for platform mapping Multi-objective optimization (Area, Timing) Conclusions 9/30

Mapping system graphs to platforms System Process graph representations for UMTS signal processing 10

Mapping system graphs to platforms The constrained mapping problem Proven to be NP-hard Use heuristics 11

Outline Embedded system design System partitioning Heterogeneous platforms Mapping system graphs to platforms Heuristic optimisation for platform mapping Multi-objective optimization (Area, Timing) Conclusions 12/30

Heuristic optimisation Popular heuristics for system partitioning Gradient search Simulated annealing Tabu search Kernighan-Lin min-cut Genetic algorithm Global criticality/local phase Restricted range exhaustive search 13

Heuristic optimisation Contribution : New or substantially improved methods Kernighan-Lin min-cut B. Knerr, M. Holzer, M. Rupp, HW/SW Partitioning Using High Level Metrics, Int. Conf. on Computing, Communications, and Control (CCCT), Austin, TX, USA, August 2004. Genetic algorithm B. Knerr, M. Holzer, M. Rupp, Novel Genome Coding of Genetic Algorithms for the System Partitioning Problem, IEEE 2 nd Int. Symposium on Industrial Embedded Systems (SIES), Lisbon, Portugal, July 2007. Global criticality/local phase (GCLP) B. Knerr, M. Holzer, M. Rupp, Improvements of the GCLP Algorithm for HW/SW Partitioning of Task Graphs, IASTED 4 th Int. Conf. on Circuits, Signals, and Systems (CSS), San Francisco, CA, USA, Nov 2006. Restricted range exhaustive search (RRES) B. Knerr, M. Holzer, M. Rupp, RRES: A Novel Approach to the Partitioning Problem for a Typical Subset of System Graphs, EURASIP Journal on Embedded Systems, Volume 2008, Article 259686. 14

Heuristic optimisation Contribution: Analysis of representative graphs for signal processing Identification of their specific typical properties Density/Sparsity Density Degree of parallelism Avg number of parallel nodes Rank-locality 15

Heuristic optimisation Consider graphs with strong locality property Compare GA with different genome codings Randomly ordered Rank ordered Locality ordered Multi-core platform 3 Objectives Area, Code, Time 16

Heuristic optimisation Analysed the dependency of different algorithms on graph locality Comparison of GA, TS, and RRES over κ on large sparse graphs ( ) 17

Outline Embedded system design System partitioning Heterogeneous platforms Mapping system graphs to platforms Heuristic optimisation for platform mapping Multi-objective optimization (Area, Timing) Conclusions 18/30

Execution Time Estimation Metrics Number of operations Parallelism Available resources 19

Execution Time Profile Path analysis Best Case Execution Time (BCET) Worst Case Execution Time (WCET) Infeasible paths Condition (A < 1) && (A >= 1) cannot be fulfilled BCET WCET Narrow bounds for the execution time interval 20

Comparison Synthesis/Estimation Benchmark algorithms UMTS cell searching MPEG algorithm Comparison with results from high level synthesis (SPARK) Fidelity value counts how often the relation between two estimations (E(i), E(j)) is preserved also for the real values (R(i), R(j)) Fidelity of 100% achieved M. Holzer, M. Rupp, Static Estimation of Execution Times for Hardware Accelerators in System-on-Chips, International Symposium on System-on-Chip 2005, Tampere, Nov. 2005. 21

Area Time Trade-off (1) 22

Area Time Trade-off (2) 23

Multi-objective Optimization Minimization of a set of conflicting functions A decision x is Pareto optimal if there is no other decision that dominates x Set of Pareto optimal points is called Pareto front Evolutionary algorithm approach to compute Pareto front 24

Evolutionary Optimisation area complexity cycle count 25

Pareto Front Coverage Performance measure: hyper volume indicator area complexity Improvement of design space coverage by 20% cycle count M. Holzer, B. Knerr, M. Rupp, Design Space Exploration with Evolutionary Multi-objective Optimization, Symposium on Industrial Embedded Systems, Lisbon, 2007 26

Pareto Front Examples Control flow graph 10 basic blocks No loops 10 10 design points 14 Pareto optimal design points Control flow graph 15 basic blocks 2 loops 10 20 design points 29 Pareto optimal design points 27

Conclusions Formulation of the system partitioning problem for state-of-the-art embedded systems Design of modular platform library Allowing for arbitrary platform compositions Including board level and SoC models Analysis of typical graphs in digital signal processing Identification of specific value ranges for the graph properties Multi-objective optimization algorithm Coverage improvements of 20% increases the number of implementation variants 28

Conclusions Evaluation of a large variety of heuristics Including ES, GS, SA, TS, GA, RRES, GCLP, KL Analysis of the impact of system graph properties on the algorithm s performance Identification of appropriate algorithms depending on Improved genome sequence for genetic algorithms Proposed genome sequence according to graph properties Cost reduction of ~50% compared to random sequence Proposed new algorithm (RRES) RRES exploits strong locality component of system graphs Outperforms any other algorithm on a distinct subset of system graphs 29

Fin Thank you for your attention 30