Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling

Size: px
Start display at page:

Download "Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling"

Transcription

1 Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling Tobias Schwarzer 1, Joachim Falk 1, Michael Glaß 1, Jürgen Teich 1, Christian Zebelein 2, Christian Haubelt 2 1 Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany 2 University of Rostock, Germany

2 Fine-granular Modeling with Data Flow r CPU1 r CPU2 a 7 Advantages: Exposes all degrees of available parallelism High flexibility of mapping actors to resources Disadvantages: Scheduling overhead at runtime May sacrifize the code-generation efficiency of the compiler Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

3 Coarse-granular Modeling c r CPU1 r CPU2 c a 7 Advantages: High code-generation efficiency Low scheduling overhead at runtime Disadvantages: May sacrifize parallelism significantly Reduces mapping flexibility Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

4 Proposed Solution Input: Fine-grained (data flow) modelling of applications Model of the target platform Given: a clustering technique [FKHTB08] to automatically combine fine-grained actors to composite actors using Quasi-Static Scheduling (QSS) Key Idea: Perform a Design Space Exploration (DSE) to find the best combinations of clusters and their mapping to available resources in terms of application throughput Contributions: A genetic representation for the exploration of clusters A repair strategy to efficiently restrict the search to feasible clusters only Synthesis-in-the-Loop: Automatic synthesis of each implementation candidate to the target platform to determine quality numbers, capturing the effects of (a) scheduling overhead, (b) memory accesses, and (c) compiler optimizations FKHTB08: J. Falk, J. Keinert, C. Haubelt, J. Teich, and S. S.Bhattacharyya: A Generalized Static Data Flow Clustering Algorithm for MPSoC Scheduling of Multimedia Applications. In Proc. of EMSOFT, pages , Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

5 Proposed System Compilation Flow a 7 a8 Automatic extraction of the dataflow graph Dataflow application Input Read in executable specification a 7 r CPU1 r CPU2 Explore design space Step 1: Determine clustering Step 2: Perform binding Architecture graph Synthesis-in-the-loop Run executable Obtain throughput DSE Evaluate implementation on target hardware r CPU3 r CPU4 Non-dominated design points Output Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

6 Fundamentals

7 Classic System Synthesis Problem graph g p = (A, C): Architecture graph g r = (R, L): r CPU1 r CPU2 r CPU3 r CPU4 a 7 dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

8 Classic System Synthesis Problem graph g p = (A, C): Architecture graph g r = (R, L): r CPU1 r CPU2 Binding β r CPU3 r CPU4 a 7 dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

9 Clustering-aware System Synthesis Clustering: Replace a connected subgraph containing only static actors of a DFG by a single composite actor cluster 1 a 7 dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

10 Clustering-aware System Synthesis Clustering: Replace a connected subgraph containing only static actors of a DFG by a single composite actor Each composite actor contains a Quasi-Static Schedule (QSS) executing sequences of statically scheduled actor firings i 1 QSS 1 o 1 QSS is represented by a Finite State Machine (FSM) o 2 #i 1 1 & #o 2 1 / < > Details in [FKHTB08] q 0 q 1 FKHTB08: J. Falk, J. Keinert, C. Haubelt, J. Teich, and S. S.Bhattacharyya: A Generalized Static Data Flow Clustering Algorithm for MPSoC Scheduling of Multimedia Applications. In Proc. of EMSOFT, pages , dynamic actors #i 1 1 & #o 1 1 & #o 2 1 / <,a 7, > static actors FSM state Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

11 Clustering-aware System Synthesis cluster 2 Clustering: Replace a connected subgraph containing only static actors of a DFG by a single composite actor Each composite actor contains a Quasi-Static Schedule (QSS) executing sequences of statically scheduled actor firings cluster 1 QSS is represented by a Finite State Machine (FSM) a 7 Details in [FKHTB08] FKHTB08: J. Falk, J. Keinert, C. Haubelt, J. Teich, and S. S.Bhattacharyya: A Generalized Static Data Flow Clustering Algorithm for MPSoC Scheduling of Multimedia Applications. In Proc. of EMSOFT, pages , dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

12 Clustering-aware System Synthesis cluster 2 Clustering: Replace a connected subgraph containing only static actors of a DFG by a single composite actor Each composite actor contains a Quasi-Static Schedule (QSS) executing sequences of statically scheduled actor firings cluster 1 QSS is represented by a Finite State Machine (FSM) a 7 Details in [FKHTB08] FKHTB08: J. Falk, J. Keinert, C. Haubelt, J. Teich, and S. S.Bhattacharyya: A Generalized Static Data Flow Clustering Algorithm for MPSoC Scheduling of Multimedia Applications. In Proc. of EMSOFT, pages , dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

13 Clustering-aware System Synthesis cluster 2 r CPU1 r CPU2 Binding β cluster 1 r CPU3 r CPU4 a 7 dynamic actors static actors Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

14 Design Space Exploration

15 Design Space Exploration - Opt4J Custom optimization problems are typically defined by providing three main classes: Creator: Generate random genotype objects Decoder: Convert the genotype into a phenotype Evaluator: Determine the quality of one phenotype Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

16 Design Space Exploration - Creator Generate random genotype objects: Associate each channel of the problem graph with a Boolean value using the clusterization function γ: γ = C S {,false} channel should be contained in a cluster false channel should not be contained in a cluster Exclude channels connected to a dynamic actor from genotype false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

17 Design Space Exploration - Decoder How to convert the genotype (/false edges) to a phenotype (clusters)? cluster 2? cluster 1 false a 7 a 7 dynamic actors static actors Conversion consists of two steps: 1. Determine clustering candidates 2. Repair clustering candidates to obtain a feasible cluster Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

18 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

19 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

20 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with stop? false stop? a 7 stop? Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

21 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

22 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with stop? false stop? a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

23 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

24 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 clustering candidate 1 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

25 Step 1: Determine Clustering Candidate Collect reachable actors starting at channel associated with false a 7 clustering candidate 1 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

26 Step 1: Determine Clustering Candidate clustering candidate 2 Collect reachable actors starting at channel associated with false a 7 clustering candidate 1 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

27 Step 2: Repair Strategy false a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

28 Step 2: Repair Strategy a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

29 Step 2: Repair Strategy a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

30 Step 2: Repair Strategy Clustering condition: For each pair of cluster input and output actor, there exists a directed path from the input actor to the output actor. a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

31 Step 2: Repair Strategy a 5 a 7 Clustering condition: For each pair of cluster input and output actor, there exists a directed path from the input actor to the output actor. Repair strategy: Transform an infeasible clustering candidate into (multiple) feasible clusters cluster candidate input actors {a2,a6} clustering candidate output actors {a3,a4,a7} Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

32 Step 2: Repair Strategy Repair clustering candidate to obtain feasible cluster: a 7 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

33 Step 2: Repair Strategy Repair clustering candidate to obtain feasible cluster: a 7 cluster candidate input actors {a2,a6} clustering candidate output actors {a3,a4,a7} Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

34 Step 2: Repair Strategy I = { } O= {, } Repair clustering candidate to obtain feasible cluster: Determine set of predecessor input actors and set of successor output actors for each actor a 7 cluster candidate input actors {a2,a6} clustering candidate output actors {a3,a4,a7} Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

35 Step 2: Repair Strategy I = {, } O= {, } I = { } O= {, } I = {, } O= {, } Repair clustering candidate to obtain feasible cluster: Determine set of predecessor input actors and set of successor output actors for each actor I = { } O= {,,a 7 } I = { } O= {,,a 7 } a 7 I = { } O= {, } I = { } O= {,,a 7 } Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

36 Step 2: Repair Strategy EC 4 I = {, } O= {, } EC 2 I = { } O= {, } EC 1 I = {, } O= {, } I = { } O= {,,a 7 } I = { } O= {,,a 7 } Repair clustering candidate to obtain feasible cluster: Determine set of predecessor input actors and set of successor output actors for each actor Create equivalence classes (ECs) a 7 I = { } O= {, } EC 3 I = { } O= {,,a 7 } Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

37 Step 2: Repair Strategy EC 4 I = {, } O= {, } EC 2 I = { } O= {, } EC 1 I = {, } O= {, } I = { } O= {,,a 7 } I = { } O= {,,a 7 } a 7 Repair clustering candidate to obtain feasible cluster: Determine set of predecessor input actors and set of successor output actors for each actor Create equivalence classes (ECs) Determine clashes between each EC I = { } O= {, } EC 3 I = { } O= {,,a 7 } Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

38 Step 2: Repair Strategy EC 4 I = {, } O= {, } I = { } O= {, } EC 1 I = { } O= {,,a 7 } I = { } O= {,,a 7 } a 7 Repair clustering candidate to obtain feasible cluster: Determine set of predecessor input actors and set of successor output actors for each actor Clash: Detect I = {a the 2, } situation when an output actor is not connected O= {,a to 4 } all input actors, since this may result in the infinite token accumulation problem. (Formal details given in the paper) EC 2 Create equivalence classes (ECs) Determine clashes between each EC I = { } O= {, } EC 3 I = { } O= {,,a 7 } Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

39 Step 2: Repair Strategy I = { } O= {, } EC 1 EC 4 I = {, } O= {, } EC 2 I = {, } O= {, } I = { } O= {,,a 7 } I = { } O= {,,a 7 } Clash exists between EC 2 and EC 4! Repair strategy splits the cluster candidate between and or between and. a 7 I = { } O= {, } EC 3 I = { } O= {,,a 7 } Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

40 Design Space Exploration - Evaluator Determine the quality of one phenotype: Each implementation candidate is evaluated via synthesis-in-the-loop on the target platform with respect to the design objectives: Throughput [iterations/s] (MAX) Number of used processor cores (MIN) a 7 r CPU1 r CPU2 Explore design space Step 1: Determine clustering Step 2: Perform binding Architecture graph Synthesis-in-the-loop Run executable Obtain throughput Evaluate implementation on target hardware r CPU3 r CPU4 Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

41 Experimental Results

42 Experimental results Test suite: Six synthetic DFGs using the SDF 3 tool Brake-by-wire control application (BBW) Target platforms: Intel(R) Core(TM) i GHz Xilinx Zynq-7000 All Programmable SoC with a dual-core ARM Cortex- A9 667MHz Approaches: classic (DSE without clustering) post QSS (DSE with clustering as a post-processing step) Direct combination of [FKHTB08] and DSE Proposed holistic DSE flow FKHTB08: J. Falk, J. Keinert, C. Haubelt, J. Teich, and S. S.Bhattacharyya: A Generalized Static Data Flow Clustering Algorithm for MPSoC Scheduling of Multimedia Applications. In Proc. of EMSOFT, pages , Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

43 Results for synthetic DFG g P C Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

44 Results Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

45 Results Holistic approach is always superior to classic DSE outperforms post QSS in the multi-processor case Speedups up to 9.91x for four processor cores of the intel platform 7.2x for two processor cores on the Zynq platform Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

46 Conclusion Holistic compilation flow combines system synthesis and the use of QSSs in a DSE to maximize application throughput for a given target platform DSE explores the clustering of static actors to fine-tune the tradeoff between exploitable degree of parallelism, dynamic scheduling overhead reduction, and compiler code-generation efficiency Novel repair strategy ensures the creation of feasible clusters Each implementation candidate is compiled to and measured on real target hardware (synthesis-in-the-loop) Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

47 Thank you for your attention! Questions? Acknowledgements: This work is supported in part by the German Science Foundation (DFG), HA 4463/3-2 and TE 163/18-2. Schwarzer et al. Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling SCOPES

SysteMoC. Verification and Refinement of Actor-Based Models of Computation

SysteMoC. Verification and Refinement of Actor-Based Models of Computation SysteMoC Verification and Refinement of Actor-Based Models of Computation Joachim Falk, Jens Gladigau, Christian Haubelt, Joachim Keinert, Martin Streubühr, and Jürgen Teich {falk, haubelt}@cs.fau.de Hardware-Software-Co-Design

More information

Combined System Synthesis and Communication Architecture Exploration for MPSoCs

Combined System Synthesis and Communication Architecture Exploration for MPSoCs This is the author s version of the work. The definitive work was published in Proceedings of Design, Automation and Test in Europe (DATE 2009), pp. 472-477, 2009. The work is supported in part by the

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System

More information

Reliable Embedded Multimedia Systems?

Reliable Embedded Multimedia Systems? 2 Overview Reliable Embedded Multimedia Systems? Twan Basten Joint work with Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, and others Embedded Multi-media Analysis of

More information

Design Space Exploration

Design Space Exploration Design Space Exploration SS 2012 Jun.-Prof. Dr. Christian Plessl Custom Computing University of Paderborn Version 1.1.0 2012-06-15 Overview motivation for design space exploration design space exploration

More information

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe

More information

Hardware/Software Codesign

Hardware/Software Codesign Hardware/Software Codesign SS 2016 Prof. Dr. Christian Plessl High-Performance IT Systems group University of Paderborn Version 2.2.0 2016-04-08 how to design a "digital TV set top box" Motivating Example

More information

An FPGA Based Adaptive Viterbi Decoder

An FPGA Based Adaptive Viterbi Decoder An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture

More information

By: Chaitanya Settaluri Devendra Kalia

By: Chaitanya Settaluri Devendra Kalia By: Chaitanya Settaluri Devendra Kalia What is an embedded system? An embedded system Uses a controller to perform some function Is not perceived as a computer Software is used for features and flexibility

More information

Control Performance-Aware System Level Design

Control Performance-Aware System Level Design Control Performance-Aware System Level Design Nina Mühleis, Michael Glaß, and Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Email: {nina.muehleis, glass, teich}@cs.fau.de Abstract

More information

SDSoC: Session 1

SDSoC: Session 1 SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the

More information

Industrial Multicore Software with EMB²

Industrial Multicore Software with EMB² Siemens Industrial Multicore Software with EMB² Dr. Tobias Schüle, Dr. Christian Kern Introduction In 2022, multicore will be everywhere. (IEEE CS) Parallel Patterns Library Apple s Grand Central Dispatch

More information

Efficient Symbolic Multi Objective Design Space Exploration

Efficient Symbolic Multi Objective Design Space Exploration This is the author s version of the work. The definitive work was published in Proceedings of the 3th Asia and South Pacific Design Automation Conference (ASP-DAC 2008), pp. 69-696, 2008. The work is supported

More information

Extensions of Daedalus Todor Stefanov

Extensions of Daedalus Todor Stefanov Extensions of Daedalus Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Overview of Extensions in Daedalus DSE limited to

More information

Automatic Generation of System-Level Virtual Prototypes from Streaming Application Models

Automatic Generation of System-Level Virtual Prototypes from Streaming Application Models Automatic Generation of System-Level Virtual Prototypes from Streaming Application Models Philipp Kutzer, Jens Gladigau, Christian Haubelt, and Jürgen Teich Hardware/Software Co-Design, Department of Computer

More information

SYSTEMCODESIGNER An Automatic ESL Synthesis Approach by Design Space Exploration and Behavioral Synthesis for Streaming Applications

SYSTEMCODESIGNER An Automatic ESL Synthesis Approach by Design Space Exploration and Behavioral Synthesis for Streaming Applications 1 SYSTEMCODESIGNER An Automatic ESL Synthesis Approach by Design Space Exploration and Behavioral Synthesis for Streaming Applications JOACHIM KEINERT, MARTIN STREUBŪHR, THOMAS SCHLICHTER, JOACHIM FALK,

More information

A Parallelizing Compiler for Multicore Systems

A Parallelizing Compiler for Multicore Systems A Parallelizing Compiler for Multicore Systems José M. Andión, Manuel Arenaz, Gabriel Rodríguez and Juan Touriño 17th International Workshop on Software and Compilers for Embedded Systems (SCOPES 2014)

More information

Multi-Objective Distributed Run-time Resource Management for Many-Cores

Multi-Objective Distributed Run-time Resource Management for Many-Cores Multi-Objective Distributed Run-time Resource Management for Many-Cores Stefan Wildermann, Michael Glaß, Jürgen Teich University of Erlangen-Nuremberg, Germany {stefan.wildermann, michael.glass, teich}@fau.de

More information

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs George F. Zaki, William Plishker, Shuvra S. Bhattacharyya University of Maryland, College Park, MD, USA & Frank Fruth Texas Instruments

More information

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA Akash Kumar,, Shakith Fernando, Yajun Ha, Bart Mesman and Henk Corporaal Eindhoven University of Technology, Eindhoven,

More information

HETEROGENEOUS MULTIPROCESSOR MAPPING FOR REAL-TIME STREAMING SYSTEMS

HETEROGENEOUS MULTIPROCESSOR MAPPING FOR REAL-TIME STREAMING SYSTEMS HETEROGENEOUS MULTIPROCESSOR MAPPING FOR REAL-TIME STREAMING SYSTEMS Jing Lin, Akshaya Srivasta, Prof. Andreas Gerstlauer, and Prof. Brian L. Evans Department of Electrical and Computer Engineering The

More information

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder ESE532: System-on-a-Chip Architecture Day 5: September 18, 2017 Dataflow Process Model Today Dataflow Process Model Motivation Issues Abstraction Basic Approach Dataflow variants Motivations/demands for

More information

FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers

FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers Rene Griessl, Meysam Peykanu, Lennart Tigges, Jens Hagemeyer, Mario Porrmann Center of Excellence Cognitive Interaction Technology

More information

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION In Proceedings of the IEEE Workshop on Signal Processing Systems, Quebec City, Canada, October 2012. MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT IMPLEMENTATION Lai-Huei Wang 1, Chung-Ching

More information

A MODEL-BASED INTER-PROCESS RESOURCE SHARING APPROACH FOR HIGH-LEVEL SYNTHESIS OF DATAFLOW GRAPHS

A MODEL-BASED INTER-PROCESS RESOURCE SHARING APPROACH FOR HIGH-LEVEL SYNTHESIS OF DATAFLOW GRAPHS A MODEL-BAED INTER-PROCE REOURCE HARING APPROACH FOR HIGH-LEVEL YNTHEI OF DATAFLOW GRAPH Christian Zebelein 1, Joachim Falk 2, Christian Haubelt 1, Jürgen Teich 2 1 University of Rostock 2 University of

More information

Large scale Imaging on Current Many- Core Platforms

Large scale Imaging on Current Many- Core Platforms Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,

More information

Multi-Variant-based Design Space Exploration for Automotive Embedded Systems

Multi-Variant-based Design Space Exploration for Automotive Embedded Systems Multi-Variant-based Design Space Exploration for Automotive Embedded Systems Sebastian Graf, Michael Glaß, and Jürgen Teich Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany Email: sebastian.graf,

More information

Reliable Dynamic Embedded Data Processing Systems

Reliable Dynamic Embedded Data Processing Systems 2 Embedded Data Processing Systems Reliable Dynamic Embedded Data Processing Systems sony Twan Basten thales Joint work with Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen,

More information

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

Efficient Hardware Acceleration on SoC- FPGA using OpenCL Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA

More information

Mapping of Applications to Multi-Processor Systems

Mapping of Applications to Multi-Processor Systems Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany Marwedel, 2003 Graphics: Alexandra Nolte, Gesine 2011 年 12 月 09 日 These slides use Microsoft clip arts.

More information

Classification of General Data Flow Actors into Known Models of Computation

Classification of General Data Flow Actors into Known Models of Computation Classification of General Data Flow Actors into Known Models of Computation Christian Zebelein, Joachim Falk, Christian Haubelt, and Jürgen Teich Hardware/Software Co-Design University Erlangen-Nuremberg,

More information

Preliminary Discussion

Preliminary Discussion Preliminary Discussion Multi-Core Architectures and Programming Oliver Reiche, Christian Schmitt, Michael Witterauf, Frank Hannig Hardware/Software Co-Design, Friedrich-Alexander University Erlangen-Nürnberg

More information

PetaBricks. With the dawn of the multi-core era, programmers are being challenged to write

PetaBricks. With the dawn of the multi-core era, programmers are being challenged to write PetaBricks Building adaptable and more efficient programs for the multi-core era is now within reach. By Jason Ansel and Cy Chan DOI: 10.1145/1836543.1836554 With the dawn of the multi-core era, programmers

More information

Petri Nets. Petri Nets. Petri Net Example. Systems are specified as a directed bipartite graph. The two kinds of nodes in the graph:

Petri Nets. Petri Nets. Petri Net Example. Systems are specified as a directed bipartite graph. The two kinds of nodes in the graph: System Design&Methodologies Fö - 1 System Design&Methodologies Fö - 2 Petri Nets 1. Basic Petri Net Model 2. Properties and Analysis of Petri Nets 3. Extended Petri Net Models Petri Nets Systems are specified

More information

MODELING OF BLOCK-BASED DSP SYSTEMS

MODELING OF BLOCK-BASED DSP SYSTEMS MODELING OF BLOCK-BASED DSP SYSTEMS Dong-Ik Ko and Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College

More information

Hardware-Software Codesign. 1. Introduction

Hardware-Software Codesign. 1. Introduction Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2

More information

Software Synthesis from Dataflow Models for G and LabVIEW

Software Synthesis from Dataflow Models for G and LabVIEW Software Synthesis from Dataflow Models for G and LabVIEW Hugo A. Andrade Scott Kovner Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 andrade@mail.utexas.edu

More information

Modelling, Analysis and Scheduling with Dataflow Models

Modelling, Analysis and Scheduling with Dataflow Models technische universiteit eindhoven Modelling, Analysis and Scheduling with Dataflow Models Marc Geilen, Bart Theelen, Twan Basten, Sander Stuijk, AmirHossein Ghamarian, Jeroen Voeten Eindhoven University

More information

A System-Level Synthesis Approach from Formal Application Models to Generic Bus-Based MPSoCs

A System-Level Synthesis Approach from Formal Application Models to Generic Bus-Based MPSoCs A System-Level Synthesis Approach from Formal Application s to Generic Bus-Based MPSoCs Jens Gladigau 1, Andreas Gerstlauer 2, Christian Haubelt 1, Martin Streubühr 1, and Jürgen Teich 1 1 Department of

More information

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS Sukumar Reddy Anapalli Krishna Chaithanya Chakilam Timothy W. O Neil Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science The

More information

Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics

Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics Jutta Pirkl, Andreas Becher, Jorge Echavarria, Jürgen Teich, and Stefan Wildermann Hardware/Software Co-Design, Friedrich-Alexander-Universität

More information

WaveScalar. Winter 2006 CSE WaveScalar 1

WaveScalar. Winter 2006 CSE WaveScalar 1 WaveScalar Dataflow machine good at exploiting ILP dataflow parallelism traditional coarser-grain parallelism cheap thread management memory ordering enforced through wave-ordered memory Winter 2006 CSE

More information

Extending Model-Based Design for HW/SW Design and Verification in MPSoCs Jim Tung MathWorks Fellow

Extending Model-Based Design for HW/SW Design and Verification in MPSoCs Jim Tung MathWorks Fellow Extending Model-Based Design for HW/SW Design and Verification in MPSoCs Jim Tung MathWorks Fellow jim@mathworks.com 2014 The MathWorks, Inc. 1 Model-Based Design: From Concept to Production RESEARCH DESIGN

More information

Ohua: Implicit Dataflow Programming for Concurrent Systems

Ohua: Implicit Dataflow Programming for Concurrent Systems Ohua: Implicit Dataflow Programming for Concurrent Systems Sebastian Ertel Compiler Construction Group TU Dresden, Germany Christof Fetzer Systems Engineering Group TU Dresden, Germany Pascal Felber Institut

More information

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Automated Space/Time Scaling of Streaming Task Graphs. Hossein Omidian Supervisor: Guy Lemieux

Automated Space/Time Scaling of Streaming Task Graphs. Hossein Omidian Supervisor: Guy Lemieux Automated Space/Time Scaling of Streaming Task Graphs Hossein Omidian Supervisor: Guy Lemieux 1 Contents Introduction KPN-based HLS Tool for MPPA overlay Experimental Results Future Work Conclusion 2 Introduction

More information

Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,

Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17, Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms SAMOS XIV July 14-17, 2014 1 Outline Introduction + Motivation Design requirements for many-accelerator SoCs Design problems

More information

Optimal Architectures for Massively Parallel Implementation of Hard. Real-time Beamformers

Optimal Architectures for Massively Parallel Implementation of Hard. Real-time Beamformers Optimal Architectures for Massively Parallel Implementation of Hard Real-time Beamformers Final Report Thomas Holme and Karen P. Watkins 8 May 1998 EE 382C Embedded Software Systems Prof. Brian Evans 1

More information

Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution

Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution Raminhas pedro.raminhas@tecnico.ulisboa.pt Stage: 2 nd Year PhD Student Research Area: Dependable and fault-tolerant systems

More information

Fault Tolerance Analysis of Distributed Reconfigurable Systems Using SAT-Based Techniques

Fault Tolerance Analysis of Distributed Reconfigurable Systems Using SAT-Based Techniques In Field-Programmable Logic and Applications by Peter Y. K. Cheung, George A. Constantinides, and Jose T. de Sousa (Eds.). In Lecture Notes in Computer Science (LNCS), Volume 2778, c Springer, Berlin,

More information

LOGIC SYNTHESIS AND VERIFICATION ALGORITHMS. Gary D. Hachtel University of Colorado. Fabio Somenzi University of Colorado.

LOGIC SYNTHESIS AND VERIFICATION ALGORITHMS. Gary D. Hachtel University of Colorado. Fabio Somenzi University of Colorado. LOGIC SYNTHESIS AND VERIFICATION ALGORITHMS by Gary D. Hachtel University of Colorado Fabio Somenzi University of Colorado Springer Contents I Introduction 1 1 Introduction 5 1.1 VLSI: Opportunity and

More information

Partitioning Methods. Outline

Partitioning Methods. Outline Partitioning Methods 1 Outline Introduction to Hardware-Software Codesign Models, Architectures, Languages Partitioning Methods Design Quality Estimation Specification Refinement Co-synthesis Techniques

More information

Fiona A Tool to Analyze Interacting Open Nets

Fiona A Tool to Analyze Interacting Open Nets Fiona A Tool to Analyze Interacting Open Nets Peter Massuthe and Daniela Weinberg Humboldt Universität zu Berlin, Institut für Informatik Unter den Linden 6, 10099 Berlin, Germany {massuthe,weinberg}@informatik.hu-berlin.de

More information

Automatic Graph-based Success Tree Construction and Analysis

Automatic Graph-based Success Tree Construction and Analysis Automatic Graph-based Success Tree Construction and Analysis Hananeh Aliee, Michael Glaß, Dr.-Ing., Rolf Wanka, Dr., Jürgen Teich, Dr.-Ing., Key Words: Success tree, fault tree, transient fault, permanent

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Embedded Systems CS - ES

Embedded Systems CS - ES Embedded Systems - 1 - Synchronous dataflow REVIEW Multiple tokens consumed and produced per firing Synchronous dataflow model takes advantage of this Each edge labeled with number of tokens consumed/produced

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/32963 holds various files of this Leiden University dissertation Author: Zhai, Jiali Teddy Title: Adaptive streaming applications : analysis and implementation

More information

Performance Analysis of Weakly-Consistent Scenario-Aware Dataflow Graphs

Performance Analysis of Weakly-Consistent Scenario-Aware Dataflow Graphs Performance Analysis of Weakly-Consistent Scenario-Aware Dataflow Graphs Marc Geilen 1, Joachim Falk 2, Christian Haubelt 3, Twan Basten 1,4, Bart Theelen 4 and Sander Stuijk 1 1 Electrical Engineering

More information

SystemCoDesigner: Performance Modeling Tutorial

SystemCoDesigner: Performance Modeling Tutorial SystemCoDesigner: Performance Modeling Tutorial Martin Streubühr Sebastian Graf Hardware/Software Co-Design Department of Computer Science University of Erlangen-Nuremberg Germany Version 1.4 Christian

More information

From Temporal Partitioning and Temporal Placement to Algorithmic Skeletons

From Temporal Partitioning and Temporal Placement to Algorithmic Skeletons From Temporal Partitioning and Temporal Placement to Algorithmic Skeletons Florian Dittmann, Franz J. Rammig Heinz Nixdorf Institute University of Paderborn, Germany Motivation Making reconfigurable computing

More information

System Level Modeling and Performance Simulation for Dynamic Reconfigurable Computing Systems in SystemC

System Level Modeling and Performance Simulation for Dynamic Reconfigurable Computing Systems in SystemC In Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen. by Christian Haubelt, Jürgen Teich (Eds.). GI/ITG/GMM-Workshop, Shaker Verlag, Aachen, March 5-7, 2007

More information

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Iuliana Bacivarov, Wolfgang Haid, Kai Huang, Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Efficient i Execution of KPN on MPSoC Efficiency regarding speed-up small memory footprint portability

More information

An Evaluation of Domain-Specific Language Technologies for Code Generation

An Evaluation of Domain-Specific Language Technologies for Code Generation An Evaluation of Domain-Specific Language Technologies for Code Generation Christian Schmitt, Sebastian Kuckuk, Harald Köstler, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, System Simulation,

More information

Dataflow: The Road Less Complex

Dataflow: The Road Less Complex Dataflow: The Road Less Complex Steven Swanson Ken Michelson Andrew Schwerin Mark Oskin University of Washington Sponsored by NSF and Intel Things to keep you up at night (~2016) Opportunities 8 billion

More information

SystemCoDesigner: System Programming Tutorial

SystemCoDesigner: System Programming Tutorial SystemCoDesigner: System Programming Tutorial Martin Streubühr Falk Christian Zebelein Hardware/Software Co-Design Department of Computer Science University of Erlangen-Nuremberg Germany Version 1.2 Christian

More information

SAT, SMT and QBF Solving in a Multi-Core Environment

SAT, SMT and QBF Solving in a Multi-Core Environment SAT, SMT and QBF Solving in a Multi-Core Environment Bernd Becker Tobias Schubert Faculty of Engineering, Albert-Ludwigs-University Freiburg, 79110 Freiburg im Breisgau, Germany {becker schubert}@informatik.uni-freiburg.de

More information

Parameterized Modeling and Scheduling for Dataflow Graphs 1

Parameterized Modeling and Scheduling for Dataflow Graphs 1 Technical Report #UMIACS-TR-99-73, Institute for Advanced Computer Studies, University of Maryland at College Park, December 2, 999 Parameterized Modeling and Scheduling for Dataflow Graphs Bishnupriya

More information

Mapping of Applications to Multi-Processor Systems

Mapping of Applications to Multi-Processor Systems Springer, 2010 Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2014 年 01 月 17 日 These slides use Microsoft clip arts. Microsoft copyright restrictions

More information

First Experiences with Intel Cluster OpenMP

First Experiences with Intel Cluster OpenMP First Experiences with Intel Christian Terboven, Dieter an Mey, Dirk Schmidl, Marcus Wagner surname@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University, Germany IWOMP 2008 May

More information

Department Informatik. EMPYA: An Energy-Aware Middleware Platform for Dynamic Applications. Technical Reports / ISSN

Department Informatik. EMPYA: An Energy-Aware Middleware Platform for Dynamic Applications. Technical Reports / ISSN Department Informatik Technical Reports / ISSN 2191-58 Christopher Eibel, Thao-Nguyen Do, Robert Meißner, Tobias Distler EMPYA: An Energy-Aware Middleware Platform for Dynamic Applications Technical Report

More information

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation Abstract: The power budget is expected to limit the portion of the chip that we can power ON at the upcoming technology nodes. This problem,

More information

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institution of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institution of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institution of Technology, Delhi Lecture - 34 Compilers for Embedded Systems Today, we shall look at the compilers, which

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

EE382N: Embedded System Design and Modeling

EE382N: Embedded System Design and Modeling EE382N: Embedded System Design and Modeling Lecture 11 Mapping & Exploration Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu Lecture 11: Outline

More information

Actor-oriented Modeling and Simulation of Cut-through Communication in Network Controllers

Actor-oriented Modeling and Simulation of Cut-through Communication in Network Controllers In Proceedings of Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV), Kaiserslautern, Germany, March 05-07, 2012 Actor-oriented Modeling and Simulation

More information

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael

More information

GPU Implementation of a Multiobjective Search Algorithm

GPU Implementation of a Multiobjective Search Algorithm Department Informatik Technical Reports / ISSN 29-58 Steffen Limmer, Dietmar Fey, Johannes Jahn GPU Implementation of a Multiobjective Search Algorithm Technical Report CS-2-3 April 2 Please cite as: Steffen

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

Feature Detection Plugins Speed-up by

Feature Detection Plugins Speed-up by Feature Detection Plugins Speed-up by OmpSs@FPGA Nicola Bettin Daniel Jimenez-Gonzalez Xavier Martorell Pierangelo Nichele Alberto Pomella nicola.bettin@vimar.com, pierangelo.nichele@vimar.com, alberto.pomella@vimar.com

More information

Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs

Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs Stavros Tripakis Dai Bui Marc Geilen Bert Rodiers Edward A. Lee Electrical Engineering and Computer Sciences

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults 1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

Optimization of Behavioral IPs in Multi-Processor System-on- Chips

Optimization of Behavioral IPs in Multi-Processor System-on- Chips Optimization of Behavioral IPs in Multi-Processor System-on- Chips Yidi Liu and Benjamin Carrion Schafer # Department of Electronic and Information Engineering b.carrionschafer@polyu.edu.hk # Outline High-Level

More information

Functional modeling style for efficient SW code generation of video codec applications

Functional modeling style for efficient SW code generation of video codec applications Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

Sequential Circuit Test Generation Using Decision Diagram Models

Sequential Circuit Test Generation Using Decision Diagram Models Sequential Circuit Test Generation Using Decision Diagram Models Jaan Raik, Raimund Ubar Department of Computer Engineering Tallinn Technical University, Estonia Abstract A novel approach to testing sequential

More information

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator Embedded Computing Conference 2017 Matthias Frei zhaw InES Patrick Müller Enclustra GmbH 5 September 2017 Agenda Enclustra introduction

More information

Partitioning of computationally intensive tasks between FPGA and CPUs

Partitioning of computationally intensive tasks between FPGA and CPUs Partitioning of computationally intensive tasks between FPGA and CPUs Tobias Welti, MSc (Author) Institute of Embedded Systems Zurich University of Applied Sciences Winterthur, Switzerland tobias.welti@zhaw.ch

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs

Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Anan Bouakaz Pascal Fradet Alain Girault Real-Time and Embedded Technology and Applications Symposium, Vienna April 14th, 2016

More information

GRAph Parallel Actor Language A Programming Language for Parallel Graph Algorithms

GRAph Parallel Actor Language A Programming Language for Parallel Graph Algorithms GRAph Parallel Actor Language A Programming Language for Parallel Graph Algorithms Thesis by Michael delorimier In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy California

More information

Habermann, Philipp Design and Implementation of a High- Throughput CABAC Hardware Accelerator for the HEVC Decoder

Habermann, Philipp Design and Implementation of a High- Throughput CABAC Hardware Accelerator for the HEVC Decoder Habermann, Philipp Design and Implementation of a High- Throughput CABAC Hardware Accelerator for the HEVC Decoder Conference paper Published version This version is available at https://doi.org/10.14279/depositonce-6780

More information

Search Space Reduction for E/E-Architecture Partitioning

Search Space Reduction for E/E-Architecture Partitioning Search Space Reduction for E/E-Architecture Partitioning Andreas Ettner Robert Bosch GmbH, Corporate Reasearch, Robert-Bosch-Campus 1, 71272 Renningen, Germany andreas.ettner@de.bosch.com Abstract. As

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Massively Parallel Phase Field Simulations using HPC Framework walberla

Massively Parallel Phase Field Simulations using HPC Framework walberla Massively Parallel Phase Field Simulations using HPC Framework walberla SIAM CSE 2015, March 15 th 2015 Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer, Harald Köstler and Ulrich

More information