PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming


1 PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming. Maxime Pelcat, Karol Desnos, Julien Heulot, Clément Guy, Jean-François Nezan, Slaheddine Aridhi. EDERC 2014 Conference, Milan, September 11th

2 Transistors/chip: x2 every 18 months. Source: Hardware-dependent Software, Ecker et al.

3 Lines of code/chip: x3.5 every 18 months. Transistors/chip: x2 every 18 months. Source: Hardware-dependent Software, Ecker et al.

4 Lines of code/chip: x3.5 every 18 months. Transistors/chip: x2 every 18 months. Lines of code/day: +25% every 18 months. Source: Hardware-dependent Software, Ecker et al.

5 Lines of code/chip: x3.5 every 18 months. Transistors/chip: x2 every 18 months. Lines of code/day: +25% every 18 months. The widening distance between lines of code/chip and lines of code/day is the Software Productivity Gap. Source: Hardware-dependent Software, Ecker et al.
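
The growth rates on slides 2 to 5 can be turned into a rough estimate of how fast the gap widens. The sketch below assumes that both rates compound per 18-month period and that development effort is simply code size divided by productivity; neither assumption is stated on the slides, and the function name `effort_growth` is illustrative.

```python
# Growth factors per 18-month period, as quoted on the slides.
CODE_PER_CHIP_GROWTH = 3.5   # lines of code per chip: x3.5
PRODUCTIVITY_GROWTH = 1.25   # lines of code per day: +25%


def effort_growth(periods: int) -> float:
    """Factor by which developer-days per chip grow after `periods` periods.

    Assumption: effort = lines of code / lines of code per day, with both
    quantities compounding at the rates above.
    """
    return (CODE_PER_CHIP_GROWTH / PRODUCTIVITY_GROWTH) ** periods


if __name__ == "__main__":
    for n in range(4):
        print(f"after {18 * n} months: effort x{effort_growth(n):.2f}")
```

Under these assumptions the effort per chip grows by a factor of 3.5 / 1.25 = 2.8 every 18 months.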

6 Typical Single DSP Environment. INSTITUT D'ÉLECTRONIQUE ET DE TÉLÉCOMMUNICATIONS DE RENNES. Figure: C/C++ algorithm code and command-line options feed a compiler, which produces a program; a simulator, debugger and profiler support the program running on an OS on the core(s).

7 Multicore DSP Rapid Prototyping. Figure: a functional algorithm model + code, deployment constraints + options, and an architecture model feed rapid prototyping, which produces several programs; a simulator, debugger and profiler support the programs running on an OS on core 1 and an OS on core 2.

8 Reduce Software Productivity Gap. In early design phases (metrics): design parallel algorithms; automatic mapping and scheduling; predictable time and memory; choose the right algorithm and hardware.

9 Reduce Software Productivity Gap. In late design phases (rapid prototyping): automatic multi-core speedup; inter-core communication; guaranteed deadlock-freeness.

10 Reduce Software Productivity Gap. For migration to new hardware: seamless porting to a new architecture; legacy code reusability; portable performance. Dataflow modelling can help.

11 PREESM for C6678. Figure: algorithm dataflow + C code, a scenario, and an architecture model feed PREESM, which generates multiple C programs; the PREESM simulator and the CCS debugger and profiler support the programs running on SYS/BIOS on the C66 cores of the C6678.

12 Algo dataflow: PiSDF. Figure: chain Read -> Filter -> Display; port rates are labelled Size, with one rate labelled 1. K. Desnos, M. Pelcat, J.-F. Nezan, S. S. Bhattacharyya, S. Aridhi, "PiMM: Parameterized and Interfaced Dataflow Meta-Model for MPSoCs Runtime Reconfiguration", SAMOS XIII.

13 PiSDF. Figure: same graph as slide 12, with Size shown as a parameter.

14 PiSDF. Figure: Read and Display are bound to C code; Filter is refined into a subgraph with port labels back, feed, in and out, a parameter N, and a Kernel actor whose port rates are Size/N.

15 PiSDF. Figure: same as slide 14, with Kernel also bound to C code.
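
To make the Size and Size/N rates concrete, the sketch below solves the standard synchronous-dataflow balance equations (firings of producer x production rate = firings of consumer x consumption rate) for a flattened Read -> Kernel -> Display chain. The flattening, the edge rates and the function name `repetition_vector` are a reading of the flattened figure for illustration only; this is not PREESM code.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive firing counts satisfying the balance equations.

    `edges` holds (producer, production_rate, consumer, consumption_rate)
    tuples with positive integer rates; the graph must be connected.
    """
    q = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, prod, dst, cons = edge
            if src in q and dst in q:
                # Both ends known: the edge must be balanced.
                if q[src] * prod != q[dst] * cons:
                    raise ValueError(f"inconsistent rates on {src} -> {dst}")
            elif src in q:
                q[dst] = q[src] * prod / cons
            elif dst in q:
                q[src] = q[dst] * cons / prod
            else:
                continue  # neither end reached yet, retry later
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q.values()), 1)
    counts = {actor: int(f * scale) for actor, f in q.items()}
    common = reduce(gcd, counts.values())
    return {actor: n // common for actor, n in counts.items()}


# Illustrative values: Size = 512 tokens, N = 8 slices.
size, n = 512, 8
firings = repetition_vector(
    ["Read", "Kernel", "Display"],
    [("Read", size, "Kernel", size // n),
     ("Kernel", size // n, "Display", size)],
)
print(firings)
```

With these illustrative values, Kernel fires N = 8 times for each firing of Read and Display, which is the coarse-grain parallelism a scheduler can distribute over cores.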

16 Algo dataflow: PiSDF. The PiSDF MoC is: hierarchical and compositional; statically parameterizable; dynamically reconfigurable. PiSDF fosters: predictability; parallelism; lightweight runtime overhead; developer-friendliness.

17 Archi: System-Level Architecture Model. Contentions are represented as TDMA. Figure: TMS320C6678 with core1 to core8 connected to the MSMC memory (16 GB/s) and to DDR3 (5.3 GB/s).
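
The two bandwidth figures on slide 17 are enough for a first-order transfer-time estimate. The sketch below assumes 1 GB/s = 10^9 bytes/s and models TDMA contention as an equal split of the bandwidth among the cores sharing the memory; the slide does not give the slot policy, so the `sharers` parameter is purely an illustrative assumption.

```python
# Peak bandwidths from the slide, assuming 1 GB/s = 1e9 bytes/s.
BANDWIDTH_BYTES_PER_S = {"MSMC": 16e9, "DDR3": 5.3e9}


def transfer_time_us(num_bytes: int, memory: str, sharers: int = 1) -> float:
    """Estimated transfer time in microseconds.

    Assumption: under TDMA with `sharers` cores accessing the memory,
    each core sees an equal 1/sharers share of the peak bandwidth.
    """
    if sharers < 1:
        raise ValueError("sharers must be at least 1")
    effective = BANDWIDTH_BYTES_PER_S[memory] / sharers
    return num_bytes / effective * 1e6


if __name__ == "__main__":
    for memory in ("MSMC", "DDR3"):
        alone = transfer_time_us(64 * 1024, memory)
        shared = transfer_time_us(64 * 1024, memory, sharers=8)
        print(f"{memory}: {alone:.2f} us alone, {shared:.2f} us with 8 cores")
```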

18 PREESM: Multicore Scheduling. Scheduling based on latency and load balancing.

19 PREESM: Multicore Scheduling. Scheduling based on latency and load balancing.

20 PREESM: Multicore Scheduling. Scheduling based on latency and load balancing. Figure: schedule across core1, core2, core3 and core4.
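
The slides do not describe the scheduling algorithm itself. As a stand-in, the sketch below shows a generic greedy list scheduler: each ready actor is placed on the core where it can start earliest, and ties go to the core that became free earliest. It ignores inter-core communication time, and the names `list_schedule`, `durations` and `deps` are hypothetical; it should not be read as PREESM's scheduler.

```python
def list_schedule(durations, deps, num_cores):
    """Greedy list scheduling of a dependency graph onto identical cores.

    durations: {actor: execution time}
    deps:      {actor: [predecessors]}
    Returns ({actor: (core, start_time)}, makespan).
    """
    core_free = [0] * num_cores   # time at which each core becomes idle
    finish = {}                   # finish time of every scheduled actor
    mapping = {}
    remaining = dict(durations)
    while remaining:
        ready = [a for a in remaining
                 if all(p in finish for p in deps.get(a, ()))]
        if not ready:
            raise ValueError("cyclic dependencies")
        # Priority: longest ready actor first (name breaks ties).
        actor = max(ready, key=lambda a: (remaining[a], a))
        data_ready = max((finish[p] for p in deps.get(actor, ())), default=0)
        # Earliest possible start; ties go to the core free the earliest.
        core = min(range(num_cores),
                   key=lambda c: (max(core_free[c], data_ready), core_free[c]))
        start = max(core_free[core], data_ready)
        finish[actor] = start + remaining.pop(actor)
        core_free[core] = finish[actor]
        mapping[actor] = (core, start)
    return mapping, max(finish.values())


# Illustrative diamond-shaped graph: A feeds B and C, which both feed D.
durations = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
mapping, makespan = list_schedule(durations, deps, num_cores=2)
print(mapping, makespan)
```

On this example the two independent actors B and C run in parallel, so two cores shorten the schedule from 9 to 6 time units.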

21 PREESM: Memory Bounds. Bounding the memory needs of an application graph makes it possible to: evaluate the memory requirements; adjust the size of the architecture memory; assess the optimality of a memory allocation. Figure: available-memory axis starting at 0, divided by the lower bound and the upper bound into insufficient memory (below the lower bound), possible allocated memory (between the bounds) and wasted memory (above the upper bound).
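
The three regions of the figure can be illustrated with a simplified model. In the sketch below each buffer has a size and a known lifetime interval; the upper bound is the sum of all buffer sizes (no reuse at all) and the lower bound is the largest total size of buffers alive at the same time. The interval model and the names `memory_bounds` and `classify` are assumptions made for illustration; the slides do not say how PREESM derives its bounds.

```python
def memory_bounds(buffers):
    """Return (lower, upper) memory bounds.

    buffers: list of (size, start, end) with half-open lifetimes [start, end).
    """
    upper = sum(size for size, _, _ in buffers)
    events = []
    for size, start, end in buffers:
        events.append((start, size))    # buffer becomes alive
        events.append((end, -size))     # buffer is released
    # At equal times, releases (negative deltas) are applied first.
    events.sort()
    live = lower = 0
    for _, delta in events:
        live += delta
        lower = max(lower, live)
    return lower, upper


def classify(available, lower, upper):
    """Place an available memory size in one of the figure's three regions."""
    if available < lower:
        return "insufficient memory"
    if available > upper:
        return "wasted memory"
    return "possible allocated memory"


# Illustrative buffers: (size in bytes, first use, release time).
buffers = [(100, 0, 2), (50, 1, 3), (200, 2, 4)]
lower, upper = memory_bounds(buffers)
print(lower, upper, classify(300, lower, upper))
```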

22 PREESM: Prototype Code Generation. Figure: a graph of actors A, B, C, D and E with outputs o1 and o2, and the corresponding timeline of Actor A, Actor B, Actor D, Actor C and Actor E.

23 PREESM Features. Open-source tool: available on GitHub. Research-oriented tool: new models, optimizations, scheduling. Eclipse-based integrated tool: several plug-ins, metamodels. Extended web tutorials.

24 Other Tools. OpenMP, OpenEM; adding rapid prototyping; MAPS Compiler, Polycore Polymapper, SynDEx; open-source code.

25 PREESM Features

26 Some Results on Stereo Matching. Figure: theoretical speedup and measured performance versus number of cores; allocated memory and lower memory bound versus number of cores.

27 Conclusion. Reduce the software productivity gap: design space exploration; rapid prototyping; extract coarse-grain parallelism; portable performance. PREESM: dataflow modelling can help! Good decisions necessitate extensive information on both computation and data flow.

28 Thanks! M. Pelcat, K. Desnos, J. Heulot, C. Guy, J.-F. Nezan, S. Aridhi, "PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming", EDERC. PREESM Tutorial: 16:00 to 17:00, Room: Oro Plenaria. M. Pelcat, S. Aridhi, J. Piat, J.-F. Nezan, "Physical Layer Multicore Prototyping: A Dataflow-Based Approach for LTE eNodeB", Springer,

Tutorial: PREESM - Dataflow Programming of Multicore DSPs

Tutorial: PREESM - Dataflow Programming of Multicore DSPs Tutorial: PREESM - Dataflow Programming of Multicore DSPs Karol Desnos, Clément Guy, Maxime Pelcat EDERC 2014 Conference, Milan, September 11 th 1 PREESM http://preesm.sourceforge.net/website Eclipse-based

More information

PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming Maxime Pelcat, Karol Desnos, Julien Heulot, Clément Guy, Jean François Nezan, Slaheddine Aridhi To cite this

More information

Dynamic Dataflow. Seminar on embedded systems

Dynamic Dataflow. Seminar on embedded systems Dynamic Dataflow Seminar on embedded systems Dataflow Dataflow programming, Dataflow architecture Dataflow Models of Computation Computation is divided into nodes that can be executed concurrently Dataflow

More information

HW/SW Cyber-System Co-Design and Modeling

HW/SW Cyber-System Co-Design and Modeling HW/SW Cyber-System Co-Design and Modeling Julio OLIVEIRA Karol DESNOS Karol Desnos (IETR) & Julio Oliveira (TNO) 1 Introduction Who are we? Julio de OLIVEIRA Position: TNO - Researcher & innovation scientist

More information

MARTE to PiSDF transformation for data-intensive applications analysis

MARTE to PiSDF transformation for data-intensive applications analysis MARTE to PiSDF transformation for data-intensive applications analysis Manel Ammar, Mouna Baklouti, Maxime Pelcat, Karol Desnos, Mohammed Abid To cite this version: Manel Ammar, Mouna Baklouti, Maxime

More information

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently

More information

Models of Architecture

Models of Architecture Models of Architecture Maxime Pelcat, Karol Desnos, Luca Maggiani, Yanzhou Liu, Julien Heulot, Jean-François Nezan, Shuvra S. Bhattacharyya To cite this version: Maxime Pelcat, Karol Desnos, Luca Maggiani,

More information

A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems

A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems Maxime Pelcat, Jean François Nezan, Jonathan Piat, Jerome Croizer, Slaheddine Aridhi To cite this version:

More information

Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs

Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs Karol Desnos To cite this version: Karol Desnos. Memory Study and Dataflow Representations for

More information

Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors

Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors Author manuscript, published in " " Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors Julien Heulot, Jani

More information

Automatic Generation of S-LAM Descriptions from UML/MARTE for the DSE of Massively Parallel Embedded Systems

Automatic Generation of S-LAM Descriptions from UML/MARTE for the DSE of Massively Parallel Embedded Systems Automatic Generation of S-LAM Descriptions from UML/MARTE for the DSE of Massively Parallel Embedded Systems Manel Ammar, Mouna Baklouti, Maxime Pelcat, Karol Desnos, Mohamed Abid To cite this version:

More information

On Memory Reuse Between Inputs and Outputs of Dataflow Actors

On Memory Reuse Between Inputs and Outputs of Dataflow Actors On Memory Reuse Between Inputs and Outputs of Dataflow Actors Karol Desnos, Maxime Pelcat, Jean François Nezan, Slaheddine Aridhi To cite this version: Karol Desnos, Maxime Pelcat, Jean François Nezan,

More information

Partial Expansion Graphs: Exposing Parallelism and Dynamic Scheduling Opportunities for DSP Applications

Partial Expansion Graphs: Exposing Parallelism and Dynamic Scheduling Opportunities for DSP Applications In Proceedings of the International Conference on Application Specific Systems, Architectures, and Processors, 2012, to appear. Partial Expansion Graphs: Exposing Parallelism and Dynamic Scheduling Opportunities

More information

Relaxed Subgraph Execution Model for the Throughput Evaluation of IBSDF Graphs

Relaxed Subgraph Execution Model for the Throughput Evaluation of IBSDF Graphs Relaxed Subgraph Execution Model for the Throughput Evaluation of ISF raphs Hamza eroui, Karol esnos and Jean-François Nezan IETR, INS Rennes CNRS UMR 664, UE Rennes, France Email: hderoui, kdesnos, jnezan@insa-rennes.fr

More information

Programming Heterogeneous Embedded Systems for IoT

Programming Heterogeneous Embedded Systems for IoT Programming Heterogeneous Embedded Systems for IoT Jeronimo Castrillon Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de Get-together toward a sustainable collaboration in IoT

More information

AVSynDEx Methodology For Fast Prototyping of Multi-C6x DSP Architectures

AVSynDEx Methodology For Fast Prototyping of Multi-C6x DSP Architectures AVSynDEx Methodology For Fast Prototyping of Multi-C6x DSP Architectures Jean-François NEZAN, Virginie FRESSE, Olivier DEFORGES, Michael RAULET CNRS UMR IETR (Institut en Electronique et Télecommunications

More information

Department of Electrical and Computer Engineering, University of Maryland

Department of Electrical and Computer Engineering, University of Maryland Department of Electrical and Computer Engineering, University of Maryland OUTLINE Introduction: Problem statement Background Goals Co-processing units generation: Approach and baseline Multi-Dataflow Composer

More information

Optimization of automatically generated multi-core code for the LTE RACH-PD algorithm

Optimization of automatically generated multi-core code for the LTE RACH-PD algorithm Optimization of automatically generated multi-core code for the LTE RACH-PD algorithm Maxime Pelcat, Slaheddine Aridhi, Jean François Nezan To cite this version: Maxime Pelcat, Slaheddine Aridhi, Jean

More information

High Performance Computing Systems

High Performance Computing Systems High Performance Computing Systems Multikernels Doug Shook Multikernels Two predominant approaches to OS: Full weight kernel Lightweight kernel Why not both? How does implementation affect usage and performance?

More information

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Mickaël Dardaillon Research Intern with NOKIA Technologies January 27th, 2015 2 / 33 What we know

More information

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

More information

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013 Getting the Most out of Advanced ARM IP ARM Technology Symposia November 2013 Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block are now Sub-Systems Cortex

More information

Design and Implementation of Adaptive Signal Processing Systems Using Markov Decision Processes

Design and Implementation of Adaptive Signal Processing Systems Using Markov Decision Processes Design and Implementation of Adaptive Signal Processing Systems Using Markov Decision Processes Lin Li, Adrian E. Sapio, Jiahao Wu, Yanzhou Liu, Kyunghun Lee, Marilyn Wolf, Shuvra S. Bhattacharyya University

More information

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs George F. Zaki, William Plishker, Shuvra S. Bhattacharyya University of Maryland, College Park, MD, USA & Frank Fruth Texas Instruments

More information

Towards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc.

Towards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc. Towards a codelet-based runtime for exascale computing Chris Lauderdale ET International, Inc. What will be covered Slide 2 of 24 Problems & motivation Codelet runtime overview Codelets & complexes Dealing

More information

Dynamic inter-core scheduling in Barrelfish

Dynamic inter-core scheduling in Barrelfish Dynamic inter-core scheduling in Barrelfish. avoiding contention with malleable domains Georgios Varisteas, Mats Brorsson, Karl-Filip Faxén November 25, 2011 Outline Introduction Scheduling & Programming

More information

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Copyright Khronos Group Page 1. Vulkan Overview. June 2015 Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration

More information

MULTICORE DIGITAL SIGNAL PROCESSING

MULTICORE DIGITAL SIGNAL PROCESSING 1 MULTICORE DIGITAL SIGNAL PROCESSING Maxime Pelcat mpelcat@insa-rennes.fr Slides from M. Pelcat, K. Desnos, J.-F. Nezan, D. Ménard, M. Raulet, J. Gorin, F. Pescador Institute 2 IETR INSA Rennes 3 Introduction:

More information

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications in Dataflow Representations of DSP Applications Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

University of Cagliari Multicore Digital Signal Processing

University of Cagliari Multicore Digital Signal Processing Seminar @ University of Cagliari Multicore Digital Signal Processing Maxime Pelcat June 2013 Slides from M. Pelcat, K. Desnos, J-F. Nezan, D. Ménard, M. Raulet, J Gorin 2 Porting Algorithms on Multicore

More information

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Advanced Topics on Heterogeneous System Architectures HSA Foundation! Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2

More information

Intel Parallel Studio XE 2015

Intel Parallel Studio XE 2015 2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:

More information

Orcc: multimedia development made easy

Orcc: multimedia development made easy Orcc: multimedia development made easy Hervé Yviquel, Antoine Lorence, Khaled Jerbi, Gildas Cocherel, Alexandre Sanchez, Mickaël Raulet To cite this version: Hervé Yviquel, Antoine Lorence, Khaled Jerbi,

More information

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015!

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015! Advanced Topics on Heterogeneous System Architectures HSA foundation! Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2

More information

HPC learning using Cloud infrastructure

HPC learning using Cloud infrastructure HPC learning using Cloud infrastructure Florin MANAILA IT Architect florin.manaila@ro.ibm.com Cluj-Napoca 16 March, 2010 Agenda 1. Leveraging Cloud model 2. HPC on Cloud 3. Recent projects - FutureGRID

More information

Conclusions. Introduction. Objectives. Module Topics

Conclusions. Introduction. Objectives. Module Topics Conclusions Introduction In this chapter a number of design support products and services offered by TI to assist you in the development of your DSP system will be described. Objectives As initially stated

More information

From MDD back to basic: Building DRE systems

From MDD back to basic: Building DRE systems From MDD back to basic: Building DRE systems, ENST MDx in software engineering Models are everywhere in engineering, and now in software engineering MD[A, D, E] aims at easing the construction of systems

More information

Experience in Developing Model- Integrated Tools and Technologies for Large-Scale Fault Tolerant Real-Time Embedded Systems

Experience in Developing Model- Integrated Tools and Technologies for Large-Scale Fault Tolerant Real-Time Embedded Systems Institute for Software Integrated Systems Vanderbilt University Experience in Developing Model- Integrated Tools and Technologies for Large-Scale Fault Tolerant Real-Time Embedded Systems Presented by

More information

MODELING OF BLOCK-BASED DSP SYSTEMS

MODELING OF BLOCK-BASED DSP SYSTEMS MODELING OF BLOCK-BASED DSP SYSTEMS Dong-Ik Ko and Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College

More information

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin

More information

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011 MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise June 2011 FREE LUNCH IS OVER, CODES HAVE TO MIGRATE! Many existing legacy codes needs to migrate to

More information

FUJITSU Cloud Service K5 CF Service Functional Overview

FUJITSU Cloud Service K5 CF Service Functional Overview FUJITSU Cloud Service K5 CF Service Functional Overview December 2016 Fujitsu Limited - Unauthorized copying and replication of the contents of this document is prohibited. - The contents of this document

More information

OpenCL: History & Future. November 20, 2017

OpenCL: History & Future. November 20, 2017 Mitglied der Helmholtz-Gemeinschaft OpenCL: History & Future November 20, 2017 OpenCL Portable Heterogeneous Computing 2 APIs and 2 kernel languages C Platform Layer API OpenCL C and C++ kernel language

More information

Ohua: Implicit Dataflow Programming for Concurrent Systems

Ohua: Implicit Dataflow Programming for Concurrent Systems Ohua: Implicit Dataflow Programming for Concurrent Systems Sebastian Ertel Compiler Construction Group TU Dresden, Germany Christof Fetzer Systems Engineering Group TU Dresden, Germany Pascal Felber Institut

More information

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between

More information

Runtime multicore scheduling techniques for dispatching parameterized signal and vision dataflow applications on heterogeneous MPSoCs

Runtime multicore scheduling techniques for dispatching parameterized signal and vision dataflow applications on heterogeneous MPSoCs Runtime multicore scheduling techniques for dispatching parameterized signal and vision dataflow applications on heterogeneous MPSoCs Julien Heulot To cite this version: Julien Heulot. Runtime multicore

More information

Kismet: Parallel Speedup Estimates for Serial Programs

Kismet: Parallel Speedup Estimates for Serial Programs Kismet: Parallel Speedup Estimates for Serial Programs Donghwan Jeon, Saturnino Garcia, Chris Louie, and Michael Bedford Taylor Computer Science and Engineering University of California, San Diego 1 Questions

More information

Introducing the Cray XMT. Petr Konecny May 4 th 2007

Introducing the Cray XMT. Petr Konecny May 4 th 2007 Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions

More information

Eclipse in Embedded. Neha Garg : Prerna Rustagi :

Eclipse in Embedded. Neha Garg : Prerna Rustagi : Eclipse in Embedded Neha Garg :200601138 Prerna Rustagi : 200601203 Flow Of Presentation What is Eclipse? Eclipse Platform Architecture Features in Eclipse(RCP) Exploring Eclipse s ercp Eclipse For Embdded

More information

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms Shuoxin Lin, Yanzhou Liu, William Plishker, Shuvra Bhattacharyya Maryland DSPCAD Research Group Department of

More information

Optimize HPC - Application Efficiency on Many Core Systems

Optimize HPC - Application Efficiency on Many Core Systems Meet the experts Optimize HPC - Application Efficiency on Many Core Systems 2018 Arm Limited Florent Lebeau 27 March 2018 2 2018 Arm Limited Speedup Multithreading and scalability I wrote my program to

More information

Introduction to AADL analysis and modeling with FACE Units of Conformance

Introduction to AADL analysis and modeling with FACE Units of Conformance Introduction to AADL analysis and modeling with FACE Units of Conformance AMRDEC Aviation Applied Technology Directorate Contract Number W911W6-17- D-0003 Delivery Order 3 This material is based upon work

More information

Oracle Developer Studio 12.6

Oracle Developer Studio 12.6 Oracle Developer Studio 12.6 Oracle Developer Studio is the #1 development environment for building C, C++, Fortran and Java applications for Oracle Solaris and Linux operating systems running on premises

More information

Early Models in Silicon with SystemC synthesis

Early Models in Silicon with SystemC synthesis Early Models in Silicon with SystemC synthesis Agility Compiler summary C-based design & synthesis for SystemC Pure, standard compliant SystemC/ C++ Most widely used C-synthesis technology Structural SystemC

More information

On mapping to multi/manycores

On mapping to multi/manycores On mapping to multi/manycores Jeronimo Castrillon Chair for Compiler Construction (CCC) TU Dresden, Germany MULTIPROG HiPEAC Conference Stockholm, 24.01.2017 Mapping for dataflow programming models MEM

More information

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator Embedded Computing Conference 2017 Matthias Frei zhaw InES Patrick Müller Enclustra GmbH 5 September 2017 Agenda Enclustra introduction

More information

CSSE 490 Model-Based Software Engineering: More MBSD. Shawn Bohner Office: Moench Room F212 Phone: (812)

CSSE 490 Model-Based Software Engineering: More MBSD. Shawn Bohner Office: Moench Room F212 Phone: (812) CSSE 490 Model-Based Software Engineering: More MBSD Shawn Bohner Office: Moench Room F212 Phone: (812) 877-8685 Email: bohner@rose-hulman.edu Learning Outcomes: MBE Discipline Relate Model-Based Engineering

More information

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory Roshan Dathathri Thejas Ramashekar Chandan Reddy Uday Bondhugula Department of Computer Science and Automation

More information

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems 1 Presented by Hadeel Alabandi Introduction and Motivation 2 A serious issue to the effective utilization

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

Industrial Multicore Software with EMB²

Industrial Multicore Software with EMB² Siemens Industrial Multicore Software with EMB² Dr. Tobias Schüle, Dr. Christian Kern Introduction In 2022, multicore will be everywhere. (IEEE CS) Parallel Patterns Library Apple s Grand Central Dispatch

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
