Dataflow programming for heterogeneous computing systems

Size: px
Start display at page:

Download "Dataflow programming for heterogeneous computing systems"

Transcription

1 Dataflow programming for heterogeneous computing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction TU Dresden Tutorial: Algorithmic specification, tools and algorithms for programming heterogeneous platforms. PACT Conference. San Francisco, October 19 th 2015

2 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 2 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

3 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 3 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

4 Heterogeneous computing systems Today s heterogeneity Desktop/HPC: GPGPU, GP + ACC Embedded: RISCs, DSPs, ASIPs, è Resources: different performance/energy characteristics MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Communication support HW queues Network Processor Packet DMA Tomorrow s heterogeneity: Emerging technologies Heterogeneity beyond performance and energy Reliability/error tolerance Computation model 4 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

5 Heterogeneity: Cfaed Vision German Excellence Cluster: Goal to explore new technologies for electronic information processing which overcome the limits of CMOS technology Information Processing Systems research to handle heterogeneity Devices & Circuits Materials & Functions CMOS (industry focus) H HAEC: Highly Adaptive Energy-Efficient Computing A Silicon Nanowires B Carbon F Orchestration C Organic G Resilience D Biomolecular Assembly E Chemical Biology to inspire solutions I Biological Systems Material research for post-cmos technologies 5 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

6 Heterogeneity: Example SiNW SiNW: Silicon Nanowires Multi-gate devices with less performance penalty Reconfigurable P/N functionality Possibilities New micro architectures New pipeline structures New field programmable devices [Trommer15] 6 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

7 Heterogeneity: Example chemical processing Lab-on-Chips: for sensing, analysis and test Also for computing? Different kinds of transistors Oscillators and other components Fundamentally different way of computation [Voigt14] 7 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

8 Heterogeneity: Example DNA origami DNA origami: Self-assembled 2D/3D structures made of DNA strands Use structures to build advance electronic devices Example: Plasmonic waveguides Courtesy: Thorsten Lars-Schmidt 8 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

9 Consequence of heterogeneity MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Communication support HW queues Network Processor Packet DMA Already difficult to program them today, what about tomorrow? Need for models and abstraction 9 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

10 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 10 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

11 Models: Introduction Von Neumann model makes things complicated Sharing state Data races Control unit Inst. reg Prog. counter Memory & I/O Processing unit Reg. file ALU Central Processing Unit (CPU) Task graphs: A simple parallel programming model Intel TBB,.NET Task parallel library (TPL), OpenMP Tasks, Runtime and data management, e.g., StarPU [Augonnet10] 11 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

12 Directed acyclic task graphs Very well studied, see for example [Kwok99] Difficult problem for heterogeneous systems RISC Interconnect DSP Processor Shared Memory 1 2 4? 1 3 Processor 2 Time Time 12 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

13 Dataflow models Also a graph representation: Nodes & edges are called actors & channels Implicit repetition, common in streaming, signal processing applications Communication: only through channels Multiple flavors of models: Rules that determine when an actor fires A graph models multiple possible executions: Processor cf. LabVIEW models Time 13 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

14 Dataflow models (2) Also a graph representation: Nodes & edges are called actors & channels Implicit repetition, common in streaming, signal processing applications Communication: only through channels Multiple flavors of models: Rules that determine when an actor fires A graph models multiple possible executions: Processor What now? Time 14 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

15 Dataflow models (3) Synchronous Dataflow (SDF): every actor has a fixed behavior 2 e a1 a2 a3 e1 e3 6 2 e2 1 a3 always writes 1 token to e4 Cyclo-Static Dataflow (CSDF): every actor has a set of fixed behaviors {1,0,0} e2 1 1 e1 {0,1,0} 1 a1 a2 a3 {0,2} e3 a1: writes 1 token, then 0, then 0 to e2 15 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

16 Dynamic models and Kahn Process Networks Dynamic dataflow: set of firing rules per actor i1 a1 a2 a3 i2 Kahn process networks (KPN): nodes are now called processes e4 p1 p2 p3 e1 e3 p1: writes any amount of tokens to e2 at any time e2 16 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

17 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 17 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

18 Analysis and synthesis MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 Application Architecture model L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Non-functional specification Communication support HW queues Network Processor Packet DMA cf. Silexica s tool flows Analysis Synthesis Code generation Property models (timing, energy, error, ) PNargs_ifft_r.ID = 6U; PNargs_ifft_r.PNchannel_freq_coef = fi PNargs_ifft_r.PNnum_freq_coef = 0U; PNargs_ifft_r.PNchannel_time_coef = si PNargs_ifft_r.channel = 1; sink_left = IPCllmrf_open(3, 1, 1); sink_right = IPCllmrf_open(7, 1, 1); PNargs_sink.ID = 7U; PNargs_sink.PNchannel_in_left = sink_l PNargs_sink.PNnum_in_left = 0U; PNargs_sink.PNchannel_in_right = sink_ PNargs_sink.PNnum_in_right = 0U; taskparams.arg0 = (xdc_uarg)&pnargs_s taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_f taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn ft_templ, &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_i taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn fft_templ, &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_s taskparams.priority = 1; 18 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

19 Example for SDFs Compute topology matrix, and solve system of equations Solution: repetition vector serve to unroll the graph [1 3 2] Perform mapping and scheduling on the resulting directed acyclic graph (DAG) 19 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

20 Example for KPNs: Static & dynamic analysis Need to understand process interactions Unrolled CFG for read(&c1); f3(...); write(&c3); f4(...); write(&c3); [Castrillon14] f1(...); write(&c2); read(&c2); f5(...); write(&c4); read(&c0); f2(...); write(&c1); read(&c3); f6(...); read(&c3); f7(...); read(&c4); R4 R3 R2 R4 R3 R2 R1 Unrolled CFG for f2 f1 f5 R1 f1 f1 f1 f2 f3 f1 f6 f7 f7 f7 f7 f4 f1 f5 f3 f6 f7 f7 f7 f7 f4 f5 f5 f5 f5... t2 t t t J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

21 KPN & DDF: Tracing Dynamic analysis based on execution traces [Castrillon10/13, Brunet15, Singh15] DAG representation for further analysis and synthesis 21 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

22 Models from functional specification Inspect functional specification of actors/processes (cf. 2 nd talk) Instrumentation, emulation, cost models, For timing, energy, errors, Example [SAMOS14] Profiling: Execution counts, branch stats 22 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

23 Models based on algorithmic descriptions When functional specification is not meant for synthesis Common/required in heterogeneous systems (special components) Need to match algorithms to hardware components [Castrillon10b, Castrillon11] Algorithm library Plain code Plain code Platform model + characterization of special components N: Algorithmic actors F: Existing implementation in target platform 23 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

24 Multiple-applications Use traces and mappings to reason about platform sharing 24 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

25 Multiple-applications (2) Quickly discard bad multi-application configurations by observing the platform utilization profiles 25 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

26 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 26 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

27 Summary Need programming models (and HW/SW stacks) to handle heterogeneity Even more dramatic in the post-cmos era Dataflow models Natural way to describe some applications Amenable to analysis and synthesis for parallel execution Discuss different kinds of models and required analysis Need models of hardware for synthesis 27 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

28 Acknowledgements German Cluster of Excellence: Center for Advancing Electronics Dresden ( Collaborative research center (CRC): Highly Adaptive Energy-Efficient Computing (HAEC) CRC 912: Highly Adaptive Energy- Efficient Computing Silexica Software Solutions GmbH 28 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

29 References [Trommer15] Trommer, et al. "Functionality-Enhanced Logic Gate Design Enabled by Symmetrical Reconfigurable Silicon Nanowire Transistors, IEEE Trans on in Nanotechnology, pp , July 2015 [Voigt14] Andreas Voigt, et al. Towards Computation with Microchemomechanical Systems, In International Journal of Foundations of Computer Science, World Scientific, volume 25, 2014 [Augonnet10] C. Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience 23.2 (2011), pp [Kwok99] Y.-K. Kwok, et al, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Comput. Surv., ACM Press, 1999, 31, [Castrillon14] J. Castrillon, et al, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap, Springer, 2014 [Castrillon10] J. Castrillon, et al, Trace-based KPN composability analysis for mapping simultaneous applications to MPSoC platforms, DATE 10, pp , 2010 [Castrillon13] J. Castrillon, et al, MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs, IEEE Trans on Industrial Informatics, vol. 9, no. 1, pp , 2013 [Brunet15] Brunet, Simone C. Analysis and optimization of dynamic dataflow programs. Diss. ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE, 2015 [Singh15] Singh, Amit, et al. "Resource and Throughput Aware Execution Trace Analysis for Efficient Run-time Mapping on MPSoCs., TCAD 2015 [SAMOS14] J.F. Eusse, et al, "Pre-architectural performance estimation for ASIP design based on abstract processor models," Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 pp , 2014 [Castrillon10b] J. Castrillon, et al, Component-based waveform development: The nucleus tool flow for efficient and portable SDR, Wireless Innovation Conference and Product Exposition (SDR), 2010 [Castrillon11] J. Castrillon, et al, Component-based waveform development: The nucleus tool flow for efficient and portable software defined radio, Analog Integrated Circuits and Signal Processing, vol. 69, no. 2 3, pp , J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Programming Heterogeneous Embedded Systems for IoT

Programming Heterogeneous Embedded Systems for IoT Programming Heterogeneous Embedded Systems for IoT Jeronimo Castrillon Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de Get-together toward a sustainable collaboration in IoT

More information

Compiling for deeply embedded and heterogeneous signal processing systems

Compiling for deeply embedded and heterogeneous signal processing systems Compiling for deeply embedded and heterogeneous signal processing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction (CCC) 5G Summit, Dresden, Germany September 29, 2016 Multi-Processor/core

More information

On mapping to multi/manycores

On mapping to multi/manycores On mapping to multi/manycores Jeronimo Castrillon Chair for Compiler Construction (CCC) TU Dresden, Germany MULTIPROG HiPEAC Conference Stockholm, 24.01.2017 Mapping for dataflow programming models MEM

More information

Analysis and software synthesis of KPN applications

Analysis and software synthesis of KPN applications Analysis and software synthesis of KPN applications Jeronimo Castrillon Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de DREAMS Seminar UC Berkeley, CA. October 22 nd 2015 Acknowledgements

More information

Easy Multicore Programming using MAPS

Easy Multicore Programming using MAPS Easy Multicore Programming using MAPS Jeronimo Castrillon, Maximilian Odendahl Multicore Challenge Conference 2012 September 24 th, 2012 Institute for Communication Technologies and Embedded Systems Outline

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 8 HW/SW Co-Design Sources: Prof. Margarida Jacome, UT Austin Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu

More information

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING 1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation

More information

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs George F. Zaki, William Plishker, Shuvra S. Bhattacharyya University of Maryland, College Park, MD, USA & Frank Fruth Texas Instruments

More information

Hardware-Software Codesign. 1. Introduction

Hardware-Software Codesign. 1. Introduction Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2

More information

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS Sukumar Reddy Anapalli Krishna Chaithanya Chakilam Timothy W. O Neil Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science The

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems

Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems Rainer Leupers, Miguel Angel Aguilar, Jeronimo Castrillon, and Weihua Sheng Abstract The increasing demands of modern embedded

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Design methodology for multi processor systems design on regular platforms

Design methodology for multi processor systems design on regular platforms Design methodology for multi processor systems design on regular platforms Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline

More information

Applying Models of Computation to OpenCL Pipes for FPGA Computing. Nachiket Kapre + Hiren Patel

Applying Models of Computation to OpenCL Pipes for FPGA Computing. Nachiket Kapre + Hiren Patel Applying Models of Computation to OpenCL Pipes for FPGA Computing Nachiket Kapre + Hiren Patel nachiket@uwaterloo.ca Outline Models of Computation and Parallelism OpenCL code samples Synchronous Dataflow

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Hardware Software Codesign of Embedded Systems

Hardware Software Codesign of Embedded Systems Hardware Software Codesign of Embedded Systems Rabi Mahapatra Texas A&M University Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on Codesign of Embedded System

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor

More information

Modelling, Analysis and Scheduling with Dataflow Models

Modelling, Analysis and Scheduling with Dataflow Models technische universiteit eindhoven Modelling, Analysis and Scheduling with Dataflow Models Marc Geilen, Bart Theelen, Twan Basten, Sander Stuijk, AmirHossein Ghamarian, Jeroen Voeten Eindhoven University

More information

Adaptive Query Processing on Prefix Trees Wolfgang Lehner

Adaptive Query Processing on Prefix Trees Wolfgang Lehner Adaptive Query Processing on Prefix Trees Wolfgang Lehner Fachgruppentreffen, 22.11.2012 TU München Prof. Dr.-Ing. Wolfgang Lehner > Challenges for Database Systems Three things are important in the database

More information

Hardware/Software Co-design

Hardware/Software Co-design Hardware/Software Co-design Zebo Peng, Department of Computer and Information Science (IDA) Linköping University Course page: http://www.ida.liu.se/~petel/codesign/ 1 of 52 Lecture 1/2: Outline : an Introduction

More information

4. Hardware Platform: Real-Time Requirements

4. Hardware Platform: Real-Time Requirements 4. Hardware Platform: Real-Time Requirements Contents: 4.1 Evolution of Microprocessor Architecture 4.2 Performance-Increasing Concepts 4.3 Influences on System Architecture 4.4 A Real-Time Hardware Architecture

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization Dataflow Lecture: SDF, Kahn Process Networks Stavros Tripakis University of California, Berkeley Stavros Tripakis: EECS

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

Extending the Power of FPGAs

Extending the Power of FPGAs Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of FPGAs and FPGA Programming IP-Centric Design with

More information

Contents Part I Basic Concepts The Nature of Hardware and Software Data Flow Modeling and Transformation

Contents Part I Basic Concepts The Nature of Hardware and Software Data Flow Modeling and Transformation Contents Part I Basic Concepts 1 The Nature of Hardware and Software... 3 1.1 Introducing Hardware/Software Codesign... 3 1.1.1 Hardware... 3 1.1.2 Software... 5 1.1.3 Hardware and Software... 7 1.1.4

More information

Dataflow: The Road Less Complex

Dataflow: The Road Less Complex Dataflow: The Road Less Complex Steven Swanson Ken Michelson Andrew Schwerin Mark Oskin University of Washington Sponsored by NSF and Intel Things to keep you up at night (~2016) Opportunities 8 billion

More information

Design Space Exploration for Memory Subsystems of VLIW Architectures

Design Space Exploration for Memory Subsystems of VLIW Architectures E University of Paderborn Dr.-Ing. Mario Porrmann Design Space Exploration for Memory Subsystems of VLIW Architectures Thorsten Jungeblut 1, Gregor Sievers, Mario Porrmann 1, Ulrich Rückert 2 1 System

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Reconfigurable Cell Array for DSP Applications

Reconfigurable Cell Array for DSP Applications Outline econfigurable Cell Array for DSP Applications Chenxin Zhang Department of Electrical and Information Technology Lund University, Sweden econfigurable computing Coarse-grained reconfigurable cell

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 0 J A N L E M E I R E Course Objectives 2 Intel 4004 1971 2.3K trans. Intel Core 2 Duo 2006 291M trans. Where have all the transistors gone? Turing Machine

More information

04 - DSP Architecture and Microarchitecture

04 - DSP Architecture and Microarchitecture September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:

More information

3.1 Description of Microprocessor. 3.2 History of Microprocessor

3.1 Description of Microprocessor. 3.2 History of Microprocessor 3.0 MAIN CONTENT 3.1 Description of Microprocessor The brain or engine of the PC is the processor (sometimes called microprocessor), or central processing unit (CPU). The CPU performs the system s calculating

More information

Embedded Systems CS - ES

Embedded Systems CS - ES Embedded Systems - 1 - Synchronous dataflow REVIEW Multiple tokens consumed and produced per firing Synchronous dataflow model takes advantage of this Each edge labeled with number of tokens consumed/produced

More information

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Mickaël Dardaillon Research Intern with NOKIA Technologies January 27th, 2015 2 / 33 What we know

More information

Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case

Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV - 2014 July 14 th - Samos Island (Greece) Carlo Sau, Luigi Raffo DIEE Università degli Studi

More information

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem. The VLSI Interconnect Challenge Avinoam Kolodny Electrical Engineering Department Technion Israel Institute of Technology VLSI Challenges System complexity Performance Tolerance to digital noise and faults

More information

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs PhD work of Mickael Dardaillon Mickaël Dardaillon, Kevin Marquet (Citi), Tanguy Risset (Citi), Jérôme Martin

More information

MoCC - Models of Computation and Communication SystemC as an Heterogeneous System Specification Language

MoCC - Models of Computation and Communication SystemC as an Heterogeneous System Specification Language SystemC as an Heterogeneous System Specification Language Eugenio Villar Fernando Herrera University of Cantabria Challenges Massive concurrency Complexity PCB MPSoC with NoC Nanoelectronics Challenges

More information

The University of Texas at Austin

The University of Texas at Austin EE382 (20): Computer Architecture - Parallelism and Locality Lecture 4 Parallelism in Hardware Mattan Erez The University of Texas at Austin EE38(20) (c) Mattan Erez 1 Outline 2 Principles of parallel

More information

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design Ahmed Amine JERRAYA EPFL November 2005 TIMA Laboratory 46 Avenue Felix Viallet 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr

More information

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009 Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems July 2009 Model Requirements in a Virtual Platform Control initialization, breakpoints, etc Visibility PV registers, memories, profiling

More information

Multi processor systems with configurable hardware acceleration

Multi processor systems with configurable hardware acceleration Multi processor systems with configurable hardware acceleration Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline Motivations

More information

Key technologies for many core architectures

Key technologies for many core architectures Key technologies for many core architectures Thierry Collette CEA, LIST thierry.collette@c ea.fr 1 Embedded computing Silicon area offers perfo rmance Applications x 40 from 90 to 45 ns Computing performance

More information

A Process Model suitable for defining and programming MpSoCs

A Process Model suitable for defining and programming MpSoCs A Process Model suitable for defining and programming MpSoCs MpSoC-Workshop at Rheinfels, 29-30.6.2010 F. Mayer-Lindenberg, TU Hamburg-Harburg 1. Motivation 2. The Process Model 3. Mapping to MpSoC 4.

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference

More information

Computer Architecture

Computer Architecture Computer Architecture Lecture 2: Fundamental Concepts and ISA Dr. Ahmed Sallam Based on original slides by Prof. Onur Mutlu What Do I Expect From You? Chance favors the prepared mind. (Louis Pasteur) كل

More information

Multi-Level Computing Architectures (MLCA)

Multi-Level Computing Architectures (MLCA) Multi-Level Computing Architectures (MLCA) Faraydon Karim ST US-Fellow Emerging Architectures Advanced System Technology STMicroelectronics Outline MLCA What s the problem Traditional solutions: VLIW,

More information

SpiNNaker - a million core ARM-powered neural HPC

SpiNNaker - a million core ARM-powered neural HPC The Advanced Processor Technologies Group SpiNNaker - a million core ARM-powered neural HPC Cameron Patterson cameron.patterson@cs.man.ac.uk School of Computer Science, The University of Manchester, UK

More information

Node Prefetch Prediction in Dataflow Graphs

Node Prefetch Prediction in Dataflow Graphs Node Prefetch Prediction in Dataflow Graphs Newton G. Petersen Martin R. Wojcik The Department of Electrical and Computer Engineering The University of Texas at Austin newton.petersen@ni.com mrw325@yahoo.com

More information

Extending the Power of FPGAs to Software Developers:

Extending the Power of FPGAs to Software Developers: Extending the Power of FPGAs to Software Developers: The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Group Page 1 Agenda The Evolution of FPGAs and FPGA Programming

More information

Communication Systems Design in Practice

Communication Systems Design in Practice Communication Systems Design in Practice Jacob Kornerup, Ph.D. LabVIEW R&D National Instruments '87 '88 '89 '90 '91 '92 '93 '94 '95 '96 '97 '98 '99 '00 '01 '02 03 04 '05 '06 '07 '08 '09 '10 '11 '12 '13

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

ClearSpeed Visual Profiler

ClearSpeed Visual Profiler ClearSpeed Visual Profiler Copyright 2007 ClearSpeed Technology plc. All rights reserved. 12 November 2007 www.clearspeed.com 1 Profiling Application Code Why use a profiler? Program analysis tools are

More information

Portland State University ECE 588/688. Dataflow Architectures

Portland State University ECE 588/688. Dataflow Architectures Portland State University ECE 588/688 Dataflow Architectures Copyright by Alaa Alameldeen and Haitham Akkary 2018 Hazards in von Neumann Architectures Pipeline hazards limit performance Structural hazards

More information

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder ESE532: System-on-a-Chip Architecture Day 5: September 18, 2017 Dataflow Process Model Today Dataflow Process Model Motivation Issues Abstraction Basic Approach Dataflow variants Motivations/demands for

More information

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Małgorzata Michalska, Endri Bezati, Simone Casale-Brunet, Marco Mattavelli EPFL

More information

Integrating MRPSOC with multigrain parallelism for improvement of performance

Integrating MRPSOC with multigrain parallelism for improvement of performance Integrating MRPSOC with multigrain parallelism for improvement of performance 1 Swathi S T, 2 Kavitha V 1 PG Student [VLSI], Dept. of ECE, CMRIT, Bangalore, Karnataka, India 2 Ph.D Scholar, Jain University,

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Published in: Proceedings of the 2010 International Conference on Field-programmable

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.26A ACES Email: pingali@cs.utexas.edu TA: Xin Sui Email: xin@cs.utexas.edu University of Texas, Austin Fall

More information

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms Shuoxin Lin, Yanzhou Liu, William Plishker, Shuvra Bhattacharyya Maryland DSPCAD Research Group Department of

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Hardware-Software Codesign

Hardware-Software Codesign Hardware-Software Codesign 8. Performance Estimation Lothar Thiele 8-1 System Design specification system synthesis estimation -compilation intellectual prop. code instruction set HW-synthesis intellectual

More information

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.

More information

HW/SW Cyber-System Co-Design and Modeling

HW/SW Cyber-System Co-Design and Modeling HW/SW Cyber-System Co-Design and Modeling Julio OLIVEIRA Karol DESNOS Karol Desnos (IETR) & Julio Oliveira (TNO) 1 Introduction Who are we? Julio de OLIVEIRA Position: TNO - Researcher & innovation scientist

More information

Hardware Software Codesign of Embedded System

Hardware Software Codesign of Embedded System Hardware Software Codesign of Embedded System CPSC489-501 Rabi Mahapatra Mahapatra - Texas A&M - Fall 00 1 Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Venezia: a Scalable Multicore Subsystem for Multimedia Applications Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

More information

Computer Hardware Requirements for Real-Time Applications

Computer Hardware Requirements for Real-Time Applications Lecture (4) Computer Hardware Requirements for Real-Time Applications Prof. Kasim M. Al-Aubidy Computer Engineering Department Philadelphia University Real-Time Systems, Prof. Kasim Al-Aubidy 1 Lecture

More information

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design 1 Reconfigurable Computing Platforms 2 The Von Neumann Computer Principle In 1945, the

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University Dataflow Languages Languages for Embedded Systems Prof. Stephen A. Edwards Columbia University March 2009 Philosophy of Dataflow Languages Drastically different way of looking at computation Von Neumann

More information

MPSOC Architectures for Computing for Imaging. Thierry Collette, Ph.D. CEA LIST Head of Architecture and Design Department

MPSOC Architectures for Computing for Imaging. Thierry Collette, Ph.D. CEA LIST Head of Architecture and Design Department MPSOC Architectures for Computing for Imaging Thierry Collette, Ph.D. CEA LIST Head of Architecture and Design Department thierry.collette@cea.fr Performance Embedded Computing: a New Area www.tilera.com

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

VLSI Design Automation. Maurizio Palesi

VLSI Design Automation. Maurizio Palesi VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 Outline Technology trends VLSI Design flow (an overview) 3 IC Products Processors CPU, DSP, Controllers Memory chips

More information

Dynamic Dataflow. Seminar on embedded systems

Dynamic Dataflow. Seminar on embedded systems Dynamic Dataflow Seminar on embedded systems Dataflow Dataflow programming, Dataflow architecture Dataflow Models of Computation Computation is divided into nodes that can be executed concurrently Dataflow

More information

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

More information

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale

More information

Modern Processors. RISC Architectures

Modern Processors. RISC Architectures Modern Processors RISC Architectures Figures used from: Manolis Katevenis, RISC Architectures, Ch. 20 in Zomaya, A.Y.H. (ed), Parallel and Distributed Computing Handbook, McGraw-Hill, 1996 RISC Characteristics

More information

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song.   HUAWEI TECHNOLOGIES Co., Ltd. IQ for DNA Interactive Query for Dynamic Network Analytics Haoyu Song www.huawei.com Motivation Service Provider s pain point Lack of real-time and full visibility of networks, so the network monitoring

More information

Moore s Law. Computer architect goal Software developer assumption

Moore s Law. Computer architect goal Software developer assumption Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

Kalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013

Kalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013 Kalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013 The End of Dennard MOSFET Scaling Theory 2013 Kalray SA All Rights Reserved MPSoC

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

A Memory System Design Framework: Creating Smart Memories

A Memory System Design Framework: Creating Smart Memories A Memory System Design Framework: Creating Smart Memories Amin Firoozshahian, Alex Solomatnikov Hicamp Systems Inc. Ofer Shacham, Zain Asgar, http://www.c2s2.org Stephen Richardson, Christos Kozyrakis,

More information

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics Veloce2 the Enterprise Verification Platform Simon Chen Emulation Business Development Director Mentor Graphics Agenda Emulation Use Modes Veloce Overview ARM case study Conclusion 2 Veloce Emulation Use

More information

System on Chip (SoC) Design

System on Chip (SoC) Design System on Chip (SoC) Design Moore s Law and Technology Scaling the performance of an IC, including the number components on it, doubles every 18-24 months with the same chip price... - Gordon Moore - 1960

More information

Scientific Computing on GPUs: GPU Architecture Overview

Scientific Computing on GPUs: GPU Architecture Overview Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11

More information

CUDA GPGPU Workshop 2012

CUDA GPGPU Workshop 2012 CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

LabVIEW Based Embedded Design [First Report]

LabVIEW Based Embedded Design [First Report] LabVIEW Based Embedded Design [First Report] Sadia Malik Ram Rajagopal Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 malik@ece.utexas.edu ram.rajagopal@ni.com

More information

The Next Evolution of Instrumentation for Microwave Test. Jin Bains RF R&D Director National Instruments

The Next Evolution of Instrumentation for Microwave Test. Jin Bains RF R&D Director National Instruments The Next Evolution of Instrumentation for Microwave Test Jin Bains RF R&D Director National Instruments The Complexity of Testing Systems of Systems www.airbus.com www.esa.int EISCAT_3D A European 3D Imaging

More information