Dataflow programming for heterogeneous computing systems
Transcription
1 Dataflow programming for heterogeneous computing systems. Jeronimo Castrillon, cfaed Chair for Compiler Construction, TU Dresden. Tutorial: Algorithmic specification, tools and algorithms for programming heterogeneous platforms. PACT Conference, San Francisco, October 19th, 2015
2 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 2 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015
4 Heterogeneous computing systems. Today's heterogeneity: Desktop/HPC: GPGPU, GP + ACC. Embedded: RISCs, DSPs, ASIPs, ... → Resources with different performance/energy characteristics. [Figure: example heterogeneous platform with a memory subsystem (DMAs, semaphores), a PMU, four A15 cores with L1 caches and a shared L2, a VLIW DSP with L1/L2, a NoC, peripherals, communication support (HW queues) and a network processor with packet DMA.] Tomorrow's heterogeneity: emerging technologies. Heterogeneity beyond performance and energy: reliability/error tolerance, computation model.
5 Heterogeneity: cfaed vision. German Excellence Cluster: the goal is to explore new technologies for electronic information processing that overcome the limits of CMOS technology. [Figure: cfaed research paths across three layers (Materials & Functions, Devices & Circuits, Information Processing Systems), next to CMOS (industry focus). Material research for post-CMOS technologies: A Silicon Nanowires, B Carbon, C Organic, D Biomolecular Assembly, E Chemical. Research to handle heterogeneity: F Orchestration, G Resilience, H HAEC: Highly Adaptive Energy-Efficient Computing. Biology to inspire solutions: I Biological Systems.]
6 Heterogeneity: Example SiNW. SiNW: silicon nanowires. Multi-gate devices with less performance penalty. Reconfigurable P/N functionality. Possibilities: new microarchitectures, new pipeline structures, new field-programmable devices [Trommer15]
7 Heterogeneity: Example chemical processing. Lab-on-chips: for sensing, analysis and test. Also for computing? Different kinds of transistors, oscillators and other components. Fundamentally different way of computation [Voigt14]
8 Heterogeneity: Example DNA origami. DNA origami: self-assembled 2D/3D structures made of DNA strands. Use these structures to build advanced electronic devices. Example: plasmonic waveguides. Courtesy: Thorsten Lars-Schmidt
9 Consequence of heterogeneity. [Figure: heterogeneous platform block diagram (memory subsystem with DMAs and semaphores, PMU, A15 cores with L1/L2 caches, VLIW DSP, NoC, peripherals, HW queues, network processor with packet DMA).] Such platforms are already difficult to program today; what about tomorrow? Need for models and abstraction.
10 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary
11 Models: Introduction. The von Neumann model makes things complicated: shared state, data races. [Figure: von Neumann architecture with a central processing unit (CPU) made of a control unit (instruction register, program counter) and a processing unit (register file, ALU), connected to memory and I/O.] Task graphs: a simple parallel programming model. Intel TBB, .NET Task Parallel Library (TPL), OpenMP tasks. Runtime and data management, e.g., StarPU [Augonnet10]
12 Directed acyclic task graphs. Very well studied, see for example [Kwok99]. Difficult problem for heterogeneous systems. [Figure: a numbered task graph mapped onto a platform with a RISC and a DSP processor connected through an interconnect to shared memory, with per-processor schedules over time; the placement of one task is marked with a question mark.]
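To make the difficulty concrete, here is a minimal sketch of a greedy list scheduler that places each task on the processor where it would finish earliest. It is only an illustration under stated assumptions: the four-task graph, the per-processor costs and the processor names RISC and DSP are hypothetical, communication costs are ignored, and the heuristic is not taken from [Kwok99].

```python
def list_schedule(tasks, deps, cost):
    """Greedy list scheduling: place each task (given in topological order)
    on the processor where it finishes earliest. Communication is ignored."""
    procs = sorted({p for c in cost.values() for p in c})
    proc_free = {p: 0 for p in procs}   # time at which each processor becomes idle
    finish = {}                         # task -> finish time
    placement = {}                      # task -> (processor, start, end)
    for t in tasks:
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[d] for d in deps.get(t, [])), default=0)
        best = None
        for p in procs:
            start = max(ready, proc_free[p])
            end = start + cost[t][p]    # execution time depends on the processor type
            if best is None or end < best[2]:
                best = (p, start, end)
        placement[t] = best
        finish[t] = best[2]
        proc_free[best[0]] = best[2]
    return placement, max(finish.values())


# Hypothetical task graph: t1 -> {t2, t3} -> t4, with made-up costs.
tasks = ["t1", "t2", "t3", "t4"]
deps = {"t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
cost = {
    "t1": {"RISC": 2, "DSP": 4},
    "t2": {"RISC": 6, "DSP": 2},
    "t3": {"RISC": 3, "DSP": 5},
    "t4": {"RISC": 2, "DSP": 2},
}
placement, makespan = list_schedule(tasks, deps, cost)
for task, (proc, start, end) in placement.items():
    print(f"{task}: {proc} [{start}, {end})")
print("makespan:", makespan)
```

Because the same task has a different cost on each processor type, the scheduler has to choose a processor as well as a start time, which is what enlarges the search space compared with the homogeneous case.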
13 Dataflow models. Also a graph representation: nodes and edges are called actors and channels. Implicit repetition, common in streaming and signal processing applications. Communication: only through channels. Multiple flavors of models: rules that determine when an actor fires. A graph models multiple possible executions (cf. LabVIEW models). [Figure: schedule of actor firings on a processor over time.]
14 Dataflow models (2). Same bullets as the previous slide. [Figure: schedule of actor firings on a processor over time, annotated "What now?"]
15 Dataflow models (3). Synchronous Dataflow (SDF): every actor has a fixed behavior. [Figure: SDF graph with actors a1, a2, a3 connected by channels annotated with fixed token rates.] Example: a3 always writes 1 token to e4. Cyclo-Static Dataflow (CSDF): every actor has a set of fixed behaviors. [Figure: CSDF graph with actors a1, a2, a3 and channels e1, e2, e3 annotated with cyclic rate sequences such as {1,0,0}, {0,1,0} and {0,2}.] Example: a1 writes 1 token, then 0, then 0 to e2.
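The difference between the two models can be sketched in a few lines. The snippet below is a hypothetical illustration, not code from the tutorial: the class `CsdfPort` and the helper `fire` are invented names, and only the two rate examples (a1 writing {1,0,0} to e2, a3 always writing 1 token to e4) come from the slide.

```python
from itertools import cycle


class CsdfPort:
    """Output port whose production rate cycles through a fixed sequence."""

    def __init__(self, rates):
        self._rates = cycle(rates)

    def next_rate(self):
        return next(self._rates)


def fire(port, channel, firings):
    """Fire the owning actor `firings` times, appending tokens to `channel`.
    Returns the number of tokens produced by each firing."""
    produced = []
    for n in range(firings):
        k = port.next_rate()
        channel.extend([f"tok{n}"] * k)
        produced.append(k)
    return produced


# CSDF: a1 writes 1 token, then 0, then 0 to e2, and repeats that pattern.
e2 = []
a1_to_e2 = CsdfPort([1, 0, 0])
produced = fire(a1_to_e2, e2, 6)

# SDF is the special case of a rate sequence of length one:
# a3 always writes 1 token to e4.
e4 = []
a3_to_e4 = CsdfPort([1])
sdf_produced = fire(a3_to_e4, e4, 3)

print(produced, len(e2))
print(sdf_produced, len(e4))
```

Treating SDF as a length-one rate sequence is a modeling convenience in this sketch; it shows why analyses for CSDF generalize those for SDF.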
16 Dynamic models and Kahn Process Networks. Dynamic dataflow: a set of firing rules per actor. [Figure: graph with actors a1, a2, a3 and inputs i1, i2.] Kahn process networks (KPN): nodes are now called processes. [Figure: KPN with processes p1, p2, p3 and channels e1 to e4.] Example: p1 writes any number of tokens to e2 at any time.
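A KPN can be imitated with threads and FIFO queues, as in the minimal sketch below. The three processes, the channel names and the `None` end-of-stream marker are assumptions made for this illustration. The sketch relies on the usual KPN semantics: reads block until a token is available, and writes to unbounded FIFO channels never block.

```python
import queue
import threading


def producer(out, n):
    for i in range(n):
        out.put(i)           # writes never block on an unbounded FIFO
    out.put(None)            # end-of-stream marker (a convention of this sketch)


def doubler(inp, out):
    while True:
        tok = inp.get()      # blocking read: wait until a token arrives
        if tok is None:
            out.put(None)
            return
        out.put(2 * tok)


def consumer(inp, result):
    while True:
        tok = inp.get()
        if tok is None:
            return
        result.append(tok)


def run_network(n):
    e1, e2 = queue.Queue(), queue.Queue()   # point-to-point FIFO channels
    result = []
    procs = [
        threading.Thread(target=producer, args=(e1, n)),
        threading.Thread(target=doubler, args=(e1, e2)),
        threading.Thread(target=consumer, args=(e2, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result


output = run_network(5)
print(output)
```

Since every process only communicates through blocking reads on its channels, the output does not depend on how the threads are interleaved, which is the determinism property that makes KPNs attractive for mapping onto different platforms.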
17 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary
18 Analysis and synthesis. [Figure: tool flow (cf. Silexica's tool flows). Inputs: application, architecture model of the heterogeneous platform, non-functional specification. Steps: analysis, synthesis, code generation, supported by property models (timing, energy, error, ...). Output: a truncated excerpt of generated C code that fills PNargs_* argument structures, opens channels with IPCllmrf_open and creates tasks with ti_sysbios_knl_task_create.]
19 Example for SDFs. Compute the topology matrix and solve the resulting system of equations. Solution: the repetition vector, which serves to unroll the graph, e.g., [1 3 2]. Perform mapping and scheduling on the resulting directed acyclic graph (DAG).
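The repetition vector can be computed by solving one balance equation per channel: the tokens produced per graph iteration must equal the tokens consumed. The sketch below is an illustration under assumptions: the token rates are hypothetical and were chosen only so that the result matches the vector [1 3 2] on the slide (the rates of the original figure are not recoverable from the transcript), and the function name `repetition_vector` is invented.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """edges: list of (src, produced_per_firing, dst, consumed_per_firing).
    Solves q[src] * produced == q[dst] * consumed for the smallest positive
    integer vector q, assuming a connected graph."""
    q = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for e in list(pending):
            src, prod, dst, cons = e
            if src in q and dst not in q:
                q[dst] = q[src] * prod / cons
            elif dst in q and src not in q:
                q[src] = q[dst] * cons / prod
            elif src in q and dst in q:
                if q[src] * prod != q[dst] * cons:
                    raise ValueError("inconsistent SDF graph: no repetition vector")
            else:
                continue            # neither end is known yet, retry later
            pending.remove(e)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer vector.
    denoms = [f.denominator for f in q.values()]
    scale = reduce(lambda a, b: a * b // gcd(a, b), denoms, 1)
    ints = [int(q[a] * scale) for a in actors]
    common = reduce(gcd, ints)
    return [v // common for v in ints]


# Hypothetical rates: a1 -(6:2)-> a2 -(2:3)-> a3
actors = ["a1", "a2", "a3"]
edges = [("a1", 6, "a2", 2), ("a2", 2, "a3", 3)]
q = repetition_vector(actors, edges)

# Topology matrix for the same graph: one row per channel, one column per actor.
gamma = [[6, -2, 0],
         [0, 2, -3]]
balanced = all(sum(g * x for g, x in zip(row, q)) == 0 for row in gamma)
print(q, balanced)
```

With this vector, one graph iteration fires a1 once, a2 three times and a3 twice; unrolling those firings yields the DAG on which mapping and scheduling are then performed.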
20 Example for KPNs: Static & dynamic analysis. Need to understand process interactions [Castrillon14]. [Figure: process code fragments with their unrolled control-flow graphs (CFGs) and a timeline of the resulting interleaving of f1 to f7. Code fragments shown: read(&c1); f3(...); write(&c3); f4(...); write(&c3); f1(...); write(&c2); read(&c2); f5(...); write(&c4); read(&c0); f2(...); write(&c1); read(&c3); f6(...); read(&c3); f7(...); read(&c4);]
21 KPN & DDF: Tracing. Dynamic analysis based on execution traces [Castrillon10, Castrillon13, Brunet15, Singh15]. DAG representation for further analysis and synthesis.
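One way to picture the step from traces to a DAG is sketched below. This is a simplified illustration and not the procedure of the cited works: it assumes each trace event moves exactly one token, each channel has a single writer and a single reader, and the trace format and the function name `trace_to_dag` are hypothetical.

```python
from collections import defaultdict


def trace_to_dag(traces):
    """traces: {process: [(kind, channel), ...]} with kind "read" or "write".
    Returns a set of dependency edges between events named (process, index)."""
    edges = set()
    writes = defaultdict(list)   # channel -> write events, in trace order
    reads = defaultdict(list)    # channel -> read events, in trace order
    for proc, events in traces.items():
        for i, (kind, chan) in enumerate(events):
            node = (proc, i)
            if i > 0:
                edges.add(((proc, i - 1), node))   # program order inside a process
            (writes if kind == "write" else reads)[chan].append(node)
    for chan, readers in reads.items():
        # FIFO channel: the k-th read consumes the token of the k-th write.
        for w, r in zip(writes[chan], readers):
            edges.add((w, r))
    return edges


# Hypothetical traces of three processes communicating over c1 and c2.
traces = {
    "p1": [("write", "c1"), ("write", "c1")],
    "p2": [("read", "c1"), ("write", "c2"), ("read", "c1")],
    "p3": [("read", "c2")],
}
dag = trace_to_dag(traces)
for src, dst in sorted(dag):
    print(src, "->", dst)
```

The resulting graph is acyclic for any trace recorded from a real execution, because a read can only have happened after the matching write, so DAG scheduling techniques can be applied to it.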
22 Models from functional specification. Inspect the functional specification of actors/processes (cf. 2nd talk). Instrumentation, emulation, cost models, ... for timing, energy, errors, ... Example [SAMOS14]: profiling (execution counts, branch statistics).
23 Models based on algorithmic descriptions. For when the functional specification is not meant for synthesis. Common/required in heterogeneous systems (special components). Need to match algorithms to hardware components [Castrillon10b, Castrillon11]. [Figure: an algorithm library and plain code combined with a platform model plus characterization of special components. N: algorithmic actors. F: existing implementation in the target platform.]
24 Multiple applications. Use traces and mappings to reason about platform sharing.
25 Multiple applications (2). Quickly discard bad multi-application configurations by observing the platform utilization profiles.
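As a rough idea of what such a quick check could look like, the sketch below adds up per-processor utilization profiles of several applications and rejects a combination as soon as any time window is over-subscribed. This is an assumption made for illustration, not the analysis from the cited works; the profile format, the window granularity, the 1.0 limit and all numbers are hypothetical.

```python
def combined_utilization(profiles):
    """profiles: list of {processor: [utilization per time window]}.
    Returns the element-wise sum per processor (equal window counts assumed)."""
    total = {}
    for prof in profiles:
        for pe, windows in prof.items():
            acc = total.setdefault(pe, [0.0] * len(windows))
            for i, u in enumerate(windows):
                acc[i] += u
    return total


def is_plausible(profiles, limit=1.0):
    """A configuration is discarded if any processor exceeds `limit` in any window."""
    total = combined_utilization(profiles)
    return all(u <= limit for windows in total.values() for u in windows)


# Hypothetical utilization profiles over two time windows.
app_a = {"RISC": [0.5, 0.25], "DSP": [0.25, 0.5]}
app_b = {"RISC": [0.25, 0.5], "DSP": [0.5, 0.25]}
app_c = {"DSP": [0.75, 0.75]}

print(is_plausible([app_a, app_b]))   # both processors stay within the limit
print(is_plausible([app_a, app_c]))   # the DSP is over-subscribed in window 2
```

A check like this is cheap because it only inspects precomputed profiles, so infeasible combinations can be filtered out before any detailed multi-application scheduling is attempted.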
26 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary
27 Summary. Need programming models (and HW/SW stacks) to handle heterogeneity; this is even more dramatic in the post-CMOS era. Dataflow models: a natural way to describe some applications, amenable to analysis and synthesis for parallel execution. Discussed different kinds of models and the analysis they require. Need models of hardware for synthesis.
28 Acknowledgements. German Cluster of Excellence: Center for Advancing Electronics Dresden. Collaborative research center (CRC) 912: Highly Adaptive Energy-Efficient Computing (HAEC). Silexica Software Solutions GmbH
29 References
[Trommer15] Trommer, et al., "Functionality-Enhanced Logic Gate Design Enabled by Symmetrical Reconfigurable Silicon Nanowire Transistors," IEEE Trans. on Nanotechnology, July 2015
[Voigt14] A. Voigt, et al., "Towards Computation with Microchemomechanical Systems," International Journal of Foundations of Computer Science, World Scientific, vol. 25, 2014
[Augonnet10] C. Augonnet, et al., "StarPU: a unified platform for task scheduling on heterogeneous multicore architectures," Concurrency and Computation: Practice and Experience 23.2 (2011)
[Kwok99] Y.-K. Kwok, et al., "Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors," ACM Comput. Surv., ACM Press, 1999, 31
[Castrillon14] J. Castrillon, et al., "Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap," Springer, 2014
[Castrillon10] J. Castrillon, et al., "Trace-based KPN composability analysis for mapping simultaneous applications to MPSoC platforms," DATE 10, 2010
[Castrillon13] J. Castrillon, et al., "MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs," IEEE Trans. on Industrial Informatics, vol. 9, no. 1, 2013
[Brunet15] S. C. Brunet, "Analysis and optimization of dynamic dataflow programs," Diss., École Polytechnique Fédérale de Lausanne, 2015
[Singh15] A. Singh, et al., "Resource and Throughput Aware Execution Trace Analysis for Efficient Run-time Mapping on MPSoCs," TCAD, 2015
[SAMOS14] J. F. Eusse, et al., "Pre-architectural performance estimation for ASIP design based on abstract processor models," Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014
[Castrillon10b] J. Castrillon, et al., "Component-based waveform development: The nucleus tool flow for efficient and portable SDR," Wireless Innovation Conference and Product Exposition (SDR), 2010
[Castrillon11] J. Castrillon, et al., "Component-based waveform development: The nucleus tool flow for efficient and portable software defined radio," Analog Integrated Circuits and Signal Processing, vol. 69, no. 2-3
More informationVLSI Design Automation
VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,
More informationDataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University
Dataflow Languages Languages for Embedded Systems Prof. Stephen A. Edwards Columbia University March 2009 Philosophy of Dataflow Languages Drastically different way of looking at computation Von Neumann
More informationMPSOC Architectures for Computing for Imaging. Thierry Collette, Ph.D. CEA LIST Head of Architecture and Design Department
MPSOC Architectures for Computing for Imaging Thierry Collette, Ph.D. CEA LIST Head of Architecture and Design Department thierry.collette@cea.fr Performance Embedded Computing: a New Area www.tilera.com
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationVLSI Design Automation. Maurizio Palesi
VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 Outline Technology trends VLSI Design flow (an overview) 3 IC Products Processors CPU, DSP, Controllers Memory chips
More informationDynamic Dataflow. Seminar on embedded systems
Dynamic Dataflow Seminar on embedded systems Dataflow Dataflow programming, Dataflow architecture Dataflow Models of Computation Computation is divided into nodes that can be executed concurrently Dataflow
More informationA Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures
Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures
More informationHIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS
HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale
More informationModern Processors. RISC Architectures
Modern Processors RISC Architectures Figures used from: Manolis Katevenis, RISC Architectures, Ch. 20 in Zomaya, A.Y.H. (ed), Parallel and Distributed Computing Handbook, McGraw-Hill, 1996 RISC Characteristics
More informationIQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.
IQ for DNA Interactive Query for Dynamic Network Analytics Haoyu Song www.huawei.com Motivation Service Provider s pain point Lack of real-time and full visibility of networks, so the network monitoring
More informationMoore s Law. Computer architect goal Software developer assumption
Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationKalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013
Kalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013 The End of Dennard MOSFET Scaling Theory 2013 Kalray SA All Rights Reserved MPSoC
More informationComputer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:
More informationA Memory System Design Framework: Creating Smart Memories
A Memory System Design Framework: Creating Smart Memories Amin Firoozshahian, Alex Solomatnikov Hicamp Systems Inc. Ofer Shacham, Zain Asgar, http://www.c2s2.org Stephen Richardson, Christos Kozyrakis,
More informationVeloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics
Veloce2 the Enterprise Verification Platform Simon Chen Emulation Business Development Director Mentor Graphics Agenda Emulation Use Modes Veloce Overview ARM case study Conclusion 2 Veloce Emulation Use
More informationSystem on Chip (SoC) Design
System on Chip (SoC) Design Moore s Law and Technology Scaling the performance of an IC, including the number components on it, doubles every 18-24 months with the same chip price... - Gordon Moore - 1960
More informationScientific Computing on GPUs: GPU Architecture Overview
Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationComputer Systems Architecture Spring 2016
Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationLabVIEW Based Embedded Design [First Report]
LabVIEW Based Embedded Design [First Report] Sadia Malik Ram Rajagopal Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 malik@ece.utexas.edu ram.rajagopal@ni.com
More informationThe Next Evolution of Instrumentation for Microwave Test. Jin Bains RF R&D Director National Instruments
The Next Evolution of Instrumentation for Microwave Test Jin Bains RF R&D Director National Instruments The Complexity of Testing Systems of Systems www.airbus.com www.esa.int EISCAT_3D A European 3D Imaging
More information