Computing in the age of parallelism: Challenges and opportunities

Size: px
Start display at page:

Download "Computing in the age of parallelism: Challenges and opportunities"

Transcription

1 Computing in the age of parallelism: Challenges and opportunities Jorn W. Janneck Computer Science Dept. Lund University Multicore Day, 23 Sep 2013

2 why we are here Original data collected and plotted by M. Horowitz, F. Labonte, O.Shacham, K. Olukotun, L. Hammond, and C. Batten Dotted line extrapolations by C. Moore. C. Moore, Data Processing in ExaScale-ClassComputer Systems, April

3 physics of computing speed Circuits are still getting larger at a steady pace... but speed-up is no longer the result of a faster clock. Instead, we get more parallel devices. Intel desktop processors, top model for the resp. year. 3

4 physics of computing efficiency Performance = parallelism. Efficiency = locality. It really is that simple. Bill Dally approximate power in 28nm processor Bill Dally 4

5 sin r co e gl P ec U Pentium IV ( Northwood ) m i- c ult or P ec U Core i7 980X ( Gulftown ) pro ce y rra a or s s Epiphany-IV 5

6 programming parallel machines Pentium IV processor 1 Penryn quadcore multi-core 2 ~ 10s Am2045 MPPA, GPU 10s ~ 100s... Nvidia GT200 Virtex FPGA 100s ~ 1000s 6

7 programming parallel machines SOPM ( ajava ) Am2045 C w/ ilib CUDA C + VHDL TILE 64 PC102 GT200 SEAforth 40C18 Virtex VentureForth HDL 7

8 portability unicore Pentium IV then PDP-11 now instruction set all of the above, plus... number/kind of registers number and kind of processing elements address space (size, structure) Fujitsu SuperSPARC system software OS GUI... and possibly other 'stuff', like memories multicore processor array topology of interconnect GPU FPGA 8

9 programming sequential machines Pentium IV PDP-11 Fujitsu SuperSPARC

10 programming sequential machines Pentium IV von Neumann machine* C, Pascal, Forth, Java, Lisp, Fortran, Haskell,... PDP-11 Fujitsu SuperSPARC * for the purpose of this talk von Neumann machine single stored-program computer 10

11 a pivotal model of computing computer architects compiler writers programmers language designers 11

12 programming parallel machines? Pentium IV processor 1 Penryn quadcore multi-core 2 ~ 10s Am2045 MPPA, GPU 10s ~ 100s... Nvidia GT200 Virtex FPGA 100s ~ 1000s 12

13 our work Implement streaming applications efficiently on a wide range of computing platforms. Pentium IV processor 1 Penryn quadcore multi-core 2 ~ 10s Am2045 MPPA, GPU 10s ~ 100s... Nvidia GT200 language, methodology VHDL/Verilog synthesis software synthesis, IDE profiling, hardware synth., STHORM profiling, tools, software synth. software synthesis, applications software synthesis, applications UC Berkeley Xilinx Ericsson EPFL Lund University INSA Rennes Halmstad Högskolan Virtex FPGA 100s ~ 1000s 13

14 streaming applications how a video expert explains an MPEG decoding algorithm digital signal processing media processing video/audio coding image processing / analytics networking / packet processing automatic control cryptography gene sequencing... 14

15 dataflow computational kernels (actors) provide explicit concurrency execute in discrete atomic steps directed point-to-point connections lossless, order-preserving conceptually unbounded asynchrony, abstraction from time communicating streams of discrete data packets (tokens) no shared state Some things not represented in this model: time (incl. speed of actors and communication channels) execution policy of actors communication protocol and communication medium location and implementation technology of actors MPEG/ISO standardized RVC-CAL, a subset of our actor language, in

16 implementation parsing, translation composition, scheduling code generation partitioning architecture interconnect 16

17 questions we try to answer about stream programs Is it determinate? Does it do what it is supposed to do? Is it bounded? Does it deadlock? Can it be implemented efficiently? about their (more or less) parallel implementations Does it still work? Does it deadlock now? Is it sufficiently fast/efficient/small...? What/how many resources does it use? 17

18 dataflow profiling A B C trace: the structure of a computation trace is a DAG vertices are steps edges are dependencies dependencies mediated by tokens state variables A:1 B:1 C:1 A:2 B:2 C:2 A:3 B:3 C:3 18

19 analysis: post-mortem scheduling basic schedule of trace latency density laboratory for hypothetical designs/architectures What if we changed latency somewhere?... multiplexed resources?... did so dynamically?... used different communication fabric?... 19

20 trace analysis input/output tagging 20

21 trace metrics input (examples) latency (bytes) (incremental) power (Σ power) 21

22 critical path Critical path: the longest time-weighted sequence of events from the start of the program to its termination. 22

23 critical path metrics Impact Analysis: identifying potential for improvement

24 critical path & buffer optimization Objective: Evaluate a trade-off between the buffer size and the critical path length Minimum buffer size configuration for a given throughput Optimize only critical buffers (i.e. buffers that block steps in the current critical path) Critical Path Length CP length with min. buffer size Deadlock Region CP length with unbounded buffer size min. Buffers Size Buffer Size

25 critical path & buffer optimization the throughput is not significatively improved resonable optimal buffer size configuration 25

26 programmers computer architects compiler writers te ac to her o! s, a pivotal model of computing language designers 26

27 here be dragons teaching parallelism There Is No More Sequential Programming. Why Are We Still Teaching It? panel title at the Intel Academic Forum, Supercomputing 08 Parallel Programming Programming preemption synchronization programming languages objects communication races types determinacy asynchrony threads design patterns variables tasks deadlock semaphores complexity data structures processors algorithms loops memory management design-by-contract 27

28 an alternative? Arvind, CSAIL, Massachusetts Institute of Technology Workshop on Directions in Multicore Programming Education, March 8,

29 the larger conversation? These are exciting times. computer architects compiler writers programmers Computing is losing its North Star. language designers Let's imagine that most computers have scores or hundreds of general-purpose cores. (And let's say we do not know exactly how many there are, or how they are connected.) Or programmable logic, or combinations of both. How would that affect... our idea of an algorithm? the languages and abstractions we use to program such a machine? the concept of a module? the things we need to do to validate programs? the techniques for making programs efficient? the concepts and the implementation of an operating system? the way we teach programming? 29

30 Thanks! Johan Eker Ericsson Edward A. Lee UC Berkeley Marco Mattavelli EPFL Carl von Platen Ericsson Dave B. Parlour Tabula Ian Miller Overture Mickaël Raulet INSA Rennes Mathieu Wipliez INSA Rennes Christophe Lucarz EPFL Shuvra S Bhattacharyya U Maryland Karl-Erik Arzen LTH Ruirui Gu Microsoft Endri Bezati EPFL Simone Casale Brunet EPFL Ghislain Roquier EPFL and many others. opendf.org Open Dataflow portal opendf.sf.net open source tool suite orcc.sf.net Open RVC-CAL Compiler ACTORS Project mpeg.chiariglione.org MPEG jwj@cs.lth.se 30

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Małgorzata Michalska, Endri Bezati, Simone Casale-Brunet, Marco Mattavelli EPFL

More information

MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology

MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology Hussein Aman-Allah, Ehab Hanna, Karim Maarouf, Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland

More information

Design of an Embedded Low Complexity Image Coder using CAL language

Design of an Embedded Low Complexity Image Coder using CAL language Author manuscript, published in "Conference on Design and Architectures for Signal and Image Processing (DASIP) 2009, Sophia Antipolis : France (2009)" Design of an Embedded Low Complexity Image Coder

More information

Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study

Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickael Raulet To cite this

More information

Synthesizing Hardware from Dataflow Programs

Synthesizing Hardware from Dataflow Programs Synthesizing Hardware from Dataflow Programs Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickaël Raulet To cite this version: Jörn W. Janneck, Ian D. Miller, David

More information

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

More information

OpenDF A Dataflow Toolset for Reconfigurable Hardware and Multicore Systems

OpenDF A Dataflow Toolset for Reconfigurable Hardware and Multicore Systems OpenDF A Dataflow Toolset for Reconfigurable Hardware and Multicore Systems Shuvra S. Bhattacharyya Dept. of ECE and UMIACS University of Maryland, College Park, MD 20742 USA Johan Eker, Carl von Platen

More information

Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers

Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers A machine model for dataflow actors and its applications Janneck, Jörn Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers DOI: 10.1109/ACSSC.2011.6190107

More information

Dr. Yassine Hariri CMC Microsystems

Dr. Yassine Hariri CMC Microsystems Dr. Yassine Hariri Hariri@cmc.ca CMC Microsystems 03-26-2013 Agenda MCES Workshop Agenda and Topics Canada s National Design Network and CMC Microsystems Processor Eras: Background and History Single core

More information

Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case

Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV - 2014 July 14 th - Samos Island (Greece) Carlo Sau, Luigi Raffo DIEE Università degli Studi

More information

VLSI Digital Signal Processing

VLSI Digital Signal Processing VLSI Digital Signal Processing EEC 28 Lecture Bevan M. Baas Tuesday, January 9, 28 Today Administrative items Syllabus and course overview My background Digital signal processing overview Read Programmable

More information

CME 213 S PRING Eric Darve

CME 213 S PRING Eric Darve CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and

More information

AVC Entropy Coding for MPEG Reconfigurable Video Coding

AVC Entropy Coding for MPEG Reconfigurable Video Coding AVC Entropy Coding for MPEG Reconfigurable Video Coding Hussein Aman-Allah and Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland {hussein.aman-allah, ihab.amer}@epfl.ch

More information

Department of Electrical and Computer Engineering, University of Maryland

Department of Electrical and Computer Engineering, University of Maryland Department of Electrical and Computer Engineering, University of Maryland OUTLINE Introduction: Problem statement Background Goals Co-processing units generation: Approach and baseline Multi-Dataflow Composer

More information

PERFORMANCE BENCHMARKING OF RVC BASED MULTIMEDIA SPECIFICATIONS

PERFORMANCE BENCHMARKING OF RVC BASED MULTIMEDIA SPECIFICATIONS PERFORMANCE BENCHMARKING OF RVC BASED MULTIMEDIA SPECIFICATIONS Junaid Jameel Ahmad 1 Shujun Li 2 Marco Mattavelli 1 1 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland 2 University of Surrey,

More information

High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms

High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms J Real-Time Image Proc (2014) 9:251 262 DOI 10.1007/s11554-013-0326-5 SPECIAL ISSUE High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms Endri

More information

Orcc: multimedia development made easy

Orcc: multimedia development made easy Orcc: multimedia development made easy Hervé Yviquel, Antoine Lorence, Khaled Jerbi, Gildas Cocherel, Alexandre Sanchez, Mickaël Raulet To cite this version: Hervé Yviquel, Antoine Lorence, Khaled Jerbi,

More information

An LLVM-based decoder for MPEG Reconfigurable Video Coding

An LLVM-based decoder for MPEG Reconfigurable Video Coding An LLVM-based decoder for MPEG Reconfigurable Video Coding Jérôme Gorin, Matthieu Wipliez, Jonathan Piat, Françoise Préteux, Mickaël Raulet To cite this version: Jérôme Gorin, Matthieu Wipliez, Jonathan

More information

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale

More information

Exploiting Statically Schedulable Regions in Dataflow Programs

Exploiting Statically Schedulable Regions in Dataflow Programs DOI 10.1007/s11265-009-0445-1 Exploiting Statically Schedulable Regions in Dataflow Programs Ruirui Gu Jörn W. Janneck Mickaël Raulet Shuvra S. Bhattacharyya Received: 23 June 2009 / Revised: 16 December

More information

EFFICIENT DATA FLOW VARIABLE LENGTH DECODING IMPLEMENTATION FOR THE MPEG RECONFIGURABLE VIDEO CODING FRAMEWORK

EFFICIENT DATA FLOW VARIABLE LENGTH DECODING IMPLEMENTATION FOR THE MPEG RECONFIGURABLE VIDEO CODING FRAMEWORK EFFICIENT DATA FLOW VARIABLE LENGTH DECODING IMPLEMENTATION FOR THE MPEG RECONFIGURABLE VIDEO CODING FRAMEWORK Jianjun Li, Dandan Ding, Christophe Lucarz, Samuel Keller, Marco Mattavelli Microelectronic

More information

A unified hardware/software co-synthesis solution for signal processing systems

A unified hardware/software co-synthesis solution for signal processing systems A unified hardware/software co-synthesis solution for signal processing systems Endri Bezati, Hervé Yviquel, Mickaël Raulet, Marco Mattavelli To cite this version: Endri Bezati, Hervé Yviquel, Mickaël

More information

In Proceedings of the Conference on Design and Architectures for Signal and Image Processing, Edinburgh, Scotland, October 2010.

In Proceedings of the Conference on Design and Architectures for Signal and Image Processing, Edinburgh, Scotland, October 2010. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing, Edinburgh, Scotland, October 2010. AUTOMATED GENERATION OF AN EFFICIENT MPEG-4 RECONFIGURABLE VIDEO CODING

More information

Exploring the Design Space of HEVC Inverse Transforms with Dataflow Programming

Exploring the Design Space of HEVC Inverse Transforms with Dataflow Programming Indonesian Journal of Electrical Engineering and Computer Science Vol. 6, No. 1, April 2017, pp. 104 ~ 109 DOI: 10.11591/ijeecs.v6.i1.pp104-109 104 Exploring the Design Space of HEVC Inverse Transforms

More information

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION In Proceedings of the IEEE Workshop on Signal Processing Systems, Quebec City, Canada, October 2012. MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT IMPLEMENTATION Lai-Huei Wang 1, Chung-Ching

More information

Proceedings Chapter. Reference. Efficient scheduling policies for dynamic data flow programs executed on multi-core. MICHALSKA, Malgorzata, et al.

Proceedings Chapter. Reference. Efficient scheduling policies for dynamic data flow programs executed on multi-core. MICHALSKA, Malgorzata, et al. Proceedings Chapter Efficient scheduling policies for dynamic data flow programs executed on multi-core MICHALSKA, Malgorzata, et al. Abstract An important challenge of dataflow program implementations

More information

Software Code Generation for the RVC-CAL Language

Software Code Generation for the RVC-CAL Language DOI 10.1007/s11265-009-0390-z Software Code Generation for the RVC-CAL Language Matthieu Wipliez Ghislain Roquier Jean-François Nezan Received: 23 January 2009 / Revised: 20 April 2009 / Accepted: 8 June

More information

generation for the MPEG Reconfigurable Video Coding framework: From CAL actions to C functions.

generation for the MPEG Reconfigurable Video Coding framework: From CAL actions to C functions. Code generation for the MPEG Reconfigurable Video Coding framework: From CAL actions to C functions Matthieu Wipliez, Ghislain Roquier, Mickael Raulet, Jean François Nezan, Olivier Déforges To cite this

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design ECE 5775 (Fall 17) High-Level Digital Design Automation Hardware-Software Co-Design Announcements Midterm graded You can view your exams during TA office hours (Fri/Wed 11am-noon, Rhodes 312) Second paper

More information

To hear the audio, please be sure to dial in: ID#

To hear the audio, please be sure to dial in: ID# Introduction to the HPP-Heterogeneous Processing Platform A combination of Multi-core, GPUs, FPGAs and Many-core accelerators To hear the audio, please be sure to dial in: 1-866-440-4486 ID# 4503739 Yassine

More information

RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING. 2 IETR/INSA Rennes F-35708, Rennes, France

RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING. 2 IETR/INSA Rennes F-35708, Rennes, France RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING Endri Bezati 1, Marco Mattavelli 1, Mickaël Raulet 2 1 Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland {firstname.lastname}@epfl.ch

More information

Dynamic Dataflow. Seminar on embedded systems

Dynamic Dataflow. Seminar on embedded systems Dynamic Dataflow Seminar on embedded systems Dataflow Dataflow programming, Dataflow architecture Dataflow Models of Computation Computation is divided into nodes that can be executed concurrently Dataflow

More information

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA Pierre Nowodzienski Engineer pierre.nowodzienski@mathworks.fr 2018 The MathWorks, Inc. 1 From Data to Business value Make decisions Get

More information

Efficient scheduling policies for dynamic dataflow programs executed on multi-core

Efficient scheduling policies for dynamic dataflow programs executed on multi-core Efficient scheduling policies for dynamic dataflow programs executed on multi-core Ma lgorzata Michalska 1, Nicolas Zufferey 2, Jani Boutellier 3, Endri Bezati 1, and Marco Mattavelli 1 1 EPFL STI-SCI-MM,

More information

Method For Efficient Hardware Implementation From RVC-CAL Dataflow: A LAR. Coder baseline Case Study

Method For Efficient Hardware Implementation From RVC-CAL Dataflow: A LAR. Coder baseline Case Study Automatic Method For Efficient Hardware Implementation From RVC-CAL Dataflow: A LAR Coder baseline Case Study Khaled Jerbi, Matthieu Wipliez, Mickaël Raulet, Olivier Déforges, Marie Babel, Mohamed Abid

More information

Efficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford

Efficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford Efficiency and Programmability: Enablers for ExaScale Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford Scientific Discovery and Business Analytics Driving an Insatiable

More information

Reconfigurable media coding: a new specification model for multimedia coders

Reconfigurable media coding: a new specification model for multimedia coders University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2007 Reconfigurable media coding: a new specification model for multimedia

More information

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Ryan Chladny Application Engineering May 13 th, 2014 2014 The MathWorks, Inc. 1 Design Challenge: Electric

More information

Classification-Based Optimization of Dynamic Dataflow Programs

Classification-Based Optimization of Dynamic Dataflow Programs Classification-Based Optimization of Dynamic Dataflow Programs Hervé Yviquel, Emmanuel Casseau, Matthieu Wipliez, Jérôme Gorin, Mickaël Raulet To cite this version: Hervé Yviquel, Emmanuel Casseau, Matthieu

More information

Heriot-Watt University

Heriot-Watt University Heriot-Watt University Heriot-Watt University Research Gateway Shared-variable synchronization approaches for dynamic dataflow programs Modas, Apostolos; Casale-Brunet, Simone; Stewart, Robert James; Bezati,

More information

Fundamental Algorithms for System Modeling, Analysis, and Optimization

Fundamental Algorithms for System Modeling, Analysis, and Optimization Fundamental Algorithms for System Modeling, Analysis, and Optimization Stavros Tripakis, Edward A. Lee UC Berkeley EECS 144/244 Fall 2014 Copyright 2014, E. A. Lee, J. Roydhowdhury, S. A. Seshia, S. Tripakis

More information

Evaluation and comparison of inter-processor communication techniques in model-based design flows/tools

Evaluation and comparison of inter-processor communication techniques in model-based design flows/tools Evaluation and comparison of inter-processor communication techniques in model-based design flows/tools Pratiksha Dilip Deshmukh Computer Science in Applications Embedded Systems Group Technical University

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications in Dataflow Representations of DSP Applications Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Using FPGAs in Supercomputing Reconfigurable Supercomputing

Using FPGAs in Supercomputing Reconfigurable Supercomputing Using FPGAs in Supercomputing Reconfigurable Supercomputing Why FPGAs? FPGAs are 10 100x faster than a modern Itanium or Opteron Performance gap is likely to grow further in the future Several major vendors

More information

Compute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen

Compute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen Compute Node Design for DAQ and Trigger Subsystem in Giessen Justus Liebig University in Giessen Outline Design goals Current work in Giessen Hardware Software Future work Justus Liebig University in Giessen,

More information

The Future of the Ptolemy Project

The Future of the Ptolemy Project The Future of the Ptolemy Project Edward A. Lee UC Berkeley With thanks to the entire Ptolemy Team. Ptolemy Miniconference Berkeley, CA, March 22-23, 2001 The Problem Composition Decomposition Corba? TAO?

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Parallelism Marco Serafini

Parallelism Marco Serafini Parallelism Marco Serafini COMPSCI 590S Lecture 3 Announcements Reviews First paper posted on website Review due by this Wednesday 11 PM (hard deadline) Data Science Career Mixer (save the date!) November

More information

Interface-Based Design Introduction

Interface-Based Design Introduction Interface-Based Design Introduction A. Richard Newton Department of Electrical Engineering and Computer Sciences University of California at Berkeley Integrated CMOS Radio Dedicated Logic and Memory uc

More information

A Stream Compiler for Communication-Exposed Architectures

A Stream Compiler for Communication-Exposed Architectures A Stream Compiler for Communication-Exposed Architectures Michael Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

Compositionality in system design: interfaces everywhere! UC Berkeley

Compositionality in system design: interfaces everywhere! UC Berkeley Compositionality in system design: interfaces everywhere! Stavros Tripakis UC Berkeley DREAMS Seminar, Mar 2013 Computers as parts of cyber physical systems cyber-physical ~98% of the world s processors

More information

Application-Platform Mapping in Multiprocessor Systems-on-Chip

Application-Platform Mapping in Multiprocessor Systems-on-Chip Application-Platform Mapping in Multiprocessor Systems-on-Chip Leandro Soares Indrusiak lsi@cs.york.ac.uk http://www-users.cs.york.ac.uk/lsi CREDES Kick-off Meeting Tallinn - June 2009 Application-Platform

More information

The Ptolemy II Framework for Visual Languages

The Ptolemy II Framework for Visual Languages The Ptolemy II Framework for Visual Languages Xiaojun Liu Yuhong Xiong Edward A. Lee Department of Electrical Engineering and Computer Sciences University of California at Berkeley Ptolemy II - Heterogeneous

More information

Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules

Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules Toomas Remmelg Michel Steuwer Christophe Dubach The 4th South of England Regional Programming Language Seminar 27th

More information

Concurrent Object Oriented Languages

Concurrent Object Oriented Languages Concurrent Object Oriented Languages Instructor Name: Franck van Breugel Email: franck@cse.yorku.ca Office: Lassonde Building, room 3046 Office hours: to be announced Overview concurrent algorithms concurrent

More information

CHAPTER 1 Introduction

CHAPTER 1 Introduction CHAPTER 1 Introduction 1.1 Overview 1 1.2 The Main Components of a Computer 3 1.3 An Example System: Wading through the Jargon 4 1.4 Standards Organizations 15 1.5 Historical Development 16 1.5.1 Generation

More information

Experts in Application Acceleration Synective Labs AB

Experts in Application Acceleration Synective Labs AB Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg

More information

Modelling, Analysis and Scheduling with Dataflow Models

Modelling, Analysis and Scheduling with Dataflow Models technische universiteit eindhoven Modelling, Analysis and Scheduling with Dataflow Models Marc Geilen, Bart Theelen, Twan Basten, Sander Stuijk, AmirHossein Ghamarian, Jeroen Voeten Eindhoven University

More information

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs George F. Zaki, William Plishker, Shuvra S. Bhattacharyya University of Maryland, College Park, MD, USA & Frank Fruth Texas Instruments

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

COMP 633 Parallel Computing.

COMP 633 Parallel Computing. COMP 633 Parallel Computing http://www.cs.unc.edu/~prins/classes/633/ Parallel computing What is it? multiple processors cooperating to solve a single problem hopefully faster than a single processor!

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

GPGPU/CUDA/C Workshop 2012

GPGPU/CUDA/C Workshop 2012 GPGPU/CUDA/C Workshop 2012 Day-1: GPGPU/CUDA/C and WSU Presenter(s): Abu Asaduzzaman Nasrin Sultana Wichita State University July 10, 2012 GPGPU/CUDA/C Workshop 2012 Outline Introduction to the Workshop

More information

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Photos placed in horizontal position with even amount of white space between photos and header Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Christopher Forster,

More information

Parallel and High Performance Computing CSE 745

Parallel and High Performance Computing CSE 745 Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

Parallelism. Parallel Hardware. Introduction to Computer Systems

Parallelism. Parallel Hardware. Introduction to Computer Systems Parallelism We have been discussing the abstractions and implementations that make up an individual computer system in considerable detail up to this point. Our model has been a largely sequential one,

More information

Implementation Alternatives. Lecture 2: Implementation Alternatives & Concurrent Programming. Current Implementation Alternatives

Implementation Alternatives. Lecture 2: Implementation Alternatives & Concurrent Programming. Current Implementation Alternatives Lecture 2: Implementation Alternatives & Concurrent ming Implementation Alternatives Controllers can be implemented using a number of techniques [RTCS Ch 3] Implementation Alternatives Processes and threads

More information

Embedded Software from Concurrent Component Models

Embedded Software from Concurrent Component Models Embedded Software from Concurrent Component Models Edward A. Lee UC Berkeley with Shuvra Bhattacharyya, Johan Eker, Christopher Hylands, Jie Liu, Xiaojun Liu, Steve Neuendorffer, Jeff Tsay, and Yuhong

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

A Deterministic Concurrent Language for Embedded Systems

A Deterministic Concurrent Language for Embedded Systems SHIM:A A Deterministic Concurrent Language for Embedded Systems p. 1/28 A Deterministic Concurrent Language for Embedded Systems Stephen A. Edwards Columbia University Joint work with Olivier Tardieu SHIM:A

More information

Parallel Computer Architecture Concepts

Parallel Computer Architecture Concepts Outline This image cannot currently be displayed. arallel Computer Architecture Concepts TDDD93 Lecture 1 Christoph Kessler ELAB / IDA Linköping university Sweden 2015 Lecture 1: arallel Computer Architecture

More information

By: Chaitanya Settaluri Devendra Kalia

By: Chaitanya Settaluri Devendra Kalia By: Chaitanya Settaluri Devendra Kalia What is an embedded system? An embedded system Uses a controller to perform some function Is not perceived as a computer Software is used for features and flexibility

More information

Postprint.

Postprint. http://www.diva-portal.org Postprint This is the accepted version of a paper presented at MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016. Citation

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

A Streaming Multi-Threaded Model

A Streaming Multi-Threaded Model A Streaming Multi-Threaded Model Extended Abstract Eylon Caspi, André DeHon, John Wawrzynek September 30, 2001 Summary. We present SCORE, a multi-threaded model that relies on streams to expose thread

More information

The Ptolemy Kernel Supporting Heterogeneous Design

The Ptolemy Kernel Supporting Heterogeneous Design February 16, 1995 The Ptolemy Kernel Supporting Heterogeneous Design U N T H E I V E R S I T Y A O F LET THERE BE 1868 LIG HT C A L I A I F O R N by The Ptolemy Team 1 Proposed article for the RASSP Digest

More information

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08 12 Data flow models Peter Marwedel TU Dortmund, Informatik 12 2009/10/08 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Models of computation considered in this course Communication/ local computations

More information

Understandable Concurrency

Understandable Concurrency Edward A. Lee Professor, Chair of EE, and Associate Chair of EECS Director, CHESS: Center for Hybrid and Embedded Software Systems Director, Ptolemy Project UC Berkeley Chess Review November 21, 2005 Berkeley,

More information

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,

More information

An Overview of Parallel Computing

An Overview of Parallel Computing An Overview of Parallel Computing Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS2101 Plan 1 Hardware 2 Types of Parallelism 3 Concurrency Platforms: Three Examples Cilk CUDA

More information

Well-behaved Dataflow Graphs

Well-behaved Dataflow Graphs Well-behaved Dataflow Graphs Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of echnology L21-1 Outline Kahnian networks and dataflow Streams with holes & agged interpretation

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

Translating Haskell to Hardware. Lianne Lairmore Columbia University

Translating Haskell to Hardware. Lianne Lairmore Columbia University Translating Haskell to Hardware Lianne Lairmore Columbia University FHW Project Functional Hardware (FHW) Martha Kim Stephen Edwards Richard Townsend Lianne Lairmore Kuangya Zhai CPUs file: ///home/lianne/

More information

Register Organization and Raw Hardware. 1 Register Organization for Media Processing

Register Organization and Raw Hardware. 1 Register Organization for Media Processing EE482C: Advanced Computer Organization Lecture #7 Stream Processor Architecture Stanford University Thursday, 25 April 2002 Register Organization and Raw Hardware Lecture #7: Thursday, 25 April 2002 Lecturer:

More information

System-level Synthesis of Dataflow Applications for FPGAbased Distributed Platforms

System-level Synthesis of Dataflow Applications for FPGAbased Distributed Platforms System-level Synthesis of Dataflow Applications for FPGAbased Distributed Platforms Hugo A. Andrade, Kaushik Ravindran, Alejandro Asenjo, Casey Weltzin NI Berkeley, NI Austin National Instruments Corporation

More information

The Heterogeneous Programming Jungle. Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest

The Heterogeneous Programming Jungle. Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest The Heterogeneous Programming Jungle Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest June 19, 2012 Outline 1. Introduction 2. Heterogeneous System Zoo 3. Similarities 4. Programming

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

Embedded Systems CS - ES

Embedded Systems CS - ES Embedded Systems - 1 - Synchronous dataflow REVIEW Multiple tokens consumed and produced per firing Synchronous dataflow model takes advantage of this Each edge labeled with number of tokens consumed/produced

More information

Moore s Law. Computer architect goal Software developer assumption

Moore s Law. Computer architect goal Software developer assumption Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer

More information

Outline Introduction System development Video capture Image processing Results Application Conclusion Bibliography

Outline Introduction System development Video capture Image processing Results Application Conclusion Bibliography Real Time Video Capture and Image Processing System using FPGA Jahnvi Vaidya Advisors: Dr. Yufeng Lu and Dr. In Soo Ahn 4/30/2009 Outline Introduction System development Video capture Image processing

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

CHAPTER 1 Introduction

CHAPTER 1 Introduction CHAPTER 1 Introduction 1.1 Overview 1 1.2 The Main Components of a Computer 3 1.3 An Example System: Wading through the Jargon 4 1.4 Standards Organizations 13 1.5 Historical Development 14 1.5.1 Generation

More information

Published in: Proceedings of the 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems

Published in: Proceedings of the 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems Aalborg Universitet SystemC-AMS SDF Model Synthesis for Exploration of Heterogeneous Architectures Popp, Andreas; Herrholz, Andreas; Gruettner, Kim; Le Moullec, Yannick; Koch, Peter; Nebel, Wolfgang Published

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information