Computing in the age of parallelism: Challenges and opportunities
1 Computing in the age of parallelism: Challenges and opportunities. Jörn W. Janneck, Computer Science Dept., Lund University. Multicore Day, 23 Sep 2013
2 why we are here. Figure: original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. Dotted-line extrapolations by C. Moore. C. Moore, Data Processing in ExaScale-Class Computer Systems, April
3 physics of computing speed: Circuits are still getting larger at a steady pace... but speed-up is no longer the result of a faster clock. Instead, we get more parallel devices. Figure: Intel desktop processors, top model for the respective year.
4 physics of computing efficiency: "Performance = parallelism. Efficiency = locality. It really is that simple." (Bill Dally) Figure: approximate power in 28nm processor (Bill Dally).
5 single-core CPU: Pentium IV (Northwood); multi-core CPU: Core i7 980X (Gulftown); processor array: Epiphany-IV
6 programming parallel machines: processor, 1 (Pentium IV); multi-core, 2 ~ 10s (Penryn quadcore); MPPA, GPU, 10s ~ 100s... (Am2045, Nvidia GT200); FPGA, 100s ~ 1000s (Virtex)
7 programming parallel machines: Am2045: SOPM (aJava); TILE64: C w/ iLib; GT200: CUDA; PC102: C + VHDL; SEAforth 40C18: VentureForth; Virtex: HDL
8 portability. Then (unicore: Pentium IV, PDP-11, Fujitsu SuperSPARC): instruction set; number/kind of registers; address space (size, structure); system software (OS, GUI, ...). Now (multicore, processor array, GPU, FPGA): all of the above, plus... number and kind of processing elements; topology of interconnect; ... and possibly other 'stuff', like memories.
9 programming sequential machines: Pentium IV, PDP-11, Fujitsu SuperSPARC
10 programming sequential machines. Languages: C, Pascal, Forth, Java, Lisp, Fortran, Haskell, ...; von Neumann machine*; machines: Pentium IV, PDP-11, Fujitsu SuperSPARC. * For the purpose of this talk, von Neumann machine = single stored-program computer.
11 a pivotal model of computing: computer architects, compiler writers, programmers, language designers
12 programming parallel machines? processor, 1 (Pentium IV); multi-core, 2 ~ 10s (Penryn quadcore); MPPA, GPU, 10s ~ 100s... (Am2045, Nvidia GT200); FPGA, 100s ~ 1000s (Virtex)
13 our work: Implement streaming applications efficiently on a wide range of computing platforms. Platforms: processor, 1 (Pentium IV); multi-core, 2 ~ 10s (Penryn quadcore); MPPA, GPU, 10s ~ 100s... (Am2045, Nvidia GT200); FPGA, 100s ~ 1000s (Virtex). Contributions: UC Berkeley (language, methodology); Xilinx (VHDL/Verilog synthesis); Ericsson (software synthesis, IDE); EPFL (profiling, hardware synth., STHORM); Lund University (profiling, tools, software synth.); INSA Rennes (software synthesis, applications); Halmstad Högskolan (software synthesis, applications).
14 streaming applications. Figure: how a video expert explains an MPEG decoding algorithm. Examples: digital signal processing, media processing, video/audio coding, image processing / analytics, networking / packet processing, automatic control, cryptography, gene sequencing, ...
15 dataflow. Computational kernels (actors): provide explicit concurrency; execute in discrete atomic steps. Directed point-to-point connections: lossless, order-preserving; conceptually unbounded; asynchrony, abstraction from time. Communicating streams of discrete data packets (tokens). No shared state. Some things not represented in this model: time (incl. speed of actors and communication channels); execution policy of actors; communication protocol and communication medium; location and implementation technology of actors. MPEG/ISO standardized RVC-CAL, a subset of our actor language, in
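The model on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not RVC-CAL and not the authors' tooling: the names `Channel`, `Actor`, and `run` are hypothetical, each actor is assumed to consume one token per input and produce one token per firing, and the firing policy in `run` is an arbitrary choice (the model itself leaves execution policy open).

```python
from collections import deque


class Channel:
    """Lossless, order-preserving, conceptually unbounded FIFO of tokens."""

    def __init__(self):
        self._tokens = deque()

    def put(self, token):
        self._tokens.append(token)

    def get(self):
        return self._tokens.popleft()

    def __len__(self):
        return len(self._tokens)


class Actor:
    """A kernel that fires in atomic steps and shares no state with others."""

    def __init__(self, name, fn, inputs, output):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.output = output

    def can_fire(self):
        # Enabled when every input channel holds at least one token.
        return all(len(ch) > 0 for ch in self.inputs)

    def fire(self):
        # One atomic step: consume one token per input, produce one token.
        args = [ch.get() for ch in self.inputs]
        self.output.put(self.fn(*args))


def run(actors):
    """Fire any enabled actor until none can fire (arbitrary policy)."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


# Pipeline: source tokens -> scale -> offset -> output channel.
src, mid, out = Channel(), Channel(), Channel()
for x in [1, 2, 3]:
    src.put(x)

scale = Actor("scale", lambda x: 2 * x, [src], mid)
offset = Actor("offset", lambda x: x + 1, [mid], out)

run([offset, scale])  # listing the actors in another order gives the same tokens
result = [out.get() for _ in range(len(out))]
print(result)
```

Because the actors communicate only through FIFO channels, the output stream does not depend on the order in which enabled actors are fired.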
16 implementation: parsing, translation, composition, scheduling, code generation, partitioning, architecture, interconnect
17 questions we try to answer. About stream programs: Is it determinate? Does it do what it is supposed to do? Is it bounded? Does it deadlock? Can it be implemented efficiently? About their (more or less) parallel implementations: Does it still work? Does it deadlock now? Is it sufficiently fast/efficient/small...? What/how many resources does it use?
18 dataflow profiling. Figure: actors A, B, C. Trace: the structure of a computation. A trace is a DAG: vertices are steps; edges are dependencies. Dependencies are mediated by tokens and by state variables. Figure: steps A:1, B:1, C:1, A:2, B:2, C:2, A:3, B:3, C:3.
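As a rough illustration of such a trace, the sketch below builds the dependency graph for a three-actor pipeline A -> B -> C with three firings each, then checks that it is a DAG by computing a topological order. The pipeline shape and the assumption that every actor carries internal state between firings are made up for the example; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

ACTORS = ["A", "B", "C"]  # assumed pipeline A -> B -> C
FIRINGS = 3

# deps[step] = set of steps that must complete before `step` can run
deps = {}
for i in range(1, FIRINGS + 1):
    for k, actor in enumerate(ACTORS):
        step = f"{actor}:{i}"
        deps[step] = set()
        if k > 0:
            # Token dependency: the upstream firing produced the consumed token.
            deps[step].add(f"{ACTORS[k - 1]}:{i}")
        if i > 1:
            # State dependency: assumed here that each actor keeps internal state.
            deps[step].add(f"{actor}:{i - 1}")

# static_order() raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any topological order of this graph is a legal sequential execution of the trace; steps with no path between them (for example `A:2` and `B:1`) may run in parallel.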
19 analysis: post-mortem scheduling. Basic schedule of trace; latency; density. Laboratory for hypothetical designs/architectures: What if we changed latency somewhere? ... multiplexed resources? ... did so dynamically? ... used different communication fabric? ...
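A minimal sketch of this kind of "what if" experiment, under invented numbers: a recorded trace is replayed against hypothetical per-actor latencies and a hypothetical actor-to-processor mapping, and the resulting finish time is compared. The trace, the latencies, the processor names, and the function `replay` are all assumptions for illustration; steps mapped to the same processor are simply executed in recorded order.

```python
# Recorded trace in execution order: (step, steps it depends on).
TRACE = [
    ("A:1", []),
    ("B:1", ["A:1"]),
    ("A:2", ["A:1"]),
    ("C:1", ["B:1"]),
    ("B:2", ["A:2", "B:1"]),
    ("C:2", ["B:2", "C:1"]),
]


def replay(trace, latency, mapping):
    """Return the finish time of the trace for the given latencies and mapping."""
    finish = {}   # finish time of each step
    free_at = {}  # time at which each processor becomes idle
    for step, deps in trace:
        actor = step.split(":")[0]
        proc = mapping[actor]
        ready = max((finish[d] for d in deps), default=0)
        start = max(ready, free_at.get(proc, 0))
        finish[step] = start + latency[actor]
        free_at[proc] = finish[step]
    return max(finish.values())


latency = {"A": 1, "B": 4, "C": 2}
own_processor = {"A": "p0", "B": "p1", "C": "p2"}
one_processor = {"A": "p0", "B": "p0", "C": "p0"}

baseline = replay(TRACE, latency, own_processor)
multiplexed = replay(TRACE, latency, one_processor)          # shared resource
faster_b = replay(TRACE, {**latency, "B": 2}, own_processor)  # changed latency
print(baseline, multiplexed, faster_b)
```

The same recorded trace answers all three questions without re-running the application, which is the point of doing the analysis post-mortem.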
20 trace analysis: input/output tagging
21 trace metrics (examples): input (bytes), latency (incremental), power (Σ power)
22 critical path: the longest time-weighted sequence of events from the start of the program to its termination.
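The definition can be made concrete with a longest-path computation over a trace DAG. The sketch below is an assumption-laden example rather than the authors' tool: the dependency graph and the step weights are invented, and `critical_path` is a hypothetical helper. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical trace: step -> set of steps it depends on.
deps = {
    "A:1": set(),
    "B:1": {"A:1"},
    "C:1": {"B:1"},
    "A:2": {"A:1"},
    "B:2": {"A:2", "B:1"},
    "C:2": {"B:2", "C:1"},
}
# Hypothetical execution time of each step, in arbitrary time units.
weight = {"A:1": 1, "B:1": 4, "C:1": 2, "A:2": 1, "B:2": 4, "C:2": 2}


def critical_path(deps, weight):
    """Return (length, steps) of the longest time-weighted path in the DAG."""
    finish = {}     # earliest finish time of each step
    best_pred = {}  # predecessor on the longest path into each step
    for step in TopologicalSorter(deps).static_order():
        pred = max(deps[step], key=lambda p: finish[p], default=None)
        start = finish[pred] if pred is not None else 0
        finish[step] = start + weight[step]
        best_pred[step] = pred
    # Walk back from the step that finishes last.
    step = max(finish, key=finish.get)
    length = finish[step]
    path = []
    while step is not None:
        path.append(step)
        step = best_pred[step]
    return length, path[::-1]


length, path = critical_path(deps, weight)
print(length, path)
```

Steps that are not on the returned path have slack: speeding them up alone cannot shorten the overall execution.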
23 critical path metrics Impact Analysis: identifying potential for improvement
24 critical path & buffer optimization. Objective: evaluate a trade-off between the buffer size and the critical path length. Minimum buffer size configuration for a given throughput. Optimize only critical buffers (i.e. buffers that block steps in the current critical path). Figure: critical path length vs. buffer size, with labels: deadlock region, min. buffer size, CP length with min. buffer size, CP length with unbounded buffer size.
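The trade-off can be illustrated with a deliberately small stand-in model, not the optimization procedure from the talk: one producer and one consumer connected by a FIFO with `cap` slots, each firing moving a single token. The firing times in `tp` and `tc` are invented, the function `makespan` is hypothetical, and a slot is assumed to be freed when the consumer starts the firing that reads it.

```python
def makespan(tp, tc, cap):
    """Finish time of a producer -> consumer pair with a FIFO of `cap` slots.

    tp[i] and tc[i] are the durations of the i-th producer and consumer firing.
    Returns None when the configuration deadlocks (no slot to write into).
    """
    if cap < 1:
        return None
    p_end, c_start, c_end = [], [], []
    for i in range(len(tp)):
        p_ready = p_end[i - 1] if i > 0 else 0
        # A slot is free once the consumer has started firing i - cap.
        slot_free = c_start[i - cap] if i >= cap else 0
        p_end.append(max(p_ready, slot_free) + tp[i])
        c_ready = c_end[i - 1] if i > 0 else 0
        c_start.append(max(c_ready, p_end[i]))
        c_end.append(c_start[i] + tc[i])
    return c_end[-1]


tp = [1, 1, 4, 4]  # producer is fast first, then slow
tc = [4, 4, 1, 1]  # consumer is slow first, then fast

curve = {cap: makespan(tp, tc, cap) for cap in range(0, 5)}
best = makespan(tp, tc, len(tp))  # as many slots as tokens acts as unbounded
min_cap = min(cap for cap, t in curve.items() if t == best)
print(curve, min_cap)
```

In this toy case a capacity of 0 deadlocks, a capacity of 1 stretches the execution, and from a capacity of 2 onward the finish time equals the unbounded-buffer value, so larger buffers buy nothing further.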
25 critical path & buffer optimization: the throughput is not significantly improved; reasonable optimal buffer size configuration
26 a pivotal model of computing: programmers, computer architects, compiler writers, language designers, teachers, too!
27 here be dragons: teaching parallelism. "There Is No More Sequential Programming. Why Are We Still Teaching It?" (panel title at the Intel Academic Forum, Supercomputing 08). Topics shown under "Parallel Programming" and "Programming": preemption, synchronization, programming languages, objects, communication, races, types, determinacy, asynchrony, threads, design patterns, variables, tasks, deadlock, semaphores, complexity, data structures, processors, algorithms, loops, memory management, design-by-contract
28 an alternative? Arvind, CSAIL, Massachusetts Institute of Technology Workshop on Directions in Multicore Programming Education, March 8,
29 the larger conversation? These are exciting times. Computing is losing its North Star. (Figure labels: computer architects, compiler writers, programmers, language designers.) Let's imagine that most computers have scores or hundreds of general-purpose cores. (And let's say we do not know exactly how many there are, or how they are connected.) Or programmable logic, or combinations of both. How would that affect... our idea of an algorithm? the languages and abstractions we use to program such a machine? the concept of a module? the things we need to do to validate programs? the techniques for making programs efficient? the concepts and the implementation of an operating system? the way we teach programming?
30 Thanks! Johan Eker (Ericsson), Edward A. Lee (UC Berkeley), Marco Mattavelli (EPFL), Carl von Platen (Ericsson), Dave B. Parlour (Tabula), Ian Miller (Overture), Mickaël Raulet (INSA Rennes), Mathieu Wipliez (INSA Rennes), Christophe Lucarz (EPFL), Shuvra S Bhattacharyya (U Maryland), Karl-Erik Årzén (LTH), Ruirui Gu (Microsoft), Endri Bezati (EPFL), Simone Casale Brunet (EPFL), Ghislain Roquier (EPFL), and many others. opendf.org (Open Dataflow portal); opendf.sf.net (open source tool suite); orcc.sf.net (Open RVC-CAL Compiler); ACTORS Project; mpeg.chiariglione.org (MPEG); jwj@cs.lth.se
More informationMoore s Law. Computer architect goal Software developer assumption
Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer
More informationOutline Introduction System development Video capture Image processing Results Application Conclusion Bibliography
Real Time Video Capture and Image Processing System using FPGA Jahnvi Vaidya Advisors: Dr. Yufeng Lu and Dr. In Soo Ahn 4/30/2009 Outline Introduction System development Video capture Image processing
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationHigh Performance Computing. University questions with solution
High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The
More informationCHAPTER 1 Introduction
CHAPTER 1 Introduction 1.1 Overview 1 1.2 The Main Components of a Computer 3 1.3 An Example System: Wading through the Jargon 4 1.4 Standards Organizations 13 1.5 Historical Development 14 1.5.1 Generation
More informationPublished in: Proceedings of the 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems
Aalborg Universitet SystemC-AMS SDF Model Synthesis for Exploration of Heterogeneous Architectures Popp, Andreas; Herrholz, Andreas; Gruettner, Kim; Le Moullec, Yannick; Koch, Peter; Nebel, Wolfgang Published
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More information