Application-Platform Mapping in Multiprocessor Systems-on-Chip
|
|
- Baldric Newton
- 6 years ago
- Views:
Transcription
1 Application-Platform Mapping in Multiprocessor Systems-on-Chip Leandro Soares Indrusiak CREDES Kick-off Meeting Tallinn - June 2009
2 Application-Platform Mapping in MPSoC L. S. Indrusiak Outline Motivation Application-Platform Mapping Multiprocessor System-on-Chip Platforms Application Modelling System Validation Conclusions and Future Work 2
3 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation Application-Platform mapping isn t really a problem when dealing with sequential software and uniprocessor hardware application platform 3
4 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation It has been studied for decades by researchers in parallel and distributed computing platform 1 platform 3 platform 2 platform 4 application application application application application
5 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation CMOS limits can be felt in many ways number of transistors clock frequency power dissipation Expectations of performance increase must be met somehow multiprocessing seems to be the best shot transistors (x 10 3 ) clock freq. (MHz) power (W)
6 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation application platform Application-Platform mapping is a critical issue when it comes to multiprocessor platforms? IDLE IDLE IDLE REALLY? 6
7 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation Sources: Gordon ASPLOS 06, Kudlur PLDI Unicore Homogeneous Multicore Heterogeneous Multicore PicoChip AMBRIC CISCO CSR1 NVIDIA G80 Larrabee # cores/chip C/C++/Java??? Opteron 4P AMD Fusion BCM 1480 Core2Quad Xbox 360 Xeon 4004 Power4 PA8800 Power Opteron Pentium P2 P3 P4 Core2Duo CoreDuo RAW Athlon Niagara RAZA XLR Core Itanium Itanium2 Cavium Cell
8 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation application platform Application-Platform mapping is a critical issue when it comes to multiprocessor platforms must be able to explore concurrency at the application level 8
9 Application-Platform Mapping in MPSoC L. S. Indrusiak Motivation application platform Application-Platform mapping is a critical issue when it comes to multiprocessor platforms can be done dynamically to improve performance or to increase dependability 9
10 Application-Platform Mapping in MPSoC L. S. Indrusiak Application-Platform Mapping The simplest formulation of the mapping problem resembles a graph isomorphism problem application is a graph G = G(A, C), where a i A is an application task and c i,j C represents the communication from a i to a j platform is a graph G = G(B, D), where b i B is a processor and d i,j D represents the channel from b i to b j objective is to map tasks onto processors such that task that communicate with each other lie on adjacent processors There is no known polynomial time solution for the graph isomorphism problem 10
11 Application-Platform Mapping Application-Platform Mapping in MPSoC L. S. Indrusiak More sophisticated problem formulations may include additional information to the application and platform graphs, so that different objectives can be met platform: processing power of b i power consumption and latency of d i,j application: computational cost and deadline of a i volume, max latency and required bandwidth of c i,j objectives: reduce bandwidth requirements on the platform, reduce power consumption, balance thermal dissipation, etc. 11
12 Application-Platform Mapping in MPSoC L. S. Indrusiak Application-Platform Mapping Platform models over-simplify the complex design space of on-chip multiprocessor platforms All approaches disregard the temporal dimension application and platform models must include further details on time and concurrency 12
13 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Many architectural alternatives homogeneous vs. heterogeneous processing shared memory vs. distributed memory on-chip interconnect point-to-point, crossbar on-chip bus network-on-chip source: Intel 13
14 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Cell Processor It has nine processors: one Power Processor Element (P) and eight Synergistic Processing Elements (S) P has separated I & D L1 cache (32KB each) Each S can only access its 256KB of local storage (LS) and uses its memory flow controller (MFE) to perform DMA operations to/from LS (non-blocking) Bandwidth (3.2 GHz) S <-> LS = 2x 25.6 GB/s MFC <-> EIB = 2x 25.6 GB/s MIC <-> EIB = 2x 25.6 GB/s L1 <-> L2 = 2x 51.2 GB/s 14
15 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms ARM Cortex-9 MPCore Can have 1-4 Cortex-A9 cores with separated I & D L1 cache (32KB each) Data caches of all cores are fully coherent Interface to external components be made cache-coherent On-chip interconnect based on AMBA standard 15
16 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms As the number of cores increase, on-chip communication becomes a major issue Source: ITRS Roadmap 16
17 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Current solution: reusability standard processors reusable IP cores reusable on-chip interconnects OS application middleware OS OS OS reusable interconnect Networks-on-Chip (NoC) multi-hop, packet-based interconnect R R R scalable, high bandwidth low power (shorter wires) R R R regular, reusable R R R 17
18 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Networks-on-Chip: design decisions topology switching packetisation arbitration flow control buffering routing R R R R R R R R R R R R R R R R R 18
19 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Networks-on-Chip for 3D architectures all previous alternatives and more may integrate dies produced with different technologies asymmetry between horizontal and vertical channels increased locality thermal issues become critical load balancing 19
20 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Design decisions must be evaluated using abstract models of the platform cycle-accurate cycle-approximate untimed/functional structural/graph Modeling trade-offs accuracy observability speed 20
21 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Cycle-accurate and cycle-approximate models flit-level accuracy observability: buffer occupation, latency, throughput,... graphical interface to experiment with topology, routing, buffering,... 21
22 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Coding techniques to reduce bit transition activity over NoC interconnects experimentation of alternative coding techniques aiming to reduce the power consumption in NoC interconnects reductions of up to 60% of bit transition were achieved 22
23 Application-Platform Mapping in MPSoC L. S. Indrusiak Multiprocessor System-on-Chip Platforms Simplified models packet-level accuracy, 1-position buffers complete packet is abstracted by header and trailler header s way through the network is fully simulated, trailler latencies are calculated less than 5% error for average latency (compared with cycle-accurate HDL model) fully analytical latencies of packet delivery are calculated upon packet creation and refined until packet delivery function of network occupation, route, buffer sizes, switching and arbitration techniques real-time analysis static analysis of worst-case interference between network flows 23
24
25
26
27
28
29
30
31 Application-Platform Mapping in MPSoC Application Modeling Compilers can only go so far on extracting parallelism from sequential code Developers must have the means to specify the inherent concurrency of each particular application avoiding pitfalls such as deadlocks or undesired non-determinism application L. S. Indrusiak platform IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE IDLE 31
32 Application-Platform Mapping in MPSoC Application Modeling application L. S. Indrusiak application Many concurrent programming models available threads concurrent processes with message-passing actors streams/dataflow CSP timed automata <add your favorite here> 32
33 Application-Platform Mapping in MPSoC L. S. Indrusiak Application Modeling The choice of which concurrent programming model should be adopted depends on application domain familiarity by the designer/developer availability of stable flows and tools model tools threads libraries in Java and C++ message passing OpenMP actors Simulink, Ptolemy streams / dataflow StreamIt, CUDA timed automata UPAAL 33
34 Application-Platform Mapping in MPSoC L. S. Indrusiak Application Modeling period=4ms period=1ms Application models are abstract representations of a program, which can be used both at design time and at runtime A period=2ms B D can be strongly influenced by the programming model A1 A2 D1 B1 D2 C period=1ms A3 A4 D3 B2 C1 D4 34
35 Application-Platform Mapping in MPSoC L. S. Indrusiak Application Modeling Executable models fast execution rich set of modeling constructs, different abstraction levels clearly defined execution semantics, so that the dynamic behavior of the application can be properly validated Formal properties models of concurrent computation concurrency constraint through the definition of ordering relations polymorphic type systems 35
36 Application-Platform Mapping in MPSoC L. S. Indrusiak Concurrent Models of Computation Actor orientation, as proposed by E. A. Lee, UC Berkeley revisited fundamental concepts from Atkinson & Hewitt (MIT, 1977) and Gul Agha (MIT, 1986) Actor orientation basics execution semantics of a given system model was factored out from the individual components of that model execution semantics is associated to a well defined Model of Computation (MoC) heterogeneous models can be created through the hierarchical composition of MoCs 36
37 Application-Platform Mapping in MPSoC L. S. Indrusiak Actor-orientation execution semantic of the top level model execution semantic of the composite model 37
38 Application-Platform Mapping in MPSoC L. S. Indrusiak Application Modeling Extending actor-orientation with types and explicit ordering UML suitable visual representation for the definition of polymorphic type systems (class diagrams) and ordering relations (sequence diagram) but UML is not an executable specification language? 38
39 Application-Platform Mapping in MPSoC L. S. Indrusiak UML sequence diagrams within actors Recall the formal definition of MSC tuple P, E, C, l, m, < where P is a finite set of processes E is a finite set of events C is a finite set of names for messages l: E T = { p!q(a), p?q(a), p(a) p q P, a C } m: S R < E E is a acyclic relation between events consisting of: a total order on EP for every p P, and s < r, whenever m(s)=r 39
40 Application-Platform Mapping in MPSoC L. S. Indrusiak UML sequence diagrams within actors Recall the formal definition of MSC order of events (message occurrence) within a process (lifeline) is a total order the reflexive-transitive closure of < (denoted as <*) is a partial order on the complete set E enough for the definition of an untimed model of computation different possibilities were explored and integrated as a library of directors on an extended version of Ptolemy II 40
41 Application-Platform Mapping in MPSoC L. S. Indrusiak Case study Application modeling based on behavioral patterns and polymorphic type systems application functionality described using actors constraints to concurrent execution described using UML 41
42 Application-Platform Mapping in MPSoC L. S. Indrusiak System Validation Joint validation of application and platform models Application is backannotated with communication costs and power consumption data from the execution of platform models MAPR 42
43 Application-Platform Mapping in MPSoC L. S. Indrusiak System Validation Mapping heuristic becomes a design decision experiments show that in large multiprocessor platforms, the average communication latency of a given application can vary by 2 orders of magnitude according to the employed mapping heuristic MAPR 43
44 Application-Platform Mapping in MPSoC L. S. Indrusiak Case study Application and platform modeling on Ptolemy II multiple MoCs supports the evaluation of how different platforms execute a given application supports the evaluation of different mapping heuristics synthetic example: autonomous vehicle 44
45 Application-Platform Mapping in MPSoC L. S. Indrusiak Conclusions and Future Work Extensions to the state-of-the art on system-level specification and validation combination of formalism, simulation/execution and analytical solution Extensions on application and platform models allow for richer mapping mechanisms temporal constraint based on concurrent MoC and explicit ordering spacial constraint based on functional types 45
46 Application-Platform Mapping in MPSoC L. S. Indrusiak Conclusions and Future Work Further progress must be done in mapping to fully explore information in application and platform models time and concurrency functional constraints Must explore different implementation styles, aiming to achieve more reliable platforms static vs. dynamic mapping centralised vs. distributed mapping 46
CS758: Multicore Programming
CS758: Multicore Programming Introduction Fall 2009 1 CS758 Credits Material for these slides has been contributed by Prof. Saman Amarasinghe, MIT Prof. Mark Hill, Wisconsin Prof. David Patterson, Berkeley
More informationSaman Amarasinghe and Rodric Rabbah Massachusetts Institute of Technology
Saman Amarasinghe and Rodric Rabbah Massachusetts Institute of Technology http://cag.csail.mit.edu/ps3 6.189-chair@mit.edu A new processor design pattern emerges: The Arrival of Multicores MIT Raw 16 Cores
More informationDistributed Operation Layer Integrated SW Design Flow for Mapping Streaming Applications to MPSoC
Distributed Operation Layer Integrated SW Design Flow for Mapping Streaming Applications to MPSoC Iuliana Bacivarov, Wolfgang Haid, Kai Huang, and Lothar Thiele ETH Zürich MPSoCs are Hard to program (
More informationCellSs Making it easier to program the Cell Broadband Engine processor
Perez, Bellens, Badia, and Labarta CellSs Making it easier to program the Cell Broadband Engine processor Presented by: Mujahed Eleyat Outline Motivation Architecture of the cell processor Challenges of
More informationReal-Time Mixed-Criticality Wormhole Networks
eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks
More informationIBM Cell Processor. Gilbert Hendry Mark Kretschmann
IBM Cell Processor Gilbert Hendry Mark Kretschmann Architectural components Architectural security Programming Models Compiler Applications Performance Power and Cost Conclusion Outline Cell Architecture:
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More information4. Shared Memory Parallel Architectures
Master rogram (Laurea Magistrale) in Computer cience and Networking High erformance Computing ystems and Enabling latforms Marco Vanneschi 4. hared Memory arallel Architectures 4.4. Multicore Architectures
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationCOSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors
COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors Edgar Gabriel Fall 2018 References Intel Larrabee: [1] L. Seiler, D. Carmean, E.
More informationMapping of Real-time Applications on
Mapping of Real-time Applications on Network-on-Chip based MPSOCS Paris Mesidis Submitted for the degree of Master of Science (By Research) The University of York, December 2011 Abstract Mapping of real
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationNetwork on Chip Architecture: An Overview
Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationComp. Org II, Spring
Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel
More informationBlue-Steel Ray Tracer
MIT 6.189 IAP 2007 Student Project Blue-Steel Ray Tracer Natalia Chernenko Michael D'Ambrosio Scott Fisher Russel Ryan Brian Sweatt Leevar Williams Game Developers Conference March 7 2007 1 Imperative
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationParallel Processing & Multicore computers
Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12
More informationA unified multicore programming model
A unified multicore programming model Simplifying multicore migration By Sven Brehmer Abstract There are a number of different multicore architectures and programming models available, making it challenging
More informationExperiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor
Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain
More informationParallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Spring 2 Parallelization Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Outline Why Parallelism Parallel Execution Parallelizing Compilers
More informationCell Broadband Engine. Spencer Dennis Nicholas Barlow
Cell Broadband Engine Spencer Dennis Nicholas Barlow The Cell Processor Objective: [to bring] supercomputer power to everyday life Bridge the gap between conventional CPU s and high performance GPU s History
More informationIntroduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation
Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,
More informationOutline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities
Parallelization Outline Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Moore s Law From Hennessy and Patterson, Computer Architecture:
More informationMulticore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.
CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous
More informationComp. Org II, Spring
Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer
More informationMultithreading: Exploiting Thread-Level Parallelism within a Processor
Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced
More informationShared Memory Parallel Programming. Shared Memory Systems Introduction to OpenMP
Shared Memory Parallel Programming Shared Memory Systems Introduction to OpenMP Parallel Architectures Distributed Memory Machine (DMP) Shared Memory Machine (SMP) DMP Multicomputer Architecture SMP Multiprocessor
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationCSE 392/CS 378: High-performance Computing - Principles and Practice
CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer
More informationMEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS
MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS INSTRUCTOR: Dr. MUHAMMAD SHAABAN PRESENTED BY: MOHIT SATHAWANE AKSHAY YEMBARWAR WHAT IS MULTICORE SYSTEMS? Multi-core processor architecture means placing
More informationAchieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation
Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation
More informationNoC Simulation in Heterogeneous Architectures for PGAS Programming Model
NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe
More informationFault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson
Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationThe SARC Architecture
The SARC Architecture Polo Regionale di Como of the Politecnico di Milano Advanced Computer Architecture Arlinda Imeri arlinda.imeri@mail.polimi.it 19-Jun-12 Advanced Computer Architecture - The SARC Architecture
More informationConcurrent Models of Computation
Concurrent Models of Computation Edward A. Lee Robert S. Pepper Distinguished Professor, UC Berkeley EECS 219D: Concurrent Models of Computation Fall 2011 Copyright 2011, Edward A. Lee, All rights reserved
More informationOVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI
CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationTrends and Challenges in Multicore Programming
Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores
More informationA Thermal-aware Application specific Routing Algorithm for Network-on-chip Design
A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong
More informationAn Extension of the StarSs Programming Model for Platforms with Multiple GPUs
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs Eduard Ayguadé 2 Rosa M. Badia 2 Francisco Igual 1 Jesús Labarta 2 Rafael Mayo 1 Enrique S. Quintana-Ortí 1 1 Departamento
More informationCompositionality in system design: interfaces everywhere! UC Berkeley
Compositionality in system design: interfaces everywhere! Stavros Tripakis UC Berkeley DREAMS Seminar, Mar 2013 Computers as parts of cyber physical systems cyber-physical ~98% of the world s processors
More informationModeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano
Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market
More informationCDA3101 Recitation Section 13
CDA3101 Recitation Section 13 Storage + Bus + Multicore and some exam tips Hard Disks Traditional disk performance is limited by the moving parts. Some disk terms Disk Performance Platters - the surfaces
More informationTeleport Messaging for. Distributed Stream Programs
Teleport Messaging for 1 Distributed Stream Programs William Thies, Michal Karczmarek, Janis Sermulins, Rodric Rabbah and Saman Amarasinghe Massachusetts Institute of Technology PPoPP 2005 http://cag.lcs.mit.edu/streamit
More informationThe Shift to Multicore Architectures
Parallel Programming Practice Fall 2011 The Shift to Multicore Architectures Bernd Burgstaller Yonsei University Bernhard Scholz The University of Sydney Why is it important? Now we're into the explicit
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationCSC501 Operating Systems Principles. OS Structure
CSC501 Operating Systems Principles OS Structure 1 Announcements q TA s office hour has changed Q Thursday 1:30pm 3:00pm, MRC-409C Q Or email: awang@ncsu.edu q From department: No audit allowed 2 Last
More informationGiorgio Buttazzo. Scuola Superiore Sant Anna, Pisa. The transition
Giorgio Buttazzo Scuola Superiore Sant Anna, Pisa The transition On May 7 th, 2004, Intel, the world s largest chip maker, canceled the development of the Tejas processor, the successor of the Pentium4-style
More informationHybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University
Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects
More informationParallelism in Hardware
Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationChapter 2 Designing Crossbar Based Systems
Chapter 2 Designing Crossbar Based Systems Over the last decade, the communication architecture of SoCs has evolved from single shared bus systems to multi-bus systems. Today, state-of-the-art bus based
More informationParallel Computing. Parallel Computing. Hwansoo Han
Parallel Computing Parallel Computing Hwansoo Han What is Parallel Computing? Software with multiple threads Parallel vs. concurrent Parallel computing executes multiple threads at the same time on multiple
More informationnforce 680i and 680a
nforce 680i and 680a NVIDIA's Next Generation Platform Processors Agenda Platform Overview System Block Diagrams C55 Details MCP55 Details Summary 2 Platform Overview nforce 680i For systems using the
More informationECE/CS 757: Advanced Computer Architecture II Interconnects
ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationParallel Programming Multicore systems
FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationHardware-Software Codesign. 1. Introduction
Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2
More informationComputer Architecture 计算机体系结构. Lecture 9. CMP and Multicore System 第九讲 片上多处理器与多核系统. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 9. CMP and Multicore System 第九讲 片上多处理器与多核系统 Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2017 Review Classification of parallel architectures Shared-memory system Cache coherency
More informationNetwork-on-Chip Architecture
Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers
William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #8 2/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationModelling and simulation of guaranteed throughput channels of a hard real-time multiprocessor system
Modelling and simulation of guaranteed throughput channels of a hard real-time multiprocessor system A.J.M. Moonen Information and Communication Systems Department of Electrical Engineering Eindhoven University
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationA TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE
A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationMapping and Configuration Methods for Multi-Use-Case Networks on Chips
Mapping and Configuration Methods for Multi-Use-Case Networks on Chips Srinivasan Murali, Stanford University Martijn Coenen, Andrei Radulescu, Kees Goossens, Giovanni De Micheli, Ecole Polytechnique Federal
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationModule 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:
The Lecture Contains: Four Organizations Hierarchical Design Cache Coherence Example What Went Wrong? Definitions Ordering Memory op Bus-based SMP s file:///d /...audhary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture10/10_1.htm[6/14/2012
More informationOpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April
More informationNetworks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.
Networks for Multi-core hips A A ontrarian View Shekhar Borkar Aug 27, 2007 Intel orp. 1 Outline Multi-core system outlook On die network challenges A simple contrarian proposal Benefits Summary 2 A Sample
More informationPart IV: 3D WiNoC Architectures
Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures
More informationPtolemy II The automotive challenge problems version 4.1
Ptolemy II The automotive challenge problems version 4.1 Johan Eker Edward Lee with thanks to Jie Liu, Paul Griffiths, and Steve Neuendorffer MoBIES Working group meeting, 27-28 September 2001, Dearborn
More informationECE/CS 250 Computer Architecture. Summer 2016
ECE/CS 250 Computer Architecture Summer 2016 Multicore Dan Sorin and Tyler Bletsch Duke University Multicore and Multithreaded Processors Why multicore? Thread-level parallelism Multithreaded cores Multiprocessors
More informationArquitecturas y Modelos de. Multicore
Arquitecturas y Modelos de rogramacion para Multicore 17 Septiembre 2008 Castellón Eduard Ayguadé Alex Ramírez Opening statements * Some visionaries already predicted multicores 30 years ago And they have
More informationTHREAD LEVEL PARALLELISM
THREAD LEVEL PARALLELISM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 is due on Dec. 11 th This lecture
More informationFPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)
FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor
More informationHardware/Software Co-design
Hardware/Software Co-design Zebo Peng, Department of Computer and Information Science (IDA) Linköping University Course page: http://www.ida.liu.se/~petel/codesign/ 1 of 52 Lecture 1/2: Outline : an Introduction
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationInterconnection Networks
Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationSony/Toshiba/IBM (STI) CELL Processor. Scientific Computing for Engineers: Spring 2008
Sony/Toshiba/IBM (STI) CELL Processor Scientific Computing for Engineers: Spring 2008 Nec Hercules Contra Plures Chip's performance is related to its cross section same area 2 performance (Pollack's Rule)
More informationRoadrunner. By Diana Lleva Julissa Campos Justina Tandar
Roadrunner By Diana Lleva Julissa Campos Justina Tandar Overview Roadrunner background On-Chip Interconnect Number of Cores Memory Hierarchy Pipeline Organization Multithreading Organization Roadrunner
More informationLecture 8. The StreamIt Language IAP 2007 MIT
6.189 IAP 2007 Lecture 8 The StreamIt Language 1 6.189 IAP 2007 MIT Languages Have Not Kept Up C von-neumann machine Modern architecture Two choices: Develop cool architecture with complicated, ad-hoc
More informationParallel and Distributed Computing
Parallel and Distributed Computing NUMA; OpenCL; MapReduce José Monteiro MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer Science and Engineering
More informationLecture 2: Memory Systems
Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer
More informationA Simplified Executable Model to Evaluate Latency and Throughput of Networks-on-Chip
A Simplified Executable Model to Evaluate Latency and Throughput of Networks-on-Chip Leandro Möller Luciano Ost, Leandro Soares Indrusiak Sanna Määttä Fernando G. Moraes Manfred Glesner Jari Nurmi {ost,
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationParallel Exact Inference on the Cell Broadband Engine Processor
Parallel Exact Inference on the Cell Broadband Engine Processor Yinglong Xia and Viktor K. Prasanna {yinglonx, prasanna}@usc.edu University of Southern California http://ceng.usc.edu/~prasanna/ SC 08 Overview
More informationAdvanced Tool Architectures. Edited and Presented by Edward A. Lee, Co-PI UC Berkeley. Tool Projects. Chess Review May 10, 2004 Berkeley, CA
Advanced Tool Architectures Edited and Presented by Edward A. Lee, Co-PI UC Berkeley Chess Review May 10, 2004 Berkeley, CA Tool Projects Concurrent model-based design Giotto (Henzinger) E machine & S
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More information