Issues in Multiprocessors


Issues in Multiprocessors
- Which programming model for interprocessor communication?
  - shared memory: regular loads & stores
  - message passing: explicit sends & receives
- Which execution model?
  - control parallel: identify & synchronize different asynchronous threads
  - data parallel: same operation on different parts of the shared data space
Winter 2006 CSE 548 - Multiprocessors 1
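The two communication models above can be contrasted in a miniature sketch (Python's threading and queue modules stand in for real hardware; the producer/consumer names are invented for illustration): shared memory communicates through ordinary stores and loads to a common location, while message passing communicates only through explicit send (put) and receive (get) operations.

```python
import threading
import queue

# Shared memory: communicate by storing to and loading from shared data.
shared = {"value": None}
ready = threading.Event()

def sm_producer():
    shared["value"] = 42         # a plain store to shared data
    ready.set()                  # synchronization orders the accesses

def sm_consumer(out):
    ready.wait()
    out.append(shared["value"])  # a plain load observes the stored value

# Message passing: communicate by explicit send/receive; no shared data.
chan = queue.Queue()

def mp_producer():
    chan.put(42)                 # explicit send

def mp_consumer(out):
    out.append(chan.get())       # explicit receive

for prod, cons in [(sm_producer, sm_consumer), (mp_producer, mp_consumer)]:
    result = []
    t1 = threading.Thread(target=prod)
    t2 = threading.Thread(target=cons, args=(result,))
    t2.start(); t1.start()
    t1.join(); t2.join()
    assert result == [42]
```

Both versions move the same value between threads; only the abstraction differs, which is exactly the distinction the slide draws.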

Issues in Multiprocessors
- How to express parallelism
  - language support: HPF, ZPL
  - runtime library constructs: coarse-grain, explicitly parallel C programs
  - automatic (compiler) detection: implicitly parallel C & Fortran programs, e.g., the SUIF & PTRAN compilers
- Algorithm development
  - embarrassingly parallel programs can be easily parallelized
  - development of different algorithms for the same problem

Issues in Multiprocessors
- How to get good parallel performance
  - recognize parallelism
  - transform programs to increase parallelism without decreasing processor locality
  - decrease sharing costs

Flynn Classification
- SISD: single instruction stream, single data stream
  - single-context uniprocessors
- SIMD: single instruction stream, multiple data streams
  - exploits data parallelism
  - example: Thinking Machines CM-1
- MISD: multiple instruction streams, single data stream
  - systolic arrays
  - example: Intel iWarp, streaming processors
- MIMD: multiple instruction streams, multiple data streams
  - multiprocessors, multithreaded processors
  - parallel programming & multiprogramming
  - relies on control parallelism: execute & synchronize different asynchronous threads of control
  - example: most processor companies offer MP configurations
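The SIMD/MIMD distinction can be sketched in a few lines (a hypothetical illustration, not real vector hardware): SIMD applies one operation to many data elements in lockstep, while MIMD runs independent instruction streams, each on its own data.

```python
import threading

# SIMD flavor: one instruction (the add), many data elements.
def simd_add(xs, ys):
    return [x + y for x, y in zip(xs, ys)]   # same operation per element

# MIMD flavor: different instruction streams on different data.
def mimd_run(tasks):
    results = {}
    def worker(name, fn, arg):
        results[name] = fn(arg)              # each thread runs its own code
    threads = [threading.Thread(target=worker, args=(n, f, a))
               for n, f, a in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

assert simd_add([1, 2], [10, 20]) == [11, 22]
out = mimd_run([("sq", lambda x: x * x, 3), ("neg", lambda x: -x, 3)])
assert out == {"sq": 9, "neg": -3}
```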

CM-1 [figure]

Systolic Array [figure]

MIMD: Low-end
- bus-based: simple, but the bus is a bottleneck
- simple cache coherency protocol
- physically centralized memory
- uniform memory access (UMA machine)
- examples: Sequent Symmetry, SPARCCenter, Alpha-, PowerPC- or SPARC-based servers

Low-end MP [figure]

MIMD: High-end
- higher-bandwidth, multiple-path interconnect
  - more scalable
  - more complex cache coherency protocol (if shared memory)
  - longer latencies
- physically distributed memory
- non-uniform memory access (NUMA machine)
- could have processor clusters
- examples: SGI Challenge, Convex Exemplar, Cray T3D, IBM SP-2, Intel Paragon

High-end MP [figure]

Comparison of Issue Capabilities [figure]

MIMD Programming Models
- Address space organization for physically distributed memory
  - distributed shared memory: 1 global address space
  - multicomputers: private address space per processor
- Inter-processor communication
  - shared memory: accessed via load/store instructions
    (SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1 & -2)
  - message passing: explicit communication by sending/receiving messages
    (TMC CM-5, Intel Paragon, IBM SP-2)

Shared Memory vs. Message Passing
Shared memory
+ simple parallel programming model
  - global shared address space
  - need not worry about data locality, but get better performance when the program is written with data placement in mind
    - lower latency when data is local
    - can do data placement if it is crucial, but don't have to
  - hardware maintains data coherence
    - synchronize to order processors' accesses to shared data
  - like uniprocessor code, so parallelizing by programmer or compiler is easier
    - can focus on program semantics, not interprocessor communication
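The "synchronize to order accesses" point deserves a concrete sketch (a standard shared-counter illustration, not from the slides): hardware coherence guarantees every processor sees the latest stored value, but the programmer must still synchronize so that read-modify-write sequences on shared data do not interleave.

```python
import threading

counter = {"n": 0}
lock = threading.Lock()

def add_many(times):
    for _ in range(times):
        with lock:              # synchronization orders the read-modify-write;
            counter["n"] += 1   # without it, two threads could load the same
                                # old value and one update would be lost

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter["n"] == 4000
```

Coherence makes the stores visible; the lock makes them atomic. The two are separate obligations, which is why shared-memory code still needs explicit synchronization.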

Shared Memory vs. Message Passing
Shared memory (continued)
+ low latency (no message-passing software)
  but: overlap of communication & computation and other latency-hiding techniques can be applied to message-passing machines too
+ higher bandwidth for small transfers
  but: usually the only choice

Shared Memory vs. Message Passing
Message passing
+ abstraction in the programming model encapsulates the communication costs
  but: more complex programming model (additional language constructs); need to program for nearest-neighbor communication
+ no coherency hardware
+ good throughput on large transfers
  but: what about small transfers?
+ more scalable (memory latency doesn't scale with the number of processors)
  but: large-scale SM has distributed memory also
  hah! so you're going to adopt the message-passing model?
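"Programming for nearest-neighbor communication" can be sketched with per-processor queues standing in for the network links (the ring_shift name and the queue-based links are invented for illustration): each simulated processor sends only to its right neighbor in a ring and receives only from its left neighbor.

```python
import queue

def ring_shift(values):
    """Each 'processor' i sends its value to neighbor (i+1) % p and
    receives from neighbor (i-1) % p -- nearest-neighbor communication."""
    p = len(values)
    links = [queue.Queue() for _ in range(p)]   # one inbox per processor
    for i, v in enumerate(values):
        links[(i + 1) % p].put(v)               # explicit send to right neighbor
    return [links[i].get() for i in range(p)]   # explicit receive from left

assert ring_shift([10, 20, 30]) == [30, 10, 20]
```

All communication is explicit and local to a neighbor, which is the structure the programmer must impose under the message-passing model.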

Shared Memory vs. Message Passing
Why there was a debate
- little experimental data
- implementation was not separated from the programming model
- one paradigm can be emulated with the other
Who won?
- MP on an SM machine: message buffers in local (to each processor) memory; copy messages by ld/st between buffers
- SM on an MP machine: each ld/st becomes a message copy... sloooooooooow
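The efficient direction of the emulation, message passing on a shared-memory machine, can be sketched in a few lines (the Mailbox class is invented for illustration): the message buffer is ordinary memory, and send/receive are just synchronized stores into and loads out of that buffer.

```python
import threading

class Mailbox:
    """Message passing emulated on shared memory: the buffer lives in
    ordinary (receiver-local) memory; send/receive are synchronized
    loads and stores that copy the message between buffers."""
    def __init__(self):
        self.buf = []                      # message buffer in 'local' memory
        self.cv = threading.Condition()

    def send(self, msg):
        with self.cv:
            self.buf.append(list(msg))     # copy the message in by stores
            self.cv.notify()

    def recv(self):
        with self.cv:
            while not self.buf:
                self.cv.wait()
            return self.buf.pop(0)         # copy the message out by loads

box = Mailbox()
got = []
t = threading.Thread(target=lambda: got.append(box.recv()))
t.start()
box.send([1, 2, 3])
t.join()
assert got == [[1, 2, 3]]
```

The reverse emulation is what the slide ridicules: turning every individual load and store into a message makes ordinary code crawl.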