What is a parallel computer?


Slide 1: What is a parallel computer?
7.5 credit points. Instructor: Sally A. McKee.
[Figure: an IBM SP-2 node. A POWER2 CPU with an L2 cache sits on a memory bus with a 4-way interleaved memory controller and DRAM; a MicroChannel bus carries I/O, DMA, and a network interface card (NIC) with its own i860 processor and DRAM. Nodes are connected by a general interconnection network formed from 8-port switches.]

Slide 2: What is a parallel computer?
A parallel computer is a collection of processing elements that cooperate to solve large problems (fast).

Slide 3: Why parallel computers?
- Technology trends
- New, performance-demanding applications
- Killer microprocessors
- Economics: a collection of killer microprocessors (integrated on the same chip)

Slide 4: Broad issues
A parallel computer is a collection of processing elements that cooperate to solve large problems (fast).
- Programming issues: What programming model? What system interface?
- Architectural model issues: How big a collection? How powerful are the elements? How do the elements cooperate?
- Performance cross-cutting issues: Impact of system design tradeoffs on application performance; impact of application design on performance

Slide 5: Goal and overview
The goal of this course is to provide knowledge on:
- Programming models and techniques for the design of high-performance parallel programs: the data parallel model, the shared address-space model, and the message-passing model
- Design principles for parallel computers: small-scale system design tradeoffs, scalable system design tradeoffs, and interconnection networks

Slide 6: Course requirements and material
Course requirements:
- Labs and approved projects
- Final exam (written: concepts and problem solving)
Course material:
- Textbook: Culler, Singh, and Gupta, Parallel Computer Architecture: A Hardware/Software Approach (available at Cremona)
- Draft chapters from a forthcoming text (volunteers needed for class testing!)
- Solutions manual (from the web site)
- Copies of slides (from the web site)
Course web pages (visit regularly):
http://www.cse.chalmers.se/edu/course/eda281
http://www.cse.chalmers.se/edu/year/2009/course/eda281/

Slide 7: Overview of parallel computer technology
What is it? What is it for? What are the issues?
- Driving forces behind parallel computers (1.1)
- Evolution behind today's parallel computers (1.2)
- Fundamental design issues (1.3)
- Methodology for designing parallel programs (2.1-2.2)

Slide 8: Three driving forces
- Application demands (coarse-grain parallelism abounds): scientific computing (e.g., modeling of phenomena in science), engineering computing (e.g., CAD and design analysis), and commercial computing (e.g., media and information processing)
- Technology trends: transistor density growth is high; clock frequency improvement is moderate
- Architecture trends: diminishing returns on instruction-level parallelism

Slide 9: Parallelism in sequential programs

    for i = 0 to N-1
        a[(i+1) mod N] := b[i] + c[i];
    for i = 0 to N-1
        d[i] := C * a[i];

[Figure: data dependences between the loops. Iterations 0, 1, ..., N-1 of loop 1 write a[1], a[2], ..., a[0]; the same iterations of loop 2 read a[0], a[1], ..., a[N-1]. Iteration i of loop 2 therefore depends on iteration (i-1) mod N of loop 1.]

A sequential program on a superscalar processor:
- Programming model: sequential
- Architecture: instruction-level parallelism, register (memory) communication, pipeline interlocking for synchronization
The gap between model and architecture has increased.

Slide 10: A parallel programming model
Extended semantics to express:
- units of parallelism at the instruction level, thread level, or program level
- communication and coordination between units of parallelism at the register level, memory level, or I/O level

Slide 11: Programming model vs. parallel architecture
[Figure: layered view. Parallel applications (CAD, databases, scientific modeling) and multiprogramming at the top; programming models (shared address, message passing, data parallel) below them; compiler or library and operating system support sitting on the communication abstraction, which forms the user/system boundary; communication hardware below that, at the hardware/software boundary, over the physical communication medium.]
Three key concepts:
- Communication abstraction: supports the programming models
- Communication architecture: the ISA plus primitives for communication and synchronization
- Hardware/software boundary: defines which parts of the communication architecture are implemented in hardware and which in software

Slide 12: Shared address space (SAS) model
Programming model:
- Parallelism among parts of a program, called threads
- Communication and coordination among threads through a shared global address space

    for_all i = 0 to P-1
        for j = i0[i] to iN[i]
            a[(j+1) mod N] := b[j] + c[j];
    barrier;
    for_all i = 0 to P-1
        for j = i0[i] to iN[i]
            d[j] := C * a[j];

The communication abstraction is supported by the HW/SW interface.
[Figure: several processors (P) attached to a single shared memory.]
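To make the SAS model concrete, here is a minimal C/pthreads sketch of slide 12's pseudocode (an illustration, not part of the original slides). The thread count P, the array size N, and the even block decomposition into [i0, iN] ranges are assumptions chosen for the example; the file-scope arrays play the role of the shared global address space, and pthread_barrier_wait plays the role of barrier.

    /* Sketch of the shared address space (SAS) model of slide 12 using
     * POSIX threads. P, N, and the block decomposition are assumptions
     * made for illustration. Compile with: cc -pthread sas.c */
    #include <pthread.h>
    #include <stdio.h>

    #define P 4          /* number of threads */
    #define N 16         /* number of array elements (multiple of P) */
    #define CONST 3.0    /* the constant C of the example */

    static double a[N], b[N], c[N], d[N];   /* shared global address space */
    static pthread_barrier_t bar;

    static void *worker(void *arg) {
        long i = (long)arg;
        int lo = i * (N / P), hi = lo + (N / P) - 1;  /* this thread's [i0, iN] */
        for (int j = lo; j <= hi; j++)                /* loop 1 */
            a[(j + 1) % N] = b[j] + c[j];
        pthread_barrier_wait(&bar);                   /* barrier between loops */
        for (int j = lo; j <= hi; j++)                /* loop 2 reads shared a[] */
            d[j] = CONST * a[j];
        return NULL;
    }

    int main(void) {
        pthread_t t[P];
        for (int j = 0; j < N; j++) { b[j] = j; c[j] = 1; }
        pthread_barrier_init(&bar, NULL, P);
        for (long i = 0; i < P; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < P; i++)
            pthread_join(t[i], NULL);
        printf("d[0] = %g\n", d[0]);  /* C * (b[N-1] + c[N-1]) = 3 * 16 = 48 */
        return 0;
    }

Note how the cross-thread communication (thread 0 reading a[0], which thread P-1 wrote) happens implicitly through ordinary loads and stores; only the barrier is explicit.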

Slide 13: Message passing model
Programming model:
- Process-level parallelism (private address spaces)
- Communication and coordination via explicit messages

    for_all i = 0 to P-1
        for j = i0[i] to iN[i]
            index := (j+1) mod N;
            a[index] := b[j] + c[j];
            if j = iN[i] then send(a[index], (i+1) mod P);
        end_for
    barrier;
    for_all k = 0 to P-1
        for j = i0[k] to iN[k]
            if j = i0[k] then { recv(tmp, (P+k-1) mod P); a[j] := tmp; }
            d[j] := C * a[j];
        end_for

Slide 14: Data parallel systems
Programming model:
- Operations performed in parallel on each element of a data structure
- Logically, a single thread of control performing sequential or parallel steps
- Conceptually, a processor associated with each data element
Architectural model:
- Array of many simple, cheap processors, each with little memory (the processors don't sequence through instructions)
- Attached to a control processor that issues the instructions
- Specialized and general communication, cheap global synchronization
Original motivations:
- Matches simple differential equation solvers
- Centralizes the high cost of instruction fetch/sequencing
[Figure: a control processor driving a grid of processing elements (PEs).]
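For comparison, here is a hedged MPI rendering of slide 13 in C (again an illustration, not from the original slides): each process keeps a private slice of the arrays, and the single boundary element of a[] moves by an explicit message. NLOC and the block decomposition are assumptions; MPI_Sendrecv is used in place of the separate send and recv so the ring exchange cannot deadlock.

    /* Sketch of the message passing model of slide 13 using MPI. Each
     * process owns a private slice of a, b, c, d; only the boundary element
     * of a[] crosses address spaces. Run with, e.g.: mpirun -np 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOC 4       /* elements per process (assumed) */
    #define CONST 3.0

    int main(int argc, char **argv) {
        int rank, P;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &P);

        double a[NLOC], b[NLOC], c[NLOC], d[NLOC], boundary;
        for (int j = 0; j < NLOC; j++) { b[j] = rank * NLOC + j; c[j] = 1; }

        /* Loop 1: every result a[(j+1) mod N] lands in the local slice,
         * except the last, which belongs to the next process. */
        for (int j = 0; j < NLOC - 1; j++)
            a[j + 1] = b[j] + c[j];
        boundary = b[NLOC - 1] + c[NLOC - 1];
        MPI_Sendrecv(&boundary, 1, MPI_DOUBLE, (rank + 1) % P, 0,
                     &a[0],     1, MPI_DOUBLE, (rank + P - 1) % P, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Loop 2: purely local once the boundary element has arrived. */
        for (int j = 0; j < NLOC; j++)
            d[j] = CONST * a[j];

        if (rank == 0) printf("d[0] = %g\n", d[0]);  /* = C * P * NLOC */
        MPI_Finalize();
        return 0;
    }

Unlike the SAS sketch, nothing is shared here: the send/recv pair is both the communication and the synchronization.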

Slide 15: Pros and cons of data parallelism
Example (a C rendering appears after slide 16 below):

    parallel (i: 0 -> N-1)
        a[(i+1) mod N] := b[i] + c[i];
    parallel (i: 0 -> N-1)
        d[i] := C * a[i];

Evolution and convergence:
- Popular when the cost savings of a centralized sequencer are high
- Parallelism is limited to specialized, regular computations
- Much of the parallelism can be exploited at the instruction level
- Coarser levels of parallelism can be exposed for multiprocessors and message-passing machines
- New data parallel programming model: SPMD (Single-Program Multiple-Data)

Slide 16: A generic parallel architecture
A generic modern multiprocessor (shared address or message passing architecture):
[Figure: nodes attached to a scalable network; each node contains processor(s) (P) with cache ($), memory (Mem), and a communication assist (CA), i.e., the network interface and communication controller.]
- Node: processor(s) and memory system, plus a communication assist
- Convergence allows lots of innovation, now within a common framework: integration of the assist with the node, which operations it supports, how efficiently...
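Slide 15's example, rendered in C with OpenMP (my illustration; the original slides predate this mapping). OpenMP's parallel for is one modern descendant of the data parallel style: a logically single thread of control, one parallel step per loop, and an implicit barrier at the end of each step. The array size and initialization are assumptions.

    /* Slide 15's data parallel example with OpenMP, as one modern analogue
     * of the data parallel style. Compile with: cc -fopenmp dp.c */
    #include <stdio.h>

    #define N 16
    #define CONST 3.0

    int main(void) {
        double a[N], b[N], c[N], d[N];
        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 1; }

        /* parallel (i: 0 -> N-1) a[(i+1) mod N] := b[i] + c[i]; */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[(i + 1) % N] = b[i] + c[i];
        /* The implicit barrier ending the parallel for plays the role of
         * the step boundary between the two data parallel statements. */

        /* parallel (i: 0 -> N-1) d[i] := C * a[i]; */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            d[i] = CONST * a[i];

        printf("d[0] = %g\n", d[0]);  /* C * (b[N-1] + c[N-1]) = 48 */
        return 0;
    }

The same loops written SPMD-style would put the decomposition into the program itself, as the pthreads and MPI sketches above do.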

Slide 17: Summary
The course in a nutshell:
- Efficient mapping of algorithms to models (Chs. 2-3)
- Design principles of small-scale shared address space machines (Chs. 5-6)
- Design principles for large-scale message passing and shared address space machines (Chs. 7 and 10)
- Design principles for memory systems in large-scale multiprocessors (Ch. 8)