Lecture 1: Parallel Architecture Intro


Lecture 1: Parallel Architecture Intro

Course organization:
- ~13 lectures based on the textbook
- ~10 lectures on recent papers
- ~5 lectures on parallel algorithms and multi-threaded programming

New topics: interconnection networks, transactional memory, power efficiency, CMP cache hierarchies

Text: Parallel Computer Architecture, Culler, Singh, Gupta
Reference texts:
- Principles and Practices of Interconnection Networks, Dally & Towles
- Introduction to Parallel Algorithms and Architectures, Leighton

More Logistics

- Sign up for the class mailing list (cs7820)
- Projects: simulation-based and creative; be prepared to spend time towards the end of the semester (more details on simulators in a couple of weeks)
- Grading:
  - 50% project
  - 10% multi-threaded programming assignments
  - 20% paper critiques
  - 20% take-home final

Parallel Architecture History

- A parallel computer is a collection of processing elements that communicate and cooperate to solve large problems fast
- Up to the 80s: intended for large applications, costing millions of dollars
- 90s: every organization owns a cluster or an SMP, costing an order of magnitude less
- 21st century: most desktops/workstations will have multiple cores and simultaneous threads on a single die, motivated in part by designers' inability to scale single-core performance

Parallel Architecture Trends

(Figure: trends in parallel architecture. Source: Mark Hill, Ravi Rajwar)

CMP/SMT Papers

CMP/SMT/multiprocessor papers in recent conferences:

        2001  2002  2003  2004  2005  2006  2007
ISCA:      3     5     8     6    14    17
HPCA:      4     6     7     3    11    13    14

Bottomline

- Expect an increase in multiprocessor papers
- Performance stagnates unless we learn to transform traditional applications into parallel threads
- It's all about the data! Data management: distribution, coherence, consistency
- It's also about the programming model: the onus is on the application writer / compiler / hardware

Communication Architecture

At first, the hardware for shared memory and message passing were different; today, a single parallel machine can efficiently support different programming models.

(Figure: layered communication architecture. Parallel applications such as CAD, databases, scientific modeling, and multiprogramming sit atop programming models (shared address space, message passing); these map through compilation or library support and OS support, across the user/system boundary, down to the communication hardware at the HW/SW boundary.)

Shared Memory Architectures

- Key differentiating feature: the address space is shared, i.e., any processor can directly address any memory location and access it with load/store instructions
- Cooperation is similar to a bulletin board: a processor writes to a location and that write is visible to reads by other threads
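The bulletin-board idea can be sketched in a few lines; this is a minimal illustration, assuming Python threads stand in for processors sharing one address space (the names `board`, `writer`, and `reader` are made up for the example):

```python
import threading

board = {}               # a shared location: every thread can load/store it
done = threading.Event()

def writer():
    board["msg"] = 42    # a plain store into the shared address space...
    done.set()           # ...plus synchronization so readers know it is ready

def reader(results):
    done.wait()                    # wait until the writer has published
    results.append(board["msg"])   # a plain load observes the writer's store

results = []
t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader, args=(results,))
t2.start(); t1.start()
t1.join(); t2.join()
print(results[0])  # 42
```

Note that the data transfer itself is implicit (just loads and stores); only the signaling that the data is ready needs explicit synchronization.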

Shared Address Space

(Figure: the virtual address space of each process (P1, P2, P3) contains a shared region and a private region. The shared regions all map to a common range of the physical address space, while each process's private region (Pvt P1, Pvt P2, Pvt P3) maps to its own physical pages.)

Symmetric Multiprocessors (SMP)

- A collection of processors and a collection of memories, connected through some interconnect (usually the fastest possible)
- Symmetric because the latency for any processor to access any memory is constant: uniform memory access (UMA)

(Figure: Proc 1 through Proc 4 and Mem 1 through Mem 4 attached to a single shared interconnect.)

Distributed Memory Multiprocessors

- Each processor has local memory that is accessible through a fast interconnect
- The different nodes are connected as I/O devices with a (potentially) slower interconnect
- Local memory access is a lot faster than remote memory access: non-uniform memory access (NUMA)
- Advantage: can be built with commodity processors, and many applications will perform well thanks to locality

(Figure: nodes, each pairing a processor with its local memory (Proc 1/Mem 1 through Proc 4/Mem 4), joined by an interconnect.)

Message Passing

- A programming model that can apply to clusters of workstations, SMPs, and even a uniprocessor
- Sends and receives effect the data transfer; usually, each process ends up making a copy of the data that is relevant to it
- Each process can only name local addresses, other processes, and a tag to help distinguish between multiple messages

Topics

- Programming models
- Synchronization
- Snooping-based coherence protocols
- Directory-based coherence protocols
- Consistency models
- Transactional memory
- Interconnection networks
- Papers (caching, speculation, interconnects, power, etc.)
- Parallel algorithms
