EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors

Size: px
Start display at page:

Download "EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors"

Transcription

1 EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors Professor Sherief Reda Electrical Sciences and Computer Engineering School of Engineering Brown University Spring 2011 S. Reda EN164 Sp 11 [ material from Patterson & Hennessy, 4 th ed and Harris 1 st ed ] 1

2 Parallel computing Need for improved performance for high-performance computing (HPC) applications (e.g., weather simulations, protein folding, etc) and planetary-scale web services (e.g., Google search, YouTube, etc) Hope to speed-up code by a factor equal to the number of cores or processors Difficulties Partitioning Coordination Communications overhead S. Reda EN164 Sp 11 2

3 Factorial program example 10! 10x9x8x7x6 5x4x3x2x1.x. Need to partition algorithm and re-code to leverage more than one core processor Need to communicate data Need to co-ordinate and synchronize S. Reda EN164 Sp 11 3

4 Multi-core processors L1 D & I caches L2 cache Moore s Law + ILP Wall + Power Wall -> Multi-core processors Cores with private L1 and L2 caches and share L3 cache and bus to DRAM S. Reda EN164 Sp 11 4

5 Multi-processor systems two processor slots Processors share DRAM Memory controller (north bridge) arbitrates memory requests S. Reda EN164 Sp 11 Servers do not share memory Servers are connected by network 5

6 1. Communication using shared memory SMP: shared memory multiprocessor Hardware provides single physical address space for all processors Synchronize shared variables using locks Memory access time UMA (uniform) vs. NUMA (nonuniform) S. Reda EN164 Sp 11 6

7 Cache coherence problem Suppose two CPU cores share a physical address space Time step Write-through caches Event CPU As cache CPU Bs cache Memory CPU A reads X CPU B reads X CPU A writes 1 to X S. Reda EN164 Sp 11 7

8 Defining cache coherency Informally: Reads return most recently written value Formally: P writes X; P reads X (no intervening writes) read returns written value P 1 writes X; P 2 reads X (sufficiently later) read returns written value c.f. CPU B reading X after step 3 in example P 1 writes X, P 2 writes X all processors see writes in the same order End up with the same final value for X S. Reda EN164 Sp 11 8

9 Maintaining cache coherence with snooping protocol Cache gets exclusive access to a block when it is to be written Broadcasts an invalidate message on the bus Subsequent read in another cache misses Owning cache supplies updated value CPU activity Bus activity CPU As cache CPU Bs cache Memory CPU A reads X Cache miss for X 0 0 CPU B reads X Cache miss for X CPU A writes 1 to X Invalidate for X 1 0 CPU B read X Cache miss for X S. Reda EN164 Sp 11 9

10 False shares Block size plays an important role in cache coherency Large blocks can cause false sharing where two unrelated shared variables are located in the same cache block, the full block is exchanged between processors even though the processor are accessing different variables False sharing could be eliminated by carefully writing the code and compilation S. Reda EN164 Sp 11 10

11 2. Communication using message passing Each processor has private physical address space Hardware sends/receives messages between cores/ processors using on-chip and/or off-chip More scalable than shared memories but harder to program S. Reda EN164 Sp 11 11

12 Interconnection networks (on-chip between cores or off-chip between processors or servers) Network topologies Arrangements of processors, switches, and links Bus Ring N-cube (N = 3) 2D Mesh Fully connected S. Reda EN164 Sp 11 12

13 3. Coordination using synchronization primitives Two processors sharing an area of memory P1 writes, then P2 reads Data race if P1 and P2 don t synchronize Result depends of order of accesses Example: two threads want to increment a global variable Mutual exclusion (i.e., mutex) algorithms are used in parallel programming to prevent the simultaneous use of a common resource, such as a global variable, by code called critical sections. S. Reda EN164 Sp 11 13

14 Hardware support for mutex Hardware support required Atomic read/write memory operation No other access to the location allowed between the read and write Could be a single instruction E.g., atomic swap of register memory Or an atomic pair of instructions S. Reda EN164 Sp 11 14

15 Synchronization in MIPS Load linked: ll rt, offset(rs) Store conditional: sc rt, offset(rs) Succeeds if location not changed since the ll Returns 1 in rt Fails if location is changed Returns 0 in rt Example: atomic swap (to test/set lock variable) try: add $t0,$zero,$s4 ;copy exchange value ll $t1,0($s1) ;load linked bne $t1,$0, try sc $t0,0($s1) ;store conditional beq $t0,$zero,try ;branch store fails add $s4,$zero,$t1 ;put load value in $s4 S. Reda EN164 Sp 11 15

16 Summary of HW design issues for parallel processing 1. Need HW support for memory coherence 2. Need HW support for synchronization 3. Need HW support for communication (on-chip and off-chip) S. Reda EN164 Sp 11 16

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS3350B Computer Architecture Winter 2015 Lecture 7.2: Multicore TLP (1) Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson & Hennessy,

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

M4 Parallelism. Implementation of Locks Cache Coherence

M4 Parallelism. Implementation of Locks Cache Coherence M4 Parallelism Implementation of Locks Cache Coherence Outline Parallelism Flynn s classification Vector Processing Subword Parallelism Symmetric Multiprocessors, Distributed Memory Machines Shared Memory

More information

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel

More information

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)

More information

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation Shared Memory and Distributed Multiprocessing Bhanu Kapoor, Ph.D. The Saylor Foundation 1 Issue with Parallelism Parallel software is the problem Need to get significant performance improvement Otherwise,

More information

Multiprocessor Synchronization

Multiprocessor Synchronization Multiprocessor Synchronization Material in this lecture in Henessey and Patterson, Chapter 8 pgs. 694-708 Some material from David Patterson s slides for CS 252 at Berkeley 1 Multiprogramming and Multiprocessing

More information

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 24 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 2: More Multiprocessors Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory

More information

EE382 Processor Design. Processor Issues for MP

EE382 Processor Design. Processor Issues for MP EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency

More information

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System

More information

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly

More information

Shared Symmetric Memory Systems

Shared Symmetric Memory Systems Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Page 1. Lecture 12: Multiprocessor 2: Snooping Protocol, Directory Protocol, Synchronization, Consistency. Bus Snooping Topology

Page 1. Lecture 12: Multiprocessor 2: Snooping Protocol, Directory Protocol, Synchronization, Consistency. Bus Snooping Topology CS252 Graduate Computer Architecture Lecture 12: Multiprocessor 2: Snooping Protocol, Directory Protocol, Synchronization, Consistency Review: Multiprocessor Basic issues and terminology Communication:

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains: The Lecture Contains: Four Organizations Hierarchical Design Cache Coherence Example What Went Wrong? Definitions Ordering Memory op Bus-based SMP s file:///d /...audhary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture10/10_1.htm[6/14/2012

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Review: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology

Review: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology Review: Multiprocessor CPE 631 Session 21: Multiprocessors (Part 2) Department of Electrical and Computer Engineering University of Alabama in Huntsville Basic issues and terminology Communication: share

More information

Cache Coherence and Atomic Operations in Hardware

Cache Coherence and Atomic Operations in Hardware Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy Back to caching... Advances in Computer Architecture Andy D. Pimentel Caches in a multi-processor context Dealing with concurrent updates Multiprocessor architecture In

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single

More information

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Page 1. Cache Coherence

Page 1. Cache Coherence Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale

More information

Limitations of parallel processing

Limitations of parallel processing Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors

More information

PARALLEL MEMORY ARCHITECTURE

PARALLEL MEMORY ARCHITECTURE PARALLEL MEMORY ARCHITECTURE Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 6 is due tonight n The last

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Chapter 5: Thread-Level Parallelism Part 1

Chapter 5: Thread-Level Parallelism Part 1 Chapter 5: Thread-Level Parallelism Part 1 Introduction What is a parallel or multiprocessor system? Why parallel architecture? Performance potential Flynn classification Communication models Architectures

More information

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions Representing Instructions Instructions are encoded in binary Called machine code MIPS instructions Encoded as 32-bit instruction words Small number of formats encoding operation code (opcode), register

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

EEC 581 Computer Architecture. Lec 11 Synchronization and Memory Consistency Models (4.5 & 4.6)

EEC 581 Computer Architecture. Lec 11 Synchronization and Memory Consistency Models (4.5 & 4.6) EEC 581 Computer rchitecture Lec 11 Synchronization and Memory Consistency Models (4.5 & 4.6) Chansu Yu Electrical and Computer Engineering Cleveland State University cknowledgement Part of class notes

More information

CS5460: Operating Systems

CS5460: Operating Systems CS5460: Operating Systems Lecture 9: Implementing Synchronization (Chapter 6) Multiprocessor Memory Models Uniprocessor memory is simple Every load from a location retrieves the last value stored to that

More information

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2) Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed

More information

SHARED-MEMORY COMMUNICATION

SHARED-MEMORY COMMUNICATION SHARED-MEMORY COMMUNICATION IMPLICITELY VIA MEMORY PROCESSORS SHARE SOME MEMORY COMMUNICATION IS IMPLICIT THROUGH LOADS AND STORES NEED TO SYNCHRONIZE NEED TO KNOW HOW THE HARDWARE INTERLEAVES ACCESSES

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit

More information

Computer parallelism Flynn s categories

Computer parallelism Flynn s categories 04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories

More information

Lecture 13. Shared memory: Architecture and programming

Lecture 13. Shared memory: Architecture and programming Lecture 13 Shared memory: Architecture and programming Announcements Special guest lecture on Parallel Programming Language Uniform Parallel C Thursday 11/2, 2:00 to 3:20 PM EBU3B 1202 See www.cse.ucsd.edu/classes/fa06/cse260/lectures/lec13

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-5 Code Translation-3 Heap, Storage options, Addressing modes, Concurrent data access, Linking & loading Shakil M. Khan (adapted from Prof. Roumani) So Far Registers

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Chapter 5. Thread-Level Parallelism

Chapter 5. Thread-Level Parallelism Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated

More information

Parallel Architecture. Hwansoo Han

Parallel Architecture. Hwansoo Han Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range

More information

Synchronization. P&H Chapter 2.11 and 5.8. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University

Synchronization. P&H Chapter 2.11 and 5.8. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Synchronization P&H Chapter 2.11 and 5.8 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Parallelism and Synchronization How do I take advantage of multiple processors;

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Lecture 25: Multiprocessors

Lecture 25: Multiprocessors Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11

More information

The Cache-Coherence Problem

The Cache-Coherence Problem The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 4 Multiprocessors and Thread-Level Parallelism --II CA Lecture08 - multiprocessors and TLP (cwliu@twins.ee.nctu.edu.tw) 09-1 Review Caches contain all information on

More information

EN164: Design of Computing Systems Lecture 09: Processor / ISA 2

EN164: Design of Computing Systems Lecture 09: Processor / ISA 2 EN164: Design of Computing Systems Lecture 09: Processor / ISA 2 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing

More information

Chapter-4 Multiprocessors and Thread-Level Parallelism

Chapter-4 Multiprocessors and Thread-Level Parallelism Chapter-4 Multiprocessors and Thread-Level Parallelism We have seen the renewed interest in developing multiprocessors in early 2000: - The slowdown in uniprocessor performance due to the diminishing returns

More information

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD

More information

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Symmetric Multiprocessors: Synchronization and Sequential Consistency Constructive Computer Architecture Symmetric Multiprocessors: Synchronization and Sequential Consistency Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Branch to a labeled instruction if a condition is true Otherwise, continue sequentially beq rs, rt, L1 if (rs == rt) branch to

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Final Review Shuai Wang Department of Computer Science and Technology Nanjing University Computer Architecture Computer architecture, like other architecture, is the art

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core

More information

Shared Memory. SMP Architectures and Programming

Shared Memory. SMP Architectures and Programming Shared Memory SMP Architectures and Programming 1 Why work with shared memory parallel programming? Speed Ease of use CLUMPS Good starting point 2 Shared Memory Processes or threads share memory No explicit

More information

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]

More information

Dr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

Dr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory CS 152 Computer Architecture and Engineering Lecture 18: Snoopy Caches Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152!

More information

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP CS 61C: Great Ideas in Computer Architecture (Machine Structures) Thread-Level Parallelism (TLP) and OpenMP Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Review

More information

Lecture: Coherence, Synchronization. Topics: directory-based coherence, synchronization primitives (Sections )

Lecture: Coherence, Synchronization. Topics: directory-based coherence, synchronization primitives (Sections ) Lecture: Coherence, Synchronization Topics: directory-based coherence, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory) keeps track

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM

More information

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,

More information

EN164: Design of Computing Systems Lecture 11: Processor / ISA 4

EN164: Design of Computing Systems Lecture 11: Processor / ISA 4 EN164: Design of Computing Systems Lecture 11: Processor / ISA 4 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Computer Architecture

Computer Architecture 18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency

More information

Intro to Multiprocessors

Intro to Multiprocessors The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple

More information

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid Chapter 5 Thread-Level Parallelism Abdullah Muzahid 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors is saturating + Modern multiple issue processors are becoming very complex

More information

Control Instructions

Control Instructions Control Instructions Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Instruction Set

More information

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary Control Instructions Computer Organization Architectures for Embedded Computing Thursday, 26 September 2013 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O 6.823, L21--1 Cache Coherence Protocols: Implementation Issues on SMP s Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Coherence Issue in I/O 6.823, L21--2 Processor Processor

More information

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared

More information

ECE Sample Final Examination

ECE Sample Final Examination ECE 3056 Sample Final Examination 1 Overview The following applies to all problems unless otherwise explicitly stated. Consider a 2 GHz MIPS processor with a canonical 5-stage pipeline and 32 general-purpose

More information

Parallelism, Multicore, and Synchronization

Parallelism, Multicore, and Synchronization Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore

More information

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today

More information

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes Chapter 2 Instructions: Language of the Computer Adapted by Paulo Lopes Instruction Set The repertoire of instructions of a computer Different computers have different instruction sets But with many aspects

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

CS 61C: Great Ideas in Computer Architecture. Synchronization, OpenMP

CS 61C: Great Ideas in Computer Architecture. Synchronization, OpenMP CS 61C: Great Ideas in Computer Architecture Synchronization, OpenMP Guest Lecturer: Justin Hsia 3/15/2013 Spring 2013 Lecture #22 1 Review of Last Lecture Multiprocessor systems uses shared memory (single

More information

Multiprocessor Synchronization

Multiprocessor Synchronization Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory

More information

Operating Systems, Fall Lecture 9, Tiina Niklander 1

Operating Systems, Fall Lecture 9, Tiina Niklander 1 Multiprocessor Systems Multiple processor systems Ch 8.1 8.3 1 Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor,

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Parallel Architectures

Parallel Architectures Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s

More information

CS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency

CS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency CS 61C: Great Ideas in Computer Architecture TLP and Cache Coherency Instructor: Bernhard Boser and Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa16 11/10/16 Fall 2016 -- Lecture #20 1 New-School

More information