CSC526: Parallel Processing Fall 2016

1 CSC526: Parallel Processing Fall 2016
WEEK 5: Caches in Multiprocessor Systems
* Addressing
* Cache Performance
* Writing Policy
* Cache Coherence (CC) Problem
* Snoopy Bus Protocols
PART 1: HARDWARE

2 INTRODUCTION
Most multiprocessor systems use private caches associated with the different processors, as depicted in the following figure:
[Figure: processors P1..Pn, each with a private cache C1..Cn, connected through an interconnection network (bus, crossbar, etc.) to main memory modules M1..Mn and to I/O channels with disks D1..Dn.]

3 ADDRESSING (1)
Caches may be addressed in one of two ways:
* Physical addressing: data in the cache are accessed using their physical addresses.
* Virtual addressing: data in the cache are accessed using their virtual addresses.

4 ADDRESSING (2) PHYSICAL (1) UNIFIED CACHE
The following figure depicts the organization of a physically addressed unified cache:
[Figure: the CPU issues a VA to the MMU; the MMU produces the PA used to access both the cache and main memory; data/instructions (D/I) move between the cache and main memory.]
The Memory Management Unit (MMU) translates a virtual address (VA) into the corresponding physical address (PA).
A unified cache contains both data and instructions.
A cache hit occurs when the required address is found in the cache; otherwise, we have a cache miss.
After a cache miss, a whole block is loaded from main memory into the cache.
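
As a rough illustration of the access order just described (MMU translation first, then the cache is indexed with the PA, with a block fill on a miss), here is a minimal Python sketch; the page size, page-table contents, block size and function names are all assumptions made for the example, not part of the slides.

```python
# Minimal sketch of a physically addressed cache access: the MMU translation
# happens before the cache can be indexed. All sizes and structures below are
# invented for illustration.

PAGE_SIZE = 4096
BLOCK_SIZE = 64

page_table = {0: 7, 1: 3}   # virtual page number -> physical frame number (hypothetical)
memory = {}                 # physical block address -> block contents (main memory)
cache = {}                  # physical block address -> block contents (the cache)

def translate(va: int) -> int:
    """MMU: virtual address -> physical address."""
    vpn, offset = divmod(va, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

def access(va: int):
    pa = translate(va)                       # step 1: MMU translation (the slowdown noted on slide 8)
    block_addr = pa - (pa % BLOCK_SIZE)
    if block_addr in cache:                  # step 2: cache indexed with the PA -> hit
        return cache[block_addr]
    # cache miss: the whole block is loaded from main memory into the cache
    block = memory.get(block_addr, bytes(BLOCK_SIZE))
    cache[block_addr] = block
    return block

access(0x10)    # first access: miss, block loaded
access(0x18)    # same block: hit
```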

5 ADDRESSING (3) PHYSICAL (2) SPLIT CACHE
The following figure depicts the organization of a physically addressed split multi-level cache:
[Figure: the CPU's VA is translated by the MMU into a PA, which accesses a Level-1 I-Cache (instructions) and a Level-1 D-Cache (data); the Level-1 D-Cache is backed by a Level-2 D-Cache, which in turn accesses main memory.]
The Level-2 cache has a higher capacity than the Level-1 cache; for example, 256 KB and 64 KB respectively.
At any point in time, the Level-1 cache is a subset of the Level-2 cache.
Usually, the Level-1 cache is placed on-chip (i.e., on the same chip as the processor).

6 ADDRESSING (4) VIRTUAL (1) UNIFIED CACHE
The following figure depicts the organization of a virtually addressed unified cache:
[Figure: the CPU's VA goes to the cache and to the MMU in parallel; the MMU produces the PA used to access main memory; data/instructions (D/I) move between the cache and main memory.]
Cache access and MMU address translation are performed in parallel. However, the PA is not used unless a memory access is needed.

7 ADDRESSING (5) VIRTUAL (2) SPLIT CACHE
The following figure depicts the organization of a virtually addressed split cache:
[Figure: the CPU's instruction VA accesses the I-Cache and its data VA accesses the D-Cache; in parallel, the MMU translates the VA into the PA used to access main memory.]

8 ADDRESSING (6) PHYSICAL VS. VIRTUAL
The following points highlight the pros and cons of both addressing modes:
Physical addressing
* Pros: no need to perform cache flushing, since PAs are unique; no aliasing problems (two VAs mapped to the same PA).
* Cons: cache access is slowed down until the MMU translates the VA into a PA.
Virtual addressing
* Pros: faster access to the cache, since MMU translation is performed in parallel with cache access.
* Cons: the aliasing problem, since multiple processes may use the same range of VAs. This may be solved by flushing the entire cache; however, that may result in poor performance.
The drawback of physical addressing may be alleviated if the MMU and the cache are integrated on the same chip as the CPU.
Most system designs use physical addressing because (1) it is simpler, and (2) it requires less intervention from the OS compared to virtual addressing.

9 CACHE PERFORMANCE (1)
The performance of a cache is measured by its hit ratio:
Hit Ratio (HR) = Number of cache hits / Total number of cache accesses
Miss Ratio (MR) = Number of cache misses / Total number of cache accesses
Miss Ratio = 1 - Hit Ratio
For a multi-level cache, the access time (T) to each level should be considered. The average access time for the L1 cache is:
T_caches = HR_L1 * T_L1 + MR_L1 * (T_L1 + T_L2)
To calculate the overall memory system performance, the access time to the main memory (T_M) should also be considered:
T_overall = HR_L1 * T_L1 + HR_L2 * (T_L1 + T_L2) + MR_L2 * (T_L1 + T_L2 + T_M)

10 CACHE PERFORMANCE (2) NUMERICAL EXAMPLE
[Figure: CPU connected to a Level-1 D-Cache with access time T_L1 = 0.01 μs, backed by a Level-2 D-Cache with access time T_L2 = 0.1 μs.]
Assume HR_L1 = 0.95. What is the L1-cache performance?
T = 0.95 * 0.01 + 0.05 * (0.01 + 0.1) = 0.0095 + 0.0055 = 0.015 μs
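
As a sanity check of the arithmetic above, the short Python sketch below encodes the two access-time formulas from the previous slide; the function and variable names are invented for the example, and the 0.95 / 0.01 μs / 0.1 μs figures are the ones assumed in this slide.

```python
# Minimal sketch of the average-access-time formulas from the slides.
# Times are in microseconds; all names here are illustrative only.

def l1_l2_access_time(hr_l1: float, t_l1: float, t_l2: float) -> float:
    """T_caches = HR_L1*T_L1 + MR_L1*(T_L1 + T_L2)."""
    mr_l1 = 1.0 - hr_l1
    return hr_l1 * t_l1 + mr_l1 * (t_l1 + t_l2)

def overall_access_time(hr_l1: float, hr_l2: float, mr_l2: float,
                        t_l1: float, t_l2: float, t_m: float) -> float:
    """T_overall = HR_L1*T_L1 + HR_L2*(T_L1+T_L2) + MR_L2*(T_L1+T_L2+T_M).
    HR_L1, HR_L2 and MR_L2 are taken here as fractions of all accesses."""
    return hr_l1 * t_l1 + hr_l2 * (t_l1 + t_l2) + mr_l2 * (t_l1 + t_l2 + t_m)

if __name__ == "__main__":
    # Numerical example from the slide: HR_L1 = 0.95, T_L1 = 0.01, T_L2 = 0.1
    print(round(l1_l2_access_time(0.95, 0.01, 0.1), 6))   # 0.015 microseconds
```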

11 WRITING POLICIES (1) PROBLEM DEFINITION (1) SCENARIO (1)
[Figure: the CPU writes a new value (300) into word W0 of block B0 held in the cache, while the corresponding word of the block in main memory still holds the old value (150); the cached copy and the memory copy now differ.]

12 WRITING POLICIES (2) PROBLEM DEFINITION (2) SCENARIO (2)
[Figure: the I/O module writes a new value (300) for word W0 directly into main memory, while the cache still holds the old value (150); the cached copy of the block is now stale.]

13 WRITING POLICIES (3) SOLUTION (1)
The aim of a writing policy is to keep the data consistent between the cache and memory. Two main writing policies are followed in cache design:
* Write-through
* Write-back

14 WRITING POLICIES (4) SOLUTION (2) WRITE THROUGH
[Figure: the CPU writes 300 into word W0 of block B0 in the cache, and the same value is written at the same time to the corresponding word of the block in main memory.]
Every time a word is updated in the cache, it is written through (reflected) in the main memory.
This technique is simple. However, it increases the memory traffic.

15 WRITING POLICIES (5) SOLUTION (3) WRITE BACK
[Figure: the CPU writes 300 into word W0 of block B0 in the cache only; main memory still holds 150 until the block is written back.]
When a cache line is updated, a status bit (update bit) is set to 1.
When the cache line is to be replaced, it is copied back to the main memory if its update bit is equal to 1.
This technique minimizes memory accesses (traffic). However, some memory locations temporarily hold invalid (stale) values.
In addition, write-back imposes that the I/O module accesses the memory through the cache.
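
To make the two policies concrete, here is a small, illustrative Python sketch (not taken from the slides; the class and attribute names are assumptions) showing where the memory write happens in each case: immediately on every write for write-through, and only at replacement time, guarded by the update (dirty) bit, for write-back.

```python
# Illustrative sketch of write-through vs. write-back behaviour for a single
# cache line; names and structure are assumptions, not the slides' notation.

class CacheLine:
    def __init__(self, tag=None, data=None):
        self.tag = tag
        self.data = data
        self.update_bit = 0      # "dirty" bit used by the write-back policy

class TinyCache:
    def __init__(self, memory, policy="write-back"):
        self.memory = memory     # dict: address -> value (models main memory)
        self.policy = policy
        self.line = CacheLine()  # a single line, for illustration only

    def write(self, address, value):
        self.line.tag, self.line.data = address, value
        if self.policy == "write-through":
            self.memory[address] = value        # reflected immediately in memory
        else:                                    # write-back
            self.line.update_bit = 1             # memory updated later, on replacement

    def replace(self, new_address):
        # On replacement, a write-back cache copies the line to memory
        # only if its update bit is set.
        if self.policy == "write-back" and self.line.update_bit:
            self.memory[self.line.tag] = self.line.data
        self.line = CacheLine(tag=new_address, data=self.memory.get(new_address))

memory = {0: 150}
cache = TinyCache(memory, policy="write-back")
cache.write(0, 300)
print(memory[0])     # still 150: the update is not yet reflected in memory
cache.replace(4)
print(memory[0])     # 300: written back upon replacement
```

Running the same example with policy="write-through" prints 300 twice, since the memory copy is updated on the write itself.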

16 CACHE COHERENCE PROBLEM (1)
In a multiprocessor system, data inconsistency may occur between a cache and main memory, or amongst the local caches of different processors.
Multiple caches may hold different copies of the same memory block, since multiple processors operate asynchronously and independently. Such a situation is known as the cache coherence problem.
The cache coherence problem may be caused by:
* Data sharing
* Process migration
* I/O that bypasses the caches (DMA)

17 CACHE COHERENCE PROBLEM (2) DATA SHARING
Consider the following scenario: a data item is shared between processors P1 and P2 and is held in both private caches as well as in main memory.
* Before the update, the three copies of the shared item are consistent.
* P1 updates its cached copy. Assuming a write-through policy, the update is immediately reflected in the main memory. However, the copy in P2's cache becomes inconsistent.
* P1 updates its cached copy. Assuming a write-back policy, the update is not immediately reflected in the main memory. The copy in P2's cache is also inconsistent.

18 CACHE COHERENCE PROBLEM (3) PROCESS MIGRATION
Consider the following scenario: a data item used by a process running on P1 is cached, and the process then migrates from P1 to P2.
* P2 updates its cached copy after the migration. Assuming a write-through policy, the update is immediately reflected in the main memory. However, the copy in P1's cache becomes inconsistent.
* P2 updates its cached copy after the migration. Assuming a write-back policy, the update is not immediately reflected in the main memory. The copy in P1's cache is also inconsistent.

19 CACHE COHERENCE PROBLEM (4) I/O
Consider the following scenario. When the I/O bypasses the caches, a cache coherence problem may occur:
* Input: when the I/O processor loads a new value into the main memory, bypassing the caches, the values held in the processors' private caches become obsolete.
* Output: P1 updates its cached copy. Write-back caches are used, so the update is not immediately reflected in the memory. When the memory then outputs the value directly to the I/O, bypassing the cache, it outputs an obsolete value.

20 CACHE COHERENCE PROBLEM (5) SOLUTION
Two main approaches are commonly used to solve the cache coherence problem:
* Snoopy bus protocols
* Directory-based protocols

21 SNOOPY BUS PROTOCOLS (1) INTRODUCTION (1)
A bus is a convenient Interconnection Network (I/N) topology for ensuring cache coherence, because it allows all interconnected processors in the system to observe ongoing memory transactions.
If a bus transaction threatens the consistent state of a local cache, the cache controller can take appropriate actions to invalidate the local copy.
Two practices are implemented to maintain cache coherence:
* Write-invalidate policy: when a local cache block is updated, all blocks with the same address in remote caches are invalidated.
* Write-update policy: when a local cache block is updated, the new data block is broadcast to all caches containing a copy of the same block.
Snoopy protocols achieve data consistency among the caches and shared memory through a bus-watching mechanism. The following figure illustrates the two policies.

22 SNOOPY BUS PROTOCOLS (2) INTRODUCTION (2)
[Figure: three panels show the initial state, the write-invalidate case and the write-update case for a block cached by P1, P2 and P3, which share a bus to main memory.]
Write-invalidate: the memory copy is updated, and all copies of the block in the other caches are invalidated (I). Invalidated blocks are called dirty, meaning that they should not be used.
Write-update: the new block contents are broadcast via the bus to all caches holding a copy, which are hence updated. With write-through caches, the memory copy is also updated. With write-back caches, the memory is updated later, upon block replacement.
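
The following Python sketch (names and data structures are assumptions for illustration, not the slides' notation) contrasts how a snooping cache controller might react when it observes a remote write on the bus: under write-invalidate the local copy is marked invalid (dirty), while under write-update it is overwritten with the broadcast data.

```python
# Illustrative snoopy-controller reaction to a write observed on the bus.
# Each cache is modelled as: block address -> (state, data), with states
# "V" (valid) and "I" (invalid). All names here are assumptions.

def snoop_remote_write(cache, address, new_data, policy):
    """Called on every other cache when a processor's write appears on the bus."""
    if address not in cache:
        return                                   # no local copy, nothing to do
    if policy == "write-invalidate":
        state, data = cache[address]
        cache[address] = ("I", data)             # local copy becomes dirty/unusable
    elif policy == "write-update":
        cache[address] = ("V", new_data)         # local copy refreshed with broadcast data

# Example: P2 and P3 both cache the block at address 0x40 with value 100.
caches = {"P2": {0x40: ("V", 100)}, "P3": {0x40: ("V", 100)}}

# P1 writes 200 to 0x40; the bus transaction is snooped by P2 and P3.
for c in caches.values():
    snoop_remote_write(c, 0x40, 200, policy="write-invalidate")
print(caches)   # both remote copies are now in state "I"
```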

23 SNOOPY BUS PROTOCOLS (3) STATE DIAGRAM
A state diagram is used to depict all transitions of the write-invalidate protocol as implemented in both write-through and write-back caches:
* The states in the diagram are those of a cache block.
* Two processors are considered: a local processor (i) and a remote processor (j).
* Six operations may take place in such an environment, namely:
* Read the cache block in the local processor: R(i)
* Read the cache block in the remote processor: R(j)
* Write (modify) the cache block in the local processor: W(i)
* Write (modify) the cache block in the remote processor: W(j)
* Replace the cache block in the local processor: Z(i)
* Replace the cache block in the remote processor: Z(j)

24 SNOOPY BUS PROTOCOLS (4) WRITE-THROUGH CACHES (1)
A block belonging to a write-through cache has one of two states: Valid (V) or Invalid (I). A cache block in the Invalid state means either that it is dirty or that it is unavailable in the processor's cache.
Let us first consider the Valid state:
* Local read R(i): does not affect the state of the local cache block.
* Remote read R(j): does not affect the state of the local cache block.
* Local write (modification) W(i): does not affect the state of the local cache block.
* Remote write (modification) W(j): causes the copy of the local cache block to become dirty → Invalid.
* Local replace Z(i): the cache block is no longer available in the local processor → Invalid.
* Remote replace Z(j): does not affect the state of the local cache block.

25 SNOOPY BUS PROTOCOLS (4) WRITE-THROUGH CACHES (2)
Let us now consider the Invalid state:
* Local read R(i): cache miss; the block is fetched from memory → Valid.
* Remote read R(j): does not affect the state of the local cache block.
* Local write (modification) W(i): refreshes the local cache block → Valid.
* Remote write (modification) W(j): does not affect the state of the local cache block.
* Local replace Z(i): the cache block is still unavailable → Invalid.
* Remote replace Z(j): does not affect the state of the local cache block.
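
Gathering the transitions listed on the two slides above into one table, the following Python sketch encodes the write-invalidate protocol for write-through caches; the event labels mirror the slides' R/W/Z and (i)/(j) notation, while the table and function themselves are only an illustrative rendering, not code from the course.

```python
# Write-invalidate protocol for write-through caches: per-block states are
# Valid ("V") or Invalid ("I"). Events: R = read, W = write, Z = replace,
# performed by the local processor (i) or a remote processor (j).

WRITE_THROUGH_TRANSITIONS = {
    ("V", "R(i)"): "V",  ("V", "R(j)"): "V",
    ("V", "W(i)"): "V",  ("V", "W(j)"): "I",   # remote write makes the local copy dirty
    ("V", "Z(i)"): "I",  ("V", "Z(j)"): "V",
    ("I", "R(i)"): "V",  ("I", "R(j)"): "I",   # local read miss fetches the block
    ("I", "W(i)"): "V",  ("I", "W(j)"): "I",   # local write refreshes the block
    ("I", "Z(i)"): "I",  ("I", "Z(j)"): "I",
}

def next_state(state: str, event: str) -> str:
    return WRITE_THROUGH_TRANSITIONS[(state, event)]

# A valid local block becomes invalid when a remote processor writes it:
print(next_state("V", "W(j)"))   # I
print(next_state("I", "R(i)"))   # V
```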

26 SNOOPY BUS PROTOCOLS (5) WRITE-BACK CACHES (1)
The state diagram for write-back caches has three states; namely, the Invalid (INV), the Read-Only (RO) and the Read-Write (RW) states.
* The Invalid state designates that the cache block is either dirty or unavailable in the local cache.
* RO state: many caches can contain RO copies of a block.
* RW state: only one processor in the whole system may have a cache block in the RW state; the processor that performs a write holds the block in the RW state.

27 SNOOPY BUS PROTOCOLS (6) WRITE-BACK CACHES (2)
Let us first consider the Invalid state:
* Local read R(i): refreshes the local cache with an RO copy → RO.
* Remote read R(j): does not affect the state of the local cache block.
* Local write (modification) W(i): refreshes the local cache block with an RW copy → RW.
* Remote write (modification) W(j): does not affect the state of the local cache block.
* Local replace Z(i): the cache block is still unavailable → Invalid.
* Remote replace Z(j): does not affect the state of the local cache block.

28 SNOOPY BUS PROTOCOLS (7) WRITE-BACK CACHES (3)
Let us now consider the RO state:
* Local read R(i): does not change the state of the local cache block.
* Remote read R(j): does not affect the state of the local cache block.
* Local write (modification) W(i): the local processor becomes the last one to write the cache block → RW.
* Remote write (modification) W(j): makes the local cache block dirty → Invalid.
* Local replace Z(i): the cache block becomes dirty → Invalid.
* Remote replace Z(j): does not affect the state of the local cache block.

29 SNOOPY BUS PROTOCOLS (8) WRITE-BACK CACHES (4)
Finally, let us consider the RW state:
* Local read R(i): does not change the state of the local cache block.
* Remote read R(j): the memory is updated (write-back), and the local cache block is demoted → RO.
* Local write (modification) W(i): does not change the state of the local cache block.
* Remote write (modification) W(j): makes the local cache block dirty → Invalid.
* Local replace Z(i): the cache block becomes unavailable → Invalid.
* Remote replace Z(j): does not affect the state of the local cache block.
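
Collecting the three state slides into a single table, the sketch below does the same for the write-invalidate protocol on write-back caches, with the Invalid (INV), Read-Only (RO) and Read-Write (RW) states; again, the event labels follow the slides, and the code is merely one possible way to tabulate the diagram.

```python
# Write-invalidate (ownership) protocol for write-back caches.
# States: "INV" (invalid: dirty or absent), "RO" (read-only, possibly shared),
# "RW" (read-write, held exclusively by one cache).
# Events: R = read, W = write, Z = replace; (i) local processor, (j) remote.

WRITE_BACK_TRANSITIONS = {
    # from Invalid
    ("INV", "R(i)"): "RO",  ("INV", "W(i)"): "RW",
    ("INV", "R(j)"): "INV", ("INV", "W(j)"): "INV",
    ("INV", "Z(i)"): "INV", ("INV", "Z(j)"): "INV",
    # from Read-Only
    ("RO", "R(i)"): "RO",   ("RO", "R(j)"): "RO",
    ("RO", "W(i)"): "RW",   ("RO", "W(j)"): "INV",
    ("RO", "Z(i)"): "INV",  ("RO", "Z(j)"): "RO",
    # from Read-Write
    ("RW", "R(i)"): "RW",   ("RW", "R(j)"): "RO",   # remote read forces a write-back, block demoted to RO
    ("RW", "W(i)"): "RW",   ("RW", "W(j)"): "INV",
    ("RW", "Z(i)"): "INV",  ("RW", "Z(j)"): "RW",
}

def next_state(state: str, event: str) -> str:
    return WRITE_BACK_TRANSITIONS[(state, event)]

# Only one cache can hold a block in RW; a remote write invalidates the local copy:
print(next_state("RW", "W(j)"))   # INV
print(next_state("RW", "R(j)"))   # RO (after the block is written back to memory)
```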

30 FURTHER READINGS
* Cache/Memory addressing
* Mapping functions
* Replacement policies
* Directory-based protocol
