CSC526: Parallel Processing Fall 2016
1 CSC526: Parallel Processing Fall 2016
WEEK 5: Caches in Multiprocessor Systems
* Addressing
* Cache Performance
* Writing Policy
* Cache Coherence (CC) Problem
* Snoopy Bus Protocols
PART 1: HARDWARE
Dr. Soha S. Zaghloul
2 INTRODUCTION
Most multiprocessor systems use private caches associated with different processors, as depicted in the following figure: processors P1, P2, P3, ..., Pn, each with a private cache C1, C2, C3, ..., Cn, connected through an interconnection network (bus, crossbar, etc.) to main memory modules M1, M2, M3, ..., Mn and to I/O channels with disks D1, D2, D3, ..., Dn.
3 ADDRESSING (1)
Caches may be addressed in one of two ways:
* Physical addressing: data in the cache are accessed using their physical addresses.
* Virtual addressing: data in the cache are accessed using their virtual addresses.
4 ADDRESSING (2) PHYSICAL (1) UNIFIED CACHE
The following figure depicts the organization of a physically addressed unified cache: CPU -(VA)-> MMU -(PA)-> Cache -(PA)-> Main Memory, with data/instructions (D/I) returned to the CPU.
The Memory Management Unit (MMU) translates a virtual address into the corresponding physical address.
A unified cache contains both data and instructions.
A cache hit occurs when the required address is found in the cache; otherwise, we have a cache miss.
After a cache miss, a whole block is loaded from main memory into the cache.
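The MMU's translation step can be sketched in a few lines. This is a minimal illustration only, with a hypothetical page size and page table (no particular hardware is implied): the VA is split into a virtual page number and an offset, and the page number is swapped for a physical frame number.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages (hypothetical)

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 7, 1: 3, 2: 9}

def translate(va: int) -> int:
    """Translate a virtual address into a physical address."""
    vpn, offset = divmod(va, PAGE_SIZE)  # split VA into page number + offset
    pfn = page_table[vpn]                # a missing entry would be a page fault
    return pfn * PAGE_SIZE + offset      # same offset, new frame

print(translate(4100))  # VA 4100 = page 1, offset 4 -> frame 3 -> prints 12292
```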
5 ADDRESSING (3) PHYSICAL (2) SPLIT CACHE
The following figure depicts the organization of a physically addressed split multi-level cache: the CPU's VA is translated by the MMU into a PA, which accesses a Level-1 I-Cache (instructions) and a Level-1 D-Cache (data), backed by a Level-2 D-Cache and main memory.
The Level-2 cache has a higher capacity than the Level-1 cache; for example, 256 KB and 64 KB respectively.
At any point in time, the Level-1 cache contents are a subset of the Level-2 cache contents.
Usually, the Level-1 cache is put on-chip (i.e., on the same chip as the processor).
6 ADDRESSING (4) VIRTUAL (1) UNIFIED CACHE
The following figure depicts the organization of a virtually addressed unified cache: the CPU's VA goes both to the cache and to the MMU; the MMU produces the PA used to access main memory.
Cache access and MMU address translation are performed in parallel. However, the PA is not used unless a memory access is needed.
7 ADDRESSING (5) VIRTUAL (2) SPLIT CACHE
The following figure depicts the organization of a virtually addressed split cache: the CPU sends instruction VAs to the I-Cache and data VAs to the D-Cache; the MMU translates the VA into a PA to access main memory when needed.
8 ADDRESSING (6) PHYSICAL VS. VIRTUAL
The following points highlight the pros and cons of both addressing modes:
Physical addressing
* Pros: no need to perform cache flushing, since PAs are unique; no aliasing problems (two VAs mapped to the same PA).
* Cons: the slowdown in accessing the cache until the MMU translates the VA into a PA.
Virtual addressing
* Pros: faster access to the cache, since MMU translation is performed in parallel with cache access.
* Cons: the aliasing problem, since multiple processes may have the same range of VAs. This may be solved by flushing the entire cache; however, this may result in poor performance.
The drawback of PA may be alleviated if the MMU and the cache are integrated on the same chip as the CPU.
Most system designs use PA for (1) its simplicity and (2) its requiring less intervention from the OS as compared to VA.
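The aliasing problem above can be made concrete with a toy model (hypothetical page table and values, not any real system): two distinct VAs map to the same PA, so a virtually indexed cache can end up holding two inconsistent entries for one physical location.

```python
PAGE_SIZE = 4096
# Hypothetical mapping: virtual pages 0 and 8 alias to the same frame 5
page_table = {0: 5, 8: 5}

def pa(va):
    vpn, off = divmod(va, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + off

virtual_cache = {}                                # indexed by VA, not PA
virtual_cache[100] = "old value"                  # write via the first alias
virtual_cache[8 * PAGE_SIZE + 100] = "new value"  # write via the second alias

# Same physical byte, two inconsistent cache entries:
print(pa(100) == pa(8 * PAGE_SIZE + 100))  # prints True
print(virtual_cache[100])                  # prints "old value" (stale)
```

This is exactly why virtually addressed caches may need flushing, as noted above.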
9 CACHE PERFORMANCE (1)
The performance of a cache is measured by its hit ratio:
Hit Ratio (HR) = Number of cache hits / Total number of cache accesses
Miss Ratio (MR) = Number of cache misses / Total number of cache accesses
Miss Ratio = 1 - Hit Ratio
For a multi-level cache, the access time (T) of each level should be considered:
T_caches = HR_L1 * T_L1 + MR_L1 * (T_L1 + T_L2)   // the average access time across L1 and L2
To calculate the overall memory system performance, the access time of the main memory (T_M) should also be considered:
T_overall = HR_L1 * T_L1 + HR_L2 * (T_L1 + T_L2) + MR_L2 * (T_L1 + T_L2 + T_M)
(Here HR_L2 and MR_L2 denote global ratios, i.e., measured over all memory accesses.)
10 CACHE PERFORMANCE (2) NUMERICAL EXAMPLE
CPU; Level-1 D-Cache, access time = 0.01 μs; Level-2 D-Cache, access time = 0.1 μs.
Assume HR_L1 = 0.95. What is the L1-cache performance?
T = 0.95 * 0.01 + 0.05 * (0.01 + 0.1) = 0.015 μs
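The arithmetic of this example, using the average-access-time formula from the previous slide:

```python
# Times in microseconds, values taken from the example above.
T_L1, T_L2 = 0.01, 0.1
HR_L1 = 0.95
MR_L1 = 1 - HR_L1

# Average L1/L2 access time: hits cost T_L1; misses cost T_L1 + T_L2.
T_caches = HR_L1 * T_L1 + MR_L1 * (T_L1 + T_L2)
print(round(T_caches, 4))  # prints 0.015
```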
11 WRITING POLICIES (1) PROBLEM DEFINITION (1) SCENARIO (1)
[Figure: the CPU updates a word of cached block B0 from 150 to 300. The corresponding word of B0 in main memory still holds 150, so the cache and memory copies of the block are now inconsistent.]
12 WRITING POLICIES (2) PROBLEM DEFINITION (2) SCENARIO (2)
[Figure: the I/O module updates a word of block B0 in main memory from 150 to 300. The cached copy of that word still holds 150, so the processor now reads a stale value from its cache.]
13 WRITING POLICIES (3) SOLUTION (1)
The aim of a writing policy is to keep the data consistent between cache and memory. Two main writing policies are followed in cache design:
* Write-through
* Write-back
14 WRITING POLICIES (4) SOLUTION (2) WRITE-THROUGH
[Figure: the CPU updates a cached word from 150 to 300, and the same word in main memory is updated to 300 at the same time.]
Every time a word is updated in the cache, it is written through (reflected) in the main memory.
This technique is simple. However, it increases the memory traffic.
15 WRITING POLICIES (5) SOLUTION (3) WRITE-BACK
[Figure: the CPU updates a cached word from 150 to 300; the memory copy remains 150 until the cache line is replaced.]
When a cache line is updated, a status bit (update bit) is set to 1.
When the cache line is to be replaced, it is copied to the main memory if its update bit is equal to 1.
This technique minimizes memory accesses (traffic). However, some memory locations become invalid (stale) in the meantime.
In addition, write-back imposes that the I/O module accesses the memory through the cache.
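The traffic trade-off between the two policies can be sketched by counting memory writes. This is a toy model (hypothetical block names and access sequence): write-through generates one memory access per cache write, while write-back generates one per evicted dirty line.

```python
def write_through(writes):
    """Every cache write goes straight to memory: one access per write."""
    return len(writes)

def write_back(writes, evictions):
    """Only dirty lines reach memory, and only when they are replaced."""
    dirty = set(writes)  # lines whose update bit was set
    return sum(1 for line in evictions if line in dirty)

writes = ["B0", "B0", "B0", "B1"]   # four CPU writes to two cache lines
evictions = ["B0", "B1"]            # both lines eventually replaced

print(write_through(writes))            # prints 4 memory accesses
print(write_back(writes, evictions))    # prints 2 memory accesses
```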
16 CACHE COHERENCE PROBLEM (1)
In a multiprocessor system, data inconsistency may occur between a cache and main memory, or amongst the local caches of different processors.
Multiple caches may have different copies of the same memory block, since multiple processors operate asynchronously and independently.
Such a situation is known as the cache coherence problem.
The cache coherence problem may be caused by:
* Data sharing
* Process migration
* I/O that bypasses the caches (DMA)
17 CACHE COHERENCE PROBLEM (2) DATA SHARING
Consider the following scenario (processors P1 and P2, each with a private cache, sharing main memory):
A data item is shared between both processors. Before any update, the three copies (in the two caches and in memory) are consistent.
P1 updates the shared item. Assuming a write-through policy, the update is immediately reflected onto the main memory. However, the copy in P2's cache is now inconsistent.
P1 updates the shared item. Assuming a write-back policy, the update is not immediately reflected onto the main memory. The copies in memory and in P2's cache are both inconsistent.
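The data-sharing scenario above can be demonstrated with a toy model (plain dictionaries standing in for memory and two private caches; the values 150 and 300 follow the earlier figures — this is not any real protocol):

```python
memory = {"x": 150}
cache_p1 = {"x": 150}
cache_p2 = {"x": 150}   # before the update, all three copies agree

# P1 updates x; write-through reflects it to memory immediately...
cache_p1["x"] = 300
memory["x"] = 300

# ...but without a coherence mechanism, P2 still holds the old copy.
print(cache_p2["x"] == memory["x"])  # prints False: P2's copy is stale
```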
18 CACHE COHERENCE PROBLEM (3) PROCESS MIGRATION
Consider the following scenario (processors P1 and P2, each with a private cache, sharing main memory):
A process running on P1 uses a data item, which is cached by P1. The process then migrates to P2.
P2 updates the item after migration. Assuming a write-through policy, the update is immediately reflected onto the main memory. However, the copy in P1's cache is inconsistent.
P2 updates the item after migration. Assuming a write-back policy, the update is not immediately reflected onto the main memory. The copies in memory and in P1's cache are both inconsistent.
19 CACHE COHERENCE PROBLEM (4) I/O
Consider the following scenario (processors P1 and P2 with private caches, main memory, and an I/O module). When the I/O bypasses the cache, a cache coherence problem may occur:
* Input: when the I/O processor loads a new value into the main memory, bypassing the caches, the copies in the processors' private caches become obsolete.
* Output: P1 updates a cached value. Write-back caches are used, so the update is not immediately reflected onto the memory. When the memory outputs the value directly to the I/O, bypassing the cache, it outputs an obsolete value.
20 CACHE COHERENCE PROBLEM (5) SOLUTION
Two main approaches are commonly used to solve the cache coherence problem:
* Snoopy bus protocols
* Directory-based protocols
21 SNOOPY BUS PROTOCOLS (1) INTRODUCTION (1)
A bus is a convenient interconnection network (I/N) topology for ensuring cache coherence, since it allows all interconnected processors in the system to observe ongoing memory transactions.
If a bus transaction threatens the consistent state of a local cache, the cache controller can take appropriate actions to invalidate the local copy.
Two practices are implemented to maintain cache coherence:
* Write-invalidate policy: when a local cache block is updated, all blocks with the same address in remote caches are invalidated.
* Write-update policy: when a local cache block is updated, the new data block is broadcast to all caches containing a copy of the same block.
Snoopy protocols achieve data consistency among the caches and shared memory through a bus-watching mechanism.
The following figure illustrates the policies mentioned above.
22 SNOOPY BUS PROTOCOLS (2) INTRODUCTION (2)
[Figure: initial state — processors P1, P2, P3, each with a private cache connected to a shared bus, all holding a copy of the same block.]
Write-invalidate: when P1 writes the block, the memory copy is updated and all other cached copies of the block are invalidated (I). Invalidated blocks are called dirty, meaning that they should not be used.
Write-update: the new block contents are broadcast via the bus to all caches holding a copy, which are hence updated. With write-through caches, the memory copy is also updated; with write-back caches, the memory is updated later, upon block replacement.
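The two practices can be contrasted in a toy model (dictionaries standing in for three snooping caches; block names and values are hypothetical): on a write by one cache, either every other copy is deleted, or every copy is refreshed.

```python
def write_invalidate(caches, writer, block, value):
    """Writer updates its copy; all remote copies are invalidated."""
    caches[writer][block] = value
    for i, c in enumerate(caches):
        if i != writer and block in c:
            del c[block]            # remote copy becomes invalid (dirty)

def write_update(caches, writer, block, value):
    """The new block contents are broadcast to every holder."""
    caches[writer][block] = value
    for c in caches:
        if block in c:
            c[block] = value        # remote copies refreshed via the bus

caches = [{"x": 1}, {"x": 1}, {"x": 1}]
write_invalidate(caches, 0, "x", 2)
print(caches)   # prints [{'x': 2}, {}, {}]

caches = [{"x": 1}, {"x": 1}, {"x": 1}]
write_update(caches, 0, "x", 2)
print(caches)   # prints [{'x': 2}, {'x': 2}, {'x': 2}]
```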
23 SNOOPY BUS PROTOCOLS (3) STATE DIAGRAM
A state diagram is used to depict all transitions of the write-invalidate protocol as implemented in both write-through and write-back caches:
* The states in the diagram represent those of a cache block.
* Two processors are denoted: a local processor (i) and a remote processor (j).
* Six operations may take place in such an environment, namely:
  - Read the cache block in the local processor: R(i)
  - Read the cache block in the remote processor: R(j)
  - Write (modify) the cache block in the local processor: W(i)
  - Write (modify) the cache block in the remote processor: W(j)
  - Replace the cache block in the local processor: Z(i)
  - Replace the cache block in the remote processor: Z(j)
24 SNOOPY BUS PROTOCOLS (4) WRITE-THROUGH CACHES (1)
A block belonging to a write-through cache has one of two states: Valid (V) or Invalid (I). A cache block in the Invalid state is either dirty or unavailable in the processor's cache.
[State diagram: V has self-loops on R(i), R(j), W(i), Z(j), and transitions to I on W(j), Z(i).]
Let us first consider the Valid state:
* Local read R(i): does not affect the status of the local cache block.
* Remote read R(j): does not affect the status of the local cache block.
* Local write (modification) W(i): does not affect the status of the local cache block.
* Remote write (modification) W(j): causes the copy in the local cache to become dirty → Invalid.
* Local replace Z(i): the cache block is no longer available in the local processor → Invalid.
* Remote replace Z(j): does not affect the status of the local cache block.
25 SNOOPY BUS PROTOCOLS (4) WRITE-THROUGH CACHES (2)
[State diagram: I has self-loops on R(j), W(j), Z(i), Z(j), and transitions to V on R(i), W(i).]
Let us now consider the Invalid state:
* Local read R(i): cache miss; the block is fetched from memory and becomes Valid.
* Remote read R(j): does not affect the status of the local cache block.
* Local write (modification) W(i): refreshes the local cache block → Valid.
* Remote write (modification) W(j): does not affect the status of the local cache block.
* Local replace Z(i): the cache block is still unavailable → Invalid.
* Remote replace Z(j): does not affect the status of the local cache block.
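The two slides above fully determine the write-through state machine, which can be written down as a transition table (a sketch that encodes exactly the per-state descriptions given here):

```python
# Write-through write-invalidate protocol: states V (Valid) and I (Invalid);
# events are R/W/Z performed by the local (i) or a remote (j) processor.
TRANSITIONS = {
    ("V", "R(i)"): "V", ("V", "R(j)"): "V", ("V", "W(i)"): "V",
    ("V", "W(j)"): "I",  # remote write dirties the local copy
    ("V", "Z(i)"): "I",  # local replacement evicts the block
    ("V", "Z(j)"): "V",
    ("I", "R(i)"): "V",  # cache miss: block fetched from memory
    ("I", "W(i)"): "V",  # local write refreshes the block
    ("I", "R(j)"): "I", ("I", "W(j)"): "I",
    ("I", "Z(i)"): "I", ("I", "Z(j)"): "I",
}

def run(state, events):
    """Apply a sequence of events to a cache block's state."""
    for e in events:
        state = TRANSITIONS[(state, e)]
    return state

print(run("V", ["W(j)", "R(i)", "Z(i)"]))  # V -> I -> V -> I: prints I
```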
26 SNOOPY BUS PROTOCOLS (5) WRITE-BACK CACHES (1)
The state diagram for write-back caches comprises three states; namely, Invalid (INV), Read-Only (RO), and Read-Write (RW).
* The Invalid state designates that the cache block is either dirty or unavailable in the local cache.
* RO state: many caches may contain RO copies of a block.
* RW state: only one processor in the whole system may hold a cache block in the RW state; the processor that performs a write is in the RW state.
27 SNOOPY BUS PROTOCOLS (6) WRITE-BACK CACHES (2)
[State diagram: INV transitions to RO on R(i) and to RW on W(i); self-loops on R(j), W(j), Z(i), Z(j).]
Let us first consider the Invalid state:
* Local read R(i): refreshes the local cache with a RO copy → RO.
* Remote read R(j): does not affect the status of the local cache block.
* Local write (modification) W(i): refreshes the local cache block with a RW copy → RW.
* Remote write (modification) W(j): does not affect the status of the local cache block.
* Local replace Z(i): the cache block is still unavailable → INV.
* Remote replace Z(j): does not affect the status of the local cache block.
28 SNOOPY BUS PROTOCOLS (7) WRITE-BACK CACHES (3)
[State diagram: RO has self-loops on R(i), R(j), Z(j); transitions to RW on W(i) and to INV on W(j), Z(i).]
Let us now consider the RO state:
* Local read R(i): does not change the state of the local cache block.
* Remote read R(j): does not affect the status of the local cache block.
* Local write (modification) W(i): the last processor to write owns the cache block → RW.
* Remote write (modification) W(j): makes the local cache block dirty → INV.
* Local replace Z(i): the cache block becomes dirty → INV.
* Remote replace Z(j): does not affect the status of the local cache block.
29 SNOOPY BUS PROTOCOLS (8) WRITE-BACK CACHES (4)
[State diagram: RW has self-loops on R(i), W(i), Z(j); transitions to RO on R(j) and to INV on W(j), Z(i).]
Finally, let us consider the RW state:
* Local read R(i): does not change the state of the local cache block.
* Remote read R(j): the memory is updated (write-back), and the local cache block is demoted → RO.
* Local write (modification) W(i): does not change the state of the local cache block.
* Remote write (modification) W(j): makes the local cache block dirty → INV.
* Local replace Z(i): the cache block becomes unavailable → INV.
* Remote replace Z(j): does not affect the status of the local cache block.
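As with the write-through case, the three write-back slides above can be collected into one transition table (a sketch that encodes the per-state descriptions as given; it does not model the write-back bus traffic itself):

```python
# Write-back write-invalidate protocol: states INV, RO, RW.
T = {
    # Invalid state
    ("INV", "R(i)"): "RO", ("INV", "W(i)"): "RW",
    ("INV", "R(j)"): "INV", ("INV", "W(j)"): "INV",
    ("INV", "Z(i)"): "INV", ("INV", "Z(j)"): "INV",
    # Read-Only state (many caches may hold RO copies)
    ("RO", "R(i)"): "RO", ("RO", "R(j)"): "RO",
    ("RO", "W(i)"): "RW",   # the last writer owns the block
    ("RO", "W(j)"): "INV", ("RO", "Z(i)"): "INV", ("RO", "Z(j)"): "RO",
    # Read-Write state (exclusive: at most one RW copy system-wide)
    ("RW", "R(i)"): "RW", ("RW", "W(i)"): "RW",
    ("RW", "R(j)"): "RO",   # block written back to memory, demoted to RO
    ("RW", "W(j)"): "INV", ("RW", "Z(i)"): "INV", ("RW", "Z(j)"): "RW",
}

def run(state, events):
    """Apply a sequence of events to a cache block's state."""
    for e in events:
        state = T[(state, e)]
    return state

print(run("INV", ["W(i)", "R(j)", "W(j)"]))  # INV -> RW -> RO -> INV: prints INV
```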
30 FURTHER READINGS
* Cache/memory addressing
* Mapping functions
* Replacement policies
* Directory-based protocols
+ (Advanced) Computer Organization & Architechture Prof. Dr. Hasan Hüseyin BALIK (4 th Week) + Outline 2. The computer system 2.1 A Top-Level View of Computer Function and Interconnection 2.2 Cache Memory
More informationMultiprocessors. Loosely coupled [Multi-computer] each CPU has its own memory, I/O facilities and OS. CPUs DO NOT share physical memory
Loosely coupled [Multi-computer] each CPU has its own memory, I/O facilities and OS CPUs DO NOT share physical memory IITAC Cluster [in Lloyd building] 346 x IBM e326 compute node each with 2 x 2.4GHz
More informationLecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More informationChapter 5. Topics in Memory Hierachy. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ.
Computer Architectures Chapter 5 Tien-Fu Chen National Chung Cheng Univ. Chap5-0 Topics in Memory Hierachy! Memory Hierachy Features: temporal & spatial locality Common: Faster -> more expensive -> smaller!
More informationCaches. Cache Memory. memory hierarchy. CPU memory request presented to first-level cache first
Cache Memory memory hierarchy CPU memory request presented to first-level cache first if data NOT in cache, request sent to next level in hierarchy and so on CS3021/3421 2017 jones@tcd.ie School of Computer
More information1. Memory technology & Hierarchy
1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories
More informationMul$processor Architecture. CS 5334/4390 Spring 2014 Shirley Moore, Instructor February 4, 2014
Mul$processor Architecture CS 5334/4390 Spring 2014 Shirley Moore, Instructor February 4, 2014 1 Agenda Announcements (5 min) Quick quiz (10 min) Analyze results of STREAM benchmark (15 min) Mul$processor
More informationCache Memory. Content
Cache Memory Raul Queiroz Feitosa Content Memory Hierarchy Principle of Locality Some Definitions Cache Architectures Fully Associative Direct Mapping Set Associative Replacement Policy Main Memory Update
More informationOrganisasi Sistem Komputer
LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationVirtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1
Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:
More informationMulticore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh
Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth
More informationPage Which had internal designation P5
Intel P6 Internal Designation for Successor to Pentium Which had internal designation P5 Fundamentally Different from Pentium 1 Out-of-order, superscalar operation Designed to handle server applications
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationVirtual Memory. Kevin Webb Swarthmore College March 8, 2018
irtual Memory Kevin Webb Swarthmore College March 8, 2018 Today s Goals Describe the mechanisms behind address translation. Analyze the performance of address translation alternatives. Explore page replacement
More informationMemory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology
Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationParallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence
Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationShared Memory Architectures. Approaches to Building Parallel Machines
Shared Memory Architectures Arvind Krishnamurthy Fall 2004 Approaches to Building Parallel Machines P 1 Switch/Bus P n Scale (Interleaved) First-level $ P 1 P n $ $ (Interleaved) Main memory Shared Cache
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationLecture 11: Large Cache Design
Lecture 11: Large Cache Design Topics: large cache basics and An Adaptive, Non-Uniform Cache Structure for Wire-Dominated On-Chip Caches, Kim et al., ASPLOS 02 Distance Associativity for High-Performance
More informationLimitations of parallel processing
Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors
More informationPortland State University ECE 588/688. Directory-Based Cache Coherence Protocols
Portland State University ECE 588/688 Directory-Based Cache Coherence Protocols Copyright by Alaa Alameldeen and Haitham Akkary 2018 Why Directory Protocols? Snooping-based protocols may not scale All
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationECSE 425 Lecture 30: Directory Coherence
ECSE 425 Lecture 30: Directory Coherence H&P Chapter 4 Last Time Snoopy Coherence Symmetric SMP Performance 2 Today Directory- based Coherence 3 A Scalable Approach: Directories One directory entry for
More information