
M6 Memory Hierarchy

Module Outline: CPU-memory interaction, organization of memory modules, cache memory, mapping and replacement policies.

Events on a Cache Miss

Stall the pipeline. Steps on an I-cache miss:
1. Send the PC value to the memory.
2. Instruct main memory to perform a read and wait for the memory to complete its access.
3. Fill the cache entry: write the data from memory into the cache block, fill the tag field from the address, and turn the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
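To make the four steps concrete, here is a minimal sketch (not from the slides) of a direct-mapped instruction cache in Python; the class and field names are invented for illustration, and main memory is faked with a stub.

BLOCK_SIZE = 4      # bytes per cache line (assumption)
NUM_LINES  = 256    # number of cache lines (assumption)

class Line:
    def __init__(self):
        self.valid = False   # valid bit
        self.tag   = None    # tag field
        self.data  = None    # cached block

icache = [Line() for _ in range(NUM_LINES)]

def read_main_memory(block_addr):
    # Steps 1-2: send the address to main memory and wait for the (slow) read.
    return f"instr@{block_addr:#x}"   # fake instruction data

def fetch(pc):
    block_addr = pc // BLOCK_SIZE
    index = block_addr % NUM_LINES
    tag   = block_addr // NUM_LINES
    line  = icache[index]
    if line.valid and line.tag == tag:       # hit: no stall
        return line.data
    data = read_main_memory(block_addr)      # miss: stall while memory responds
    line.data, line.tag, line.valid = data, tag, True   # step 3: fill entry, set tag and valid bit
    return fetch(pc)                         # step 4: restart the fetch, which now hits

print(fetch(0x1000))   # first access misses, fills the line, then hits on the refetch
print(fetch(0x1000))   # subsequent access hits directly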

Inside a Cache

[Figure: the cache sits between the PROCESSOR and MAIN MEMORY. Internally it consists of a controller, lookup logic, and cache RAM organized as N lines (blocks) numbered 0 to N-1, each holding a TAG and its DATA.]

Cache Access Time

[Figure: the hit time is the time to access the cache; the miss penalty is the additional time to fetch the block from main memory.]

Average Memory Access Time (AMAT) = Hit Time + Miss Rate × Miss Penalty
(the Miss Rate × Miss Penalty term is the stall time)
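As a quick numeric illustration (the numbers are chosen to match the example that follows, not taken from this slide): with a 1-cycle hit time, a 5% miss rate, and a 500-cycle miss penalty, AMAT = 1 + 0.05 × 500 = 26 cycles. The same thing as a one-line helper in Python:

def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time = hit time + stall time
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.05, 500))   # 26.0 cycles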

Processor Performance: No Cache

5 GHz processor, cycle time = 0.2 ns.
Memory access time = 100 ns = 500 cycles.
Ignoring memory access, Clocks Per Instruction (CPI) = 1.
What is the CPI including memory accesses? Assuming no memory data accesses (instruction fetches only):
CPI_no-cache = 1 + #stall cycles = 1 + 500 = 501

Processor + Cache Performance

Hit rate = 0.95. L1 access time = 0.2 ns = 1 cycle.
CPI_with-cache = 1 + #stall cycles
#stall cycles = Miss Rate × Miss Penalty = 0.05 × 500 = 25
CPI_with-cache = 1 + 25 = 26
Increase in performance = 501 / 26 ≈ 19.3
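The arithmetic above is easy to reproduce in a few lines of Python; this is only a sketch with parameter names of my own choosing, using the figures from the two slides above:

CYCLE_TIME_NS = 0.2                              # 5 GHz processor
MEM_ACCESS_NS = 100.0                            # main-memory access time
MISS_PENALTY  = MEM_ACCESS_NS / CYCLE_TIME_NS    # 500 cycles
BASE_CPI      = 1.0
HIT_RATE      = 0.95

cpi_no_cache   = BASE_CPI + MISS_PENALTY                    # every fetch goes to memory
cpi_with_cache = BASE_CPI + (1 - HIT_RATE) * MISS_PENALTY   # only misses stall

print(cpi_no_cache)                    # 501.0
print(cpi_with_cache)                  # ~26.0
print(cpi_no_cache / cpi_with_cache)   # ~19.3x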

Cache Terms

cache hit: an access where the data is found in the cache.
cache miss: an access where it is not.
hit time: the time to access the cache.
miss penalty: the time to move data from the lower level to the upper level, and then to the CPU.
hit rate: the percentage of accesses that are cache hits.
miss rate: 1 − hit rate.
cache block size (cache line size): the amount of data transferred on a cache miss.
instruction cache: a cache that holds only instructions.
data cache: a cache that holds only data.

Cache Performance

Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is 4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.
Average memory stall cycles for (a) the instruction cache and (b) the data cache?
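One way to work the exercise out (a sketch of the standard calculation, not an answer key taken from the slides): every instruction is fetched, so instruction-cache stalls contribute 0.02 × 100 = 2 cycles per instruction; only loads and stores reference the data cache, contributing 0.36 × 0.04 × 100 = 1.44 cycles per instruction. The CPI with stalls is 2 + 2 + 1.44 = 5.44, so a perfect cache would be about 5.44 / 2 = 2.72 times faster.

I_MISS_RATE     = 0.02
D_MISS_RATE     = 0.04
MISS_PENALTY    = 100     # cycles
BASE_CPI        = 2.0
LOAD_STORE_FREQ = 0.36    # data references per instruction

i_stalls = 1.0 * I_MISS_RATE * MISS_PENALTY               # (a) 2.00 cycles per instruction
d_stalls = LOAD_STORE_FREQ * D_MISS_RATE * MISS_PENALTY   # (b) 1.44 cycles per instruction
cpi_real = BASE_CPI + i_stalls + d_stalls                 # 5.44
print(i_stalls, d_stalls, cpi_real, cpi_real / BASE_CPI)  # perfect cache: ~2.72x faster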

Multilevel Caches

Primary (L1) cache attached to the CPU: small, but fast.
Level-2 cache services misses from the primary cache: larger and slower, but still faster than main memory.
Main memory services L2 cache misses.
Some high-end systems include an L3 cache.
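As a toy illustration of how misses cascade down the levels, here is a hypothetical lookup walking the hierarchy (the predicates and latencies are invented stand-ins, not a real cache model):

def hierarchy_access(addr, levels):
    # levels: list of (name, hit_time_cycles, contains) ordered from L1 down to main memory;
    # the last level must always hit. Returns (total cycles, level that served the access).
    total = 0
    for name, hit_time, contains in levels:
        total += hit_time
        if contains(addr):
            return total, name

levels = [
    ("L1", 1,   lambda a: a % 7 == 0),   # toy predicates standing in for real tag lookups
    ("L2", 20,  lambda a: a % 3 == 0),
    ("Main memory", 400, lambda a: True),
]
print(hierarchy_access(21, levels))   # (1, 'L1')
print(hierarchy_access(9,  levels))   # (21, 'L2')
print(hierarchy_access(5,  levels))   # (421, 'Main memory')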

Multilevel Cache Example

Given: CPU base CPI = 1, clock rate = 4 GHz (cycle time = 0.25 ns), miss rate = 2% per instruction, main memory access time = 100 ns.
With just the primary cache:
Miss penalty = 100 ns / 0.25 ns = 400 cycles
Effective CPI = 1 + 0.02 × 400 = 9

Example (cont.)

Now add an L2 cache: access time = 5 ns, global miss rate to main memory = 0.5%.
Primary miss with L2 hit: penalty = 5 ns / 0.25 ns = 20 cycles.
Primary miss with L2 miss: extra penalty = 400 cycles.
CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
Performance ratio = 9 / 3.4 ≈ 2.6
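The same two-level CPI calculation as a short script (a sketch; the variable names are mine):

CYCLE_NS      = 0.25     # 4 GHz clock
BASE_CPI      = 1.0
L1_MISS_RATE  = 0.02     # misses per instruction
GLOBAL_MISS   = 0.005    # both L1 and L2 miss
L2_ACCESS_NS  = 5.0
MEM_ACCESS_NS = 100.0

l2_penalty  = L2_ACCESS_NS / CYCLE_NS     # 20 cycles
mem_penalty = MEM_ACCESS_NS / CYCLE_NS    # 400 cycles

cpi_l1_only = BASE_CPI + L1_MISS_RATE * mem_penalty                              # 9.0
cpi_l1_l2   = BASE_CPI + L1_MISS_RATE * l2_penalty + GLOBAL_MISS * mem_penalty   # 3.4
print(cpi_l1_only, cpi_l1_l2, cpi_l1_only / cpi_l1_l2)                           # ratio ~2.6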

Multilevel Cache Considerations

Primary cache: focus on minimal hit time.
L2 cache: focus on a low miss rate to avoid main memory accesses; hit time has less overall impact.
Results: the L1 cache is usually smaller than a single-level cache would be, and the L1 block size is smaller than the L2 block size.

Memory Hierarchy Recap

What programmers want: unlimited amounts of memory with low latency.
Reality: memory latency is huge, and fast memory technology is more expensive per bit than slower memory.
Solution: organize the memory system into a hierarchy.
The entire addressable memory space is available in the largest, slowest memory; incrementally smaller and faster memories each contain a subset of the memory below them.

Memory Hierarchy Levels

Block (aka line): the unit of copying between levels.
If the accessed data is present in the upper level, it is a hit: the access is satisfied by the upper level. Hit ratio = hits / accesses.
If the accessed data is absent, it is a miss: the block is copied from the lower level, and the time taken is the miss penalty. Miss ratio = misses / accesses = 1 − hit ratio. The accessed data is then supplied from the upper level.

Memory Hierarchy

[Figure: Processor (registers) - L1 cache - L2 cache - L3 cache - Main memory - Disk storage, with typical transfer units, sizes, and access times per level:]

Level          Reference type     Transfer unit           Typical size     Typical access time
Registers      Register ref.      Word (32-64 bits)       ~1000 B          300 ps
L1 cache       Cache ref.         Cache block (16-64 B)   64 KB            1 ns
L2 cache       Cache ref.         Cache block (16-64 B)   256 KB - 1 MB    3-10 ns
L3 cache       Cache ref.         Cache block (16-64 B)   4-16 MB          10-25 ns
Main memory    Memory ref.        Page (4-16 KB)          4-16 GB          50-100 ns
Disk storage   Disk ref.          Page (4-16 KB)          4-16 TB          5-10 ms

Memory Hierarchy: Other Topics. Main memory and DRAM; virtual memory; non-volatile memory (persistent NVM).

Module Outline: CPU-memory interaction, organization of memory modules, cache memory, mapping and replacement policies.