Memory Hierarchy
Maurizio Palesi


References

John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, second edition, Morgan Kaufmann. Chapter 5.

Who Cares About the Memory Hierarchy?

[Chart: processor-DRAM performance gap (latency), 1980-2000, log scale. Processor performance ("Moore's Law") improves about 60%/year (2x every 1.5 years), while DRAM performance improves about 9%/year (2x every 10 years); the processor-memory performance gap grows about 50% per year.]


Levels of the Memory Hierarchy

Level         Capacity     Access time            Cost                        Staging unit (managed by)
Registers     100s bytes   <10s ns                                            Instr. operands, 1-8 bytes (programmer/compiler)
Cache         KBytes       10-100 ns              1-0.1 cents/bit             Blocks, 8-128 bytes (cache controller)
Main memory   MBytes       200-500 ns             0.0001-0.00001 cents/bit    Pages, 512 B-4 KB (OS)
Disk          GBytes       10 ms (10,000,000 ns)  10^-5-10^-6 cents/bit       Files, MBytes (user/operator)
Tape          infinite     sec-min                10^-8 cents/bit

Moving down the hierarchy, each level is larger and cheaper; moving up, each level is faster.

What is a Cache?

Small, fast storage used to improve the average access time to slow memory. It exploits spatial and temporal locality. In computer architecture, almost everything is a cache:
- Registers: a cache on variables
- First-level cache: a cache on the second-level cache
- Second-level cache: a cache on memory
- Memory: a cache on disk (virtual memory)
- TLB: a cache on the page table
- Branch prediction: a cache on prediction information?

The Principle of Locality

Programs access a relatively small portion of the address space at any instant of time. There are two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, data reuse)
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array accesses)
For the last 20 years, hardware has relied on locality for speed.
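Both kinds of locality show up in something as simple as a nested loop; the sketch below (illustrative only, names are mine) walks a matrix in row-major order:

```python
# Summing a matrix row by row touches consecutive addresses
# (spatial locality) and reuses the accumulator and loop variables
# on every iteration (temporal locality).
def row_major_sum(matrix):
    total = 0
    for row in matrix:        # straight-line traversal of rows
        for value in row:     # consecutive elements: spatial locality
            total += value    # 'total' reused each step: temporal locality
    return total

print(row_major_sum([[1, 2], [3, 4]]))  # 10
```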

Exploiting Locality

By taking advantage of the principle of locality, we can:
- Present the user with as much memory as is available in the cheapest technology
- Provide access at the speed offered by the fastest technology
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system. SRAM is fast but expensive and not very dense: a good choice for providing the user with FAST access times.

General Principles

Locality:
- Temporal locality: a referenced item tends to be referenced again soon
- Spatial locality: nearby items tend to be referenced soon
Locality + "smaller hardware is faster" = memory hierarchy:
- Levels: each level is smaller, faster, and more expensive per byte than the level below
- Inclusive: data found in the top level is also found in the levels below
Definitions:
- Upper level: the level closer to the processor
- Block: the minimum unit of data that is either present or not present in the upper level
- Address = block frame address + block offset address

Memory Hierarchy: Terminology

- Hit: the data appears in some block in the upper level (e.g., block X)
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: time to access the upper level = RAM access time + time to determine hit/miss
- Miss: the data must be retrieved from a block in the lower level (block Y)
  - Miss rate = 1 - hit rate
  - Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor
Hit time << miss penalty (a miss can cost the equivalent of 500 instructions on the Alpha 21264!)

Cache Measures

Average memory access time = hit time + miss rate x miss penalty [ns or clock cycles]
Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU. It has two components:
- Access time: time to reach the lower level = f(latency to the lower level)
- Transfer time: time to transfer the block = f(bandwidth between upper and lower levels)
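As a sanity check, the average-memory-access-time formula can be evaluated directly; the numbers below are hypothetical, not from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty.
    All times in the same unit (ns or clock cycles)."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1.0, 0.05, 100.0))  # 6.0 cycles
```

Note how the product dominates: halving the miss rate helps far more than shaving a fraction of a cycle off the hit time when the penalty is large.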

Block Size Tradeoff

In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block
- If the block size is too big relative to the cache size, the miss rate will go up: too few cache blocks, compromising temporal locality
In general: average access time = hit time + miss penalty x miss rate.
[Charts: as block size grows, miss penalty rises steadily; miss rate first falls (exploiting spatial locality) and then rises (too few blocks); average access time therefore has a minimum, beyond which increased miss penalty and miss rate dominate.]

Four Questions for the Memory Hierarchy

- Q1: Where can a block be placed in the upper level? (Block placement)
  Fully associative, set associative, direct mapped
- Q2: How is a block found if it is in the upper level? (Block identification)
  Tag/block
- Q3: Which block should be replaced on a miss? (Block replacement)
  Random, LRU
- Q4: What happens on a write? (Write strategy)
  Write back or write through (with a write buffer)

Q1: Where Can a Block Be Placed in the Upper Level?

For an 8-block cache and memory block address 12:
- Fully associative: block 12 can go anywhere
- Direct mapped: block 12 can go only into block frame 4 (12 mod 8)
- 2-way set associative: block 12 can go anywhere in set 0 (12 mod 4)
[Diagram: the eight cache block frames, numbered 0-7, shown unconstrained (fully associative), one-to-one (direct mapped), and grouped into four 2-way sets (set 0 - set 3), against memory block frame addresses 0-31.]
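The three placement policies differ only in how many block frames form a set; a small helper (a sketch, `placement` is a name of my choosing) reproduces the block-12 example:

```python
def placement(block_addr, n_blocks, assoc):
    """Return the set index a memory block maps to.
    assoc == 1 is direct mapped; assoc == n_blocks is fully associative
    (a single set, so the block can go anywhere)."""
    n_sets = n_blocks // assoc
    return block_addr % n_sets

# Block 12 in an 8-block cache, as on the slide:
print(placement(12, 8, 1))  # direct mapped: frame 4 (12 mod 8)
print(placement(12, 8, 2))  # 2-way set associative: set 0 (12 mod 4)
print(placement(12, 8, 8))  # fully associative: set 0 (the only set)
```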

Q2: How Is a Block Found If It Is in the Upper Level?

There is a tag on each block; there is no need to check the index or block offset. Increasing associativity shrinks the index and expands the tag:
- Fully associative: no index
- Direct mapped: large index

Direct-Mapped Cache

[Diagram: a 32-block main memory (addresses 00000 = 0 through 11111 = 31) divided into four partitions (Part. 0 - Part. 3) of eight blocks each, all mapped onto the same 8-frame cache (frames 0-7).]

Direct-Mapped Cache with Tags

[Diagram: the same mapping, with each cache frame now holding a tag that identifies which memory partition its current block came from.]

Q3: Which Block Should Be Replaced on a Miss?

Easy for direct mapped. For set-associative or fully associative caches:
- Random (competitive at large associativities)
- LRU (better at smaller associativities)

Miss rates by cache size and associativity:

Cache size   2-way LRU   2-way RND   4-way LRU   4-way RND   8-way LRU   8-way RND
16 KB        5.2%        5.7%        4.7%        5.3%        4.4%        5.0%
64 KB        1.9%        2.0%        1.5%        1.7%        1.4%        1.5%
256 KB       1.15%       1.17%       1.13%       1.13%       1.12%       1.12%
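A single cache set with LRU replacement can be sketched with an ordered dictionary (illustrative only, not a full cache model; `LRUSet` is a name of my choosing):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with least-recently-used replacement."""
    def __init__(self, assoc):
        self.assoc = assoc
        self.blocks = OrderedDict()      # tag -> None, oldest first

    def access(self, tag):
        """Return True on a hit, False on a miss (filling or replacing)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark most recently used
            return True
        if len(self.blocks) >= self.assoc:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(assoc=2)
hits = [s.access(t) for t in [1, 2, 1, 3, 2]]
print(hits)  # [False, False, True, False, False]
```

Note the last access to tag 2 misses: tag 3 evicted it, because tag 1 had been touched more recently.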

Q4: What Happens on a Write?

- Write through: the information is written both to the block in the cache and to the block in lower-level memory
- Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced (requires tracking whether each block is clean or dirty)
Pros and cons of each:
- WT: read misses cannot result in writes (no dirty block to write back on replacement)
- WB: repeated writes to the same block generate no extra traffic to the lower level
WT is always combined with write buffers so that the processor does not wait for the lower-level memory.
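The WB advantage on repeated writes can be made concrete with a tiny counting sketch (the function name and scenario are my own, not from the slides):

```python
def lower_level_writes(policy, n_writes_to_one_block):
    """Writes that reach the lower level for n write hits to one cached
    block. Write-through forwards every write; write-back writes the
    dirty block back once, when it is eventually replaced."""
    if policy == "write-through":
        return n_writes_to_one_block
    if policy == "write-back":
        return 1 if n_writes_to_one_block > 0 else 0
    raise ValueError(f"unknown policy: {policy}")

print(lower_level_writes("write-through", 10))  # 10 memory writes
print(lower_level_writes("write-back", 10))     # 1 memory write
```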

Write Buffer for Write Through

A write buffer is needed between the cache and memory:
- The processor writes data into the cache and into the write buffer
- The memory controller writes the contents of the buffer to memory
The write buffer is just a FIFO; a typical number of entries is 4. It works fine if:
- Store frequency (w.r.t. time) << 1 / DRAM write cycle
The memory system designer's nightmare:
- Store frequency (w.r.t. time) > 1 / DRAM write cycle, which leads to write buffer saturation
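Saturation is easy to see in a toy FIFO model (a sketch under my own assumptions: fixed-size deque, processor stalls one DRAM write per overflow):

```python
from collections import deque

class WriteBuffer:
    """A FIFO write buffer sketch. If stores arrive faster than the
    memory controller drains them, the buffer fills and the processor
    must stall (write buffer saturation)."""
    def __init__(self, entries=4):
        self.fifo = deque()
        self.entries = entries
        self.stalls = 0

    def store(self, addr, data):
        if len(self.fifo) == self.entries:
            self.stalls += 1   # buffer saturated: processor stalls...
            self.drain(1)      # ...until one DRAM write completes
        self.fifo.append((addr, data))

    def drain(self, n):
        """Memory controller retires up to n buffered writes."""
        for _ in range(min(n, len(self.fifo))):
            self.fifo.popleft()

wb = WriteBuffer(entries=4)
for i in range(6):             # burst of 6 stores with no draining
    wb.store(i, i)
print(wb.stalls)  # 2 (the 5th and 6th stores found the buffer full)
```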

How a Block Is Found in the Cache

[Diagram: the CPU address is split into tag, index, and offset fields; the index selects a cache entry, the stored cache tag is compared against the address tag to produce hit/miss, and the cache data is driven onto the CPU data bus.]
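Splitting an address into its three fields is pure bit arithmetic; the helper below is a sketch with hypothetical parameters (16-byte blocks, so 4 offset bits; 256 sets, so 8 index bits):

```python
def split_address(addr, offset_bits, index_bits):
    """Split a byte address into (tag, index, offset) cache fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split_address(0x12345, offset_bits=4, index_bits=8)
print(hex(tag), hex(index), hex(offset))  # 0x12 0x34 0x5
```

Increasing associativity halves the number of sets per doubling, removing one index bit and adding it to the tag, which is exactly the "shrinks index, expands tag" observation above.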

How a Block Is Found in a Set-Associative Cache

[Diagram: a 2-way lookup with two sets of address tags and data RAMs; the address bits select the entry in each way, and a 2:1 multiplexer driven by the tag comparisons selects the correct way's data.]

Reducing Misses: Classifying Misses with the 3 Cs

- Compulsory: the first access to a block cannot find it in the cache, so the block must be brought in. Also called cold-start misses or first-reference misses
- Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved
- Conflict: if the block placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses
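The standard way to separate the three categories is to compare against a fully associative LRU cache of the same size: a miss that would also occur there is capacity (or compulsory, if it is the first reference ever); a miss that would not is a conflict. A small sketch (block-granularity trace, names of my choosing):

```python
from collections import OrderedDict

class DirectMappedCache:
    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.tags = [None] * n_blocks
    def access(self, block):
        idx = block % self.n_blocks
        hit = self.tags[idx] == block
        self.tags[idx] = block
        return hit

def classify_misses(trace, n_blocks):
    """Count compulsory, capacity, and conflict misses for a
    direct-mapped cache, using a same-size fully associative LRU
    cache as the capacity reference."""
    cache = DirectMappedCache(n_blocks)
    fa = OrderedDict()                  # fully associative LRU shadow
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for b in trace:
        fa_hit = b in fa
        if fa_hit:
            fa.move_to_end(b)
        else:
            if len(fa) >= n_blocks:
                fa.popitem(last=False)
            fa[b] = None
        if not cache.access(b):
            if b not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(b)
    return counts

# Blocks 0 and 4 collide in a 4-block direct-mapped cache, so the
# repeats are conflict misses: a fully associative cache would hold both.
print(classify_misses([0, 4, 0, 4], n_blocks=4))
```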

3Cs Absolute Miss Rate (SPEC92)

[Chart: miss rate (0 to 0.14) versus cache size (1 KB to 128 KB) for 1-way through 8-way associativity, decomposed into conflict, capacity, and compulsory components. Conflict misses shrink as associativity grows; capacity misses dominate at small cache sizes; compulsory misses are a thin band at the bottom.]

2:1 Cache Rule

The miss rate of a 1-way (direct-mapped) cache of size X is about equal to the miss rate of a 2-way set-associative cache of size X/2.
[Chart: the same 3Cs miss-rate plot (miss rate versus cache size, 1 KB to 128 KB, 1-way through 8-way), illustrating the rule.]

3Cs Relative Miss Rate

[Chart: the same data plotted as a percentage (0-100%) of total misses versus cache size (1 KB to 128 KB): the conflict share shrinks with associativity, the capacity share dominates mid-sized caches, and the compulsory share grows in relative terms as total misses fall.]