Chapter 8 Memory Hierarchy and Cache Memory

Size: px

Start display at page:

Download "Chapter 8 Memory Hierarchy and Cache Memory"

Maud Newton
5 years ago
Views:

1 Chapter 8 Memory Hierarchy and Cache Memory Digital Design and Computer Architecture: ARM Edi*on Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <1>

2 Chapter 8 :: Topics Introduc*on Memory Performance Memory Hierarchy Caches Virtual Memory Memory-Mapped I/O Summary Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <2>

3 Introduc>on Computer performance depends on: Processor performance Memory system performance Memory Interface CLK CLK MemWrite WE Processor Address Write Memory Read Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <3>

4 Processor-Memory Gap In prior chapters, assumed memory access takes 1 clock cycle but hasn t been true since the 198 s Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <4>

5 Memory System Challenge Make memory system appear as fast as processor Use hierarchy of memories Ideal memory: Fast Cheap (inexpensive) Large (capacity) But can only choose two! Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <5>

6 Memory Hierarchy Technology Price / GB Access Time (ns) Bandwidth (GB/s) Cache SRAM $1, Speed Main Memory DRAM $ SSD $1 1,.5 Virtual Memory HDD $.1 1,,.1 Capacity Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <6>

7 Locality Exploit locality to make memory accesses fast Temporal Locality: Locality in >me If data used recently, likely to use it again soon How to exploit: keep recently accessed data in higher levels of memory hierarchy Spa*al Locality: Locality in space If data used recently, likely to use nearby data soon How to exploit: when access data, bring nearby data into higher levels of memory hierarchy too Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <7>

8 Memory Performance Hit: data found in that level of memory hierarchy Miss: data not found (must go to next level) Hit Rate Miss Rate = # hits / # memory accesses = 1 Miss Rate = # misses / # memory accesses = 1 Hit Rate Average memory access time (AMAT): average time for processor to access data AMAT = t cache + MR cache [t MM + MR MM (t VM )] Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <8>

9 Memory Performance Example 1 A program has 2, loads and stores 1,25 of these data values in cache Rest supplied by other levels of memory hierarchy What are the hit and miss rates for the cache? Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <9>

10 Memory Performance Example 1 A program has 2, loads and stores 1,25 of these data values in cache Rest supplied by other levels of memory hierarchy What are the hit and miss rates for the cache? Hit Rate = 125/2 =.625 Miss Rate = 75/2 =.375 = 1 Hit Rate Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <1>

11 Memory Performance Example 2 Suppose processor has 2 levels of hierarchy: cache and main memory t cache = 1 cycle, t MM = 1 cycles What is the AMAT of the program from Example 1? Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <11>

12 Memory Performance Example 2 Suppose processor has 2 levels of hierarchy: cache and main memory t cache = 1 cycle, t MM = 1 cycles What is the AMAT of the program from Example 1? AMAT = t cache + MR cache (t MM ) = [ (1)] cycles = 38.5 cycles Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <12>

Gene Amdahl, 1922- Amdahl s Law: the effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of

13 Gene Amdahl, Amdahl s Law: the effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of overall performance Co-founded 3 companies, including one called Amdahl Corpora>on in 197 Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <13>

14 Cache Highest level in memory hierarchy Fast (typically ~ 1 cycle access >me) Ideally supplies most data to processor Usually holds most recently accessed data Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <14>

15 Cache Design Ques>ons What data is held in the cache? How is data found? What data is replaced? Focus on data loads, but stores follow same principles Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <15>

16 What data is held in the cache? Ideally, cache an>cipates needed data and puts it in cache But impossible to predict future Use past to predict future temporal and spa>al locality: Temporal locality: copy newly accessed data into cache Spa*al locality: copy neighboring data into cache too Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <16>

17 Cache Terminology Capacity (C): number of data bytes in cache Block size (b): bytes of data brought into cache at once Number of blocks (B = C/b): number of blocks in cache: B = C/b Degree of associa*vity (N): number of blocks in a set Number of sets (S = B/N): each memory address maps to exactly one cache set Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <17>

18 How is data found? Cache organized into S sets Each memory address maps to exactly one set Caches categorized by # of blocks in a set: Direct mapped: 1 block per set N-way set associa*ve: N blocks per set Fully associa*ve: all cache blocks in 1 set Examine each organiza>on for a cache with: Capacity (C = 8 words) Block size (b = 1 word) So, number of blocks (B = 8) Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <18>

19 Example Cache Parameters C = 8 words (capacity) b = 1 word (block size) So, B = 8 (# of blocks) Ridiculously small, but will illustrate organiza*ons Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <19>

20 Direct Mapped Cache Address mem[xff...fc] mem[xff...f8] mem[xff...f4] mem[xff...f] mem[xff...ec] mem[xff...e8] mem[xff...e4] mem[xff...e] mem[x...24] mem[x..2] mem[x..1c] mem[x...18] mem[x...14] mem[x...1] mem[x...c] mem[x...8] mem[x...4] mem[x...] 2 3 Word Main Memory 2 3 Word Cache Set Number 7 (111) 6 (11) 5 (11) 4 (1) 3 (11) 2 (1) 1 (1) () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <2>

21 Direct Mapped Cache Hardware Memory Address Set 27 3 Byte Offset V 8-entry x ( )-bit SRAM = Hit Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <21>

22 Direct Mapped Cache Performance Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] DONE SUB R, R, #1 B LOOP... Set 1 3 Byte Offset V 1... mem[x...c] 1... mem[x...8] 1... mem[x...4] Miss Rate =? Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <22>

23 Direct Mapped Cache Performance Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] DONE SUB R, R, #1 B LOOP... Set 1 3 Byte Offset V 1... mem[x...c] 1... mem[x...8] 1... mem[x...4] Miss Rate = 3/15 = 2% Temporal Locality Compulsory Misses Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <23>

24 Direct Mapped Cache: Conflict Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # DONE BEQ DONE...1 LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP Set 1 3 Byte Offset V 1... Miss Rate =? mem[x...4] mem[x...24] Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <24>

25 Direct Mapped Cache: Conflict Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # DONE BEQ DONE...1 LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP Set 1 3 Byte Offset V 1... mem[x...4] mem[x...24] Miss Rate = 1/1 = 1% Conflict Misses Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <25>

26 N-Way Set Associa>ve Cache Memory Address Set 28 2 Byte Offset V Way 1 Way V = = Hit 1 Hit 1 32 Hit 1 Hit Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <26>

27 N-Way Set Associa>ve Performance ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP V Way 1 Way V Miss Rate =? Set 3 Set 2 Set 1 Set Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <27>

28 N-Way Set Associa>ve Performance ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP V Way 1 Way Miss Rate = 2/1 = 2% Associativity reduces conflict misses V mem[x...24] 1... mem[x...4] Set 3 Set 2 Set 1 Set Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <28>

29 Fully Associa>ve Cache V V V V V V V V Reduces conflict misses Expensive to build Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <29>

30 Spa>al Locality? Increase block size: Block size, b = 4 words C = 8 words Direct mapped (1 block per set) Number of blocks, B = 2 (C/b = 8/4 = 2) Memory Address Block Byte Set Offset Offset 27 2 V Set 1 Set = 32 Hit Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <3>

31 Cache with Larger Block Size Memory Address Block Byte Set Offset Offset 27 2 V Set 1 Set = 32 Hit Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <31>

32 Direct Mapped Cache Performance ARM assembly code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] SUB R, R, #1 B LOOP Miss Rate =? Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <32>

33 Direct Mapped Cache Performance ARM assembly code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] SUB R, R, #1 B LOOP Memory Address Block Byte Set Offset Offset V 27 Miss Rate = 1/15 = 6.67% Larger blocks reduce compulsory misses through spatial locality 1... mem[x...c] mem[x...8] mem[x...4] mem[x...] Set 1 Set = 32 Hit Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <33>

34 Cache Organiza>on Recap Capacity: C Block size: b Number of blocks in cache: B = C/b Number of blocks in a set: N Number of sets: S = B/N Organiza*on Number of Ways (N) Direct Mapped 1 B Number of Sets (S = B/N) N-Way Set Associa*ve 1 < N < B B / N Fully Associa*ve B 1 Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <34>

35 Capacity Misses Cache is too small to hold all data of interest at once If cache full: program accesses data X & evicts data Y Capacity miss when access Y again How to choose Y to minimize chance of needing it again? Least recently used (LRU) replacement: the least recently used block in a set evicted Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <35>

36 Types of Misses Compulsory: first >me data accessed Capacity: cache too small to hold all data of interest Conflict: data of interest maps to same loca>on in cache Miss penalty: >me it takes to retrieve a block from lower level of hierarchy Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <36>

37 LRU Replacement ARM Assembly Code MOV R, # LDR R1, [R, #4] LDR R2, [R, #x24] LDR R3, [R, #x54] Way 1 Way V U V Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <37>

38 LRU Replacement ARM Assembly Code MOV R, # LDR R1, [R, #4] LDR R2, [R, #x24] LDR R3, [R, #x54] (a) Way 1 Way V U V mem[x...24] 1... mem[x...4] Way 1 Way Set 3 (11) Set 2 (1) Set 1 (1) Set () (b) V U V mem[x...24] mem[x...54] Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <38>

39 Cache Summary What data is held in the cache? Recently used data (temporal locality) Nearby data (spa>al locality) How is data found? Set is determined by address of data Word within block also determined by address In associa>ve caches, data could be in one of several ways What data is replaced? Least-recently used way in the set Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <39>

40 Miss Rate Trends Bigger caches reduce capacity misses Greater associativity reduces conflict misses Adapted Digital Design from Patterson and Computer & Hennessy, Architecture: Computer Architecture: ARM Edi>on A Quantitative 215 Approach, Chapter 2118 <4>

41 Miss Rate Trends Bigger blocks reduce compulsory misses Bigger blocks increase conflict misses Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <41>

42 Mul>level Caches Larger caches have lower miss rates, longer access >mes Expand memory hierarchy to mul>ple levels of caches Level 1: small and fast (e.g. 16 KB, 1 cycle) Level 2: larger and slower (e.g. 256 KB, 2-6 cycles) Most modern PCs have L1, L2, and L3 cache Digital Design and Computer Architecture: ARM Edi>on 215 Chapter 8 <42>

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

Chapter 8 Digital Design and Computer Architecture: ARM Edition Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 Chapter 8 :: Topics Introduction