Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

Size: px

Start display at page:

Download "Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>"

Coleen Terry
5 years ago
Views:

1 Chapter 8 Digital Design and Computer Architecture: ARM Edition Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <1>

2 Chapter 8 :: Topics Introduction Memory System Performance Analysis Caches Virtual Memory Memory-Mapped I/O Summary Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <2>

3 Introduction Computer performance depends on: Processor performance Memory system performance Memory Interface CLK CLK MemWrite WE Processor Address WriteData Memory ReadData Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <3>

4 Processor-Memory Gap In prior chapters, assumed memory access takes 1 clock cycle but hasn t been true since the 198 s Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <4>

5 Memory System Challenge Make memory system appear as fast as processor Use hierarchy of memories Ideal memory: Fast Cheap (inexpensive) Large (capacity) But can only choose two! Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <5>

6 Memory Hierarchy Technology Price / GB Access Time (ns) Bandwidth (GB/s) Cache SRAM $1, Speed Main Memory DRAM $ SSD $1 1,.5 Virtual Memory HDD $.1 1,,.1 Capacity Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <6>

7 Locality Exploit locality to make memory accesses fast Temporal Locality: Locality in time If data used recently, likely to use it again soon How to exploit: keep recently accessed data in higher levels of memory hierarchy Spatial Locality: Locality in space If data used recently, likely to use nearby data soon How to exploit: when access data, bring nearby data into higher levels of memory hierarchy too Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <7>

8 Memory Performance Hit: data found in that level of memory hierarchy Miss: data not found (must go to next level) Hit Rate Miss Rate = # hits / # memory accesses = 1 Miss Rate = # misses / # memory accesses = 1 Hit Rate Average memory access time (AMAT): average time for processor to access data AMAT = t cache + MR cache [t MM + MR MM (t VM )] Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <8>

9 Memory Performance Example 1 A program has 2, loads and stores 1,25 of these data values in cache Rest supplied by other levels of memory hierarchy What are the hit and miss rates for the cache? Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <9>

10 Memory Performance Example 1 A program has 2, loads and stores 1,25 of these data values in cache Rest supplied by other levels of memory hierarchy What are the hit and miss rates for the cache? Hit Rate = 125/2 =.625 Miss Rate = 75/2 =.375 = 1 Hit Rate Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <1>

11 Memory Performance Example 2 Suppose processor has 2 levels of hierarchy: cache and main memory t cache = 1 cycle, t MM = 1 cycles What is the AMAT of the program from Example 1? Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <11>

12 Memory Performance Example 2 Suppose processor has 2 levels of hierarchy: cache and main memory t cache = 1 cycle, t MM = 1 cycles What is the AMAT of the program from Example 1? AMAT = t cache + MR cache (t MM ) = [ (1)] cycles = 38.5 cycles Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <12>

Gene Amdahl, 1922- Amdahl s Law: the effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of

13 Gene Amdahl, Amdahl s Law: the effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of overall performance Co-founded 3 companies, including one called Amdahl Corporation in 197 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <13>

14 Cache Highest level in memory hierarchy Fast (typically ~ 1 cycle access time) Ideally supplies most data to processor Usually holds most recently accessed data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <14>

15 Cache Design Questions What data is held in the cache? How is data found? What data is replaced? Focus on data loads, but stores follow same principles Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <15>

16 What data is held in the cache? Ideally, cache anticipates needed data and puts it in cache But impossible to predict future Use past to predict future temporal and spatial locality: Temporal locality: copy newly accessed data into cache Spatial locality: copy neighboring data into cache too Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <16>

17 Cache Terminology Capacity (C): number of data bytes in cache Block size (b): bytes of data brought into cache at once Number of blocks (B = C/b): number of blocks in cache: B = C/b Degree of associativity (N): number of blocks in a set Number of sets (S = B/N): each memory address maps to exactly one cache set Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <17>

18 How is data found? Cache organized into S sets Each memory address maps to exactly one set Caches categorized by # of blocks in a set: Direct mapped: 1 block per set N-way set associative: N blocks per set Fully associative: all cache blocks in 1 set Examine each organization for a cache with: Capacity (C = 8 words) Block size (b = 1 word) So, number of blocks (B = 8) Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <18>

19 Example Cache Parameters C = 8 words (capacity) b = 1 word (block size) So, B = 8 (# of blocks) Ridiculously small, but will illustrate organizations Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <19>

20 Direct Mapped Cache Address mem[xff...fc] mem[xff...f8] mem[xff...f4] mem[xff...f] mem[xff...ec] mem[xff...e8] mem[xff...e4] mem[xff...e] mem[x...24] mem[x..2] mem[x..1c] mem[x...18] mem[x...14] mem[x...1] mem[x...c] mem[x...8] mem[x...4] mem[x...] 2 3 Word Main Memory 2 3 Word Cache Set Number 7 (111) 6 (11) 5 (11) 4 (1) 3 (11) 2 (1) 1 (1) () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <2>

21 Direct Mapped Cache Hardware Memory Address Tag Set 27 3 Byte Offset V Tag Data 8-entry x ( )-bit SRAM = Hit Data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <21>

22 Direct Mapped Cache Performance Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] DONE SUB R, R, #1 B LOOP Tag... Set 1 3 Byte Offset V Tag Data 1... mem[x...c] 1... mem[x...8] 1... mem[x...4] Miss Rate =? Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <22>

23 Direct Mapped Cache Performance Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] DONE SUB R, R, #1 B LOOP Tag... Set 1 3 Byte Offset V Tag Data 1... mem[x...c] 1... mem[x...8] 1... mem[x...4] Miss Rate = 3/15 = 2% Temporal Locality Compulsory Misses Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <23>

24 Direct Mapped Cache: Conflict Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # DONE BEQ DONE Tag...1 LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP Set 1 3 Byte Offset V 1 Tag... Data Miss Rate =? mem[x...4] mem[x...24] Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <24>

25 Direct Mapped Cache: Conflict Memory Address ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, # DONE BEQ DONE Tag...1 LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP Set 1 3 Byte Offset V 1 Tag... Data mem[x...4] mem[x...24] Miss Rate = 1/1 = 1% Conflict Misses Set 7 (111) Set 6 (11) Set 5 (11) Set 4 (1) Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <25>

26 N-Way Set Associative Cache Memory Address Tag Set 28 2 Byte Offset V Tag Way 1 Way Data V Tag Data = = Hit 1 Hit Hit Hit Data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <26>

27 N-Way Set Associative Performance ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP V Tag Way 1 Way Data V Miss Rate =? Tag Data Set 3 Set 2 Set 1 Set Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <27>

28 N-Way Set Associative Performance ARM Assembly Code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #x4] LDR R3, [R1, #x24] SUB R, R, #1 B LOOP V Tag Way 1 Way Data Miss Rate = 2/1 V Tag Data mem[x...24] 1... mem[x...4] = 2% Associativity reduces conflict misses Set 3 Set 2 Set 1 Set Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <28>

29 Fully Associative Cache V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data Reduces conflict misses Expensive to build Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <29>

30 Spatial Locality? Increase block size: Block size, b = 4 words C = 8 words Direct mapped (1 block per set) Number of blocks, B = 2 (C/b = 8/4 = 2) Memory Address Tag Block Byte Set Offset Offset 27 2 V Tag 27 Data Set 1 Set = 32 Hit Data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <3>

31 Cache with Larger Block Size Memory Address Block Byte Tag Set Offset Offset 27 2 V Tag 27 Data Set 1 Set = 32 Hit Data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <31>

32 Direct Mapped Cache Performance ARM assembly code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] SUB R, R, #1 B LOOP Miss Rate =? Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <32>

33 Direct Mapped Cache Performance ARM assembly code MOV R, #5 MOV R1, # LOOP CMP R, DONE BEQ DONE LDR R2, [R1, #4] LDR R3, [R1, #12] LDR R4, [R1, #8] SUB R, R, #1 B LOOP Memory Address Tag Block Byte Set Offset Offset V Tag 27 Miss Rate = 1/15 Larger blocks 1... mem[x...c] = 6.67% reduce compulsory misses through spatial locality Data mem[x...8] mem[x...4] mem[x...] Set 1 Set = 32 Hit Data Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <33>

34 Cache Organization Recap Capacity: C Block size: b Number of blocks in cache: B = C/b Number of blocks in a set: N Number of sets: S = B/N Organization Number of Ways (N) Direct Mapped 1 B N-Way Set Associative 1 < N < B B / N Fully Associative B 1 Number of Sets (S = B/N) Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <34>

35 Capacity Misses Cache is too small to hold all data of interest at once If cache full: program accesses data X & evicts data Y Capacity miss when access Y again How to choose Y to minimize chance of needing it again? Least recently used (LRU) replacement: the least recently used block in a set evicted Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <35>

36 Types of Misses Compulsory: first time data accessed Capacity: cache too small to hold all data of interest Conflict: data of interest maps to same location in cache Miss penalty: time it takes to retrieve a block from lower level of hierarchy Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <36>

37 LRU Replacement ARM Assembly Code MOV R, # LDR R1, [R, #4] LDR R2, [R, #x24] LDR R3, [R, #x54] Way 1 Way V U Tag Data V Tag Data Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <37>

38 LRU Replacement ARM Assembly Code MOV R, # LDR R1, [R, #4] LDR R2, [R, #x24] LDR R3, [R, #x54] (a) Way 1 Way V U Tag Data V Tag Data mem[x...24] 1... mem[x...4] Way 1 Way Set 3 (11) Set 2 (1) Set 1 (1) Set () (b) V U Tag Data V Tag mem[x...24] Data mem[x...54] Set 3 (11) Set 2 (1) Set 1 (1) Set () Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <38>

39 Cache Summary What data is held in the cache? Recently used data (temporal locality) Nearby data (spatial locality) How is data found? Set is determined by address of data Word within block also determined by address In associative caches, data could be in one of several ways What data is replaced? Least-recently used way in the set Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <39>

40 Miss Rate Trends Bigger caches reduce capacity misses Greater associativity reduces conflict misses Adapted Digital Design from Patterson and Computer & Hennessy, Architecture: Computer Architecture: ARM Edition A Quantitative 215 Approach, Chapter <4>

41 Miss Rate Trends Bigger blocks reduce compulsory misses Bigger blocks increase conflict misses Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <41>

42 Multilevel Caches Larger caches have lower miss rates, longer access times Expand memory hierarchy to multiple levels of caches Level 1: small and fast (e.g. 16 KB, 1 cycle) Level 2: larger and slower (e.g. 256 KB, 2-6 cycles) Most modern PCs have L1, L2, and L3 cache Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <42>

43 Intel Pentium III Die Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <43>

44 Virtual Memory Gives the illusion of bigger memory Main memory (DRAM) acts as cache for hard disk Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <44>

45 Memory Hierarchy Technology Price / GB Access Time (ns) Bandwidth (GB/s) Cache SRAM $1, Speed Main Memory DRAM $ SSD $1 1,.5 Virtual Memory HDD $.1 1,,.1 Capacity Physical Memory: DRAM (Main Memory) Virtual Memory: Hard drive Slow, Large, Cheap Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <45>

46 Hard Disk Magnetic Disks Read/Write Head Takes milliseconds to seek correct location on disk Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <46>

47 Virtual Memory Virtual addresses Programs use virtual addresses Entire virtual address space stored on a hard drive Subset of virtual address data in DRAM CPU translates virtual addresses into physical addresses (DRAM addresses) Data not in DRAM fetched from hard drive Memory Protection Each program has own virtual to physical mapping Two programs can use same virtual address for different data Programs don t need to be aware others are running One program (or virus) can t corrupt memory used by another Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <47>

48 Cache/Virtual Memory Analogues Cache Block Block Size Block Offset Miss Tag Virtual Memory Page Page Size Page Offset Page Fault Virtual Page Number Physical memory acts as cache for virtual memory Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <48>

49 Virtual Memory Definitions Page size: amount of memory transferred from hard disk to DRAM at once Address translation: determining physical address from virtual address Page table: lookup table used to translate virtual addresses to physical addresses Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <49>

50 Virtual & Physical Addresses Most accesses hit in physical memory But programs have the large capacity of virtual memory Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <5>

51 Address Translation Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <51>

52 Virtual Memory Example System: Virtual memory size: 2 GB = 2 31 bytes Physical memory size: 128 MB = 2 27 bytes Page size: 4 KB = 2 12 bytes Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <52>

53 Virtual Memory Example System: Virtual memory size: 2 GB = 2 31 bytes Physical memory size: 128 MB = 2 27 bytes Page size: 4 KB = 2 12 bytes Organization: Virtual address: 31 bits Physical address: 27 bits Page offset: 12 bits # Virtual pages = 2 31 /2 12 = 2 19 (VPN = 19 bits) # Physical pages = 2 27 /2 12 = 2 15 (PPN = 15 bits) Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <53>

54 Virtual Memory Example 19-bit virtual page numbers 15-bit physical page numbers Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <54>

55 Virtual Memory Example What is the physical address of virtual address x247c? Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <55>

56 Virtual Memory Example What is the physical address of virtual address x247c? VPN = x2 VPN x2 maps to PPN x7fff 12-bit page offset: x47c Physical address = x7fff47c Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <56>

57 How to perform translation? Page table Entry for each virtual page Entry fields: Valid bit: 1 if page in physical memory Physical page number: where the page is located Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <57>

58 Page Table Example Virtual Address Virtual Page Number x2 19 Page Offset 47C 12 VPN is index into page table V 1 x 1 x7ffe 1 x1 1 x7fff Hit Physical Address Physical Page Number x7fff Page Table C Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <58>

59 Page Table Example 1 What is the physical address of virtual address x5f2? V Physical Page Number 1 x 1 x7ffe 1 x1 1 x7fff Page Table Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <59>

60 Page Table Example 1 What is the physical address of virtual address x5f2? VPN = 5 Virtual Address Entry 5 in page table VPN 5 => physical page 1 Physical address: x1f2 Virtual Page Number x5 19 Page Offset F2 12 V 1 x 1 x7ffe 1 x1 1 x7fff Hit Physical Address Physical Page Number Page Table x1 F2 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <6>

61 Page Table Example 2 Virtual Address What is the physical address of virtual address x73e? Virtual Page Number x7 19 Page Offset 3E V Physical Page Number 1 x 1 x7ffe 1 x1 1 x7fff Page Table Hit 15 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <61>

62 Page Table Example 2 What is the physical address of virtual address x73e? VPN = 7 Entry 7 is invalid Virtual Address Virtual page must be paged into physical memory from disk Virtual Page Number x7 19 Page Offset 3E V Physical Page Number 1 x 1 x7ffe 1 x1 1 x7fff Page Table Hit 15 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <62>

63 Page Table Challenges Page table is large usually located in physical memory Load/store requires 2 main memory accesses: one for translation (page table read) one to access data (after translation) Cuts memory performance in half Unless we get clever Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <63>

64 Translation Lookaside Buffer (TLB) Small cache of most recent translations Reduces # of memory accesses for most loads/stores from 2 to 1 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <64>

65 TLB Page table accesses: high temporal locality Large page size, so consecutive loads/stores likely to access same page TLB Small: accessed in < 1 cycle Typically entries Fully associative > 99 % hit rates typical Reduces # of memory accesses for most loads/stores from 2 to 1 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <65>

66 Example 2-Entry TLB Virtual Address Virtual Page Number x2 19 Page Offset 47C 12 Entry 1 Entry V Virtual Page Number Physical Page Number 1 x7fffd x 1 x2 x7fff V Virtual Page Number Physical Page Number TLB = = Hit 1 Hit 1 Hit 1 Hit 15 Physical Address x7fff 47C 12 Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <66>

67 Memory Protection Multiple processes (programs) run at once Each process has its own page table Each process can use entire virtual address space A process can only access physical pages mapped in its own page table Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <67>

68 Virtual Memory Summary Virtual memory increases capacity A subset of virtual pages in physical memory Page table maps virtual pages to physical pages address translation A TLB speeds up address translation Different page tables for different programs provides memory protection Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 <68>

Digital Logic & Computer Design CS Professor Dan Moldovan Spring Copyright 2007 Elsevier 8-<1>

Digital Logic & Computer Design CS 4341 Professor Dan Moldovan Spring 21 Copyright 27 Elsevier 8- Chapter 8 :: Memory Systems Digital Design and Computer Architecture David Money Harris and Sarah L.