Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

Size: px

Start display at page:

Download "Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies."

Ethel Hopkins
6 years ago
Views:

1 M6 Memory Hierarchy

2 Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

3 Events on a Cache Miss

4 Events on a Cache Miss Stall the pipeline.

5 Events on a Cache Miss Stall the pipeline. Steps on an I-Cache miss:

6 Events on a Cache Miss Stall the pipeline. Steps on an I-Cache miss: 1. Send the PC value to the memory.

7 Events on a Cache Miss Stall the pipeline. Steps on an I-Cache miss: 1. Send the PC value to the memory. 2.Instruct main memory to perform a read and wait for the memory to complete its access.

8 Events on a Cache Miss Stall the pipeline. Steps on an I-Cache miss: 1. Send the PC value to the memory. 2.Instruct main memory to perform a read and wait for the memory to complete its access. 3.Fill the cache entry: write the data from memory into the cache block, fill the tag field from the address, turn the valid bit on.

9 Events on a Cache Miss Stall the pipeline. Steps on an I-Cache miss: 1. Send the PC value to the memory. 2.Instruct main memory to perform a read and wait for the memory to complete its access. 3.Fill the cache entry: write the data from memory into the cache block, fill the tag field from the address, turn the valid bit on. 4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.

10 Inside a Cache PROCESSOR CACHE MAIN MEMORY

11 Inside a Cache PROCESSOR CACHE MAIN MEMORY N 1 DATA

12 Inside a Cache PROCESSOR CACHE MAIN MEMORY N 1 Line (Block) DATA

13 Inside a Cache PROCESSOR CACHE MAIN MEMORY N 1 Cache RAM TAGS Line (Block) DATA

14 Inside a Cache PROCESSOR CACHE MAIN MEMORY CONTROLLER N 1 Cache RAM TAGS Line (Block) DATA

15 Inside a Cache PROCESSOR CACHE MAIN MEMORY CONTROLLER N 1 Cache RAM TAGS Line (Block) DATA

16 Inside a Cache PROCESSOR CACHE MAIN MEMORY CONTROLLER N 1 Lookup Logic Cache RAM TAGS Line (Block) DATA

17 Cache Access Time PROCESSOR CACHE MAIN MEMORY

18 Cache Access Time PROCESSOR CACHE MAIN MEMORY

19 Cache Access Time Hit Time PROCESSOR CACHE MAIN MEMORY

20 Cache Access Time PROCESSOR CACHE MAIN MEMORY

21 Cache Access Time PROCESSOR CACHE MAIN MEMORY

22 Cache Access Time Miss Penalty PROCESSOR CACHE MAIN MEMORY

23 Cache Access Time Hit Time, Miss Penalty PROCESSOR CACHE MAIN MEMORY Average Memory Access Time = Hit Time + Miss rate Miss penalty Stall Time

24 Processor Performance No Cache P Main Memory Access P Main

25 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns P Main Memory Access P Main

26 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles P Main Memory Access P Main

27 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles Ignoring memory access, Clocks Per Instruction (CPI) = P Main Memory Access P Main

28 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles Ignoring memory access, Clocks Per Instruction (CPI) = 1 P Main Memory Access P Main

29 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles Ignoring memory access, Clocks Per Instruction (CPI) = 1 What is the CPI including memory accesses? P Main Memory Access P Main

30 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles Ignoring memory access, Clocks Per Instruction (CPI) = 1 What is the CPI including memory accesses? Assuming no memory data access: P Main Memory Access P Main

31 Processor Performance No Cache 5GHz processor, cycle time = 0.2ns Memory access time = 100ns = 500 cycles Ignoring memory access, Clocks Per Instruction (CPI) = 1 What is the CPI including memory accesses? Assuming no memory data access: CPI no-cache = 1 + #stall cycles = 501 P Main Memory Access P Main

32 Processor + Cache Performance Hit Rate = 0.95

33 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle

34 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle CPIwith-cache = 1 + #stall cycles

35 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle CPIwith-cache = 1 + #stall cycles #stall cycles =?

36 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle CPIwith-cache = 1 + #stall cycles #stall cycles =? stall cycles= Miss Rate Miss Penalty stall cycles= =25

37 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle CPIwith-cache = 1 + #stall cycles #stall cycles =? stall cycles= Miss Rate Miss Penalty stall cycles= =25 CPIwith-cache = 26

38 Processor + Cache Performance Hit Rate = 0.95 L1 Access Time = 0.2 ns = 1 cycle CPIwith-cache = 1 + #stall cycles #stall cycles =? stall cycles= Miss Rate Miss Penalty stall cycles= =25 CPIwith-cache = 26 Increase in performance = 501/26 = 19.3

39 Cache Terms cache hit An access where the data is found in the cache.

40 Cache Terms cache hit An access where the data is found in the cache. cache miss -- an access which isn t

41 Cache Terms cache hit An access where the data is found in the cache. cache miss -- an access which isn t hit time -- time to access the cache miss penalty -- time to move data from lower level to upper, then to cpu

42 Cache Terms hit rate -- percentage of cache hits

43 Cache Terms hit rate -- percentage of cache hits miss rate -- (1 hit rate)

44 Cache Terms cache block size or cache line size -- the amount of data that gets transferred on a cache miss.

45 Cache Terms instruction cache -- cache that only holds instructions. data cache -- cache that only caches data.

46 Cache Performance

47 Cache Performance

48 Cache Performance

49 Cache Performance

50 Cache Performance

51 Cache Performance Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is 4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%. Average memory stall cycles for (a) Instruction Cache (b) Data Cache?

52 Multilevel Caches Primary cache attached to CPU Small, but fast

53 Multilevel Caches Primary cache attached to CPU Small, but fast Level-2 cache services misses from primary cache Larger, slower, but still faster than main memory

54 Multilevel Caches Primary cache attached to CPU Small, but fast Level-2 cache services misses from primary cache Larger, slower, but still faster than main memory Main memory services L-2 cache misses

55 Multilevel Caches Primary cache attached to CPU Small, but fast Level-2 cache services misses from primary cache Larger, slower, but still faster than main memory Main memory services L-2 cache misses Some high-end systems include L-3 cache

56 Multilevel Cache Example Given CPU base CPI = 1, clock rate = 4GHz Miss rate/instruction = 2% Main memory access time = 100ns

57 Multilevel Cache Example Given CPU base CPI = 1, clock rate = 4GHz Miss rate/instruction = 2% Main memory access time = 100ns With just primary cache Miss penalty = Effective CPI =

58 Multilevel Cache Example Given CPU base CPI = 1, clock rate = 4GHz Miss rate/instruction = 2% Main memory access time = 100ns With just primary cache Miss penalty = 100ns/0.25ns = 400 cycles Effective CPI = = 9

59 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5%

60 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty =

61 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles

62 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles Primary miss with L-2 miss

63 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles Primary miss with L-2 miss Extra penalty =

64 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles Primary miss with L-2 miss Extra penalty = 400 cycles CPI =

65 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles Primary miss with L-2 miss Extra penalty = 400 cycles CPI = = 3.4 Performance ratio =

66 Example (cont.) Now add L-2 cache Access time = 5ns Global miss rate to main memory = 0.5% Primary miss with L-2 hit Penalty = 5ns/0.25ns = 20 cycles Primary miss with L-2 miss Extra penalty = 400 cycles CPI = = 3.4 Performance ratio = 9/3.4 = 2.6

67 Multilevel Cache Considerations Primary cache L-2 cache Results

68 Multilevel Cache Considerations Primary cache Focus on minimal hit time L-2 cache Results

69 Multilevel Cache Considerations Primary cache Focus on minimal hit time L-2 cache Focus on low miss rate to avoid main memory access Hit time has less overall impact Results

70 Multilevel Cache Considerations Primary cache Focus on minimal hit time L-2 cache Focus on low miss rate to avoid main memory access Hit time has less overall impact Results L-1 cache usually smaller than a single cache L-1 block size smaller than L-2 block size

71 Memory Hierachy Recap What programmers want: Unlimited amounts of memory with low latency

72 Memory Hierachy Recap What programmers want: Unlimited amounts of memory with low latency Reality: Memory latency is huge Reality: Fast memory technology is more expensive per bit than slower memory

73 Memory Hierachy Recap Solution: organize memory system into a hierarchy

74 Memory Hierachy Recap Solution: organize memory system into a hierarchy Entire addressable memory space available in largest, slowest memory Incrementally smaller and faster memories, each containing a subset of the memory below it

75 Memory Hierarchy Levels Block (aka line): unit of copying

76 Memory Hierarchy Levels If accessed data is present in upper level Hit: access satisfied by upper level Hit ratio: hits/accesses

77 Memory Hierarchy Levels If accessed data is absent Miss: block copied from lower level Time taken: miss penalty Miss ratio: misses/accesses = 1 hit ratio Then accessed data supplied from upper level

78 Memory Hierarchy

79 Memory Hierarchy Disk Storage Disk Storage Main Memory Main Memory L3 C a c h e L3 C a c h e L2 C a c h e L2 C a c h e L1 C a c h e L1 C a c h e PP Register Reference Register Reference L1/L2/L3 Cache Reference L1/L2/L3 Cache Reference Memory Reference Memory Reference Disk Reference Disk Reference

80 Memory Hierarchy PP L1 L1 C a a c c h e e L2 L2 C a c h e L3 L3 C a c h e Main Main Memory Disk Disk Storage Register Register Reference Reference L1/L2/L3 L1/L2/L3 Cache Cache Reference Reference Memory Memory Reference Reference Disk Disk Reference Reference 32B 32B 64B 64B Word Word 16B 16B 64B 64B Cache Cache Block Block 4KB 4KB 16KB 16KB Page Page

81 Memory Hierarchy PP L1 L1 C a a c c h e e L2 L2 C a c h e L3 L3 C a c h e Main Main Memory Disk Disk Storage Register Register Reference Reference L1/L2/L3 L1/L2/L3 Cache Cache Reference Reference Memory Memory Reference Reference Disk Disk Reference Reference 32B 32B 64B 64B Word Word 16B 16B 64B 64B Cache Cache Block Block 4KB 4KB 16KB 16KB Page Page B KB KB 256KB 256KB 1MB 1MB MB MB GB GB TB TB

82 Memory Hierarchy PP L1 L1 C a a c c h e e L2 L2 C a c h e L3 L3 C a c h e Main Main Memory Disk Disk Storage Register Register Reference Reference L1/L2/L3 L1/L2/L3 Cache Cache Reference Reference Memory Memory Reference Reference Disk Disk Reference Reference 32B 32B 64B 64B Word Word 16B 16B 64B 64B Cache Cache Block Block 4KB 4KB 16KB 16KB Page Page B KB KB 256KB 256KB 1MB 1MB MB MB GB GB TB TB 300ps 300ps 1 ns ns ns ns ns ns ns ns ms ms

83 Memory Hierarchy Other Topics Main Memory, DRAM Virtual Memory Non-Volatile Memory Persistent NVM

84 Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar