Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)


Cache memory idea
Use a small, faster memory (a cache memory) to store recently used data instead of always accessing the slow main memory.
[Figure: CPU, cache (fast: 1 ns), and main memory (slow: 7 ns), with data passing between neighbouring units]

Increased average performance
Using no cache: all accesses must go to main memory. Average access time: 7 ns.
Using cache: a cache hit has 1 ns access time, a cache miss has 7 ns access time.
If 95% of all accesses hit, then:
Average access time = $0.95 \cdot 1 + 0.05 \cdot 7 = 1.3$ ns
95%: is that realistic?

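Locality
Instructions and data are not randomly referenced, because of:
- loops and subroutines
- data structures and arrays

Example loop (instruction address: instruction):
    200: lw   $3, 0($2)      # load the word at address $2 + 0
    204: lw   $4, 100($2)    # load the word at address $2 + 100
    208: slti $6, $2, 100    # $6 = 1 if $2 < 100, else 0
    212: add  $5, $3, $4     # $5 = $3 + $4
    216: sw   $5, 100($2)    # store the sum at address $2 + 100
    220: addi $2, $2, 4      # advance to the next word
    224: bne  $6, $0, -28    # repeat the loop while $6 != 0
[Figure: referenced addresses plotted against time]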

Locality
Spatial locality: if one address is accessed, it is probable that a nearby address will soon be accessed.
Temporal locality: if one address is accessed, it is probable that the same address will soon be accessed again.
Locality can be exploited by keeping a small subset of the data and instructions that are likely to be accessed soon in small, fast storage.

Memory System Performance
Facts:
- Large is slow, small is fast.
- Memory performance cannot keep up with processors.
[Figure: processor P connected to memory M_1, which is backed by memory M_2]
Average access time = $h_1 T_1 + h_2 T_2 = h_1 T_1 + (1 - h_1) T_2$
- $h_x$ = hit rate = probability of finding data in memory $x$
- $(1 - h_x) = m_x$ = miss rate for memory $x$
- $T_x$ = hit time = access time for memory $x$
- $(T_y - T_x)$ = miss penalty for memory $x$
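
As a quick check of the formula, the following minimal sketch evaluates the two-level average access time for a few hit rates. The function name `average_access_time` and its parameter names are illustrative choices, not taken from the textbook; the 1 ns and 7 ns figures are the ones used in the cache example earlier in the slides.

```python
def average_access_time(h1, t1, t2):
    """Two-level average access time: h1*T1 + (1 - h1)*T2.

    h1: hit rate of the faster memory (0..1)
    t1: access time of the faster memory
    t2: access time when the access has to go to the slower memory
    """
    if not 0.0 <= h1 <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return h1 * t1 + (1.0 - h1) * t2


# Cache: 1 ns, main memory: 7 ns (numbers from the cache example).
for h1 in (0.80, 0.95, 0.99):
    print(f"h1 = {h1:.2f}: {average_access_time(h1, 1.0, 7.0):.2f} ns")
```

With 95% hits this reproduces the 1.3 ns from the cache example. The result is quite sensitive to the hit rate, because every miss costs the full access time of the slower memory.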

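Memory Hierarchy
[Figure: CPU, cache, main memory, and secondary memory in a chain, with address and data paths between neighbouring levels]

Level             Technology     Size         Access time   Hit rate
CPU               Registers      128-512 B    cycle time
Cache             SRAM           8-2048 KB    a few ns      h = 80-99.9%
Main memory       DRAM           32-512 MB    tens of ns    h > 99.9999%
Secondary memory  Magnetic disk  >0.5 GB      milliseconds  h = 100%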

Levels
Division into levels:
- Highest level: closest to the CPU, smallest, fastest
- Middle levels: one or more levels (sometimes several cache levels)
- Lowest level: farthest from the CPU, largest, slowest

The Memory Hierarchy is Normally a True Hierarchy
Each level contains a subset of the contents of the lower levels:
- Cache contains a subset of main memory
- Main memory contains a subset of secondary memory
- Secondary memory contains all addressable data and instructions

Blocks
Data are transferred in blocks of different sizes at different levels:
- Between CPU and cache: 4-8 B
- Between cache and main memory: 16-256 B
- Between main memory and secondary memory: 4-64 KB
Block sizes are adapted to exploit spatial locality and to optimize block transfer times.

Using Hierarchical Memory
The CPU needs a word or byte. Secondary memory always contains all addressable data and instructions.
1. The word address is sent to the highest level (cache).
2. No block containing the word is found.
3. The address of the required block is sent to the next lower level (main memory).
4. No larger block containing the needed block is found.
5. The address of the required larger block is sent to the next lower level (secondary memory).
6. The required block is found!
7. The required block is copied to the nearest higher level. An old block is thrown out to make room.
8. The required block is copied one level higher again.
9. The needed word is copied to the CPU. The cache now contains a block with a copy of the word, main memory contains a block with a copy of the cache block, and secondary memory contains a block with a copy of the main memory block.

The CPU needs the same word again:
1. The required word is found directly in the cache.
2. The required word is copied to the CPU.
Thanks to locality, data and instructions are usually found close to the CPU!
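
The walk through the hierarchy can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: each level is a Python set of block names, every level has unlimited capacity (so no old block is ever thrown out), and all levels use the same block size. The function name `fetch` and the block names are hypothetical.

```python
def fetch(levels, block):
    """Look for `block` from the highest level downwards.

    levels[0] is closest to the CPU; levels[-1] holds everything.
    Returns the index of the level where the block was found, after
    copying the block into every level above it.
    """
    for depth, level in enumerate(levels):
        if block in level:
            break
    else:
        raise KeyError(f"{block!r} is not in the lowest level")
    # Copy the block into all higher levels on the way back to the CPU.
    for upper in levels[:depth]:
        upper.add(block)
    return depth


cache, main_memory, secondary = set(), set(), {"A", "B", "C"}
hierarchy = [cache, main_memory, secondary]

first = fetch(hierarchy, "A")   # found in secondary memory (level 2)
second = fetch(hierarchy, "A")  # now found directly in the cache (level 0)
print(first, second)
```

The second access is served by the cache because the first access left copies of the block in both main memory and the cache.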

Four Important Questions
For each level in the memory hierarchy, the following questions need to be answered:
1. Where to place a block?
2. How to find a block?
3. Which block to replace?
4. What happens on a write?

Cache Memory
- The memory level(s) between main memory and the CPU
- Has a fixed number of block places
- Each block place can contain blocks from a subset of all blocks in main memory
- The address of a block determines which block place it can be stored in

Cache Addressing
The address from the CPU is typically 32-64 bits and points out one byte. The word that contains this byte is fetched from memory.
With a 32-bit address and 4 bytes/word:
    bits 31-2: word address (30 bits)
    bits 1-0:  byte in word

Cache Addressing
Find the right cache block for the addressed word. Block size is typically 1-64 words (4-256 B).
With 16 bytes/block:
    bits 31-4: block number/address
    bits 3-0:  byte offset
Total cache capacity is typically 8-2048 KB, with 256-16384 block places.

Cache Addressing
... and then find the corresponding block place. The least significant part of the block address is used as an index to a block place.
With 1K block places and 16 bytes/block:
    bits 13-4: index
    bits 3-0:  byte offset

Cache Addressing
The rest of the block address (the tag) is stored together with the block, because indexes are shared.
    bits 31-14: tag
    bits 13-4:  index (1K block places)
    bits 3-0:   byte offset (16 bytes/block)
256K blocks are mapped to the same place: many blocks compete for the same block place.
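
The address split for this example cache (16 bytes/block, 1K block places, 32-bit address) can be sketched with shifts and masks. The names `OFFSET_BITS`, `INDEX_BITS`, and `split_address` are illustrative, and the sample address is arbitrary.

```python
OFFSET_BITS = 4    # 16 bytes/block  -> 4 offset bits
INDEX_BITS = 10    # 1K block places -> 10 index bits


def split_address(addr):
    """Split a byte address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)

# An address 2**14 bytes further on has the same index and offset
# but a different tag, so it competes for the same block place.
other_tag, other_index, other_offset = split_address(0x12345678 + (1 << 14))
print(hex(other_tag), other_index, other_offset)
```

With a 32-bit address the tag is 18 bits wide, which is why 2^18 = 256K different blocks share each index.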

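Cache Memory Implementation
[Figure: direct mapped cache. The address is split into tag (bits 31-14), index (bits 13-4), and offset (bits 3-0, word and byte). The index selects one of the entries 0-1023; each entry holds a valid bit, a tag, and data (block = 4 words). The stored tag is compared (=) with the address tag and combined (&) with the valid bit to produce Hit; the word offset selects the data (1 word) that is delivered.]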

What Happens on a Cache Miss?
- Any valid block already at the searched block's index is thrown out (sometimes it must first be copied back to main memory).
- The searched block is read from main memory and written into the cache memory at that index, and the needed word is delivered to the CPU.
- This procedure takes much longer than if the block had been found in the cache immediately (cache hit).
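
The hit and miss behaviour can be sketched as a small simulation. This is only a bookkeeping model of a direct mapped cache (valid bits and tags, no data and no timing); the class name `DirectMappedCache` and its default sizes, taken from the 1K-place, 16-byte-block example, are illustrative assumptions.

```python
class DirectMappedCache:
    """Tracks which blocks are resident; stores no actual data."""

    def __init__(self, num_places=1024, block_bytes=16):
        self.num_places = num_places
        self.block_bytes = block_bytes
        self.valid = [False] * num_places
        self.tags = [0] * num_places
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Access one byte address; return True on a hit."""
        block = addr // self.block_bytes   # block number
        index = block % self.num_places    # least significant part
        tag = block // self.num_places     # rest of the block number
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the block at this index is thrown out and replaced.
        self.misses += 1
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
for addr in range(0, 64, 4):   # 16 consecutive words = 4 blocks
    cache.access(addr)
print(cache.hits, cache.misses)
```

Reading 16 consecutive words touches only four blocks, so only the first word of each block misses. This is spatial locality at work.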

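A Problem with Caches
[Figure: main memory blocks numbered 00001, 00101, 01001, 01101, 10001, and 10101, and a cache with block places 000-111. With the three least significant bits of the block number as index, blocks 00001, 01001, and 10001 all compete for block place 001, and blocks 00101, 01101, and 10101 for block place 101.]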
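
The conflict problem can be illustrated with the block numbers from the figure, assuming a direct mapped cache with 8 block places indexed by the three least significant bits of the block number. The function name `count_misses` and the two access traces are made up for illustration.

```python
def count_misses(trace, num_places):
    """Count misses for a direct mapped cache indexed by block number."""
    resident = [None] * num_places     # block number held in each place
    misses = 0
    for block in trace:
        place = block % num_places     # low bits of the block number
        if resident[place] != block:
            misses += 1
            resident[place] = block    # throw out whatever was there
    return misses


NUM_PLACES = 8
conflicting = [0b00001, 0b01001] * 3   # both map to block place 001
friendly = [0b00001, 0b00101] * 3      # block places 001 and 101

print(count_misses(conflicting, NUM_PLACES))
print(count_misses(friendly, NUM_PLACES))
```

Alternating between two blocks that share a block place misses on every access, even though the other seven places stay empty. Two blocks with different indexes miss only once each.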

Associativity
Many blocks (256K in the example) share an index and compete for the same block place. This is a problem if several such blocks are needed simultaneously!
[Address fields: tag (bits 31-14), index (bits 13-4), word and byte offset (bits 3-0)]
Solution: use multiple parallel cache memories!
- Each index points out a set of block places
- Each block can be stored anywhere within its set
- Several blocks with the same index can be used simultaneously
- Number of block places per set = associativity

Associativity
- Direct mapped cache: 1 block/set (as described earlier)
- k-way set associative cache: k blocks/set (equivalent to k parallel caches), typically k = 2 or 4
- Fully associative cache: all block places are part of a single set; expensive and unusual except for certain small and specialized caches

Two-Way Set Associative Cache
Two parallel direct mapped caches, each with entries 0-1023 holding a valid bit, a tag, and data (block = 4 words).
[Figure: the address is split into tag (bits 31-14), index/set (bits 13-4), and offset (bits 3-0, word and byte). The index selects the same entry in both caches. In each cache the stored tag is compared (=) with the address tag and combined (&) with the valid bit. The two results are combined into a single Hit signal, and the matching cache delivers the data (1 word).]

Which Block to Replace?
Replacement algorithms:
- LRU (Least Recently Used): throw out the block that has not been accessed for the longest time (requires extra Referenced bits). Because of temporal locality, this is the best approximation of saving the blocks that will soon be accessed again.
- Random: randomly choose a block.
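
A k-way set associative cache with LRU replacement can be sketched as follows. This is a software model only: it keeps each set in an `OrderedDict` ordered from least to most recently used, which is an assumption of this sketch and not how the Referenced bits work in hardware. The class name `SetAssociativeCache` and the traces are illustrative.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """k-way set associative cache with LRU replacement (tags only)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set: oldest entry first, newest last.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Access a block number; return True on a hit."""
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # throw out the LRU block
        lines[tag] = True
        return False


trace = [0b00001, 0b01001] * 3

direct = SetAssociativeCache(num_sets=8, ways=1)    # 8 block places
two_way = SetAssociativeCache(num_sets=4, ways=2)   # also 8 block places
for block in trace:
    direct.access(block)
    two_way.access(block)
print(direct.misses, two_way.misses)
```

Both caches have eight block places, but the two-way cache lets the two conflicting blocks live in the same set, so only the first access to each block misses. Setting `ways=1` gives a direct mapped cache, and a single set with all places gives a fully associative one.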

What Happens on a Write?
Two main strategies:
- Write through: everything written to the cache is also written directly to the main memory (via a store buffer).
- Write back (copy back): new data is written only in the cache memory. When a block is thrown out, it is written to the main memory if it was modified in the cache memory (requires an extra Dirty bit per block).
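
The difference in main memory traffic between the two strategies can be sketched by counting writes for a short trace. The model makes several assumptions that are not in the slides: a direct mapped cache indexed by block number, a block is brought into the cache on a write miss, and dirty blocks still in the cache at the end are not flushed. Note also that write through counts individual writes, while write back counts whole blocks copied back. The function name `memory_writes` and the trace are illustrative.

```python
def memory_writes(trace, num_places, policy):
    """Count writes that reach main memory.

    trace:  list of (op, block) pairs, op is "r" or "w"
    policy: "write-through" or "write-back"
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    resident = [None] * num_places   # block number in each place
    dirty = [False] * num_places     # Dirty bit per block place
    writes = 0
    for op, block in trace:
        index = block % num_places
        if resident[index] != block:             # miss: replace block
            if policy == "write-back" and dirty[index]:
                writes += 1                      # copy modified block back
            resident[index] = block
            dirty[index] = False
        if op == "w":
            if policy == "write-through":
                writes += 1                      # every write goes to memory
            else:
                dirty[index] = True              # only mark the block
    return writes


# Five writes to block 3, then a read of block 11 (same place, 8 places).
trace = [("w", 3)] * 5 + [("r", 11)]
print(memory_writes(trace, 8, "write-through"))
print(memory_writes(trace, 8, "write-back"))
```

Repeated writes to the same block are where write back pays off: the block reaches main memory once, when it is thrown out, instead of once per write.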

Cache Memory Performance
Hit rate (miss rate) depends primarily on:
- Cache memory size
- Block size
- Associativity
Average access time is also affected by hit time and miss penalty.
NB! Methods that improve the hit rate can easily affect access times so that the overall performance effect may be negative.

Virtual Memory
Extends the memory hierarchy to include main memory and mass storage:
- Main memory acts as a cache for data in secondary memory
- Large block sizes (kilobytes), often called pages
- Physical addresses for main memory
- Virtual addresses for secondary memory
- Virtual-physical translation is performed in the processor
- Allows different processes to use independent address spaces, which makes protection possible
More about this in Appendix C (C.4 and C.5) and later in the course.
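
Virtual-to-physical translation can be sketched as a table lookup on the page number. Everything concrete here is an assumption for illustration: the 4 KB page size, the page table as a plain dictionary from virtual page number to physical page number, and the names `PAGE_SIZE`, `PageFault`, and `translate`. The textbook sections C.4 and C.5 describe the real mechanisms.

```python
PAGE_SIZE = 4096   # assumed page size (4 KB)


class PageFault(Exception):
    """Raised when the page is not present in main memory."""


def translate(vaddr, page_table):
    """Translate a virtual byte address to a physical byte address."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"virtual page {page} is not in main memory")
    # The offset within the page is unchanged by translation.
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical page table for one process: virtual page -> physical page.
page_table = {0: 5, 1: 2}
print(hex(translate(0x1234, page_table)))
```

Giving each process its own table is what lets different processes use independent address spaces: the same virtual address can map to different physical pages.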