Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)


Claud Barker
6 years ago

1 Basic Memory Hierarchy Principles Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)

2 Cache memory idea
- Use a small, faster memory (a cache memory) to store recently used data instead of always accessing the slow main memory.
- [Figure: CPU, cache and main memory exchanging data. Cache: fast, 1 ns. Main memory: slow, 7 ns.]
Minne 2

3 Increased average performance
- Using no cache: all accesses must go to main memory. Average access time: 7 ns.
- Using a cache: a cache hit has 1 ns access time, a cache miss has 7 ns access time.
- If 95% of all accesses hit, then: average access time = 0.95 × 1 + 0.05 × 7 = 1.3 ns
- 95%: is that realistic?
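The 95% example on this slide can be checked with a few lines of Python. This is only a minimal sketch; the function name `average_access_time` is our own and does not come from the slides.

```python
def average_access_time(hit_rate, hit_time_ns, miss_time_ns):
    """Weighted average of the hit and miss access times (in ns)."""
    return hit_rate * hit_time_ns + (1.0 - hit_rate) * miss_time_ns

# Slide values: 1 ns on a cache hit, 7 ns on a miss, 95% hit rate.
with_cache = average_access_time(0.95, 1.0, 7.0)
no_cache = average_access_time(0.0, 1.0, 7.0)  # every access goes to main memory
print(f"with cache: {with_cache:.2f} ns, without cache: {no_cache:.2f} ns")
```

Lowering the hit rate in the call shows how quickly the average drifts back toward the 7 ns of main memory.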

4 Locality
- Instructions and data are not randomly referenced!
- Because of: loops, subroutines, data structures, arrays
- Example (address: instruction):
    200: lw   $3, 0($2)
    204: lw   $4, 100($2)
    208: slti $6, $2, …
    212: add  $5, $3, $4
    216: sw   $5, 100($2)
    220: addi $2, $2, 4
    224: bne  $6, $0, -28
- [Figure: plot of referenced address versus time]

5 Locality
- Spatial locality: if one address is accessed, it is probable that a nearby address will soon be accessed.
- Temporal locality: if one address is accessed, it is probable that the same address will soon be accessed again.
- Locality can be exploited to keep a small subset of data and instructions that are likely to be accessed soon in small and fast storage.
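To make the two kinds of locality concrete, the sketch below counts them in a short address trace. Everything here is an assumption made for illustration: the function `locality_counts`, its `window` and `near_bytes` parameters, and the trace itself (a three-instruction loop body interleaved with sequential array loads) are not from the slides.

```python
def locality_counts(trace, window=8, near_bytes=16):
    """Count accesses that repeat a recent address (temporal) or land
    within near_bytes of a recent, different address (spatial)."""
    temporal = spatial = 0
    for i, addr in enumerate(trace):
        recent = trace[max(0, i - window):i]  # the last `window` accesses
        if addr in recent:
            temporal += 1
        elif any(abs(addr - r) <= near_bytes for r in recent):
            spatial += 1
    return temporal, spatial

# Hypothetical trace: instructions at 200..208 fetched three times,
# each iteration followed by one load from an array starting at 1000.
trace = []
for i in range(3):
    trace += [200, 204, 208, 1000 + 4 * i]

temporal, spatial = locality_counts(trace)
print(f"temporal reuse: {temporal}, spatial neighbours: {spatial}")
```

Only two of the twelve accesses in this trace show neither kind of locality, which is the pattern a cache relies on.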

6 Memory System Performance
- Facts:
  - Large is slow, small is fast.
  - Memory performance cannot keep up with processors.
- Average access time = h_1 T_1 + h_2 T_2 = h_1 T_1 + (1 - h_1) T_2
  - h_x = hit rate = probability of finding data in memory x
  - (1 - h_x) = m_x = miss rate for memory x
  - T_x = hit time = access time for memory x
  - (T_y - T_x) = miss penalty for memory x
- [Figure: processor P and memories M_1 and M_2]
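Rewriting the average access time in terms of miss rate and miss penalty makes the cost of a miss explicit. The short derivation below uses only the symbols defined on this slide; the name T_avg is our own shorthand for "average access time", and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
T_{\mathrm{avg}} &= h_1 T_1 + (1 - h_1)\,T_2 \\
                 &= (1 - m_1)\,T_1 + m_1 T_2 && \text{since } m_1 = 1 - h_1 \\
                 &= T_1 + m_1\,(T_2 - T_1)
\end{align*}
```

In words: hit time plus miss rate times miss penalty. With T_1 = 1 ns, T_2 = 7 ns and m_1 = 0.05 this gives 1 + 0.05 × 6 = 1.3 ns, matching the earlier example.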

7 Memory Hierarchy
[Figure: CPU, cache, main memory and secondary memory connected by address and data paths]

Level              Technology      Size         Access time    Hit rate
CPU                Registers       128-512 B    cycle time
Cache              SRAM            … KB         a few ns       h = … %
Main memory        DRAM            … MB         tens of ns     h > … %
Secondary memory   Magnetic disk   > 0.5 GB     milliseconds   h = 100%

8 Levels
- [Figure: CPU, cache, main memory and secondary memory connected by address and data paths]
- Division into levels:
  - Highest level: closest to the CPU, smallest, fastest.
  - Middle levels: one or more levels (sometimes more cache levels).
  - Lowest level: farthest from the CPU, largest, slowest.

9 The Memory Hierarchy is Normally a True Hierarchy
- [Figure: CPU, cache, main memory and secondary memory connected by address and data paths]
- Each level contains a subset of the contents of lower levels:
  - The cache contains a subset of main memory.
  - Main memory contains a subset of secondary memory.
  - Secondary memory contains all addressable data and instructions.

10 Blocks
- [Figure: block sizes along the hierarchy, starting at the CPU: 4-8 B, … B, 4-64 KB]
- Data are transferred in blocks of different sizes at different levels.
- Block sizes are adapted to exploit spatial locality and to optimize block transfer times.

11 Using Hierarchical Memory
- The CPU needs a word or byte.
- Secondary memory always contains all addressable data and instructions.

12 Using Hierarchical Memory
- The word address is sent to the highest level (cache).

13 Using Hierarchical Memory
- No block containing the word is found.

14 Using Hierarchical Memory
- The required block address is sent to the next lower level (main memory).

15 Using Hierarchical Memory
- No larger block containing the needed block is found.

16 Using Hierarchical Memory
- The address of the required larger block is sent to the next lower level (secondary memory).

17 Using Hierarchical Memory
- The required block is found!

18-19 Using Hierarchical Memory
- The required block is copied to the nearest higher level. An old block is thrown out to make room.

20-21 Using Hierarchical Memory
- The required block is copied to the nearest higher level.

22 Using Hierarchical Memory
- The needed word is copied to the CPU.
- The cache contains a block with a copy of the word.
- Main memory contains a block with a copy of the cache block.
- Secondary memory contains a block with a copy of the main memory block.

23 Using Hierarchical Memory
- The CPU needs the same word again.

24 Using Hierarchical Memory
- The required word is found directly in the cache.
- The required word is copied to the CPU.
- Thanks to locality, data and instructions are usually found close to the CPU!
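The walk through the hierarchy on slides 11-24 can be sketched as a small simulation. This is a hedged illustration only: the `Level` class, the capacities and the block sizes are invented for the example, and the block thrown out is simply the oldest one (FIFO), which the slides do not prescribe.

```python
class Level:
    """One level of a memory hierarchy, tracking which blocks it holds."""

    def __init__(self, name, capacity, block_size, lower=None):
        self.name = name
        self.capacity = capacity      # max resident blocks (None = holds everything)
        self.block_size = block_size  # bytes per block at this level
        self.lower = lower            # next lower (larger, slower) level
        self.blocks = []              # resident block numbers, oldest first

    def access(self, address, path):
        """Append 'hit'/'miss' per level to `path`, filling blocks on a miss."""
        block = address // self.block_size
        if self.capacity is None or block in self.blocks:
            path.append(f"{self.name}: hit")
            return
        path.append(f"{self.name}: miss")
        self.lower.access(address, path)  # send the address to the next lower level
        if len(self.blocks) == self.capacity:
            self.blocks.pop(0)            # throw out an old block to make room
        self.blocks.append(block)         # copy the block to this level

# Assumed sizes: 16-byte cache blocks, 4 KB blocks in main and secondary memory.
secondary = Level("secondary", None, 4096)
main = Level("main", 4, 4096, lower=secondary)
cache = Level("cache", 2, 16, lower=main)

first, second = [], []
cache.access(0x1234, first)   # CPU needs a word
cache.access(0x1234, second)  # CPU needs the same word again
print(first)
print(second)
```

The first access travels all the way down to secondary memory; the second is satisfied by the cache alone.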

25 Four Important Questions
For each level in the memory hierarchy, the following questions need to be answered:
- Where to place a block?
- How to find a block?
- Which block to replace?
- What happens on a write?

26 Cache Memory
- The memory level(s) between main memory and the CPU.
- Has a fixed number of block places.
- Each block place can contain blocks from a subset of all blocks in main memory.
- The address of a block determines which block place it can be stored in.

27 Cache Addressing
- The address from the CPU is typically … bits and points out one byte.
  - CPU address: bits 31-0
- The word that contains this byte is fetched from memory.
  - Word address (30 bits): bits 31-2
  - Byte in word: bits 1-0 (4 bytes/word)

28 Cache Addressing
- Find the right cache block for the addressed word.
- Block size is typically 1-64 words (4-256 B).
  - Block number/address: bits 31-4
  - Byte offset: bits 3-0 (16 bytes/block)
- Typically … KB of total cache capacity, … block places.

29 Cache Addressing
- ... and then find the corresponding block place.
- The least significant part of the block address is used as the index to a block place.
  - Index: bits 13-4 (1K block places)
  - Byte offset: bits 3-0 (16 bytes/block)

30 Cache Addressing
- The rest of the block address (the tag) is stored together with the block, because indexes are shared.
  - Tag: bits 31-14 (256K blocks mapped to the same place)
  - Index: bits 13-4 (1K block places)
  - Byte offset: bits 3-0 (16 bytes/block)
- Many blocks compete for the same block place.
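The address split described on slides 27-30 can be written out as bit operations. This is a minimal sketch using the slide's example geometry (16 bytes/block, 1K block places, 32-bit addresses); the function name `split_address` and the sample addresses are our own.

```python
OFFSET_BITS = 4   # 16 bytes/block  -> bits 3-0
INDEX_BITS = 10   # 1K block places -> bits 13-4

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345)
print(f"tag={tag} index={index:#x} offset={offset}")

# An address exactly 16 KB higher has the same index but a different tag,
# so the two blocks compete for the same block place.
other_tag, other_index, _ = split_address(0x12345 + (1 << 14))
print(f"tag={other_tag} index={other_index:#x}")
```

The remaining 32 - 14 = 18 tag bits give 2^18 = 256K different blocks per block place, which is the figure quoted on the slide.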

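31 Cache Memory Implementation
- [Figure: the address is split into tag (bits 31-14), index (bits 13-4) and offset (bits 3-0, word and byte). The index selects one entry with a valid bit, a tag and data (block = 4 words). The stored tag is compared (=) with the address tag and combined (&) with the valid bit to give Hit; the offset selects the data (1 word).]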

32 What Happens on a Cache Miss?
- Any valid block already at the searched block's index is thrown out (sometimes it must be copied back to main memory).
- The searched block is read from main memory and written into the cache memory at the searched address index, and the needed word is delivered to the CPU.
- This procedure takes much longer than if the block had been found in the cache immediately (cache hit).
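The hit test and miss handling of a direct-mapped cache can be sketched as follows. This is an illustration under assumptions: the class `DirectMappedCache` is our own, it stores only valid bits and tags (no data, no copy-back), and it reuses the example geometry of 1K block places with 16-byte blocks.

```python
class DirectMappedCache:
    """Direct-mapped cache model: one valid bit and one tag per block place."""

    OFFSET_BITS = 4   # 16 bytes/block
    INDEX_BITS = 10   # 1K block places

    def __init__(self):
        places = 1 << self.INDEX_BITS
        self.valid = [False] * places
        self.tags = [0] * places
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit; on a miss, replace the block at that index."""
        index = (addr >> self.OFFSET_BITS) & ((1 << self.INDEX_BITS) - 1)
        tag = addr >> (self.OFFSET_BITS + self.INDEX_BITS)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: any block already at this index is thrown out,
        # and the searched block takes its place.
        self.valid[index] = True
        self.tags[index] = tag
        self.misses += 1
        return False

cache = DirectMappedCache()
# 0x1000 and 0x5000 share an index but have different tags.
results = [cache.access(a) for a in (0x1000, 0x1004, 0x5000, 0x1000)]
print(results, cache.hits, cache.misses)
```

The last access misses even though 0x1000 was loaded earlier, because 0x5000 evicted it from the shared block place.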

33 A Problem with Caches
- [Figure: memory blocks, listed by block number, mapped to block places in the cache]

34 Associativity
- Many blocks (256K in the example) share an index and compete for the same block place.
  - Tag: bits 31-14. Index: bits 13-4. Word and byte offset: bits 3-0.
- Problem if several such blocks are needed simultaneously!
- Solution: use multiple parallel cache memories!
  - Each index points out a set of block places.
  - Each block can be stored anywhere within its set.
  - Several blocks with the same index can be used simultaneously.
  - Number of block places per set = associativity.

35 Associativity
- Direct mapped cache: 1 block/set (as described earlier).
- k-way set associative cache: k blocks/set (equivalent to k parallel caches), typically k = 2 or 4.
- Fully associative cache: all block places are part of a single set. Expensive and unusual except for certain small and specialized caches.

36 Two-Way Set Associative Cache
- Two parallel direct mapped caches.
- [Figure: two arrays, each with columns Index, Valid, Tag and Data (block = 4 words)]

37 Two-way set associative cache
- [Figure: the address is split into tag (bits 31-14), index/set (bits 13-4) and offset (bits 3-0, word and byte). The index selects one entry in each of the two arrays; each array has its own tag comparison (=) and valid check (&).]

38 Two-way set associative cache
- [Figure: as on slide 37, with the two per-array results combined into a single Hit signal and the selected data (1 word) delivered.]

39 Which Block to Replace?
Replacement algorithms:
- LRU (Least Recently Used): throw out the block that has not been accessed for the longest time (requires extra Referenced bits). Because of temporal locality, this best approximates keeping the blocks that will soon be accessed again.
- Random: randomly choose a block.
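Set associativity and LRU replacement can be combined in one small model. This is a sketch under assumptions: the class `SetAssociativeCache` is our own, it tracks tags only, and it records recency with an `OrderedDict` per set rather than the Referenced bits a hardware cache would use.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """k-way set-associative cache model with LRU replacement (tags only)."""

    def __init__(self, num_sets, ways, block_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One OrderedDict per set; keys are tags, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # now the most recently used block
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)   # throw out the least recently used block
        lines[tag] = None
        return False

# Same total capacity (1024 block places), different associativity.
addrs = (0x1000, 0x5000, 0x1000, 0x5000)
direct = SetAssociativeCache(num_sets=1024, ways=1)
two_way = SetAssociativeCache(num_sets=512, ways=2)
direct_hits = [direct.access(a) for a in addrs]
two_way_hits = [two_way.access(a) for a in addrs]
print(direct_hits)
print(two_way_hits)
```

With one block per set the two addresses keep evicting each other; with two blocks per set both stay resident after their first miss.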

40 What Happens on a Write?
Two main strategies:
- Write through: everything written to the cache is also written directly to main memory (via a store buffer).
- Write back (copy back): new data is written only to the cache memory. When a block is thrown out, it is written to main memory if it was modified in the cache memory (requires an extra Dirty bit per block).
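The difference between the two write strategies shows up in how often main memory is written. The sketch below is deliberately tiny and rests on assumptions: `OneBlockCache` is a hypothetical cache with a single block holding a single value, main memory is a plain dict, and no store buffer is modelled.

```python
class OneBlockCache:
    """A single cache block in front of a dict acting as main memory."""

    def __init__(self, memory, write_back):
        self.memory = memory          # address -> value
        self.write_back = write_back  # True: write back, False: write through
        self.addr = None              # address currently held in the cache
        self.value = None
        self.dirty = False            # Dirty bit, used only by write back
        self.memory_writes = 0

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def write(self, addr, value):
        if self.addr != addr:
            self.evict()              # make room for the new block
            self.addr = addr
        self.value = value
        if self.write_back:
            self.dirty = True         # defer the main memory update
        else:
            self._write_memory(addr, value)  # update main memory right away

    def evict(self):
        """Throw out the block, copying it back only if it was modified."""
        if self.write_back and self.dirty:
            self._write_memory(self.addr, self.value)
        self.addr, self.value, self.dirty = None, None, False

def count_memory_writes(write_back):
    cache = OneBlockCache({}, write_back)
    for v in range(10):
        cache.write(0x40, v)          # ten writes to the same address
    cache.evict()
    return cache.memory_writes, cache.memory[0x40]

print("write through:", count_memory_writes(False))
print("write back:   ", count_memory_writes(True))
```

Both strategies leave the same final value in main memory; write back simply reaches it with far fewer memory writes when the same block is written repeatedly.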

41 Cache Memory Performance
- Hit rate (miss rate) depends primarily on:
  - Cache memory size
  - Block size
  - Associativity
- Average access time is also affected by hit time and miss penalty.
- NB! Methods that improve hit rate can easily affect access times so that the overall performance effect may be negative.
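The warning on this slide can be illustrated with the average access time formula from slide 6. The numbers for the "tuned" cache are hypothetical, chosen only to show that a better hit rate does not guarantee a better average.

```python
def average_access_time(hit_rate, hit_time_ns, miss_time_ns):
    """h_1 * T_1 + (1 - h_1) * T_2, all times in ns."""
    return hit_rate * hit_time_ns + (1.0 - hit_rate) * miss_time_ns

# Baseline from the earlier example: 95% hits at 1.0 ns, misses at 7 ns.
baseline = average_access_time(0.95, 1.0, 7.0)

# Hypothetical change: the hit rate rises to 97%,
# but every hit now takes 1.2 ns instead of 1.0 ns.
tuned = average_access_time(0.97, 1.2, 7.0)

print(f"baseline: {baseline:.3f} ns, tuned: {tuned:.3f} ns")
```

Here the tuned cache is slower on average, because the extra 0.2 ns is paid on almost every access while the saved misses are rare.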

42 Virtual Memory
- Extends the memory hierarchy to include main memory and mass storage.
- Main memory acts as a cache for data in secondary memory.
- Large block sizes (kilobytes), often called pages.
- Physical addresses for main memory.
- Virtual addresses for secondary memory.
- Virtual-physical translation is performed in the processor.
- Allows different processes to use independent address spaces => possibility for protection.
- More about this in Appendix C (C.4 and C.5) and later in the course.
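Virtual-to-physical translation can be sketched with a page table per process. This is an illustration under assumptions: the 4 KB page size, the dict-based page tables, and the names `translate` and `PageFault` are all our own; the slide only states that pages are kilobytes in size.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class PageFault(Exception):
    """Raised when a virtual page is not resident in main memory."""

def translate(page_table, virtual_addr):
    """Map a virtual address to a physical one using a process's page table."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"virtual page {page} is not in main memory")
    return page_table[page] * PAGE_SIZE + offset

# Hypothetical page tables (virtual page -> physical page) for two processes.
process_a = {0: 5, 1: 9}
process_b = {0: 7}

print(hex(translate(process_a, 0x0234)))  # same virtual address ...
print(hex(translate(process_b, 0x0234)))  # ... different physical address
try:
    translate(process_b, 0x1234)
except PageFault as fault:
    print("page fault:", fault)
```

Because each process has its own table, the same virtual address lands in different physical pages, which is what makes independent address spaces and protection possible.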
