Cache Memory. Content

Margaret Willis
6 years ago

Cache Memory
Raul Queiroz Feitosa

Content
Memory Hierarchy
Principle of Locality
Some Definitions
Cache Architectures
  Fully Associative
  Direct Mapping
  Set Associative
Replacement Policy
Main Memory Update Policy

Memory Hierarchy
Trade-off: cost versus speed.
Memory is split into hierarchical levels: registers, cache, main memory, secondary memory.
[Figure: pyramid of memory levels, with axes for access probability, capacity, access time, speed and cost/bit.]
A request is sent to the next level below until it is carried out.

Cache operation overview
The CPU requests the contents of a memory location.
The cache is checked for this data.
If present, the data is delivered from the cache (fast).
If not present, the required block is read from main memory into the cache and then delivered from the cache to the CPU.
The cache includes tags to identify which block of main memory is in each cache slot.
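The read steps in the overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block size, the 64-byte `main_memory`, and the use of a dictionary as an unbounded cache (so no replacement is ever needed) are hypothetical choices, not part of the slides.

```python
BLOCK_SIZE = 8  # bytes per block (assumed value for illustration)

def read_byte(address, cache, main_memory):
    """Return (value, hit) for one CPU read, following the overview steps."""
    block_number = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    hit = block_number in cache  # check the cache for this data
    if not hit:
        # Miss: read the required block from main memory into the cache.
        start = block_number * BLOCK_SIZE
        cache[block_number] = main_memory[start:start + BLOCK_SIZE]
    # In both cases the byte is delivered from the cache to the CPU.
    return cache[block_number][offset], hit

main_memory = bytes(range(64))  # hypothetical 64-byte main memory
cache = {}                      # tag (block number) -> copy of the block
first = read_byte(19, cache, main_memory)   # block 2 is not cached yet
second = read_byte(20, cache, main_memory)  # same block, now cached
```

The second read is served by the cache because address 20 lies in the block that the first read brought in.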

Cache Read Operation
[Figure: cache read operation.]

Cache and Main Memory (from now on)
[Figure: cache and main memory.]

Cache Addressing
Where does the cache sit?
Between the processor and the virtual memory management unit (MMU), or between the MMU and main memory.
A logical cache (virtual cache) stores data using virtual addresses.
The processor accesses the cache directly, not through the physical cache.
Cache access is faster, since it happens before MMU address translation.
Virtual addresses use the same address space for different applications, so the cache must be flushed on each context switch.
A physical cache stores data using main memory physical addresses.

Principle of locality
Spatial: the processor tends to access a few restricted areas of the address space.
Temporal: the processor tends to access in the near future addresses accessed in the recent past.

Definitions
Hit: access served by the cache.
Miss: access not served by the cache.
Hit ratio: proportion of accesses served by the cache,
$h = \frac{\text{number of accesses served by the cache}}{\text{total number of accesses}}$
Miss ratio: proportion of accesses not served by the cache,
$m = \frac{\text{number of accesses not served by the cache}}{\text{total number of accesses}}$
Clearly $m + h = 1$.

Example: let $h$ be the hit ratio, $t_{hit}$ the access time on a hit, and $t_{miss}$ the access time on a miss. The average memory access time $t$ will be:
$t = h \, t_{hit} + (1 - h) \, t_{miss}$
[Figure: $t$ as a function of $h$, falling linearly from $t_{miss}$ at $h = 0$ to $t_{hit}$ at $h = 1$.]
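As a quick numeric check of the definitions above, the following sketch evaluates the hit ratio and the average access time. The access counts and the timings (2 and 100 time units) are made-up values chosen only for illustration.

```python
def hit_ratio(hits, total_accesses):
    """h = accesses served by the cache / total accesses."""
    return hits / total_accesses

def average_access_time(h, t_hit, t_miss):
    """t = h * t_hit + (1 - h) * t_miss."""
    return h * t_hit + (1 - h) * t_miss

# Hypothetical workload: 950 of 1000 accesses are served by the cache.
h = hit_ratio(950, 1000)
m = 1 - h  # miss ratio, since m + h = 1
t = average_access_time(h, t_hit=2.0, t_miss=100.0)
```

With these assumed numbers, t comes out at roughly 6.9 time units: even a 5% miss ratio more than triples the average access time relative to a hit.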

Definitions
Block: any set of $2^b$ bytes at consecutive addresses, starting at an address whose b least significant bits are zero.
Note that the addresses of the bytes belonging to the same block coincide to the left of the b least significant bits.
[Figure: main memory addresses and contents grouped into blocks, starting at block 0.]
Data exchange between the cache and the main memory is carried out block by block. Does it make sense?

Fully Associative Cache Architecture
The cache has $2^L$ lines, numbered 0 to $2^L - 1$. Each line holds:
valid bit: indicates whether the line contains a valid memory block copy
TAG: contains the number of the block copied in that line
VALUE: contains a copy of the memory block
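The block definition can be checked with a small sketch that splits an address into a block number and a position within the block. The function name `split_address` and the choice b = 3 are assumptions made for the example.

```python
def split_address(address, b):
    """Split an address into (block number, offset within the block)."""
    block_number = address >> b          # bits to the left of the b low bits
    offset = address & ((1 << b) - 1)    # the b least significant bits
    return block_number, offset

b = 3                      # assumed: blocks of 2**3 = 8 bytes
block_start = 5 << b       # first address of block 5 (its b low bits are zero)
addresses = range(block_start, block_start + (1 << b))
blocks = {split_address(a, b)[0] for a in addresses}
offsets = [split_address(a, b)[1] for a in addresses]
```

All eight addresses share the same block number, and only the b low bits vary, which is why a single TAG can identify every byte held in a line.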

Fully Associative Cache Operation
Address generated by the CPU:
bits a-1 to b: block number of the addressed byte/word
bits b-1 to 0: points to the byte/word within the block
The cache controller compares the block number and the TAG field of all lines simultaneously (associative search). If a TAG matches the block number and the valid bit is on, it is a hit; otherwise it is a miss. The b least significant bits are used as a pointer to the byte/word within the block.

Fully Associative Cache
Problem: to compare the block number with the TAG fields of all cache lines simultaneously (associative search), one needs lots of comparators.
Consequence: fully associative design is only used for small-capacity caches.
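A minimal software sketch of the fully associative lookup follows. The `Line` class, the 4-line cache and the stored byte values are hypothetical; the loop stands in for what the hardware does with one comparator per line in parallel.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False
    tag: int = 0          # number of the block copied in this line
    value: bytes = b""    # copy of the memory block

def lookup(lines, address, b):
    """Return the addressed byte on a hit, or None on a miss."""
    block_number = address >> b
    offset = address & ((1 << b) - 1)
    for line in lines:  # hardware compares all TAG fields simultaneously
        if line.valid and line.tag == block_number:
            return line.value[offset]
    return None

b = 2  # assumed: 4-byte blocks
lines = [Line() for _ in range(4)]
lines[1] = Line(valid=True, tag=6, value=bytes([10, 11, 12, 13]))
lines[2] = Line(valid=False, tag=7, value=bytes([20, 21, 22, 23]))

hit_value = lookup(lines, (6 << b) | 3, b)     # TAG matches and valid bit is on
miss_invalid = lookup(lines, (7 << b) | 0, b)  # TAG matches but valid bit is off
```

The second lookup is a miss even though a TAG matches, because the valid bit of that line is off.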

Direct Mapped Caches
Basic idea: assign each main memory block to a single cache line.
[Figure: a function f maps main memory blocks to cache lines.]
Each main memory block can only be loaded into the cache line it is mapped to. Thus, it is no longer necessary to check all lines, but just one.

Direct Mapped Caches
Operation. Address generated by the CPU:
bits a-1 to b+L: to be compared with the TAG field
bits b+L-1 to b (L bits): point to a cache line
bits b-1 to 0 (b bits): point to a byte/word within the block
The cache controller compares the leftmost address field with the TAG field of the (single) cache line defined by the L bits.

Problem: some lines may be requested often by different blocks, while other lines are rarely requested, which implies a non-optimal use of the cache capacity.
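The address fields of a direct mapped cache, and the conflict problem just mentioned, can be illustrated as follows. The field widths (b = 3, l = 10) and the two addresses are assumed values picked for the example.

```python
def split_direct_mapped(address, b, l):
    """Split an address into (tag, line index, offset) for a direct mapped cache."""
    offset = address & ((1 << b) - 1)       # b low bits: byte within the block
    line = (address >> b) & ((1 << l) - 1)  # next l bits: the single candidate line
    tag = address >> (b + l)                # remaining bits: compared with TAG
    return tag, line, offset

b, l = 3, 10             # assumed: 8-byte blocks, 2**10 = 1024 lines
x = 0x12345
y = x + (1 << (b + l))   # same line and offset, next tag value
fields_x = split_direct_mapped(x, b, l)
fields_y = split_direct_mapped(y, b, l)
```

Both addresses select line 104 but carry different tags, so alternating accesses to x and y would evict each other even if every other line were empty.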

Set Associative Caches
Basic idea: instead of assigning each main memory block to a single cache line, assign each block to an (associative) set of cache lines.
[Figure: a function f maps main memory blocks to associative sets of cache lines.]
A block may be loaded into any cache line of the associative set it is assigned to.

Set Associative Caches
Architecture: the cache has $2^S$ sets, numbered 0 to $2^S - 1$. Each set has c lines, numbered 0 to c-1, and each line holds a valid bit (v), a TAG and a VALUE.

Operation: assume that there are $2^S$ sets. Address generated by the CPU:
bits a-1 to b+S: to be compared with the TAG field
bits b+S-1 to b (S bits): point to a set
bits b-1 to 0 (b bits): point to a byte/word within the block
The cache controller compares the leftmost address field with the TAG field of all lines of the associative set defined by the S bits (associative search).
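The set associative lookup can be sketched along the same lines. The representation of a line as a `(valid, tag, value)` tuple, the tiny geometry (2 sets of 2 lines, 4-byte blocks) and the stored bytes are assumptions for illustration only.

```python
def split_set_associative(address, b, s):
    """Split an address into (tag, set index, offset)."""
    offset = address & ((1 << b) - 1)
    set_index = (address >> b) & ((1 << s) - 1)
    tag = address >> (b + s)
    return tag, set_index, offset

def lookup(sets, address, b, s):
    """Search only the lines of the selected set; return the byte or None."""
    tag, set_index, offset = split_set_associative(address, b, s)
    for valid, line_tag, value in sets[set_index]:  # associative search in one set
        if valid and line_tag == tag:
            return value[offset]
    return None

b, s = 2, 1  # assumed: 4-byte blocks, 2**1 = 2 sets
sets = [[(False, 0, b""), (False, 0, b"")] for _ in range(1 << s)]
sets[1][1] = (True, 5, bytes([40, 41, 42, 43]))  # tag 5 stored in set 1, line 1

hit_value = lookup(sets, (5 << 3) | (1 << 2) | 2, b, s)  # tag 5, set 1, offset 2
other_set = lookup(sets, (5 << 3) | (0 << 2) | 2, b, s)  # tag 5, but set 0
```

Only the lines of one set are compared, so the number of comparators equals the number of lines per set rather than the number of lines in the cache.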

Set Associative Caches
Fully associative caches are set associative caches with a single associative set.
Direct mapped caches are set associative caches whose associative sets each contain a single line.

Set size: keeping the overall cache capacity constant and changing the number of lines/set.
[Figure: miss ratio (m) versus lines/set, for direct mapped, two-way, four-way, eight-way and fully associative caches.]
Above 4 lines/set the miss ratio does not change significantly.
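To see how direct mapped and fully associative caches appear as the two extremes of the same design, the sketch below keeps the capacity fixed and varies the number of lines per set. The capacity (16 Kbytes) and block size (16 bytes) are assumed values.

```python
def geometry(capacity, block_size, lines_per_set):
    """Return (total lines, number of sets) for a given organization."""
    lines = capacity // block_size
    sets = lines // lines_per_set
    return lines, sets

capacity, block_size = 16 * 1024, 16  # assumed cache parameters
total_lines = capacity // block_size

# Number of sets for each lines/set choice, at constant capacity.
sets_by_ways = {
    ways: geometry(capacity, block_size, ways)[1]
    for ways in (1, 2, 4, 8, total_lines)
}
```

One line per set gives as many sets as lines (direct mapped), while putting all lines in one set gives a single set (fully associative).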

Set Associative Caches
[Figure: hit ratio versus cache size (bytes), up to 1M, for direct, 2-way, 4-way, 8-way and 16-way caches.]

Replacement Policy
Least Recently Used (LRU)
The least recently used line is removed from the cache to make room for a new main memory block.
Pseudo LRU
Example: a four-way set associative cache.
Bit I0 points to the least recently used half.
Bits I1 and I2 each point to the least recently used line within one of the two halves.
The least recently used line of the least recently used half is elected to leave the cache.
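The two policies can be compared with a short simulation of one four-way set, using the access sequence a b c d a e b e f from the replacement example. The bit encoding below is an assumption (I0 = 0 selects lines 0 and 1, I1 picks between lines 0 and 1, I2 between lines 2 and 3, and each access flips the bits to point away from the accessed line); the encoding in the original figure may differ, so the Pseudo LRU result is a sketch rather than a reproduction of the slide.

```python
def simulate_lru(accesses, ways=4):
    """True LRU: return (final contents, LRU first; evicted blocks in order)."""
    order, evicted = [], []
    for block in accesses:
        if block in order:
            order.remove(block)            # hit: will be re-appended as MRU
        elif len(order) == ways:
            evicted.append(order.pop(0))   # miss on a full set: evict the LRU
        order.append(block)
    return order, evicted

def simulate_pseudo_lru(accesses):
    """Tree Pseudo LRU for 4 lines with three bits (assumed encoding)."""
    lines = [None] * 4
    i0 = i1 = i2 = 0
    evicted = []
    for block in accesses:
        if block in lines:
            k = lines.index(block)         # hit
        else:
            k = i1 if i0 == 0 else 2 + i2  # follow the bits to the victim line
            if lines[k] is not None:
                evicted.append(lines[k])
            lines[k] = block
        # Point the bits away from the line that was just accessed.
        if k < 2:
            i0, i1 = 1, 1 - k
        else:
            i0, i2 = 0, 1 - (k - 2)
    return lines, evicted

accesses = list("abcdaebef")
lru_contents, lru_evicted = simulate_lru(accesses)
plru_contents, plru_evicted = simulate_pseudo_lru(accesses)
```

Under these assumptions both policies evict b and then c, but on the final access true LRU evicts d while Pseudo LRU evicts a, which shows that three bits only approximate the full recency order.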

Replacement Policy
Example: a four-way set associative cache. The lines in the set are initially empty.
Accesses to the set: a b c d a e b e f
[Table: contents of the set after each access, under LRU and under Pseudo LRU.]

Main Memory Update Policy
Write Through
All writes are carried out in the cache and in the main memory.
The CPU does not halt until the main memory is updated.
Problem: lots of traffic, especially harmful in multiprocessors. 15% of memory references are writes.

Main Memory Update Policy
Write Back
Each cache line has a bit (dirty) that indicates, when set (=1), that the block copy in the cache differs from the main memory.
When the block is brought from main memory into the cache, dirty=0.
All writes are performed in the cache only and, in this case, dirty=1.
The main memory is updated when the block selected for replacement has dirty=1.
I/O must access main memory through the cache.

Multilevel Caches
High logic density enables caches on chip.
Faster than bus access.
Frees the bus for other transfers.
It is common to use both on-chip and off-chip cache.
L1 on chip, L2 off chip in static RAM.
L2 access is much faster than DRAM or ROM.
L2 often uses a separate data path.
L2 may now be on chip, resulting in an L3 cache, reached by bus access or now on chip.
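The difference in main memory traffic between the write-through and write-back update policies can be sketched with a toy cache that holds a single line. The single-line cache, the operation list, and the choice to load a block on a write miss are simplifying assumptions. Note also that a write-through update typically transfers one word, while a write-back update transfers a whole block, so the two counts are not directly comparable in bytes.

```python
def count_memory_writes(operations, policy):
    """Count main memory updates for a one-line cache under the given policy."""
    cached_block = None
    dirty = False
    writes_to_memory = 0
    for kind, block in operations:
        if block != cached_block:
            # Miss: the line is replaced. Write back only if dirty=1.
            if policy == "write-back" and dirty:
                writes_to_memory += 1
            cached_block = block
            dirty = False  # block just brought from main memory
        if kind == "write":
            if policy == "write-through":
                writes_to_memory += 1  # every write also goes to main memory
            else:
                dirty = True           # write performed in the cache only
    return writes_to_memory

# Hypothetical access pattern: repeated writes to block A, then block B.
operations = [("write", "A"), ("write", "A"), ("write", "A"),
              ("read", "B"), ("write", "B"), ("read", "A")]
through = count_memory_writes(operations, "write-through")
back = count_memory_writes(operations, "write-back")
```

Here write through updates main memory on each of the four writes, whereas write back updates it only twice, once for each dirty block that is replaced.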

Multilevel Caches (L1 & L2)
[Figure: hit ratio with L1 and L2 caches.]
A hit is counted in either cache.
Only advantageous if L2 > L1.

Unified v Split Caches
One cache for data and instructions, or two, one for data and one for instructions.
Advantages of unified cache:
Higher hit rate.
Balances load of instruction and data fetch.
Only one cache to design and implement.
Advantages of split cache:
Eliminates cache contention between the instruction fetch/decode unit and the execution unit.
Important in pipelining.

Exercises
Exercise 1
A cache with 64 Kbyte capacity operates with 8-byte blocks and is organized in associative sets having 4 lines each. What is the number of the associative set that may contain a copy of the byte at main memory address 3BEF56H?

Exercise 2
Determine the main memory address of the byte stored in the VALUE field at the position indicated by the shaded box. It is an 8-way set associative cache with 256 Kbyte capacity and 8-byte blocks. It is further known that the byte is stored in the associative set number 24H and the TAG field has the value 18H.
[Figure: TAG and VALUE fields of a cache line, with one byte of VALUE shaded.]

Exercise 3
How many bits are required to represent all relevant configurations of the eight lines of a single associative set in a cache that uses the LRU policy? And for Pseudo LRU?

Exercise 4
The physical address space of a processor is 4 Gbyte ($2^{32}$ bytes). Its cache stores up to 1 Mbyte ($2^{20}$ bytes), operates with 256-byte blocks and is organized in associative sets containing four lines each. How many main memory blocks map into a single associative set?

Simulators
From William Stallings

CACHE MEMORY
THE END
