Caches: Hiding Memory Access Times


Curtis Golden
6 years ago

1 [Figure: pipelined datapath. Stages: Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, Write Back. Components: PC, Instruction Memory, Registers, Sign Extend, Shift Left 2, ALU, ALU Control, Control, Data Memory, adders, multiplexers.]

2 Caches
Slides courtesy of Chris Nevison, with modifications by Alyce Brady and Nathan Sprague

3 [Figure: pipelined datapath with Instruction Memory in the Instruction Fetch stage and Data Memory in the Memory Access stage]

4 Memory
- Main memory (DRAM) is much slower than processors
  » The gap has widened over the past 20+ years and continues to widen
- Memory comes in different technologies: the fastest is also the most expensive
  » Slowest, least expensive: magnetic disk, 5-20 million ns, $0.50-$2.00/GB
  » Fastest, most expensive: SRAM, 0.5-5 ns, $4000-$10000/GB

5 Using Memory
- Temporal locality: likely to reuse data soon. WHY?
- Spatial locality: likely to access data close by. WHY?

6 Using Memory
- Temporal locality: likely to reuse data soon (loops)
- Spatial locality: likely to access data close by (sequential nature of program instructions, data access)
- Use small amounts of expensive memory, larger amounts of less expensive memory
- Memory hierarchy is used to hide memory access latency

7 Memory Hierarchy
- First Level Cache
- Second Level Cache
- Main Memory
- Secondary Memory (Disk)

8 Using the Memory Hierarchy
- Check if data is in fast memory (hit); if not (miss), fetch a block from slower memory
  » Block may be a single word or several words
- Fetch time is the miss penalty
- Hit rate is the fraction of memory accesses that are hits; miss rate = 1 - hit rate
- If locality is high, hit rate gets closer to 1
- Effect: memory is almost as fast as the fastest level, and as big as the biggest (slowest) memory
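The effect of hit rate on perceived memory speed can be sketched with the standard average-access-time relation (hit time plus miss rate times miss penalty). That relation is not stated on the slide, and the timing numbers below are hypothetical, chosen only to show the trend as the hit rate approaches 1.

```python
def average_access_time(hit_time_ns, miss_penalty_ns, hit_rate):
    """Average time per access: every access pays the hit time,
    and the fraction that miss also pays the miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical numbers: 1 ns fast memory, 100 ns penalty to reach the slower level.
for hit_rate in (0.80, 0.95, 0.99):
    print(hit_rate, average_access_time(1.0, 100.0, hit_rate))
```

As the hit rate rises, the average moves toward the 1 ns hit time even though most of the data lives in the slower level.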

9 Cache Issues
- How is the cache organized and addressed?
- When the cache is written to, how is the memory image updated?
- When the cache is full and new items need to be put in the cache, what is removed and replaced?

10 Direct Mapped: Any given memory address can only be found in one cache address (although many memory addresses will map to the same cache address)

11 Cache Example (Direct Mapped)
- 256-byte cache (64 4-byte words)
- Each cache line or block holds one word (4 bytes)
- Byte in cache is addressed by the lowest two bits of the address
- Cache line is addressed by the next six bits of the address
- Each cache line has a tag matching the high 24 bits of the memory address
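The field layout above can be sketched as a small address-splitting function. It assumes 32-bit addresses (2 + 6 + 24 bits, as the slide's field widths imply); the function and constant names are illustrative, not from the slides.

```python
BYTE_BITS = 2   # lowest two bits select the byte within a 4-byte word
LINE_BITS = 6   # next six bits select one of the 64 cache lines

def split_address(addr):
    """Return (tag, line, byte) for a 32-bit memory address."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    line = (addr >> BYTE_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (BYTE_BITS + LINE_BITS)   # remaining high 24 bits
    return tag, line, byte

print(split_address(88))         # (0, 22, 0)
print(split_address(88 + 256))   # (1, 22, 0): same line, different tag
```

Addresses that differ by a multiple of 256 land on the same line and are told apart only by their tags.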

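12 [Figure: cache organization with line address, tag, and byte address columns]

13 [Figure: memory address split into tag, line address, and byte address fields]

14 [Figure: memory address split into tag, line address, and byte address fields]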

15 Cache Access
1. Find cache line address (bits 2-7)
2. Compare tag to high 24 bits
   - if matched, cache hit
     » find byte address, read or write item
   - if not matched, cache miss, go to memory
     » for a read: retrieve item and write to cache, then use
     » for a write: write to memory (or to cache, see below)
3. Direct mapped cache: every address can only go to one cache line!
4. What happens when cache is written to?

16 Write Policy
- Write Through
  » Write to memory and to cache
  » Time to write to memory could delay the instruction; a write buffer can hide this latency
- Write Back (also called Copy Back)
  » Write only to cache; mark the cache line as dirty, using an additional bit
  » When the cache line is replaced, if dirty, then write back to memory
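The difference between the two policies shows up in how often main memory is written. The sketch below uses a hypothetical one-line cache (the class, method names, and workload are illustrative assumptions, not from the slides) and only counts memory writes; it does not model data or timing.

```python
class OneLineCache:
    """Hypothetical single-line cache, just enough to contrast the write policies."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # block currently held (None = empty)
        self.dirty = False       # only used by the write-back policy
        self.memory_writes = 0

    def write(self, block):
        if self.tag != block:    # miss: the resident block is replaced
            self._evict()
            self.tag = block
        if self.write_back:
            self.dirty = True    # defer the memory update
        else:
            self.memory_writes += 1   # write through: update memory now

    def _evict(self):
        if self.write_back and self.dirty:
            self.memory_writes += 1   # copy the dirty line back to memory
        self.dirty = False

    def flush(self):
        self._evict()

def count_memory_writes(write_back):
    cache = OneLineCache(write_back)
    for _ in range(10):
        cache.write("A")         # ten writes to the same block
    cache.write("B")             # forces block A to be replaced
    cache.flush()
    return cache.memory_writes

print("write through:", count_memory_writes(False))  # 11
print("write back:   ", count_memory_writes(True))   # 2
```

Repeated writes to the same line cost one memory write each under write through, but a single deferred write under write back.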

17-25 Direct-mapped cache access trace
[Figure: cache table with line address, valid bit, tag, and data bytes, updated one access per slide]

Desired address   Result
88                MISS, fetch
104               MISS, fetch
88                HIT!
104               HIT!
64                MISS, fetch
12                MISS, fetch
64                HIT!
72                MISS, fetch (wrong tag)
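The access sequence 88, 104, 88, 104, 64, 12, 64, 72 can be replayed with a small direct-mapped simulator. The cache geometry in the figure is not recoverable from the text, so the defaults below (8 lines of one 4-byte word each) are an assumption, chosen because they reproduce the hit/miss pattern shown, including the wrong-tag miss on 72.

```python
def simulate(addresses, num_lines=8, block_bytes=4):
    """Replay a trace through a direct-mapped cache.

    num_lines=8 and block_bytes=4 are assumed values, not taken from the slides.
    Returns a list of (address, outcome) pairs.
    """
    valid = [False] * num_lines
    tags = [None] * num_lines
    results = []
    for addr in addresses:
        block = addr // block_bytes
        line = block % num_lines       # the only line this address may use
        tag = block // num_lines
        if valid[line] and tags[line] == tag:
            results.append((addr, "HIT"))
        else:
            reason = "wrong tag" if valid[line] else "invalid line"
            results.append((addr, f"MISS ({reason})"))
            valid[line] = True         # fetch the block and record its tag
            tags[line] = tag
    return results

for addr, outcome in simulate([88, 104, 88, 104, 64, 12, 64, 72]):
    print(addr, outcome)
```

Under these assumed parameters, 72 maps to the same line as 104 but carries a different tag, which is why it misses even though that line is valid.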

26 Accelerating Memory Access
- Wider bus
- Wider memory access

27 Accelerating Memory Access
- How can the cache take advantage of faster memory access?
- Store more than one word at a time on each line in the cache
  » Any cache miss brings the whole line containing the item into the cache
- Takes advantage of spatial locality: the next item needed is likely to be at the next address
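The benefit of multi-word lines for sequential access can be sketched by counting misses for the same trace with two line sizes. The workload (64 consecutive word reads) and the function name are hypothetical; the cache is modeled as direct mapped with 256 bytes total.

```python
def count_misses(addresses, cache_bytes, line_bytes):
    """Count misses in a direct-mapped cache with the given total and line size."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # None marks an invalid line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        line = block % num_lines
        tag = block // num_lines
        if tags[line] != tag:
            misses += 1
            tags[line] = tag           # a miss brings in the whole line
    return misses

# Hypothetical workload: read 64 consecutive 4-byte words.
sequential = [4 * i for i in range(64)]
print(count_misses(sequential, 256, 4))    # 64: one word per line, every access misses
print(count_misses(sequential, 256, 16))   # 16: four words per line, one miss per line
```

With four-word lines, each miss also loads the next three words, so only one access in four goes to memory.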

28 Cache with multi-word line
- 256-byte cache (64 4-byte words)
- Each block (line) contains four words (16 bytes)
  » 2 bits to address the byte in the word
  » 2 bits to address the word in the line
- Cache contains sixteen four-word blocks
  » 4 bits to address the cache block (line)
- Each cache line has a tag field for the upper 24 bits of the address
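The four address fields described above can be extracted as follows. This is a minimal sketch assuming 32-bit addresses (2 + 2 + 4 + 24 bits); the function name is illustrative.

```python
def split_multiword_address(addr):
    """Return (tag, line, word, byte) for the 16-line, four-words-per-line cache."""
    byte = addr & 0x3            # bits 0-1: byte within the word
    word = (addr >> 2) & 0x3     # bits 2-3: word within the line
    line = (addr >> 4) & 0xF     # bits 4-7: one of sixteen lines
    tag = addr >> 8              # bits 8-31: upper 24 bits
    return tag, line, word, byte

print(split_multiword_address(0x12345678))   # (1193046, 7, 2, 0), i.e. tag 0x123456
```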

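29 [Figure: address split into tag, line address, word address, and byte address fields for the multi-word-line cache]

30 [Figure: multi-word-line cache lookup. The tag comparison produces Hit, sent to Control; the word address drives a MUX that selects the Data word from the line]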

31 [Figure: pipelined datapath with an Instruction Cache and a Data Cache in place of Instruction Memory and Data Memory. Each cache has Mem Addr, Mem Data, and Hit signals. Stages: Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, Write Back.]

32 Instruction Cache Hit / Miss
- Hit or Miss:
  » Instruction is fetched from the cache and placed in the pipeline buffer register
  » PC is latched into the Memory Address Register
- Hit:
  » Control sees hit, execution continues
  » Mem Addr unused

33 Instruction Cache Hit / Miss
- Miss:
  » Control sees miss, execution stalls
  » PC reset to PC - 4
  » Values fetched from registers are unused
  » Memory read cycle started, using Mem Addr
  » Memory read completes
  » Value stored in cache, new tag written
  » Instruction execution restarts, cache hit

34 Speeding Up Cache
- execution time = (execution cycles + stall cycles) × cycle time
- stall cycles = # of instructions × miss ratio × miss penalty
- We can improve performance by
  » Reducing miss rate
  » Reducing miss penalty
- More flexible placement of blocks: reduces miss rate
- Caching the cache: reduces miss penalty
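The two formulas on this slide can be evaluated directly. The sketch below follows them term by term; the workload numbers are hypothetical and serve only to show how the miss ratio and miss penalty feed into total time.

```python
def stall_cycles(instructions, miss_ratio, miss_penalty):
    """stall cycles = # of instructions x miss ratio x miss penalty"""
    return instructions * miss_ratio * miss_penalty

def execution_time(execution_cycles, instructions, miss_ratio, miss_penalty, cycle_time):
    """execution time = (execution cycles + stall cycles) x cycle time"""
    stalls = stall_cycles(instructions, miss_ratio, miss_penalty)
    return (execution_cycles + stalls) * cycle_time

# Hypothetical workload: 1,000,000 instructions at one cycle each, 1 ns cycle time.
base = execution_time(1_000_000, 1_000_000, 0.05, 40, 1.0)
lower_miss_ratio = execution_time(1_000_000, 1_000_000, 0.025, 40, 1.0)
lower_penalty = execution_time(1_000_000, 1_000_000, 0.05, 20, 1.0)
print(base, lower_miss_ratio, lower_penalty)
```

With these numbers, halving either the miss ratio or the miss penalty removes the same number of stall cycles, which is why both are listed as ways to improve performance.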

35 Associative Cache
- Fully Associative
  » Blocks can be stored in any cache line
  » All tags must be checked on every access
- Set Associative
  » Two or more (power of 2) lines for each address
  » More than one item with the same cache line address can be in the cache
  » Tags for all lines in the set must be checked: any match means a hit; if none match, a miss

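36 Two-way set associative cache
[Figure: address split into tag, line address, word address, and byte address fields]

37 [Figure: two-way set associative lookup. Valid bits and tags of both lines in the set are compared with the address tag to produce Hit; a MUX selects the Data using the word address]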

38 Replacement Policy
- Necessary to select a line to replace when using associative caches
- Least Recently Used (LRU) is one possibility
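A set-associative lookup with LRU replacement can be sketched as below. The class and its parameters are illustrative assumptions (the slides do not give an implementation); each set is kept as an `OrderedDict` of tags ordered from least to most recently used.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Hypothetical set-associative cache that tracks tags only, with LRU replacement."""

    def __init__(self, num_sets, ways, block_bytes=4):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One OrderedDict per set, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, load the block and return False."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # now the most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = True
        return False

# Two-way cache with 4 sets; addresses 104, 72, and 40 all fall in the same set.
cache = SetAssociativeCache(num_sets=4, ways=2)
print([cache.access(a) for a in (104, 72, 104, 72, 40, 72, 104)])
# [False, False, True, True, False, True, False]
```

Addresses 104 and 72 coexist in the two-way set. When 40 arrives, the least recently used line (104) is evicted, so 72 still hits and the final access to 104 misses.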

39 Multi-level Cache
- Put two (or more) levels of cache between CPU and memory
- Focus of the top-level cache is on keeping hit time low
- Focus of the lower-level cache is on keeping the miss penalty low for the top-level cache
  » a low miss rate is more important than hit time
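One way to see why the second level matters is to treat its average access time as the miss penalty of the first level. The relation below is the standard average-access-time formula applied twice, not something stated on the slide, and all timing values are hypothetical.

```python
def two_level_access_time(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, memory_time):
    """Average access time with two cache levels.

    l2_miss_rate is the fraction of L1 misses that also miss in L2.
    """
    l1_miss_penalty = l2_hit_time + l2_miss_rate * memory_time
    return l1_hit_time + l1_miss_rate * l1_miss_penalty

# Hypothetical timings in cycles: L1 hit 1, L2 hit 10, main memory 100.
without_l2 = 1 + 0.05 * 100
with_l2 = two_level_access_time(1, 0.05, 10, 0.2, 100)
print(without_l2, with_l2)
```

In this example the second-level cache cuts the first level's miss penalty from 100 cycles to about 30, even though its own hit time is ten times that of the first level.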

40 Cache Summary - types
- Direct Mapped
  » Each line in the cache takes one address
  » Line size may accommodate several words
- Set Associative
  » Sets of lines serve the same address
  » Needs a replacement policy for which line to purge when the set is full
  » More flexible, but more complex

41 Cache Summary
- Cache Hit
  » Item is found in the cache
  » CPU continues at full speed
  » Need to verify valid bit and tag match
- Cache Miss
  » Item must be retrieved from memory
  » Whole cache line is retrieved
  » CPU stalls for memory access

42 Cache Summary
- Write Policies
  » Write Through (always write to memory)
  » Write Back (uses dirty bit)
- Associative Cache Replacement Policy
  » LRU (Least Recently Used)
  » Random
