
CMPE110 Computer Architecture, Winter 2009, Andrea Di Blas

Cache

Topics: direct-mapped cache; reads and writes; cache associativity; cache and performance.

Textbook: Third Edition, 7.1 to 7.3; Fourth Edition, 5.1, 5.2, 5.3.

Caches: Memory hierarchy

A memory address is looked up level by level. At each level the question is "is it here?". On a "no" (N) the request moves on to the next level; once the data is found, it is fetched ("get it").

Levels, with relative speed where it survives in the slide:
- registers: 1 (data not in registers is looked up in the next level)
- on-chip cache: 1-2
- off-chip cache: 2-5
- main memory: real address space, part of the virtual address space
- disk: rest of the virtual address space, files, etc. (...,000)
- long-term storage devices

Caches: Memory location

[Figure: memory locations. Labels: CPU, REGISTERS, L1 INST, L1 DATA, L2, MAIN MEMORY, DISK]

Caches: Basic concepts

- data locality: temporal locality, spatial locality
- block: amount of information transferred (in bytes or words)
- hit: the block is present
- miss: the block is not present
- hit rate: fraction of times a requested block is found
- hit time: time to fetch a block that is present
- miss rate: fraction of times a requested block is not present (miss rate = 100% - hit rate)
- miss penalty: time (in clock cycles) to fetch a block from the lower level

Caches: Cache mappings

Size of cache < size of main memory.

[Figure: CPU, cache, main memory]

Possible mappings:
- direct-mapped cache
- set-associative cache
- fully-associative cache

Direct-mapped caches: Direct-mapped cache

Each memory block is mapped to exactly one block in the cache.

[Figure: memory BLOCK addresses mapped onto cache BLOCK addresses (cache INDEX)]

Direct-mapped caches: The cache index

Many different memory blocks map to a single cache block: which block? Use the lower bits of the memory address to index the cache.

cache index = (memory block address) % (cache size in blocks)

Example 1: 32-block main memory, 8-block cache (we consider block addresses).
- The memory block address is ... bits.
- To index the cache we need ... bits: the lower ... bits of the memory block address.
- The memory block 0 maps to the cache location ...
- The memory block 110 maps to the cache location ...
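The modulo rule above can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names `index_bits` and `cache_index` are hypothetical, and the sample block address `0b10110` is chosen purely as an example. The sketch assumes the number of blocks is a power of two, so that the remainder equals the lower bits of the address.

```python
def index_bits(num_blocks):
    """Number of address bits needed to select one of num_blocks blocks."""
    # Assumption: num_blocks is a power of two.
    assert num_blocks > 0 and num_blocks & (num_blocks - 1) == 0
    return num_blocks.bit_length() - 1


def cache_index(block_address, num_cache_blocks):
    """cache index = (memory block address) % (cache size in blocks)."""
    return block_address % num_cache_blocks


# Sizes taken from Example 1: 32-block memory, 8-block cache.
MEMORY_BLOCKS = 32
CACHE_BLOCKS = 8

sample = 0b10110                          # hypothetical block address (22)
low_bits = sample & (CACHE_BLOCKS - 1)    # keeping the lower bits ...
modulo = cache_index(sample, CACHE_BLOCKS)  # ... matches the modulo

print(index_bits(MEMORY_BLOCKS), index_bits(CACHE_BLOCKS), modulo, low_bits)
# prints: 5 3 6 6
```

Masking with `CACHE_BLOCKS - 1` and taking the remainder give the same index, which is why hardware can simply wire the lower address bits to the cache index.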

Direct-mapped caches

Example 2: 128-byte main memory, 8-block cache, 4-byte (= 1 word) cache block size (we consider byte addresses).
- ... -bit memory byte address
- ... -bit cache (block) index
- ... bits to address the byte within the block

The memory addresses 000, 0, 010, and 011 all map to the same cache block ...

[Figure: memory byte address, cache index, cache block (4 bytes)]
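With byte addresses, the lowest bits select the byte inside the block and the next bits select the cache block. A minimal sketch under the sizes of Example 2 (8 blocks of 4 bytes) follows; the function name `split_byte_address` and the sample addresses 40 to 44 are assumptions made for illustration, not values from the slide.

```python
def split_byte_address(address, num_cache_blocks, block_bytes):
    """Return (cache_index, byte_offset) for a direct-mapped cache."""
    byte_offset = address % block_bytes        # lower bits: byte within the block
    block_address = address // block_bytes     # drop the byte-offset bits
    index = block_address % num_cache_blocks   # next bits: cache block index
    return index, byte_offset


CACHE_BLOCKS = 8
BLOCK_BYTES = 4

# Hypothetical byte addresses in a 128-byte memory (7-bit addresses).
for address in (40, 41, 42, 43, 44):
    index, offset = split_byte_address(address, CACHE_BLOCKS, BLOCK_BYTES)
    print(format(address, "07b"), "index", index, "offset", offset)
```

Addresses 40 to 43 differ only in their byte offset and therefore share one cache block; address 44 starts the next block.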

Direct-mapped caches: The Tag field

Many different memory blocks map to a single cache block: how do we know which memory block is in the cache block? To each cache line we add a tag that contains the remaining part (upper bits) of the address.

Example 3: 32-block main memory, 8-block cache. Memory blocks 000, 0, 100, and 110 all map to the same cache block ...

[Figure: tag, memory block address, cache index, cache block (4 bytes)]
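The role of the tag can be illustrated with the sizes of Example 3 (32-block memory, 8-block cache). In this sketch the function name `tag_and_index` and the chosen index `0b101` are hypothetical; it only shows that blocks sharing an index are told apart by their upper address bits.

```python
def tag_and_index(block_address, num_cache_blocks):
    """Split a block address into (tag, index) for a direct-mapped cache."""
    tag = block_address // num_cache_blocks    # upper bits of the address
    index = block_address % num_cache_blocks   # lower bits of the address
    return tag, index


MEMORY_BLOCKS = 32
CACHE_BLOCKS = 8
CHOSEN_INDEX = 0b101   # hypothetical cache index used for the illustration

# All memory blocks that compete for the chosen cache block.
competitors = [b for b in range(MEMORY_BLOCKS)
               if tag_and_index(b, CACHE_BLOCKS)[1] == CHOSEN_INDEX]

for block in competitors:
    tag, index = tag_and_index(block, CACHE_BLOCKS)
    print(format(block, "05b"), "tag", format(tag, "02b"), "index", format(index, "03b"))
```

Four memory blocks compete for each cache block here, and the 2-bit tag stored with the line identifies which of them is currently present.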

Direct-mapped caches: The Valid bit

The CPU performs many different tasks, and the memory contents change: how do we know if a cache block is "good"? To each cache line we add a valid bit to indicate whether the content of the block corresponds to what the CPU is actually looking for. For instance, after a reset, all valid bits are reset: no block contains useful information.

Direct-mapped caches

[Figure: 1-word block, direct-mapped cache. Labels: CPU, memory byte address (tag, index), cache lines (index, V, TAG, DATA), outputs DATA and HIT]

[Table; columns: mem. address [b], cache line size [b], bits for index, cache data size [B], bits for tag, total cache size [b]]

Direct-mapped caches: Cache trace with block address

32-block memory, 8-block cache; the memory address is a block address.

[Trace table; columns: Address (dec, bin), Hit/Miss]
[Cache table; columns: INDEX, V, TAG, DATA]
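A trace like the one on this slide can be reproduced with a short simulation. The sketch below is only an illustration: the function name `trace_direct_mapped` is hypothetical, and since the address sequence of the slide is not available, the sequence used here is an assumed example. Only the valid bit and the tag are modeled, not the data.

```python
def trace_direct_mapped(block_addresses, num_cache_blocks):
    """Return (address, index, tag, outcome) for each access."""
    valid = [False] * num_cache_blocks   # after reset, no block is valid
    tags = [None] * num_cache_blocks
    results = []
    for block in block_addresses:
        index = block % num_cache_blocks
        tag = block // num_cache_blocks
        hit = valid[index] and tags[index] == tag
        if not hit:
            # Miss: fetch the block and update the valid bit and the tag.
            valid[index] = True
            tags[index] = tag
        results.append((block, index, tag, "hit" if hit else "miss"))
    return results


# Assumed address sequence for a 32-block memory and an 8-block cache.
SEQUENCE = [22, 26, 22, 26, 16, 3, 16, 18]

for block, index, tag, outcome in trace_direct_mapped(SEQUENCE, 8):
    print(block, format(block, "05b"), "index", format(index, "03b"),
          "tag", format(tag, "02b"), outcome)
```

The last access (18) misses even though its index was used before, because block 26 with a different tag occupies that cache block.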

Direct-mapped caches: Cache trace with byte address

256-byte memory, 32-byte cache, 4-byte cache block, memory byte addressing.

[Trace table; columns: Address (dec, bin), Hit/Miss]
[Cache table; columns: INDEX, V, TAG, DATA]

Cache reads and writes

In our CPU, Instruction Memory and Data Memory are actually cache memories. On a memory access, hits are straightforward to handle. Misses are more complex:
- read misses
- write misses

Cache reads and writes: Read misses

For instructions:
- stall the CPU
- send the original PC to memory (current PC-4) and wait
- write the cache entry (including tag and valid bit)
- restart the instruction

For data:
- stall the CPU
- send the address to memory and wait
- write the cache entry (including tag and valid bit)
- restart the instruction

Cache reads and writes: Write misses

What is a write miss? In a 1-word-block write-through cache, writes always hit. We do not need to know what was in the memory location, since the CPU is overwriting it anyway.

Problem: inconsistency

Solutions:
- write-through
- write-back

Cache reads and writes: Write-through

Every time, write both the cache and the memory:
- simple
- slow (write buffer)

[Figure: CPU, CACHE, write buffer, MEMORY]

Cache reads and writes: Write-back

Write only the cache. Write the entire block back into the memory only when the block needs to be replaced (dirty bit).

[Trace table; columns: R/W, Address (dec, bin), H/M. Surviving entry: R, 22, 110, H]
[Cache table; columns: index, V, T, D, DATA. Surviving index: 110]
[Figure: sequence of CPU, CACHE, write buffer, and MEMORY diagrams showing reads (R), writes (W), a write hit, and FLUSH BLOCK steps]
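The write-back policy and its dirty bit can be sketched as a small simulation. Everything named here is hypothetical (the class `WriteBackCache`, its fields, the sample addresses), and the sketch assumes a direct-mapped cache with one-word blocks addressed by block address, so a write miss does not need to fetch the old block.

```python
class WriteBackCache:
    """Direct-mapped, one-word-block, write-back cache (illustrative sketch)."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory                  # list of words, one per block address
        self.valid = [False] * num_blocks
        self.dirty = [False] * num_blocks     # set when the cached copy was modified
        self.tag = [0] * num_blocks
        self.data = [0] * num_blocks
        self.write_backs = 0                  # number of blocks flushed to memory

    def _lookup(self, block_address):
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        hit = self.valid[index] and self.tag[index] == tag
        return index, tag, hit

    def _flush_if_dirty(self, index):
        # The block is written to memory only when it is about to be replaced.
        if self.valid[index] and self.dirty[index]:
            old_address = self.tag[index] * self.num_blocks + index
            self.memory[old_address] = self.data[index]
            self.write_backs += 1

    def read(self, block_address):
        index, tag, hit = self._lookup(block_address)
        if not hit:
            self._flush_if_dirty(index)
            self.valid[index], self.dirty[index] = True, False
            self.tag[index] = tag
            self.data[index] = self.memory[block_address]
        return self.data[index]

    def write(self, block_address, value):
        index, tag, hit = self._lookup(block_address)
        if not hit:
            self._flush_if_dirty(index)
            self.valid[index] = True
            self.tag[index] = tag
        self.data[index] = value              # write only the cache ...
        self.dirty[index] = True              # ... and remember it differs from memory


memory = list(range(100, 132))     # hypothetical 32-word memory
cache = WriteBackCache(8, memory)
cache.write(22, 7)                 # memory[22] is left stale for now
stale = memory[22]
cache.read(30)                     # same index as 22: the dirty block is flushed
print(stale, memory[22], cache.write_backs)
```

Memory is updated only at replacement time, so repeated writes to the same block cost a single memory write.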

Multi-word caches

Using cache blocks larger than one word takes advantage of spatial locality.

4-GB memory, 64-KB direct-mapped cache with 4-word data blocks (16 bytes).

[Figure: multi-word direct-mapped cache. Labels: CPU, memory byte address (tag, index, word offset), cache lines (V, tag, cache data block), outputs data word and hit]

Multi-word caches: Exercise

What is the total size in bytes of the cache in the previous slide?
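One way to approach this exercise is to count the bits stored per line and multiply by the number of lines. The sketch below is a possible reading, not the official solution: it assumes a 32-bit byte address (4-GB memory) and exactly one valid bit per line with no other management bits.

```python
ADDRESS_BITS = 32             # 4-GB memory implies a 32-bit byte address
DATA_BYTES = 64 * 1024        # 64-KB of cache data
BLOCK_BYTES = 16              # 4-word data blocks
VALID_BITS = 1                # assumption: one valid bit, no dirty bit

lines = DATA_BYTES // BLOCK_BYTES
offset_bits = (BLOCK_BYTES - 1).bit_length()   # byte-within-block bits
index_bits = (lines - 1).bit_length()          # bits selecting the cache line
tag_bits = ADDRESS_BITS - index_bits - offset_bits

bits_per_line = VALID_BITS + tag_bits + 8 * BLOCK_BYTES
total_bits = lines * bits_per_line
total_bytes = total_bits // 8

print(lines, index_bits, offset_bits, tag_bits, bits_per_line, total_bytes)
# prints: 4096 12 4 16 145 74240
```

Under these assumptions the tag and valid bits add roughly 13% on top of the 64 KB of data.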

Multi-word caches: Hits/misses in a multi-word cache

Read: just like the read misses on a single-word cache, except that the entire block is fetched.

Write: we cannot just write the word, tag, and valid bit without verifying whether the block is the actual block we want to write to, since more than one memory block maps to the same cache block. We need to compare the tag for writes too.
- the tags match: we can write the word
- the tags do not match: we need to read the block from memory and then write the word
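The write rule for multi-word blocks can be sketched as follows. This is an illustration under stated assumptions: word addresses, a tiny cache of 4 lines with 4-word blocks, and a write-through policy (so a replaced block never needs to be written back). The names `write_word`, `WORDS_PER_BLOCK`, and `NUM_LINES` are hypothetical.

```python
WORDS_PER_BLOCK = 4
NUM_LINES = 4


def write_word(cache, memory, word_address, value):
    """Write one word; return True on a hit, False on a miss."""
    block_address, word_offset = divmod(word_address, WORDS_PER_BLOCK)
    index = block_address % NUM_LINES
    tag = block_address // NUM_LINES
    line = cache[index]
    hit = line["valid"] and line["tag"] == tag
    if not hit:
        # Tags do not match: read the whole block from memory first,
        # otherwise the other words of the line would belong to another block.
        start = block_address * WORDS_PER_BLOCK
        line["data"] = memory[start:start + WORDS_PER_BLOCK]
        line["tag"] = tag
        line["valid"] = True
    line["data"][word_offset] = value   # now it is safe to write the word
    memory[word_address] = value        # write-through (assumption)
    return hit


memory = list(range(64))   # hypothetical 64-word memory
cache = [{"valid": False, "tag": None, "data": [0] * WORDS_PER_BLOCK}
         for _ in range(NUM_LINES)]

print(write_word(cache, memory, 5, 99))    # miss: block 1 is fetched, then written
print(write_word(cache, memory, 6, 77))    # hit: same block, tags match
print(write_word(cache, memory, 21, 11))   # miss: block 5 maps to the same line
print(cache[1]["data"])
```

Without the fetch on a tag mismatch, the line would end up holding one new word next to three words of a different memory block.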

Multi-word caches: Cache block size and miss rate

- up to a certain point, cache miss rate decreases with increasing block size
- after a certain point, cache miss rate increases with increasing block size
- spatial locality decreases with block size
- the miss penalty increases with block size

(COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED)

Multi-word caches: Miss penalty (= additional clock cycles)

Has three components:
a) sending the address to memory
b) latency to initiate the memory transfer
c) time for transferring each word

Example from the textbook: a) = 1 clock cycle, b) = 15 clock cycles, c) = 1 clock cycle.

With a 4-word block cache and a 1-word memory bus, the miss penalty on a standard DRAM is:

On an SDRAM or with an interleaved memory organization it is:
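The three components can be combined in a small calculation. The sketch below follows the usual textbook accounting and is only one way to fill in the slide: it assumes that a standard DRAM pays the access latency for every word, while an SDRAM in burst mode or an interleaved organization pays it once and then transfers one word per cycle. The function names are hypothetical.

```python
def miss_penalty_standard(block_words, send=1, latency=15, transfer=1):
    """One-word bus, standard DRAM: each word needs a full access."""
    return send + block_words * (latency + transfer)


def miss_penalty_overlapped(block_words, send=1, latency=15, transfer=1):
    """One-word bus, SDRAM burst or interleaved banks: latency is paid once."""
    return send + latency + block_words * transfer


BLOCK_WORDS = 4
print(miss_penalty_standard(BLOCK_WORDS))     # 1 + 4 * (15 + 1) = 65
print(miss_penalty_overlapped(BLOCK_WORDS))   # 1 + 15 + 4 * 1 = 20
```

Under these assumptions, overlapping the access latencies cuts the miss penalty of a 4-word block from 65 to 20 clock cycles.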

Multi-word caches: Static and Dynamic RAMs

Multi-word caches: DRAM diagram

[Figure: DRAM block diagram. Labels: RAS, CAS, Add, WE, ADDRESS BUFFER, ROW DECODER, COLUMN DECODER, SENSE AMPS and I/O]

Multi-word caches: Synchronous DRAM timing

[Timing diagram. Signals: CK, RAS, CAS, WE (= READ), Add (RA, CA), DQS, DQ. Intervals: trcd, tcl. DQ carries DO[CA], followed by DO[CA+1] and DO[CA+2] if BURST mode. Annotation: ONE ROW ACCESS]

Multi-word caches: Double Data Rate (DDR) DRAMs timing

[Timing diagram. Signals: CK, RAS, CAS, WE (= READ), Add (RA, CA), DQS, DQ. Intervals: trcd, tcl. DQ carries D0, followed by D1, D2, D3 if BURST mode]

Multi-word caches: Commercial SDRAM parameters

T1 tcl trcd trp tras CMDrate

- tcl: CAS Latency, time between the read command and data output valid
- trcd: RAS-to-CAS Delay, minimum time between RAS and CAS
- trp: RAS Precharge time, time the row decoder needs to precharge the row
- tras: Activate-to-Precharge time, minimum time before trying to change row
- CMDrate: Command Rate, minimum time between chip select and RAS (activate)

All the above numbers are expressed in clock cycles.

Multi-word caches: Commercial SDRAM parameters diagram

[Timing diagram. Signals: CK, COMMAND, DQ. Command sequence: ACTIVATE (set RAS), READ (set CAS), PRECHARGE, ACTIVATE (set RAS). Intervals: trcd, tcl, tras, trp]

Multi-word caches: Memory bandwidth

If a single transfer to/from memory can transfer multiple words at a time, the miss penalty decreases.

[Figure: three memory organizations. Labels: CPU, cache, bus, MUX/DEMUX, MEM]

Miss penalty for a 2-word block cache with a 2-word memory bus:

Miss penalty for a 4-word block cache with a 4-word memory bus:
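The effect of a wider memory bus can be sketched with the same three cost components used for the miss penalty (1 cycle to send the address, 15 cycles of access latency, 1 cycle per transfer). These cycle counts and the function name `miss_penalty_wide_bus` are assumptions for illustration; the sketch further assumes a standard DRAM, where every bus-wide transfer pays the full latency.

```python
import math


def miss_penalty_wide_bus(block_words, bus_words, send=1, latency=15, transfer=1):
    """Miss penalty when each memory access moves bus_words words at once."""
    accesses = math.ceil(block_words / bus_words)   # transfers needed per block
    return send + accesses * (latency + transfer)


for block_words, bus_words in ((2, 2), (4, 4), (4, 2), (4, 1)):
    print(block_words, bus_words, miss_penalty_wide_bus(block_words, bus_words))
```

When the bus is as wide as the block, a single access fetches the whole block, so the penalty no longer grows with the block size.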

Cache associativity

What if the CPU keeps accessing two (or more) variables that map to the same location in a direct-mapped cache?

More sophisticated strategy: n-way set-associative caches.
- direct-mapped ("1-way set associative")
- n-way set associative
- fully associative

Cache associativity: Two-way set associative cache

[Figure: two-way set-associative cache, addressed by the cache SET index]

Cache associativity: Four-way set associative cache

[Figure: four-way set-associative cache, with cache SET index 0 and 1]

Cache associativity: Eight-way set associative cache

Fully-associative cache: any block can go anywhere.

Cache associativity: Direct-mapped cache

1-way set associative cache.

[Figure: direct-mapped cache, addressed by the cache line index]

Cache associativity: Pros and cons of increasing cache associativity

Advantages:
- reduces the miss rate

Disadvantages:
- requires more hardware
- requires a replacement policy

Block replacement policy: Least Recently Used (LRU) or random, implemented in hardware.

Cache associativity: Exercise 1

For an 8-line, write-through, 2-way set-associative cache with LRU replacement and 1-word data block, trace the following sequence of addresses:

[Trace table; columns: block address (dec, binary), H/M]
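A trace of this kind can be checked with a short simulation of a set-associative cache with LRU replacement. The sketch is illustrative only: the function name `trace_set_associative` is hypothetical, and because the address sequence of the exercise is not available, the sequence below is an assumed example. Only tags are tracked, since data and write policy do not affect hits and misses here.

```python
def trace_set_associative(block_addresses, num_lines, ways):
    """Return (address, set_index, outcome) for each access, using LRU."""
    num_sets = num_lines // ways
    # Each set holds its tags ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    results = []
    for block in block_addresses:
        index = block % num_sets
        tag = block // num_sets
        lru_order = sets[index]
        if tag in lru_order:
            lru_order.remove(tag)      # will be re-appended as most recent
            outcome = "hit"
        else:
            if len(lru_order) == ways:
                lru_order.pop(0)       # evict the least recently used line
            outcome = "miss"
        lru_order.append(tag)
        results.append((block, index, outcome))
    return results


# Assumed block-address sequence; 8 lines and 2 ways give 4 sets.
SEQUENCE = [0, 8, 0, 6, 8, 4, 0]

for block, index, outcome in trace_set_associative(SEQUENCE, 8, 2):
    print(block, format(block, "05b"), "set", index, outcome)
```

Blocks 0 and 8 coexist in set 0, which a direct-mapped cache of the same size could not offer; block 4 then evicts block 0, the least recently used line of that set.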

Cache associativity: Exercise 2

A computer system has 32-bit addresses and a 64-KB direct-mapped, write-back cache with 8-byte data block lines.
a) how many lines are in the cache?
b) how many bits total (including cache management bits) are in each line, minimum?
c) what is the total cache size in bits?
d) diagram a cache lookup

Cache associativity: Solution

a) # of lines:
b) # of bits per line:
c) total cache size:
d) cache lookup:

Cache associativity: Exercise 3

Suppose the 64-KB cache in Exercise 2 was instead 2-way set associative with 8-byte lines.
a) how many sets are in the cache?
b) how many lines are in the cache?
c) how many bits total (including cache management bits) are in each line, minimum?
d) what is the total cache size in bits?
e) diagram a cache lookup

Cache associativity: Solution

a) # of sets:
b) # of lines:
c) # of bits per line:
d) total cache size:
e) cache lookup:
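The numeric parts of Exercises 2 and 3 can be worked out with one small function. This is a sketch of one possible reading, not the official solution: it assumes that the minimum management bits of a write-back cache are one valid bit and one dirty bit per line, and that no replacement-policy bits are counted for the 2-way cache. The function name `cache_geometry` and the returned keys are hypothetical.

```python
def cache_geometry(address_bits, data_bytes, line_bytes, ways, write_back=True):
    """Compute sets, lines, field widths, and total size of a cache."""
    lines = data_bytes // line_bytes
    sets = lines // ways
    offset_bits = (line_bytes - 1).bit_length()   # byte within the line
    index_bits = (sets - 1).bit_length()          # selects the set
    tag_bits = address_bits - index_bits - offset_bits
    # Assumption: valid bit, plus a dirty bit for write-back caches.
    management_bits = 1 + (1 if write_back else 0)
    bits_per_line = 8 * line_bytes + tag_bits + management_bits
    return {
        "sets": sets,
        "lines": lines,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "bits_per_line": bits_per_line,
        "total_bits": lines * bits_per_line,
    }


exercise_2 = cache_geometry(32, 64 * 1024, 8, ways=1)   # direct-mapped
exercise_3 = cache_geometry(32, 64 * 1024, 8, ways=2)   # 2-way set associative
print(exercise_2)
print(exercise_3)
```

Doubling the associativity halves the number of sets, which removes one index bit and adds one tag bit to every line.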

Caches and performance

Exercise 4: a computer has a CPI of 1.0 when there are no cache misses, and a 100 MHz clock. Each instruction has on average 0.4 data memory references. For each cache miss the instruction takes an additional 9 clock cycles to complete.
- what are the CPI_100% and the MIPS_100% rating with a cache and an (unrealistic) 100% hit rate?
- what are the CPI_NOCACHE and the MIPS_NOCACHE rating without a cache?
- what are the CPI and the MIPS rating with a cache and a 90% hit rate on instructions and an 85% hit rate on data?

Caches and performance: Solution

CPI_100% =
MIPS_100% =
CPI_NOCACHE =
MIPS_NOCACHE =
CPI =
MIPS =
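The three cases of Exercise 4 can be computed with a short script. This sketch reflects one reading of the exercise rather than the official solution: it assumes one instruction fetch per instruction plus 0.4 data references, that each miss adds 9 cycles, and that "without a cache" means every instruction fetch and every data reference pays that penalty (hit rate 0). The function names are hypothetical.

```python
BASE_CPI = 1.0        # CPI when there are no cache misses
CLOCK_MHZ = 100.0
DATA_REFS = 0.4       # data memory references per instruction
MISS_PENALTY = 9      # additional clock cycles per miss


def cpi_with_hit_rates(instr_hit_rate, data_hit_rate):
    """CPI = base CPI + stall cycles from instruction and data misses."""
    instr_stalls = (1.0 - instr_hit_rate) * MISS_PENALTY
    data_stalls = DATA_REFS * (1.0 - data_hit_rate) * MISS_PENALTY
    return BASE_CPI + instr_stalls + data_stalls


def mips(cpi):
    """MIPS rating = clock rate in MHz / CPI."""
    return CLOCK_MHZ / cpi


cases = {
    "100% hit rate": cpi_with_hit_rates(1.0, 1.0),
    "no cache": cpi_with_hit_rates(0.0, 0.0),
    "90% instr, 85% data": cpi_with_hit_rates(0.90, 0.85),
}
for name, cpi in cases.items():
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips(cpi):.2f}")
```

Under these assumptions the CPI goes from 1.0 with a perfect cache to 13.6 with no cache, and to about 2.44 with the stated hit rates.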

Homework: Recommended exercises

Third Edition: Ex , , 7.9, 7.12, 7.16, , Ex 7.32, 7.33, 7.35
