ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 6: Memory Organization Part I


1 ELEC 5200/6200 Computer Architecture and Design, Spring 2017, Lecture 6: Memory Organization Part I
Ujjwal Guin, Assistant Professor, Department of Electrical and Computer Engineering, Auburn University, Auburn, AL
Adapted from Dr. Chen-Huan Chiang (Intel) and Prof. Vishwani D. Agrawal (Auburn University)
[Adapted from Computer Organization and Design, Patterson & Hennessy, 2014]

2 Designing a Computer
[Diagram: the five classic components of a computer - Input, Output, Memory, and the Central Processing Unit (CPU) or Processor, comprising Control and Datapath.]

3 Types of Computer Memories
From the cover of: A. S. Tanenbaum, Structured Computer Organization, Fifth Edition, Upper Saddle River, New Jersey: Pearson Prentice Hall.

4 Random Access Memory (RAM)
[Diagram: address bits drive an address decoder that selects a row of the memory cell array; read/write circuits transfer the data bits.]

5 Six-Transistor SRAM Cell
[Diagram: a 6T SRAM cell - two cross-coupled inverters store bit and its complement; two access transistors, gated by the word line, connect the cell to the bit line and its complement bit line.]

6 Dynamic RAM (DRAM) Cell
[Diagram: a single-transistor DRAM cell - one access transistor, gated by the word line, connects a storage capacitor to the bit line.]
Robert Dennard's 1967 invention.

7 Electronic Memory Devices

Memory technology          | Typical access time     | $ per GiB
SRAM semiconductor memory  | 0.5-2.5 ns              | $500-$1,000
DRAM semiconductor memory  | 50-70 ns                | $10-$20
Flash semiconductor memory | 5,000-50,000 ns         | $0.75-$1.00
Magnetic disk              | 5,000,000-20,000,000 ns | $0.05-$0.10

For more on memories:
Semiconductor Memories: A Handbook of Design, Manufacture and Application, by Betty Prince, Wiley.
Emerging Memories: Technologies and Trends, by Betty Prince, Springer.

8 DRAM Evolution

Year introduced | Chip size   | $ per GiB  | Total access time to a new row/column | Average column access time to existing row
1980            | 64 Kibibit  | $1,500,000 | 250 ns | 150 ns
1983            | 256 Kibibit | $500,000   | 185 ns | 100 ns
1985            | 1 Mebibit   | $200,000   | 135 ns | 40 ns
1989            | 4 Mebibit   | $50,000    | 110 ns | 40 ns
1992            | 16 Mebibit  | $15,000    | 90 ns  | 30 ns
1996            | 64 Mebibit  | $10,000    | 60 ns  | 12 ns
1998            | 128 Mebibit | $4,000     | 60 ns  | 10 ns
2000            | 256 Mebibit | $1,000     | 55 ns  | 7 ns
2004            | 512 Mebibit | $250       | 50 ns  | 5 ns
2007            | 1 Gibibit   | $50        | 45 ns  | 1.25 ns
2010            | 2 Gibibit   | $30        | 40 ns  | 1 ns
2012            | 4 Gibibit   | $1         | 35 ns  | 0.8 ns

9 Processor-Memory Performance Gap
[Graph: performance vs. year - processor performance grows at ~55%/year (2X/1.5yr, Moore's Law) while DRAM performance grows at ~7%/year (2X/10yrs); the processor-memory performance gap grows ~50%/year.]

10 Impact of Memory Performance
Suppose a processor executes with ideal CPI = 1.1, with 50% arith/logic, 30% ld/st, and 20% control instructions, and that 10% of data memory operations miss with a 50-cycle miss penalty.
CPI = ideal CPI + average stalls per instruction
= 1.1 (cycle) + ( 0.30 (datamemops/instr) x 0.10 (miss/datamemop) x 50 (cycle/miss) )
= 1.1 cycle + 1.5 cycle = 2.6
So 1.5/2.6 = 58% of the time the processor is stalled waiting for memory!
A 1% instruction miss rate would add an additional 0.5 to the CPI.
[Pie chart: Ideal CPI 1.1, DataMiss 1.5, InstrMiss 0.5.]
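A minimal Python sketch of the slide's stall arithmetic (all values taken from the slide):

    ideal_cpi    = 1.1   # cycles/instruction with a perfect memory system
    ldst_frac    = 0.30  # fraction of instructions that access data memory
    miss_rate    = 0.10  # fraction of data accesses that miss
    miss_penalty = 50    # stall cycles per miss

    data_stalls = ldst_frac * miss_rate * miss_penalty  # 1.5 cycles/instruction
    cpi = ideal_cpi + data_stalls                       # 2.6
    print(cpi, data_stalls / cpi)                       # 2.6, ~0.58 (58% stalled)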

11 The Memory Hierarchy Goal
Fact: large memories are slow, and fast memories are small.
How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)?
- With hierarchy
- With parallelism

12 A Typical Memory Hierarchy
By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
On-chip components: RegFile, ITLB/DTLB, Instr Cache and Data Cache (and eDRAM), with Control and Datapath; then a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).
Speed (# cycles): 1/2's, 1's, 10's, 100's, 1,000's
Size (bytes): 100's, K's, 10K's, M's, G's to T's
Cost: highest ... lowest

13 Characteristics of the Memory Hierarchy
Increasing distance from the processor means increasing access time: Processor -> L1$ -> L2$ -> Main Memory -> Secondary Memory.
Transfer sizes: 4-8 bytes (word) between processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 to 4 blocks between L2$ and main memory; 1,024+ bytes (disk sector = page) between main memory and secondary memory.
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory.
The (relative) size of the memory grows at each level.

14 Introduction to Cache ($)

15 The Locality Principle
A program tends to access data that form a physical cluster in the memory; multiple accesses may be made within the same block.
Physical localities are temporal and may shift over longer periods of time; data not used for some time is less likely to be used in the future.
Upon a miss, the least recently used (LRU) block can be overwritten by a new block.
P. J. Denning, "The Locality Principle," Communications of the ACM, vol. 48, no. 7, pp. 19-24, July 2005.

16 The Memory Hierarchy: Why Does it Work?
Temporal Locality (locality in time): keep most recently accessed data items closer to the processor.
Spatial Locality (locality in space): move blocks consisting of contiguous words to the upper levels.
[Diagram: the processor exchanges words with the upper-level memory (holding block X); blocks move between the upper level and the lower-level memory (holding block Y).]

17 Data Locality, Cache, Blocks
Increase the block size to match the locality size; increase the cache size to include most blocks.
[Diagram: two blocks of data needed by a program, mapped from main memory into the cache.]

18 The Memory Hierarchy: Terminology
Hit: data is in some block in the upper level (block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: time to access the upper level, consisting of the time to determine hit/miss + the RAM access time.
Miss: data is not in the upper level, so it needs to be retrieved from a block in the lower level (block Y).
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor.
Hit Time << Miss Penalty

19 How is the Hierarchy Managed?
Registers <-> Memory: by the compiler (programmer?).
Cache <-> Main memory: by the cache controller hardware.
Main memory <-> Disks: by the operating system (virtual memory), with virtual-to-physical address mapping assisted by the hardware (Translation Lookaside Buffer), and by the programmer (files).

20 Designs of Cache
Two questions to answer (in terms of hardware):
Q1: How do we know if the data is in the cache? By comparing the tag.
Q2: How and where do we find the data in the cache? By address mapping.

21 Direct-Mapped Cache
For each item of data at the lower level, there is exactly one location in the cache where it might be, so lots of items at the lower level must share locations in the upper level.
Address mapping: (block address) modulo (# of blocks in the cache)
Let's first consider block sizes of one word in the next example.
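A small Python sketch of this mapping (the helper name is ours, not the slide's):

    # Direct mapping: each memory block lands at exactly one cache index.
    def direct_map(block_addr: int, num_blocks: int) -> int:
        return block_addr % num_blocks

    # With 8 cache blocks, memory blocks 5, 13, 21, 29 all share index 5:
    print([direct_map(b, 8) for b in (5, 13, 21, 29)])  # [5, 5, 5, 5]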

22 Direct-Mapped Cache
[Diagram: a 32-word, word-addressable main memory mapped into a cache of 8 blocks, block size = 1 word. The 5-bit memory address splits into a 2-bit tag and a 3-bit index; the index is the local cache address.]

23 Direct-Mapped Cache
[Diagram: a 32-word, word-addressable main memory mapped into a cache of 4 blocks, block size = 2 words. The 5-bit memory address splits into a 2-bit tag, a 2-bit index, and a 1-bit block offset.]

24 Direct-Mapped Cache (Byte Address)
[Diagram: a 32-word, byte-addressable main memory mapped into a cache of 8 blocks, block size = 1 word. The 7-bit memory address splits into a 2-bit tag, a 3-bit index, and a 2-bit byte offset.]

25 Example: Direct-Mapped Cache
Consider the main memory word reference string 0 1 2 3 4 3 4 15 on a direct-mapped cache of 4 one-word blocks. Start with an empty cache; all blocks are initially marked as not valid.
0 miss, 1 miss, 2 miss, 3 miss: the cache fills with Mem(0), Mem(1), Mem(2), Mem(3) (tag 00 each).
4 miss: Mem(4) (tag 01) replaces Mem(0) in block 0.
3 hit, 4 hit.
15 miss: Mem(15) (tag 11) replaces Mem(3) in block 3.
8 requests, 6 misses.
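A minimal simulation of this trace (cache size and reference string as inferred above):

    # Direct-mapped cache: one stored block address ("tag") per cache index.
    def count_misses(refs, num_blocks):
        cache = [None] * num_blocks        # None models the not-valid state
        misses = 0
        for addr in refs:
            idx = addr % num_blocks
            if cache[idx] != addr:         # invalid or tag mismatch -> miss
                cache[idx] = addr
                misses += 1
        return misses

    print(count_misses([0, 1, 2, 3, 4, 3, 4, 15], 4))  # 6 misses of 8 requests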

26 Finding a Word in Cache
[Diagram: 32-word, byte-addressable main memory; cache size 8 words, block size = 1 word. The 7-bit address b6..b0 splits into a 2-bit tag (b6-b5), a 3-bit index (b4-b2), and a 2-bit byte offset (b1-b0). The index selects a cache entry holding a valid bit, a 2-bit tag, and the data word; the stored tag is compared with the address tag, and (valid AND equal) gives 1 = hit, 0 = miss.]

27 How Many Bits Does the Cache Have?
Consider a main memory of 32 words; the byte address is 7 bits wide: b6 b5 b4 b3 b2 b1 b0. Each word is 32 bits wide.
Assume that the cache block size is 1 word (32 bits of data) and the cache contains 8 blocks.
The cache requires, for each word: a 2-bit tag and one valid bit.
Total storage needed in cache = #blocks in cache x (data bits/block + tag bits + valid bit) = 8 x (32 + 2 + 1) = 280 bits
Physical storage/data storage = 280/256 = 1.09
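The same bookkeeping as a sketch (the function name is ours):

    def cache_bits(num_blocks, data_bits_per_block, tag_bits, valid_bits=1):
        return num_blocks * (data_bits_per_block + tag_bits + valid_bits)

    total = cache_bits(8, 32, 2)     # 280 bits
    print(total, total / (8 * 32))   # 280, overhead ratio ~1.09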

28 A More Realistic Cache
Consider a 4 GB, byte-addressable main memory: 1G words; the byte address is 32 bits wide: b31 ... b2 b1 b0. Each word is 32 bits wide.
Assume that the cache block size is 1 word (32 bits of data) and the cache contains 64 KB of data, or 16K words, i.e., 16K blocks.
Number of cache index bits = 14, because 16K = 2^14
Tag size = 32 - byte offset bits - #index bits = 32 - 2 - 14 = 16 bits
The cache requires, for each word: a 16-bit tag and one valid bit.
Total storage needed in cache = #blocks in cache x (data bits/block + tag size + valid bit) = 2^14 x (32 + 16 + 1) = 2^14 x 49 = 802,816 bits = 784 Kb = 98 KB
Physical storage/data storage = 98/64 = 1.53
But we need to increase the block size to match the size of locality.

29 Data Organization in Cache
[Diagram: each cache entry holds a block index, a valid bit, a tag, and a multi-word data block; the memory address of a word in the cache is formed from the tag, the block index, and the block offset. Address mapping is overhead: a 4-bit tag means memory is 16 times larger than the cache.]

30 Cache Bits for 4-Word Block
Consider a 4 GB, byte-addressable main memory: 1G words; the byte address is 32 bits wide: b31 ... b2 b1 b0. Each word is 32 bits wide.
Assume that the cache block size is 4 words (128 bits of data) and the cache contains 64 KB of data, or 16K words, i.e., 4K blocks.
Number of cache index bits = 12, because 4K = 2^12
Tag size = 32 - byte offset bits - #block offset bits - #index bits = 32 - 2 - 2 - 12 = 16 bits
The cache requires, for each block: a 16-bit tag and one valid bit.
Total storage needed in cache = #blocks in cache x (data bits/block + tag size + valid bit) = 2^12 x (128 + 16 + 1) = 2^12 x 145 = 593,920 bits = 580 Kb = 72.5 KB
Physical storage/data storage = 72.5/64 = 1.13
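Plugging both 64 KB configurations into the same sketch shows the overhead drop with larger blocks:

    def cache_bits(num_blocks, data_bits_per_block, tag_bits, valid_bits=1):
        return num_blocks * (data_bits_per_block + tag_bits + valid_bits)

    print(cache_bits(2**14, 32, 16) / 8 / 1024)   # 98.0 KB for 1-word blocks
    print(cache_bits(2**12, 128, 16) / 8 / 1024)  # 72.5 KB for 4-word blocks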

31 Using a Larger Cache Block (4 Words): 4K Indexes
[Diagram: 4 GB = 1G words, byte-addressable memory; cache size 16K words, block size = 4 words (128 bits), 4K indexes. The 32-bit address splits into a 16-bit tag (b31-b16), a 12-bit index (b15-b4), a 2-bit block offset (b3-b2), and a 2-bit byte offset (b1-b0). The index selects a valid bit, a 16-bit tag, and a 4-word data block; tag comparison gives 1 = hit, 0 = miss, and a MUX selects the requested word using the block offset.]

32 Limitations of Direct Mapping
Consider the main memory word reference string 0 4 0 4 0 4 0 4 on the same direct-mapped cache of 4 one-word blocks. Start with an empty cache; all blocks are initially marked as not valid.
Addresses 0 and 4 both map to cache block 0 (0 mod 4 = 4 mod 4 = 0), so each access evicts the other: miss, miss, miss, miss, miss, miss, miss, miss.
8 requests, 8 misses.
This ping-pong effect is due to conflict misses: two memory locations that map into the same cache block.

33 Set-Associative Cache
Consider the same reference string 0 4 0 4 0 4 0 4 on a two-way set-associative cache. Start with an empty cache; all blocks are initially marked as not valid.
0 miss, 4 miss; all remaining accesses hit, because Mem(0) and Mem(4) co-exist in the two ways of the same set.
8 requests, 2 misses.
This solves the ping-pong effect of the direct-mapped cache. Recall that the ping-pong effect is due to conflict misses; now two memory locations that map into the same cache set can co-exist in a 2-way set-associative cache.
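A sketch comparing the two organizations on this string (geometry as in the slides: 4 one-word blocks direct-mapped, or the same capacity as 2 sets x 2 ways with LRU replacement):

    def misses_direct(refs, num_blocks=4):
        cache, m = [None] * num_blocks, 0
        for a in refs:
            i = a % num_blocks
            if cache[i] != a:
                cache[i], m = a, m + 1
        return m

    def misses_2way(refs, num_sets=2):
        sets, m = [[] for _ in range(num_sets)], 0
        for a in refs:
            s = sets[a % num_sets]
            if a in s:
                s.remove(a)          # refresh LRU order on a hit
            else:
                m += 1
                if len(s) == 2:
                    s.pop(0)         # evict the least recently used way
            s.append(a)
        return m

    refs = [0, 4] * 4
    print(misses_direct(refs), misses_2way(refs))  # 8 misses vs 2 misses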

34 Two-Way Set-Associative Cache
[Diagram: a 32-word, word-addressable main memory mapped into a cache of 8 blocks organized as 4 sets of 2 ways, block size = 1 word. The memory address splits into a tag and a 2-bit set index; within the selected set, the LRU block is replaced on a miss.]

35 Miss Rate: Two-Way Set-Associative Cache
Memory references to addresses: 0, 8, 0, 6, 8, 16 (cache of 8 blocks, 2 ways per set, block size = 1 word).
1. 0: miss; 2. 8: miss (0 and 8 map to the same set but co-exist in its two ways); 3. 0: hit; 4. 6: miss; 5. 8: hit; 6. 16: miss (replaces the LRU block of its set).
6 references, 4 misses.

36 Miss Rate: Direct-Mapped Cache
Memory references to addresses: 0, 8, 0, 6, 8, 16 (cache of 8 blocks, block size = 1 word).
0, 8, and 16 all map to index 0, so they keep evicting each other: 1. 0: miss; 2. 8: miss; 3. 0: miss; 4. 6: miss; 5. 8: miss; 6. 16: miss.
6 references, 6 misses.

37 Two-Way Set-Associative Cache
[Diagram: cache size 8 words, block size = 1 word, for a 32-word byte-addressable memory. The 7-bit address b6..b0 splits into a 3-bit tag, a 2-bit set index, and a 2-bit byte offset. Each of the 4 sets holds two ways of (valid, tag, data); two comparators check both tags in parallel, producing 1 = hit, 0 = miss, and a 2-to-1 MUX selects the data from the matching way.]

38 Fully-Associative Cache (8-Way Set Associative)
[Diagram: a 32-word, word-addressable main memory mapped into a cache of 8 blocks, block size = 1 word. There is no index: the whole address is the tag, a block can be placed anywhere in the cache, and the LRU block is replaced on a miss.]

39 Miss Rate: Fully-Associative Cache
Memory references to addresses: 0, 8, 0, 6, 8, 16 (cache of 8 blocks, block size = 1 word).
1. 0: miss; 2. 8: miss; 3. 0: hit; 4. 6: miss; 5. 8: hit; 6. 16: miss. Since any block can go anywhere, no evictions are needed for this string.
6 references, 4 misses.

40 Eight-Way Set-Associative Cache
[Diagram: cache size 8 words, block size = 1 word, for a 32-word byte-addressable memory. The 7-bit address b6..b0 splits into a 5-bit tag (b6-b2) and a 2-bit byte offset (b1-b0). All 8 (valid, tag, data) entries are compared in parallel; 1 = hit, 0 = miss, and an 8-to-1 multiplexer selects the matching data.]

41 Handling Cache Hits
Read hits (Instruction-cache and Data-cache, I$ and D$): this is what we want!
Write hits (D$ only) have two possible solutions:
1. Allow cache and memory to be inconsistent (write-back): write the data only into the cache block, and write the cache contents back to the next level in the memory hierarchy when that cache block is evicted. This needs a dirty bit for each data cache block, to tell whether it must be written back to memory when evicted.
2. Require the cache and memory to be consistent (write-through): always write the data into both the cache block and the next level in the memory hierarchy, so no dirty bit is needed. But writing to the next level in the memory hierarchy is very slow, so use a write buffer; a stall is required only if the write buffer is full.
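A hedged sketch of the two write-hit policies (the dict-based cache model and helper names are ours, not the slide's):

    cache, dirty, write_buffer = {}, set(), []

    def write_hit_write_back(block, data):
        cache[block] = data        # update only the cache...
        dirty.add(block)           # ...and mark the block for write-back on eviction

    def write_hit_write_through(block, data):
        cache[block] = data        # update the cache...
        write_buffer.append((block, data))  # ...and memory, via this FIFO buffer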

42 Write Buffer for Write-Through Caching
[Diagram: Processor -> Cache -> DRAM, with a write buffer between the cache and main memory.]
The processor writes data into the cache and the write buffer; the memory controller writes the contents of the write buffer to memory. The write buffer is just a FIFO; a typical number of entries is 4.
The memory system designer's nightmare: the rate at which writes are generated can exceed the rate at which the memory can accept them. This can happen when writes occur in bursts. One solution is to use a write-back cache; another is to use an L2 cache.

43 Handling Cache Misses
Read misses (I$ and D$): stall the entire pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache, send the requested word to the processor, then resume the pipeline.

44 Handling Cache Misses (cont.)
Write misses (D$ only):
1. Stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache (which may involve evicting a dirty block if using a write-back cache), write the word from the processor to the cache, then let the pipeline resume.
Either of the two write-miss policies below can be used with write-through or write-back:
2. Write allocate: a cache block is allocated on a write miss. Just write the word into the cache, updating both the tag and the data; no need to check for a cache hit, no need to stall. Normally used in write-back caches, hoping that subsequent writes to the block will be captured by the cache.
3. No-write allocate: skip the cache write; the block is modified only in the lower-level memory. Just write the word to the write buffer (and eventually to the next memory level); no need to stall if the write buffer isn't full. The cache block must be invalidated, since it would otherwise be inconsistent (holding stale data). Normally used in write-through caches with a write buffer, because subsequent writes to the block must still write through to the lower-level memory.
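A matching self-contained sketch of the two write-miss policies (the dict `memory` stands in for the next level; names are ours):

    cache, write_buffer, memory = {}, [], {}

    def write_miss_allocate(block, data):
        # Write allocate: install the block in the cache, then write into it.
        cache[block] = memory.get(block)   # fetch (may first evict a dirty block)
        cache[block] = data

    def write_miss_no_allocate(block, data):
        # No-write allocate: bypass the cache; modify only the lower level.
        write_buffer.append((block, data))  # drained to memory by the controller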

45 Cache Misses
Compulsory (cold start or process migration; first reference): the first access to a block. A cold fact of life; not a whole lot you can do about it. If you are going to run millions of instructions, compulsory misses are insignificant.
Conflict (collision): multiple memory locations mapped to the same cache location. Solution 1: increase cache size. Solution 2: increase associativity. (A fully associative cache allows a memory block to be mapped to any cache block; in a direct-mapped cache, a memory block maps to exactly one cache block.)
Capacity: the cache cannot contain all the blocks accessed by the program. Solution: increase cache size.
There is some ambiguity between conflict and capacity misses.

46 Miss Rate vs Block Size vs Cache Size
[Graph: miss rate (%) vs block size (bytes) for cache sizes of 8 KB, 16 KB, 64 KB, and 256 KB; for each cache size, the miss rate eventually rises as the block size grows. Temporal locality is compromised; details on the next slide.]
Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same-size cache is smaller (increasing capacity misses).

47 Block Size Tradeoff
Larger block sizes take advantage of spatial locality, but:
- If the block size is too big relative to the cache size, the miss rate will go up, because the number of blocks is fewer, which compromises temporal locality.
- A larger block size means a larger miss penalty: latency to the first word in the block + transfer time for the remaining words.
[Graphs vs block size: miss rate first falls (exploits spatial locality, reduced compulsory misses) then rises (fewer blocks compromise temporal locality, increased conflict misses); miss penalty grows; average access time eventually increases from the combined miss penalty and miss rate.]
In general, Average Memory Access Time = Hit Time + Miss Penalty x Miss Rate
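The average-access-time formula as a one-line sketch, with illustrative (not slide-given) numbers:

    def amat(hit_time, miss_rate, miss_penalty):
        # Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty
        return hit_time + miss_rate * miss_penalty

    print(amat(1, 0.05, 50))    # small blocks:  1 + 0.05*50  = 3.5 cycles
    print(amat(1, 0.03, 100))   # bigger blocks: lower miss rate, higher penalty = 4.0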

48 Block Size Tradeoff (cont.)
Larger blocks reduce compulsory misses by exploiting spatial locality, but past a point the increasing capacity misses dominate.
Larger blocks also mean fewer blocks in the cache, which compromises temporal locality and increases conflict misses.

49 Multiword Block Considerations
Read misses (I$ and D$): processed the same as for single-word blocks, except that a miss returns the entire block from memory, so the miss penalty grows as the block size grows. Mitigations:
- Requested word first: the requested word is transferred from the memory to the cache (and datapath) first.
- Early restart: the datapath resumes execution as soon as the requested word of the block is returned.
- Nonblocking cache: allows the datapath to continue to access the cache while the cache is handling an earlier miss.
Write misses (D$): can't use write allocate as-is. Recall: write allocate just writes the word into the cache, updating both the tag and data, with no need to check for a cache hit and no need to stall. With multiword blocks this would end up with a garbled block in the cache (e.g., for 4-word blocks: a new tag, one word of data from the new block, and three words of data from the old block), so the block must be fetched from memory first, paying the stall time.

50 Cache Summary
The Principle of Locality: a program is likely to access a relatively small portion of the address space at any instant of time.
- Temporal Locality: locality in time.
- Spatial Locality: locality in space.
Three major categories of cache misses:
- Compulsory misses: sad facts of life; example: cold-start misses.
- Conflict misses: increase cache size and/or associativity. Nightmare scenario: the ping-pong effect!
- Capacity misses: increase cache size.
Cache design space: total size, block size, associativity (replacement policy); write-hit policy (write-through, write-back); write-miss policy (write allocate, no-write allocate (i.e., write buffers)).

51 Basic Cache Design: I-Cache Example
A cache is organized into blocks or lines. Block contents:
- tag: extra bits that identify the block (part of the block address).
- data: data or instruction words from contiguous memory locations.
Our example: one-word (4-byte) block size, 30-bit tag, two blocks in the cache (b0 and b1). The 2 LSBs of an address are the byte offset; the remaining bits are the tag bits.
[Diagram: a two-block cache (b0, b1), each entry holding a tag and one data word, in front of a main memory with instruction words at 0x00, 0x04, 0x08, 0x0C, ...]

52 I-Cache Example (2)
Program (in main memory):
0x00 L: add r1,r1,r2
0x04 bne r4,r1,L
0x08 sub r1,r1,r1
0x0C L: j L
Assume: r1==0, r2==1, r4==2; 1 cycle for a cache access; 4 cycles for a main-memory access; 1 cycle for instruction execution.
At cycle 1, PC=0x00: fetch the instruction from memory; look in the (empty) cache (1 cycle): MISS, so fetch from main memory (4-cycle penalty).

53 I-Cache Example (3)
Cycles 1-5: FETCH 0x00. At cycle 6: execute add r1,r1,r2, so r1 = 1. Cache block b0 now holds the add from 0x00.

54 I-Cache Example (4)
At cycle 6, PC=0x04: fetch the instruction from memory; look in the cache (1 cycle): MISS, so fetch from main memory (4-cycle penalty, cycles 6-10).

55 I-Cache Example (5)
Cycles 6-10: FETCH 0x04. At cycle 11: execute bne r4,r1,L; since r4==2 and r1==1, the branch is taken back to L (0x00). Cache block b1 now holds the bne from 0x04.

56 I-Cache Example (6)
At cycle 11, PC=0x00: fetch the instruction from memory: HIT, the instruction (add) is in the cache.

57 I-Cache Example (7)
Cycle 11: FETCH 0x00 (hit). At cycle 12: execute add r1,r1,r2, so r1 = 2.

58 I-Cache Example (8)
At cycle 12, PC=0x04: fetch the instruction from memory: HIT, the instruction (bne) is in the cache.

59 I-Cache Example (9)
At cycle 13: execute bne r4,r1,L; since r4==2 and r1==2, the branch is not taken.

60 I-Cache Example (10)
At cycle 13, PC=0x08: fetch the instruction from memory: MISS, not in the cache; fetch from main memory (cycles 13-17).

61 I-Cache Example (11)
At cycle 17, PC=0x08: put the fetched instruction (sub r1,r1,r1) into the cache, replacing the existing add in block b0.

62 I-Cache Example (12)
At cycle 18: execute sub r1,r1,r1, so r1 = 0.

63 I-Cache Example (13)
At cycle 18, PC=0x0C: fetch the instruction from memory: MISS, not in the cache; fetch from main memory (cycles 18-22).

64 I-Cache Example (14)
At cycle 22: put the fetched instruction (j L) into the cache, replacing the existing bne in block b1.

65 I-Cache Example (15)
At cycle 23: execute j L.
Full trace: 1-5 FETCH 0x00; 6 add (r1=1); 6-10 FETCH 0x04; 11 bne (taken); 11 FETCH 0x00 (hit); 12 add (r1=2); 12 FETCH 0x04 (hit); 13 bne (not taken); 13-17 FETCH 0x08; 18 sub (r1=0); 18-22 FETCH 0x0C; 23 j L.

66 Compare No-Cache vs. Cache
NO CACHE (31 cycles):
1-5 FETCH 0x00; 6 add; 6-10 FETCH 0x04; 11 bne; 11-15 FETCH 0x00; 16 add; 16-20 FETCH 0x04; 21 bne; 21-25 FETCH 0x08; 26 sub; 26-30 FETCH 0x0C; 31 j L.
CACHE (23 cycles; misses marked M, hits H):
1-5 FETCH 0x00 (M); 6 add; 6-10 FETCH 0x04 (M); 11 bne; 11 FETCH 0x00 (H); 12 add; 12 FETCH 0x04 (H); 13 bne; 13-17 FETCH 0x08 (M); 18 sub; 18-22 FETCH 0x0C (M); 23 j L.
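A compact sketch that reproduces these two totals, under the example's assumptions: 1-cycle cache check, 4-cycle memory penalty, block index taken from bit 2 of the address, and each execute overlapping the first cycle of the next fetch (so only the final execute adds a cycle):

    def run(trace, use_cache=True):
        cache, total = {}, 0
        for pc in trace:
            idx = (pc >> 2) & 1                  # which of the two blocks
            if use_cache and cache.get(idx) == pc:
                total += 1                       # hit: 1-cycle cache access
            else:
                total += 5                       # miss: 1-cycle check + 4-cycle memory
                if use_cache:
                    cache[idx] = pc
        return total + 1                         # the last execute overlaps no fetch

    trace = [0x0, 0x4, 0x0, 0x4, 0x8, 0xC]       # fetch order from the example
    print(run(trace, False), run(trace, True))   # 31 vs 23 cycles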

67 Cache Miss and the MIPS Pipeline
[Diagram: pipeline registers PC, IR, IRex, A, B, IRm, IRwb; an I-cache miss injects invalid entries (bubbles) into the pipeline while the missing instruction is fetched; a D-cache miss similarly stalls the later stages.]

68 Cache Miss and the MIPS Pipeline
Instruction fetch: the cache compare happens in cycle 1 and the miss is detected in cycle 2; the pipeline stalls for N cycles (N = # cycles for accessing main memory) until the fetch completes and the pipeline restarts.
[Diagram: IF EX MEM WB stages across clock cycles 1, 2+N, 3+N, 4+N, 5+N, 6+N, with STALL bubbles filling the gap.]

69 Cache Miss and the MIPS Pipeline
Load instruction: the cache compare happens in cycle 4 and the miss is detected in cycle 5; the pipeline stalls until the load completes at cycle 5+N and the pipeline restarts.
[Diagram: IF EX MEM WB stages across clock cycles 1-4, then STALLs from cycle 5 to 5+N, resuming at 6+N.]

70 Improving Cache Performance

71 Review: The Memory Hierarchy
Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
Increasing distance from the processor means increasing access time: Processor -> L1$ (4-8 bytes, word) -> L2$ (8-32 bytes, block) -> Main Memory (1 to 4 blocks) -> Secondary Memory (1,024+ bytes, disk sector = page).
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory. The (relative) size of the memory grows at each level.

72 Review: Principle of Locality
Temporal Locality: keep most recently accessed data items closer to the processor.
Spatial Locality: move blocks consisting of contiguous words to the upper levels.
Hit Time << Miss Penalty.
Hit: data appears in some block in the upper level (block X). Hit Rate: the fraction of accesses found in the upper level. Hit Time: RAM access time + time to determine hit/miss.
Miss: data needs to be retrieved from a lower-level block (block Y). Miss Rate = 1 - (Hit Rate). Miss Penalty: time to replace a block in the upper level with a block from the lower level + time to deliver this block's word to the processor.
Miss types: compulsory, conflict, capacity.

73 Measuring Cache Performance
Assuming cache hit costs are included as part of the normal CPU execution cycle, then:
CPU time = InstrCnt x CPI x ClkCycleTime = IC x (CPI_ideal + Memory-stall cycles per instruction) x CCT, where CPI_ideal plus the memory stalls is CPI_stall.
Memory-stall cycles come from cache misses (a sum of read-stalls and write-stalls):
Memory-stall cycles = Read-stall cycles + Write-stall cycles
Read-stall cycles = reads/program x read miss rate x read miss penalty
Write-stall cycles = (writes/program x write miss rate x write miss penalty) + write buffer stalls
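These formulas as a direct sketch (the per-program counts below are illustrative assumptions, not slide values):

    def memory_stall_cycles(reads, read_mr, read_pen,
                            writes, write_mr, write_pen, wbuf_stalls=0):
        read_stalls = reads * read_mr * read_pen
        write_stalls = writes * write_mr * write_pen + wbuf_stalls
        return read_stalls + write_stalls

    # e.g., 1e6 reads at 2% miss rate / 100-cycle penalty, 5e5 writes at 4% / 100:
    print(memory_stall_cycles(1e6, 0.02, 100, 5e5, 0.04, 100))  # 4,000,000.0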

74 Measuring Cache Performance (cont.)
For write-through caches, the penalties of a read miss and a write miss are the same: both equal the time to fetch the block from memory. If write buffer stalls are assumed to be negligible, we can combine reads and writes using a single miss rate and miss penalty:
Memory-stall cycles = Memory accesses/program x Miss rate x Miss penalty = Instructions/program x Misses/instruction x Miss penalty

75 Review: The Memory Wall
The logic vs DRAM speed gap continues to grow.
[Graph: clocks per instruction (Core) falling while clocks per DRAM access (Memory) rise, from VAX/1980 to PPro/1997.]

76 Impacts of Cache Performance
The relative cache penalty increases as processor performance improves (faster clock rate and/or lower CPI); memory speed is unlikely to improve as fast as the processor cycle time. When calculating CPI_stall, the cache miss penalty is measured in the processor clock cycles needed to handle a miss. The lower the CPI_ideal, the more pronounced the impact of stalls.
Example: a processor with a CPI_ideal of 2, a 100-cycle miss penalty, 36% load/store instructions, and 2% I$ and 4% D$ miss rates.
Memory-stall cycles = 2% x 100 + 36% x 4% x 100 = 3.44
So CPI_stall = 2 + 3.44 = 5.44

77 Impacts of Cache Performance (cont.)
What happens if the processor is made faster but the memory is not? Memory stalls take up an increasing fraction of execution time.
What if CPI_ideal is reduced to 1, without changing the clock rate (possible with an improved pipeline)?
CPI_stall = 1 + 3.44 = 4.44
The execution time spent on memory stalls increases from 63% (3.44/5.44) to 77% (3.44/4.44).

78 Impacts of Cache Performance (cont.)
What if the processor clock rate is doubled (i.e., doubling the miss penalty), with the memory unchanged? The performance loss due to cache misses increases.
Example: the miss penalty doubles to 100 x 2 = 200 cycles, because the memory still takes the same time, but that time is now twice as many CPU cycles (the CPU is twice as fast).
CPI_stall = 2 + (2% x 200) + (36% x 4% x 200) = 8.88
Relative performance = performance_fast_clk / performance_slow_clk = 5.44 / (8.88 x 1/2) = 1.23
This is smaller than 2, the factor by which the processor clock rate increased; i.e., although the clock rate is doubled, performance only increases 1.23 times due to cache misses.
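The three scenarios of the last three slides, computed in one sketch:

    def cpi_stall(cpi_ideal, i_miss, d_miss, ldst_frac, penalty):
        return cpi_ideal + i_miss * penalty + ldst_frac * d_miss * penalty

    base   = cpi_stall(2, 0.02, 0.04, 0.36, 100)  # 5.44
    better = cpi_stall(1, 0.02, 0.04, 0.36, 100)  # 4.44 (improved pipeline)
    fast   = cpi_stall(2, 0.02, 0.04, 0.36, 200)  # 8.88 (2x clock, 2x penalty)
    print(base, better, fast, base / (fast / 2))  # ..., ~1.23x speedup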

79 Improving Cache Performance
CPU time = InstrCnt x ClkCycleTime x (CPI_ideal + memory stalls)
Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty) = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
1. Reduce the time to hit in the cache (details later).
2. Reduce the miss rate: allow more flexible block placement; use multiple levels of caches (more details later).
3. Reduce the miss penalty (details later).

80 Reduce the Time to Hit in the Cache (#1)
Direct-mapped cache; smaller cache; smaller blocks.
Cache read: overlap the tag comparison (TC) and the data access; discard the data if the tag mismatches.
Cache write: pipeline the write-hit stages (TC and Write).
- Write buffer for a write-through cache: between the processor and main memory, so writes go around the cache.
- Store buffer for a write-back cache: between the processor and the cache, to allow a one-cycle cache store operation by pipelining the tag check (checking for a hit) and the data access (storing).

81 Reducing Cache Miss Rates (#2): Allow More Flexible Block Placement
In a direct-mapped cache, a memory block maps to exactly one cache block. At the other extreme, we could allow a memory block to be mapped to any cache block: a fully associative cache.
A compromise is to divide the cache into sets, each consisting of n ways (n-way set associative). A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices):
Set # of a block = (block address) modulo (# sets in the cache)

82 Reducing Cache Miss Rates (#2): Use Multiple Levels of Caches
With advancing technology, there is more than enough room on the die for bigger L1 caches or for a second level of caches: normally a unified L2 cache (i.e., it holds both instructions and data) and in some cases even a unified L3 cache. L2/L3 can be on-chip or off-chip SRAMs, faster than main-memory DRAMs.
Upon a miss in the primary L1 cache, the L2 cache is accessed. If L2 contains the desired data, the miss penalty for L1 is the access time of the L2 cache, << that of main memory. Furthermore, the L2 cache is not tied to the CPU clock rate (just like the concept of the write buffer).
If neither L1 nor L2 contains the data, a main memory access is required and a larger miss penalty is incurred: L2 access time + MM access time. However, such a combined L1&L2 miss rate << the miss rate of L1 alone.

83 Reducing Miss Penalty Using Multilevel Caches
The following example demonstrates the significance of the performance improvement from using an L2 cache.
Example: CPI_ideal of 2, 100-cycle miss penalty (to main memory), 36% load/stores, 2% (4%) L1 Instr$ (D$) miss rates; add an L2$ that has a 25-cycle miss penalty and a 0.5% miss rate to main memory.
CPI_stall = 2 + 2% x 25 + 36% x 4% x 25 + 0.5% x 100 + 36% x 0.5% x 100
= 2 + (2% - 0.5%) x 25 + 36% x (4% - 0.5%) x 25 + 0.5% x (25 + 100) + 36% x 0.5% x (25 + 100)
= 3.54 (as compared to 5.44 with no L2$)
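The same computation as a sketch (the 0.5% figures are global miss rates, per the next slide):

    cpi_ideal, pen_l2, pen_mm = 2, 25, 100
    i_l1, d_l1, l2_glob, ldst = 0.02, 0.04, 0.005, 0.36

    cpi = (cpi_ideal
           + i_l1 * pen_l2 + ldst * d_l1 * pen_l2         # L1 misses served by L2
           + l2_glob * pen_mm + ldst * l2_glob * pen_mm)  # L2 misses go to memory
    print(round(cpi, 2))                                  # 3.54, vs 5.44 with no L2$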

84 Multilevel Cache Design Considerations
There are very different design considerations for L1 and L2.
The primary cache (L1) should focus on minimizing hit time, in support of a shorter clock cycle: smaller cache size with smaller block sizes. Smaller blocks mean a higher miss rate, but the miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache, so L1 can be smaller (i.e., faster) even with a higher miss rate.
Secondary cache(s) (e.g., L2) should focus on reducing the miss rate, to reduce the penalty of long main-memory access times: larger cache size with larger block sizes. Larger blocks mean a slower access (hit) time, but for the L2 cache hit time is less important than miss rate, because the L2$ hit time affects the L1$'s miss penalty rather than the L1$ hit time or the processor cycle time.

85 Multilevel Cache Design Considerations (cont.)
Global miss rate: the fraction of references that miss in all levels of a multilevel cache.
Local miss rate: the fraction of references to one level of a cache that miss; used in multilevel hierarchies. For example, the L2$ local miss rate = all misses in the L2$ divided by the number of accesses to the L2$. In our previous example, 0.5% / 2% = 25%.
The L2$ local miss rate >> the global miss rate, because the primary cache (L1) filters accesses, especially those with good spatial and temporal locality. Luckily, the global miss rate dictates how often we must access main memory.

86 Reduce the Miss Penalty (#3)
- Smaller blocks.
- Use a write buffer to hold dirty blocks being replaced, so there is no need to wait for the write to complete before reading.
- Check the write buffer (and/or a victim cache) on a read miss: you may get lucky.
- For large blocks, fetch the critical word first.
- Use multiple cache levels: the L2 cache is not tied to the CPU clock rate.
- Faster backing store / improved memory bandwidth: wider buses; memory interleaving, page-mode DRAMs.

87 Key Cache Design Parameters

                           | L1 typical   | L2 typical
Total size (blocks)        | 250 to 2,000 | 4,000 to 250,000
Total size (KB)            | 16 to 64     | 500 to 8,000
Block size (B)             | 32 to 64     | 32 to 128
Miss penalty (clocks)      | 10 to 25     | 100 to 1,000
Miss rates (global for L2) | 2% to 5%     | 0.1% to 2%

88 Two Machines' Cache Parameters

                 | Intel P4                               | AMD Opteron
L1 organization  | Split I$ and D$                        | Split I$ and D$
L1 cache size    | 8KB for D$, 96KB for trace cache (~I$) | 64KB for each of I$ and D$
L1 block size    | 64 bytes                               | 64 bytes
L1 associativity | 4-way set assoc.                       | 2-way set assoc.
L1 replacement   | ~LRU                                   | LRU
L1 write policy  | write-through                          | write-back
L2 organization  | Unified                                | Unified
L2 cache size    | 512KB                                  | 1024KB (1MB)
L2 block size    | 128 bytes                              | 64 bytes
L2 associativity | 8-way set assoc.                       | 16-way set assoc.
L2 replacement   | ~LRU                                   | ~LRU
L2 write policy  | write-back                             | write-back


3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27

More information

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor)

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) Chapter 7-1 Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授 V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) 臺大電機吳安宇教授 - 計算機結構 1 Outline 7.1 Introduction 7.2 The Basics of Caches

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

Handout 4 Memory Hierarchy

Handout 4 Memory Hierarchy Handout 4 Memory Hierarchy Outline Memory hierarchy Locality Cache design Virtual address spaces Page table layout TLB design options (MMU Sub-system) Conclusion 2012/11/7 2 Since 1980, CPU has outpaced

More information

Transistor: Digital Building Blocks

Transistor: Digital Building Blocks Final Exam Review Transistor: Digital Building Blocks Logically, each transistor acts as a switch Combined to implement logic functions (gates) AND, OR, NOT Combined to build higher-level structures Multiplexer,

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/24/16 Fall 2016 - Lecture #16 1 Software

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

Memory Hierarchy: The motivation

Memory Hierarchy: The motivation Memory Hierarchy: The motivation The gap between CPU performance and main memory has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) CS 6C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H Katz David A PaHerson hhp://insteecsberkeleyedu/~cs6c/fa Direct Mapped (contnued) - Interface CharacterisTcs of the

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Agenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!)

Agenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!) 7/4/ CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches II Instructor: Michael Greenbaum New-School Machine Structures (It s a bit more complicated!) Parallel Requests Assigned to

More information

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now? cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example

More information

Caches. Han Wang CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)

Caches. Han Wang CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes) Caches Han Wang CS 3410, Spring 2012 Computer Science Cornell University See P&H 5.1, 5.2 (except writes) This week: Announcements PA2 Work-in-progress submission Next six weeks: Two labs and two projects

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

CS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches

CS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches CS 61C: Great Ideas in Computer Architecture Cache Performance, Set Associative Caches Instructor: Justin Hsia 7/09/2012 Summer 2012 Lecture #12 1 Great Idea #3: Principle of Locality/ Memory Hierarchy

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

Cache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

CISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review

CISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review CISC 662 Graduate Computer Architecture Lecture 6 - Cache and virtual memory review Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Agenda. Recap: Components of a Computer. Agenda. Recap: Cache Performance and Average Memory Access Time (AMAT) Recap: Typical Memory Hierarchy

Agenda. Recap: Components of a Computer. Agenda. Recap: Cache Performance and Average Memory Access Time (AMAT) Recap: Typical Memory Hierarchy // CS 6C: Great Ideas in Computer Architecture (Machine Structures) Set- Associa+ve Caches Instructors: Randy H Katz David A PaFerson hfp://insteecsberkeleyedu/~cs6c/fa Cache Recap Recap: Components of

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information