ECE4680 Computer Organization and Architecture: Memory Hierarchy and Caches


The Motivation for Memory Systems

Large memories (DRAM) are slow; small memories (SRAM) are fast. Make the average access time small by servicing most accesses from a small, fast memory, and reduce the bandwidth required of the large memory.

An Expanded View of the Memory System

The processor (datapath and control) sits atop a hierarchy of memory levels. Moving down the hierarchy, speed goes from fastest to slowest, size from smallest to biggest, and cost per bit from highest to lowest.

Levels of the Memory Hierarchy

Level          Capacity     Access Time     Cost
Registers      100s bytes   <10s ns
Cache          KBytes       10-100 ns       $0.01-0.001/bit
Main Memory    MBytes       100 ns - 1 us   $0.01-0.001
Disk           GBytes       ms              10^-3 - 10^-4 cents
Tape           infinite     sec-min         10^-6 cents

Staging transfer units between adjacent levels:
- Registers <-> Cache: instructions and operands, managed by the program/compiler, 1-8 bytes
- Cache <-> Main Memory: blocks, managed by the cache controller, 8-128 bytes
- Main Memory <-> Disk: pages, managed by the OS, 512 B-4 KB
- Disk <-> Tape: files, managed by the user/operator, MBytes
Upper levels are faster; lower levels are larger.

The Principle of Locality

Programs access a relatively small portion of the address space at any instant of time. Example: 90% of execution time is spent in 10% of the code. (Figure: probability of reference plotted against the address space, showing accesses clustered in a small region.)
Two different types of locality:
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

Memory Hierarchy: Principles of Operation

At any given time, data is copied only between adjacent levels.
- Upper level (cache): the one closer to the processor. Smaller, faster, and uses more expensive technology.
- Lower level (memory): the one further from the processor. Bigger, slower, and uses less expensive technology.
Block: the minimum unit of information that can either be present or not present in the two-level hierarchy. A block moves between the upper level (Blk X) and the lower level (Blk Y).
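To make the two flavors of locality concrete, here is a minimal C sketch (illustrative only, not from the slides): the sequential sweep over a[] is exactly the pattern spatial locality rewards, while the reuse of sum and i on every iteration is temporal locality.

    #include <stdio.h>

    /* Illustrative only: this access pattern is what caches reward.
       Reading a[0], a[1], a[2], ... touches adjacent addresses
       (spatial locality); re-touching sum and i on every iteration
       reuses the same locations (temporal locality). */
    int main(void) {
        static int a[1024];             /* zero-initialized array      */
        int sum = 0;                    /* reused every pass: temporal */
        for (int i = 0; i < 1024; i++)  /* sequential sweep: spatial   */
            sum += a[i];
        printf("%d\n", sum);
        return 0;
    }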

Memory Hierarchy: Terminology

Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: the time to access the upper level, which consists of the RAM access time plus the time to determine hit/miss.
Miss: the data must be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - Hit Rate
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor.
Hit Time << Miss Penalty.

Basic Terminology: Typical Values

Block (line) size    4 - 128 bytes
Hit time             1 - 4 cycles
Miss penalty         8 - 32 cycles (and increasing)
  (access time)      (6 - 10 cycles)
  (transfer time)    (2 - 22 cycles)
Miss rate            1% - 20%
Cache size           1 KB - 256 KB

How Does Cache Work?

Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon, so keep the most recently accessed data items closer to the processor.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon, so move blocks consisting of contiguous words into the cache.

The Simplest Cache: Direct Mapped

Consider a 4-byte direct-mapped cache over a 16-location memory (addresses 0 through F). Cache location 0 can be occupied by data from memory locations 0, 4, 8, or C: in general, any memory location whose two LSBs are 0s. The low address bits Address<1:0> form the cache index: cache index = (block address) modulo (number of cache blocks).
Two questions remain:
- How can we tell which memory location is in the cache?
- Which one should we place in the cache?

Example

Consider an eight-word direct-mapped cache. Show the contents of the cache as it responds to a series of requests (decimal word addresses): 22, 26, 22, 26, 16, 3, 16, 18.
The outcomes are: miss, miss, hit, hit, miss, miss, hit, miss.

Cache Organization: Tag and Index

Assume a 32-bit memory (byte) address and a 2^N-byte direct-mapped cache:
- Index: the lower N bits of the memory address.
- Tag: the upper (32 - N) bits of the memory address, stored as part of the cache state alongside a valid bit.
Example: cache tag 0x50, cache index 0x03, selecting Byte 3 of the cache's 2^N bytes (Byte 0 through Byte 2^N - 1).
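The tag/index split above is easy to express in code. The following is a small C sketch under the slide's assumptions (32-bit byte addresses, one-byte blocks); the 1 KB cache size and the helper names are mine, chosen for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define INDEX_BITS 10   /* assumption: a 2^10 = 1 KB direct-mapped cache */

    /* Split a 32-bit byte address for a direct-mapped cache with one-byte
       blocks: the low N bits index the cache; the upper 32-N bits are the
       tag stored alongside the data. */
    static uint32_t cache_index(uint32_t addr) { return addr & ((1u << INDEX_BITS) - 1u); }
    static uint32_t cache_tag(uint32_t addr)   { return addr >> INDEX_BITS; }

    int main(void) {
        uint32_t addr = 0x00014003u;   /* splits into tag 0x50, index 0x03 */
        printf("index = 0x%x, tag = 0x%x\n",
               (unsigned)cache_index(addr), (unsigned)cache_tag(addr));
        return 0;
    }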

Cache Access Example

At start up, all valid bits are 0, so the first accesses miss.
- Access to a new address (miss): load the data from memory, write the tag, and set the valid bit.
- A later access to the same address (HIT): the tag matches and the valid bit is set, so the data is returned directly.
Sad fact of life: a lot of misses at start up. These are compulsory misses (cold start misses).

Definition of a Block

Block: the cache data that has its own cache tag. Our previous extreme example (the 4-byte direct-mapped cache) had a block size of one byte. It takes advantage of temporal locality: if a byte is referenced, it will tend to be referenced again soon. It does not take advantage of spatial locality: if a byte is referenced, its adjacent bytes will be referenced soon. To take advantage of spatial locality, increase the block size.
Question: can we say "the larger, the better" for the block size?

Example: 1 KB Direct Mapped Cache with 32 B Blocks

For a 2^N-byte cache with 2^M-byte blocks:
- The uppermost (32 - N) bits are always the cache tag.
- The lowest M bits are the byte select (block size = 2^M).
For the 1 KB cache with 32 B blocks: cache tag = address bits 31-10 (example 0x50, stored as part of the cache state), cache index = bits 9-5 (example 0x01), byte select = bits 4-0 (example 0x00).

Mapping an Address to a Multiword Block

Consider a cache with 64 blocks and a block size of 16 bytes. What block number does byte address 1200 map to? Block address = floor(1200/16) = 75; cache block = 75 modulo 64 = 11.

Bits in a Cache

How many total bits are required for a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address? 64 KB is 2^14 words, i.e., 2^14 blocks. Each block holds 32 data bits, a (32 - 14 - 2)-bit tag, and one valid bit:
2^14 x (32 + (32 - 14 - 2) + 1) = 2^14 x 49 = 784 Kbits

Block Size Tradeoff

In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block.
- If the block size is too big relative to the cache size, the miss rate will go up: fewer blocks compromises temporal locality.
Average access time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate.
(Figure: as block size grows, miss penalty rises steadily; miss rate first falls as spatial locality is exploited, then rises as fewer blocks compromise temporal locality; average access time therefore has a minimum at an intermediate block size.)
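The bits-in-a-cache arithmetic generalizes directly. Below is a small C sketch of the computation (the function name and parameterization are mine); the slide's 64 KB, one-word-block case is used as the test.

    #include <stdio.h>

    /* Sketch of the slide's "bits in a cache" arithmetic for a direct-mapped
       cache with 32-bit addresses: per block, store the data, the tag, and
       one valid bit. */
    static long total_cache_bits(int index_bits, int offset_bits, int data_bits) {
        int tag_bits = 32 - index_bits - offset_bits;
        long per_block = data_bits + tag_bits + 1;   /* data + tag + valid */
        return (1L << index_bits) * per_block;
    }

    int main(void) {
        /* 64 KB / 4 B = 2^14 blocks; 2-bit byte offset; 32 data bits/block. */
        printf("%ld Kbits\n", total_cache_bits(14, 2, 32) / 1024);  /* 784 */
        return 0;
    }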

Another Extreme Example

Cache size = 4 bytes, block size = 4 bytes: only ONE entry in the cache. True, if an item is accessed, it is likely to be accessed again soon, but it is unlikely to be accessed again immediately! The next access will likely be a miss again: the cache continually loads data but discards (forces out) it before it is reused. This ping-pong effect is the worst nightmare of a cache designer.
Conflict misses are misses caused by different memory locations mapping to the same cache index.
- Solution 1: make the cache size bigger.
- Solution 2: provide multiple entries for the same index.

How Do You Design a Cache?

The set of operations that must be supported:
- read: data <= Mem[Physical Address]
- write: Mem[Physical Address] <= data
Treat the cache as a black box driven by the physical address and the read/write controls; then determine the internal register transfers, design the datapath, and design the cache controller. Inside, the datapath has tag and data storage, muxes, and comparators; the controller watches the R/W and address inputs and produces wait and control signals.

Impact on Cycle Time

The cache hit time is directly tied to the clock rate; it increases with cache size and with associativity. (Figure: a pipelined datapath with registers PC, IR, IRex, IRm, IRwb; an instruction-cache miss or invalid data-cache access stalls the pipeline.)
Average memory access time = Hit Time + Miss Rate x Miss Penalty
Execution time = IC x CT x (ideal CPI + memory stalls)

Improving Cache Performance: General Options

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
Memory stall cycles = read stall cycles + write stall cycles, where:
- Read stall cycles = #reads x read miss rate x read miss penalty
- Write stall cycles = #writes x write miss rate x write miss penalty
Combined: memory stall cycles = #accesses x miss rate x miss penalty.

Examples

Q1: Assume an instruction cache miss rate for gcc of 2% and a data cache miss rate of 4%. If a machine has a CPI of 2 without any memory stalls and the miss penalty is 40 cycles for all misses, determine how much faster a machine would run with a perfect cache that never missed. (The frequency of all loads and stores in gcc is 36%.)
Answer: Instruction miss cycles = I x 2% x 40 = 0.80 I. Data miss cycles = I x 36% x 4% x 40 = 0.56 I. The CPI with memory stalls is 2 + 1.36 = 3.36, so the machine with the perfect cache is faster by 3.36/2 = 1.68.
Q2: Suppose we speed up the machine by reducing its CPI from 2 to 1.
Q3: What if we double the clock rate?

And Yet Another Extreme Example: Fully Associative

A fully associative cache forgets about the index: compare the cache tags of ALL cache entries in parallel. Example: with 32 B blocks, we need N 27-bit comparators (tag = address bits 31-5; byte select = bits 4-0, example 0x01). By definition, conflict misses = 0 for a fully associative cache.
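The stall-cycle accounting in Q1 can be checked with a few lines of C. This sketch simply reproduces the example's inputs; note that 36% x 4% x 40 evaluates to 0.576, so the computed figures land slightly above the slide's rounded 0.56/3.36/1.68.

    #include <stdio.h>

    /* Sketch of the slide's stall-cycle accounting, using the gcc example's
       inputs: 2% I-miss rate, 4% D-miss rate, 36% loads/stores, 40-cycle
       miss penalty, base CPI of 2. */
    int main(void) {
        double base_cpi = 2.0;
        double i_stalls = 1.00 * 0.02 * 40;   /* one I-fetch per instruction */
        double d_stalls = 0.36 * 0.04 * 40;   /* loads and stores only       */
        double cpi      = base_cpi + i_stalls + d_stalls;
        printf("CPI with stalls = %.2f, slowdown vs perfect cache = %.2fx\n",
               cpi, cpi / base_cpi);   /* ~3.38 and ~1.69; slide rounds to 3.36, 1.68 */
        return 0;
    }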

A Two-way Set Associative Cache

N-way set associative: N entries for each cache index, i.e., N direct-mapped caches operating in parallel. Example: in a two-way set associative cache, the cache index selects a set, the two tags in the set are compared in parallel, and the data is selected based on the tag comparison result.

Disadvantage of Set Associative Caches

N-way set associative versus direct mapped:
- N comparators vs. 1.
- Extra MUX delay for the data: the data comes AFTER the hit/miss decision.
In a direct-mapped cache, the cache block is available BEFORE hit/miss, so it is possible to assume a hit and continue, recovering later on a miss.

Example

There are three small caches, each consisting of four one-word blocks. One cache is fully associative, a second is two-way set associative, and the third is direct mapped. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8. (A mechanical check of the direct-mapped case appears after this list.)
- Direct mapped: 0 and 8 both map to block 0 and keep evicting each other, so every access misses: 5 misses.
- Two-way set associative (LRU): miss, miss, hit, miss, miss: 4 misses.
- Fully associative: miss, miss, hit, miss, hit: 3 misses.

A Summary on Sources of Cache Misses

- Compulsory (cold start, first reference): the first access to a block. Cold fact of life: not a whole lot you can do about it.
- Conflict (collision): multiple memory locations map to the same cache location. Solution 1: increase the cache size. Solution 2: increase associativity.
- Capacity: the cache cannot contain all blocks accessed by the program. Solution: increase the cache size.
- Invalidation: another process (e.g., I/O, other processors) updates memory.

Sources of Cache Misses: Quiz and Answer

                   Direct Mapped            N-way Set Associative   Fully Associative
Cache Size         Big                      Medium                  Small
Compulsory Miss    High (but who cares!)    Medium                  Low    (see Note)
Conflict Miss      High                     Medium                  Zero
Capacity Miss      Low                      Medium                  High
Invalidation Miss  Same                     Same                    Same

Note: if you are going to run billions of instructions, compulsory misses are insignificant.
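The direct-mapped count in the example is easy to verify mechanically. Here is a minimal C sketch (block addresses stand in for full tags; the two-way and fully associative cases, which need LRU state, are worked by hand on the slide):

    #include <stdio.h>

    /* Sketch: count misses for the slide's reference stream on the
       four-block direct-mapped cache. */
    int main(void) {
        int refs[] = {0, 8, 0, 6, 8};   /* block addresses */
        int tag[4] = {0}, valid[4] = {0};
        int misses = 0;
        for (int i = 0; i < 5; i++) {
            int idx = refs[i] % 4;      /* block address mod #blocks */
            if (!valid[idx] || tag[idx] != refs[i]) {
                misses++;               /* fetch block, replace old contents */
                valid[idx] = 1;
                tag[idx] = refs[i];
            }
        }
        printf("direct-mapped misses = %d\n", misses);   /* prints 5 */
        return 0;
    }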

The Need to Make a Decision!

- Direct mapped: each memory location can map to only one cache location. No need to make any decision :-) the current item simply replaces the previous item in that cache location.
- N-way set associative: each memory location has a choice of N cache locations.
- Fully associative: each memory location can be placed in ANY cache location.
On a cache miss in an N-way set associative or fully associative cache, we bring in a new block from memory and throw out a cache block to make room ====> we need to make a decision on which block to throw out!

Cache Block Replacement Policy

- Random replacement: the hardware randomly selects a cache item and throws it out.
- Least Recently Used (LRU): the hardware keeps track of the access history and replaces the entry that has not been used for the longest time.
Example of a simple pseudo-LRU implementation: assume 64 fully associative entries and a hardware replacement pointer that points to one cache entry. Whenever an access is made to the entry the pointer points to, move the pointer to the next entry; otherwise, do not move the pointer. The pointer thus tends to rest on an entry that has not been accessed recently. (A sketch of this scheme appears after the write-policy slides below.)

Cache Write Policy: Write Through versus Write Back

A cache read is much easier to handle than a cache write, which is why an instruction cache is much easier to design than a data cache. How do we keep the data in the cache and memory consistent? Two options (decision time again :-):
- Write back: write to the cache only. Write the cache block back to memory when that block is replaced on a cache miss.
  - Needs a dirty bit for each cache block.
  - Greatly reduces the memory bandwidth requirement.
  - Control can be complex.
- Write through: write to the cache and memory at the same time.
  - What!!! How can this be? Isn't memory too slow for this?

Write Buffer for Write Through

A write buffer is needed between the cache and memory (DRAM): the processor writes data into the cache and the write buffer, and the memory controller writes the contents of the buffer to memory. The write buffer is just a FIFO; the typical number of entries is 4. It works fine if the store frequency (with respect to time) << 1 / DRAM write cycle. The memory system designer's nightmare: store frequency (with respect to time) > 1 / DRAM write cycle, i.e., write buffer saturation.

Write Buffer Saturation

If the store frequency (w.r.t. time) > 1 / DRAM write cycle for a long period of time (CPU cycle time too quick and/or too many store instructions in a row), the store buffer will overflow no matter how big you make it, and the CPU cycle time is effectively bounded below by the DRAM write cycle time. Solutions for write buffer saturation:
- Use a write-back cache.
- Install a second-level (L2) cache between the write buffer and DRAM.

Write Allocate versus Not Allocate

Assume a 16-bit write to memory location 0x00 causes a cache miss. Do we read in the rest of the block (bytes 2, 3, ...)?
- Yes: write allocate.
- No: write not allocate.
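Here is a minimal C sketch of the pseudo-LRU replacement pointer described above (the function names are mine): the pointer advances only when the entry it points to is accessed, so on a miss it tends to be resting on a not-recently-used victim.

    #include <stdio.h>

    #define ENTRIES 64   /* 64 fully associative entries, as on the slide */

    static int replace_ptr = 0;   /* hardware replacement pointer */

    /* On every access: if the accessed entry is the one the pointer
       points to, step the pointer to the next entry; otherwise leave it. */
    static void on_access(int entry) {
        if (entry == replace_ptr)
            replace_ptr = (replace_ptr + 1) % ENTRIES;
    }

    /* On a miss, evict the entry the pointer rests on: it has not been
       accessed recently, or the pointer would have moved past it. */
    static int pick_victim(void) {
        return replace_ptr;
    }

    int main(void) {
        on_access(0); on_access(0); on_access(5);
        printf("victim = entry %d\n", pick_victim());   /* prints 1 */
        return 0;
    }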

Recap: Improving Cache Performance

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
Memory stall cycles = read stall cycles + write stall cycles:
- Read stall cycles = #reads x read miss rate x read miss penalty
- Write stall cycles = #writes x write miss rate x write miss penalty
- Memory stall cycles = #accesses x miss rate x miss penalty

Reducing Transfer Time: Main Memory Organizations

Three organizations for the path between cache and main memory:
- Solution 1: high-bandwidth DRAM. Examples, in order of increasing cost: Page Mode DRAM, SDRAM, CDRAM, RAMbus.
- Solution 2: a wide path between memory and cache (a wide bus feeding a mux in front of the cache).
- Solution 3: interleaving (multiple banks on a narrow bus).

Cycle Time versus Access Time

DRAM (read/write) cycle time >> DRAM (read/write) access time. Why?
- DRAM cycle time: how frequently you can initiate an access. Analogy: a little kid can only ask his father for money on Saturday.
- DRAM access time: how quickly you get what you want once you initiate an access. Analogy: as soon as he asks, his father gives him the money.
- DRAM bandwidth limitation analogy: what happens if he runs out of money on Wednesday?

Increasing Bandwidth: Interleaving

Access pattern without interleaving: the CPU starts an access for D1, waits a full memory cycle for the data, then starts the access for D2. Access pattern with 4-way interleaving: start an access in bank 0, then bank 1, bank 2, and bank 3; by then bank 0 can be accessed again.

Main Memory Performance

Timing model: 1 cycle to send the address, 6 cycles access time, 1 cycle to send a word of data; a cache block is 4 words.
- Simple memory: miss penalty = 4 x (1 + 6 + 1) = 32 cycles.
- Wide memory (one block-wide access): miss penalty = 1 + 6 + 1 = 8 cycles.
- Interleaved memory (4 banks): miss penalty = 1 + 6 + 4 x 1 = 11 cycles.

Independent Memory Banks

How many banks? At least as many banks as clocks needed to access a word in a bank, so that sequential accesses do not return to the original bank before it has its next word ready. Increasing DRAM capacity per chip => fewer chips => harder to have enough banks. Growth of bits/chip for DRAM: 50%-60%/yr. Nathan Myhrvold (Microsoft): mature software growth (33%/yr for NT) tracks the growth in MB/$ of DRAM (25%-30%/yr).
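The three miss penalties fall out of the same timing model, as this short C sketch shows (variable names are mine):

    #include <stdio.h>

    /* Sketch of the slide's timing model: 1 cycle to send the address,
       6 cycles of DRAM access, 1 cycle to transfer a word; the cache
       block is 4 words. */
    int main(void) {
        int addr = 1, access = 6, xfer = 1, words = 4;
        int simple      = words * (addr + access + xfer);  /* 4 x 8 = 32      */
        int wide        = addr + access + xfer;            /* one wide access */
        int interleaved = addr + access + words * xfer;    /* overlapped banks*/
        printf("simple=%d wide=%d interleaved=%d\n",
               simple, wide, interleaved);                 /* 32, 8, 11 */
        return 0;
    }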

SPARCstation 20's Memory System

The memory controller connects the processor module to eight memory modules (Module 0 through Module 7): the MBus is 64 bits wide, and the memory (SIMM) bus has a 128-bit-wide datapath. The processor module (MBus module) contains the processor and its register file, the internal instruction and data caches, and an external (L2) cache.

SPARCstation 20's External Cache

Size and organization: 1 MB, direct mapped.
Block size: 128 B. Sub-block size: 32 B.
Write policy: write back, write allocate.

SPARCstation 20's Internal Instruction Cache

Size and organization: 20 KB, 5-way set associative.
Block size: 64 B. Sub-block size: 32 B.
Write policy: does not apply.
Note: the sub-block size is the same as that of the external (L2) cache.

SPARCstation 20's Internal Data Cache

Size and organization: 16 KB, 4-way set associative.
Block size: 64 B. Sub-block size: 32 B.
Write policy: write through, write not allocate.
The sub-block size is again the same as that of the external (L2) cache.

Two Interesting Questions

Why did they use an N-way set associative cache internally? Answer: an N-way set associative cache is like having N direct-mapped caches in parallel, and they want each of those N direct-mapped caches to be 4 KB, the same as the virtual page size.
How many levels of cache does the SPARCstation 20 have? Answer: three levels. (1) The internal I and D caches, (2) the external cache, and (3) ...

SPARCstation 20's Memory Module

Supports a wide range of sizes:
- Smallest: 4 MB, using 16 2-Mbit DRAM chips and 8 KB of page-mode SRAM.
- Biggest: 64 MB, using 32 16-Mbit chips and 16 KB of page-mode SRAM.
(Figure: each DRAM chip is organized as 512 rows x 512 columns, 256K x 8 bits, with a 512 x 8 page-mode SRAM row buffer; the SRAM buffers feed the 128-bit memory bus, bits<127:0>.)

DRAM Performance

A 60 ns (t_RAC) DRAM can perform a row access only every 110 ns (t_RC), and can perform a column access (t_CAC) in 15 ns, but the time between column accesses is at least 35 ns (t_PC).
- In practice, external address delays and turning around buses make it 40 to 50 ns.
These times do not include the time to drive the addresses off the microprocessor, nor the memory controller overhead. Driving the parallel DRAMs, the external memory controller, the bus turnaround, the SIMM module, and the pins all add up: 180 ns to 250 ns of latency from processor to memory is good for a 60 ns (t_RAC) DRAM.

Summary

The Principle of Locality: temporal locality vs. spatial locality.
Four questions for any cache:
- Where to place a block in the cache.
- How to locate a block in the cache.
- Replacement policy.
- Write policy: write through vs. write back, and what to do on a write miss.
Three major categories of cache misses:
- Compulsory misses: sad facts of life. Example: cold start misses.
- Conflict misses: increase cache size and/or associativity. Nightmare scenario: the ping-pong effect!
- Capacity misses: increase cache size.