Who Cares About the Memory Hierarchy? Where Have We Been?


CS5 / EE5365 cache.

Where Have We Been?
- Multi-Cycle Control: Finite State Machines, Microsequencing (Microprogramming), Exceptions
- Pipelining Datapath: making use of the multi-cycle datapath
- Pipelining Control
What's Next?

Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM memory gap (latency), performance vs. time from 1980. µProc performance grows 60%/yr. (2X/1.5 yr, "Moore's Law"); DRAM grows 9%/yr. (2X/10 yrs); the processor-memory performance gap grows 50% / year.]

Performance Growth Is Really Super-Exponential
[Figure from Robot: Mere Machine to Transcendent Mind by Hans Moravec]

The Big Picture: Where Are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output.

Today's Topics
- Recap of last lecture
- Review
- Advanced: Virtual Memory, Protection, TLB

Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.

Level:        Processor (Control, Datapath, Registers, On-Chip Cache) | Second-Level Cache (SRAM) | Main Memory (DRAM) | Secondary Storage (Disk) | Tertiary Storage (Disk)
Speed (ns):   1s | 10s | 100s | 10,000,000s (10s ms) | 10,000,000,000s (10s sec)
Size (bytes): 100s | Ks | Ms | Gs | Ts

Recap: Static RAM Cell
6-Transistor SRAM Cell, with a word (row select) line and complementary bit lines bit and bit-bar. (Figure note: replaced with pullup to save area.)
Write:
1. Drive bit lines (bit = 1, bit-bar = 0).
2. Select row.
Read:
1. Precharge bit and bit-bar to Vdd.
2. Select row.
3. Cell pulls one line low.
4. Sense amp on column detects the difference between bit and bit-bar.

Recap: 1-Transistor Memory Cell (DRAM)
Write:
1. Drive bit line.
2. Select row.
Read:
1. Precharge bit line to Vdd.
2. Select row.
3. Cell and bit line share charges: very small voltage changes on the bit line.
4. Sense (fancy sense amp): can detect changes of ~1 million electrons.
5. Write: restore the value.
Refresh:
1. Just do a dummy read to every cell.

DRAMs over Time (from Kazuhiro Sakashita, Mitsubishi)
[Table: DRAM generation (1st gen. sample), memory size, die size (mm²), memory area (mm²), and memory cell area (µm²) for the 1 Mb, 4 Mb, 16 Mb, 64 Mb, 256 Mb, and 1 Gb generations.]

Preview: Two Different Types of Locality
- Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
- Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

Memory Hierarchy Technology
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time.

We Exploit Locality by Providing a Window on the World
Programs tend to execute the instructions in one region for a while.
[Figure: "Programs look like this": after one refill, a Main loop repeats many times, with an occasional non-local jump into another loop that repeats 7 times. "Not like this": Main bouncing among regions 1 through 4, each needing its own refill.]
Once the cache is full of a tight group of instructions, it can scream!

The Art of Memory System Design
Workload or benchmark programs -> Processor -> reference stream <op,addr>, <op,addr>, <op,addr>, <op,addr>, ... (op: i-fetch, read, write) -> $ -> MEM
Optimize the memory system organization to minimize the average memory access time for typical workloads. (This is much different from embedded or real-time designs, which optimize the worst case.)

Direct-Mapped Cache
Mapping: the address is taken modulo the number of blocks in the cache.
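The modulo mapping can be sketched in a few lines of Python. This is a minimal illustration, assuming block addresses (byte address divided by block size) and a hypothetical 8-block cache; the helper names are ours, not the lecture's. When the number of blocks is a power of two, the modulo reduces to keeping the low-order address bits, which is why hardware can take the index straight from the address.

```python
def direct_mapped_index(block_address: int, num_blocks: int) -> int:
    """Direct-mapped placement: the index is the block address modulo the number of blocks."""
    return block_address % num_blocks


def direct_mapped_index_pow2(block_address: int, num_blocks: int) -> int:
    """Same mapping for a power-of-two block count: keep the low log2(num_blocks) bits."""
    if num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of two")
    return block_address & (num_blocks - 1)


if __name__ == "__main__":
    # Hypothetical 8-block cache: 1, 9, 17 and 25 all compete for index 1.
    for addr in (1, 9, 17, 25, 29):
        print(addr, "->", direct_mapped_index(addr, 8))
```

Blocks that share an index evict one another, which is the root of the conflict misses discussed later.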

So, What Info Does the Cache Need to Hold?
- The data (or instructions)
- Which of the many memory addresses it is filled with (the tag)
- Whether or not it is valid (e.g., before the first access)
How are we going to organize it?

Direct Mapped Cache - Organization for MIPS
[Figure: the address with its byte offset selects one word-sized cache entry.]
What kind of locality are we taking advantage of?

Direct Mapped Cache - Organization for MIPS
[Figure: the address (bit positions shown) splits into tag, index, and byte offset; each entry holds Valid, Tag, and Data; the stored tag is compared with the address tag to produce Hit.]
What kind of locality are we taking advantage of?
How do we take advantage of the other locality?

Taking Advantage of Spatial Locality
Load multiple words at once (block read).
[Figure: the address splits into a 16-bit tag, an index selecting one of 4K entries, a block offset, and a byte offset; each entry holds V, Tag, and a 128-bit multi-word data block; a MUX uses the block offset to select the word, and the tag compare produces Hit.]

Example: 1 KB Direct Mapped Cache with 32 B Blocks
For a 2^N byte cache:
- The uppermost (32 - N) bits are always the Cache Tag.
- The lowest M bits are the Byte Select (Block Size = 2^M).
Address bits 31-10: Cache Tag (example: 0x50), stored as part of the cache state; bits 9-5: Cache Index (example: 0x01); bits 4-0: Byte Select (example: 0x00).
[Figure: 32 entries, each with a Valid Bit, a Cache Tag (0x50 shown at index 1), and Cache Data: Bytes 0-31 in entry 0, Bytes 32-63 in entry 1, ..., Bytes 992-1023 in entry 31.]

Terminology: Hits vs. Misses
- Read hits: this is what we want! The data is already in the cache.
- Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.
- Write hits: either replace the data in both the cache and memory (write-through), or write the data only into the cache and write it back to memory later (write-back).
- Write misses: read the entire block into the cache, then write the word.

Example: Robot Control Code
[Figure: a trace of reads (<) and writes (>) to tmp[], kp[], pos[], vel[], kv[], and torq[], plus the scalars count, flag, and i, shown next to the cache entries (Tag, Data) they map to; the arrays Pos[], Vel[], Kp[], Kv[], and Torq[] each occupy pairs of cache lines.]
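A minimal Python sketch of the tag / index / byte-select split, assuming power-of-two cache and block sizes and 32-bit addresses; `split_address` is a hypothetical helper. It reproduces the 1 KB, 32 B-block example above (N = 10, M = 5).

```python
def split_address(addr: int, cache_bytes: int, block_bytes: int, addr_bits: int = 32):
    """Split an address into (tag, index, byte_select) for a direct-mapped cache.

    Assumes cache_bytes = 2**N and block_bytes = 2**M: the tag is the upper
    (addr_bits - N) bits, the byte select the lowest M bits, and the index
    the N - M bits in between.
    """
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address out of range")
    n = cache_bytes.bit_length() - 1   # log2(cache size)
    m = block_bytes.bit_length() - 1   # log2(block size)
    if cache_bytes != 1 << n or block_bytes != 1 << m:
        raise ValueError("sizes must be powers of two")
    byte_select = addr & (block_bytes - 1)
    index = (addr >> m) & ((1 << (n - m)) - 1)
    tag = addr >> n
    return tag, index, byte_select


if __name__ == "__main__":
    # 1 KB cache with 32 B blocks: 22-bit tag, 5-bit index, 5-bit byte select.
    addr = (0x50 << 10) | (0x01 << 5) | 0x00
    tag, index, byte_select = split_address(addr, 1024, 32)
    print(hex(addr), hex(tag), hex(index), hex(byte_select))
```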

Where Have We Been?
- Pipelining Datapath: making use of the multi-cycle datapath
- Pipelining Control
- Direct-Mapped Cache
- Block Reads
What's Next?
- More: Set-Associative Caches

Block Size Tradeoff
In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill up the block.
- If the block size is too big relative to the cache size, the miss rate will go up: there are too few cache blocks.
In general, Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
[Figure: three sketches against block size. Miss penalty grows with block size. Miss rate first falls (exploits spatial locality), then rises (fewer blocks compromises temporal locality). Average access time eventually rises (increased miss penalty and miss rate).]

Block Size Effect Is Real
Increasing the block size tends to decrease the miss rate, up to a point; then it starts increasing.
[Figure: miss rate (0% to 40%) vs. block size (bytes), one curve per cache size from 1 KB to 256 KB.]

Program | Block size in words | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1 | 6.1% | 2.1% | 5.4%
gcc     | 4 | 2.0% | 1.7% | 1.9%
spice   | 1 | 1.2% | 1.3% | 1.2%
spice   | 4 | 0.3% | 0.6% | 0.4%
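To make the tradeoff concrete, the sketch below evaluates the slide's average-access-time formula across block sizes. Every number in it (the 1-cycle hit time, the 40-cycle latency plus 1 cycle per 4 bytes transferred, and the miss rates) is a hypothetical assumption chosen only to reproduce the U-shaped curve, not data from the lecture.

```python
def avg_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average access time in the form used on the Block Size Tradeoff slide."""
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate


def block_miss_penalty(block_bytes: int, latency: float = 40, bytes_per_cycle: float = 4) -> float:
    """Hypothetical miss penalty: a fixed latency plus the time to transfer the block."""
    return latency + block_bytes / bytes_per_cycle


# Hypothetical miss rates (not measured data): they fall while larger blocks
# exploit spatial locality, then rise once there are too few blocks.
MISS_RATES = {16: 0.08, 32: 0.05, 64: 0.04, 128: 0.045, 256: 0.07}

times = {b: avg_access_time(1, r, block_miss_penalty(b)) for b, r in MISS_RATES.items()}
best_block = min(times, key=times.get)

if __name__ == "__main__":
    for b, t in times.items():
        print(f"{b:4d} B blocks: penalty {block_miss_penalty(b):5.1f}, average {t:.3f} cycles")
    print("best block size:", best_block)
```

With these made-up inputs the minimum lands at a middle block size: the growing miss penalty eventually outweighs the falling miss rate.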

Extreme Example: Single Big Line
Cache Size = 4 bytes, Block Size = 4 bytes: only ONE entry in the cache (Valid Bit, Cache Tag, Data Bytes 0-3).
- If an item is accessed, it is likely to be accessed again soon, but it is unlikely to be accessed again immediately!
- The next access will likely be a miss again: we keep loading data into the cache but discard (force out) it before it is used again.
- This is the worst nightmare of a cache designer: the Ping Pong Effect.
Conflict Misses are misses caused by different memory locations mapped to the same cache index.
- Solution 1: make the cache size bigger.
- Solution 2: multiple entries for the same Cache Index.

Another Extreme Example: Fully Associative
Fully Associative Cache:
- Forget about the Cache Index.
- Compare the Cache Tags of all cache entries in parallel.
- Example: with Block Size = 32 B blocks, we need N 27-bit comparators.
By definition, Conflict Miss = 0 for a fully associative cache.
[Figure: address bits 31-5 form the 27-bit Cache Tag and bits 4-0 the Byte Select; every entry's tag is compared (=) in parallel; each entry holds a Valid Bit, Cache Tag, and Cache Data (Bytes 0-31, Bytes 32-63, ...).]
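The ping-pong effect and its fully associative cure can be reproduced with two small miss counters. This is a sketch under stated assumptions: addresses are block addresses, the fully associative cache uses LRU replacement (the slide does not specify a policy), and the trace is a hypothetical one alternating between two blocks that share a direct-mapped index.

```python
def direct_mapped_misses(block_addrs, num_entries):
    """Count misses in a direct-mapped cache holding one block per index."""
    valid = [False] * num_entries
    tags = [0] * num_entries
    misses = 0
    for block in block_addrs:
        index, tag = block % num_entries, block // num_entries
        if not (valid[index] and tags[index] == tag):
            misses += 1                      # different block at this index: evict it
            valid[index], tags[index] = True, tag
    return misses


def fully_associative_misses(block_addrs, num_entries):
    """Count misses in a fully associative cache with LRU replacement.

    There is no index: the whole block address is the tag, and every entry is
    checked (hardware does this with one comparator per entry, in parallel).
    """
    entries = []                             # tags ordered least to most recently used
    misses = 0
    for tag in block_addrs:
        if tag in entries:
            entries.remove(tag)              # hit: move to most-recently-used position
        else:
            misses += 1
            if len(entries) == num_entries:
                entries.pop(0)               # evict the least recently used entry
        entries.append(tag)
    return misses


if __name__ == "__main__":
    trace = [0, 4] * 8                       # blocks 0 and 4 share index 0 in a 4-entry cache
    print("direct mapped:", direct_mapped_misses(trace, 4))          # every access misses
    print("fully associative:", fully_associative_misses(trace, 4))  # only the first two miss
```

A one-entry direct-mapped cache (the "single big line") is `direct_mapped_misses(trace, 1)`, and any alternating trace misses on every access.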

Where Have We Been?
- Pipelining Control
- Direct-Mapped Cache
- Block Reads
- More: Set-Associative Caches
What's Next?
- More: Write Policies
- Virtual Memory

Decreasing Miss Ratio with Associativity
[Figure: an eight-block cache organized four ways: one-way set associative (direct mapped; blocks 0-7, each with Tag and Data), two-way set associative (sets 0-3, two Tag/Data pairs each), four-way set associative (sets 0-1, four pairs each), and eight-way set associative (fully associative; one set of eight pairs).]
Compared to direct mapped, give a series of references that:
- results in a lower miss ratio using a 2-way set associative cache
- results in a higher miss ratio using a 2-way set associative cache
assuming we use the "least recently used" replacement strategy.

A Two-way Set Associative Cache
N-way set associative: N entries for each Cache Index; N direct mapped caches operate in parallel.
Example: two-way set associative cache
- The Cache Index selects a "set" from the cache.
- The two tags in the set are compared in parallel.
- Data is selected based on the tag result.
[Figure: two banks, each with Valid, Cache Tag, and Cache Data (Cache Block 0); the address tag is compared against both; the compare results drive the mux selects (Sel1, Sel0) and are ORed to form Hit; the mux outputs the Cache Block.]
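One way to answer the question on the slide is to simulate it. The sketch below is a generic N-way set-associative cache with LRU replacement (direct mapped is the 1-way case); the two traces are hypothetical examples for a 4-block cache, one where 2-way wins and one where it loses.

```python
from collections import OrderedDict


def count_misses(block_addrs, num_sets, ways):
    """Misses in an N-way set-associative cache with LRU replacement.

    num_sets * ways is the total number of blocks; ways == 1 is direct mapped.
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set tags in LRU order
    misses = 0
    for block in block_addrs:
        cache_set = sets[block % num_sets]           # the index selects a set
        tag = block // num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)               # hit: now most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)        # evict the least recently used tag
            cache_set[tag] = True
    return misses


if __name__ == "__main__":
    better = [0, 4, 0, 4]          # conflict in direct mapped, both fit in one 2-way set
    worse = [0, 2, 4, 0, 2, 4]     # three blocks crowd a single 2-way set under LRU
    for name, trace in (("2-way better", better), ("2-way worse", worse)):
        dm = count_misses(trace, num_sets=4, ways=1)
        two_way = count_misses(trace, num_sets=2, ways=2)
        print(f"{name}: direct mapped {dm} misses, 2-way {two_way} misses")
```

The second trace works against 2-way because halving the number of sets sends blocks 0, 2, and 4 to one set, and LRU always evicts the block needed next.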

Disadvantage of Set Associative Cache
N-way Set Associative Cache versus Direct Mapped Cache:
- N comparators vs. 1
- Extra MUX delay for the data
- Data comes AFTER the Hit/Miss decision and set selection
In a direct mapped cache, the Cache Block is available BEFORE the Hit/Miss decision:
- Possible to assume a hit and continue; recover later if it was a miss.
[Figure: the two-way set associative datapath, repeated.]

Book's Figure for a 4-Way Set Associative Cache
[Figure: the address supplies the tag and index; the index selects one entry (V, Tag, Data) in each of four ways; four comparators feed a 4-to-1 multiplexor that produces Hit and Data.]

Cache Performance
[Figure: miss rate (0% to 15%) vs. associativity (one-way, two-way, four-way, eight-way), one curve per cache size from 1 KB to 128 KB.]

A Summary on Sources of Cache Misses
- Compulsory (cold start or process migration, first reference): first access to a block.
  - A "cold" fact of life: not a whole lot you can do about it.
  - Note: if you are going to run billions of instructions, compulsory misses are insignificant.
- Conflict (collision): multiple memory locations mapped to the same cache location.
  - Solution 1: increase cache size.
  - Solution 2: increase associativity.
- Capacity: the cache cannot contain all blocks accessed by the program.
  - Solution: increase cache size.
- Invalidation: another process (e.g., I/O) updates memory.

Sources of Cache Misses Quiz (for constant complexity)
For each organization (Direct Mapped, N-way Set Associative, Fully Associative), rate the Cache Size (Small, Medium, Big?) and the Compulsory, Conflict, Capacity, and Invalidation misses.
Choices: Zero, Low, Medium, High, Same.

How Do You Design a Cache?
Set of operations that must be supported:
- read: data <= Mem[Physical Address]
- write: Mem[Physical Address] <= Data
[Figure: memory as a "black box" with inputs Physical Address, Read/Write, and Data.]
Steps: determine the internal register transfers; design the datapath; design the cache controller.
[Figure: the cache datapath (inside it has tag-data storage, muxes, comparators, ...) with Address, Data In, and Data Out; control points and signals connect it to the cache controller, which has R/W, Active, and wait.]
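The three C's can be separated mechanically: a miss is compulsory on the first reference to a block, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The sketch below applies that common textbook convention to a direct-mapped cache; it is an illustration under those assumptions, it does not model invalidation misses, and the traces are hypothetical.

```python
from collections import OrderedDict


def classify_misses(block_addrs, num_blocks):
    """Split the misses of a direct-mapped cache into the three C's.

    compulsory: first reference to the block
    capacity:   a fully associative LRU cache of the same size would also miss
    conflict:   any other miss (caused by the direct-mapped placement)
    """
    seen = set()
    fully_assoc = OrderedDict()        # same-size fully associative LRU cache
    direct = {}                        # index -> block currently stored there
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_addrs:
        # Update the fully associative reference cache.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)
            fully_assoc[block] = True
        # Look up and update the direct-mapped cache being classified.
        index = block % num_blocks
        dm_hit = direct.get(index) == block
        direct[index] = block
        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts


if __name__ == "__main__":
    print(classify_misses([0, 4] * 4, 4))           # ping-pong: mostly conflict misses
    print(classify_misses([0, 1, 2, 3, 4] * 2, 4))  # working set of 5 blocks: capacity misses
```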

Impact on Cycle Time
Cache hit time:
- is directly tied to the clock rate
- increases with cache size
- increases with associativity
[Figure: pipeline (PC, IR, IRex, IRm, IRwb; registers A, B, R, T, D) in which an I-cache miss marks the instruction invalid.]
Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty
Time = IC x CT x (ideal CPI + memory stalls)
Example: a direct-mapped cache allows the miss signal to arrive after the data.

Improving Cache Performance: 3 General Options
1. Reduce the miss rate.
2. Reduce the time to hit in the cache.
3. Reduce the miss penalty.

Decreasing Miss Penalty with Multilevel Caches
Add a second level cache:
- often the primary cache is on the same chip as the processor
- use SRAMs to add another cache above primary memory (DRAM)
- the miss penalty goes down if the data is in the 2nd level cache
Example:
- CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access
- Adding a 2nd level cache with a 20 ns access time decreases the miss rate (to main memory) to 2%
Using multilevel caches:
- try to optimize the hit time on the 1st level cache
- try to optimize the miss rate on the 2nd level cache
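The multilevel-cache example can be worked through in a few lines. This sketch assumes, as in the common textbook version of this example, that the 5% and 2% are misses per instruction and that the 200 ns and 20 ns include all miss handling; `cpi` is a hypothetical helper name. At 500 MHz a cycle is 2 ns, so a DRAM access costs 100 cycles and an L2 access costs 10.

```python
def cpi(base_cpi, clock_mhz, l1_miss_rate, memory_ns, l2_ns=None, l2_miss_rate=None):
    """CPI including memory stalls; miss rates are misses per instruction.

    Without an L2, every L1 miss pays the DRAM access time. With an L2, every
    L1 miss pays the L2 access time, and the misses that also miss in the L2
    (l2_miss_rate, measured per instruction) pay the DRAM access time on top.
    """
    cycle_ns = 1000.0 / clock_mhz          # 500 MHz -> 2 ns per cycle
    memory_cycles = memory_ns / cycle_ns
    if l2_ns is None:
        return base_cpi + l1_miss_rate * memory_cycles
    l2_cycles = l2_ns / cycle_ns
    return base_cpi + l1_miss_rate * l2_cycles + l2_miss_rate * memory_cycles


if __name__ == "__main__":
    without_l2 = cpi(1.0, 500, 0.05, 200)
    with_l2 = cpi(1.0, 500, 0.05, 200, l2_ns=20, l2_miss_rate=0.02)
    print(without_l2, with_l2, without_l2 / with_l2)
```

Under these assumptions the CPI drops from 1.0 + 0.05 x 100 = 6.0 to 1.0 + 0.05 x 10 + 0.02 x 100 = 3.5, roughly a 1.7x speedup.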

Remember Our Memory Hierarchy Slide?
By taking advantage of the principle of locality, a memory hierarchy can decrease the miss penalty.
[Figure: the hierarchy repeated: Processor (Control, Datapath, Registers, On-Chip Cache) -> Second-Level Cache (SRAM) -> Main Memory (DRAM) -> Secondary Storage (Disk) -> Tertiary Storage (Disk); speeds from 1s of ns to 10s of seconds; sizes from 100s of bytes to terabytes.]

Memory Hardware Issues for Decreasing Miss Penalty
Make reading multiple words easier by using banks of memory:
a. One-word-wide memory organization: CPU, cache, bus, and memory are all one word wide.
b. Wide memory organization: a multiplexor between the CPU and the cache, with a wide bus and wide memory.
c. Interleaved memory organization: a one-word bus connects the cache to memory banks 0-3.
It can get a lot more complicated...

4 Questions for Memory Hierarchy
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
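The benefit of the three organizations shows up in the miss penalty for a multi-word block. The timing parameters below (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle to transfer a word, 4-word blocks) are assumptions for illustration, not values from this lecture.

```python
def memory_miss_penalty(organization, block_words=4, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Cycles to fetch one cache block under assumed (illustrative) timings."""
    if organization == "one-word-wide":
        # Each word needs its own DRAM access and its own bus transfer.
        return addr_cycles + block_words * dram_cycles + block_words * xfer_cycles
    if organization == "wide":
        # Memory and bus are as wide as the block: one access, one transfer.
        return addr_cycles + dram_cycles + xfer_cycles
    if organization == "interleaved":
        # One bank per word: the accesses overlap, but the bus is still one word wide.
        return addr_cycles + dram_cycles + block_words * xfer_cycles
    raise ValueError(f"unknown organization: {organization}")


if __name__ == "__main__":
    for org in ("one-word-wide", "wide", "interleaved"):
        print(org, memory_miss_penalty(org))
```

Interleaving recovers most of the benefit of a wide memory while keeping a one-word bus, which is why it is the usual compromise.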

Q3: Which Block Should Be Replaced on a Miss?
- Easy for direct mapped.
- Set associative or fully associative: Random or LRU (Least Recently Used).

Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random
16 KB  | 5.2%  | 5.7%  | 4.7%  | 5.3%  | 4.4%  | 5.0%
64 KB  | 1.9%  | 2.0%  | 1.5%  | 1.7%  | 1.4%  | 1.5%
256 KB | 1.15% | 1.17% | 1.13% | 1.13% | 1.12% | 1.12%

Q4: What Happens on a Write?
- Write through: the information is written both to the block in the cache and to the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  - Is the block clean or dirty?
Pros and cons of each?
- WT: read misses cannot result in writes.
- WB: repeated writes to a block cause no extra memory writes.
WT is always combined with write buffers so that we don't wait for the lower-level memory.

Write Buffer for Write Through
[Figure: Processor -> Cache, with a Write Buffer between the cache and DRAM.]
A write buffer is needed between the cache and memory:
- The processor writes data into the cache and the write buffer.
- The memory controller writes the contents of the buffer to memory.
The write buffer is just a FIFO:
- Typical number of entries: 4
- Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
Memory system designer's nightmare:
- Store frequency (w.r.t. time) -> 1 / DRAM write cycle
- Write buffer saturation
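The write-through versus write-back tradeoff can be seen by counting the writes that reach memory. This sketch assumes a direct-mapped cache that allocates on write misses (read the block, then write the word, as on the Hits vs. Misses slide); `memory_writes` is a hypothetical helper, and the trace is a made-up one: ten stores to one block followed by a read that evicts it.

```python
def memory_writes(trace, policy, num_blocks=4):
    """Count writes that reach memory for a direct-mapped cache.

    trace: sequence of (op, block_address) with op "read" or "write".
    Both policies allocate on a write miss (fetch the block, then write).
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    lines = {}                                   # index -> [tag, dirty]
    writes = 0
    for op, block in trace:
        index, tag = block % num_blocks, block // num_blocks
        line = lines.get(index)
        if line is None or line[0] != tag:       # miss: replace whatever is there
            if line is not None and line[1]:
                writes += 1                      # write back the dirty victim
            line = [tag, False]
            lines[index] = line
        if op == "write":
            if policy == "write-through":
                writes += 1                      # every store also goes to memory
            else:
                line[1] = True                   # write-back: just mark the block dirty
    return writes


if __name__ == "__main__":
    trace = [("write", 0)] * 10 + [("read", 4)]  # block 4 evicts block 0 (same index)
    print("write-through:", memory_writes(trace, "write-through"))
    print("write-back:", memory_writes(trace, "write-back"))
```

The single write-back happens during the read miss on block 4, which is exactly why "read misses cannot result in writes" counts as a write-through advantage.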

Write Buffer Saturation
[Figure: Processor -> Cache -> Write Buffer -> DRAM]
Store frequency (w.r.t. time) -> 1 / DRAM write cycle
If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row):
- The store buffer will overflow no matter how big you make it.
- The CPU cycle time <= DRAM write cycle time.
Solutions for write buffer saturation:
- Use a write back cache.
- Install a second level (L2) cache: Processor -> Cache -> Write Buffer -> L2 Cache -> DRAM.

Recall: Levels of the Memory Hierarchy
Level         | Capacity      | Staging transfer unit to the next level (managed by)
CPU registers | 100s of bytes | instruction operands, 1-8 bytes (program/compiler)
Cache         | K bytes       | blocks, 8-128 bytes (cache controller)
Main memory   | M bytes       | pages, 512-4K bytes (OS)
Disk          | G bytes       | files, Mbytes (user/operator)
Tape/CD/DVD   | "infinite"    |
The upper levels are faster; the lower levels are larger. Access time grows from the registers down to tape/CD/DVD (seconds to minutes), while cost per bit falls.

Virtual Memory
Main memory can act as a cache for the secondary storage (disk).
[Figure: virtual addresses pass through address translation to physical addresses or disk addresses.]
Advantages:
- illusion of having more physical memory
- program relocation
- protection
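A rough cycle-level model shows why saturation cannot be fixed by buffer depth alone. This is a simplified sketch under stated assumptions: the CPU issues one store every `store_interval` cycles, DRAM retires buffered writes one at a time taking `dram_cycle` cycles each, and the FIFO holds `depth` entries; all names and values are illustrative.

```python
from collections import deque


def write_buffer_stalls(num_stores, store_interval, dram_cycle, depth=4):
    """CPU stall cycles caused by a FIFO write buffer in front of DRAM.

    The CPU wants to issue a store every store_interval cycles; DRAM retires
    one buffered write per dram_cycle cycles; the buffer holds depth entries.
    """
    finish = deque()        # completion cycle of each write still in the buffer
    t = 0                   # current CPU cycle
    dram_free = 0           # cycle at which DRAM can start the next write
    stalls = 0
    for _ in range(num_stores):
        while finish and finish[0] <= t:
            finish.popleft()                # this write has reached DRAM
        if len(finish) == depth:            # buffer full: wait for the oldest write
            wait = finish.popleft() - t
            stalls += wait
            t += wait
        start = max(t, dram_free)           # DRAM writes are serialized
        dram_free = start + dram_cycle
        finish.append(dram_free)
        t += store_interval                 # CPU moves on to its next store
    return stalls


if __name__ == "__main__":
    print(write_buffer_stalls(1000, store_interval=10, dram_cycle=5))
    for depth in (4, 16, 64):
        print(depth, write_buffer_stalls(1000, store_interval=1, dram_cycle=5, depth=depth))
```

With a store every 10 cycles and a 5-cycle DRAM write the buffer never fills; with a store every cycle the stalls grow with the number of stores whether the buffer holds 4 or 64 entries, since a deeper FIFO only delays the moment it fills.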

Basic Issues in Virtual Memory System Design
- Size of the information blocks that are transferred from secondary to main storage (M).
- If a block of information is brought into M and M is full, some region of M must be released to make room for the new block --> replacement policy.
- Which region of M is to hold the new block --> placement policy.
- A missing item is fetched from secondary memory only on the occurrence of a fault --> demand load policy.
[Figure: reg <-> cache <-> mem <-> disk, with memory divided into frames that hold pages.]
Paging Organization: the virtual and physical address spaces are partitioned into blocks of equal size: pages and page frames.

Pages: Virtual Memory Blocks
Page faults: the data is not in memory, so retrieve it from disk.
- huge miss penalty, thus pages should be fairly large (e.g., 4KB)
- reducing page faults is important (LRU is worth the price)
- can handle the faults in software instead of hardware
- using write-through is too expensive, so we use write-back
[Figure: Virtual Address = Virtual Page Number | Page Offset; the virtual page number is translated into a Physical Page Number, and the page offset passes through unchanged to form the Physical Address.]

Where Have We Been?
- Direct-Mapped Cache
- Set Associative Cache
- Virtual Memory: Page Table, TLB
What's Next?
- Virtual Memory
- I/O

Page Tables
[Figure: the page table, indexed by virtual page number, holds a valid bit and a physical page or disk address for each page; valid entries point into physical memory, the others into disk storage.]

Page Tables
[Figure: the page table register locates the page table; the virtual page number from the virtual address indexes it; each entry holds a valid bit and a physical page number. If the valid bit is 0, the page is not present in memory. The physical page number is joined with the unchanged page offset to form the physical address.]

Address Map
V = {0, 1, ..., n - 1}: virtual address space
M = {0, 1, ..., m - 1}: physical address space
n > m
MAP: V --> M U {0}: address mapping function
MAP(a) = a' if the data at virtual address a is present at physical address a' and a' in M
MAP(a) = 0 if the data at virtual address a is not present in M
[Figure: the processor issues a (name space V) to the address translation mechanism; on success a' goes to main memory; otherwise a missing item fault invokes the fault handler, and the OS performs the transfer from secondary memory into main memory.]
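The MAP function can be mirrored in a short sketch: a valid entry yields a' (the physical page number joined with the offset), and an invalid one raises a fault that a stand-in handler services by loading the page into a free frame. The 4 KB page size, the dictionary page table, and the `translate`/`access` names are assumptions for illustration; a real handler would also pick a victim page and possibly write it to disk.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class PageFault(Exception):
    """MAP(a) is undefined: the page holding virtual address a is not in main memory."""


def translate(va, page_table):
    """MAP: V -> M. page_table maps a virtual page number to (valid, physical page number)."""
    vpn, offset = divmod(va, PAGE_SIZE)
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(vpn)
    return ppn * PAGE_SIZE + offset           # equivalent to concatenating PPN and offset


def access(va, page_table, free_frames):
    """Translate va; on a fault, act as a toy OS handler: load the page into a
    free frame, mark the entry valid, and retry the translation."""
    try:
        return translate(va, page_table)
    except PageFault as fault:
        vpn = fault.args[0]
        page_table[vpn] = (True, free_frames.pop())   # stands in for "load the page from disk"
        return translate(va, page_table)


if __name__ == "__main__":
    table = {0: (True, 7), 1: (False, None)}
    print(hex(access(0x0123, table, [3])))    # page 0 is resident in frame 7
    print(hex(access(0x1004, table, [3])))    # page 1 faults, then lands in frame 3
```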

Paging Organization
[Figure: physical memory of eight 1K frames (frame 0 at P.A. 0, frame 1 at P.A. 1024, ..., frame 7 at P.A. 7168) and virtual memory of 1K pages (page 0 at V.A. 0, page 1 at V.A. 1024, ..., page 31 at V.A. 31744). The page is the unit of mapping and also the unit of transfer from virtual to physical memory.]

Address Mapping
VA = page no. | disp
The page table base register plus the page number forms the index into the page table (actually, concatenation is more likely than addition). Each page table entry holds V, Access Rights, and PA. The table is located in physical memory, and the PA joined with disp gives the physical memory address.

Virtual Address and a Cache
CPU -> VA -> Translation -> PA -> Cache; on a hit the data returns to the CPU, on a miss the access goes to main memory.
It takes an extra memory access to translate a VA to a PA. This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
ASIDE: Why access the cache with the PA at all? VA caches have a problem!
Synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address!
- For an update, you must update all cache entries with the same physical address or memory becomes inconsistent.
- Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits;
- or a software-enforced alias boundary: aliases must share the same low-order bits of VA and PA over a range larger than the cache size.

TLBs
A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.
TLB entry: Virtual Address | Physical Address | Dirty | Ref | Valid | Access
TLB access time is comparable to cache access time (much less than main memory access time).

Translation Look-Aside Buffers
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set associative organizations.

Translation with a TLB
[Figure: the CPU sends the VA to a TLB lookup; on a TLB hit the PA goes to the cache, which returns data on a hit or goes to main memory on a miss; on a TLB miss the full translation is performed first. Relative access times are marked for each step.]

Making Address Translation Practical: TLB
- Virtual memory => memory acts like a cache for the disk.
- The page table maps virtual page numbers to physical frames.
- The Translation Look-aside Buffer (TLB) is a cache of recent translations.
[Figure: a virtual address (page, off) is translated through the TLB (page -> frame) or the page table into the physical memory space.]

Reducing Translation Time
Machines with TLBs go one step further to reduce # cycles/cache access: they overlap the cache access with the TLB access.
This works because the high-order bits of the VA are used to look in the TLB while the low-order bits are used as the index into the cache.
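A TLB is just a small cache in front of the page table. The sketch below assumes 4 KB pages, a fully associative TLB with LRU replacement, and a page table represented as a dictionary from virtual to physical page numbers; the class and its parameters are hypothetical.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KB pages


class TLB:
    """Hypothetical fully associative TLB with LRU replacement.

    page_table maps virtual page number -> physical page number; a missing
    key (KeyError) stands in for a page fault.
    """

    def __init__(self, page_table, entries=4):
        self.page_table = page_table
        self.entries = entries
        self.cache = OrderedDict()   # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)
        else:
            self.misses += 1                    # TLB miss: walk the page table
            ppn = self.page_table[vpn]
            if len(self.cache) == self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[vpn] = ppn
        return (self.cache[vpn] << PAGE_BITS) | offset


if __name__ == "__main__":
    tlb = TLB({0: 5, 1: 9})
    for va in (0x0010, 0x0020, 0x1004, 0x0030):
        print(hex(va), "->", hex(tlb.translate(va)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

Because consecutive accesses usually fall in the same page, most translations hit in the TLB and skip the page table walk entirely.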

Overlapped Cache & TLB Access
[Figure: a 32-bit VA splits into a 20-bit page # and a 12-bit disp. The page # drives an associative TLB lookup that yields the PA and TLB Hit/Miss. In parallel, a 10-bit index and a 2-bit byte offset (00) taken from disp select one of the 1K entries of a cache with 4-byte blocks, yielding a tag and data. The PA is compared (=) with the cache tag to produce cache Hit/Miss.]
IF cache hit AND (cache tag = PA) THEN deliver data to CPU
ELSE IF [cache miss OR (cache tag != PA)] AND TLB hit THEN access memory with the PA from the TLB
ELSE do standard VA translation, then compare the cache tag

In the Absence of Virtual Memory
[Figure: the same cache lookup without the TLB; the address is already a PA.]
IF cache hit AND (cache tag = PA) THEN deliver data to CPU
ELSE access memory with the PA

Impact of Memory Hierarchy on Algorithms
Today CPU time is a function of (ops, cache misses) vs. just f(ops). What does this mean to compilers, data structures, and algorithms?
"The Influence of Caches on the Performance of Sorting" by A. LaMarca and R.E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997.
- Quicksort: the fastest comparison-based sorting algorithm when all keys fit in memory.
- Radix sort: also called "linear time" sort because, for keys of fixed length and fixed radix, a constant number of passes over the data is sufficient, independent of the number of keys.
Measured on an AlphaStation 250 with 32 byte blocks, a direct mapped 2 MB L2 cache, and 8 byte keys, from 4000 to 4000000 keys.
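The overlapped lookup only works when the cache index and byte offset come entirely from the page offset, that is, when the cache size divided by the associativity is at most the page size. A small check, assuming power-of-two sizes; the function names are hypothetical.

```python
def cache_index_top_bit(cache_bytes, ways):
    """Highest address bit used by the cache index plus byte offset (power-of-two sizes)."""
    return (cache_bytes // ways).bit_length() - 2


def can_overlap(cache_bytes, ways, page_bytes):
    """True if the cache can be indexed with untranslated page-offset bits,
    i.e. cache_bytes / ways <= page_bytes, so the lookup can run alongside the TLB."""
    return cache_bytes // ways <= page_bytes


if __name__ == "__main__":
    print(can_overlap(4096, 1, 4096))        # 4 KB direct mapped, 4 KB pages
    print(can_overlap(8192, 1, 4096))        # 8 KB direct mapped: bit 12 needs translation
    print(cache_index_top_bit(8192, 1))      # the index now reaches bit 12
    print(can_overlap(8192, 2, 4096))        # 8 KB, 2-way set associative
    print(can_overlap(8192, 1, 8192))        # 8 KB direct mapped with 8 KB pages
```

The 4 KB direct-mapped cache with 4 KB pages passes; doubling the cache to 8 KB fails unless the page grows to 8 KB or the cache becomes 2-way set associative.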

Problems With Overlapped TLB Access
Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation.
This usually limits things to small caches, large page sizes, or high n-way set associative caches if you want a large cache.
Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K.
[Figure: the cache now needs an 11-bit index plus the 2-bit byte offset, while the VA is still a 20-bit virt page # and a 12-bit disp. The top index bit is changed by VA translation, but is needed for cache lookup.]
Solutions:
- go to 8K byte page sizes;
- go to a 2-way set associative cache (1K entries of 4 bytes per way, so the 10-bit index again fits in disp); or
- have software guarantee VA[13] = PA[13].

Page Fault: What Happens When You Miss?
Not talking about a TLB miss: the TLB is the hardware's attempt to make page table lookup fast (on average).
A page fault means that the page is not resident in memory.
- Hardware must detect the situation.
- Hardware cannot remedy the situation.
- Therefore, hardware must trap to the operating system so that it can remedy the situation:
  - pick a page to discard (possibly writing it to disk)
  - load the page in from disk
  - update the page table
  - resume the program so the hardware will retry and succeed!
What is in the page fault handler? See the OS class.
What can hardware do to help it do a good job?

Page Replacement: Not Recently Used (1-bit LRU, Clock)
Associated with each page is a reference flag such that:
- ref flag = 1 if the page has been referenced in the recent past
- ref flag = 0 otherwise
If replacement is necessary, choose any page frame whose reference bit is 0. This is a page that has not been referenced in the recent past.
[Figure: page table entries with dirty and used bits; the page fault handler keeps a last replaced pointer (lrp).]
Or search for a page that is both not recently referenced AND not dirty.
If replacement is to take place, advance lrp to the next entry (mod table size) until one with a 0 bit is found; this is the target for replacement. As a side effect, all examined PTEs have their reference bits set to zero.
Architecture part: support dirty and used bits in the page table => may need to update the PTE on any instruction fetch, load, or store.
How does the TLB affect this design problem? Software TLB miss?
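The clock (Not Recently Used) search described above can be sketched as follows. This is an illustration under stated assumptions: reference bits are held in a plain list indexed by page frame, `hand` plays the role of the last replaced pointer (lrp), and the variant that also prefers clean pages is not implemented.

```python
def clock_select(ref_bits, hand):
    """Not Recently Used (clock) victim selection.

    ref_bits: list of reference bits, one per page frame (1 = referenced recently).
    hand: index of the last replaced frame (the lrp).
    Advances the pointer (mod table size) until a frame with ref bit 0 is
    found, clearing the ref bit of every examined frame that had it set.
    Returns (victim_frame, new_hand).
    """
    n = len(ref_bits)
    while True:
        hand = (hand + 1) % n
        if ref_bits[hand] == 0:
            return hand, hand
        ref_bits[hand] = 0      # side effect: examined PTEs get their ref bit cleared


if __name__ == "__main__":
    bits = [1, 1, 0, 1]
    victim, lrp = clock_select(bits, hand=3)
    print("victim frame:", victim, "ref bits now:", bits)
```

The loop always terminates: after at most one full sweep every bit it passed has been cleared, so a 0 bit is guaranteed.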

Why Virtual Memory?
- Generality: ability to run programs larger than the size of physical memory
- Storage management: allocation/deallocation of variable-sized blocks is costly and leads to (external) fragmentation
- Protection: regions of the address space can be R/O, Ex, ...
- Flexibility: portions of a program can be placed anywhere, without relocation
- Storage efficiency: retain only the most important portions of the program in memory
- Concurrent I/O: execute other processes while loading/dumping a page
- Expandability: can leave room in the virtual address space for objects to grow
- Performance: observe the impact of multiprogramming and of higher-level languages

Summary #1 / 4
The Principle of Locality:
- A program is likely to access a relatively small portion of the address space at any instant of time.
  - Temporal Locality: Locality in Time
  - Spatial Locality: Locality in Space
Three Major Categories of Cache Misses:
- Compulsory Misses: sad facts of life. Example: cold start misses.
- Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
- Capacity Misses: increase cache size.
Cache Design Space:
- total size, block size, associativity
- replacement policy
- write-hit policy (write-through, write-back)
- write-miss policy

Summary #2 / 4: The Cache Design Space
Several interacting dimensions:
- cache size
- block size
- associativity
- replacement policy
- write-through vs. write-back
- write allocation
The optimal choice is a compromise:
- depends on access characteristics: workload; use (I-cache, D-cache, TLB)
- depends on technology / cost
Simplicity often wins.
[Figure: design-space sketch in which Cache Size, Associativity, and Block Size trade off along Factor A and Factor B, from Less to More, with results ranging from Bad to Good.]

Summary #3 / 4: TLB, Virtual Memory
Caches, TLBs, and virtual memory are all understood by examining how they deal with 4 questions:
1) Where can a block be placed?
2) How is a block found?
3) What block is replaced on a miss?
4) How are writes handled?
Page tables map virtual addresses to physical addresses.
TLBs are important for fast translation.
TLB misses are significant in processor performance (funny times, as most systems can't access all of the 2nd level cache without TLB misses!)

Summary #4 / 4: Memory Hierarchy
Virtual memory was controversial at the time: can SW automatically manage 64KB across many programs?
- 1000X DRAM growth removed the controversy.
Today VM allows many processes to share a single memory without having to swap all processes to disk; VM protection is more important than the memory hierarchy.
Today CPU time is a function of (ops, cache misses) vs. just f(ops). What does this mean to compilers, data structures, and algorithms?


More information

CPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner

CPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner CPS104 Computer Organization and Programming Lecture 16: Virtual Memory Robert Wagner cps 104 VM.1 RW Fall 2000 Outline of Today s Lecture Virtual Memory. Paged virtual memory. Virtual to Physical translation:

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

CPS 104 Computer Organization and Programming Lecture 20: Virtual Memory

CPS 104 Computer Organization and Programming Lecture 20: Virtual Memory CPS 104 Computer Organization and Programming Lecture 20: Virtual Nov. 10, 1999 Dietolf (Dee) Ramm http://www.cs.duke.edu/~dr/cps104.html CPS 104 Lecture 20.1 Outline of Today s Lecture O Virtual. 6 Paged

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Page 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "

Page 1. Review: Address Segmentation  Review: Address Segmentation  Review: Address Segmentation Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"

More information

Memory Hierarchy Review

Memory Hierarchy Review EECS 252 Graduate Computer Architecture Lecture 3 0 (continued) Review of Caches and Virtual January 27 th, 20 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

More information

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now? cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example

More information

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

More information

Memory Technologies. Technology Trends

Memory Technologies. Technology Trends . 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements EECS15 - Digital Design Lecture 11 SRAM (II), Caches September 29, 211 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http//www-inst.eecs.berkeley.edu/~cs15 Fall

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Paging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012! " (0xE0)" " " " (0x70)" " (0x50)"

Paging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012!  (0xE0)    (0x70)  (0x50) CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 22, 2011! Anthony D. Joseph and Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! Segmentation! Paging! Recap Segmentation

More information

CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs"

CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" October 1, 2012! Prashanth Mohan!! Slides from Anthony Joseph and Ion Stoica! http://inst.eecs.berkeley.edu/~cs162! Caching!

More information

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0.

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0. Since 1980, CPU has outpaced DRAM... EEL 5764: Graduate Computer Architecture Appendix C Hierarchy Review Ann Gordon-Ross Electrical and Computer Engineering University of Florida http://www.ann.ece.ufl.edu/

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information

CS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1

CS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1 CS162 Operating Systems and Systems Programming Lecture 13 Caches and TLBs March 12, 2008 Prof. Anthony D. Joseph http//inst.eecs.berkeley.edu/~cs162 Review Multi-level Translation What about a tree of

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

Memory hierarchy review. ECE 154B Dmitri Strukov

Memory hierarchy review. ECE 154B Dmitri Strukov Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site: Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

Virtual Memory. Virtual Memory

Virtual Memory. Virtual Memory Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical

More information

Lecture 11. Virtual Memory Review: Memory Hierarchy

Lecture 11. Virtual Memory Review: Memory Hierarchy Lecture 11 Virtual Memory Review: Memory Hierarchy 1 Administration Homework 4 -Due 12/21 HW 4 Use your favorite language to write a cache simulator. Input: address trace, cache size, block size, associativity

More information

CS61C Review of Cache/VM/TLB. Lecture 26. April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson)

CS61C Review of Cache/VM/TLB. Lecture 26. April 30, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson) CS6C Review of Cache/VM/TLB Lecture 26 April 3, 999 Dave Patterson (http.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs6c/schedule.html Outline Review Pipelining Review Cache/VM/TLB Review

More information

Question?! Processor comparison!

Question?! Processor comparison! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

Time 11/03/99 UCB Fall 1999

Time 11/03/99 UCB Fall 1999 Recap Who Cares About the Hierarchy? CS52 Computer Architecture and Engineering Lecture 9 s and TLBs November 3, 999 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides http//www-inst.eecs.berkeley.edu/~cs52/

More information

CSE 502 Graduate Computer Architecture. Lec 6-7 Memory Hierarchy Review

CSE 502 Graduate Computer Architecture. Lec 6-7 Memory Hierarchy Review CSE 502 Graduate Computer Architecture Lec 6-7 Memory Hierarchy Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

CSC Memory System. A. A Hierarchy and Driving Forces

CSC Memory System. A. A Hierarchy and Driving Forces CSC1016 1. System A. A Hierarchy and Driving Forces 1A_1 The Big Picture: The Five Classic Components of a Computer Processor Input Control Datapath Output Topics: Motivation for Hierarchy View of Hierarchy

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as

More information

Recall: Paging. Recall: Paging. Recall: Paging. CS162 Operating Systems and Systems Programming Lecture 13. Address Translation, Caching

Recall: Paging. Recall: Paging. Recall: Paging. CS162 Operating Systems and Systems Programming Lecture 13. Address Translation, Caching CS162 Operating Systems and Systems Programming Lecture 13 Address Translation, Caching March 7 th, 218 Profs. Anthony D. Joseph & Jonathan Ragan-Kelley http//cs162.eecs.berkeley.edu Recall Paging Page

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Datapath Today s Topics: technologies Technology trends Impact on performance Hierarchy The principle of locality

More information

Caches Part 1. Instructor: Sören Schwertfeger. School of Information Science and Technology SIST

Caches Part 1. Instructor: Sören Schwertfeger.   School of Information Science and Technology SIST CS 110 Computer Architecture Caches Part 1 Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's

More information

CpE 442. Memory System

CpE 442. Memory System CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Page 1. CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging

Page 1. CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging CS162 Operating Systems and Systems Programming Lecture 14 Caching and Demand Paging March 4, 2010 Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Review: Hierarchy of a Modern Computer System Take advantage

More information

ECE468 Computer Organization and Architecture. Memory Hierarchy

ECE468 Computer Organization and Architecture. Memory Hierarchy ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

Recap: Set Associative Cache. N-way set associative: N entries for each Cache Index N direct mapped caches operates in parallel

Recap: Set Associative Cache. N-way set associative: N entries for each Cache Index N direct mapped caches operates in parallel Recap: Set Associative Cache CS152 Computer Architecture and Engineering Lecture 21 Virtual and Buses April 19, 1999 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) Valid N-way set associative: N entries

More information

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

More information

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial

More information

CS152 Computer Architecture and Engineering Lecture 16: Memory System

CS152 Computer Architecture and Engineering Lecture 16: Memory System CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging

CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging CS162 Operating Systems and Systems Programming Lecture 14 Caching and Demand Paging October 17, 2007 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs162 Review: Hierarchy of a Modern Computer

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour CSE 4 Computer Architecture Spring 25 Lectures 7 Virtual Memory Pramod V. Argade May 25, 25 Announcements Office Hour Monday, June 6th: 6:3-8 PM, AP&M 528 Instead of regular Monday office hour 5-6 PM Reading

More information

Chapter Seven Morgan Kaufmann Publishers

Chapter Seven Morgan Kaufmann Publishers Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be

More information

10/7/13! Anthony D. Joseph and John Canny CS162 UCB Fall 2013! " (0xE0)" " " " (0x70)" " (0x50)"

10/7/13! Anthony D. Joseph and John Canny CS162 UCB Fall 2013!  (0xE0)    (0x70)  (0x50) Goals for Todayʼs Lecture" CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" October 7, 2013! Anthony D. Joseph and John Canny! http//inst.eecs.berkeley.edu/~cs162! Paging- and

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

CMPT 300 Introduction to Operating Systems

CMPT 300 Introduction to Operating Systems CMPT 300 Introduction to Operating Systems Cache 0 Acknowledgement: some slides are taken from CS61C course material at UC Berkeley Agenda Memory Hierarchy Direct Mapped Caches Cache Performance Set Associative

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner 5DV8 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner Topic 5: The Memory Hierarchy Part B: Address Translation These slides are mostly taken verbatim,

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Recap: The Big Picture: Where are We Now? The Five Classic Components of a Computer. CS152 Computer Architecture and Engineering Lecture 20.

Recap: The Big Picture: Where are We Now? The Five Classic Components of a Computer. CS152 Computer Architecture and Engineering Lecture 20. Recap The Big Picture Where are We Now? CS5 Computer Architecture and Engineering Lecture s April 4, 3 John Kubiatowicz (www.cs.berkeley.edu/~kubitron) lecture slides http//inst.eecs.berkeley.edu/~cs5/

More information

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses

More information