Memory Hierarchy Chapter 2. Abdullah Muzahid


17. 2-Way Set Associative Example
Assume 2-way set-associative, 64 cache sets, 16-byte cache lines, and an LRU replacement policy. How big is the cache? (2 ways x 64 sets x 16 bytes = 2048 bytes = 2 KB.)
[Worked trace table; columns: Addr, R/W, Binary addr, Tag, Set, Offset, Found Way, Updated Way. The address values were lost in transcription; the trace is ten reads.]

17. 2-Way Set Associative Example (solution)
[Solution table garbled in transcription. Recoverable outline: six of the ten reads miss, the early misses filling both ways of their sets and the later ones evicting the LRU way; the remaining reads hit blocks already loaded. See the lookup sketch below.]
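A minimal C sketch (not from the slides) of the decomposition and LRU lookup used in this example: 16-byte lines give 4 offset bits, 64 sets give 6 index bits, and the remaining high bits form the tag. The trace addresses in main() are hypothetical, since the slide's address column was lost.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NSETS 64
#define NWAYS 2

typedef struct { bool valid; uint32_t tag; } Line;

static Line cache[NSETS][NWAYS];
static int lru[NSETS];                        /* which way is least recently used */

static bool access_cache(uint32_t addr) {
    uint32_t set = (addr >> 4) & (NSETS - 1); /* 16-byte line: index is bits 4..9 */
    uint32_t tag = addr >> 10;                /* remaining high bits */
    for (int w = 0; w < NWAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            lru[set] = 1 - w;                 /* hit: the other way becomes LRU */
            return true;
        }
    }
    int victim = lru[set];                    /* miss: fill the LRU way */
    cache[set][victim] = (Line){ true, tag };
    lru[set] = 1 - victim;
    return false;
}

int main(void) {
    /* hypothetical addresses; the slide's actual trace was garbled */
    uint32_t trace[] = { 300, 304, 1216, 444, 448 };
    for (int i = 0; i < 5; i++)
        printf("%4u -> %s\n", trace[i], access_cache(trace[i]) ? "hit" : "miss");
    return 0;
}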

18. Write-Through, No Write Allocate Example
Assume 2-way set-associative, 64 cache sets, 16-byte cache lines, and an LRU replacement policy.
[Worked trace table; columns: Addr, R/W, Binary addr, Tag, Set, Offset, Found Way, Updated Way, Mem Refs. The address values were lost in transcription; the reference pattern is W, R, R, W, W, R, R.]
How many main memory reads? How many main memory writes?

18. Write-Through, No Write Allocate Example (solution)
Same assumptions as above. With write-through and no write allocate, every write goes to memory (a write miss updates no cache block), and every read miss loads one block from memory.
[Solution table garbled in transcription: in the trace, all three writes go to memory and all four reads miss.]
Main memory reads: 4. Main memory writes: 3.

19. Write-Back, Write Allocate Example
Assume 2-way set-associative, 64 cache sets, 16-byte cache lines, and an LRU replacement policy.
[Worked trace table; columns: Addr, R/W, Binary addr, Tag, Set, Offset, Found Way, Updated Way, Mem Refs, Dirty. The address values were lost in transcription; the reference pattern is W, R, R, W, W, R, R.]
Put a * in the Upd Way column if that way is (still) dirty.
How many reads and writes, and why?

19. Write-Back, Write Allocate Example (solution)
Same assumptions as above. With write allocate, a write miss first reads the block into the cache and marks it dirty; memory is written only when a dirty block is evicted.
[Solution table garbled in transcription: each of the five misses causes one memory read, and the last reference evicts the dirty block (448), causing both a read of 34's block and a write-back of 448.]
Main memory reads: 5. Main memory writes: 1.
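A minimal C sketch (assumed structure, not the slides' code) of the write-back, write-allocate bookkeeping from this example: a miss always costs one memory read for the allocation, and a dirty victim costs an additional write-back. The addresses in main() are hypothetical.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NSETS 64
#define NWAYS 2

typedef struct { bool valid, dirty; uint32_t tag; } Line;

static Line cache[NSETS][NWAYS];
static int lru[NSETS];
static int mem_reads, mem_writes;

static void access_wb(uint32_t addr, bool is_write) {
    uint32_t set = (addr >> 4) & (NSETS - 1);
    uint32_t tag = addr >> 10;
    for (int w = 0; w < NWAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            if (is_write)
                cache[set][w].dirty = true;   /* write hit: no memory traffic */
            lru[set] = 1 - w;
            return;
        }
    }
    int v = lru[set];                         /* miss: evict the LRU way */
    if (cache[set][v].valid && cache[set][v].dirty)
        mem_writes++;                         /* write back the dirty victim */
    mem_reads++;                              /* allocate: fetch the block */
    cache[set][v] = (Line){ true, is_write, tag };
    lru[set] = 1 - v;
}

int main(void) {
    /* hypothetical W,R,R,W,W,R,R pattern; the slide's addresses were lost */
    access_wb(300, true);  access_wb(304, false); access_wb(444, false);
    access_wb(448, true);  access_wb(8496, true); access_wb(850, false);
    access_wb(304, false);
    printf("memory reads %d, memory writes %d\n", mem_reads, mem_writes);
    return 0;
}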

20. L1 Cache of AMD Opteron
(Comp Arch, Henn & Patt, Fig B.5, pg B-13: block address split into tag <25> and index <9>, plus block offset <6>; 512 blocks per way, each with valid <1>, tag <25>, and 64 bytes of data; two comparators feeding a 2:1 mux; victim buffer between the cache and lower-level memory.)
2-way associative L1 with a valid & dirty bit per block, one LRU bit per set, and an 8-block victim buffer. Lookup:
1. Split the CPU address into 3 parts: 64-byte block size -> 6 offset bits; 512 entries -> 9 index bits; the remaining 25 bits form the tag.
2. The index determines the proper set.
3. Check whether a tag matches and the valid bit is set (one comparator per way).
4. The 2:1 mux selects which way to pass out. On a hit, the output goes to the CPU; on a miss, the evicted block goes to the victim buffer.

21. Improving Cache Performance
Assuming the main and virtual memory implementations are fixed, how can we improve our cache performance?
- Reduce the cache miss penalty
- Reduce the cache miss rate
- Use parallelism to overlap operations, improving one or both of the above: doing hardware prefetch in parallel with normal memory traffic can reduce the miss rate
- Reduce the cache hit time
We will discuss each of these in turn.

22. Ideas for Reducing Cache Miss Penalty
- Use early restart: allow the CPU to continue as soon as the required bytes are in the cache, rather than waiting for the entire block to load
- Critical word first: load the accessed bytes of the block first, then load the remaining words of the block in wrap-around order; status bits are needed to indicate how much of the block has arrived; particularly good for caches with large block sizes
- Give memory reads priority over writes
- Merging write buffer
- Victim caches
- Use multilevel caches

23. Giving Memory Reads Priority Over Writes (Reducing Cache Miss Penalty)
Assume the common case of having a write buffer, so that the CPU does not stall on writes (as it must for reads). The CPU can check the write buffer on a read miss:
- If the data is presently in the write buffer, load it from there
- If it is not in the write buffer, load it from memory ahead of the prior writes
Advantages: since a read stalls the CPU and a write does not, we minimize stalls, and we may avoid a memory load entirely if the data is in the write buffer (see the sketch below).
Write buffers can make write-back more efficient as well:
1. On a dirty-block eviction, copy the dirty block from the cache to the write buffer
2. Load the evicting block from memory into the cache (CPU unstalled)
3. Write the dirty block from the buffer to memory
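A minimal sketch, assuming a small fully-scanned buffer, of the read-miss check described above; the entry layout and names are illustrative, not from the slides.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define WBUF_ENTRIES 4

typedef struct { bool valid; uint32_t block_addr; /* data omitted */ } WBufEntry;

static WBufEntry wbuf[WBUF_ENTRIES];

/* On a read miss, scan the write buffer first: if the block is queued for
 * write-back, forward it from the buffer and skip the memory read. */
static bool read_hits_write_buffer(uint32_t block_addr) {
    for (int i = 0; i < WBUF_ENTRIES; i++)
        if (wbuf[i].valid && wbuf[i].block_addr == block_addr)
            return true;
    return false;   /* not queued: read memory ahead of the pending writes */
}

int main(void) {
    wbuf[0] = (WBufEntry){ true, 42 };   /* block 42 awaiting write-back */
    printf("%d %d\n", read_hits_write_buffer(42), read_hits_write_buffer(7));
    return 0;
}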

24. Merging Write Buffer (Reducing Cache Miss Penalty)
Due to latency, multiword writes are more efficient than writing words separately. Multiple words in the write buffer may be associated with the same block address, and valid bits indicate which words to write. This:
- Reduces the number of memory accesses
- Reduces the number of write-buffer stalls for a given buffer size
(Comp Arch, Henn & Patt, Fig 2.7: the top buffer, without write merging, uses four entries to hold the sequential writes Mem[100], Mem[108], Mem[116], Mem[124]; with merging, all four share one entry, with one valid bit per 8-byte word.)
Notes: valid tags are not needed in a write-back design. Assuming each buffer entry holds a 32-byte block, sequential accesses see a 4-fold reduction in the number of writes and buffer entries used. In practice, the buffer must handle 1-, 2-, and 4-byte as well as 8-byte words. The larger the block size, the more this helps.
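A sketch of one merging-write-buffer entry under the figure's assumptions (32-byte entries, 8-byte words, one valid bit per word); the layout and names are illustrative, not a real design.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define WORDS_PER_ENTRY 4                   /* 32-byte entry, 8-byte words */

typedef struct {
    bool     valid;
    uint32_t block_addr;                    /* address of the aligned block */
    bool     word_valid[WORDS_PER_ENTRY];   /* which words to write back */
    uint64_t data[WORDS_PER_ENTRY];
} MergeEntry;

/* Try to merge an 8-byte write into an existing entry. */
static bool try_merge(MergeEntry *e, uint32_t addr, uint64_t value) {
    if (!e->valid || e->block_addr != (addr & ~31u))
        return false;                       /* different block: needs a new entry */
    unsigned word = (addr >> 3) & 3;        /* 8-byte word within the block */
    e->data[word] = value;
    e->word_valid[word] = true;             /* only valid words go to memory */
    return true;
}

int main(void) {
    MergeEntry e = { true, 96, {false}, {0} };   /* block covering Mem[100..124] */
    for (uint32_t a = 100; a <= 124; a += 8)     /* the figure's four writes */
        printf("merge Mem[%u]: %s\n", a, try_merge(&e, a, 1) ? "yes" : "no");
    return 0;
}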

25. Victim Caches (Reducing Cache Miss Penalty)
Victim cache: a small (e.g., 1-8 blocks), fully-associative cache that holds recently evicted blocks from a primary cache.
- Checked in parallel with the primary cache
- Data available on the following cycle if the item is in the victim cache
- The victim block is swapped with a block in the primary cache
- Of great benefit to a direct-mapped L1 cache (Comp Arch 3rd ed, Henn & Patt, Fig 5.13, pg 422)
Less popular today!

26. Multi-Level Caches (Reducing Cache Miss Penalty)
Becomes more popular as the miss penalty of the primary cache grows. Further caches may be off-chip, but are still made of SRAM. Almost all general-purpose machines have at least 2 levels of cache, and most have 2 on-chip caches; further cache levels typically have larger blocks and cache sizes.
Equations:
  local miss rate = misses in this cache / accesses to this cache
  global miss rate = misses in this cache / accesses to L1 cache
  avg access time = L1 hit time + L1 miss rate * L1 miss penalty
  L1 miss penalty = L2 hit time + L2 (local) miss rate * L2 miss penalty
  L2 miss penalty = main memory access time
The L1 miss penalty is the average access time for L2. Local miss rate: the % of this cache's references that go to the next level. Global miss rate: the % of all memory references that go to the next level.

26. Cache Miss Equation Examples
Assume nref = 1000, nL1miss = 40, nL2miss = 20, L2 miss penalty = 100 cycles, L2 hit time = 10, L1 hit time = 1.
1. What is the local and global miss rate for each cache?
L1 has the same local & global miss rate, since all memory references go to L1:
  L1 miss rate = 40/1000 = 0.04 = 4%
  L2 local miss rate = 20/40 = 0.5 = 50%
  L2 global miss rate = 20/1000 = 0.02 = 2%
2. What is the average access time?
  avg acc time = L1 hit time + L1 miss rate * L1 miss penalty
  L1 miss penalty = avg L2 acc time = L2 hit time + L2 local miss rate * L2 miss penalty = 10 + 0.5 * 100 = 60
  avg acc time = 1 + 0.04 * 60 = 1 + 2.4 = 3.4 cycles
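A quick C check of the arithmetic above, using the values as reconstructed:

#include <stdio.h>

int main(void) {
    double nref = 1000, nl1miss = 40, nl2miss = 20;
    double l1_hit = 1, l2_hit = 10, l2_penalty = 100;

    double l1_miss_rate = nl1miss / nref;                     /* 0.04 = 4%  */
    double l2_local     = nl2miss / nl1miss;                  /* 0.50 = 50% */
    double l2_global    = nl2miss / nref;                     /* 0.02 = 2%  */
    double l1_penalty   = l2_hit + l2_local * l2_penalty;     /* 60         */
    double amat         = l1_hit + l1_miss_rate * l1_penalty; /* 3.4 cycles */

    printf("L1 miss %.0f%%, L2 local %.0f%%, L2 global %.0f%%, AMAT %.1f\n",
           l1_miss_rate * 100, l2_local * 100, l2_global * 100, amat);
    return 0;
}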

27. Reducing Cache Miss Rate
We just discussed techniques for reducing the cost of a cache miss; now we want to investigate ways to increase our chances of hitting in the cache.
Cache miss categories:
- Compulsory: the first access to a block must always miss. (Measure: calculate the total # of distinct blocks accessed in the program.)
- Capacity: blocks that are replaced and reloaded because the cache cannot contain all the blocks needed during execution. (Measure: simulate a fully-associative cache and subtract the compulsory misses from the total misses.)
- Conflict: occurs when too many blocks map to the same cache set. (Measure: simulate the desired cache and subtract the compulsory & capacity misses from the total misses.)

28. Miss Rate vs. Cache Size
(Comp Arch, Henn & Patt, Fig 2.2: two panels of miss rate per type vs. cache size in KB, for 1-way through 8-way associativity, split into compulsory, capacity, and conflict misses; the top panel shows absolute miss rate, the bottom the distribution of misses.)
- The top figure shows total miss rate. Compulsory misses (the tiny first band) stay constant; the only way to decrease them is to increase the block size, which may increase the miss penalty. Capacity misses (the large block area) go down with cache size. Conflict misses also decrease with size: since the number of conflicts goes down with size, associativity pays off less for large caches.
- The bottom figure shows the distribution of misses: the % of compulsory misses increases with size, since the other types of misses decrease with size.

29. Reducing Miss Rate with Larger Blocks
Advantages: exploits spatial locality; reduces compulsory misses.
Disadvantages: increases miss penalty; can increase conflicts; may waste bandwidth.
SPEC92 block-size analysis (Comp Arch, Henn & Patt, Fig B.10, pg B-27): if the line size is large compared to the cache size, conflicts rise, increasing the miss rate; a 64-byte line size is reasonable across the studied cache sizes.

  Miss rate by block size (rows) and cache size (columns):
  blksz    4K      16K     64K     256K
  16       8.57%   3.94%   2.04%   1.09%
  32       7.24%   2.87%   1.35%   0.70%
  64       7.00%   2.64%   1.06%   0.51%
  128      7.78%   2.77%   1.02%   0.49%
  256      9.51%   3.29%   1.15%   0.49%

30. Reducing Miss Rate with Larger Caches & Higher Associativity
Larger caches. Advantages: reduces capacity & conflict misses. Disadvantages: uses more space; may increase hit time; higher cost ($, power, die area).
Higher associativity. Advantages: reduces conflict misses. Disadvantages: may increase hit time, since the tag check must complete before data can be sent; requires more space & power (more logic for comparators, more bits for tags, and other status bits such as LRU).

31. Reducing Miss Rate with Way Prediction & Pseudo-Associativity
Goal: a hit time as fast as direct-mapped, requiring only 1 comparator, while reducing misses like a set-associative cache. Both schemes have fast hits and slow hits.
Way prediction: each set has bits indicating which block (way) to check on the next access; a miss requires checking the other blocks in the set on subsequent cycles.
Pseudo-associativity: access the cache as in a direct-mapped cache with one less index bit; on a miss, check the sister block in the cache (e.g., by inverting the most significant index bit; a sketch follows); may swap the two blocks on an initial cache miss with a pseudo-way hit.
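A sketch of the pseudo-associative sister-block computation, using illustrative parameters (7 index bits, 4 offset bits) that are assumptions, not the slides' numbers:

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 4
#define INDEX_BITS 7                       /* 128 sets */

static uint32_t index_of(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

/* On a miss, probe the "sister" set: same index with the MSB inverted. */
static uint32_t sister_index(uint32_t addr) {
    return index_of(addr) ^ (1u << (INDEX_BITS - 1));
}

int main(void) {
    uint32_t addr = 0x1234;
    printf("set %u, sister set %u\n", index_of(addr), sister_index(addr));
    return 0;
}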

32. Reducing Cache Miss Penalty and/or Miss Rate via Parallelism
Nonblocking caches: allow cache hit accesses while a cache miss is being serviced. Some allow hits under multiple misses (requires a queue of outstanding misses). A status bit in each block could indicate that the block is currently being filled.
Hardware & software prefetching: the idea is that predicted memory blocks are fetched while computations run on the present blocks. Requires nonblocking caches; most prefetches do not raise exceptions. If the guess is right, the data is in-cache when used; if wrong, we wasted some bandwidth we weren't using anyway. Prefetching helps with latency by exploiting unused bandwidth; if the bus is saturated, prefetching won't help, and most architectures then ignore it. It can help with throughput if usage is sporadic, but can add conflict/capacity misses if the prefetch is wrong.

Pipelined Cache Access
Pipelining the cache access increases hit latency, but gives a fast clock cycle and high bandwidth. Most modern processors do this.

34. Reducing Cache Hit Time
- Small & simple caches: small caches mean less propagation delay; direct mapped allows overlapping the tag check & data sending; some designs keep the tags on-chip and the data off-chip.
- Avoiding address translation (virtual caches): avoids the virtual-to-physical translation step, but is problematic in practice. Virtually indexed, physically tagged: index the cache by the page offset, but tag with the physical address; can get data from the cache earlier.
- Pipelined cache access: allows a fast clock speed, but results in a greater branch misprediction penalty & load latency.

35. Increasing Cache Bandwidth with Multibanked Caches
Increase bandwidth by sending an address to b banks simultaneously; the b banks look up the address & write to the bus at the same time, increasing bandwidth by b in the best case. Usually sequential interleaving is used: with b = 4, block address mod 4 selects the bank (Comp Arch, Henn & Patt, Fig 5.6, pg 299; a sketch of the mapping follows).
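A sketch of the sequential-interleaving mapping with b = 4; the function names are illustrative:

#include <stdio.h>
#include <stdint.h>

#define NBANKS 4

/* Sequential interleaving: consecutive block addresses hit consecutive
 * banks, so a streaming access pattern keeps all banks busy. */
static unsigned bank_of(uint32_t block_addr)     { return block_addr % NBANKS; }
static uint32_t row_in_bank(uint32_t block_addr) { return block_addr / NBANKS; }

int main(void) {
    for (uint32_t b = 0; b < 8; b++)
        printf("block %u -> bank %u, row %u\n", b, bank_of(b), row_in_bank(b));
    return 0;
}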

Compiler Opt: Loop Interchange
Improve spatial/temporal locality of data by reordering nested loops so that the inner loop walks memory contiguously:

/* before: inner loop strides down columns of a row-major array */
for (j = 0; j < 100; j = j+1)
    for (i = 0; i < 5000; i = i+1)
        x[i][j] = 2 * x[i][j];

/* after interchange: inner loop walks each row contiguously */
for (i = 0; i < 5000; i = i+1)
    for (j = 0; j < 100; j = j+1)
        x[i][j] = 2 * x[i][j];

Copyright Josep Torrellas 1999, 2001, 2002

Hardware Prefetching of I, D
Prefetch: access items before they are needed and deposit them into caches or external buffers.
- Instruction prefetching: e.g., fetch the next block on a miss or on access; the prefetched block goes to a stream buffer (or the cache)
- Data prefetching: same idea; could have several stream buffers to capture several localities
- Be careful about bandwidth use
Copyright Josep Torrellas 1999, 2001, 2002

Compiler Controlled Prefetching
The compiler inserts prefetch instructions:
- Register prefetch: load the value into a register (and the cache)
- Cache prefetch: load the block into the cache only
A prefetch can be faulting (causes an exception on a protection violation) or non-faulting (turns into a no-op if it would cause an exception). Needs a nonblocking (lockup-free) cache: the cache can be accessed while a prefetch or miss is pending.
Copyright Josep Torrellas 1999, 2001, 2002
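The slides use a generic prefetch() pseudo-op; as one concrete possibility, GCC and Clang expose a non-faulting cache prefetch as the __builtin_prefetch(addr, rw, locality) intrinsic (rw: 0 = read, 1 = write; locality hint 0..3). The 8-iteration prefetch distance below is an assumption, not a slide value:

/* hedged sketch: prefetch a few iterations ahead of the store stream */
void scale(double *a, int n) {
    for (int i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8], 1, 0);  /* fetch for an upcoming write */
        a[i] = 2.0 * a[i];
    }
}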

Example
Assume an 8 KB direct-mapped cache with 16 B blocks; each element of a and b is 8 bytes long; a is 3 rows x 100 columns, b is 101 rows x 3 columns.

for (i = 0; i < 3; i = i+1)
    for (j = 0; j < 100; j = j+1)
        a[i][j] = b[j][0] * b[j+1][0];

a: even j values miss, odd j values hit (spatial locality) -> 150 misses.
b: no spatial locality, only temporal locality; supposing no conflicts, it misses 101 times.
TOTAL = 251 misses
Copyright Josep Torrellas 1999, 2001, 2002

Prefetching
- Usually works in loops
- Can be combined with loop unrolling & software pipelining
- Problem: overhead
Copyright Josep Torrellas 1999, 2001, 2002

Simplifications: 1) don't worry about the first few misses, 2) use a non-faulting prefetch. Split the loop so that the first loop prefetches both a & b and the second loop prefetches only a; assume the miss latency is long, so prefetch 7 iterations ahead:

for (j = 0; j < 100; j = j+1) {
    prefetch(b[j+8][0]);
    prefetch(a[0][j+7]);
    a[0][j] = b[j][0] * b[j+1][0];
}
for (i = 1; i < 3; i = i+1)
    for (j = 0; j < 100; j = j+1) {
        prefetch(a[i][j+7]);
        a[i][j] = b[j][0] * b[j+1][0];
    }

Copyright Josep Torrellas 1999, 2001, 2002

We are prefetching a[0][7] - a[0][99], a[1][7] - a[1][99], a[2][7] - a[2][99], and b[8][0] - b[100][0]. We are only left with:
- 8 misses for b: b[0][0] ... b[7][0]
- 12 misses for a: a[0][0], a[0][2], a[0][4], a[0][6]; a[1][0], a[1][2], a[1][4], a[1][6]; a[2][0], a[2][2], a[2][4], a[2][6]
So we execute 400 prefetch instructions to avoid 231 misses.
Copyright Josep Torrellas 1999, 2001, 2002

36. Summary of Cache Optimizations
Key: + improves that factor, - hurts it, = no effect; HW complexity 0 is cheapest.

  Technique                        Miss pen  Miss rate  Hit time  BW  HW cmplx  Comment
  Larger cache size                    =         +         -      =     1      Widely used for L2, L3
  Larger block size                    -         +         =      =     0      P4 L2 uses 128 bytes
  Higher associativity                 =         +         -      =     1      Widely used
  Multilevel caches                    +         =         =      =     2      Costly hardware, esp. if L1 blksz != L2 blksz; widely used
  Cache index w/o translation          =         =         +      =     1      Trivial if small cache; USIII/21264
  Read priority over writes            +         =         =      =     1      Easy for uniprocessors; widely used
  Crit word first & early restart      +         =         =      =     2      Widely used
  Merging write buffer                 +         =         =      =     1      Widely used
  Victim caches                        +         +         =      =     2      Athlon had 8-entry
  Way prediction                       =         =         +      =     1      I-cache of USIII; D-cache of R4300
  Pseudo-associativity                 =         =         +      =     1      L2 of R10000
  Compiler optimizations               =         +         =      =     0      Hard; varies by compiler
  Hardware prefetch                    +         +         =      =     2 (I), 3 (D)  Widely used
  Software prefetch                    +         +         =      =     3      Widely used
  Small & simple caches                =         -         +      =     0      Widely used for L1
  Nonblocking caches                   +         =         =      +     3      All out-of-order CPUs
  Pipelined cache access               =         =         -      +     1      Widely used
  Banked caches                        =         =         =      +     1      L2 of Opteron & Niagara
