CSE 141 Computer Architecture Summer Session I, Lectures 10 Advanced Topics, Memory Hierarchy and Cache. Pramod V. Argade


1 CSE 141 Computer Architecture Summer Session I, 2004 Lectures 10 Advanced Topics, Memory Hierarchy and Cache Pramod V. Argade

2 CSE141: Introduction to Computer Architecture
Instructor: Pramod V. Argade. Office Hours: Tue. 7:30-8:30 PM (AP&M 4141), Wed. 4:30-5:30 PM (AP&M 4141)
TA: Anjum Gupta (a3gupta@cs.ucsd.edu). Office Hour: Mon/Wed 12-1 PM
TA: Chengmo Yang (c5yang@cs.ucsd.edu). Office Hour: Mon/Thu 2-3 PM
Lecture: Mon/Wed. 6-8:50 PM, Center 109
Textbook: Computer Organization & Design: The Hardware/Software Interface, 2nd Edition. Authors: Patterson and Hennessy
Web-page: 2

3 Announcements
Reading Assignment: Advanced Topics, Sections 6.8 (Monday); Caches, Sections (Monday); Virtual Memory, Section (Wednesday)
Homework 6: Due Fri., July 30 during discussion. Cache: 7.7, 7.8, 7.9, 7.15, 7.16, 7.18, 7.20, 7.21; Virtual Memory: 7.32, 7.33
Quiz 6: When: Wednesday, July 28, first 10 minutes of the class. Topic: Caches, Chapter 7. Need: paper, pen
Final Exam: When: Sat., July 31, 7-10 PM, Center 101 (Note room change!) 3

4 CSE141 Course Schedule
1. Mon. 6/28, 6-8:50 PM, Center 109: Introduction, Ch. 1; ISA, Ch. 3
2. Wed. 6/30, 6-8:50 PM, Center 109: Performance, Ch. 2; Arithmetic, Ch. 4 (Quiz: ISA Ch. 3)
Mon. 7/5: No class, July 4th holiday
3. Wed. 7/7, 6-8:50 PM, Center 109: Arithmetic, Ch. 4 cont.; Single-cycle CPU, Ch. 5 (Quiz: Performance Ch. 2; Homework #1)
4. Mon. 7/12, 6-8:50 PM, Center 109: Single-cycle CPU, Ch. 5 cont.; Multi-cycle CPU, Ch. 5 (Quiz: Arithmetic Ch. 4; Homework #2)
5. Tue. 7/13, 7:30-8:50 PM, Center 109 (July 5th make-up class): Multi-cycle CPU, Ch. 5 cont.
6. Wed. 7/14, 6-8:50 PM, Center 109: Single- and multicycle CPU examples and review for midterm (Homework #3)
7. Mon. 7/19, 6-8:50 PM, Center 109: Mid-term Exam; Exceptions (Quiz: Single-cycle CPU Ch. 5)
8. Tue. 7/20, 7:30-8:50 PM, Center 109 (July 5th make-up class): Pipelining, Ch. 6 (Homework #4)
9. Wed. 7/21, 6-8:50 PM, Center 109: Hazards, Ch. 6
10. Mon. 7/26, 6-8:50 PM, Center 109: Memory Hierarchy & Caches, Ch. 7 (Quiz: Hazards Ch. 6; Homework #5)
11. Wed. 7/28, 6-8:50 PM, Center 109: Virtual Memory, Ch. 7; Course Review (Quiz: Cache Ch. 7; Homework #6)
Sat. 7/31, 7-10 PM: Final Exam 4

5 Advanced Techniques 5

6 Advanced Techniques
CPU time = Seconds/Program = (Instructions/Program) * (Cycles/Instruction) * (1/Clock Frequency)
Superpipelining: More pipeline stages. Operand forwarding becomes complicated. Branch penalty is high, so must use a branch prediction scheme. Enables running the clock at a higher frequency.
Superscalar: Multiple pipelines executing in parallel. Each pipeline may be dedicated to a particular task (integer, float, mem). Challenge is finding instructions to execute in parallel. Decreases CPI. 6
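The CPU time equation above can be checked with a quick sketch. The instruction count, CPI, and clock values below are hypothetical, chosen only to illustrate how a superscalar design that lowers CPI lowers CPU time at the same clock frequency:

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time = (Instructions/Program) * (Cycles/Instruction) * (1/Clock Freq)."""
    return instructions * cpi / clock_hz

# Hypothetical program: 1 billion instructions on a 1 GHz clock.
base = cpu_time(1e9, 2.0, 1e9)         # scalar pipeline, CPI = 2 -> 2.0 s
superscalar = cpu_time(1e9, 1.0, 1e9)  # superscalar halves CPI -> 1.0 s
print(base, superscalar)
```

Halving CPI halves CPU time here because the other two factors are held fixed, which is exactly the lever superscalar issue pulls.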

7 Superscalar MIPS Datapath
[datapath figure: PC, instruction memory, registers, two ALUs, sign extend units, data memory]
Up to two instructions issued per clock: one integer ALU instruction and one LD/ST 7

8 Superscalar Issues
Two instructions have to be fetched and decoded: 64 bits fetched at a given PC.
Additional ports are needed in the register file: total 4 read ports, 2 write ports in our example. Determined by the number of instructions issued in parallel.
Hardware resources have to be replicated: one ALU for the arithmetic operation, another for the MEM address. Additional data forwarding paths, control logic, ...
Problems: How to find multiple instructions to issue at run time? Compiler technology is needed to statically schedule instructions, which breaks binary compatibility.
How to deal with a stall between LD and an arithmetic instruction? The dependency on a load instruction cannot be resolved for multiple clocks; in this case, the next two instructions cannot use the load result w/o stalling. 8

9 Advanced Techniques: Dynamic Pipeline Scheduling
Dynamic pipelining: Execute instructions out-of-order to avoid pipeline hazards/stalls. A stalled instruction should not hold up other instructions. Retire instructions in program order (i.e., commit results in order). Decreases CPI.
Three major sections:
Instruction fetch and issue.
Execute units: each unit has a reservation station to hold operands and operations. Instructions are held in the reservation station until ready to execute.
Commit unit: the common approach is in-order completion. Must discard instructions that follow a mis-predicted branch. 9

10 Dynamically Scheduled Pipeline
Instruction fetch and decode unit (in-order issue) -> reservation stations -> functional units (Integer, Integer, Floating point, Load/Store; out-of-order execute) -> commit unit (in-order commit)
Very complex to design and verify 10

11 Memory Hierarchy 11

12 Memory Systems
Computer: Control, Datapath, Memory, Input, Output 12

13 Pipelined Design: Datapath and Control
[pipelined datapath figure: hazard detection unit, forwarding unit, control, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, instruction memory (I-MEM), data memory (D-MEM)]
Can an arbitrarily large amount of I-MEM and D-MEM be accessed in a single cycle? 13

14 Memory Hierarchy in Computer Systems
Processor (Datapath, Control, Registers) -> On-Chip Cache -> Second Level Cache (SRAM) -> Main Memory (DRAM) -> Secondary Storage (Disk) -> Tertiary Storage
Speed: 1 ns, 10s ns, 100s ns, 10s ms
Size (bytes): 100s, ~KBytes, ~MBytes, ~GBytes, ~Tera Bytes
Cycles (3 GHz): ... 10s of millions 14

15 Memory Subsystem Challenge
Conflicting goals to provide: the largest possible memory, at the fastest access time, with the lowest cost.
Processor speeds now exceed 3 GHz (0.3 ns cycle time). DRAM access times are still ~10s of ns. Serious memory access gap.
Every instruction has to be accessed from memory. ~15% of the instructions are load/store. 15

16 Static RAM Cell and Data Access
6-Transistor SRAM Cell: word (row select), bit, bit'
Write: 1. Drive bit lines to data. 2. Select row.
Read: 1. Precharge bit and bit' to Vdd. 2. Select row. 3. Cell pulls one line low. 4. Sense amp on the column detects the difference between bit and bit'.
Fast access, large area (6 transistors per cell) 16

17 Dynamic RAM (DRAM) Cell and Data Access
1-Transistor DRAM Cell: bit line, row select
Write: 1. Drive bit line to data. 2. Select row.
Read: 1. Precharge bit line to Vdd. 2. Select row. 3. Cell and bit line share charges (very small voltage changes on the bit line). 4. Sense the voltage difference (can detect changes of ~1 million electrons). 5. Write: restore the value.
Refresh: 1. Just do a dummy read to every cell.
Slow access, small area (1 transistor per cell). Needs periodic refresh. 17

18 Magnetic Disk
[figure: platters, tracks, sectors]
Average access time = Average seek time + Average rotational delay + Data transfer time + Disk controller overhead
Slow access (~ms), very large capacity (100s GB) 18
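The access-time sum above can be made concrete with a small calculation. All drive parameters below (seek time, RPM, transfer rate, overhead) are hypothetical illustration values, not from the slides:

```python
def disk_access_ms(seek_ms, rpm, transfer_mb_s, kb, overhead_ms):
    """Average disk access time in ms: seek + half a rotation + transfer + overhead."""
    rotational_ms = 0.5 * (60_000 / rpm)          # on average, wait half a rotation
    transfer_ms = kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotational_ms + transfer_ms + overhead_ms

# Hypothetical drive: 9 ms avg seek, 7200 RPM, 40 MB/s, 4 KB read, 1 ms controller overhead
print(round(disk_access_ms(9.0, 7200, 40.0, 4, 1.0), 2))  # 14.26 (ms)
```

Note how the mechanical terms (seek and rotation) dominate; the 4 KB transfer itself contributes under 0.1 ms.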

19 Caches 19

20 Who Cares about Memory Hierarchy?
[plot: processor vs. memory performance over time; the CPU-DRAM performance gap grows toward ~1000x]
Memory technology has not kept pace with processor performance. Memory access time is the performance bottleneck.

21 Memory Hierarchy and Locality
Memory locality is the principle that future memory accesses are near past accesses. There are two types of locality:
Temporal locality -- near in time: we will often access the same data again very soon.
Spatial locality -- near in space/distance: our next access is often very close in address to a recent access.
Type(s) of locality in the following address sequence? 1,2,3,4,7,8,9,10,8,8,4,8,9,8,10,8,8
The memory hierarchy is designed to take advantage of memory locality. Cache is implemented with SRAM (fast, expensive). Main memory is implemented with DRAM (cheap, slower). Storage is disk and tape (very slow, cheap, vast). 21
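One rough way to answer the question about the trace above is to count, per access, whether the address was seen before (temporal locality) or merely lies next to one that was (spatial locality). The +/-1 "nearby" threshold and the counting scheme are arbitrary choices for illustration, not from the slides:

```python
trace = [1, 2, 3, 4, 7, 8, 9, 10, 8, 8, 4, 8, 9, 8, 10, 8, 8]

seen = set()
temporal = spatial = 0
for addr in trace:
    if addr in seen:                                # exact repeat: temporal locality
        temporal += 1
    elif any(abs(addr - s) <= 1 for s in seen):     # adjacent address: spatial locality
        spatial += 1
    seen.add(addr)

print(temporal, spatial)  # 9 6
```

The trace shows both kinds: the sequential runs (1-4, 7-10) are spatial, and the repeated references to 8 are temporal.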

22 Memory Hierarchy (faster at the top, larger at the bottom)
Registers -- Operands (program/compiler, 1-8 bytes)
Cache -- Blocks (cache controller, bytes)
Memory -- Pages (OS, 512-4K bytes)
Disk -- Files (OS, bytes)
Tape 22

23 What is a Cache?
Dictionary meaning: a hiding place used especially for storing provisions.
A cache is a small amount of fast memory. Memory hierarchies exploit locality by caching (keeping close to the processor) data likely to be used again. It is impractical to build large, fast memories. Caches give an illusion of the fast access time (of an SRAM) with very large capacity (provided by a disk). 23

24 Locality and Caching
A cache is a small amount of fast memory. Memory hierarchies exploit locality by caching (keeping close to the processor) data likely to be used again. This is done because we can build large, slow memories and small, fast memories, but we can't build large, fast memories. If it works, we get the illusion of SRAM access time with disk capacity.
SRAM (static RAM) -- ns access time, very expensive
DRAM (dynamic RAM) -- ns, cheaper
disk -- access time measured in milliseconds, very cheap 24

25 Cache Terminology
Instruction cache: a cache that only holds instructions.
Data cache: a cache that only holds data.
Split cache: instruction and data caches are separate. Provides increased bandwidth from the cache. Hit rate is lower (than a unified cache). Wins over a unified cache due to higher bandwidth.
Unified cache: a cache that holds both instructions and data. Hit rate is higher. Bandwidth is lower (than that of the split cache). 25

26 Cache Terminology
Cache hit: an access where the data is found in the cache.
Cache miss: an access where the data is not found in the cache.
Hit time: time to access the cache.
Miss penalty: time to process a cache miss; move data from lower-level memory to the cache and CPU.
Hit ratio: % of the time the data is found in the cache.
Miss ratio: (1 - hit ratio).
Cache block size or cache line size: the amount of data that gets transferred on a cache miss.
Effective access time: (Hit Ratio * Hit Time) + (Miss Ratio * Miss Time) 26
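The effective access time formula above is worth plugging numbers into. The hit time, miss time, and hit ratio below are hypothetical illustration values:

```python
def effective_access_time(hit_ratio, hit_time_ns, miss_time_ns):
    """Effective access time = hit_ratio * hit_time + (1 - hit_ratio) * miss_time."""
    return hit_ratio * hit_time_ns + (1 - hit_ratio) * miss_time_ns

# Hypothetical numbers: 1 ns cache hit, 60 ns on a miss, 95% hit ratio.
print(round(effective_access_time(0.95, 1.0, 60.0), 2))  # 3.95 (ns)
```

Even a 5% miss ratio nearly quadruples the average access time over the pure hit time, which is why miss-rate reductions matter so much.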

27 Pipelined Design: I-Cache & D-Cache
[pipelined datapath figure, now with an I-Cache in front of the instruction memory and a D-Cache in front of the data memory]
How is the cache organized and managed? 27

28 How are Cache Entries Made?
[figure: cache contents a. before the reference to Xn and b. after the reference to Xn]
How is it determined whether the data for a given address is in the cache?
In case of a miss, where is the data corresponding to the new address stored? 28

29 A Direct-mapped Cache
If a data item is in the cache, how do we find it?
Cache location = (block address) modulo (Number of cache blocks in the cache)
In the following case: Number of cache blocks in the cache = 8
[figure: 8-block cache with memory blocks mapped onto cache blocks] 29
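The modulo mapping above is a one-liner, sketched here for the 8-block cache from the slide (the sample block addresses are hypothetical):

```python
NUM_BLOCKS = 8  # number of cache blocks, as on the slide

def cache_location(block_address):
    """Direct-mapped placement: block address modulo number of cache blocks."""
    return block_address % NUM_BLOCKS

# Memory blocks 1 and 9 collide (both map to cache block 1), as do 5 and 13.
print([cache_location(b) for b in (1, 5, 9, 13)])  # [1, 5, 1, 5]
```

The collisions shown are exactly the conflict behavior the later slides on associativity address.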

30 A Direct-mapped Cache, contd.
Address trace:
An index is used to determine which line an address might be found in within the cache. The tag identifies a portion of the address of the cached data. The valid bit indicates that the entry is valid.
[figure: tag | v | data]
4 entries, each block holds one word, each word in memory maps to exactly one cache location. A cache that can put a line of data in exactly one place is called a direct-mapped cache. 30

31 How is a Block Found in the Cache?
[figure: address (showing bit positions) split into tag, index, and byte offset; valid | tag | data array; tag compare produces hit]
A 4 Kbyte direct-mapped cache with 1-word (4-byte) blocks:
Number of blocks = Cache Size/(Block Size) = 1 K
Index bits = log2(Number of blocks) = 10 bits
Tag bits = (Total address bits - Index bits - Byte offset bits) = 32 - 10 - 2 = 20 31

32 Handling a Cache Read Miss
[pipelined datapath figure with I-Cache and D-Cache, I-Cache miss logic, I/O controller]
A mismatch on the tag and/or valid bit indicates a miss. Stall the CPU. Make a read request to memory (via the memory controller). When memory returns the data, write it into the cache. Return the data to the CPU. 32

33 Handling a Cache Write Miss
[pipelined datapath figure with I-Cache and D-Cache, I-Cache miss logic, I/O controller]
A mismatch on the tag and/or valid bit indicates a miss. Write the tag, valid bit, and data into the cache. Works only if block size = word size. Should the data be written to memory also? 33

34 Dealing with Stores
Stores must be handled differently than loads, because...
They don't necessarily require the CPU to stall: stores don't produce register values used by other instructions.
They change the content of cache/memory (creating memory consistency issues).
Policy decisions for stores:
write-through => all writes go to both cache and main memory
write-back => writes go only to cache. Modified cache lines are written back to memory when the line is replaced.
write-allocate => on a store miss, bring the written line into the cache
no-write-allocate => write to main memory, and ignore the cache 34

35 How to Ensure Memory Consistency?
Write-through Cache: Write data to the cache as well as to the lower-level cache/memory. This incurs a performance penalty. Use a write buffer so the CPU can proceed with the following instructions. What about burst writes? Use multiple entries in the write buffer.
Write-back Cache: Write cache data to memory when it is about to be overwritten for another address.
Write-allocate: On a write miss, bring the written line into the cache.
No-write-allocate: On a write miss, write to main memory, and ignore the cache. 35

36 Handling a Cache Write Miss
Write-through Cache: Write data to the cache as well as to memory. Don't need to consider whether the write hits or misses the cache. Disadvantage: every write causes the data to be written to main memory. Use a write buffer so the CPU can proceed with the following instructions.
Write-back Cache: When a write occurs, write the new value only to the block in the cache. Write cache data to memory when it is about to be overwritten for another address. Improves performance over a write-through cache. More complex to implement. 36

37 Summary for Stores
On a store hit, write the new data to the cache. In a write-through cache, write the data immediately to memory. In a write-back cache, mark the line as dirty.
On a store miss, initiate a cache block load from memory for a write-allocate cache. Write directly to memory for a no-write-allocate cache.
On any kind of cache miss in a write-back cache, if the line to be replaced in the cache is dirty, write it back to memory. 37
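The store policies summarized above can be sketched with a toy cache model. This is a hypothetical illustration class (dict-based "memories", single-word blocks, no replacement or dirty write-back of evicted lines), not the design from the slides:

```python
class Cache:
    """Toy model of store handling under the four write policies."""

    def __init__(self, write_back, write_allocate):
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = {}   # address -> (value, dirty bit)
        self.memory = {}  # backing store

    def store(self, addr, value):
        if addr in self.lines or self.write_allocate:
            # Store hit (or allocate-on-miss): update the cache line.
            dirty = self.write_back            # write-back marks the line dirty
            self.lines[addr] = (value, dirty)
            if not self.write_back:            # write-through also updates memory
                self.memory[addr] = value
        else:
            # no-write-allocate miss: bypass the cache, write memory directly.
            self.memory[addr] = value

wt = Cache(write_back=False, write_allocate=False)
wt.store(0x100, 42)
print(wt.memory[0x100], 0x100 in wt.lines)  # 42 False  (memory updated, cache bypassed)

wb = Cache(write_back=True, write_allocate=True)
wb.store(0x100, 42)
print(wb.lines[0x100])                       # (42, True)  (dirty line, memory untouched)
```

The dirty bit is what defers the memory update in the write-back case: memory is only made consistent when the line is eventually evicted.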

38 Taking Advantage of Spatial Locality
Consider the following address trace: 0,1,2,3,17,8,9,10,11,17,4,5,6,7. Notice that the addresses lie in the vicinity of each other.
Instructions show a high degree of spatial locality: typically accessed sequentially; generally, code consists of a lot of loops.
Data also shows spatial locality: typically less than that of instructions; different elements of a structure may be accessed.
Why not bring multiple words into the cache on a miss, instead of bringing a single one? 38

39 Spatial Locality: Larger Cache Blocks
address string:
[figure: tag | data, two words per block]
4 entries, each block holds two words, each word in memory maps to exactly one cache location (this cache is twice the total size of the prior caches).
Large cache blocks take advantage of spatial locality. Too large a block size can waste cache space. Larger cache blocks require less tag space. 39

40 A 64 KB Cache using 16-byte Blocks
[figure: address (showing bit positions) split into tag (16 bits), index, block offset, and byte offset; V | Tag | Data (128 bits) array of 4K entries; tag compare produces hit]

41 Complication with Larger Blocks
Write-through cache: can't write to the cache while performing a tag comparison.
OK if there is a hit in the cache.
Not OK if there is a cache miss: the block has to be fetched from memory and placed in the cache, then the word that caused the miss is rewritten into the cache. 41

42 Impact of Block Size on Miss Rate
[plot: miss rate (0%-40%) vs. block size (bytes), one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB, 256 KB]
In general, a larger block decreases the miss rate; however, a larger block size means a larger miss penalty (it takes longer to fill up the block), and miss rates go up if the block size is too big relative to the cache size, since there are then too few cache blocks. 42

43 Cache Performance
64 KB each instruction cache and data cache (direct mapped):

Program | Block size (words) | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1 | 6.1% | 2.1% | 5.4%
gcc     | 4 | 2.0% | 1.7% | 1.9%
spice   | 1 | 1.2% | 1.3% | 1.2%
spice   | 4 | 0.3% | 0.6% | 0.4%

In general, Average Access Time = [Hit Time * (1 - Miss Rate)] + [Miss Penalty * Miss Rate]
Limitation of a direct mapped cache: a block can go in exactly one place in the cache. 43

44 Flexible Placement of Blocks
Direct mapped cache: A block can go in exactly one place in the cache. Leads to collisions among blocks.
Fully associative cache: A block can go in any place in the cache. All addresses have to be compared simultaneously. Slow and expensive.
N-way set-associative cache: Consists of a number of sets; each set consists of N blocks. Each block in memory maps to a unique set; a block can be placed in any element of the set.
Set containing a memory block = (block number) modulo (Number of sets in the cache)
Number of sets in the cache = Cache size/[(Block size)*(Associativity)] 44
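The two formulas above translate directly to code. This sketch uses the 4 KB, 4-byte-block, 4-way configuration worked on a later slide; the sample block number is hypothetical:

```python
def num_sets(cache_size, block_size, associativity):
    """Number of sets = cache size / (block size * associativity)."""
    return cache_size // (block_size * associativity)

def set_index(block_number, sets):
    """Set containing a memory block = block number modulo number of sets."""
    return block_number % sets

# 4 KB cache, 4-byte blocks, 4-way set-associative -> 256 sets.
sets = num_sets(4096, 4, 4)
print(sets, set_index(1000, sets))  # 256 232
```

A direct-mapped cache is just the associativity-1 case, and a fully associative cache the one-set case, so the same two formulas cover the whole placement spectrum.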

45 Locating a Block in a Cache
[figure: block address split for direct mapped (block #), set-associative (set #), and fully associative placement; the tag/data search covers one block, one set, or all blocks respectively] 45

46 Cache Configurations
An eight-block cache with various configurations:
One-way set associative (direct mapped): 8 blocks, each with a Tag and Data
Two-way set associative: 4 sets, each with two (Tag, Data) pairs
Four-way set associative: 2 sets, each with four (Tag, Data) pairs
Eight-way set associative (fully associative): one set of eight (Tag, Data) pairs 46

47 Accessing a 4-way Set-associative Cache
Number of blocks = Cache Size/Block Size = 4 Kbytes/4 Bytes = 1 K blocks
Number of sets = (# Blocks)/Associativity = 1 K/4 = 256
Index bits = log2(# Sets) = log2(256) = 8
[figure: index selects one entry (V | Tag | Data) from each of the four ways; a 4-to-1 multiplexor selects the hit data]
4 Kbyte 4-way set-associative cache, with a block size of 4 bytes 47

48 Accessing a Direct Mapped Cache
64 KB cache, direct-mapped, 32-byte cache block size
64 KB / 32 bytes = 2 K cache blocks/sets
[figure: address split into tag, index (11 bits), and word offset; valid | tag | data (256 bits) array of 2K entries; tag compare (=) produces hit/miss] 48

49 Accessing a Set-associative Cache
32 KB cache, 2-way set-associative, 16-byte block size
32 KB / 16 bytes / 2 = 1 K cache sets
[figure: address split into tag, index (10 bits), and word offset; two ways of valid | tag | data; two comparators (=) produce hit/miss] 49

50 A Fully-associative Cache
address string: ?
The tag identifies the address of the cached data. The valid bit indicates that the entry is valid.
[figure: tag | v | data]
4 entries, each block holds one word, any block can hold any word. A cache that can put a block of data anywhere is called fully associative. To access the cache, the address must be compared with all the entries in the cache. 50

51 Cache Organization
A typical cache has three dimensions:
Number of sets (cache size)
Blocks/set (associativity)
Bytes/block (block size)
[figure: grid of tag | data entries, sets down, ways across]
Address split: tag | index | block offset 51
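The tag | index | block offset split above follows directly from two of the three dimensions. This is a sketch for 32-bit addresses; the helper name and the sample address are hypothetical, and the set count and block size are assumed powers of two:

```python
def split_address(addr, sets, block_size):
    """Split an address into (tag, index, block offset) for a given cache shape."""
    offset_bits = block_size.bit_length() - 1  # log2(bytes per block)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Hypothetical address, with 1K sets and 16-byte blocks (the 32 KB 2-way shape).
print(split_address(0x12345678, 1024, 16))  # (18641, 359, 8)
```

Note that associativity never appears in the split: it only changes how many ways are searched once the index selects a set.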

52 The Three Cs
Compulsory misses: Caused by the first access to a block that has never been in the cache. Also called cold-start misses. Can be reduced by increasing the block size.
Capacity misses: Caused when the cache cannot contain all the blocks needed. Occur because of blocks being replaced and later retrieved. Can be reduced by enlarging the cache.
Conflict misses: Occur in direct mapped and set-associative caches when multiple blocks compete for the same set. Can be eliminated by using a fully associative cache. 52

53 Which Block Should be Replaced on a Miss?
Direct mapped is easy. Set associative or fully associative: Random (large associativities) or LRU (smaller associativities).
Miss rates for the two schemes:

Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random
16 KB  | 5.18% | 5.69% | 4.67% | 5.29% | 4.39% | 4.96%
64 KB  | 1.88% | 2.01% | 1.54% | 1.66% | 1.39% | 1.53%
256 KB | 1.15% | 1.17% | 1.13% | 1.13% | 1.12% | 1.12%

LRU is the preferred scheme for a small cache. 53
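LRU replacement within one set can be sketched with an ordered dictionary that keeps the most recently used tag at the end. This is a hypothetical illustration class, not hardware from the slides (real caches approximate LRU with a few status bits per set):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement among its ways."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()  # insertion/access order tracks recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the least recently used tag."""
        if tag in self.tags:
            self.tags.move_to_end(tag)       # refresh recency on a hit
            return True
        if len(self.tags) >= self.ways:
            self.tags.popitem(last=False)    # evict the LRU tag (front of dict)
        self.tags[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in ("A", "B", "A", "C", "B")])
# [False, False, True, False, False] -- C evicts B (LRU), then B evicts A
```

The final miss on B is a conflict miss that a third way would have avoided, which is the associativity/miss-rate trade-off shown in the table above.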

54 Associative Caches: Higher Hit Rates, but...
Longer access time (longer to determine hit/miss, more muxing of outputs).
More space (longer tags).
16 KB, 16-byte blocks, direct mapped, tag = ?
16 KB, 16-byte blocks, 4-way, tag = ? 54
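The two "tag = ?" questions above can be worked out directly, assuming 32-bit addresses (a hypothetical helper, following the bit-count formulas from the earlier slides):

```python
from math import log2

def tag_bits(addr_bits, cache_size, block_size, ways):
    """Tag bits = address bits - index bits - block offset bits."""
    sets = cache_size // (block_size * ways)
    index_bits = int(log2(sets))
    offset_bits = int(log2(block_size))
    return addr_bits - index_bits - offset_bits

# 16 KB, 16-byte blocks, direct mapped: 1024 sets -> 10 index + 4 offset bits.
print(tag_bits(32, 16 * 1024, 16, 1))  # 18
# 16 KB, 16-byte blocks, 4-way: 256 sets -> 8 index + 4 offset bits.
print(tag_bits(32, 16 * 1024, 16, 4))  # 20
```

Going 4-way shrinks the index by 2 bits and grows every tag by 2 bits, which is the "more space (longer tags)" cost listed above.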

55 Summary
The Principle of Locality: a program is likely to access a relatively small portion of the address space at any instant of time. Temporal Locality: locality in time. Spatial Locality: locality in space.
Three Major Categories of Cache Misses:
Compulsory Misses: sad facts of life. Example: cold start misses.
Conflict Misses: increase cache size and/or associativity. Nightmare scenario: ping-pong effect!
Capacity Misses: increase cache size.
Cache Design Space: total size, block size, associativity; replacement policy; write-hit policy (write-through, write-back).
Caches give an illusion of a large, cheap memory with the access time of a fast, expensive memory. 55

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation EXAINATIONS 2003 COP203 END-YEAR Compter Organisation Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. There are 180 possible marks on the eam. Calclators and foreign langage dictionaries

More information

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction Review Friday the 2st of October Real world eamples of pipelining? How does pipelining pp inflence instrction latency? How does pipelining inflence instrction throghpt? What are the three types of hazard

More information

Pipelining. Chapter 4

Pipelining. Chapter 4 Pipelining Chapter 4 ake processor rns faster Pipelining is an implementation techniqe in which mltiple instrctions are overlapped in eection Key of making processor fast Pipelining Single cycle path we

More information

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION EXAINATIONS 2010 END OF YEAR COPUTER ORGANIZATION Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. ake sre yor answers are clear and to the point. Calclators and paper foreign langage

More information

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts CS359: Compter Architectre Chapter 3 & Appendi C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Compter Science and Engineering Shanghai Jiao Tong University 1 Otline Introdction

More information

What do we have so far? Multi-Cycle Datapath

What do we have so far? Multi-Cycle Datapath What do we have so far? lti-cycle Datapath CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instrction being processed in datapath How to lower CPI frther? #1 Lec # 8 Spring2 4-11-2 Pipelining pipelining

More information

The single-cycle design from last time

The single-cycle design from last time lticycle path Last time we saw a single-cycle path and control nit for or simple IPS-based instrction set. A mlticycle processor fies some shortcomings in the single-cycle CPU. Faster instrctions are not

More information

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read.

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read. The final path PC 4 Add Reg Shift left 2 Add PCSrc Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtor RegDst ALUSrc em I [5

More information

Enhanced Performance with Pipelining

Enhanced Performance with Pipelining Chapter 6 Enhanced Performance with Pipelining Note: The slides being presented represent a mi. Some are created by ark Franklin, Washington University in St. Lois, Dept. of CSE. any are taken from the

More information

Review Multicycle: What is Happening. Controlling The Multicycle Design

Review Multicycle: What is Happening. Controlling The Multicycle Design Review lticycle: What is Happening Reslt Zero Op SrcA SrcB Registers Reg Address emory em Data Sign etend Shift left Sorce A B Ot [-6] [5-] [-6] [5-] [5-] Instrction emory IR RegDst emtoreg IorD em em

More information

Exceptions and interrupts

Exceptions and interrupts Eceptions and interrpts An eception or interrpt is an nepected event that reqires the CPU to pase or stop the crrent program. Eception handling is the hardware analog of error handling in software. Classes

More information

Chapter 6: Pipelining

Chapter 6: Pipelining CSE 322 COPUTER ARCHITECTURE II Chapter 6: Pipelining Chapter 6: Pipelining Febrary 10, 2000 1 Clothes Washing CSE 322 COPUTER ARCHITECTURE II The Assembly Line Accmlate dirty clothes in hamper Place in

More information

Review: Computer Organization

Review: Computer Organization Review: Compter Organization Pipelining Chans Y Landry Eample Landry Eample Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 3 mintes A B C D Dryer takes 3 mintes

More information

The extra single-cycle adders

The extra single-cycle adders lticycle Datapath As an added bons, we can eliminate some of the etra hardware from the single-cycle path. We will restrict orselves to sing each fnctional nit once per cycle, jst like before. Bt since

More information

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM Lectre (Wed /5/28) Lab # Hardware De Fri Oct 7 HW #2 IPS programming, de Wed Oct 22 idterm Fri Oct 2 IorD The mlticycle path SrcA Today s objectives: icroprogramming Etending the mlti-cycle path lti-cycle

More information

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code:

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code: EE8 Winter 25 Homework #2 Soltions De Thrsday, Feb 2, 5 P. ( points) Consider the following fragment of Java code: for (i=; i

More information

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour CSE 4 Computer Architecture Spring 25 Lectures 7 Virtual Memory Pramod V. Argade May 25, 25 Announcements Office Hour Monday, June 6th: 6:3-8 PM, AP&M 528 Instead of regular Monday office hour 5-6 PM Reading

More information

1048: Computer Organization

1048: Computer Organization 8: Compter Organization Lectre 6 Pipelining Lectre6 - pipelining (cwli@twins.ee.nct.ed.tw) 6- Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards

More information

Review. A single-cycle MIPS processor

Review. A single-cycle MIPS processor Review If three instrctions have opcodes, 7 and 5 are they all of the same type? If we were to add an instrction to IPS of the form OD $t, $t2, $t3, which performs $t = $t2 OD $t3, what wold be its opcode?

More information

CS 251, Winter 2018, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark CS 25, Winter 28, Assignment 4.. 3% of corse mark De Wednesday, arch 7th, 4:3P Lates accepted ntil Thrsday arch 8th, am with a 5% penalty. (6 points) In the diagram below, the mlticycle compter from the

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2019, Assignment % of course mark CS 25, Winter 29, Assignment.. 3% of corse mark De Wednesday, arch 3th, 5:3P Lates accepted ntil Thrsday arch th, pm with a 5% penalty. (7 points) In the diagram below, the mlticycle compter from the corse

More information

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

Instruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion

Instruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion . (Chapter 5) Fill in the vales for SrcA, SrcB, IorD, Dst and emto to complete the Finite State achine for the mlti-cycle datapath shown below. emory address comptation 2 SrcA = SrcB = Op = fetch em SrcA

More information

Chapter 6: Pipelining

Chapter 6: Pipelining Chapter 6: Pipelining Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards and stalls Branch hazards Eceptions Sperscalar and dynamic pipelining

More information

Overview of Pipelining

Overview of Pipelining EEC 58 Compter Architectre Pipelining Department of Electrical Engineering and Compter Science Cleveland State University Fndamental Principles Overview of Pipelining Pipelined Design otivation: Increase

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 483 Compter Organization Chapter 4.4 A Simple Implementation Scheme Chans Y The Big Pictre The Five Classic Components of a Compter Processor Control emory Inpt path Otpt path & Control 2 path and

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind Pipelining hink of sing machines in landry services Chapter 6 nhancing Performance with Pipelining 6 P 7 8 9 A ime ask A B C ot pipelined Assme 3 min. each task wash, dry, fold, store and that separate

More information

CS 153 Design of Operating Systems

CS 153 Design of Operating Systems CS 153 Design of Operating Systems Spring 18 Lectre 18: Memory Hierarchy Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian Some slides modified from originals

More information

CPE 631 Lecture 04: CPU Caches

CPE 631 Lecture 04: CPU Caches Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR

More information

Caches. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)

Caches. Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes) Caches Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H 5.1, 5.2 (except writes) Big Picture: Memory Memory: big & slow vs Caches: small &

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

ECE ECE4680

ECE ECE4680 ECE468. -4-7 The otivation for s System ECE468 Computer Organization and Architecture DRA Hierarchy System otivation Large memories (DRA) are slow Small memories (SRA) are fast ake the average access time

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Caches and Memory. Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.13, 5.15, 5.17

Caches and Memory. Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.13, 5.15, 5.17 Caches and emory Anne Bracy CS 34 Computer Science Cornell University Slides by Anne Bracy with 34 slides by Professors Weatherspoon, Bala, ckee, and Sirer. See P&H Chapter: 5.-5.4, 5.8, 5., 5.3, 5.5,

More information

CS 153 Design of Operating Systems Spring 18

CS 153 Design of Operating Systems Spring 18 CS 153 Design of Operating Systems Spring 18 Lectre 15: Virtal Address Space Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian OS Abstractions Applications

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Caches. See P&H 5.1, 5.2 (except writes) Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University

Caches. See P&H 5.1, 5.2 (except writes) Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University s See P&.,. (except writes) akim Weatherspoon CS, Spring Computer Science Cornell University What will you do over Spring Break? A) Relax B) ead home C) ead to a warm destination D) Stay in (frigid) Ithaca

More information

EECS 322 Computer Architecture Improving Memory Access: the Cache

EECS 322 Computer Architecture Improving Memory Access: the Cache EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow

More information

CS 153 Design of Operating Systems

CS 153 Design of Operating Systems CS 53 Design of Operating Systems Spring 8 Lectre 6: Paging Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian Some slides modified from originals by Dave

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

EEC 483 Computer Organization. Branch (Control) Hazards

EEC 483 Computer Organization. Branch (Control) Hazards EEC 483 Compter Organization Section 4.8 Branch Hazards Section 4.9 Exceptions Chans Y Branch (Control) Hazards While execting a previos branch, next instrction address might not yet be known. s n i o

More information

PS Midterm 2. Pipelining

PS Midterm 2. Pipelining PS idterm 2 Pipelining Seqential Landry 6 P 7 8 9 idnight Time T a s k O r d e r A B C D 3 4 2 3 4 2 3 4 2 3 4 2 Seqential landry takes 6 hors for 4 loads If they learned pipelining, how long wold landry

More information
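The laundry excerpt above states that sequential laundry takes 6 hours for 4 loads and asks how long a pipelined version would take. A minimal sketch of that arithmetic, assuming the 30/40/20-minute wash/dry/fold stage times of the classic Patterson & Hennessy example (the stage times are an assumption, not taken verbatim from the listing):

```python
# Hypothetical stage times (minutes), per the classic laundry example.
WASH, DRY, FOLD = 30, 40, 20

def sequential_minutes(loads):
    """Each load runs start-to-finish before the next begins."""
    return loads * (WASH + DRY + FOLD)

def pipelined_minutes(loads):
    # First load takes the full 90 min; each later load finishes one
    # bottleneck-stage time (the slowest stage, 40 min) after the previous.
    return (WASH + DRY + FOLD) + (loads - 1) * max(WASH, DRY, FOLD)

print(sequential_minutes(4) / 60)  # 6.0 hours, matching the excerpt
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The takeaway mirrors processor pipelining: throughput is set by the slowest stage, while the latency of a single load is unchanged.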

CS 153 Design of Operating Systems

CS 153 Design of Operating Systems CS 53 Design of Operating Systems Spring 8 Lectre 9: Locality and Cache Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian Some slides modified from originals

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

CS 153 Design of Operating Systems Spring 18

CS 153 Design of Operating Systems Spring 18 CS 53 Design of Operating Systems Spring 8 Lectre 2: Virtal Memory Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian Recap: cache Well-written programs exhibit

More information

Course Administration

Course Administration Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 83 Compter Organization Chapter.6 A Pipelined path Chans Y Pipelined Approach 2 - Cycle time, No. stages - Resorce conflict E E A B C D 3 E E 5 E 2 3 5 2 6 7 8 9 c.y9@csohio.ed Resorces sed in 5 Stages

More information

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses

More information

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University Compter Architectre Chapter 5 Fall 25 Department of Compter Science Kent State University The Processor: Datapath & Control Or implementation of the MIPS is simplified memory-reference instrctions: lw,

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory 1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Lecture 7. Building A Simple Processor

Lecture 7. Building A Simple Processor Lectre 7 Bilding A Simple Processor Christos Kozyrakis Stanford University http://eeclass.stanford.ed/ee8b C. Kozyrakis EE8b Lectre 7 Annoncements Upcoming deadlines Lab is de today Demo by 5pm, report

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Quiz #1 EEC 483, Spring 2019

Quiz #1 EEC 483, Spring 2019 Qiz # EEC 483, Spring 29 Date: Jan 22 Name: Eercise #: Translate the following instrction in C into IPS code. Eercise #2: Translate the following instrction in C into IPS code. Hint: operand C is stored

More information

1048: Computer Organization

1048: Computer Organization 48: Compter Organization Lectre 5 Datapath and Control Lectre5A - simple implementation (cwli@twins.ee.nct.ed.tw) 5A- Introdction In this lectre, we will try to implement simplified IPS which contain emory

More information

Multi-cycle Datapath (Our Version)

Multi-cycle Datapath (Our Version) ulti-cycle Datapath (Our Version) npc_sel Next PC PC Instruction Fetch IR File Operand Fetch A B ExtOp ALUSrc ALUctr Ext ALU R emrd emwr em Access emto Data em Dst Wr. File isters added: IR: Instruction

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Caches and Memory Deniz Altinbuken CS 3410, Spring 2015

Caches and Memory Deniz Altinbuken CS 3410, Spring 2015 s and emory Deniz Altinbuken CS, Spring Computer Science Cornell University See P& Chapter:.-. (except writes) Big Picture: emory Code Stored in emory (also, data and stack) compute jump/branch targets

More information

Memory Hierarchy: Caches, Virtual Memory

Memory Hierarchy: Caches, Virtual Memory Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

COSC4201. Chapter 5. Memory Hierarchy Design. Prof. Mokhtar Aboelaze York University

COSC4201. Chapter 5. Memory Hierarchy Design. Prof. Mokhtar Aboelaze York University COSC4201 Chapter 5 Memory Hierarchy Design Prof. Mokhtar Aboelaze York University 1 Memory Hierarchy The gap between CPU performance and main memory has been widening with higher performance CPUs creating

More information

Lecture 13: Exceptions and Interrupts

Lecture 13: Exceptions and Interrupts 18 447 Lectre 13: Eceptions and Interrpts S 10 L13 1 James C. Hoe Dept of ECE, CU arch 1, 2010 Annoncements: Handots: Spring break is almost here Check grades on Blackboard idterm 1 graded Handot #9: Lab

More information

Memory Hierarchy: The motivation

Memory Hierarchy: The motivation Memory Hierarchy: The motivation The gap between CPU performance and main memory has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Computer Architecture

Computer Architecture Compter Architectre Lectre 4: Intro to icroarchitectre: Single- Cycle Dr. Ahmed Sallam Sez Canal University Based on original slides by Prof. Onr tl Review Compter Architectre Today and Basics (Lectres

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Computer Architecture. Lecture 6: Pipelining

Computer Architecture. Lecture 6: Pipelining Compter Architectre Lectre 6: Pipelining Dr. Ahmed Sallam Based on original slides by Prof. Onr tl Agenda for Today & Net Few Lectres Single-cycle icroarchitectres lti-cycle and icroprogrammed icroarchitectres

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Memory Hierarchy: Motivation

Memory Hierarchy: Motivation Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Solutions for Chapter 6 Exercises

Solutions for Chapter 6 Exercises Soltions for Chapter 6 Eercises Soltions for Chapter 6 Eercises 6. 6.2 a. Shortening the ALU operation will not affect the speedp obtained from pipelining. It wold not affect the clock cycle. b. If the

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

PART I: Adding Instructions to the Datapath. (2 nd Edition):

PART I: Adding Instructions to the Datapath. (2 nd Edition): EE57 Instrctor: G. Pvvada ===================================================================== Homework #5b De: check on the blackboard =====================================================================

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information
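The "Measuring Cache Performance" excerpt above breaks off mid-formula. As a minimal sketch of the standard relations it alludes to (all numbers below are illustrative, not taken from the listed slides): AMAT = hit time + miss rate × miss penalty, and CPU time = (CPU execution cycles + memory-stall cycles) × cycle time:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(instr_count, base_cpi, mem_refs_per_instr,
             miss_rate, miss_penalty, cycle_time):
    """CPU time including memory-stall cycles (write stalls ignored)."""
    stall_cycles = instr_count * mem_refs_per_instr * miss_rate * miss_penalty
    return (instr_count * base_cpi + stall_cycles) * cycle_time

# e.g. a 1-cycle hit, 5% miss rate, 100-cycle miss penalty:
print(amat(1, 0.05, 100))  # 6.0 cycles
```

Note how a small miss rate still dominates: 95% of accesses cost 1 cycle, yet the average is 6.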

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information
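The locality excerpt above can be made concrete with a small sketch of how a direct-mapped cache splits a byte address into tag, index, and offset. The geometry here (16-byte blocks, 64 blocks, i.e. a 1 KiB cache) is hypothetical, chosen only for illustration:

```python
# Hypothetical direct-mapped cache geometry.
BLOCK_SIZE = 16   # bytes per block -> 4 offset bits
NUM_BLOCKS = 64   # blocks in cache -> 6 index bits (1 KiB total)

def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_BLOCKS
    tag = addr // (BLOCK_SIZE * NUM_BLOCKS)
    return tag, index, offset

# Spatial locality: neighbouring addresses share a block (same tag+index),
# so one miss brings in data that nearby future references will hit on.
# Temporal locality: re-referencing an address hits the same cache line.
print(split_address(0x12345))  # (72, 52, 5)
```

The index picks the one cache line an address can live in; the stored tag is compared on each access to decide hit or miss.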

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information