Instruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion


Henry Evans
5 years ago

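1. (Chapter 5) Fill in the values for ALUSrcA, ALUSrcB, IorD, RegDst and MemtoReg to complete the finite state machine for the multi-cycle datapath shown below.

[Figure: finite state machine for the multi-cycle control. The values of ALUSrcA, ALUSrcB, IorD, RegDst and MemtoReg are left blank. States and their signals:
State 0, Instruction fetch (Start): MemRead, ALUSrcA = , IorD = , IRWrite, ALUSrcB = , ALUOp = 00, PCWrite, PCSource = 00
State 1, Instruction decode/register fetch: ALUSrcA = , ALUSrcB = , ALUOp = 00
State 2, Memory address computation, entered on (Op = 'LW') or (Op = 'SW'): ALUSrcA = , ALUSrcB = , ALUOp = 00
State 3, Memory access, entered on (Op = 'LW'): MemRead, IorD =
State 4, Write-back step: RegDst = , RegWrite, MemtoReg =
State 5, Memory access, entered on (Op = 'SW'): MemWrite, IorD =
State 6, Execution, entered on (Op = R-type): ALUSrcA = , ALUSrcB = , ALUOp = 10
State 7, R-type completion: RegDst = , RegWrite, MemtoReg =
State 8, Branch completion, entered on (Op = 'BEQ'): ALUSrcA = , ALUSrcB = , ALUOp = 01, PCWriteCond, PCSource = 01
State 9, Jump completion, entered on (Op = 'J'): PCWrite, PCSource = 10]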

[Figure: multi-cycle datapath. Control signals: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemtoReg. Components: PC, memory (Address, MemData, Write data), instruction register (fields [25-21], [20-16], [15-0], [15-11]), memory data register, registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), A, B, sign extend (16 to 32), shift left 2, ALU (Zero, ALU result), ALUOut, ALU control.]

Summary of the steps taken to execute each instruction class:

Step: Instruction fetch
  All instructions: IR <= Memory[PC]
                    PC <= PC + 4

Step: Instruction decode/register fetch
  All instructions: A <= Reg[IR[25:21]]
                    B <= Reg[IR[20:16]]
                    ALUOut <= PC + (sign-extend(IR[15:0]) << 2)

Step: Execution, address computation, branch/jump completion
  R-type: ALUOut <= A op B
  Memory reference: ALUOut <= A + sign-extend(IR[15:0])
  Branches: if (A == B) then PC <= ALUOut
  Jumps: PC <= {PC[31:28], (IR[25:0], 2'b00)}

Step: Memory access or R-type completion
  R-type: Reg[IR[15:11]] <= ALUOut
  Memory reference: Load: MDR <= Memory[ALUOut]
                    or Store: Memory[ALUOut] <= B

Step: Memory read completion
  Memory reference: Load: Reg[IR[20:16]] <= MDR

Answers: See the corresponding finite state machine figure in the textbook. Please note that there are two errors in state 4 in that figure. Specifically, RegDst should be 0 and MemtoReg should be 1. Please refer to Figure 5.33 for the correct control signal assignment in state 4.
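The state machine can also be written down as a small table and stepped through in code. The following is a minimal sketch in Python, not the official answer key: the control values are filled in on the assumption that the design is the standard textbook multi-cycle control (with the corrected state 4 noted above), and the names CONTROL, next_state and states_visited are hypothetical helpers introduced only for illustration.

```python
# Control signals asserted in each state of the multi-cycle FSM.
# ASSUMPTION: values follow the standard textbook design; they are not
# taken from this document, which only points to the textbook figure.
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": "01", "ALUOp": "00", "PCWrite": 1, "PCSource": "00"},
    1: {"ALUSrcA": 0, "ALUSrcB": "11", "ALUOp": "00"},
    2: {"ALUSrcA": 1, "ALUSrcB": "10", "ALUOp": "00"},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},  # corrected state 4
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": "00", "ALUOp": "10"},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": "00", "ALUOp": "01",
        "PCWriteCond": 1, "PCSource": "01"},
    9: {"PCWrite": 1, "PCSource": "10"},
}


def next_state(state, op):
    """Return the next FSM state; op is one of 'LW', 'SW', 'R', 'BEQ', 'J'."""
    if state == 0:
        return 1
    if state == 1:
        return {"LW": 2, "SW": 2, "R": 6, "BEQ": 8, "J": 9}[op]
    if state == 2:
        return 3 if op == "LW" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8 and 9 go back to instruction fetch


def states_visited(op):
    """List the states one instruction of class op passes through."""
    path, state = [0], 0
    while True:
        state = next_state(state, op)
        if state == 0:
            return path
        path.append(state)


if __name__ == "__main__":
    for op in ("R", "LW", "SW", "BEQ", "J"):
        path = states_visited(op)
        print(op, path, "cycles:", len(path))
```

Each state takes one clock cycle, so the length of the path is the cycle count of that instruction class under this sketch.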

2. (Chapter 6) Consider the pipeline datapath below, and the two instructions, add $rd, $rs, $rt and lw $rt, offset($rs).
(a) Give a set of code sequences that only consist of lw and add, in which the pipeline stall can be resolved by forwarding. Justify your answer.
(b) What is the ForwardA value (either 01 or 10), if the third instruction of the code sequence
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
is at its execution step? Justify your answer.
(c) Modify the register numbers used in the code in (b) so that ForwardA becomes the alternative value. Note that the ultimate result in $1 after the execution of the code sequence must remain the same. (Hint: Introduce an additional register, such as $5.)

[Figure: pipelined datapath with a hazard detection unit (inputs ID/EX.MemRead, IF/ID.RegisterRs, IF/ID.RegisterRt, ID/EX.RegisterRt; outputs PCWrite, IF/IDWrite) and a forwarding unit (inputs Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd; outputs ForwardA, ForwardB). Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB, carrying the WB, M and EX control fields. Components: PC, instruction memory, registers, ALU, data memory.]

Answers:
(a) It is not possible to forward backward in time; hence, the pipeline stall for the code sequence
    lw $2, 20($1)
    add $4, $2, $6
cannot be resolved by forwarding, because lw requires at least 4 clocks to obtain the accessed memory content, but the next instruction, add, needs the content one clock earlier. (See the multiple-clock-cycle pipeline diagram below.) Notably, the question asks for a set of code sequences that only consist of lw and add, in which the pipeline stall can be resolved by forwarding; hence, any code sequence that does not require forwarding backward in time is a legitimate answer.
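The rule the forwarding unit applies to ForwardA, and the load-use check made by the hazard detection unit, can be sketched in Python as follows. This is an illustrative sketch assuming the usual textbook conditions (the EX/MEM result takes priority over the MEM/WB result, and register $0 is never forwarded); the function names and the (rd, rs, rt) tuple encoding are hypothetical.

```python
def forward_a(id_ex_rs, ex_mem_rd, mem_wb_rd,
              ex_mem_regwrite=True, mem_wb_regwrite=True):
    """Return the ForwardA mux select as a 2-bit string."""
    # EX hazard: the instruction one ahead writes the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "10"
    # MEM hazard: the instruction two ahead writes it, and no newer
    # result exists in EX/MEM (checked above).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "01"
    return "00"  # no forwarding, use the register file value


def forward_a_for_third(seq):
    """seq holds three add instructions as (rd, rs, rt), in program order.

    When the third instruction is in EX, the second is in MEM (EX/MEM
    register) and the first is in WB (MEM/WB register).
    """
    first, second, third = seq
    return forward_a(id_ex_rs=third[1],
                     ex_mem_rd=second[0],
                     mem_wb_rd=first[0])


def load_use_stall(lw_rt, next_rs, next_rt):
    """True when the instruction right after lw reads the loaded register."""
    return lw_rt in (next_rs, next_rt)


if __name__ == "__main__":
    # Sequence from (b): add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4
    print(forward_a_for_third([(1, 1, 2), (1, 1, 3), (1, 1, 4)]))
    # A variant with an extra register: add $1,$1,$2 / add $5,$3,$4 / add $1,$1,$5
    print(forward_a_for_third([(1, 1, 2), (5, 3, 4), (1, 1, 5)]))
    # lw $2, ... followed by add $4, $2, $6 reads $2 one cycle too early
    print(load_use_stall(lw_rt=2, next_rs=2, next_rt=6))
```

The load-use case is the one forwarding alone cannot fix, because the loaded value only exists after the memory stage.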

[Figure: multiple-clock-cycle pipeline diagram. Horizontal axis: time (in clock cycles), CC 1 to CC 9. Vertical axis: program execution order (in instructions), showing lw $2, 20($1) followed by add $4, $2, $5.]

(b) ForwardA = 10, since MEM/WB.RegisterRd = EX/MEM.RegisterRd = ID/EX.RegisterRs, and we shall use the more recent result.
(c) ForwardA = 01 when MEM/WB.RegisterRd = ID/EX.RegisterRs but EX/MEM.RegisterRd ≠ ID/EX.RegisterRs. Hence, the code
    add $1, $1, $2
    add $5, $3, $4
    add $1, $1, $5
satisfies the need.

3. (Chapter 7)
(a) For 32-bit byte addressing (in MIPS), how many tag bits are required for a direct-mapped cache with four-word blocks and a total size of 16 words?
(b) Here is a series of address references given as word addresses: 2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, and 11. Tabulate each reference in the list as a hit or a miss and show the final contents of the cache.

Answers:
(a) 26 bits. The cache holds 16/4 = 4 blocks, so the index takes 2 bits; the block offset takes 2 bits and the byte offset takes 2 bits, leaving 32 - 2 - 2 - 2 = 26 bits for the tag.

[Figure: address (showing bit positions) divided into tag (26 bits), index, block offset and byte offset; a cache of 4 entries, each holding a valid bit V, a tag and 128 bits of data; outputs Hit and Data.]

(b) Hit or miss for a direct-mapped cache is determined by the required tag number and the cache block number (i.e., the index). The former can be derived by the formula Integer[Address/16], while the latter can be obtained using (Integer[Address/4] mod 4). Hence, for word addresses 2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, and 11, the required tag-number/block-number pairs, written as tag/block(address), are respectively 0/0(2), 0/0(3), 0/2(11), 1/0(16), 1/1(21), 0/3(13), 4/0(64), 3/0(48), 1/0(19), 0/2(11), 0/0(3), 1/1(22), 0/1(4), 1/2(27), 0/1(6), and 0/2(11). Therefore, the hit-and-miss list is as follows.

Tag/block(address referenced)   Hit or miss
0/0(2)                          miss
0/0(3)                          hit
0/2(11)                         miss
1/0(16)                         miss
1/1(21)                         miss
0/3(13)                         miss
4/0(64)                         miss
3/0(48)                         miss
1/0(19)                         miss
0/2(11)                         hit
0/0(3)                          miss
1/1(22)                         hit
0/1(4)                          miss
1/2(27)                         miss
0/1(6)                          hit
0/2(11)                         miss

Final contents of the cache (word addresses held in each block):
Block 0: 0 1 2 3
Block 1: 4 5 6 7
Block 2: 8 9 10 11
Block 3: 12 13 14 15

Note that the numbers in black in the table are not requested parts of the answer. Only those numbers in blue are necessary.
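The tag-bit count and the hit-and-miss list can be checked by simulating the cache. Below is a minimal Python sketch, assuming 4-byte words and power-of-two sizes; tag_bits and simulate are hypothetical helper names, and the tag and index are computed with the same Integer[Address/16] and (Integer[Address/4] mod 4) formulas used in the answer.

```python
def tag_bits(address_bits, block_words, cache_words, bytes_per_word=4):
    """Tag width of a direct-mapped cache (sizes assumed powers of two)."""
    num_blocks = cache_words // block_words
    offset_bits = (block_words * bytes_per_word).bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    return address_bits - index_bits - offset_bits


def simulate(word_addresses, block_words=4, num_blocks=4):
    """Return (hit/miss list, final word addresses held in each block)."""
    tags = [None] * num_blocks  # None marks an invalid (empty) entry
    results = []
    for addr in word_addresses:
        block_number = addr // block_words     # Integer[Address/4]
        index = block_number % num_blocks      # ... mod 4
        tag = block_number // num_blocks       # Integer[Address/16]
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag                  # replace the block
    contents = []
    for index, tag in enumerate(tags):
        if tag is None:
            contents.append(None)
        else:
            first = (tag * num_blocks + index) * block_words
            contents.append(list(range(first, first + block_words)))
    return results, contents


if __name__ == "__main__":
    refs = [2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11]
    results, contents = simulate(refs)
    print(tag_bits(32, block_words=4, cache_words=16))
    for addr, outcome in zip(refs, results):
        print(addr, outcome)
    print(contents)
```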

4. (Chapter 8)
(a) Calculate the average time to read a 512-byte sector for a disk with 5 ms average seek time, 6000 RPM, 5 MB/sec transfer rate and 2 ms controller overhead. (Here, M = 2^20. Assume that the average rotational latency is the time for the disk to rotate 180 degrees.)
(b) Consider the following diagram of two potential ways of numbering the sectors of data on a disk. Which way of numbering the sectors is likely to result in higher performance, if typical reads are contiguous in sector numbers? Justify your answer.

Answer:
(a) Average seek time + average rotational delay + transfer time + controller overhead = 5 ms + 0.5/(6000/60) sec + 0.5 KB/(5 MB/sec) + 2 ms = 5 ms + 5 ms + 0.1 ms + 2 ms = 12.1 ms.
(b) After reading sector 7, a seek is necessary to get to the track with sector 8 on it. This will take some time (on the order of a millisecond, typically), during which the disk will continue to revolve under the head assembly. Thus, in the version where sector 8 is in the same angular position as sector 0, sector 8 will have already revolved past the head by the time the seek is completed, and some large fraction of an additional revolution will be needed to wait for it to come back again. By skewing the sectors so that sector 8 starts later on the second track, the seek will have time to complete, and the sector will appear under the head soon thereafter without an additional revolution.
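The arithmetic in part (a) can be reproduced with a few lines of Python. This is a small sketch under the stated assumptions (M = 2^20 bytes, average rotational latency of half a revolution); average_read_time_ms is a hypothetical helper name.

```python
def average_read_time_ms(seek_ms, rpm, sector_bytes,
                         transfer_bytes_per_sec, controller_ms):
    """Average time to read one sector, in milliseconds."""
    # Half a revolution on average: 0.5 / (rpm / 60) seconds.
    rotation_ms = 0.5 * 60_000 / rpm
    transfer_ms = sector_bytes / transfer_bytes_per_sec * 1000
    return seek_ms + rotation_ms + transfer_ms + controller_ms


if __name__ == "__main__":
    total = average_read_time_ms(seek_ms=5, rpm=6000, sector_bytes=512,
                                 transfer_bytes_per_sec=5 * 2**20,
                                 controller_ms=2)
    print(round(total, 1))  # 12.1
```

The transfer term comes out slightly under 0.1 ms with M = 2^20, which rounds to the 0.1 ms used in the answer.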
