The single cycle CPU
[Figure: single-cycle datapath with control. Components: PC, PC + 4 adder, instruction memory (Instruction [31-0]), Registers, sign extend (16 to 32), shift left 2, branch adder, ALU (Zero, ALU result), ALU control (Instruction [5-0]), data memory. The jump address [31-0] is built from Instruction [25-0] shifted left 2 and PC + 4 [31-28]. Control (input Instruction [31-26]) drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Performance of Single-Cycle Machines

Unit delays:
  Memory unit                      2 ns
  ALU and adders                   2 ns
  Register file (read or write)    1 ns

Class     Fetch  Decode  ALU  Memory  Write back  Total
R-format  2      1       2    0       1           6 ns
LW        2      1       2    2       1           8 ns
SW        2      1       2    2                   7 ns
Branch    2      1       2                        5 ns
Jump      2                                       2 ns
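What would happen if the clock cycle had a variable length?
We compare for a program with the following instruction mix: R-type: 44%, LW: 24%, SW: 12%, BRANCH: 18%, JUMP: 2%
I - number of instructions in the program
T - clock cycle length
CPI - number of cycles per instruction = 1
Execution = I*T*CPI
With a fixed clock, T = 8 ns (the slowest instruction, LW). With a variable clock, the average T = 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns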
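The arithmetic on this slide can be checked with a short Python sketch. The latencies come from the single-cycle performance table; the 44% and 24% shares are an assumption, reconstructed so that the mix sums to 100% and matches the 6.3 ns result. The function names are made up for illustration.

```python
# Per-class latency in ns, taken from the single-cycle performance table.
LATENCY_NS = {"R-type": 6, "LW": 8, "SW": 7, "BRANCH": 5, "JUMP": 2}

# Instruction mix. The R-type and LW shares are reconstructed (assumption).
MIX = {"R-type": 0.44, "LW": 0.24, "SW": 0.12, "BRANCH": 0.18, "JUMP": 0.02}


def average_cycle_ns(latency, mix):
    """Average clock length when every instruction gets exactly the time it needs."""
    return sum(latency[cls] * mix[cls] for cls in mix)


def fixed_cycle_ns(latency):
    """A fixed clock must be as long as the slowest instruction."""
    return max(latency.values())


variable_t = average_cycle_ns(LATENCY_NS, MIX)
fixed_t = fixed_cycle_ns(LATENCY_NS)
# I and CPI (= 1) are the same in both cases, so the speedup is the ratio of T.
speedup = fixed_t / variable_t

print(f"variable clock: {variable_t:.2f} ns, fixed clock: {fixed_t} ns")
print(f"speedup of the variable clock: {speedup:.2f}")
```

Because I and CPI are identical in both designs, only the ratio of the two cycle lengths matters for the comparison.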
How to save time?
Idea 1: Clock with variable rate
Drawbacks:
- Complex to construct
- Not modular (other components in the system)
- Hardware components sit idle for most of the time
Better idea: Fixed clock cycle, each instruction takes a different number of cycles
Multicycle Approach
The idea behind the multicycle method:
- Saving time: each instruction takes only the number of clock cycles it needs.
- Saving components: the same component is used in different stages of an instruction.
How a multicycle architecture is built
Divide the instruction into stages, one clock cycle per stage:
- Balance the amount of work required in each stage.
- Reduce the amount of work required in each stage: each stage performs only one functional operation.
At the end of each clock cycle:
- Store the values needed by the following stages.
- Add extra internal registers to hold them.
Timing of a lw instruction in a single cycle CPU
[Timing diagram: PC, I.Mem data, Rs/Rt register outputs, ALU inputs and output (address), D.Mem address, D.Mem data, write back; stages fetch, decode, execute, memory, write back]
We want to replace a long single CK cycle with five short ones:
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram: PC, IR, A/B, ALUout, MDR; stages fetch (2 ns), decode (1 ns), execute (2 ns), memory (2 ns), write back (1 ns)]
Therefore we should add registers to the single cycle CPU shown below:
[Figure: single-cycle datapath: PC, adder, instruction memory, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), ALU, data memory (Address, D.In, D.Out), Sext 16->32 of [15:0]]
Adding registers to split the instruction to stages:
[Figure: the same datapath with the registers IR, A, B, ALUout and MDR added between the stages]
A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: multi-cycle datapath: PC, one memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, Sext 16->32 of [15:0], << 2]
Let us explain the multi-cycle CPU
- First we'll look at a CPU capable of performing only R-type instructions
- Then, we'll add the lw instruction
- And the sw instruction
- Then, the beq instruction
- And finally, the j instruction
Let us remind ourselves how a single cycle CPU capable of performing R-type instructions works. Here you see the data-path and the timing of an R-type instruction.
[Figure: single-cycle R-type datapath: PC, adder, instruction memory, [31:26] opcode (6 bits), Reg File ([25:21]=Rs, [20:16]=Rt, [15:11]=Rd), [5:0]=funct (6 bits), ALU; timing: fetch, decode, execute, write back]
A single cycle CPU demo: R-type instruction
[Figure: the same datapath: Reg File with [25:21]=Rs, [20:16]=Rt, [15:11]=Rd]
A multi cycle CPU capable of performing R-type instructions
[Figure: multi-cycle datapath: PC, memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout]
A multi cycle CPU capable of R-type instructions: fetch
[Figure: the datapath during the fetch step]
A multi cycle CPU capable of R-type instructions: decode
[Figure: the datapath during the decode step]
A multi cycle CPU capable of R-type instructions: execute
[Figure: the datapath during the execute step]
A multi cycle CPU capable of R-type instructions: write back
[Figure: the datapath during the write back step]
Timing of an R-type instruction in a single cycle CPU
[Timing diagram: PC, Inst. Mem data (= the instruction), Rs/Rt, ALU inputs, ALU output (Data = result of calc.), GPR input; stages fetch, decode, execute, write back]
Timing of an R-type instruction in a multi-cycle CPU
[Timing diagram: PC, Mem data, IR (previous inst., current instruction), A/B, ALUout; stages fetch, decode, execute, write back]
The steps of an R-type instruction in a multi-cycle CPU:
- fetch: IR = M(PC) (Mem data is the current instruction; IR still holds the previous one until the end of the cycle)
- decode: A = Rs, B = Rt (the GPR outputs)
- execute: ALUout = A op B
- write back: Rd = ALUout, written at the rising edge of CK
An R-type instruction takes 4 CKs.
The state diagram: IR = M(PC) -> A = Rs, B = Rt -> ALUout = A op B -> Rd = ALUout
A multi-cycle CPU capable of R-type instructions (PC calc.)
[Figure: multi-cycle datapath: PC, memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout]
The steps of an R-type instruction, including the PC calculation:
- fetch: IR = M(PC); next PC = current PC + 4
- decode: A = Rs, B = Rt (the GPR outputs)
- execute: ALUout = A op B
- write back: Rd = ALUout, written at the rising edge of CK
A multi cycle CPU capable of R-type instructions: fetch
[Figure: the datapath during the fetch step, with the PC + 4 calculation]
The state diagram of a CPU capable of R-type instructions only
- 0 Fetch: IR = M(PC), PC = PC + 4
- 1 Decode: A = Rs, B = Rt
- 6 R-type: ALUout = A op B
- 7 WBR: Rd = ALUout
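The four states above can be traced in a small Python sketch. This is only a behavioural model under stated assumptions: memory is a dict keyed by byte address, the register names PC, IR, A, B and ALUout follow the usual MIPS multi-cycle naming, and the function `run_r_type` and the table `ALU_OPS` are hypothetical helpers, not part of the slides.

```python
# Minimal behavioural sketch: one R-type instruction through states 0, 1, 6, 7.
# funct codes for a few standard MIPS R-type operations.
ALU_OPS = {
    0x20: lambda a, b: a + b,   # add
    0x22: lambda a, b: a - b,   # sub
    0x24: lambda a, b: a & b,   # and
    0x25: lambda a, b: a | b,   # or
}


def run_r_type(memory, regs, pc):
    """Return (new_pc, trace); one list entry in trace per clock cycle."""
    trace = []

    # State 0, Fetch: IR = M(PC), PC = PC + 4
    ir = memory[pc]
    pc = pc + 4
    trace.append("Fetch")

    # State 1, Decode: A = Rs, B = Rt
    rs = (ir >> 21) & 0x1F      # IR[25:21]
    rt = (ir >> 16) & 0x1F      # IR[20:16]
    rd = (ir >> 11) & 0x1F      # IR[15:11]
    funct = ir & 0x3F           # IR[5:0]
    a, b = regs[rs], regs[rt]
    trace.append("Decode")

    # State 6, R-type execution: ALUout = A op B (kept to 32 bits)
    alu_out = ALU_OPS[funct](a, b) & 0xFFFFFFFF
    trace.append("R-type")

    # State 7, WBR: Rd = ALUout
    regs[rd] = alu_out
    trace.append("WBR")
    return pc, trace


# add $3, $1, $2  ->  opcode 0, rs=1, rt=2, rd=3, funct=0x20
memory = {0: (1 << 21) | (2 << 16) | (3 << 11) | 0x20}
regs = [0] * 32
regs[1], regs[2] = 5, 7
new_pc, trace = run_r_type(memory, regs, pc=0)
print(new_pc, regs[3], trace)
```

Each entry of `trace` stands for one clock cycle, so the length of the list is the cycle count of the instruction.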
The state diagram of a CPU capable of R-type and lw instructions
- 0 Fetch
- 1 Decode
- lw: 2 AdrCmp: ALUout = A + sext(imm) -> 3 Load: MDR = M(ALUout) -> 4 WB: Rt = MDR
- R-type: 6 -> 7 WBR
We added registers to split the instruction to stages. Let's discuss the lw instruction.
[Figure: datapath with the added registers: PC, adder, instruction memory, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, data memory (Address, D.In, D.Out), MDR, Sext 16->32 of [15:0]]
First we draw a multi-cycle CPU capable of R-type & lw instructions:
[Figure: multi-cycle datapath: PC, instruction memory, IR, Reg File, A, B, ALU, ALUout, data memory, MDR, Sext 16->32 of [15:0]]
We just moved the data memory. All parts related to lw only are blue.
A multi-cycle CPU capable of R-type & lw instructions: fetch
[Figure: the datapath during the fetch step]
A multi-cycle CPU capable of R-type & lw instructions: decode
[Figure: the datapath during the decode step]
A multi-cycle CPU capable of R-type & lw instructions: AdrCmp
[Figure: the datapath during the address computation step]
A multi-cycle CPU capable of R-type & lw instructions: memory
[Figure: the datapath during the memory access step; the data memory output is loaded into MDR]
A multi-cycle CPU capable of R-type & lw instructions: WB
[Figure: the datapath during the write back step; MDR is written to Rt]
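Can we unite the Instruction & Data memories? (They are not used simultaneously, as they are in the single cycle CPU.)
[Figure: multi-cycle datapath with separate instruction and data memories, IR, Reg File, A, B, ALU, ALUout, MDR, Sext 16->32 of [15:0]]
So here is a multi-cycle CPU capable of R-type & lw instructions using a single memory for instructions & data
[Figure: multi-cycle datapath: PC, one memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, MDR, Sext 16->32 of [15:0]]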
Timing of a lw instruction in a single cycle CPU
[Timing diagram: PC, I.Mem data, Rs/Rt, ALU inputs and output (address), D.Mem address, D.Mem data; stages fetch, decode, execute, memory, write back]
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram: PC, IR (previous inst., current instruction), A/B, ALUout (data address), Mem data, MDR (data to Rt); stages fetch, decode, execute, memory, write back]
The steps of a lw instruction in a multi-cycle CPU:
- fetch: IR = M(PC), PC = PC + 4
- decode: A = Rs, B = Rt (the GPR outputs)
- execute: ALUout = A + sext(imm) (the data address)
- memory: MDR = M(ALUout)
- write back: Rt = MDR, written at the rising edge of CK
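The five lw steps can be sketched in Python as follows. This is an illustrative model only: it assumes a single dict-based memory shared by instructions and data (as in the unified-memory datapath), and `sign_extend16` and `run_lw` are hypothetical helper names.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(memory, regs, pc):
    """Step one lw instruction through its five cycles; return (new_pc, cycles)."""
    # Cycle 1, fetch: IR = M(PC), PC = PC + 4
    ir = memory[pc]
    pc += 4

    # Cycle 2, decode: A = Rs, B = Rt (B is read but not needed by lw)
    a = regs[(ir >> 21) & 0x1F]
    rt = (ir >> 16) & 0x1F

    # Cycle 3, execute: ALUout = A + sext(imm)
    alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF

    # Cycle 4, memory: MDR = M(ALUout)
    mdr = memory[alu_out]

    # Cycle 5, write back: Rt = MDR
    regs[rt] = mdr
    return pc, 5


# lw $2, -4($1)  ->  opcode 0x23, rs=1, rt=2, imm=0xFFFC
memory = {0x0: (0x23 << 26) | (1 << 21) | (2 << 16) | 0xFFFC, 0x100: 99}
regs = [0] * 32
regs[1] = 0x104
new_pc, cycles = run_lw(memory, regs, pc=0)
print(new_pc, cycles, regs[2])
```

The instruction word and the loaded data live in the same `memory` dict, which mirrors the observation that the two memories are never used in the same cycle.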
The state diagram of a CPU capable of R-type and lw instructions
- 0 Fetch: IR = M(PC), PC = PC + 4
- 1 Decode: A = Rs, B = Rt
- lw: 2 AdrCmp: ALUout = A + sext(imm) -> 3 Load: MDR = M(ALUout) -> 4 WB: Rt = MDR
- R-type: 6: ALUout = A op B -> 7 WBR: Rd = ALUout
A multi-cycle CPU capable of R-type & lw & sw instructions
[Figure: multi-cycle datapath: PC, one memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, MDR, Sext 16->32 of [15:0]; the lw and sw paths are marked]
The state diagram of a CPU capable of R-type and lw and sw instructions
- 0 Fetch: IR = M(PC), PC = PC + 4
- 1 Decode: A = Rs, B = Rt
- lw+sw: 2 AdrCmp: ALUout = A + sext(imm)
- lw: 3 Load: MDR = M(ALUout) -> 4 WB: Rt = MDR
- sw: 5 Store: M(ALUout) = B
- R-type: 6: ALUout = A op B -> 7 WBR: Rd = ALUout
A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: multi-cycle datapath: PC, one memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, Sext 16->32 of [15:0], << 2]
Adding the instruction beq to the state diagram:
- 0 Fetch
- 1 Decode
- lw+sw: 2 AdrCmp, then lw: 3 Load -> 4 WB, or sw: 5 Store
- R-type: 6 -> 7 WBR
- beq: 8 Branch: calc Rs - Rt (just to produce the zero signal)
  - not zero: return to Fetch
  - zero: calc PC = PC + sext(imm)<<2
Adding the instruction beq to the state diagram, a more efficient way:
Let's use the decode state, in which the ALU is doing nothing, to compute the branch address. We'll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
- 0 Fetch
- 1 Decode: calc ALUout = PC + sext(imm)<<2
- lw+sw: 2 AdrCmp, then lw: 3 Load -> 4 WB, or sw: 5 Store
- R-type: 6 -> 7 WBR
- beq: 8 Branch: calc Rs - Rt. If zero, load the PC with the ALUout data, else do not load the PC.
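A minimal Python sketch of this two-step branch follows. It assumes the PC has already been advanced by 4 in the fetch state, and the function names are hypothetical; the point is only that the target is computed speculatively in decode and used one cycle later.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_branch_target(pc, ir):
    """Decode state: ALUout = PC + (sext(imm) << 2), computed for every instruction."""
    return (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF


def branch_state(pc, a, b, alu_out):
    """Branch state: compute A - B; load the PC with ALUout only if the result is zero."""
    zero = ((a - b) & 0xFFFFFFFF) == 0
    return alu_out if zero else pc


# beq with imm = 3, fetched from address 4, so PC is 8 after the fetch state.
pc_after_fetch = 8
alu_out = decode_branch_target(pc_after_fetch, ir=0x0003)
taken_pc = branch_state(pc_after_fetch, a=7, b=7, alu_out=alu_out)
not_taken_pc = branch_state(pc_after_fetch, a=7, b=9, alu_out=alu_out)
print(alu_out, taken_pc, not_taken_pc)
```

Computing the target in decode costs nothing extra, since the ALU is otherwise idle in that state, and it saves one cycle on every taken branch compared with the first version of the diagram.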
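A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: multi-cycle datapath: PC, PC + 4, one memory for instructions & data, IR, Reg File ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALU, ALUout, Sext 16->32 of [15:0], <<2, branch address path back to the PC]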
Adding the instruction j to the state diagram:
- 0 Fetch
- 1 Decode
- lw+sw: 2 AdrCmp, then lw: 3 Load -> 4 WB, or sw: 5 Store
- R-type: 6 -> 7 WBR
- beq: 8 Branch
- j: 9 Jump: PC = PC[31:28] || IR[25:0]<<2
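The jump address formation can be illustrated with a few lines of Python. This is a sketch that assumes the standard MIPS rule (the upper 4 bits of the already incremented PC, followed by the 26-bit field shifted left by 2); `jump_target` is a hypothetical name.

```python
def jump_target(pc, ir):
    """Jump state: PC = PC[31:28] || (IR[25:0] << 2)."""
    upper = pc & 0xF0000000             # PC[31:28], kept in place
    lower = (ir & 0x03FFFFFF) << 2      # IR[25:0] shifted left 2, a 28-bit value
    return upper | lower


# j with a 26-bit field of 0x100 (opcode 2), executed while PC = 0x40000004.
ir = (2 << 26) | 0x100
print(hex(jump_target(0x40000004, ir)))
```

Masking with `0x03FFFFFF` drops the opcode bits before the shift, so the opcode never leaks into the address.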
A multi-cycle CPU capable of R-type & lw/sw & branch & jump instructions
[Figure: multi-cycle datapath as before, with three sources for the PC: PC + 4 (next address), the branch address, and the jump address (IR[25:0] <<2 joined with PC[31:28])]
Summary of the steps of the different instructions
(state numbers from the state diagram are given in parentheses)

Instruction fetch (0), all instructions:
  IR = Memory[PC]
  PC = PC + 4
Instruction decode/register fetch (1), all instructions:
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion:
  R-type (6): ALUOut = A op B
  Memory reference (2): ALUOut = A + sign-extend(IR[15-0])
  Branch (8): if (A == B) then PC = ALUOut
  Jump (9): PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion:
  R-type (7): Reg[IR[15-11]] = ALUOut
  Load (3): MDR = Memory[ALUOut]
  Store (5): Memory[ALUOut] = B
Memory read completion (4):
  Load: Reg[IR[20-16]] = MDR
MultiCycle implementation with Control
[Figure: multi-cycle datapath with control unit. Components: PC, memory (Address, Write data, MemData), instruction register (Instruction [31-26], [25-21], [20-16], [15-0]), memory data register, Registers, sign extend (16 to 32), shift left 2, A, B, ALU (Zero, ALU result), ALUOut, ALU control (Instruction [5-0]), jump address [31-0] built from Instruction [25-0] shifted left 2 and PC [31-28]. Control (input Op [5-0]) outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]
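Final State Machine
- 0 Instruction fetch (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- 1 Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- 2 Memory address computation, entered when (Op = 'LW') or (Op = 'SW'): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- 3 Memory access, entered when (Op = 'LW'): MemRead, IorD = 1
- 4 Write-back step: RegDst = 0, RegWrite, MemtoReg = 1
- 5 Memory access, entered when (Op = 'SW'): MemWrite, IorD = 1
- 6 Execution, entered when (Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- 7 R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
- 8 Branch completion, entered when (Op = 'BEQ'): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- 9 Jump completion, entered when (Op = 'J'): PCWrite, PCSource = 10
States 4, 5, 7, 8 and 9 return to state 0.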
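The output side of this state machine is essentially a lookup table from state number to control signals, which a short Python sketch makes explicit. The signal names with restored prefixes (ALUSrcA, ALUSrcB, ALUOp, IRWrite, PCWrite, PCWriteCond, PCSource) are an assumption based on the standard MIPS multi-cycle design, and `control_outputs` is a hypothetical helper.

```python
# Control outputs per state. Signals not listed for a state are either
# deasserted (enables) or don't-care (mux selects).
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Enable-type signals must be explicitly 0 when a state does not assert them,
# otherwise a register or memory would be overwritten by accident.
ENABLES = ("MemRead", "MemWrite", "IRWrite", "PCWrite", "PCWriteCond", "RegWrite")


def control_outputs(state):
    """Return the control signals for a state, with all enables defaulting to 0."""
    outputs = dict.fromkeys(ENABLES, 0)
    outputs.update(CONTROL[state])
    return outputs


for state in sorted(CONTROL):
    asserted = [name for name in ENABLES if control_outputs(state)[name]]
    print(state, asserted)
```

Keeping the enables separate from the mux selects reflects a real design concern: a mux select can be left as don't-care in a state, but a write enable cannot.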
The final state diagram:
- 0 Fetch
- 1 Decode
- lw+sw: 2 AdrCmp, then lw: 3 Load -> 4 WB, or sw: 5 Store
- R-type: 6 -> 7 WBR
- beq: 8 Branch
- j: 9 Jump
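The transitions of the final diagram can be written as a small next-state function, from which the cycle count of each instruction class follows. This is a sketch: the state constants and function names are hypothetical, and the average CPI at the end reuses the instruction mix from the single-cycle comparison (with its reconstructed 44% and 24% shares), which is an assumption rather than a figure from the slides.

```python
# State numbers follow the state diagram: 0 Fetch ... 9 Jump.
FETCH, DECODE, ADRCMP, LOAD, WB, STORE, EXEC, WBR, BRANCH, JUMP = range(10)


def next_state(state, op):
    """Next state of the control FSM for the instruction class op."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return {"R-type": EXEC, "lw": ADRCMP, "sw": ADRCMP,
                "beq": BRANCH, "j": JUMP}[op]
    if state == ADRCMP:
        return LOAD if op == "lw" else STORE
    if state == LOAD:
        return WB
    if state == EXEC:
        return WBR
    # WB, STORE, WBR, BRANCH and JUMP all return to Fetch.
    return FETCH


def cycles(op):
    """Number of clock cycles from Fetch until the FSM returns to Fetch."""
    state, count = FETCH, 1
    while True:
        state = next_state(state, op)
        if state == FETCH:
            return count
        count += 1


# Assumed instruction mix (same as in the single-cycle comparison).
MIX = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "beq": 0.18, "j": 0.02}
average_cpi = sum(cycles(op) * share for op, share in MIX.items())

for op in MIX:
    print(op, cycles(op))
print(f"average CPI: {average_cpi:.2f}")
```

The slides do not state the multi-cycle clock period, so this sketch stops at CPI and does not attempt an execution-time comparison with the single-cycle design.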
End of multi-cycle implementation