DLX computer. Electronic Computers M
- Donna Harper
- 6 years ago
1. DLX computer (Electronic Computers)
2. RISC architectures
RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer)
- In CISC architectures, 10% of the instructions are used in 90% of the cases: a waste of silicon.
- Bottleneck: the bus.
- In the 80s a new architecture appeared: RISC. The solution is to reduce the number and the complexity of the instructions (fewer, simpler machine instructions):
  - fixed instruction format (simpler instruction decoders);
  - simpler control logic, which frees chip area for a larger number of on-chip registers;
  - fewer bus/memory accesses.
- A job needs more machine instructions, but in many cases this is more than compensated (in terms of time) by the reduction of bus accesses.
- CISC and RISC are each the best solution in different application fields. Nowadays both architectures coexist in the same processor (analysed at the end of the course).
- A simplified RISC architecture: DLX (closely modelled on real MIPS processors such as the R4000).
3. DLX (fixed) instruction format
- R format: Op-code (6 bit) | Ra (5 bit) | Rb (5 bit) | Rc (5 bit) | op-code extension (11 bit).
  Used by arithmetic or logic instructions, i.e. Ra <= Rb op Rc, and by Set Conditions between registers (see Branch instructions).
- I format: Op-code (6 bit) | Ra (5 bit) | Rb (5 bit) | immediate operand or offset (16 bit).
  Used by data transfer (Load, Store), conditional Branch, JR and JALR (control transfer via register), Set Condition and ALU instructions with an immediate operand. In Load and ALU instructions Ra is the destination; in Store Ra is the source. Rb is the ALU operand in the immediate instructions.
- J format: Op-code (6 bit) | 26-bit (PC-relative) offset.
  Used by direct, unconditional control transfer (J and JAL).
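The three formats can be illustrated with a small Python sketch that splits a 32-bit word into the fields listed above. It rests on several assumptions. The bit positions follow the field order drawn on the slide: op-code in bits 31-26, then Ra, Rb, Rc. The 16-bit immediate and the 26-bit offset are read as two's-complement values. The op-code value in the example is hypothetical, because no encodings are given here.

```python
def field(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    sign = 1 << (width - 1)
    value &= (1 << width) - 1
    return (value ^ sign) - sign


def decode(word: int, fmt: str) -> dict:
    """Split a 32-bit instruction into the fields of format 'R', 'I' or 'J'.

    Assumed layout: op-code in bits 31-26, Ra in 25-21, Rb in 20-16,
    Rc in 15-11 and the 11-bit extension in 10-0 (R format),
    16-bit immediate/offset in 15-0 (I format), 26-bit offset in 25-0 (J format).
    """
    fields = {"opcode": field(word, 31, 26)}
    if fmt == "R":
        fields.update(ra=field(word, 25, 21), rb=field(word, 20, 16),
                      rc=field(word, 15, 11), ext=field(word, 10, 0))
    elif fmt == "I":
        fields.update(ra=field(word, 25, 21), rb=field(word, 20, 16),
                      imm=sign_extend(field(word, 15, 0), 16))
    elif fmt == "J":
        fields.update(offset=sign_extend(field(word, 25, 0), 26))
    else:
        raise ValueError(f"unknown format {fmt!r}")
    return fields


# Hypothetical op-code 0x23 with Ra=1, Rb=6 and offset -4 (0xFFFC):
word = (0x23 << 26) | (1 << 21) | (6 << 16) | 0xFFFC
print(decode(word, "I"))  # {'opcode': 35, 'ra': 1, 'rb': 6, 'imm': -4}
```

Because the format is fixed, every field always comes from the same bit range. This is what keeps the RISC decoder simple.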
4. DLX non-floating-point instructions
32 registers of 32 bits: R0 is fixed at 0, so R1-R31 are the 31 usable registers. Ra and Rb can be any of the 32 registers. No STACK registers.
- Data transfer: LW, LB, LBU, LH, LHU, SW, SH, SB Ra, offset(Rb); LHI Ra, value.
- Arithmetic/logic (register form Ra,Rb,Rc; immediate form Ra,Rb,value): ADD/ADDI, ADDU/ADDUI, SUB/SUBI, SUBU/SUBUI, DIV/DIVI, MULU/MULI, SLL/SLLI, SHR/SHRI, SLA/SLAI, OR/ORI, XOR/XORI, AND/ANDI.
- Control: SETx Ra,Rb,Rc; SETIx Ra,Rb,value; BEQZ Ra, offset ([PC]); BNEQZ Ra, offset ([PC]); J offset; JR Ra; JL offset ([PC]); JLR Ra.
N.B.
- The postfix x (set condition) can be LT, GT, LE, GE, EQ, NE.
- JL (via register or not): Jump and Link, saving the PC in R31.
- Offset is a value within the instruction.
- Postfix I means «immediate» (value within the instruction).
- Postfix A means «arithmetic» (sign extension).
- Postfix U means «unsigned».
- Value is the immediate within the instruction.
5. DLX ALU operations
Two input data, S1 and S2 (32 bit), and one output data, OUT (32 bit), plus flags. The operation is selected by the ALU controls:
- S1 + S2
- S1 - S2
- S1 and S2
- S1 or S2
- S1 exor S2
- left shift of S1 by S2 positions
- right shift of S1 by S2 positions
- arithmetic right shift of S1 by S2 positions
- S1, S2, 0, 1 (passed to the output)
Output flags: Zero, Negative (sign).
The ALU is a combinatorial circuit!!!
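Since the ALU is purely combinatorial, it can be modelled as a function from (operation, S1, S2) to (OUT, flags). The Python sketch below is a minimal version under stated assumptions. The operation names are hypothetical labels for the functions listed above. A shift uses only the low 5 bits of S2, which is an assumption for a 32-bit shifter.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# One entry per ALU function listed on the slide (hypothetical names).
ALU_OPS = {
    "add": lambda s1, s2: s1 + s2,
    "sub": lambda s1, s2: s1 - s2,
    "and": lambda s1, s2: s1 & s2,
    "or": lambda s1, s2: s1 | s2,
    "xor": lambda s1, s2: s1 ^ s2,
    "sll": lambda s1, s2: s1 << (s2 & 0x1F),
    "srl": lambda s1, s2: s1 >> (s2 & 0x1F),
    "sra": lambda s1, s2: to_signed(s1) >> (s2 & 0x1F),  # copies the sign bit
    "s1": lambda s1, s2: s1,
    "s2": lambda s1, s2: s2,
    "zero": lambda s1, s2: 0,
    "one": lambda s1, s2: 1,
}


def alu(op: str, s1: int, s2: int) -> tuple[int, bool, bool]:
    """Pure function of its inputs, like the combinatorial ALU.

    Returns (OUT, Zero flag, Negative flag), with OUT truncated to 32 bits.
    """
    out = ALU_OPS[op](s1 & MASK32, s2 & MASK32) & MASK32
    return out, out == 0, bool(out & 0x80000000)


print(alu("sub", 7, 7))                   # (0, True, False)
print(hex(alu("sra", 0x80000000, 4)[0]))  # 0xf8000000
```

The Zero and Negative flags are derived from OUT alone, so the same two flags serve every operation.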
6. Sequential DLX
Abstract instruction execution (a synchronous state diagram):
- Ready? Then INSTRUCTION FETCH: IR <= [PC] (the instruction stored at the address held in PC).
- INSTRUCTION DECODE: [PC] <= [PC] + 4; [A] <= [Ra]; [B] <= [Rb]; [C] <= [Rc]; [X] <= num[Ra], where X holds the number of the destination register.
- INSTRUCTION EXECUTION, depending on the instruction class: Data transfer, ALU, Set, Jump, Branch.
PC is the Program Counter. A, B and C are scratchpad internal registers. IR (REG INSTR) is the register where the newly fetched instruction is stored. None of these registers is visible to the programmer.
7. Example: LB (LOAD BYTE, format I)
Fields: Op-code | Ra | Rb | offset. Syntax: LB Ra, offset(Rb).
- IR <= [PC]; [PC] <= [PC] + 4; [A] <= [Ra]; [B] <= [Rb]; [C] <= [Rc]; [X] <= num[Ra]
- Byte address computation: Addr <= [B] + (Instr_15)^16 ## Instr_15..0
  Instruction bit 15 (the sign of the offset) is replicated 16 times to the left, and ## is the JOIN (concatenation) operator. The address is always 32 bit (bit 31 MSbit, bit 0 LSbit).
- LOAD byte, with sign extension: [Ra] <= ([Addr]_7)^24 ## [Addr]_7..0
  Example: [Addr]_7..0 = A7H = (1010 0111)b, so the sign-extended byte in the register is FFFFFFA7H.
- Next instruction.
8. Sign extension, example with IR: (IR_15)^16 ## IR_15..0. (Figure: the IR bits are routed through tri-state devices controlled by the Control Unit.)
9. Data transfer instructions (I format)
Addr <= [B] + (Instr_15)^16 ## Instr_15..0
Examples: LW Ra, offset(Rb); LB Ra, offset(Rb); LBU Ra, offset(Rb) (unsigned); LHU Ra, offset(Rb) (unsigned); SW Ra, offset(Rb).
- LB (byte, signed): [Ra] <= ([Addr]_7)^24 ## [Addr]_7..0
- LBU (byte, unsigned): [Ra] <= (0)^24 ## [Addr]_7..0
- LH (half word, signed): [Ra] <= ([Addr]_15)^16 ## [Addr]_15..0
- LHU (half word, unsigned): [Ra] <= (0)^16 ## [Addr]_15..0
- LW: [Ra] <= [Addr]
- SW: [Addr] <= [A] (in a Store, Ra is the source)
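The address computation and the four extension rules can be checked with a short Python sketch. The assumptions are stated here and in the comments. Memory is modelled as a byte-addressed Python `bytes` object. Multi-byte values are read in big-endian order, which the slides do not specify. No alignment checks are made. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext(value: int, width: int) -> int:
    """(value_{width-1})^(32-width) ## value: sign-extend to a 32-bit pattern."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        value |= MASK32 ^ ((1 << width) - 1)  # replicate the sign bit to the left
    return value


def effective_address(b: int, offset16: int) -> int:
    """Addr <= [B] + (Instr_15)^16 ## Instr_15..0, kept to 32 bits."""
    return (b + sext(offset16, 16)) & MASK32


def load(memory: bytes, addr: int, op: str) -> int:
    """32-bit value written into Ra by LB, LBU, LH, LHU or LW (big-endian assumed)."""
    size = {"LB": 1, "LBU": 1, "LH": 2, "LHU": 2, "LW": 4}[op]
    raw = int.from_bytes(memory[addr:addr + size], "big")
    if op in ("LB", "LH"):
        return sext(raw, 8 * size)  # signed loads: sign extension
    return raw                      # LBU, LHU (zero extension) and LW


mem = bytes([0xA7, 0x12, 0x80, 0x00])
print(hex(load(mem, 0, "LB")))   # 0xffffffa7, as in the LB example
print(hex(load(mem, 0, "LBU")))  # 0xa7
```

The same byte A7H becomes FFFFFFA7H with LB and 000000A7H with LBU. The only difference between the two is the bit that is replicated to the left.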
10. ALU instruction examples
- Register (format R): [T] <= [Rc]
- Immediate (format I): [T] <= (Instr_15)^16 ## Instr_15..0
T is a hidden register, invisible to the programmer, that stores temporary data. Register contents are treated as signed in arithmetic operations.
- ADD: [Ra] <= [Rb] + [T]
- SUB: [Ra] <= [Rb] - [T]
- AND: [Ra] <= [Rb] and [T]
- OR: [Ra] <= [Rb] or [T]
- XOR: [Ra] <= [Rb] xor [T]
Examples: ADD Ra,Rb,Rc; ADDI Ra,Rb,value; ADDU Ra,Rb,Rc; ADDUI Ra,Rb,value.
The same scheme applies to shifts etc. A and B are generic registers (Ra, Rb).
11. SET instructions (see branch)
- Register (format R): [T] <= [Rc]
- Immediate (format I): [T] <= (Instr_15)^16 ## Instr_15..0
(T is a hidden register, invisible to the programmer, that stores temporary data.)
Example: SLT Ra,Rb,Rc sets Ra = 1 if Rb is less than Rc, otherwise Ra = 0. Register contents are treated as signed.
- SEQ: [Ra] = 1 if [Rb] = [T]
- SNE: [Ra] = 1 if [Rb] != [T]
- SLT: [Ra] = 1 if [Rb] < [T]
- SGT: [Ra] = 1 if [Rb] > [T]
- SLE: [Ra] = 1 if [Rb] <= [T]
- SGE: [Ra] = 1 if [Rb] >= [T]
(otherwise [Ra] = 0)
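A compact Python sketch of the set-condition semantics follows. Both operands are compared as signed values, and in the immediate form T is the sign-extended 16-bit field. The function names `set_reg` and `set_imm` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit register pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sext16(imm: int) -> int:
    """[T] <= (Instr_15)^16 ## Instr_15..0, returned as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


CONDITIONS = {
    "EQ": lambda b, t: b == t,
    "NE": lambda b, t: b != t,
    "LT": lambda b, t: b < t,
    "GT": lambda b, t: b > t,
    "LE": lambda b, t: b <= t,
    "GE": lambda b, t: b >= t,
}


def set_reg(cond: str, rb: int, rc: int) -> int:
    """SETx Ra,Rb,Rc: value written into Ra (1 or 0), signed comparison."""
    return int(CONDITIONS[cond](to_signed(rb), to_signed(rc)))


def set_imm(cond: str, rb: int, imm16: int) -> int:
    """SETIx Ra,Rb,value: same comparison against the sign-extended immediate."""
    return int(CONDITIONS[cond](to_signed(rb), sext16(imm16)))


print(set_reg("LT", 0xFFFFFFFF, 1))  # 1, because 0xFFFFFFFF is -1 as a signed value
```

The result is always 0 or 1. This is what lets a following BEQZ/BNEZ test the condition with a simple zero check.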
12. JUMP instructions
- J offset (jump address), format J: [PC] <= [PC] + (Instr_25)^6 ## Instr_25..0
- JL offset (jump and link address), format J: as J, also saving [PC] in R31
- JR Ra (jump register), format I: [PC] <= [Ra]
- JLR Ra (jump and link register), format I: as JR, also saving [PC] in R31
Saving [PC] in R31 (JAL and JALR, i.e. JL and JLR): [T] <= [PC], then [R31] <= [T].
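13. BRANCH instructions (format I)
- BEQZ: if [Ra] = 0 then [PC] <= [PC] + (Instr_15)^16 ## Instr_15..0; otherwise continue with the next instruction (INIT).
- BNEZ: if [Ra] != 0 then [PC] <= [PC] + (Instr_15)^16 ## Instr_15..0; otherwise continue with the next instruction (INIT).
Example: BNEQZ R5, 100 jumps to PC+100 if R5 is not equal to 0.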
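The branch and jump rules of the last two slides can be combined into one next-PC function. This Python sketch makes two assumptions. Relative offsets are added to [PC] + 4, the value the PC holds after the fetch/decode increment. JL and JLR save that same value in R31. The register file is a plain list of 32 integers, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext(value: int, width: int) -> int:
    """Sign-extend the low `width` bits of value to a Python int."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def control_transfer(op: str, pc: int, regs: list, ra: int = 0, offset: int = 0) -> int:
    """Return the next PC for the control instruction located at address `pc`.

    `offset` is the raw instruction field: 16 bits for BEQZ/BNEZ, 26 bits for J/JL.
    Assumption: relative offsets are added to [PC] + 4, and JL/JLR write that
    value into R31 (the link).
    """
    next_seq = (pc + 4) & MASK32
    if op in ("BEQZ", "BNEZ"):
        taken = (regs[ra] == 0) if op == "BEQZ" else (regs[ra] != 0)
        return (next_seq + sext(offset, 16)) & MASK32 if taken else next_seq
    if op in ("J", "JL"):
        target = (next_seq + sext(offset, 26)) & MASK32
    elif op in ("JR", "JLR"):
        target = regs[ra] & MASK32  # read before R31 may be overwritten
    else:
        raise ValueError(f"not a control instruction: {op}")
    if op in ("JL", "JLR"):
        regs[31] = next_seq         # [R31] <= saved PC
    return target


regs = [0] * 32
regs[5] = 1
print(hex(control_transfer("BNEZ", 0x100, regs, ra=5, offset=100)))  # 0x168
```

JR reads Ra before the link is written, so `JLR R31` still jumps to the old content of R31.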
14. The Pipelining Principle
Pipelining is the main basic technique used to speed up a CPU. The key idea of pipelining is general and is applied in several industrial fields (production lines, oil pipelines, ...).
A system S must operate N times on a task A_i, producing result R_i: A_1, A_2, A_3, ..., A_N -> S -> R_1, R_2, R_3, ..., R_N.
- Latency: the time between the beginning and the end of task A (T_A).
- Throughput: the frequency at which tasks are completed.
15. The Pipelining Principle
1) Sequential system: a new instruction starts when the previous one has finished (A_1, A_2, A_3, ..., A_n, where A_n is the n-th instruction). Latency (execution time of a single instruction) = T_An. Different instructions can have different execution times.
2) Pipelined system: each instruction is subdivided into stages, each lasting 1/n of the entire instruction time (1/4 in this example), and the stages of successive instructions overlap. Task A is split into phases P_1, P_2, P_3, P_4, executed by the pipeline stages S_1, S_2, S_3, S_4 (S_i: pipeline stage).
16. The Pipelining Principle (continued)
A_1: P_1 P_2 P_3 P_4. A_2, A_3, A_4, ..., A_n each start one pipeline cycle after the previous task and go through the same phases.
T_P: pipeline cycle (ideally one clock). In each cycle one instruction terminates: in the figure A_1 terminates at t_x, A_2 terminates one cycle later at t_y, etc.
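The gain can be quantified with a few lines of Python. The sketch assumes k equal stages of length T_P and no stalls. The sequential system then needs N * k * T_P. The pipeline needs k cycles for the first task and one more cycle for each of the remaining N - 1 tasks. The function names are hypothetical.

```python
from fractions import Fraction


def sequential_time(n_tasks: int, n_stages: int, t_p: int) -> int:
    """Each task occupies the whole system for n_stages * t_p."""
    return n_tasks * n_stages * t_p


def pipelined_time(n_tasks: int, n_stages: int, t_p: int) -> int:
    """The first task needs n_stages cycles, then one task completes per cycle."""
    return (n_stages + n_tasks - 1) * t_p


def speedup(n_tasks: int, n_stages: int) -> Fraction:
    """Ratio of the two total times (T_P cancels out)."""
    return Fraction(sequential_time(n_tasks, n_stages, 1),
                    pipelined_time(n_tasks, n_stages, 1))


print(sequential_time(4, 4, 1), pipelined_time(4, 4, 1))  # 16 7
print(float(speedup(1000, 5)))  # close to 5 for long instruction streams
```

The latency of each task is still k * T_P. What improves is the throughput, which approaches one task per T_P.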
17. Typical instruction stages
- IF: instruction fetch (from memory)
- ID: instruction decode
- EX: instruction execution (ALU)
- MEM: data memory access (if needed; register instructions do not need it)
- WB: write-back (if needed; jumps do not need it)
N.B. The execution time (latency) must be the same for all instructions, to preserve the order of the results. Some stages are not used by some instructions (the stage is a NOP for them, e.g. the MEM stage for register operations).
18. Pipelining of a CPU (DLX)
Instruction sequence: I_1, I_2, I_3, ..., I_N. Each instruction j goes through the combinatorial circuits of the stages IF, ID, EX, MEM, WB. The stages are separated by the pipeline registers (D flip-flops) IF/ID, ID/EX, EX/MEM, MEM/WB of the CPU datapath.
Pipeline cycle = clock cycle = delay of the slowest stage.
Clocks Per Instruction (CPI) = 1 (ideally!)
19. DLX Pipeline
Instr i: IF ID EX MEM WB
Instr i+1, i+2, i+3, i+4: the same stages, each starting one clock after the previous instruction.
CPI (ideally) = 1.
Overhead introduced by the pipeline registers: T_clk = T_d + T_P + T_su, where
- T_d: switching delay of the input stage register;
- T_P: delay of the slowest combinatorial stage;
- T_su: set-up time of the output stage register.
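20. (Figure: an input D register, a combinatorial circuit with delay T_P, and an output D register. The annotations mark the switching delay of the input stage register, the delay of the slowest combinatorial stage and the set-up time of the output stage register.)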
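A worked example of T_clk = T_d + T_P + T_su in Python. All delays are hypothetical values in picoseconds, chosen only to show the effect of the register overhead and of unbalanced stages. They are not measurements of any real DLX implementation.

```python
def clock_period(stage_delays, t_d, t_su):
    """T_clk = T_d + T_P + T_su, with T_P the delay of the slowest combinatorial stage."""
    return t_d + max(stage_delays) + t_su


# Hypothetical delays in picoseconds for IF, ID, EX, MEM, WB.
stages = [800, 1000, 900, 1000, 600]
T_D, T_SU = 100, 50

pipelined = clock_period(stages, T_D, T_SU)  # 1150 ps per instruction completed
unpipelined = T_D + sum(stages) + T_SU       # all logic between one register pair: 4450 ps
print(pipelined, unpipelined, round(unpipelined / pipelined, 2))  # 1150 4450 3.87
```

With five stages the throughput gain here is about 3.87 rather than 5. Part of the loss is the register overhead T_d + T_su, which every stage pays. The rest comes from the faster stages waiting for the slowest one.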
21. Pipeline implementation requirements
- Each stage is active in every clock cycle.
- The PC is incremented in the IF stage, so an ADDER (PC <= PC + 4, since one instruction is 4 bytes) should be introduced in IF. However, instructions are aligned: each one starts at an address that is a multiple of the instruction length in bytes. The two least significant PC bits are therefore always 0, and a 30-bit register is used instead. It is a programmable counter (loadable for jumps), incremented by 1 every clock cycle.
- Two Memory Data Registers are required (LDR and SDR). When a LOAD is immediately followed by a STORE, the WB and MEM stages overlap, with two data items waiting to be written (one into memory, the other into a register of the RF).
- In each clock cycle two memory accesses may have to be executed (IF and MEM). This requires an Instruction Memory (IM) and a Data Memory (DM), i.e. a Harvard architecture.
- The CPU clock is determined by the slowest stage.
- Pipeline registers store both data and control information (distributed control unit).
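The 30-bit counter trick can be illustrated with a small Python class. The class name and its methods are hypothetical. The only point shown is that the byte address is the counter followed by two zero bits. Incrementing the counter by 1 is therefore the same as PC <= PC + 4, and a jump is a parallel load.

```python
class InstructionCounter:
    """PC kept as a 30-bit word counter; the byte address appends two 0 bits."""

    WIDTH_MASK = (1 << 30) - 1

    def __init__(self, byte_address: int = 0) -> None:
        self.load(byte_address)

    def load(self, byte_address: int) -> None:
        """Parallel load, used for jumps and taken branches."""
        if byte_address % 4:
            raise ValueError("instruction addresses are multiples of 4")
        self.count = (byte_address >> 2) & self.WIDTH_MASK

    def tick(self) -> None:
        """Increment by 1, equivalent to PC <= PC + 4 on the byte address."""
        self.count = (self.count + 1) & self.WIDTH_MASK

    @property
    def address(self) -> int:
        return self.count << 2


pc = InstructionCounter(0x100)
pc.tick()
print(hex(pc.address))  # 0x104
```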
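22. DLX Pipelined Datapath (figure)
Stages IF, ID, EX, MEM, WB, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Annotations:
- the PC is actually a programmable counter; an adder computes PC + 4, and the new PC is loaded if jump;
- instruction memory, DEC (decoder), RF (register file with ports Ra, Rb, Rc), ALU, data memory;
- the =0? comparator is used for Set Condition and for Branch (also <0 and >0) and acts on the output;
- SE (sign extension) is used for operations with immediates;
- the PC value is carried along for computing the new PC and for JL and JLR (PC saved in R31);
- Num[Ra]: number of the destination register (1-31) for LOAD and ALU instructions;
- the data written back come from a register, from memory, or from the PC (for link).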
23. ID stage (N.B. the stage layout differs from the previous slide!) (figure)
- From IF/ID: the IR and the PC.
- DEC decodes the opcode (e.g. LB, SW); the decoded information travels with the instruction.
- RF: registers Ra, Rb, Rc are read into A, B, C; Num Ra (number of the destination register) is passed on.
- IR_15..0 (offset/immediate) is sign-extended, with bits 31-16 copied from IR_15, for immediates and branches.
- IR_25..0 is sign-extended, with bits 31-26 copied from IR_25, for J and JL.
- PC_31..0 is passed on for JL and JLR.
- From the WB stage: the data to be written and the number of the destination register.
24. DLX Pipelined Datapath (detailed figure)
Legend:
- SDR = Store memory Data Register; LDR = Load memory Data Register; IRi = Instruction Register of stage i.
- PC1-PC4 = copies of the PC travelling with the instruction (PC saved in R31 by JL and JLR).
- X: computed data, memory address or branch address.
- Y: computed data from the previous stage.
- COND: result of the =0? comparator, for Branch and for Set Condition (also <0 and >0; it acts on the output).
- The adder computes PC + 4, DEC decodes IR1, SE performs sign extension, and Num[Ra] gives the destination register number, which travels down with IR2, IR3, IR4 to the register file.
25. Pipelined execution of an ALU instruction
The result of each stage is sampled at the end of its cycle; the decoded opcode travels through all stages.
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX: Z <= B op C, or Z <= B op [(IR2_15)^16 ## IR2_15..0]; [PC3 <= PC2]; [IR3 <= IR2]
- MEM: Y <= Z (temporary storage for WB); [PC4 <= PC3]; [IR4 <= IR3]
- WB: Ra <= Y
NOTE: the IRi bits that are no longer needed are dropped stage by stage. The bracketed transfers are performed for all instructions. Why? JAL, JALR!!
26. Pipelined execution of a MEM instruction
The decoded opcode travels through all stages.
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX: AR <= B + (IR2_15)^16 ## IR2_15..0; SDR <= A; [IR3 <= IR2]; [PC3 <= PC2]
- MEM: LDR <= [AR] (if LOAD) or [AR] <= SDR (if STORE); [PC4 <= PC3]; [IR4 <= IR3]
- WB: Ra <= LDR (if LOAD), with sign extension where required
27. Pipelined execution of a BRANCH instruction (normally after an SCn instruction, see later)
The decoded opcode travels through all stages.
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX: Z <= PC2 + (IR2_15)^16 ## IR2_15..0 (computed new PC address, X = BTA, Branch Target Address); Cond <= A op 0 (branch on the value, 0/1, of register A); [PC3 <= PC2]; [IR3 <= IR2]
- MEM: if (Cond) PC <= Z; [PC4 <= PC3]; [IR4 <= IR3]. The new value is in the PC at the end of this cycle.
- WB: (NOP)
When the branch is taken, 3 new unwanted instructions have already started.
28. Pipelined execution of a JR instruction
The decoded opcode travels through all stages.
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX: Z <= A (new PC address); [IR3 <= IR2]; [PC3 <= PC2]
- MEM: PC <= Z (the new value is in the PC in this interval); [IR4 <= IR3]; [PC4 <= PC3]
- WB: (NOP)
When the jump is executed, 3 new unwanted instructions have already started.
Which would be the stage sequence for a J instruction?
29. Pipelined execution of a JL or JLR instruction
The decoded opcode travels through all stages; in this case the PCi values are used.
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX: Z <= A (if JLR) or Z <= PC2 + (IR2_25)^6 ## IR2_25..0 (if JL); PC3 <= PC2; [IR3 <= IR2]
- MEM: PC <= Z (the new value is in the PC in this interval); PC4 <= PC3; [IR4 <= IR3]
- WB: R31 <= PC4
NOTE: the write to R31 CANNOT be performed on the fly, since it could overlap with another register write.
When the jump is executed, 3 new unwanted instructions have already started.
30. Which would be the sequence for an SCn instruction (e.g. SLT R1,R2,R3)?
- IF: IR <= [PC]; PC <= PC + 4; PC1 <= PC + 4
- ID: A <= Ra; B <= Rb; C <= Rc; PC2 <= PC1; IR2 <= IR1; ID/EX <= instruction decode; [X] <= num[Ra]
- EX, MEM, WB: ???
31. Pipeline Hazards
A hazard occurs when an instruction currently in a pipeline stage cannot be executed in that clock cycle.
- Structural hazards: the same resource is used by two different pipeline stages, so the instructions currently in those stages cannot be executed simultaneously.
- Data hazards: they are due to instruction dependencies, for example an instruction that needs to read an RF register not yet written by a previous instruction (Read After Write).
- Control hazards: the instructions following a branch depend on the branch result (taken/not taken).
The instruction that cannot be executed must be stalled (pipeline stall or pipeline bubbling), together with all the following instructions. The previous instructions must proceed normally, so that the hazard is eliminated.
32. Hazards and stalls
The consequence of a data hazard: if instruction I_(i) needs the result of instruction I_(i-1) (data are read in the ID stage), it must wait until after the WB of I_(i-1). Stage(clock):
- I_(i-3): IF(1) ID(2) EX(3) MEM(4) WB(5)
- I_(i-2): IF(2) ID(3) EX(4) MEM(5) WB(6)
- I_(i-1): IF(3) ID(4) EX(5) MEM(6) WB(7)
- I_(i): IF(4) ID(5) S(6) S(7) S(8) EX(9) MEM(10) WB(11)
- I_(i+1): IF(5) S(6) S(7) S(8) ID(9) EX(10) MEM(11) WB(12)
I_(i) stays in ID during the stalls. It reads the register in clock 8, after the WB of I_(i-1) has written it at the end of clock 7.
Stall: the clock signal for I_(i), I_(i+1), etc. is blocked for three periods, so T_i = 8 * CLK = (5 + 3) * CLK.
Normally the clock is not blocked. Instead, the three stall cycles are turned into NOPs (bubbles): T_i = 5 * (1 + 3/5) * CLK.
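The number of stall cycles depends only on how far apart the producer and the consumer are. The Python sketch below assumes the model used in these slides: five stages, registers read in ID and written at the end of WB, no forwarding, and no other hazards. The function names are hypothetical.

```python
def raw_stalls_without_forwarding(distance: int) -> int:
    """Stall cycles for an instruction that reads a register written by the
    instruction `distance` positions earlier (distance=1: immediately before).

    If the producer is fetched in cycle t, its WB is in cycle t+4 and the
    register is written at the end of that cycle. The consumer's ID must
    therefore be in cycle t+5 or later. Without stalls, the consumer's ID
    would be in cycle t+distance+1.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, 4 - distance)


def latency(distance: int, stages: int = 5) -> int:
    """Clock cycles from IF to WB of the consumer, stalls included."""
    return stages + raw_stalls_without_forwarding(distance)


print([raw_stalls_without_forwarding(d) for d in range(1, 5)])  # [3, 2, 1, 0]
print(latency(1))  # 8, i.e. (5 + 3) clock cycles
```

The same counts explain the forwarding example that follows. The SUB, OR and LW placed 1, 2 and 3 positions after the ADD are all hazards. The AND, 4 positions after it, is not.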
33. Forwarding
Data are read from the registers in the ID stage. Stage(clock):
- ADD R3, R1, R4: IF(1) ID(2) EX(3) MEM(4) WB(5)
- SUB R7, R3, R5: IF(2) ID(3) EX(4) MEM(5) WB(6), hazard
- OR R1, R3, R5: IF(3) ID(4) EX(5) MEM(6) WB(7), hazard
- LW R6, 100(R3): IF(4) ID(5) EX(6) MEM(7) WB(8), hazard. Here too the requested data is not yet in the RF, since it is written on the positive clock edge at the end of WB (the register value is read in ID!).
- AND R9, R5, R3: IF(5) ID(6) EX(7) MEM(8) WB(9), no hazard
Forwarding eliminates almost all the RAW hazards of the pipeline without stalling it. (NOTE: in DLX, registers are modified only in the WB stage.)
34. Forward implementation (figure)
- A, B, C: source registers (1-31). Rd1, Rd2: destination registers (1-31) of the instructions ahead in the pipeline.
- The forwarding unit (FU) is combinatorial!! It compares A, B, C with Rd1 and Rd2, also taking the opcodes into account, and produces the forwarding selections FD1, FD2, FD3. The selected values come from the pipeline registers ID/EX, EX/MEM, MEM/WB.
- RF bypass: often performed inside the RF. It allows the register being written to be anticipated on ID/EX.
- Control: the opcode and the comparison of Rd with the Ra, Rb and Rc numbers.
35. Forward Unit implementation (flowchart)
For the instruction that needs the operands, the flowchart asks:
- Does the instruction in the MEM stage want to write a register, and is its destination register number identical to the Ra, Rb or Rc number? If yes: FD1.
- Otherwise, does the instruction in the WB stage want to write a register? Is its destination register number identical to the Ra, Rb or Rc number and different from the register that will be written by the MEM stage? If yes: FD2.
- Does the fetched instruction need the register being written by the WB stage? If yes: FD3.
- If none of these conditions holds, no forwarding is applied.
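The decision structure of the flowchart can be sketched in Python. The names `StageWrite` and `select_operand` are hypothetical. The sketch returns descriptive labels ("MEM", "WB", "RF") instead of the slides' FD1-FD3 select signals, whose exact wiring is not reproduced here. It also assumes that the value held in the MEM stage is ready, i.e. the producer is not a LOAD. The LOAD case needs a stall.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageWrite:
    """What the instruction currently in a later stage will write back."""
    writes_reg: bool  # does it want to write a register?
    dest: int         # destination register number (1-31)
    value: int        # value already computed (e.g. ALU result)


def select_operand(src: int, rf_value: int,
                   mem: Optional[StageWrite], wb: Optional[StageWrite]) -> tuple[str, int]:
    """Choose where the operand for source register `src` comes from.

    The MEM stage is checked first because it holds the most recent producer.
    A match there therefore wins over an older match in WB, which implements
    the 'different from the register written by the MEM stage' condition.
    """
    if src == 0:  # R0 is always 0 and is never forwarded
        return "RF", 0
    if mem is not None and mem.writes_reg and mem.dest == src:
        return "MEM", mem.value
    if wb is not None and wb.writes_reg and wb.dest == src:
        return "WB", wb.value
    return "RF", rf_value


# ADD R3,R1,R4 is in MEM and SUB R7,R3,R5 needs R3: forward the ALU result.
print(select_operand(3, rf_value=111, mem=StageWrite(True, 3, 42), wb=None))  # ('MEM', 42)
```

In hardware all three comparisons run in parallel. The if-chain only expresses their priority.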
36. Data hazard due to LOAD instructions
- LW R1,32(R6): IF ID EX MEM WB
- ADD R4,R1,R7; SUB R5,R1,R8; AND R6,R1,R7: each fetched one clock after the previous instruction.
NOTE: the data required by the ADD is available only at the end of the MEM stage. This hazard cannot be eliminated by forwarding, unless an additional input coming directly from memory is added to the multiplexers in front of the ALU, which adds delay! The pipeline needs to be stalled. The ADD, which is in ID, is transformed into a NOP and re-fetched (PC <= PC - 4). The ADD, SUB and AND then proceed one clock later. From the end of the LW MEM stage onwards, the standard forwarding applies.
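The check that triggers this stall is small. The hypothetical Python sketch below flags the case in which the instruction in EX is a LOAD and the instruction in ID reads its destination register. The set of load mnemonics is taken from the instruction list; the class and function names are assumptions.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

LOADS = {"LW", "LH", "LHU", "LB", "LBU"}


@dataclass
class Instr:
    op: str
    dest: Optional[int]    # register written, None if none
    srcs: tuple[int, ...]  # registers read


def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """True when the instruction in ID reads the register loaded by the LOAD in EX.

    In that case the instruction in ID is turned into a NOP and fetched again;
    one cycle later the loaded data can be forwarded.
    """
    return (in_ex.op in LOADS and in_ex.dest not in (None, 0)
            and in_ex.dest in in_id.srcs)


lw = Instr("LW", 1, (6,))      # LW  R1,32(R6)
add = Instr("ADD", 4, (1, 7))  # ADD R4,R1,R7
print(load_use_stall(lw, add))  # True
```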
37. Delayed load
In many RISC CPUs the special hazard associated with the LOAD instruction, which would in any case lead to a stall, is not handled by stalling the pipeline. It is handled in software by the compiler (delayed load). The slot after the LOAD (LOAD instruction, delay slot, next instruction) is a delay slot, and the compiler tries to fill it with a useful instruction (worst case: NOP).
In this example R3 is needed by the ADD instruction while it is still being read from memory [instruction LW R3, 10(R4)]:
- Original: LW R1,32(R6); LW R3,10(R4); ADD R5,R1,R3; LW R6,20(R7); LW R8,40(R9)
- Scheduled: LW R1,32(R6); LW R3,10(R4); LW R6,20(R7); ADD R5,R1,R3; LW R8,40(R9)
Please note that a hardware forwarding network is required in any case.
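The effect of the reordering can be checked by counting the load-use stalls left in each sequence. The Python sketch assumes full forwarding. Under that assumption the only remaining stall is an instruction that reads the register loaded by the LW immediately before it. Instructions are plain tuples (text, destination, registers read), a representation chosen only for illustration.

```python
def load_use_stalls(program):
    """Count the 1-cycle stalls left in a straight-line sequence.

    Assumes full forwarding: only an instruction that reads the register
    loaded by the LW immediately before it has to wait one cycle.
    Each instruction is (text, dest, srcs).
    """
    stalls = 0
    for (text, dest, _), (_, _, srcs) in zip(program, program[1:]):
        if text.startswith("LW") and dest in srcs:
            stalls += 1
    return stalls


original = [
    ("LW R1,32(R6)", 1, (6,)),
    ("LW R3,10(R4)", 3, (4,)),
    ("ADD R5,R1,R3", 5, (1, 3)),
    ("LW R6,20(R7)", 6, (7,)),
    ("LW R8,40(R9)", 8, (9,)),
]
# The compiler moves the independent LW R6,20(R7) into the delay slot.
scheduled = [original[0], original[1], original[3], original[2], original[4]]
print(load_use_stalls(original), load_use_stalls(scheduled))  # 1 0
```

The move is legal because the ADD neither reads nor writes R6, so LW R6,20(R7) can safely be placed before it.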
38. Control Hazards
- PC: BEQZ R4, 200
- PC+4: SUB R7, R3, R5
- PC+8: OR R1, R3, R5
- PC+12: LW R6, 100(R8)
Next instruction address: the Branch Target Address (BTA) if R4 = 0 (taken), PC+4 if R4 != 0 (not taken).
Stage(clock): BEQZ R4, 200: IF(1) ID(2) EX(3) MEM(4) WB(5). SUB, OR and LW are fetched in clocks 2, 3 and 4. The new PC value is computed in EX (ALU out, clock 3). It reaches the PC one clock later, because the new value must be clocked into the PC (clock 4). The fetch with the new PC happens in clock 5.
39. Detailed datapath (see DLX Pipelined Datapath)
Here we assume that the jump (JP) instruction is the I-th instruction. When the new PC acts on the instruction memory, three instructions (I+1, I+2, I+3) have already travelled through the first three stages (EX included).
NOTE: if the new-PC feedback signal were taken directly from the ALU output instead of from Z, only two stalls would be required, but with a slower clock!
40. Handling the Control Hazards
1) Always stall (a three-clock block is propagated):
- BEQZ R4, 200: IF(1) ID(2) EX(3) MEM(4) WB(5)
- next instruction: S S S, then IF at the new PC once the branch has completed.
Real situation: in clock 2 the next instruction is fetched anyway, because the BEQZ has not yet been decoded. This IF is then repeated (PC <= PC - 4).
2) Predict Not Taken:
- BEQZ R4, 200: IF(1) ID(2) EX(3) MEM(4) WB(5)
- SUB R7, R3, R5; OR R1, R3, R5; LW R6, 100(R8): fetched in clocks 2, 3 and 4, and allowed to proceed.
The new PC value is computed in clock 3 (EX) and sampled by the PC in clock 4; the fetch at the new PC takes place in clock 5. If the branch is taken, the three instructions are flushed and become NOPs. No problem: none of them has reached the WB stage, so no data has yet been written.
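The cost of the two policies can be compared with the usual CPI formula. This Python sketch assumes an ideal CPI of 1, a penalty of 3 cycles (the new PC is loaded at the end of MEM), and no other stalls. It also ignores the repeated IF of the "real situation". The instruction mix is hypothetical.

```python
from fractions import Fraction


def cpi_always_stall(branch_freq: Fraction, penalty: int = 3) -> Fraction:
    """Every branch costs `penalty` extra cycles."""
    return 1 + branch_freq * penalty


def cpi_predict_not_taken(branch_freq: Fraction, taken_frac: Fraction,
                          penalty: int = 3) -> Fraction:
    """Only taken branches pay: the instructions already fetched are flushed."""
    return 1 + branch_freq * taken_frac * penalty


# Hypothetical mix: 20% branches, 60% of them taken.
f_branch, f_taken = Fraction(1, 5), Fraction(3, 5)
print(cpi_always_stall(f_branch), cpi_predict_not_taken(f_branch, f_taken))  # 8/5 34/25
```

Predict Not Taken never does worse than always stalling, because a not-taken branch costs nothing.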
41. Stalls with jumps (1/3)
When the Branch Target Address is clocked into the PC, three unwanted instructions are already in IF/ID, ID/EX and EX/MEM. Three NOPs MUST replace these 3 unwanted instructions (forced NOPs, active if jump).
42. Stalls with jumps (2/3)
NOTE: in this case the detection of the jump condition and the new PC value are applied in the same clock interval. Two NOPs MUST replace the 2 unwanted instructions already started.
43. Stalls with jumps (3/3)
NOTE: in this case the jump condition and the new PC act on the instruction memory in the same period in which the condition is detected. Very slow clock solution! One NOP MUST replace the unwanted instruction already started (it becomes a NOP if jump).
44. Delayed branch
As in the LOAD case, several RISC CPUs handle the hazard of BRANCH instructions in software, through the compiler (delayed branch). The BRANCH instruction is followed by delay slot, delay slot, delay slot, and then the next instruction. The compiler tries to fill the delay slots with useful instructions (worst case: NOP).
45. Delayed branch/jump
- Original: Add R5, R4, R3; Sub R6, R5, R2; Or R14, R6, R21; Sne R1, R8, R9 (branch condition); Br R1, +100
- Compiled: Sne R1, R8, R9 (branch condition); Br R1, +100; Add R5, R4, R3; Sub R6, R5, R2; Or R14, R6, R21 (the last three are executed in both cases)
Obviously there must be no jumps in this group of instructions!!! When no suitable instructions are available, the compiler inserts NOPs instead of one or more postponed instructions.
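The transformation above can be sketched as a simple, conservative scheduling pass in Python. All names are hypothetical. The pass checks register dependences only and ignores memory dependences, which do not arise in this example. It moves the longest tail of the preceding instructions that neither feeds nor is fed by the condition and the branch. The tail keeps its order, and any remaining slots are padded with NOPs. Three delay slots are assumed, as on the previous slide.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ins:
    text: str
    dest: Optional[int] = None  # register written, None if none
    srcs: tuple[int, ...] = ()  # registers read
    control: bool = False       # jumps/branches may not go into a delay slot


NOP = Ins("NOP")


def can_postpone(x: Ins, later: tuple[Ins, ...]) -> bool:
    """x may be moved after `later` if no register dependence links them."""
    if x.control:
        return False
    for other in later:
        if x.dest is not None and (x.dest in other.srcs or x.dest == other.dest):
            return False  # other reads or rewrites what x writes
        if other.dest is not None and other.dest in x.srcs:
            return False  # x would read a value that other changes
    return True


def fill_delay_slots(before: list[Ins], cond: Ins, branch: Ins, slots: int = 3) -> list[Ins]:
    """Move the longest movable tail of `before` (at most `slots` instructions)
    behind the branch, keeping its order, and pad the other slots with NOPs."""
    moved = 0
    while moved < min(slots, len(before)) and can_postpone(before[len(before) - 1 - moved], (cond, branch)):
        moved += 1
    cut = len(before) - moved
    return before[:cut] + [cond, branch] + before[cut:] + [NOP] * (slots - moved)


block = [Ins("Add R5, R4, R3", 5, (4, 3)),
         Ins("Sub R6, R5, R2", 6, (5, 2)),
         Ins("Or R14, R6, R21", 14, (6, 21))]
sne = Ins("Sne R1, R8, R9", 1, (8, 9))
br = Ins("Br R1, +100", None, (1,), control=True)
print([i.text for i in fill_delay_slots(block, sne, br)])
# ['Sne R1, R8, R9', 'Br R1, +100', 'Add R5, R4, R3', 'Sub R6, R5, R2', 'Or R14, R6, R21']
```

If the Or wrote R8, which Sne reads, nothing could be moved and the three slots would become NOPs.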
46. Handling the Control Hazards: Dynamic Prediction
Branch Target Buffer => no stall (almost...). Each entry holds a TAG, the Predicted PC and a T/NT bit (taken/not taken), and the buffer is looked up with the PC.
N.B. Here the buffer entry is selected during the IF clock cycle that loads IR1 into IF/ID.
- HIT: fetch with the predicted PC.
- MISS: fetch with PC + 4.
- Correct prediction: no stalls.
- Wrong prediction: 1-3 stalls (correct fetch in ID or EX, see before).
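A minimal Python model of the lookup helps show what happens in IF. It rests on several assumptions. The buffer is direct-mapped and indexed by the word address modulo the number of entries; the rest of the address is the tag. Every resolved branch updates its entry. The class name and the entry count are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class BranchTargetBuffer:
    """Direct-mapped BTB: each entry holds (TAG, predicted PC, T/NT bit)."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.table = {}  # index -> (tag, predicted_pc, taken)

    def _index_tag(self, pc: int):
        word = pc >> 2  # instructions are word-aligned
        return word % self.entries, word // self.entries

    def predict(self, pc: int) -> int:
        """Address to fetch next, chosen during IF."""
        index, tag = self._index_tag(pc)
        entry = self.table.get(index)
        if entry is not None and entry[0] == tag and entry[2]:
            return entry[1]       # HIT, predicted taken
        return (pc + 4) & MASK32  # MISS, or predicted not taken

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the actual outcome once the branch has been resolved."""
        index, tag = self._index_tag(pc)
        self.table[index] = (tag, target, taken)


btb = BranchTargetBuffer()
print(hex(btb.predict(0x100)))  # 0x104: miss, fetch with PC + 4
btb.update(0x100, taken=True, target=0x200)
print(hex(btb.predict(0x100)))  # 0x200: hit, fetch with the predicted PC
```

The tag comparison stops two branches that share an index from using each other's target.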
47. Prediction Buffer
The simplest implementation uses a single bit that records what happened the last time the branch was executed. Example (figure): loop2 nested inside loop1. When the program exits loop2, the prediction fails: the branch is predicted as taken but is actually not taken. It then fails again when loop2 is entered once more, because it now predicts not taken. When one outcome predominates, each occurrence of the opposite outcome causes two consecutive errors.
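The double error can be reproduced with a tiny Python simulation. The outcome sequence is hypothetical: the inner loop's branch is taken twice and then not taken at the loop exit, and this repeats for three iterations of the outer loop. The predictor starts in the "taken" state.

```python
def mispredictions_1bit(outcomes, initial=True):
    """1-bit predictor: predict what the branch did the last time."""
    last = initial
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken
    return misses


# Inner-loop branch: taken, taken, not taken (exit), for 3 outer iterations.
inner = [True, True, False] * 3
print(mispredictions_1bit(inner))  # 5: the first exit, then re-entry and exit each time
```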
48. Usually two bits are used. (Figure: state diagram with four states, two predicting TAKEN and two predicting UNTAKEN, with transitions labelled TAKEN/UNTAKEN.)
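One common two-bit scheme is a saturating counter. States 0-1 predict not taken and states 2-3 predict taken, so the prediction changes only after two consecutive mispredictions. The Python sketch below uses it as an assumption; the transitions of the state diagram above may differ in detail. It is run on a loop branch taken twice and then not taken, repeated three times (hypothetical trip counts), starting from "strongly taken".

```python
def mispredictions_2bit(outcomes, initial=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    counter = initial
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


inner = [True, True, False] * 3
print(mispredictions_2bit(inner))  # 3: only the loop exits are mispredicted
```

Each loop exit moves the counter only from 3 to 2. The re-entry into the loop is therefore still predicted as taken, which avoids the second error of the single-bit scheme.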
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationProcessor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4
Processor Han Wang CS3410, Spring 2012 Computer Science Cornell University See P&H Chapter 2.16 20, 4.1 4 Announcements Project 1 Available Design Document due in one week. Final Design due in three weeks.
More informationDLX: A Simplified RISC Model
1 DLX Pipeline DLX: A Simplified RISC Model Integer ALU Floating Point Unit (FPU) definition based on MIPS 2000 commercial microprocessor 32 bit machine address, integer, register width, instruction length
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationT = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good
CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationPipeline Architecture RISC
Pipeline Architecture RISC Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing Instruction Machine Cycle Every instruction must
More information6.823 Computer System Architecture Datapath for DLX Problem Set #2
6.823 Computer System Architecture Datapath for DLX Problem Set #2 Spring 2002 Students are allowed to collaborate in groups of up to 3 people. A group hands in only one copy of the solution to a problem
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationLecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1
Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)
More informationLaboratory Exercise 6 Pipelined Processors 0.0
Laboratory Exercise 6 Pipelined Processors 0.0 Goals After this laboratory exercise, you should understand the basic principles of how pipelining works, including the problems of data and branch hazards
More informationAnne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B
Anne Bracy CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Chapter: 2.16-2.20, 4.1-4.4,
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationVertieferlabor Mikroelektronik (85-324) & Embedded Processor Lab (85-546) Task 5
FB Elektrotechnik und Informationstechnik Prof. Dr.-Ing. Norbert Wehn Dozent: Uwe Wasenmüller Raum 12-213, wa@eit.uni-kl.de Task 5 Introduction Subject of the this task is the extension of the fundamental
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationPipelining: Hazards Ver. Jan 14, 2014
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: marco.santambrogio@polimi.it Simone Campanoni:
More informationInstruction Set Architecture of. MIPS Processor. MIPS Processor. MIPS Registers (continued) MIPS Registers
CSE 675.02: Introduction to Computer Architecture MIPS Processor Memory Instruction Set Architecture of MIPS Processor CPU Arithmetic Logic unit Registers $0 $31 Multiply divide Coprocessor 1 (FPU) Registers
More informationImproving Performance: Pipelining
Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Pipelining 11142011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review I/O Chapter 5 Overview Pipelining Pipelining
More informationComputer Architecture (TT 2011)
Computer Architecture (TT 2011) The MIPS/DLX/RISC Architecture Daniel Kroening Oxford University, Computer Science Department Version 1.0, 2011 Outline ISAs Overview MIPS/DLX Instruction Formats D. Kroening:
More informationAnne Bracy CS 3410 Computer Science Cornell University. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]
Anne Bracy CS 3410 Computer Science Cornell University [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon] Understanding the basics of a processor We now have the technology to build a CPU! Putting it all
More informationInstruction Set Principles. (Appendix B)
Instruction Set Principles (Appendix B) Outline Introduction Classification of Instruction Set Architectures Addressing Modes Instruction Set Operations Type & Size of Operands Instruction Set Encoding
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationInstruction Set Architecture (ISA)
Instruction Set Architecture (ISA)... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data
More information6.004 Tutorial Problems L22 Branch Prediction
6.004 Tutorial Problems L22 Branch Prediction Branch target buffer (BTB): Direct-mapped cache (can also be set-associative) that stores the target address of jumps and taken branches. The BTB is searched
More informationThe MIPS Instruction Set Architecture
The MIPS Set Architecture CPS 14 Lecture 5 Today s Lecture Admin HW #1 is due HW #2 assigned Outline Review A specific ISA, we ll use it throughout semester, very similar to the NiosII ISA (we will use
More informationThese actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.
MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously
More informationAppendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,
Appendix C Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Pipelining Multiple instructions are overlapped in execution Each is in a different stage Each stage is called
More informationChapter 2. Instruction Set Design. Computer Architectures. software. hardware. Which is easier to change/design??? Tien-Fu Chen
Computer Architectures Chapter 2 Tien-Fu Chen National Chung Cheng Univ. chap2-0 Instruction Set Design software instruction set hardware Which is easier to change/design??? chap2-1 Instruction Set Architecture:
More informationAppendix C. Abdullah Muzahid CS 5513
Appendix C Abdullah Muzahid CS 5513 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero) Single address mode for load/store: base + displacement no indirection
More informationMIPS Instruction Set
MIPS Instruction Set Prof. James L. Frankel Harvard University Version of 7:12 PM 3-Apr-2018 Copyright 2018, 2017, 2016, 201 James L. Frankel. All rights reserved. CPU Overview CPU is an acronym for Central
More informationReduced Instruction Set Computer (RISC)
Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency
More informationProgrammable Machines
Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational
More informationEE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes
NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are
More informationVery Simple MIPS Implementation
06 1 MIPS Pipelined Implementation 06 1 line: (In this set.) Unpipelined Implementation. (Diagram only.) Pipelined MIPS Implementations: Hardware, notation, hazards. Dependency Definitions. Hazards: Definitions,
More informationHakim Weatherspoon CS 3410 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer. memory inst register
More informationLecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.
Lecture 4: Review of MIPS Instruction formats, impl. of control and datapath, pipelined impl. 1 MIPS Instruction Types Data transfer: Load and store Integer arithmetic/logic Floating point arithmetic Control
More informationCHAPTER 2: INSTRUCTION SET PRINCIPLES. Prepared by Mdm Rohaya binti Abu Hassan
CHAPTER 2: INSTRUCTION SET PRINCIPLES Prepared by Mdm Rohaya binti Abu Hassan Chapter 2: Instruction Set Principles Instruction Set Architecture Classification of ISA/Types of machine Primary advantages
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationP = time to execute I = number of instructions executed C = clock cycles per instruction S = clock speed
RISC vs CISC Reduced Instruction Set Computer vs Complex Instruction Set Computers for a given benchmark the performance of a particular computer: P = 1 II CC 1 SS where P = time to execute I = number
More informationComputer Architecture
CS3350B Computer Architecture Winter 2015 Lecture 4.2: MIPS ISA -- Instruction Representation Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design,
More informationEECS 322 Computer Architecture Improving Memory Access: the Cache
EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow
More informationProgrammable Machines
Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationAdvanced Computer Architecture Pipelining
Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationWhat is Pipelining? RISC remainder (our assumptions)
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationOverview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP
Overview Appendix A Pipelining: Basic and Intermediate Concepts Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 1 2 Unpipelined
More informationPipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations
Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More information101 Assembly. ENGR 3410 Computer Architecture Mark L. Chang Fall 2009
101 Assembly ENGR 3410 Computer Architecture Mark L. Chang Fall 2009 What is assembly? 79 Why are we learning assembly now? 80 Assembly Language Readings: Chapter 2 (2.1-2.6, 2.8, 2.9, 2.13, 2.15), Appendix
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More information