CSCI 402: Computer Architectures. Fengguang Song, Department of Computer & Information Science, IUPUI. Today's Content


3/6/8

CSCI 402: Computer Architectures
The Processor (2)
Fengguang Song, Department of Computer & Information Science, IUPUI

Today's Content (4.4, 4.5)
- We have looked at how to design a datapath.
- We will design a control unit for the single-cycle processor (i.e., how to set up the 8 control signals).
- We will also learn about the pipelined processor (i.e., a much faster implementation).

[Figure: the components of a computer: Processor (Control and Datapath), Memory, Input, Output]

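How to Control the Instruction Fetch Unit?
The first control signal: PC_sel
PC_sel works as follows:
1. Increase the PC by 4 if PC_sel = 0
2. Use the branch target if PC_sel = 1

[Figure: instruction fetch unit. The PC addresses the instruction memory, which outputs Instruction<31:0>. One adder computes PC + 4, a second adder adds the sign-extended imm16 to form the branch target, and a mux controlled by PC_sel selects the value clocked into the PC.]

Activated Datapath for Executing Add
R-type format: op (bits 31:26), rs (25:21), rt (20:16), rd (15:11), shamt (10:6), funct (5:0)
Settings of the 8 control signals for add:
- PC_sel = incr
- RegDst selects Rd
- RegWr is asserted
- ExtOp = don't care
- ALUSrc selects busB
- ALUctr = Add
- MemWr is not asserted
- MemtoReg selects the ALU output

[Figure: single-cycle datapath with the paths used by add highlighted: instruction fetch unit, register file of 32 32-bit registers (ports Rw, Ra, Rb; outputs busA, busB; input busW), extender, ALU with Zero output, and data memory (WrEn, Adr, Data In).]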
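The PC_sel selection described above can be sketched in a few lines of Python. This is only an illustrative model, not the slide's hardware: it assumes PC_sel = 0 selects PC + 4 and PC_sel = 1 selects the branch target, and the helper names `sign_extend16` and `next_pc` are made up for this example.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc: int, pc_sel: int, imm16: int) -> int:
    """Model the mux in front of the PC (assumed encoding: 0 = incr, 1 = branch)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF          # first adder: PC + 4
    if pc_sel == 0:
        return pc_plus_4
    # second adder: PC + 4 + sign_ext(imm16) * 4 (imm16 counts words)
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


print(hex(next_pc(0x1000, 0, 3)))  # sequential fetch
print(hex(next_pc(0x1000, 1, 3)))  # taken branch, 3 words forward
```

The immediate is shifted left by 2 because branch offsets are expressed in words while the PC holds a byte address.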

But, How to Generate Correct Control Signals?
Control signals are derived from the instruction.

Format       31:26      25:21   20:16   15:11   10:6    5:0
R-type       0          rs      rt      rd      shamt   funct
Load/Store   35 or 43   rs      rt      address (15:0)
Branch       4          rs      rt      address (15:0)

- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): written for R-type and load
- address: sign-extend, then add

Adding Control to Datapath
- Inputs (blue variables): Op = Instruction<31:26> and Fun = Instruction<5:0>, plus Zero from the datapath
- Outputs (red variables): PC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg
- Control: a combinational logic circuit

[Figure: the instruction memory output Instruction<31:0> is split into Op <31:26>, Fun <5:0>, Rs <25:21>, Rt <20:16>, Rd <15:11>, and Imm16 <15:0>. Op and Fun feed the control block, which drives the datapath.]
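Since every control signal is derived from fields of the 32-bit instruction word, it can help to see the field extraction spelled out. The following is a minimal sketch using the standard MIPS bit positions; the function name `decode` and the dictionary layout are assumptions made for this example.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31:26
        "rs":    (instr >> 21) & 0x1F,   # bits 25:21
        "rt":    (instr >> 16) & 0x1F,   # bits 20:16
        "rd":    (instr >> 11) & 0x1F,   # bits 15:11 (R-type only)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct": instr & 0x3F,           # bits 5:0   (R-type only)
        "imm16": instr & 0xFFFF,         # bits 15:0  (I-type only)
    }


# add $t0, $t1, $t2 encodes as 0x012A4020
print(decode(0x012A4020))
# lw $t0, 4($t1) encodes as 0x8D280004
print(decode(0x8D280004))
```

All fields are extracted unconditionally, just as the hardware wires them out in parallel; the control unit decides which ones are meaningful for a given opcode.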

Examples of Control Signals

inst    Register Transfer
ADD     R[rd] ← R[rs] + R[rt]; PC ← PC + 4
        ALUsrc = BusB, ALUctr = add, RegDst = rd, RegWr, PC_sel = incr
SUB     R[rd] ← R[rs] - R[rt]; PC ← PC + 4
        ALUsrc = BusB, ALUctr = sub, RegDst = rd, RegWr, PC_sel = incr
ORi     R[rt] ← R[rs] OR zero_ext(imm16); PC ← PC + 4
        ALUsrc = Im, Extop = Z, ALUctr = or, RegDst = rt, RegWr, PC_sel = incr
LOAD    R[rt] ← MEM[R[rs] + sign_ext(imm16)]; PC ← PC + 4
        ALUsrc = Im, Extop = Sign, ALUctr = add, MemtoReg, RegDst = rt, RegWr, PC_sel = incr
STORE   MEM[R[rs] + sign_ext(imm16)] ← R[rt]; PC ← PC + 4
        ALUsrc = Im, Extop = Sign, ALUctr = add, MemWr, PC_sel = incr
BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + sign_ext(imm16)*4; else PC ← PC + 4
        PC_sel = output of ALU, ALUctr = sub

Summary of Control Signals (for 7 instructions)
See the MIPS reference for the op and func encodings.

[Table: one column per instruction (add, sub, ori, lw, sw, beq, jump), each identified by its op and func fields (func is N/A for everything except add and sub); one row per control signal (RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, PCsel, ExtOp, ALUctr<3:0>). The ALUctr row reads: add = Add, sub = Subtract, ori = Or, lw = Add, sw = Add, beq = Subtract.]

The first 2 columns (add, sub) are identical except for the last row, so they can be combined!

Instruction formats:
R-type   op (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)    add, sub
I-type   op (31:26) | rs (25:21) | rt (20:16) | immediate (15:0)                          ori, lw, sw, beq
J-type   op (31:26) | target address (25:0)                                               jump
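The register-transfer descriptions for these instructions can be checked with a tiny interpreter. The sketch below is an assumption-laden toy, not the course's simulator: the tuple format `(name, rs, rt, x)`, where `x` is rd for ADD/SUB and imm16 otherwise, is invented for this example, memory is a Python dict keyed by byte address, and the hard-wired zero register is not modelled.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(inst, regs, mem, pc):
    """Apply one instruction's register transfer; return the next PC."""
    name, rs, rt, x = inst            # x is rd (add/sub) or imm16 (others)
    next_pc = pc + 4                  # default: PC <- PC + 4
    if name == "add":
        regs[x] = (regs[rs] + regs[rt]) & MASK32
    elif name == "sub":
        regs[x] = (regs[rs] - regs[rt]) & MASK32
    elif name == "ori":
        regs[rt] = regs[rs] | (x & 0xFFFF)           # zero-extended immediate
    elif name == "lw":
        regs[rt] = mem[(regs[rs] + sign_ext(x)) & MASK32]
    elif name == "sw":
        mem[(regs[rs] + sign_ext(x)) & MASK32] = regs[rt]
    elif name == "beq":
        if regs[rs] == regs[rt]:
            next_pc = pc + 4 + sign_ext(x) * 4
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return next_pc & MASK32


regs = [0] * 32
regs[1], regs[2] = 5, 7
mem = {}
pc = step(("add", 1, 2, 3), regs, mem, 0)   # R[3] <- R[1] + R[2]
pc = step(("sw", 0, 3, 16), regs, mem, pc)  # MEM[R[0] + 16] <- R[3]
print(regs[3], mem[16], pc)
```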

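The Concept of Local Decoding

[Table: one column per opcode class (R-type, ori, lw, sw, beq, jump); rows RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, ExtOp, ALUop<1:0>. The symbolic ALUop row reads: R-type = "R-type", ori = Or, lw = Add, sw = Add, beq = Subtract.]

- The first two columns of the previous slide (add, sub) are collapsed into one R-type column.
- Main Control takes the 6-bit op and produces ALUop (2 bits here; this could be more bits).
- ALU Control (Local) takes ALUop and the 6-bit func and produces the 4-bit ALUctr.
- ALUctr is generated locally, based on the funct code.

The ALU Control
- Assume a 2-bit ALUOp derived from the opcode.
- Next, combinational logic derives the ALU control.

opcode   Operation          funct     ALU function
lw       load word          XXXXXX    add
sw       store word         XXXXXX    add
beq      branch equal       XXXXXX    subtract
ori      or immediate       XXXXXX    OR
R-type   add                (add)     add
         subtract           (sub)     subtract
         AND                (and)     AND
         OR                 (or)      OR
         set-on-less-than   (slt)     set-on-less-than

(The slide's table also carries binary ALUOp and ALUCtr columns and the binary funct codes of the R-type rows; those values are not reproduced here.)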
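The two-level decoding described above (a 2-bit ALUOp from the main control, then a local ALU control that looks at funct only for R-type instructions) can be sketched as follows. The binary encodings are assumptions of this sketch: the 4-bit ALU control codes and the ALUOp values 00/01/10 follow the common Patterson and Hennessy convention, the funct codes are the standard MIPS ones, and ALUOp = 11 for ori is a guess made here to fit a fourth case into two bits.

```python
# Assumed 4-bit ALU control codes (Patterson & Hennessy convention).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Standard MIPS funct codes for the R-type instructions on the slide.
FUNCT_TO_ALUCTR = {
    0x20: ALU_ADD,   # add
    0x22: ALU_SUB,   # sub
    0x24: ALU_AND,   # and
    0x25: ALU_OR,    # or
    0x2A: ALU_SLT,   # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Local ALU decoder: funct matters only when ALUOp says 'R-type'."""
    if alu_op == 0b00:               # lw / sw: address calculation
        return ALU_ADD
    if alu_op == 0b01:               # beq: compare by subtracting
        return ALU_SUB
    if alu_op == 0b11:               # ori (assumed encoding for this sketch)
        return ALU_OR
    if alu_op == 0b10:               # R-type: decode funct locally
        return FUNCT_TO_ALUCTR[funct]
    raise ValueError(f"invalid ALUOp: {alu_op}")


print(format(alu_control(0b10, 0x22), "04b"))  # R-type sub
print(format(alu_control(0b00, 0x00), "04b"))  # lw/sw ignore funct
```

Keeping the funct decoding local means the main control only needs the 6-bit opcode, which is the point of the "local decoding" idea.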

ALU Control
[Table: ALUCtr code versus ALU function, listing in order: AND, OR, add, subtract, set-on-less-than, NOR.]

Logic Function for Each Signal
Mostly just a simple function: f(op)

PC_sel    ⇐ if (OP == BEQ) then EQUAL (the ALU's Zero output) else 0
ALUsrc    ⇐ if (OP == Rtype or OP == BEQ) then BusB else immed
ALUctr    ⇐ if (OP == Rtype) then check funct
             elseif (OP == ORi) then OR
             elseif (OP == BEQ) then sub
             else add
ExtOp     ⇐ if (OP == ORi) then zero else sign
MemWr     ⇐ (OP == Store)
MemtoReg  ⇐ (OP == Load)
RegWr     ⇐ if ((OP == Store) or (OP == BEQ)) then 0 else 1
RegDst    ⇐ if ((OP == Load) or (OP == ORi)) then Rt else Rd

BEQ is listed with Rtype for ALUsrc because the ALU compares R[rs] with R[rt] by subtracting them.
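The per-signal logic functions above translate almost line for line into code. The sketch below is a hedged illustration only: opcodes are given as symbolic strings rather than binary, signal values are symbolic where the slide uses names (BusB, Rt, sign), and 0/1 where it uses bits. It selects BusB for BEQ as well as for R-type, on the reasoning that beq compares two registers in the ALU; treat that as an assumption of this sketch.

```python
def main_control(op: str, zero: bool = False) -> dict:
    """Derive the 8 control signals from a symbolic opcode and the Zero flag."""
    if op not in {"rtype", "ori", "lw", "sw", "beq"}:
        raise ValueError(f"unsupported opcode class: {op}")

    if op == "rtype":
        alu_ctr = "funct"        # decoded locally from the funct field
    elif op == "ori":
        alu_ctr = "or"
    elif op == "beq":
        alu_ctr = "sub"
    else:                        # lw, sw: address calculation
        alu_ctr = "add"

    return {
        "PC_sel":   int(op == "beq" and zero),   # branch only if operands equal
        # beq compares R[rs] with R[rt], so it also needs busB (assumption)
        "ALUsrc":   "busB" if op in ("rtype", "beq") else "imm",
        "ALUctr":   alu_ctr,
        "ExtOp":    "zero" if op == "ori" else "sign",
        "MemWr":    int(op == "sw"),
        "MemtoReg": int(op == "lw"),
        "RegWr":    0 if op in ("sw", "beq") else 1,
        "RegDst":   "rt" if op in ("lw", "ori") else "rd",
    }


for name in ("rtype", "ori", "lw", "sw", "beq"):
    print(name, main_control(name, zero=True))
```

For instructions that do not write a register (sw, beq), the RegDst and MemtoReg values returned here are don't-cares.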

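Truth Table for the Main Control

[Figure: Main Control takes the 6-bit op and produces RegDst, ALUSrc, ... and the 2-bit ALUop; ALU Control (Local) takes ALUop and the 6-bit func and produces the 4-bit ALUctr.]

[Table: one column per op (R-type, ori, lw, sw, beq, jump); rows RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, ALUop (Symbolic), ALUop<1>, ALUop<0>. The ALUop (Symbolic) row reads: R-type = "R-type", ori = Or, lw = Add, sw = Add, beq = Subtract.]

The main control does not need func.

A Simple Datapath + Control
Based on the previous truth table.
[Figure: single-cycle datapath with the control unit added; ALUOp is 2 bits wide and the ALU control output is 4 bits wide.]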

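R-Type Instruction
[Figure: datapath with the paths used by an R-type instruction highlighted; func drives the ALU Ctr.]

Load Instruction
[Figure: datapath with the paths used by a load highlighted; the ALU performs add.]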

Branch-on-Equal Instruction (beq)
[Figure: datapath with the paths used by beq highlighted; the ALU performs sub.]

Finally, Implementing Jumps (j)
J-type format: opcode 2 (bits 31:26) | address (bits 25:0)
- Jump uses word addressing.
- It updates the PC with the concatenation of:
  - the most significant 4 bits of <current PC + 4>
  - the 26-bit jump address (shifted left by 2 bits to get a byte address)
- Now we need a new control signal, decoded from the opcode, for jump.
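The jump-target concatenation described above is a few bit operations. A minimal sketch, with the function name `jump_target` chosen for this example:

```python
def jump_target(pc: int, addr26: int) -> int:
    """Form the jump target: top 4 bits of PC + 4, then the 26-bit address << 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000            # most significant 4 bits of PC + 4
    word_addr = addr26 & 0x03FFFFFF            # keep only the 26-bit field
    return upper4 | (word_addr << 2)           # shift turns words into bytes


print(hex(jump_target(0x40000000, 0x100)))
```

Because only 26 bits come from the instruction, a jump can only reach targets within the same 256 MB region as the instruction that follows it.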

Datapath With Jumps Added
[Figure: single-cycle datapath with the jump path added; the upper 4 bits of PC + 4 are combined with the shifted jump address.]

Performance Issues
- Yes, the single-cycle CPU works correctly.
- But the longest delay determines the CPU clock cycle.
- What is the critical (or longest) path? The load instruction: instruction memory → register file → ALU → data memory → register file.
- It could be worse if you deal with floating-point numbers.
- This violates a design principle: make the common case fast.
- Next, we will improve it using pipelining.

Pipelining Analogy
Pipelining is natural! The classic laundry example: washer, dryer, folder, storer.
[Figure: four loads done one after another take a total of 8 hours; pipelined, they take a total of 3.5 hours.]
- Four loads: speedup = 8/3.5 = 2.3X
- Non-stop (steady state): speedup ≈ 4 (2N/0.5N) = number of stages

Important Lessons about Pipelining
- Pipelining doesn't help the latency of a single task, but it helps the throughput of the entire workload.
- Multiple tasks operate simultaneously using different resources (in parallel).
- Potential speedup = number of pipe stages.
- The pipeline rate is limited by the slowest stage.
- Unbalanced stage lengths reduce speedup.
- The time to fill the pipeline and the time to drain it reduce speedup.
- The pipeline may stall for dependences.
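The laundry numbers can be reproduced with a short calculation. This sketch assumes four stages of half an hour each, which matches the 8-hour and 3.5-hour totals quoted for four loads; the function name is made up for the example.

```python
def laundry_times(loads: int, stages: int = 4, stage_hours: float = 0.5):
    """Return (sequential_hours, pipelined_hours) for a number of laundry loads."""
    sequential = loads * stages * stage_hours
    # The first load takes `stages` steps; each later load finishes one step after.
    pipelined = (stages + loads - 1) * stage_hours
    return sequential, pipelined


for n in (4, 100, 10_000):
    seq, pipe = laundry_times(n)
    print(n, seq, pipe, round(seq / pipe, 2))
```

With four loads the ratio is 8/3.5; as the number of loads grows, the fill and drain time becomes negligible and the ratio approaches the number of stages.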

The MIPS Pipeline
Five pipeline stages on MIPS processors:
1. IF: Instruction Fetch from memory
2. ID: Instruction Decode and Register Read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register file

Pipeline Performance
Assume the time for the different stages is:
- 100ps for the ID stage
- 100ps for the WB stage
- 200ps for all the other stages

Performance of the single-cycle datapath design:

Instruction example   Instr fetch   Register read   ALU op   Memory access   Register write back   Total time
lw                    200ps         100ps           200ps    200ps           100ps                 800ps
sw                    200ps         100ps           200ps    200ps                                 700ps
R-format              200ps         100ps           200ps                    100ps                 600ps
beq                   200ps         100ps           200ps                                          500ps
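The per-instruction totals follow directly from which stages each instruction uses. The sketch below recomputes them, assuming 100 ps for ID and WB and 200 ps for the other stages; the dictionary names are illustrative.

```python
# Assumed stage latencies in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages that do useful work for each instruction class.
STAGES_USED = {
    "lw":       ["IF", "ID", "EX", "MEM", "WB"],
    "sw":       ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq":      ["IF", "ID", "EX"],
}

totals = {name: sum(STAGE_PS[s] for s in used) for name, used in STAGES_USED.items()}

# A single-cycle design must fit the slowest instruction in one clock cycle.
single_cycle_clock = max(totals.values())
# A pipelined design must fit the slowest stage in one clock cycle.
pipelined_clock = max(STAGE_PS.values())

print(totals)
print(single_cycle_clock, pipelined_clock)
```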

Single-Cycle vs Pipeline
- Single-cycle (CC = 800ps)
- Pipelined (CC = 200ps); 200ps is the slowest stage time
- 4× speedup!
[Figure: instruction timing for the single-cycle design versus the pipelined design.]

Convenient Pipelined Representation
[Figure: program flow (vertical axis) versus time (horizontal axis). Each of six instructions passes through IFetch, ID, Exec, Mem, WB, offset by one cycle from the instruction before it.]

Pipeline Speedup
- If all stages are balanced (i.e., all take the same time):
  t_pipelined = t_nonpipelined / (# of stages)
- If the stages are NOT balanced, the speedup is less.
- Speedup is due to increased throughput.
- Latency (time for each instruction) does not necessarily improve.
- Under ideal conditions and with a large number of instructions, speedup = # of stages.

ISA Design is Suitable for Pipelining
- All MIPS instructions are 32 bits
  - Much easier to fetch and decode
  - By contrast, x86 has 1- to 17-byte instructions, which is more difficult
- Very regular instruction formats
  - So we can decode and read registers simultaneously in one stage
- Only load/store can access memory
  - Can calculate the address in the EX stage and access memory in the MEM stage (i.e., Ex, Mem, Wr)
- Alignment of memory operands
  - Always a single data transfer
  - So memory access takes only one cycle (in one stage)
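To see how the speedup depends on the number of instructions and on stage balance, the total execution times can be compared directly. This sketch assumes an 800 ps single-cycle clock, a 200 ps pipelined clock, five stages, and no stalls; all of these are parameters you can change.

```python
def total_time_ps(n: int, single_cycle_ps: int = 800,
                  stage_ps: int = 200, stages: int = 5):
    """Return (single_cycle_total, pipelined_total) in ps for n instructions."""
    single = n * single_cycle_ps
    # The first instruction needs `stages` cycles; each later one adds one cycle.
    pipelined = (stages + n - 1) * stage_ps
    return single, pipelined


for n in (3, 1_000, 1_000_000):
    single, pipelined = total_time_ps(n)
    print(n, single, pipelined, round(single / pipelined, 3))
```

With these numbers the speedup approaches 4 rather than 5, because the stages are not balanced: the clock is set by the 200 ps stages even though ID and WB need only 100 ps.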

Pipeline Hazards
Hazards exist: situations in which the next instruction cannot execute in the next cycle.
1. Structural hazards: a required resource (e.g., memory) is occupied/busy (more details on the next slide)
2. Data hazards: need to wait for a previous instruction to complete its data read/write
3. Control hazards: depend on a control action from a previous instruction (e.g., a branch instruction: beq)

Structural Hazards
- Conflict for the use of a resource that is already occupied (e.g., only one memory!)
- MIPS is well designed so that there is no structural hazard
- Suppose the MIPS pipeline had a single memory
  - Load/store requires memory access
  - Instruction fetch would have to stall for that cycle
  - This would cause a pipeline bubble
- Hence, MIPS pipelines require separate instruction and data memories, to avoid a structural hazard

Data Hazards
An instruction depends on the completion of a data access by a previous instruction:
add $s0, $t0, $t1
sub $t2, $s0, $t3
[Figure: graphical representation of the instruction pipeline. Shading on the right half of a stage means the register is read; shading on the left half means the register is written. Waited for 3 cycles.]

Data Dependencies
A RAW (read-after-write) data dependency need not always be a data hazard:

add $s0, $t0, $t1    IF  ID  EX  MEM WB
sub $t4, $t0, $t1        IF  ID  EX  MEM WB
and $t5, $t0, $t1            IF  ID  EX  MEM WB
sub $t2, $s0, $t3                IF  ID  EX  MEM WB

- There is a RAW dependency on $s0, but there is no pipeline data hazard!
- How to solve a data hazard? Either stall or forwarding. EX already has the result!!
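The difference between a RAW dependency and a RAW hazard comes down to how far apart the producer and consumer are in the instruction stream. The sketch below finds RAW dependencies and flags those that are too close. It rests on stated assumptions: instructions are modelled as `(dest, src1, src2)` tuples, there is no forwarding, and the register file is written in the first half of a cycle and read in the second half, so a consumer three or more instructions after its producer sees the new value. The register names in the examples are illustrative.

```python
def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each RAW dependency."""
    deps = []
    for j, (_dest, *sources) in enumerate(program):
        for reg in dict.fromkeys(sources):          # each source register once
            for i in range(j - 1, -1, -1):          # nearest earlier writer
                if program[i][0] == reg:
                    deps.append((i, j, reg))
                    break
    return deps


def is_hazard(producer: int, consumer: int, min_distance: int = 3) -> bool:
    """A dependency is a hazard if the consumer follows too closely (assumed model)."""
    return (consumer - producer) < min_distance


back_to_back = [("$s0", "$t0", "$t1"),    # add $s0, $t0, $t1
                ("$t2", "$s0", "$t3")]    # sub $t2, $s0, $t3
spaced_out = [("$s0", "$t0", "$t1"),      # add $s0, $t0, $t1
              ("$t4", "$t0", "$t1"),      # independent instruction
              ("$t5", "$t0", "$t1"),      # independent instruction
              ("$t2", "$s0", "$t3")]      # sub $t2, $s0, $t3

for prog in (back_to_back, spaced_out):
    for i, j, reg in raw_dependencies(prog):
        print(reg, "distance", j - i, "hazard" if is_hazard(i, j) else "no hazard")
```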

Forwarding (aka Bypassing)
- Use the result as soon as it is computed/available
  - Don't need to wait until the data is stored in a register
  - Requires extra connections in the datapath
- Works most of the time
- Need to modify hardware

Load-Use Data Hazard
- Unfortunately, we can't always avoid stalls, even with forwarding: some values are still not available when needed
- Must stall one cycle for a load-use data hazard
- A special form of data hazard

How to Use Code Scheduling to Avoid Stalls: a software solution
- First, find the load-use data hazards, i.e., a load whose result is used by the immediately following instruction
- Reorder the code to avoid using a load result in the next instruction

C code for A = B + E; C = B + F;

Original order (13 cycles):
        lw  $t1, 0($t0)
        lw  $t2, 4($t0)
stall   add $t3, $t1, $t2
        sw  $t3, 12($t0)
        lw  $t4, 8($t0)
stall   add $t5, $t1, $t4
        sw  $t5, 16($t0)

Reordered (11 cycles):
        lw  $t1, 0($t0)
        lw  $t2, 4($t0)
        lw  $t4, 8($t0)
        add $t3, $t1, $t2
        sw  $t3, 12($t0)
        add $t5, $t1, $t4
        sw  $t5, 16($t0)

Other Types of Data Hazards
- We have discussed the RAW (read after write) data hazard
- The other two data hazards are avoided by design!
  - Eliminate WAR by always fetching operands early (ID) in the pipe
  - Eliminate WAW by doing all WBs in order (always at the last stage, static)
- WAR example:
  ADD R3, R2, R1
  SUB R2, R4, R5
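The cycle counts for the two orderings can be checked with a small stall counter. The model is deliberately simple and rests on assumptions: a five-stage pipeline with forwarding, where the only stall is one cycle when an instruction uses the result of the load immediately before it. Instructions are written as hypothetical `(op, dest, sources)` tuples, and the register numbers and offsets are an assumed reading of the example.

```python
def count_cycles(program, stages: int = 5) -> int:
    """Cycles to run `program`, charging one stall per load-use pair."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1                      # load result not ready in time
    # (stages - 1) cycles to fill the pipeline, then one per instruction.
    return (stages - 1) + len(program) + stalls


original = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load
    ("sw",  None,  ("$t5", "$t0")),
]
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_cycles(original), count_cycles(reordered))
```

Moving the third load up separates each load from its first use by at least one instruction, so both stalls disappear without changing the result.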
