CS 152 Computer Architecture and Engineering. Lecture 12: Multicycle Controller Design

Similar documents
Initial Representation Finite State Diagram Microprogram. Sequencing Control Explicit Next State Microprogram counter

Initial Representation Finite State Diagram. Logic Representation Logic Equations

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

CS152 Computer Architecture and Engineering Lecture 13: Microprogramming and Exceptions. Review of a Multiple Cycle Implementation

ECE468 Computer Organization and Architecture. Designing a Multiple Cycle Controller

CpE 442. Designing a Multiple Cycle Controller

Designing a Multicycle Processor

Major CPU Design Steps

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

ECE 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller. Review of a Multiple Cycle Implementation

CPU Design Steps. EECC550 - Shaaban

CC 311- Computer Architecture. The Processor - Control

ENE 334 Microprocessors

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

CPE 335. Basic MIPS Architecture Part II

CSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

Implementing the Control. Simple Questions

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

The Big Picture: Where are We Now? CS 152 Computer Architecture and Engineering Lecture 11. The Five Classic Components of a Computer

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Outline of today s lecture. EEL-4713 Computer Architecture Designing a Multiple-Cycle Processor. What s wrong with our CPI=1 processor?

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

Microprogramming. Microprogramming

Working on the Pipeline

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

Systems Architecture I

Recap: A Single Cycle Datapath. CS 152 Computer Architecture and Engineering Lecture 8. Single-Cycle (Con t) Designing a Multicycle Processor

Multicycle conclusion

Computer Architecture. Lecture 6.1: Fundamentals of

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

The Processor: Datapath & Control

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Multicycle Approach. Designing MIPS Processor

COMP303 Computer Architecture Lecture 9. Single Cycle Control

RISC Processor Design

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

Midterm I March 1, 2001 CS152 Computer Architecture and Engineering

ECE369. Chapter 5 ECE369

RISC Design: Multi-Cycle Implementation

361 multipath..1. EECS 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

Chapter 4 The Processor (Part 2)

CS61C : Machine Structures

MIPS-Lite Single-Cycle Control

CPU Organization (Design)

CPU Organization Datapath Design:

Topic #6. Processor Design

Ch 5: Designing a Single Cycle Datapath

Lecture 6: Pipelining

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

Mapping Control to Hardware

Lecture 7 Pipelining. Peng Liu.

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Lets Build a Processor

How to design a controller to produce signals to control the datapath

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong

Review: Abstract Implementation View

Design a MIPS Processor (2/2)

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

Modern Computer Architecture

CENG 3420 Lecture 06: Datapath

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points

CSE 2021 COMPUTER ORGANIZATION

Lecture 6 Datapath and Controller

Pipelining. Maurizio Palesi

Outline Marquette University

EITF20: Computer Architecture Part2.2.1: Pipeline-1

RISC Architecture: Multi-Cycle Implementation

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

Computer Systems Architecture Spring 2016

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

Lecture #17: CPU Design II Control

Microprogrammed Control Approach

CSE 2021 COMPUTER ORGANIZATION

CS152 Computer Architecture and Engineering Lecture 10: Designing a Single Cycle Control. Recap: The MIPS Instruction Formats

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

EECE 417 Computer Systems Architecture

Chapter 5: The Processor: Datapath and Control

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Transcription:

CS 152 Computer Architecture and Engineering Lecture 12: Multicycle Controller Design October 10, 1997 Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/ cs 152 L12.1

Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique hardwired control microprogrammed control cs 152 L12.2

Recap: Macroinstruction Interpretation Main Memory execution unit ADD SUB AND... DATA User program plus Data this can change! one of these is mapped into one of these CPU control memory AND microsequence e.g., Fetch Calc Operand Addr Fetch Operand(s) Calculate Save Answer(s) cs 152 L12.3

The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory Input Datapath Output Today s Topics: Microprogramed control Administrivia; Courses Microprogram it yourself Exceptions Intro to Pipelining (if time permits) cs 152 L12.4

Recap: Horizontal vs. Vertical Microprogramming NOTE: previous organization is not TRUE horizontal microprogramming; register decoders give flavor of encoded microoperations Most microprogramming-based controllers vary between: horizontal organization (1 control bit per control point) vertical organization (fields encoded in the control memory and must be decoded to control something) Horizontal + more control over the potential parallelism of operations in the datapath - uses up lots of control store Vertical + easier to program, not very different from programming a RISC machine in assembly language - extra level of decoding may slow the machine down cs 152 L12.5

Recap: Designing a Microinstruction Set 1) Start with list of control signals 2) Group signals together that make sense (vs. random): called fields 3) Places fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last) 4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals Use computers to design computers 5) To minimize the width, encode operations that will never be used at the same time cs 152 L12.6

Alternative datapath (book): Multiple Cycle Datapath Miminizes Hardware: 1 memory, 1 adder PCWr PC 32 32 32 IorD 0 Mux 1 32 32 PCWrCond Zero MemWr 32 RAdr Ideal Memory WrAdr Din Dout 32 IRWr Instruction Reg 32 Mem Data Reg RegDst Rs Rt Rt 0 Mux 5 5 Rd 1 1 Mux 0 RegWr Ra Rb busa A Reg File Rw B busw busb << 2 ALUSelA 32 4 32 PCSrc 0 Mux 1 0 1 2 3 1 Mux 0 32 32 32 ALU ALU Control Zero 32 ALU Out Imm 16 ExtOp Extend 32 MemtoReg ALUSelB ALUOp cs 152 L12.7

Finite State Machine (FSM) Spec IR <= MEM[PC] PC <= PC + 4 0000 instruction fetch Execute R-type ALUout <= A fun B 0100 ORi ALUout <= A op ZX 0110 0001 LW ALUout <= A + SX 1000 SW decode ALUout <= A + SX 1011 BEQ Q: How improve to do something in state 0001? ALUout <= PC +SX 0010 Memory R[rd] <= ALUout 0101 R[rt] <= ALUout 0111 M <= MEM[ALUout] 1001 R[rt] <= M 1010 MEM[ALUout] <= B 1100 If A = B then PC <= ALUout 0011 Write-back cs 152 L12.8

Single Bit Control Multiple Bit Control 1&2) Start with list of control signals, grouped into fields Signal name Effect when deasserted Effect when asserted ALUSelA 1st ALU operand = PC 1st ALU operand = Reg[rs] RegWrite None Reg. is written MemtoReg Reg. write data input = ALU Reg. write data input = memory RegDst Reg. dest. no. = rt Reg. dest. no. = rd MemRead None Memory at address is read, MDR <= Mem[addr] MemWrite None Memory at address is written IorD Memory address = PC Memory address = S IRWrite None IR <= Memory PCWrite None PC <= PCSource PCWriteCond None IF ALUzero then PC <= PCSource PCSource PCSource = ALU PCSource = ALUout Signal name Value Effect ALUOp 00 ALU adds 01 ALU subtracts 10 ALU does function code 11 ALU does logical OR ALUSelB 000 2nd ALU input = Reg[rt] 001 2nd ALU input = 4 010 2nd ALU input = sign extended IR[15-0] 011 2nd ALU input = sign extended, shift left 2 IR[15-0] 100 2nd ALU input = zero extended IR[15-0] cs 152 L12.9

Start with list of control signals, cont d For next state function (next microinstruction address), use Sequencer-based control unit from last lecture Called micropc or µpc vs. state register Signal Value Effect Sequen 00 Next µaddress = 0 -cing 01 Next µaddress = dispatch ROM 10 Next µaddress = µaddress + 1 1 Adder µaddress Select Logic micropc Mux 2 1 ROM Opcode 0 0 cs 152 L12.10

3) Microinstruction Format: unencoded vs. encoded fields Field Name Width Control Signals Set wide narrow ALU Control 4 2 ALUOp SRC1 2 1 ALUSelA SRC2 5 3 ALUSelB ALU Destination 3 2 RegWrite, MemtoReg, RegDst Memory 4 3 MemRead, MemWrite, IorD Memory Register 1 1 IRWrite PCWrite Control 4 3 PCWrite, PCWriteCond, PCSource Sequencing 3 2 AddrCtl Total width 26 17 bits cs 152 L12.11

4) Legend of Fields and Symbolic Names Field Name Values for Field Function of Field with Specific Value ALU Add ALU adds Subt. ALU subtracts Func code ALU does function code Or ALU does logical OR SRC1 PC 1st ALU input = PC rs 1st ALU input = Reg[rs] SRC2 4 2nd ALU input = 4 Extend 2nd ALU input = sign ext. IR[15-0] Extend0 2nd ALU input = zero ext. IR[15-0] Extshft 2nd ALU input = sign ex., sl IR[15-0] rt 2nd ALU input = Reg[rt] destination rd ALU Reg[rd] = ALUout rt ALU Reg[rt] = ALUout rt Mem Reg[rt] = Mem Memory Read PC Read memory using PC Read ALU Read memory using ALU output Write ALU Write memory using ALU output Memory register IR IR = Mem PC write ALU PC = ALU ALUoutCond IF ALU Zero then PC = ALUout Sequencing Seq Go to sequential µinstruction Fetch Go to the first microinstruction Dispatch Dispatch using ROM. cs 152 L12.12

Administrivia Enjoyed meeting everyone after midterm Midterm graded, scores posted Average score Std. Dev. Schedule change: Delay Lab 4 until Tuesday after midterm (10/14) => Delay Lab 5 until 10/28 => Delay Lab 6 until 11/11 => Delay Midterm II until 11/19 Next Lecture: Prof. Brodersen on Low Power Design Not in book, but can be on Midterm II Next reading assignment: Chapter 6 Advice on courses as pre-enroll cs 152 L12.13

Administrivia: Courses to consider during Telebears General Philosophy Take courses from great teachers (HKN ratings helps find them) - http://www-hkn.eecs.berkeley.edu/toplevel/coursesurveys.html Take variety of undergrad courses now to get introduction to areas; can learn advanced material on own later once know vocabulary Who knows what you will work on over a 40 year career? CS169 Software Engineering Everyone writes programs, even hardware designers Often programs are written in groups => learn skill in school EE122 Introduction to Communication Networks World is getting connected; communications must play major role CS162 Operating Systems All special-purpose hardware will run a layer of software that uses processes and concurrent programming; CS162 is the closest thing cs 152 L12.14

Microprogram it yourself! Label ALU SRC1 SRC2 ALU Dest. Memory Mem. Reg. PC Write Sequencing Fetch: Add PC 4 Read PC IR ALU Seq cs 152 L12.15

Microprogram it yourself! Label ALU SRC1 SRC2 Dest. Memory Mem. Reg. PC Write Sequencing Fetch: Add PC 4 Read PC IR ALU Seq Add PC Extshft Dispatch Lw: Add rs Extend Seq Read ALU Seq rt MEM Fetch Sw: Add rs Extend Seq Write ALU Fetch Rtype: Func rs rt Seq rd ALU Fetch Beq: Subt. rs rt ALUoutCond. Fetch Ori: Or rs Extend0 Seq rt ALU Fetch cs 152 L12.16

An Alternative MultiCycle DataPath A-Bus next PC P C inst mem Reg A IR S mem File ZX SX B B Bus W-Bus In each clock cycle, each Bus can be used to transfer from one source µ-instruction can simply contain B-Bus and W-Dst fields cs 152 L12.17

What about a 2-Bus Microarchitecture (datapath)? Instruction Fetch next PC P C IR ZXSX Reg File A B S Mem M A-Bus B Bus Decode / Operand Fetch next PC P C IR ZXSX Reg File A B S Mem M cs 152 L12.18

Load Execute next PC P C IR ZXSX Reg File A B S Mem M Mem next PC P C IR ZXSX Reg File A B S addr Mem M Write-back next PC P C IR ZXSX Reg File A B S Mem M What about 1 bus? 1 adder? 1 Register port? cs 152 L12.19

Legacy Software and Microprogramming IBM bet company on 360 Instruction Set Architecture (ISA): single instruction set for many classes of machines (8-bit to 64-bit) Stewart Tucker stuck with job of what to do about software compatability If microprogramming could easily do same instruction set on many different microarchitectures, then why couldn t multiple microprograms do multiple instruction sets on the same microarchitecture? Coined term emulation : instruction set interpreter in microcode for non-native instruction set Very successful: in early years of IBM 360 it was hard to know whether old instruction set or new instruction set was more frequently used cs 152 L12.20

Microprogramming Pros and Cons Ease of design Flexibility Easy to adapt to changes in organization, timing, technology Can make changes late in design cycle, or even in the field Can implement very powerful instruction sets (just more control memory) Generality Can implement multiple instruction sets on same machine. Can tailor instruction set to application. Compatibility Many organizations, same instruction set Costly to implement Slow cs 152 L12.21

Exceptions user program Exception: System Exception Handler return from exception Exception = unprogrammed control transfer cs 152 L12.22 normal control flow: sequential, jumps, branches, calls, returns system takes action to handle the exception - must record the address of the offending instruction returns control to user must save & restore user state Allows constuction of a user virtual machine

What happens to Instruction with Exception? MIPS architecture defines the instruction as having no effect if the instruction causes an exception. When get to virtual memory we will see that certain classes of exceptions must prevent the instruction from changing the machine state. This aspect of handling exceptions becomes complex and potentially limits performance => why it is hard cs 152 L12.23

Two Types of Exceptions Interrupts caused by external events asynchronous to program execution may be handled between instructions simply suspend and resume user program Traps caused by internal events - exceptional conditions (overflow) - errors (parity) - faults (non-resident page) synchronous to program execution condition must be remedied by the handler instruction may be retried or simulated and program continued or program may be aborted cs 152 L12.24

MIPS convention: exception means any unexpected change in control flow, without distinguishing internal or external; use the term interrupt only when the event is externally caused. Type of event From where? MIPS terminology I/O device request External Interrupt Invoke OS from user program Internal Exception Arithmetic overflow Internal Exception Using an undefined instruction Internal Exception Hardware malfunctions Either Exception or Interrupt cs 152 L12.25

Addressing the Exception Handler Traditional Approach: Interupt Vector PC <- MEM[ IV_base + cause 00] 370, 68000, Vax, 80x86,... iv_base RISC Handler Table PC < IT_base + cause 0000 saves state and jumps Sparc, PA, M88K,... MIPS Approach: fixed entry PC < EXC_addr Actually very small table - RESET entry - TLB - other iv_base cause handler code handler entry code cause cs 152 L12.26

Saving State Push it onto the stack Vax, 68k, 80x86 Save it in special registers MIPS EPC, BadVaddr, Status, Cause Shadow Registers M88k Save state in a shadow of the internal pipeline registers cs 152 L12.27

Additions to MIPS ISA to support Exceptions? EPC a 32-bit register used to hold the address of the affected instruction (register 14 of coprocessor 0). Cause a register used to record the cause of the exception. In the MIPS architecture this register is 32 bits, though some bits are currently unused. Assume that bits 5 to 2 of this register encodes the two possible exception sources mentioned above: undefined instruction=0 and arithmetic overflow=1 (register 13 of coprocessor 0). BadVAddr - register contained memory address at which memory reference occurred (register 8 of coprocessor 0) Status - interrupt mask and enable bits (register 12 of coprocessor 0) Control signals to write EPC, Cause, BadVAddr, and Status Be able to write exception address into PC, increase mux to add as input 01000000 00000000 00000000 01000000 two (8000 0080 hex ) May have to undo PC = PC + 4, since want EPC to point to offending instruction (not its successor); PC = PC - 4 cs 152 L12.28

Recap: Details of Status register Status 15 8 5 Mask k 4 e 3 k 2 e 1 k 0 e old prev current Mask = 1 bit for each of 5 hardware and 3 software interrupt levels 1 => enables interrupts 0 => disables interrupts k = kernel/user 0 => was in the kernel when interrupt occurred 1 => was running user mode e = interrupt enable 0 => interrupts were disabled 1 => interrupts were enabled When interrupt occurs, 6 LSB shifted left 2 bits, setting 2 LSB to 0 run in kernel mode with interrupts disabled cs 152 L12.29

Big Picture: user / system modes By providing two modes of execution (user/system) it is possible for the computer to manage itself operating system is a special program that runs in the priviledged mode and has access to all of the resources of the computer presents virtual resources to each user that are more convenient that the physical resurces - files vs. disk sectors - virtual memory vs physical memory protects each user program from others Exceptions allow the system to taken action in response to events that occur while user program is executing O/S begins at the handler cs 152 L12.30

Recap: Details of Cause register Status 15 10 Pending 5 2 Code Pending interrupt 5 hardware levels: bit set if interrupt occurs but not yet serviced handles cases when more than one interrupt occurs at same time, or while records interrupt requests when interrupts disabled Exception Code encodes reasons for interrupt 0 (INT) => external interrupt 4 (ADDRL) => address error exception (load or instr fetch) 5 (ADDRS) => address error exception (store) 6 (IBUS) => bus error on instruction fetch 7 (DBUS) => bus error on data fetch 8 (Syscall) => Syscall exception 9 (BKPT) => Breakpoint exception 10 (RI) => Reserved Instruction exception 12 (OVF) => Arithmetic overflow exception cs 152 L12.31

Precise Interrupts Precise => state of the machine is preserved as if program executed upto the offending instruction Same system code will work on different implementations of the architecture Position clearly established by IBM Difficult in the presence of pipelining, out-ot-order execution,... MIPS takes this position Imprecise => system software has to figure out what is where and put it all back together Performance goals often lead designers to forsake precise interrupts system software developers, user, markets etc. usually wish they had not done this cs 152 L12.32

cs 152 L12.33 How Control Detects Exceptions in our FSD Undefined Instruction detected when no next state is defined from state 1 for the op value. We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12. Shown symbolically using other to indicate that the op field does not match any of the opcodes that label arcs out of state 1. Arithmetic overflow Chapter 4 included logic in the ALU to detect overflow, and a signal called Overflow is provided as an output from the ALU. This signal is used in the modified finite state machine to specify an additional possible next state Note: Challenge in designing control of a real machine is to handle different interactions between instructions and other exception-causing events such that control logic remains small and fast. Complex interactions makes the control unit the most challenging aspect of hardware design

How add Exceptions for Overflow and Unimplmented? IR <= MEM[PC] PC <= PC + 4 0000 instruction fetch ALUout <= PC +SX 0001 decode Execute R-type ALUout <= A fun B 0100 ORi ALUout <= A op ZX 0110 LW ALUout <= A + SX 1000 SW ALUout <= A + SX 1011 BEQ If A = B then PC <= ALUout 0010 Memory R[rd] <= ALUout 0101 R[rt] <= ALUout 0111 M <= MEM[ALUout] 1001 R[rt] <= M 1010 MEM[ALUout] <= B 1100 Write-back cs 152 L12.34

Modification to the Control Specification EPC <= PC - 4 PC <= exp_addr cause <= 12 (Ovf) IR <= MEM[PC] PC <= PC + 4 0000 S<= PC +SX 0001 decode other instruction fetch undefined instruction EPC <= PC - 4 PC <= exp_addr cause <= 10 (RI) R-type S <= A fun B 0100 overflow Execute Memory R[rd] <= S 0101 ORi S <= A op ZX 0110 R[rt] <= S 0111 LW S <= A + SX 1000 M <= MEM[S] 1001 R[rt] <= M 1010 BEQ SW S <= A If - A B= B S <= A + SX then PC <= S 1011 0010 MEM[S] <= B 1100 Write-back cs 152 L12.35

Pipelining is Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder takes 20 minutes cs 152 L12.36

Sequential Laundry T a s k O r d e r A B C D cs 152 L12.37 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20 30 40 20 30 40 20 Sequential laundry takes 6 hours for 4 loads If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP 6 PM 7 8 9 10 11 Midnight Time T a s k O r d e r 30 40 40 40 40 20 A B C D Pipelined laundry takes 3.5 hours for 4 loads cs 152 L12.38

T a s k O r d e r Pipelining Lessons 6 PM 7 8 9 Time 30 40 40 40 40 20 A B C D Pipelining doesn t help latency of single task, it helps throughput of entire workload Pipeline rate limited by slowest pipeline stage Multiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages Unbalanced lengths of pipe stages reduces speedup Time to fill pipeline and time to drain it reduces speedup Stall for Dependences cs 152 L12.39

Pipelined Execution Time IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB Program Flow IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB Utilization? Now we just have to make it work cs 152 L12.40

Clk Single Cycle, Multiple Cycle, vs. Pipeline Cycle 1 Cycle 2 Single Cycle Implementation: Load Store Waste Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk Multiple Cycle Implementation: Load Ifetch Reg Exec Mem Wr Store Ifetch Reg Exec Mem R-type Ifetch Pipeline Implementation: Load Ifetch Reg Exec Mem Wr Store Ifetch Reg Exec Mem Wr R-type Ifetch Reg Exec Mem Wr cs 152 L12.41

Why Pipeline? Suppose we execute 100 instructions Single Cycle Machine 45 ns/cycle x 1 CPI x 100 inst = 4500 ns Multicycle Machine 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns Ideal pipelined machine 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns cs 152 L12.42

Why Pipeline? Because the resources are there! Time (clock cycles) I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU cs 152 L12.43

Can pipelining get us into trouble? Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time - E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) data hazards: attempt to use item before it is ready - E.g., one sock of pair in dryer and one in washer; can t fold until get sock from washer through dryer - instruction depends on result of prior instruction still in the pipeline control hazards: attempt to make a decision before condition is evaulated - E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in - branch instructions Can always resolve hazards by waiting cs 152 L12.44 pipeline control must detect the hazard take action (or delay action) to resolve hazards

Summary 1/3 Specialize state-diagrams easily captured by microsequencer simple increment & branch fields datapath control fields Control design reduces to Microprogramming Exceptions are the hard part of control Need to find convenient place to detect exceptions and to branch to state or microinstruction that saves PC and invokes the operating system As we get pipelined CPUs that support page faults on memory accesses which means that the instruction cannot complete AND you must be able to restart the program at exactly the instruction with the exception, it gets even harder cs 152 L12.45

Summary 2/3 Microprogramming is a fundamental concept implement an instruction set by building a very simple processor and interpreting the instructions essential for very complex instructions and when few register transfers are possible Pipelining is a fundamental concept multiple steps using distinct resources Utilize capabilities of the Datapath by pipelined instruction processing start next instruction while working on the current one limited by length of longest stage (plus fill/flush) detect and resolve hazards cs 152 L12.46

Summary: Microprogramming one inspiration for RISC If simple instruction could execute at very high clock rate If you could even write compilers to produce microinstructions If most programs use simple instructions and addressing modes If microcode is kept in RAM instead of ROM so as to fix bugs If same memory used for control memory could be used instead as cache for macroinstructions Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1) cs 152 L12.47