Computer System. Agenda


Computer System
Hiroaki Kobayashi
7/6/2011, Computer Science

Agenda
Basic model of modern computer systems
  Von Neumann model
  Stored program: instructions and data are stored in memory
Fundamental functions of computer systems
  Data storage (memory)
  Data processing
How to control/use a computer: the user's perspective
  Machine instructions specify operations on the computer
How a computer is controlled: the system's perspective
  Instruction fetch, decode, execution, and result store
Pipelining: a mechanism to increase execution throughput
  Overlapped execution of machine instructions

Basic Model of Modern Computers: The Von Neumann Computer Model
The basic model of modern computers was designed and proposed by John von Neumann in 1945 and implemented in EDVAC, delivered in 1949.
Characteristics of von Neumann-type computers:
  Stored program
    Programs and data are both stored in memory.
    The computer reads an instruction from memory, decodes it, reads data from memory, processes (calculates) it, and stores the result to memory.
  Linear memory space
    Memory cells (the units of data storage) are arranged as a 1-D array.
    Each cell has its own address, which specifies the cell for data read and write.
  Simple instructions to control the computer, such as
    addition, subtraction, shift, logical AND/OR, and data moves for reading from and writing to memory
  Execution sequence control
    The computer is sequentially controlled and processes instructions one by one.
    The program (machine instructions) stored in memory is processed sequentially.
    A special counter named the program counter holds the address of the instruction currently being executed.
    To change the sequence of execution, change the value of the program counter.

Basic Structure of a Von Neumann Computer
[Figure: a computer system consisting of a processor (control unit and arithmetic unit), memory, an input unit, and an output unit]

Structure of Memory
Logical structure (visible to programmers)
  A 1-D array of memory cells
  Each cell stores a unit of data and has its own address, which specifies it for data read/write.
Physical structure (actual implementation)
  Memory devices are arranged in a 2-D array and accessed through a combination of a row address and a column address.
[Figure: logical structure (address, data, and control lines to a linear array of cells) vs. physical structure (a 2-D array of 1-bit memory cells selected by a row address and a column address)]

Physical Memory Structure
SRAM: Static Random Access Memory
[Figure: SRAM structure: a row-address decoder and a column-address decoder select 1-bit memory cells in 1-bit memory planes; eight planes provide 8-bit data input/output; the control signals are CE (Chip Enable) and WE (Write Enable, WE=1 for write)]
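The mapping from the programmer's linear address to the physical row/column address can be sketched as below. This is a minimal illustration, assuming the upper address bits select the row and the lower bits select the column; the 32 x 32 organization is hypothetical, not taken from the slides.

```python
def split_address(address: int, column_bits: int) -> tuple[int, int]:
    """Split a linear (logical) address into (row, column) for a 2-D cell array."""
    row = address >> column_bits                  # upper bits select the row
    column = address & ((1 << column_bits) - 1)   # lower bits select the column
    return row, column


def join_address(row: int, column: int, column_bits: int) -> int:
    """Inverse mapping: rebuild the linear address from row and column."""
    return (row << column_bits) | column


# Hypothetical 1K-cell memory organized as 32 rows x 32 columns (5 column bits).
COLUMN_BITS = 5
for addr in (0, 31, 32, 1023):
    print(addr, split_address(addr, COLUMN_BITS))
```

Consecutive logical addresses walk along one row and then wrap to the next row, which is why the programmer never needs to see the 2-D layout.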

Another Implementation of Memory
DRAM: Dynamic Random Access Memory
  Represents 1 bit by whether a capacitor is charged or not.
  Needs only one transistor and one capacitor per bit.
    Less hardware per bit than SRAM, so more memory capacity fits in the same area.
  The stored charge leaks, so the memory contents must be refreshed periodically; hence the name dynamic memory.
    Read/write access time is longer than for SRAM.
[Figure: a 1-bit DRAM cell: a transistor, driven by the output of the address decoder, connects the capacitor to the data line for read/write]

SRAM vs. DRAM
SRAM: Static Random Access Memory
  Uses a flip-flop (latch) built from several transistors per bit
  Stable (static) storage
  Fast memory access; access time: 0.5-5 ns
  High hardware cost per bit
  Used for speed-oriented memory: on-chip (processor) memory such as registers and caches
DRAM: Dynamic Random Access Memory
  Uses one transistor and one capacitor per bit
  More memory capacity at lower cost
  Unstable: contents must be refreshed (dynamic operation)
  The refresh mechanism lengthens memory access time; access time: 50-70 ns
  Used for capacity-oriented memory: off-chip (processor) memory such as main memory (SIMM/DIMM)

Memory Operations
Data read from memory
  Specify the address to read and the processor register that receives the data (a register is an internal memory in the processor).
  Move the data from the specified memory location to the register.
Data write to memory
  Prepare the data (e.g., a calculation result) in a register.
  Specify the address to write.
  Move the data from the register to the specified memory location.
[Figure: the processor, built from combinational circuits and sequential circuits (registers made of D flip-flops), connected to memory built from D flip-flops (SRAM) or capacitors and transistors (DRAM)]

Basic Operation of a Computer: Data Processing
Basic behavior for data processing
  Move data from memory to the processor.
  Perform an operation on the data.
  Store the operation result to memory.
Example: perform the addition of two values, y = a + b
  Data y, a, and b are placed in memory.
  The addition is performed in the processor by an adder (a combinational circuit).
  y = a + b is realized as a sequence of basic instructions; the following four instructions are executed sequentially:
    Instructions 1 and 2 move data from memory locations a and b to internal registers.
    Instruction 3 adds the data stored in the registers and stores the result in another register.
    Instruction 4 moves the data from that register to the memory location for y.
Memory layout (the program counter holds the address of the current instruction):
  Instruction area: Instruction 1, Instruction 2, Instruction 3, Instruction 4
  Data area: a (data of a), b (data of b), y (data of y)

Sequential Processing: Basic Execution Control for a Computer
1. The computer processes instructions sequentially, in order of their memory locations.
   The memory location of the instruction being processed is held by the program counter (PC).
   Sequential processing is carried out by incrementing the content of the PC.
   The basic operations specified by instructions are data movement to/from memory and arithmetic operations.
2. In addition, special instructions (JUMP and BRANCH) are provided to change the order of instruction execution, for constructs such as if-then-else and loops (iteration).
   Action: set the PC to the destination address specified by the JUMP or BRANCH instruction.
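The four-instruction sequence for y = a + b, driven by a program counter that is incremented after each fetch, can be sketched as follows. The mnemonics (LOAD, ADD, STORE), the register names, and the values 7 and 5 are hypothetical; named keys stand in for the memory addresses of a, b, and y.

```python
# Hypothetical memory: named locations stand in for the addresses of a, b and y.
memory = {"a": 7, "b": 5, "y": 0}
registers = {"R1": 0, "R2": 0, "R3": 0}

program = [
    ("LOAD", "R1", "a"),          # Instruction 1: R1 <- (a)
    ("LOAD", "R2", "b"),          # Instruction 2: R2 <- (b)
    ("ADD", "R3", "R1", "R2"),    # Instruction 3: R3 <- R1 + R2
    ("STORE", "R3", "y"),         # Instruction 4: (y) <- R3
]

pc = 0  # program counter: index of the next instruction
while pc < len(program):
    op, *args = program[pc]       # fetch and decode
    pc += 1                       # sequential processing: increment the PC
    if op == "LOAD":
        reg, addr = args
        registers[reg] = memory[addr]
    elif op == "STORE":
        reg, addr = args
        memory[addr] = registers[reg]
    elif op == "ADD":
        dst, src1, src2 = args
        registers[dst] = registers[src1] + registers[src2]

print(memory["y"])  # 12
```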

Execution Control of a Computer
Flow of instruction execution (fetch/decode/execution of instructions)
[Figure: program counter, instruction decoder, memory, and registers (accumulator)]

Control and Data Flow in a Computer
[Figure: the control unit's program counter specifies the address of the instruction to be executed, which is fetched from the instruction area of main memory. A machine instruction consists of an opcode, which specifies the operation, and operands, which specify the data. The data processing unit (processor) contains the arithmetic logic unit and registers; operands are fetched from, and results stored to, the data area of main memory. An input/output unit is also connected.]

Three Basic Execution Flows Available on Computers
  Sequential: process 1, process 2, and process 3 in order
  Conditional: if condition then process 1 else process 2
  Loop: while condition do process 1; then process 2
[Figure: flowcharts of the three execution flows]

Execution Control of a Computer: Sequential Execution of Instructions
Program description: sequential execution of proc1, proc2, proc3
Memory allocation for instructions:
  Inst. 1 for proc1
  Inst. 2 for proc2
  Inst. 3 for proc3
Behavior of the computer:
  Step 1: fetch/decode/execute instruction 1
  Step 2: fetch/decode/execute instruction 2
  Step 3: fetch/decode/execute instruction 3
  Then fetch/decode/execute the following instructions

Execution Control of a Computer: Execution of Conditional Branches
Program description: if condition then proc1 else proc2
Memory allocation for instructions:
  000  Instruction to evaluate the condition
  001  Instruction to move execution to address 004 if the condition is not satisfied
  002  Instruction for proc1
  003  Instruction to move execution to address 005
  004  Instruction for proc2
  005  Following instructions
Behavior of the computer:
  Step 1: fetch/decode/execute the instruction at 000 that evaluates the condition.
  Step 2: fetch/decode/execute the branch at 001: if the condition is not satisfied, continue at 004; otherwise continue with the next instruction at 002.
  Step 3 (condition satisfied): fetch/decode/execute the instruction for proc1 at 002.
  Step 4 (condition satisfied): fetch/decode/execute the jump at 003, which moves execution to 005.
  Step 5 (condition not satisfied, reached directly from step 2): fetch/decode/execute the instruction for proc2 at 004.
  Then fetch/decode/execute the following instructions at address 005 and later.

Execution Control of a Computer: Execution of Loops (Iterations)
Program description: while condition do proc1; then proc2
Memory allocation for instructions:
  000  Instruction to evaluate the condition
  001  Instruction to move execution to address 004 if the condition is not satisfied
  002  Instruction for proc1
  003  Instruction to move execution back to address 000
  004  Instruction for proc2
  005  Following instructions
Behavior of the computer:
  Step 1: fetch/decode/execute the instruction at 000 that evaluates the condition.
  Step 2: fetch/decode/execute the branch at 001: if the condition is not satisfied, continue at 004; otherwise continue with the next instruction at 002.
  Step 3: fetch/decode/execute the instruction for proc1 at 002.
  Step 4: fetch/decode/execute the jump at 003, which moves execution back to 000.
  Repeat steps 1 to 4 until, after some number of iterations, the condition is no longer satisfied.
  Step 5: fetch/decode/execute the instruction for proc2 at 004.
  Then fetch/decode/execute the following instructions at address 005 and later.
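Both memory layouts (addresses 000 to 005) can be traced with a small simulator that only follows the PC. This is a sketch under stated assumptions: the instruction kinds ("eval", "branch_if_not", "jump") are hypothetical names for the instructions in the tables, and proc1/proc2 merely count how often they run.

```python
# Instruction layouts at addresses 000-005, as in the slides (005 = following code).
IF_THEN_ELSE = {
    0: ("eval",),             # 000: evaluate the condition
    1: ("branch_if_not", 4),  # 001: go to 004 if the condition is not satisfied
    2: ("proc1",),            # 002: then-part
    3: ("jump", 5),           # 003: skip over the else-part
    4: ("proc2",),            # 004: else-part
}
WHILE_LOOP = {
    0: ("eval",),             # 000: evaluate the condition
    1: ("branch_if_not", 4),  # 001: leave the loop if the condition is not satisfied
    2: ("proc1",),            # 002: loop body
    3: ("jump", 0),           # 003: back to the condition test
    4: ("proc2",),            # 004: code after the loop
}


def run(layout, condition, state):
    """Follow the PC until it reaches address 5; return the addresses executed."""
    pc, visited, satisfied = 0, [], False
    while pc != 5:
        visited.append(pc)
        kind, *target = layout[pc]
        if kind == "eval":
            satisfied = condition(state)
            pc += 1
        elif kind == "branch_if_not":
            pc = target[0] if not satisfied else pc + 1
        elif kind == "jump":
            pc = target[0]
        else:  # proc1/proc2: count the execution, then PC <- PC + 1
            state[kind] = state.get(kind, 0) + 1
            pc += 1
    return visited


print(run(IF_THEN_ELSE, lambda s: True, {}))   # [0, 1, 2, 3]
print(run(IF_THEN_ELSE, lambda s: False, {}))  # [0, 1, 4]
loop_state = {}
print(run(WHILE_LOOP, lambda s: s.get("proc1", 0) < 2, loop_state))
# [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 4]
```

The only difference between the two layouts is the target of the unconditional jump at 003: forward to 005 for if-then-else, backward to 000 for the loop.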

Summary of Machine Instructions: Commands to a Computer
Machine instructions are the lowest-level programming language; they command data movement, arithmetic operations, and execution control.
  Load instructions: move data from memory to internal registers
    Example: Load Register1 MemoryAddress : (Reg1) ← (MemAdrs)
  Store instructions: move data from registers to memory
    Example: Store Register1 MemoryAddress : (MemAdrs) ← (Reg1)
  Operations on data stored in registers/memory
    Example: Add Reg1 Reg2 Reg3 : Reg1 = Reg2 + Reg3
  Changes to the instruction execution flow
    Unconditional jump instruction: Jump Address
    Conditional branch instruction: BranchOnZero MemAdrs
    Subroutine (procedure) call instruction: Call MemAdrs
  Evaluation of conditions for conditional branches
    For the conditions a = b and a > b (or a < b), calculate a - b and test whether the result is zero/nonzero and positive (or negative), respectively.
Instruction sets are machine dependent, e.g., the x86 instruction set for Intel processors.

Classification of Instructions Based on Number of Operands: Arithmetic Operations
  One-operand instructions
    Specify one operand; the others are defined implicitly.
    The implicitly defined internal register is called the accumulator.
    The operand is a register or memory.
  Two-operand instructions
    Specify two operands, one of which is also used as the destination (output).
    Operands are registers or memory.
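The rule for evaluating branch conditions (compute a - b once, then test zero and sign) can be sketched as follows. Python integers are unbounded, so overflow cannot occur here; a fixed-width processor must also account for overflow when it tests the sign of a - b.

```python
def compare_by_subtraction(a: int, b: int) -> dict[str, bool]:
    """Evaluate branch conditions by computing a - b once and inspecting
    whether the result is zero, positive, or negative."""
    diff = a - b
    return {
        "a == b": diff == 0,   # result is 0
        "a > b": diff > 0,     # result is positive
        "a < b": diff < 0,     # result is negative
    }


print(compare_by_subtraction(3, 5))
# {'a == b': False, 'a > b': False, 'a < b': True}
```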

Classification of Instructions Based on Number of Operands: Arithmetic Operations (cont'd)
  Three-operand instructions
    Three operands are specified: two inputs and one output.
    Operands are registers or memory.
[Figure: instruction formats]

Classification of Instructions Based on Number of Operands: Data Movement between Registers and Memory
  LD (LoaD) instruction: moves data from memory to an internal register
    Implicitly specified register: LD address (1 operand): (acc) ← (address); the accumulator is always the destination.
    Explicitly specified register: LD Reg, address (2 operands): (Reg) ← (address); any register can be the destination.
  ST (STore) instruction: moves data from a register to memory
    Implicitly specified register: ST address (1 operand): (address) ← (acc); the accumulator is always the source.
    Explicitly specified register: ST Reg, address (2 operands): (address) ← (Reg); any register can be the source.

Classification of Instructions Based on Number of Operands: Execution Control Instructions
  Unconditional jump
    JP (JumP) instruction: the next instruction executed is the one at the address specified by the operand.
    JP address : 1 operand
  Conditional branch
    Conditionally execute the instruction at the address specified by the operand, based on the content of a specified register.
    JPZ (JumP Zero) instruction: if the content of the implicitly or explicitly specified register is 0, the next instruction executed is the one at the specified address.
      JPZ address : 1 operand; if the implicitly specified register (the accumulator) is 0, go to the specified address.
      JPZ Reg, address : 2 operands; if the explicitly specified register is 0, go to the specified address.
    Similarly, JPNZ (JumP NonZero) is defined: if the specified register is not zero, the next instruction executed is the one at the specified address (1- and 2-operand forms).

Example of 16-Bit Machine Instructions
A model computer that
  handles 16-bit words,
  has one internal register, the accumulator, and
  executes 16-bit instructions, each consisting of one opcode (4 bits) and one operand (12 bits).
One-operand instruction format
  The 4-bit opcode can specify 16 kinds of operations.
  The 12-bit operand specifies a memory address between 000 and FFF (12 bits address a 4K-word memory space).
  The accumulator is used implicitly as an operand and is not specified in the instruction: (Acc) ← (Acc) op (memory operand).
  Load moves memory to (Acc); store moves (Acc) to memory.
Instructions and data are stored in the same memory, and operations work on 16-bit words.
[Figure: instruction format | opcode (4 bits) | operand, a memory address (12 bits) |; memory space 000-FFF; the ALU (arithmetic logic unit) takes operand 1 from the accumulator and operand 2 from memory]
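Packing and unpacking the 4-bit opcode and 12-bit address of the model computer's 16-bit instruction word can be sketched with shifts and masks. The opcode value 0x3 below is arbitrary; the slide's actual code assignments are not used here.

```python
OPCODE_BITS, OPERAND_BITS = 4, 12


def encode(opcode: int, address: int) -> int:
    """Pack a 4-bit opcode and a 12-bit memory address into one 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits (0-15)")
    if not 0 <= address < (1 << OPERAND_BITS):
        raise ValueError("address must fit in 12 bits (0x000-0xFFF)")
    return (opcode << OPERAND_BITS) | address


def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word back into (opcode, address)."""
    return word >> OPERAND_BITS, word & ((1 << OPERAND_BITS) - 1)


word = encode(0x3, 0x0A5)
print(f"{word:016b}")   # 0011000010100101
print(decode(word))     # (3, 165)
```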

16 Kinds of Instructions Defined
[Table: symbol, 4-bit code, and meaning of each of the 16 instructions (*** denotes a 12-bit memory address). Recoverable entries include moving data from memory to the accumulator, moving data from the accumulator to memory, an unconditional jump, shift left, shift right, and stop program execution.]
Notes: [A] denotes the contents of the accumulator; PC is the program counter, which specifies the address of the next instruction to be executed.

Program Example: Sum of Integers 1 to 10
[Figure: program listing with columns for address, instruction code (binary representation), and machine code in symbolic representation (assembly code)]
Special instructions (pseudo-instructions) to specify the data space:
  DC: Define Constant
  DS: Define Storage
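Because the slide's opcode table and program listing did not survive, the following is only a hypothetical reconstruction of a sum-of-1-to-10 program for an accumulator machine of this kind. The mnemonics (LD, ST, ADD, SUB, JP, JPZ, HALT), the addresses, and the count-down loop are assumptions for illustration, not the author's program; 16-bit word wraparound is ignored.

```python
# Hypothetical accumulator machine; instructions and data share one memory.
memory = {
    # --- instruction area (hypothetical mnemonics) ---
    0x000: ("LD", 0x101),    # acc <- (I)
    0x001: ("ADD", 0x100),   # acc <- acc + (SUM)
    0x002: ("ST", 0x100),    # (SUM) <- acc
    0x003: ("LD", 0x101),    # acc <- (I)
    0x004: ("SUB", 0x102),   # acc <- acc - (ONE)
    0x005: ("ST", 0x101),    # (I) <- acc
    0x006: ("JPZ", 0x008),   # if acc == 0, leave the loop
    0x007: ("JP", 0x000),    # otherwise repeat
    0x008: ("HALT", 0),
    # --- data area ---
    0x100: 0,                # SUM  DS  (reserved storage, assumed to start at 0)
    0x101: 10,               # I    DC 10 (loop counter, counts down)
    0x102: 1,                # ONE  DC 1  (constant)
}


def run(memory):
    acc, pc = 0, 0x000
    while True:
        op, addr = memory[pc]   # fetch and decode
        pc += 1                 # default: next sequential instruction
        if op == "LD":
            acc = memory[addr]
        elif op == "ST":
            memory[addr] = acc
        elif op == "ADD":
            acc += memory[addr]
        elif op == "SUB":
            acc -= memory[addr]
        elif op == "JP":
            pc = addr
        elif op == "JPZ":
            if acc == 0:
                pc = addr
        elif op == "HALT":
            return
        else:
            raise ValueError(f"unknown opcode {op!r}")


run(memory)
print(memory[0x100])  # 55
```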

Summary: Control and Data Flow in a Computer
[Figure: the control and data flow diagram repeated as a summary: the program counter addresses the instruction area of main memory for instruction fetch; the opcode specifies the operation and the operands specify the data; the processor's ALU and registers perform operand fetch and result store to the data area; an input/output unit is attached]

General-Purpose Register Architecture: The Basis for Modern Processor Design
  Operands are specified explicitly, and only in general-purpose registers (GPRs).
  Memory can be accessed only with load/store instructions.
  Virtually every new architecture designed after 1980 uses a load-store register architecture, because:
    Registers are faster than memory.
    Registers are more efficient for a compiler to use.
    Registers can hold variables, so memory traffic is reduced, programs speed up, and code density improves (a register can be named with fewer bits than a memory address).
  Examples: MIPS (1981), SPARC (1985), PowerPC (1991)
[Figure: processor with GPRs connected to memory]

Format of Machine Instructions for the General-Purpose Register Architecture
An instruction has a fixed bit width (for example, 32 bits) and consists of an operation code and its operands.
  The opcode (OPEration CODE) specifies the operation to be performed:
    arithmetic and logical operations (e.g., AND, OR)
    data movement (load/store instructions)
    execution control (unconditional jumps, conditional branches)
  An operand is one of the inputs (arguments) of the operation.
Example of a 32-bit instruction format with one opcode and three operands; the bits of the instruction are divided into the opcode field and the operand fields:
  | opcode (8 bits) | operand 1 (8 bits) | operand 2 (8 bits) | operand 3 (8 bits) |

Example of a Modern Processor Design Based on the General-Purpose Register Architecture
[Figure: five-stage data path (instruction fetch; instruction decode/register fetch; execute/address calculation; memory access; write back) with the instruction memory, register file (RS1, RS2, RD), sign extension of the immediate, ALU with a zero test, data memory with load memory data (LMD), and multiplexers selecting the next PC and the write-back data. Register transfers shown: IR <= mem[PC]; PC <= PC + 4; Reg[IR.rd] <= Reg[IR.rs] op(IR.op) Reg[IR.rt]]

Features of the General-Purpose Register Processor
Key properties of the GPR processor instruction set
  All operations on data apply to data in registers and typically change the entire register.
  The only operations that affect memory are load and store operations, which move data from memory to a register or from a register to memory, respectively.
  The instruction formats are few in number, and all instructions are typically one size.
Five steps to execute an instruction
  I. Instruction fetch cycle (IF)
  II. Instruction decode/register fetch cycle (ID)
  III. Execution/effective address cycle (EX)
  IV. Memory access (MEM)
  V. Write-back cycle (WB)

Example: MIPS Processor Architecture
MIPS emphasizes a simple load-store instruction set, designed for pipelining efficiency (including a fixed instruction set encoding) and for efficiency as a compiler target.

Pipelining: A Mechanism to Increase Throughput in General-Purpose Register Architecture Processors by Overlapped Execution of Instructions
Multiple instructions are overlapped in execution to increase execution throughput.
  Instruction execution is divided into two or more steps, called pipe stages or pipe segments, and different stages complete different parts of different instructions in parallel.
  The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
  Pipelining exploits the parallelism that exists among the actions needed to execute an instruction in a sequential instruction stream.
  Unlike some speedup techniques, it is not visible to the programmer/compiler.

Example of Pipelined Instruction Execution
  Clock number:     1    2    3    4    5    6    7    8    9
  Instruction i     IF   ID   EX   MEM  WB
  Instruction i+1        IF   ID   EX   MEM  WB
  Instruction i+2             IF   ID   EX   MEM  WB
  Instruction i+3                  IF   ID   EX   MEM  WB
  Instruction i+4                       IF   ID   EX   MEM  WB
The latency of each instruction is still 5 cycles, but one instruction completes every cycle.
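The ideal pipeline timing chart can be generated programmatically. This sketch assumes no hazards, so instruction k enters IF in cycle k + 1; with 5 stages and n instructions the chart spans 5 + n - 1 cycles instead of 5n for unpipelined execution.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions: int) -> list[list[str]]:
    """Ideal 5-stage pipeline with no hazards: one instruction enters IF per cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    chart = []
    for k in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[k + s] = stage          # instruction k is in stage s during cycle k+s+1
        chart.append(row)
    return chart


chart = pipeline_chart(5)
print("cycle".ljust(8) + "".join(f"{c + 1:<5}" for c in range(len(chart[0]))))
for k, row in enumerate(chart):
    print(f"i+{k}".ljust(8) + "".join(f"{cell:<5}" for cell in row))
```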

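A Data Path Drawn in a Pipeline Fashion
[Figure: the five-stage data path drawn as overlapped pipeline stages over successive clock cycles]

A Pipeline with Pipeline Registers
[Figure: the pipelined data path with pipeline registers between stages]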

Basic Performance Issues in Pipelining
  Throughput: how often an instruction exits the pipeline
  Processor cycle (pipeline cycle): the time required to move an instruction one step down the pipeline
    Because all stages proceed at the same time, the length of a processor cycle is determined by the time required for the slowest pipe stage; the longest step determines the time between advances of the pipeline.
  Ideal time per instruction on the pipelined processor:
$$\text{Time per instruction (pipelined, ideal)} = \frac{\text{Time per instruction on unpipelined machine}}{\text{Number of pipe stages}}$$
  Ideally, an n-stage pipeline is n times faster, but usually the stages will not be perfectly balanced.
  In addition, pipelining adds overhead: latch delay and clock skew.

Example
Assume that
  an unpipelined processor has a 1 ns clock cycle and uses 4 cycles for ALU operations and branches and 5 cycles for memory operations,
  the relative frequencies of these operations are 40% (ALU), 20% (branches), and 40% (memory), and
  due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock.
Question: ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
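One way to work the example, assuming the pipelined processor completes one instruction per clock and its clock equals the 1 ns stage time plus the 0.2 ns overhead: the unpipelined average is 0.4 x 4 + 0.2 x 4 + 0.4 x 5 = 4.4 ns per instruction, the pipelined time is 1.2 ns, and the speedup is about 3.7.

```python
clock_ns = 1.0           # unpipelined clock cycle
overhead_ns = 0.2        # pipelining overhead (clock skew and setup)
# operation class: (relative frequency, clock cycles when unpipelined)
mix = {"alu": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}

avg_unpipelined_ns = clock_ns * sum(freq * cycles for freq, cycles in mix.values())
avg_pipelined_ns = clock_ns + overhead_ns   # one instruction completes per (longer) cycle
speedup = avg_unpipelined_ns / avg_pipelined_ns

print(f"{avg_unpipelined_ns:.1f} ns vs {avg_pipelined_ns:.1f} ns -> speedup {speedup:.2f}")
# 4.4 ns vs 1.2 ns -> speedup 3.67
```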

Major Hurdles of Pipelining: Pipeline Hazards
  Structural hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution.
  Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
  Control hazards: arise from the pipelining of branches and other instructions that change the PC.
Hazards can make it necessary to stall the pipeline.

Performance of Pipelines with Stalls
$$\text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI}_{\text{unpipelined}} \times \text{Clock cycle}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}} \times \text{Clock cycle}_{\text{pipelined}}} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$
Because the ideal CPI on a pipelined processor is almost always 1,
$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} = 1 + \text{Pipeline stall clock cycles per instruction}$$

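Performance of Pipelines with Stalls (cont'd)
Substituting the pipelined CPI,
$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$
If there is no pipeline overhead, so that the clock cycles of the two processors are equal,
$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{1 + \text{Pipeline stall cycles per instruction}}$$
In the simple case, the unpipelined CPI is equal to the depth of the pipeline:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$

Performance of Pipelines with Stalls (cont'd)
If pipelining improves the clock cycle time instead, the unpipelined processor can be viewed as having a CPI of 1 and a longer clock cycle:
$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$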
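The simple-case formula, speedup = pipeline depth / (1 + stall cycles per instruction), can be tabulated to show how quickly stalls erode the ideal speedup. This sketch assumes balanced stages, no clock overhead, and an unpipelined CPI equal to the pipeline depth.

```python
def pipeline_speedup(depth: int, stall_cycles_per_instr: float) -> float:
    """Speedup = depth / (1 + stalls), under the simple-case assumptions."""
    return depth / (1.0 + stall_cycles_per_instr)


for stalls in (0.0, 0.25, 0.5, 1.0):
    print(stalls, round(pipeline_speedup(5, stalls), 2))
# 0.0 5.0
# 0.25 4.0
# 0.5 3.33
# 1.0 2.5
```

With a 5-stage pipeline, an average of one stall cycle per instruction halves the ideal speedup.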

Performance of Pipelines with Stalls (cont'd)
In cases where the pipe stages are perfectly balanced and there is no overhead,
$$\text{Clock cycle}_{\text{pipelined}} = \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Pipeline depth}}, \qquad \text{Pipeline depth} = \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$
Finally,
$$\text{Speedup} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$

Structural Hazards
Structural hazards occur when some combination of instructions cannot be accommodated because of resource conflicts: some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.
Examples:
  read access in ID and write access in WB to the register file
  a single-memory pipeline for both data and instructions

A Processor with Only One Memory Port
[Figure: pipeline in which a load's data memory access (MEM) and a later instruction's fetch (IF) need the same memory port in the same cycle]

A Pipeline Stalled for a Structural Hazard
  Clock number:     1     2     3     4     5     6     7     8     9     10
  Instruction i     IF    ID    EX    MEM   WB
  Instruction i+1         IF    ID    EX    MEM   WB
  Instruction i+2               IF    ID    EX    MEM   WB
  Instruction i+3                     stall IF    ID    EX    MEM   WB
  Instruction i+4                                 IF    ID    EX    MEM   WB
Instruction i is a load or store: its data memory access in cycle 4 conflicts with the fetch of instruction i+3, so that fetch is delayed by one cycle.
Solution: separate (cache) memory systems for instructions and data

Example: How Much Might the Load Structural Hazard Cost?
Suppose that
  data references constitute 40% of the instruction mix,
  the ideal CPI of the pipelined processor is 1, and
  the processor with the structural hazard has a clock rate 1.05 times higher than the processor without the hazard.
Question: disregarding any other performance losses, is the pipeline with or without the structural hazard faster, and by how much?

Consideration about Structural Hazards
  A processor without structural hazards will always have a lower CPI.
  Why would a designer allow structural hazards? Because of the tradeoff between cost and performance: pipelining all the functional units, or duplicating them, may be too costly.
  For example, processors that support both an instruction cache access and a data cache access every cycle require twice as much total memory bandwidth, and often have higher bandwidth at the pins.
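One way to answer the load structural hazard example, assuming each data reference stalls the single-ported pipeline for exactly one cycle: measured in clock periods of the hazard-free processor, the hazard processor takes (1 + 0.4 x 1) / 1.05, about 1.33, per instruction versus 1.0, so the pipeline without the structural hazard is about 1.3 times faster.

```python
data_ref_fraction = 0.40      # fraction of instructions that access data memory
ideal_cpi = 1.0
stall_per_data_ref = 1        # assumption: one-cycle stall per data reference
clock_rate_ratio = 1.05       # hazard processor clocks 1.05x faster

# Average instruction time in units of the hazard-free processor's clock period.
time_without_hazard = ideal_cpi
time_with_hazard = (ideal_cpi + data_ref_fraction * stall_per_data_ref) / clock_rate_ratio

print(f"{time_with_hazard:.2f} vs {time_without_hazard:.2f}; "
      f"hazard-free pipeline is {time_with_hazard / time_without_hazard:.2f}x faster")
# 1.33 vs 1.00; hazard-free pipeline is 1.33x faster
```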

Data Hazards
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on an unpipelined processor.
Example (format: Operation Destination, Source1, Source2):
  DADD R1, R2, R3
  DSUB R4, R1, R5
  AND  R6, R1, R7
  OR   R8, R1, R9
  XOR  R10, R1, R11
Every instruction after the DADD uses its result, R1.

Data Hazard Example
[Figure: pipeline timing of the sequence above; the DSUB and AND would read R1 before the DADD writes it (data hazard), while the later instructions read it in time (no hazard)]

Minimizing Data Hazard Stalls by Forwarding
If the result can be moved from the pipeline register where the DADD stores it to where the DSUB needs it, the need for a stall can be avoided.
Data forwarding (bypassing) mechanism:
  The ALU result from both the EX/MEM and MEM/WB pipeline registers is always fed back to the ALU inputs.
  If the forwarding hardware detects that a previous ALU operation has written the register corresponding to a source of the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.

Data Forwarding
[Figure: data forwarding paths from the pipeline registers back to the ALU inputs]
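The forwarding selection described above can be sketched as a function that picks the ALU input for one source register. `PipeReg` is a hypothetical, simplified view of a pipeline register (destination register plus ALU result), and the check that excludes R0 reflects MIPS's hardwired zero register; real forwarding logic compares more fields than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeReg:
    """Hypothetical view of a pipeline register: destination register and ALU result."""
    dest: Optional[int]     # None if the instruction writes no register
    alu_output: int = 0


def select_operand(src: int, regfile_value: int, ex_mem: PipeReg, mem_wb: PipeReg) -> int:
    """Pick the ALU input for source register src, preferring the newest result."""
    if src != 0 and ex_mem.dest == src:   # result of the immediately preceding instruction
        return ex_mem.alu_output
    if src != 0 and mem_wb.dest == src:   # result of the instruction two ahead
        return mem_wb.alu_output
    return regfile_value                  # no match: use the value read from the register file


# DSUB R4, R1, R5 in EX while DADD R1, R2, R3 sits in EX/MEM with result 42:
print(select_operand(1, regfile_value=0, ex_mem=PipeReg(1, 42), mem_wb=PipeReg(None)))  # 42
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register: the younger result in EX/MEM is the one sequential execution would see.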

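Implementation of Data Forwarding
[Figure: forwarding hardware: multiplexers at the ALU inputs select among the register file output and the results held in the EX/MEM and MEM/WB pipeline registers]

Data Hazards Requiring Stalls
Example:
  LD   R1, 0(R2)
  DSUB R4, R1, R5
  AND  R6, R1, R7
  OR   R8, R1, R9
The loaded value is available only at the end of the LD's MEM stage, which is too late to forward it to the beginning of the DSUB's EX stage in the same cycle.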

Pipeline Interlocking to Preserve Correct Execution
Without interlock:
  LD R1, 0(R2)      IF    ID    EX    MEM   WB
  DSUB R4, R1, R5         IF    ID    EX    MEM   WB
  AND R6, R1, R7                IF    ID    EX    MEM   WB
  OR R8, R1, R9                       IF    ID    EX    MEM   WB
With pipeline interlock:
  LD R1, 0(R2)      IF    ID    EX    MEM   WB
  DSUB R4, R1, R5         IF    ID    stall EX    MEM   WB
  AND R6, R1, R7                IF    stall ID    EX    MEM   WB
  OR R8, R1, R9                       stall IF    ID    EX    MEM   WB
The CPI increases by the length of the stall.

Control (Branch) Hazards
A branch may or may not change the PC to something other than the next sequential address (PC + 4).
  Branch instruction    IF    ID    EX    MEM   WB
  Branch successor            IF    IF    ID    EX    MEM   WB
  Branch successor + 1                    IF    ID    EX    MEM   WB
  Branch successor + 2                          IF    ID    EX    MEM   WB
Fetch is restarted once the branch target is known; the repeated IF of the branch successor is effectively a one-cycle stall.

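Reducing Pipeline Branch Penalties: Delayed Branch
Execution order with a delayed branch:
  branch instruction
  sequential successor (branch delay slot)
  branch target if taken
  Taken branch instruction  IF    ID    EX    MEM   WB
  Branch delay instruction        IF    ID    EX    MEM   WB
  Branch target                         IF    ID    EX    MEM   WB
  Branch target + 1                           IF    ID    EX    MEM   WB
The instruction in the branch delay slot is executed whether or not the branch is taken.

Scheduling the Branch Delay Slot
[Figure: ways to fill the branch delay slot: from before the branch, from the branch target, or from the fall-through path]
With canceling (nullifying) branches, the instruction in the delay slot is nullified when the branch is incorrectly predicted.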

How Is Pipelining Implemented?
Simple (Non-Pipelined) Implementation of MIPS in 5 Cycles
1. Instruction fetch (IF)
   IR ← Mem[PC]
   NPC ← PC + 4
2. Instruction decode/register fetch (ID)
   A ← Regs[rs]
   B ← Regs[rt]
   Imm ← sign-extended immediate field of IR
3. Execution/effective address calculation (EX)
   Memory reference: ALUOutput ← A + Imm
   Register-register ALU instruction: ALUOutput ← A func B
   Register-immediate ALU instruction: ALUOutput ← A op Imm
   Branch: ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0)

Simple (Non-Pipelined) Implementation (cont'd)
4. Memory access/branch completion (MEM)
   All instructions: PC ← NPC
   Memory reference: LMD ← Mem[ALUOutput] or Mem[ALUOutput] ← B
   Branch: if (Cond) PC ← ALUOutput
5. Write-back (WB)
   Register-register ALU instruction: Regs[rd] ← ALUOutput
   Register-immediate ALU instruction: Regs[rt] ← ALUOutput
   Load instruction: Regs[rt] ← LMD
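The five register-transfer steps can be sketched as an interpreter that runs one instruction at a time. Instructions are represented as hypothetical dictionaries ("kind", "rs", "rt", "rd", "imm", "func") rather than real MIPS encodings, the mnemonics in the comments are only illustrative, and the branch modeled is "branch if A == 0", matching Cond ← (A == 0).

```python
import operator


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_one(pc, imem, regs, dmem):
    """Run one instruction through the five steps; return the new PC."""
    # 1. IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = imem[pc]
    npc = pc + 4
    # 2. ID: A <- Regs[rs]; B <- Regs[rt]; Imm <- sign-extended immediate
    a = regs[ir.get("rs", 0)]
    b = regs[ir.get("rt", 0)]
    imm = sign_extend16(ir.get("imm", 0))
    kind = ir["kind"]
    # 3. EX
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm                   # effective address
    elif kind == "alu_rr":
        alu_output = ir["func"](a, b)          # A func B
    elif kind == "alu_imm":
        alu_output = ir["func"](a, imm)        # A op Imm
    elif kind == "branch":
        alu_output = npc + (imm << 2)          # branch target
        cond = a == 0                          # branch if A == 0
    else:
        raise ValueError(f"unknown instruction kind {kind!r}")
    # 4. MEM: PC <- NPC, overridden by a taken branch
    new_pc, lmd = npc, None
    if kind == "load":
        lmd = dmem[alu_output]
    elif kind == "store":
        dmem[alu_output] = b
    elif kind == "branch" and cond:
        new_pc = alu_output
    # 5. WB
    if kind == "alu_rr":
        regs[ir["rd"]] = alu_output
    elif kind == "alu_imm":
        regs[ir["rt"]] = alu_output
    elif kind == "load":
        regs[ir["rt"]] = lmd
    regs[0] = 0  # R0 always reads as zero in MIPS
    return new_pc


regs = [0] * 32
regs[2] = 100
dmem = {108: 5}
imem = {  # byte addresses; mnemonics are illustrative only
    0: {"kind": "load", "rs": 2, "rt": 1, "imm": 8},                                # LD R1, 8(R2)
    4: {"kind": "alu_imm", "rs": 1, "rt": 3, "imm": 0xFFFF, "func": operator.add},  # DADDI R3, R1, #-1
    8: {"kind": "branch", "rs": 0, "imm": 1},                                       # BEQZ R0: skip next
    12: {"kind": "alu_rr", "rs": 1, "rt": 1, "rd": 4, "func": operator.add},        # skipped
    16: {"kind": "store", "rs": 2, "rt": 3, "imm": 0},                              # SD R3, 0(R2)
}
pc = 0
while pc in imem:
    pc = execute_one(pc, imem, regs, dmem)
print(regs[1], regs[3], regs[4], dmem[100])  # 5 4 0 4
```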

Implementation of the MIPS Data Path
[Figure: the non-pipelined MIPS data path]

Implementation of the Pipelined MIPS Data Path
[Figure: the pipelined MIPS data path with pipeline registers between stages]

Events on Every Pipe Stage
IF (any instruction):
  IF/ID.IR ← Mem[PC];
  IF/ID.NPC, PC ← (if ((EX/MEM.opcode == branch) & EX/MEM.cond) {EX/MEM.ALUOutput} else {PC+4});
ID (any instruction):
  ID/EX.A ← Regs[IF/ID.IR[rs]]; ID/EX.B ← Regs[IF/ID.IR[rt]];
  ID/EX.NPC ← IF/ID.NPC; ID/EX.IR ← IF/ID.IR;
  ID/EX.Imm ← sign-extend(IF/ID.IR[immediate field]);
EX:
  ALU instruction: EX/MEM.IR ← ID/EX.IR; EX/MEM.ALUOutput ← ID/EX.A func ID/EX.B; or EX/MEM.ALUOutput ← ID/EX.A op ID/EX.Imm;
  Load/store instruction: EX/MEM.IR ← ID/EX.IR; EX/MEM.ALUOutput ← ID/EX.A + ID/EX.Imm; EX/MEM.B ← ID/EX.B;
  Branch instruction: EX/MEM.ALUOutput ← ID/EX.NPC + (ID/EX.Imm << 2); EX/MEM.cond ← (ID/EX.A == 0);
MEM:
  ALU instruction: MEM/WB.IR ← EX/MEM.IR; MEM/WB.ALUOutput ← EX/MEM.ALUOutput;
  Load/store instruction: MEM/WB.IR ← EX/MEM.IR; MEM/WB.LMD ← Mem[EX/MEM.ALUOutput]; or Mem[EX/MEM.ALUOutput] ← EX/MEM.B;
WB:
  ALU instruction: Regs[MEM/WB.IR[rd]] ← MEM/WB.ALUOutput; or Regs[MEM/WB.IR[rt]] ← MEM/WB.ALUOutput;
  Load only: Regs[MEM/WB.IR[rt]] ← MEM/WB.LMD;

Implementing the Control for the MIPS Pipeline
Situation: no dependence
  Code: LD R1, 45(R2); DADD R5, R6, R7; DSUB R8, R6, R7; OR R9, R6, R7
  Action: no hazard is possible, because no dependence on R1 exists in the immediately following three instructions.
Situation: dependence requiring stall
  Code: LD R1, 45(R2); DADD R5, R1, R7; DSUB R8, R6, R7; OR R9, R6, R7
  Action: comparators detect the use of R1 in the DADD and stall the DADD (and the DSUB and OR) before the DADD begins EX.
Situation: dependence overcome by forwarding
  Code: LD R1, 45(R2); DADD R5, R6, R7; DSUB R8, R1, R7; OR R9, R6, R7
  Action: comparators detect the use of R1 in the DSUB and forward the result of the load to the ALU in time for the DSUB to begin EX.
Situation: dependence with accesses in order
  Code: LD R1, 45(R2); DADD R5, R6, R7; DSUB R8, R6, R7; OR R9, R1, R7
  Action: no action is required, because the read of R1 by the OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half.
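The control table above can be sketched as a classifier keyed on how many instructions after the load the loaded register is first read. It assumes every instruction is written `OP Rdest, Rsrc1, Rsrc2` or `LD Rdest, offset(Rbase)` as in the table, so the first register named is the destination; stores and other formats are not handled.

```python
import re


def registers_in(instr: str) -> list[str]:
    """Register names in order of appearance, e.g. 'LD R1, 45(R2)' -> ['R1', 'R2']."""
    return re.findall(r"R\d+", instr)


def classify(sequence: list[str]) -> str:
    """Classify the hazard created by the load in sequence[0], following the table."""
    loaded = registers_in(sequence[0])[0]           # LD destination register
    actions = {1: "stall", 2: "forward", 3: "no action"}
    for distance, instr in enumerate(sequence[1:4], start=1):
        sources = registers_in(instr)[1:]           # registers after the destination
        if loaded in sources:
            return actions[distance]
    return "no dependence"


table = {
    "no dependence": ["LD R1, 45(R2)", "DADD R5, R6, R7", "DSUB R8, R6, R7", "OR R9, R6, R7"],
    "requires stall": ["LD R1, 45(R2)", "DADD R5, R1, R7", "DSUB R8, R6, R7", "OR R9, R6, R7"],
    "forwarding": ["LD R1, 45(R2)", "DADD R5, R6, R7", "DSUB R8, R1, R7", "OR R9, R6, R7"],
    "in order": ["LD R1, 45(R2)", "DADD R5, R6, R7", "DSUB R8, R6, R7", "OR R9, R1, R7"],
}
for situation, seq in table.items():
    print(f"{situation}: {classify(seq)}")
```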

Data Path for Forwarding
[Figure: the pipelined data path extended with forwarding paths and ALU input multiplexers]

Dealing with Branches in the Pipeline
One more adder in the ID stage reduces the branch penalty: both the branch target address calculation and the branch condition check are performed in ID.

Extending the MIPS Pipeline to Handle Multicycle Operations
[Figure: MIPS pipeline with three additional unpipelined floating-point (FP) functional units]
[Figure: MIPS pipeline with additional pipelined FP functional units]

Pipeline Timing (Independent Operations)
  MUL.D   IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
  ADD.D        IF   ID   A1   A2   A3   A4   MEM  WB
  L.D               IF   ID   EX   MEM  WB
  S.D                    IF   ID   EX   MEM  WB

Hazards in Longer-Latency Pipelines
1. Because the divide unit is not fully pipelined, structural hazards can occur. These must be detected, and issuing instructions must be stalled.
2. Because the instructions have varying running times, the number of register writes required in a cycle can be larger than 1.
3. WAW (write after write) hazards are possible, since instructions no longer reach WB in order. Note that WAR (write after read) hazards are not possible, since the register reads always occur in ID.
4. Instructions can complete in a different order than they were issued, causing problems with exceptions.
5. Because of the longer latency of operations, stalls for RAW (read after write) hazards will be more frequent.
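Why WAW hazards appear can be sketched by computing when each instruction reaches WB. The EX lengths follow the FP pipeline timing above (M1 to M7 for MUL.D, A1 to A4 for ADD.D, one EX cycle for loads and stores); the two-instruction program writing F2 is hypothetical, and stalls are ignored.

```python
# EX lengths from the FP pipeline timing: MUL.D uses M1-M7, ADD.D uses A1-A4,
# loads/stores use a single EX stage.
EX_CYCLES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1, "S.D": 1}


def wb_cycle(op: str, issue_slot: int) -> int:
    """Cycle in which op reaches WB if it enters IF in cycle issue_slot + 1 and
    never stalls: IF, ID, the EX stages, MEM, then WB."""
    return issue_slot + EX_CYCLES[op] + 4


program = [("ADD.D", "F2"), ("L.D", "F2")]    # hypothetical: both write F2
times = [wb_cycle(op, slot) for slot, (op, _) in enumerate(program)]
print(times)                                    # [8, 6]
(first_op, first_dest), (second_op, second_dest) = program
if first_dest == second_dest and times[1] < times[0]:
    print(f"WAW hazard: {second_op} writes {second_dest} before {first_op} does")
```

Because the later L.D reaches WB two cycles before the earlier ADD.D, F2 would end up holding the ADD.D result unless the hazard is detected and the L.D is stalled or its write suppressed.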


More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP Overview Appendix A Pipelining: Basic and Intermediate Concepts Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 1 2 Unpipelined

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Appendix A. Overview

Appendix A. Overview Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 1 Unpipelined

More information

Execution/Effective address

Execution/Effective address Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput

More information

mywbut.com Pipelining

mywbut.com Pipelining Pipelining 1 What Is Pipelining? Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. Today, pipelining is the key implementation technique used to make

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts C-2 Appendix C Pipelining: Basic and Intermediate Concepts C.1 Introduction Many readers of this text will have covered the basics of pipelining in another text (such as our more basic text Computer Organization

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002, Appendix C Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Pipelining Multiple instructions are overlapped in execution Each is in a different stage Each stage is called

More information

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation CAD for VLSI 2 Pro ject - Superscalar Processor Implementation 1 Superscalar Processor Ob jective: The main objective is to implement a superscalar pipelined processor using Verilog HDL. This project may

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/47 CS4617 Computer Architecture Lectures 21 22: Pipelining Reference: Appendix C, Hennessy & Patterson Dr J Vaughan November 2013 MIPS data path implementation (unpipelined) Figure C.21 The implementation

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Pipelining: Basic and Intermediate Concepts

Pipelining: Basic and Intermediate Concepts Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of fpipelining i Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 Unpipelined

More information

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017! Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

Pipeline Review. Review

Pipeline Review. Review Pipeline Review Review Covered in EECS2021 (was CSE2021) Just a reminder of pipeline and hazards If you need more details, review 2021 materials 1 The basic MIPS Processor Pipeline 2 Performance of pipelining

More information

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently Pipelining Computational assembly line Each step does a small fraction of the job All steps ideally operate concurrently A form of vertical concurrency Stage/segment - responsible for 1 step 1 machine

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

ECEC 355: Pipelining

ECEC 355: Pipelining ECEC 355: Pipelining November 8, 2007 What is Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline is similar in concept to an assembly

More information

ECSE 425 Lecture 6: Pipelining

ECSE 425 Lecture 6: Pipelining ECSE 425 Lecture 6: Pipelining H&P, Appendix A Vu, Meyer Textbook figures 2007 Elsevier Science Last Time Processor Performance EquaQon System performance Benchmarks 2 Today Pipelining Basics RISC InstrucQon

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

Updated Exercises by Diana Franklin

Updated Exercises by Diana Franklin C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory Pipelining the Idea Similar to assembly line in a factory Divide instruction into smaller tasks Each task is performed on subset of resources Overlap the execution of multiple instructions by completing

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions. MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

Lecture 5: Pipelining Basics

Lecture 5: Pipelining Basics Lecture 5: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3) 1 The Assembly Line Unpipelined Start and finish a job

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Basic Pipelining Concepts

Basic Pipelining Concepts Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution

More information

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C 1 MIPS ISA AND PIPELINING OVERVIEW Appendix A and C OUTLINE Review of MIPS ISA Review on Pipelining 2 READING ASSIGNMENT ReadAppendixA ReadAppendixC 3 THEMIPS ISA (A.9) First MIPS in 1985 General-purpose

More information

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked

More information

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions

More information

Pipelining: Hazards Ver. Jan 14, 2014

Pipelining: Hazards Ver. Jan 14, 2014 POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: marco.santambrogio@polimi.it Simone Campanoni:

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Speeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land

Speeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land Speeding Up DLX 1 DLX Execution Stages Version 1 Clock Cycle 1 I 1 enters Instruction Fetch (IF) Clock Cycle2 I 1 moves to Instruction Decode (ID) Instruction Fetch (IF) holds state fixed Clock Cycle3

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

HY425 Lecture 05: Branch Prediction

HY425 Lecture 05: Branch Prediction HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17 Short Pipelining Review! ! Readings! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 17" Short Pipelining Review! 3! Processor components! Multicore processors and programming! Recap: Pipelining

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

DLX Unpipelined Implementation

DLX Unpipelined Implementation LECTURE - 06 DLX Unpipelined Implementation Five cycles: IF, ID, EX, MEM, WB Branch and store instructions: 4 cycles only What is the CPI? F branch 0.12, F store 0.05 CPI0.1740.83550.174.83 Further reduction

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University CSE 533: Advanced Computer Architectures Pipelining Instructor: Gürhan Küçük Yeditepe University Lecture notes based on notes by Mark D. Hill and John P. Shen Updated by Mikko Lipasti Pipelining Forecast

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

CPE 631 Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow? Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Topic #6. Processor Design

Topic #6. Processor Design Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

CA226 Advanced Computer Architecture

CA226 Advanced Computer Architecture Stephen Blott Today: data hazards Table of Contents 1 2 MIPS Pipeline Recall: the MIPS pipeline implements instruction level parallelism ideally, up to five instructions are executed

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
