CS/ECE 552: Introduction To Computer Architecture 1

Size: px
Start display at page:

Download "CS/ECE 552: Introduction To Computer Architecture 1"

Transcription

1 ECE/CS 552: idterm Review Instructor: ikko H Lipasti Fall 2 University i of Wisconsin-adison i Lecture notes based on notes by ark Hill and John P Shen Updated by ikko Lipasti Computer Architecture Exercise in engineering tradeoff analysis Find the fastest/cheapest/power-efficient/etc solution Optimization problem with s of variables All the variables are changing At non-uniform rates With inflection points Only one guarantee: Today s right answer will be wrong tomorrow Two high-level effects: Technology push Application Pull Abstraction Difference between interface and implementation Interface: WHAT something does Implementation: HOW it does so What s the Big Deal? Tower of abstraction Complex interfaces implemented by layers below Abstraction hides detail Hundreds of engineers build one product Complexity unmanageable otherwise Application Program CS32 Operating System Compiler CS537 CS536 achine Language (ISA) CS354 Digital Logic ECE352 Electronic circuits ECE34 Semiconductor devices ECE335 Performance vs Design Time Time to market is critically important Eg, a new design may take 3 years It will be 3 times faster But if technology improves 5%/year In 3 years 5 3 = 338 So the new design is worse! (unless it also employs new technology) Bottom Line Designers must know BOTH software and hardware Both contribute to layers of abstraction IC costs and performance Compilers and Operating Systems Architecture

2 Performance Time and performance: achine A n times faster than achine B Iff Time(B)/Time(A) = n Iron Law: Performance = Time/program = Instructions Cycles = Time X X Program Instruction Cycle (code size) (CPI) (cycle time) Performance cont d Other etrics: IPS and FLOPS Beware of peak and omitted details Benchmarks: SPEC2 (95 in text) Summarize performance: A for time H for rate G for ratio Speedup Amdahl s Law: f f s Ch 2 Summary Basics Registers and ALU ops emory and load/store Branches and jumps Addressing odes Summary: Instruction Formats R: opcode rs rt rd shamt function I: opcode rs rt address/immediate J: opcode addr 6 26 Instruction decode: instruction bits Activate control signals Conclusions Simple and regular Constant length instructions, fields in same place Small and fast Small number of operands in registers Compromises inevitable Pipelining should not be hindered ake common case fast! Backwards compatibility! Basic Arithmetic and the ALU Number representations: 2 s complement, unsigned Addition/Subtraction Add/Sub ALU Full adder, ripple carry, subtraction Carry-lookahead addition Logical operations and, or, xor, nor, shifts Overflow Architecture 2

3 Unsigned Integers f(b3b) = b3 x b x 2 + b x 2 Treat as normal binary number Eg = x x x x x x 2 + x 2 = = 23 ax f( ) = 2 32 = 4,294,967,295 in f( ) = Range [,2 32 -] => # values (2 32 -) + = 2 32 Signed Integers 2 s complement f(b3 b b) = -b3 x b x 2 + b x 2 ax f( ) = 2 3 = in f( ) = -2 3 = (asymmetric) Range[-2 3,2 3 -] => # values( _ ) = 2 32 Eg 6 => + => Full Adder Full adder (a,b,c in ) => (c out, s) c out = two or more of (a, b, c in ) s = exactly one or three of (a,b,c in ) a b c in c out s Combined Ripple-carry Adder/Subtractor Control = => subtract XOR B with control and set c in to control Full Full Full Full Add Add Add Add er er er er a b a b a 2 b 2 C out a 3 b 3 operation c 4 4-bit Carry Lookahead Adder Carry Lookahead Block c Hierarchical Carry Lookahead for 6 bits c c 5 Carry Lookahead Block g 3 p 3 a 3 b 3 g 2 p 2 a 2 b 2 g p a b g p a b P G a,b 2-5 P G a,b 8- P G a 4-7 b 4-7 G P a -3 b -3 c 3 c 2 c c c 2 c 8 c 4 c s 3 s 2 s s s 2-5 s 8- s 4-7 s -3 Architecture 3

4 CLA: Compute G s and P s G 2,5 G 8, G 4,7 G,3 P 2,5 P 8, P 4,7 P,3 G 8,5 P 8,5 G,5 P,5 G,7 P,7 CLA: Compute Carries g 2 -g 5 g 8 -g g 4 -g 7 g -g 3 p 2 -p 5 p 8 -p p 4 -p 7 p -p 3 c4 c c 2 c 8 G 8 8, P 8, c 8 c G,7 P,7 c G 3,3 P,3 All Together Addition Overflow a invert carry in operation ux result = 5 > 4: + = =? 3 < X is f(2) : + = > Y is ~f(2) Overflow = f(2) * ~(a2)*~(b2) + ~f(2) * a(2) * b(2) b ux Add Subtraction Overflow No overflow on a-b if signs are the same Neg pos => neg ;; overflow otherwise Pos neg => pos ;; overflow otherwise Overflow = f(2) * ~(a2)*(b2) + ~f(2) * a(2) * ~b(2) What to do on Overflow? Ignore! (C language semantics) What about Java? (try/catch?) Flag condition code Sticky flag eg for floating point Otherwise gets in the way of fast hardware Trap possibly maskable IPS has eg add that traps, addu that does not Architecture 4

5 Ch 3 Summary Binary representations, signed/unsigned Arithmetic Full adder, ripple-carry, carry lookahead Carry-select, Carry-save Overflow, negative ore (multiply/divide/fp) later Logical Shift, and, or Ch 4 Processor Implementation Heart of 552 key to project Sequential logic design review (brief) Clock methodology (FSD) Datapath CPI Single instruction, 2 s complement, unsigned Control ultiple cycle implementation (information only) icroprogramming (information only) Exceptions Clocking ethology otivation Design data and control without considering clock Use Fully Synchronous Design (FSD) Just a convention to simplify design process Restricts design freedom Eliminates complexity, can guarantee timing correctness Not really feasible in real designs Even in 554 you will violate FSD Our ethodology Only flip-flops All on the same edge (eg falling) All with same clock No need to draw clock signals All logic finishes in one cycle Logic FFs Logic FFs Our ethodology, cont d No clock gating! Book has bad examples Correct design: new new state write AND clock write state current current Delayed Clocks (Gating) Clock Gated clock X Y X Clock D Delay Delay Problem: Some flip-flops receive gated clock late Data signal may violate setup & hold req t Y D Architecture 5

6 FSD Clocking Rules Clock Y T clock = cycle time T setup = FF setup time requirement T hold = FF hold time requirement T FF = FF combinational delay T comb = Combinational delay FSD Rules: T clock > T FF + T comb + T setup T FF + T comb > T hold Clock D Delay Y D All Together Register File? DFF Bit Slice CE D DFF Control Signals w/jumps Data_C(i) C_Adx 4 C Adx Decoder A_Adx 4 A Adx Decoder 5 i DFF DFF DFF 5 B_Adx 4 B Adx Decoder C = Write Port A,B = Ports Data_A(i) Data_B(i) ulti-cycle Implementation ulti-cycle Ctrl Signals Clock cycle = max(i-mem,reg-read+reg-write, ALU, d-mem) Reuse combination logic on different cycles One memory One ALU without other adders But Control is more complex (later) Need new registers to save values (eg IR) Used again on later cycles Logic that computes signals is reused Architecture 6

7 ulti-cycle Steps Step Description Sample Actions IF Fetch IR=E[PC] PC=PC+4 ID Decode A=RF(IR[25:2]) B=RF(IR[2:6]) Target=PC+SE(IR[5:] << 2) EX Execute ALUout = A + SE(IR[5:]) # lw/sw ALUout = A op B # rrr if (A==B) PC = target # beq em emory E[ALUout] = B # sw DR = E[ALUout] #lw RF(IR[5:]) = ALUout # rrr WB Writeback Reg(IR[2:6]) = DR # lw ulti-cycle Example (lw) EX E ALUSrcA = ALUSrcB = ALUOp = em IorD = WB LW Start LW SW SW RegDst = RegWrite emtoreg = IF em ALUSrcA= IorD = IRWrite ALUSrcB = ALUOp = PCWrite PCSrc = ALUSrcA = ALUSrcB = ALUOp = emwrite IorD = RRR BEQ ALUSrcA = ALUSrcB = ALUOp = PCWriteCond PCSource = WB RegDst = RegWrite emtoreg = ID ALUSrcA = ALUSrcB = ALUOp = J PCWrite PCSource = ulti-cycle Example (lw) icroprogramming Alternative way of specifying control FS State bubble Control signals in bubble Next state given by signals on arc Not a great language for specifying complex events Instead, treat as a programming problem icroprogramming Datapath remains the same Control is specified differently but does the same Each cycle a microprogram field specifies required control signals Label Alu Src Src2 Reg emory Pcwrite Next? Fetch Add Pc 4 pc Alu Alu + Add Pc Extshft Dispatch em Add A Extend Dispatch 2 Lw2 alu + Write mdr fetch Exceptions: Big Picture Two types: Interrupt (asynchronous) or Trap p( (synchronous) Hardware handles initial reaction Then invokes a software exception handler By convention, at eg xc O/S kernel provides code at the handler address Architecture 7

8 Exceptions: Hardware Sets state that identifies cause of exception IPS: in exception_code field of Cause register Changes to kernel mode for dangerous work ahead Disables interrupts IPS: recorded in status register Saves current PC (IPS: exception PC) Jumps to specific address (IPS: x88) Like a surprise JAL so can t clobber $3 Exceptions: Software Exception handler: IPS: ktext at x88 Set flag to detect incorrect entry Nested exception while in handler Save some registers Find exception type Eg I/O interrupt or syscall Jump to specific exception handler Exceptions: Software, cont d Handle specific exception Jump to clean-up to resume user program Restore registers Reset flag that detects incorrect entry Atomically Restore previous mode Enable interrupts Jump back to program (using EPC) Implementing Exceptions We worry only about hardware, not s/w IntCause undefined instruction arithmetic overflow Changes to the datapath New states in control FS FS With Exceptions Review Type Control Datapath Time (CPI, cycle time) Singlecycle ulticycle We want? Comb + end update No reuse cycle, (imem + reg + ALU + dmem) Comb + FS Reuse [3,5] cycles, update ax(imem, reg, ALU, dmem)?? ~ cycle, ax(imem, reg, ALU, dmem We will use pipelining to achieve last row Architecture 8

9 Pipelining (45-49) 49) Summary Big Picture Datapath Control Data Hazards Stalls Forwarding Control Hazards Exceptions Ideal Pipelining L L L n -- Gate 2 Delay n -- Gate 3 Delay L Comb Logic n Gate Delay L n -- Gate 3 Delay BW = ~(/n) n -- Gate 2 Delay BW = ~(2/n) n -- Gate 3 Delay BW = ~(3/n) Bandwidth increases linearly with pipeline depth Latency increases by latch delays L Ideal Pipelining Cycle: Instr: i F D X W i+ F D X W i+2 F D X W i+3 F D X W i+4 F D X W 2 3 Pipelining Idealisms Uniform subcomputations Can pipeline into stages with equal delay Identical computations Can fill pipeline with identical work Independent computations No relationships between work units Are these practical? No, but can get close enough to get significant speedup Complications Datapath Five (or more) instructions in flight Control ust correspond to multiple instructions Instructions may have data and control flow dependences Ie units of work are not independent One may have to stall and wait for another Program Data Dependences True dependence (RAW) j cannot execute until i produces its result Anti-dependence d (WAR) j cannot write its result until i has read its sources Output dependence (WAW) j cannot write its result until i has written its result D( i) R( j) R ( i ) D ( j ) D( i) D( j) Architecture 9

10 Control Dependences Resolution of Pipeline Hazards Conditional branches Branch must execute to determine which instruction to fetch next Instructions following a conditional branch are control dependent on the branch instruction Pipeline hazards Potential violations of program dependences ust ensure program dependences are not violated Hazard resolution Static: compiler/programmer guarantees correctness Dynamic: hardware performs checks at runtime Pipeline interlock Hardware mechanism for dynamic hazard resolution ust detect and enforce dependences at runtime Pipeline Hazards Pipelined Datapath Necessary conditions: WAR: write stage earlier than read stage Is this possible in IF-RD-EX-E-WB? WAW: write stage earlier than write stage Is this possible in IF-RD-EX-E-WB? RAW: read stage earlier than write stage Is this possible in IF-RD-EX-E-WB? If conditions not met, no need to resolve Check for both register and memory PC u x Add IF/ID ID/EX EX/E E/WB 4 Add Address Instruction memory Instruction register data register 2 Registers Write data 2 register Write data 6 32 Sign extend Shift left 2 u x Add result Zero ALU ALU result Address Write data Data memory data u x Pipelined Control PCSrc Pipelined Control PC u x Add IF/ID Add 4 Add result Shift left 2 ALUSrc Address Instruction memory Instruction register register 2 RegWr rite Registers Write register Write data Control data data 2 ID/EX WB EX u x Zero ALU ALU result EX/E WB E/WB WB Branch emwrite emtoreg Address data Data memory u x Write data Controlled by different instructions Decode instructions and pass the signals down the pipe pp Control sequencing is embedded in the pipeline Instruction 6 32 [5 ] Sign extend Instruction [2 6] Instruction [5 ] 6 ALU control ALUOp u x RegDst em Architecture

11 Data Hazards ust first detect hazards ID/EXWriteRegister = IF/IDRegister ID/EXWriteRegister = IF/IDRegister2 EX/EWriteRegister = IF/IDRegister EX/EWriteRegister = IF/IDRegister2 E/WBWriteRegister = IF/IDRegister E/WBWriteRegister = IF/IDRegister2 Forwarding Paths (ALU instructions) c b a ALU FORWARDING PATHS IF ID RD i+: R i+2: R i+3: R ALU E WB i: R (i i+) Forwarding via Path a i+: i: R R (i i+2) Forwarding via Path b i+2: i+: i: R R (i i+3) i writes R before i+3 reads R Implementation of ALU Forwarding Comp Comp Comp Comp Register File ALU Control Flow Hazards What to do? Always stall Easy to implement Performs poorly /6 th instructions are branches, each branch takes 3 cycles CPI = + 3 x /6 = 5 (lower bound) Control Flow Hazards Predict branch not taken Send sequential instructions down pipeline Kill instructions later if incorrect ust stop memory accesses and RF writes Including loads (why?) Late flush of instructions on misprediction Complex Global signal (wire delay) Exceptions Even worse: in one cycle I/O interrupt User trap to OS (EX) Illegal instruction (ID) Arithmetic overflow Hardware error Etc Interrupt priorities must be supported Architecture

12 Review Big Picture Datapath Control Data hazards Stalls Forwarding or bypassing Control flow hazards Branch prediction Exceptions IB RISC Experience [Agerwala and Cocke 987] Internal IB study: Limits of a scalar pipeline? emory Bandwidth Fetch instr/cycle from I-cache 4% of instructions are load/store (D-cache) Code characteristics (dynamic) Loads 25% Stores 5% ALU/RR 4% Branches 2% /3 unconditional (always taken /3 conditional taken, /3 conditional not taken Simplify Branches Assume 9% can be PC-relative 5% Overhead No register indirect, no register access from program Separate adder (like IPS R3) dependences Branch penalty reduced Total CPI: = 5 CPI = 87 IPC PC-relative Schedulable Penalty Yes (9%) Yes (5%) cycle Yes (9%) No (5%) cycle No (%) Yes (5%) cycle No (%) No (5%) 2 cycles Processor Performance Time Processor Performance = Program Instructions Cycles = Time X X Program Instruction Cycle (code size) (CPI) (cycle time) In the 98 s (decade of pipelining): CPI: 5 => 5 In the 99 s (decade of superscalar): CPI: 5 => 5 (best case) Revisit Amdahl s Law Sequential bottleneck lim v f f Even if v is infinite v Performance limited by nonvectorizable portion (-f) N No of Processors h - h - f f Time f Pipelined Performance odel N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) Architecture 2

13 Pipelined Performance odel Pipelined Performance odel N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) N Pipeline Depth -g g Tyranny of Amdahl s Law [Bob Colwell] When g is even slightly below %, a big performance hit will result Stalled cycles are the key adversary and must be minimized as much as possible otivation for Superscalar [Agerwala and Cocke] p Speedup p Speedup jumps from 3 to 43 for N=6, f=8, but s =2 instead of s= (scalar) n=6,s=2 n= n=2 n=6 n=4 Typical Range Vectorizability f Superscalar Proposal oderate tyranny of Amdahl s Law Ease sequential bottleneck ore generally applicable Robust (less sensitive to f) Revised Amdahl s Law: Speedup f s f v Limits on Instruction Level Parallelism (ILP) Weiss and Smith [984] 58 Sohi and Vajapeyam [987] 8 Tjaden and Flynn [97] 86 (Flynn s bottleneck) Tjaden and Flynn [973] 96 Uht [986] 2 Smith et al [989] 2 Jouppi and Wall [988] 24 Johnson [99] 25 Acosta et al [986] 279 Wedig [982] 3 Butler et al [99] 58 elvin and Patt [99] 6 Wall [99] 7 (Jouppi disagreed) Kuck et al [972] 8 Riseman and Foster [972] 5 (no control dependences) Nicolau and Fisher [984] 9 (Fisher s optimism) Superscalar Proposal Go beyond single instruction pipeline, achieve IPC > Dispatch multiple instructions per cycle Provide more generally applicable form of concurrency (not just vectors) Geared for sequential code that is hard to parallelize otherwise Exploit fine-grained or instruction-level parallelism (ILP) Architecture 3

14 Classifying ILP achines [Jouppi, DECWRL 99] Scalar pipelined Superpipelined Superscalar VLIW Superpipelined superscalar Review Summary Ch : Intro & performance Ch 2: Instruction Sets Ch 3: Arithmetic I Ch 4: Data path, control,,pipeliningp Details Fri /29 2:25-3:3 ( hour) in EH237 Closed books/notes/homeworks One page handwritten cheatsheet for quick reference A mix of short answer, design, analysis problems Architecture 4

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

Pipelining to Superscalar

Pipelining to Superscalar Pipelining to Superscalar ECE/CS 752 Fall 207 Prof. Mikko H. Lipasti University of Wisconsin-Madison Pipelining to Superscalar Forecast Limits of pipelining The case for superscalar Instruction-level parallel

More information

Pipeline Processor Design

Pipeline Processor Design Pipeline Processor Design Beyond Pipeline Architecture Virendra Singh Computer Design and Test Lab. Indian Institute of Science Bangalore virendra@computer.org Advance Computer Architecture Branch Hazard

More information

Microprogramming. Microprogramming

Microprogramming. Microprogramming Microprogramming Alternative way of specifying control FSM State -- bubble control signals in bubble next state given by signals on arc not a great language to specify when things are complex Treat as

More information

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL Beyond Pipelining Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

ECE/CS 552: Pipelining

ECE/CS 552: Pipelining ECE/CS 552: Pipelining Prof. ikko Lipasti Lecture notes based in part on slides created by ark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Forecast Big Picture Datapath Control Pipelining s Program

More information

ECE/CS 552: Pipeline Hazards

ECE/CS 552: Pipeline Hazards ECE/CS 552: Pipeline Hazards Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipeline Hazards Forecast Program Dependences

More information

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions COP33 - Computer Architecture Lecture ulti-cycle Design & Exceptions Single Cycle Datapath We designed a processor that requires one cycle per instruction RegDst busw 32 Clk RegWr Rd ux imm6 Rt 5 5 Rs

More information

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control معماري كامپيوتر پيشرفته معماري MIPS data path and control abbasi@basu.ac.ir Topics Building a datapath support a subset of the MIPS-I instruction-set A single cycle processor datapath all instruction actions

More information

RISC Processor Design

RISC Processor Design RISC Processor Design Single Cycle Implementation - MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 13 SE-273: Processor Design Feb 07, 2011 SE-273@SERC 1 Courtesy:

More information

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University CSE 533: Advanced Computer Architectures Pipelining Instructor: Gürhan Küçük Yeditepe University Lecture notes based on notes by Mark D. Hill and John P. Shen Updated by Mikko Lipasti Pipelining Forecast

More information

ECE369. Chapter 5 ECE369

ECE369. Chapter 5 ECE369 Chapter 5 1 State Elements Unclocked vs. Clocked Clocks used in synchronous logic Clocks are needed in sequential logic to decide when an element that contains state should be updated. State element 1

More information

Lecture 5: The Processor

Lecture 5: The Processor Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and

More information

Chapter 5: The Processor: Datapath and Control

Chapter 5: The Processor: Datapath and Control Chapter 5: The Processor: Datapath and Control Overview Logic Design Conventions Building a Datapath and Control Unit Different Implementations of MIPS instruction set A simple implementation of a processor

More information

ENE 334 Microprocessors

ENE 334 Microprocessors ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1

Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1 55:3/C:60 Spring 00 Pipelined Design Motivation: Increase processor throughput with modest increase in hardware. Bandwidth or Throughput = Performance Pipelined Processors Chapter Bandwidth (BW) = no.

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Systems Architecture I

Systems Architecture I Systems Architecture I Topics A Simple Implementation of MIPS * A Multicycle Implementation of MIPS ** *This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from

More information

Superscalar Organization

Superscalar Organization Superscalar Organization Nima Honarmand Instruction-Level Parallelism (ILP) Recall: Parallelism is the number of independent tasks available ILP is a measure of inter-dependencies between insns. Average

More information

EE382A Lecture 3: Superscalar and Out-of-order Processor Basics

EE382A Lecture 3: Superscalar and Out-of-order Processor Basics EE382A Lecture 3: Superscalar and Out-of-order Processor Basics Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 3-1 Announcements HW1 is due today Hand

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

The Processor: Datapath & Control

The Processor: Datapath & Control Chapter Five 1 The Processor: Datapath & Control We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions:

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Processor: Multi- Cycle Datapath & Control

Processor: Multi- Cycle Datapath & Control Processor: Multi- Cycle Datapath & Control (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann, 27) COURSE

More information

RISC Design: Multi-Cycle Implementation

RISC Design: Multi-Cycle Implementation RISC Design: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

RISC Architecture: Multi-Cycle Implementation

RISC Architecture: Multi-Cycle Implementation RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

More information

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR Microprogramming Exceptions and interrupts 9 CMPE Fall 26 A. Di Blas Fall 26 CMPE CPU Multicycle From single-cycle to Multicycle CPU with sequential control: Finite State Machine Textbook Edition: 5.4,

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

EECS 322 Computer Architecture Improving Memory Access: the Cache

EECS 322 Computer Architecture Improving Memory Access: the Cache EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow

More information

Topic #6. Processor Design

Topic #6. Processor Design Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between

More information

CPE 335. Basic MIPS Architecture Part II

CPE 335. Basic MIPS Architecture Part II CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

EECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont

EECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont GAS STATION Pipelining & Hazards II Fall 208 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, artin, Roth, Shen, Smith,

More information

RISC Architecture: Multi-Cycle Implementation

RISC Architecture: Multi-Cycle Implementation RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

More information

CC 311- Computer Architecture. The Processor - Control

CC 311- Computer Architecture. The Processor - Control CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor

More information

Single-Cycle Examples, Multi-Cycle Introduction

Single-Cycle Examples, Multi-Cycle Introduction Single-Cycle Examples, ulti-cycle Introduction 1 Today s enu Single cycle examples Single cycle machines vs. multi-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of

More information

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón ICS 152 Computer Systems Architecture Prof. Juan Luis Aragón Lecture 5 and 6 Multicycle Implementation Introduction to Microprogramming Readings: Sections 5.4 and 5.5 1 Review of Last Lecture We have seen

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

What do we have so far? Multi-Cycle Datapath (Textbook Version)

What do we have so far? Multi-Cycle Datapath (Textbook Version) What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001

More information

ECE 313 Computer Organization EXAM 2 November 9, 2001

ECE 313 Computer Organization EXAM 2 November 9, 2001 ECE 33 Computer Organization EA 2 November 9, 2 This exam is open book and open notes. You have 5 minutes. Credit for problems requiring calculation will be given only if you show your work. Choose and

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

Lets Build a Processor

Lets Build a Processor Lets Build a Processor Almost ready to move into chapter 5 and start building a processor First, let s review Boolean Logic and build the ALU we ll need (Material from Appendix B) operation a 32 ALU result

More information

Lecture 10 Multi-Cycle Implementation

Lecture 10 Multi-Cycle Implementation Lecture 10 ulti-cycle Implementation 1 Today s enu ulti-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of Datapath and Control icroprogramming 2 ulti-cycle Solution

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination May 23, 2014 Name: Email: Student ID: Lab Section Number: Instructions: 1. This

More information

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 8: Data Hazard and Resolution James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L08 S1, James C. Hoe, CU/ECE/CALC, 2018 Your goal today Housekeeping detect and resolve

More information

Implementing the Control. Simple Questions

Implementing the Control. Simple Questions Simple Questions How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label add $t5, $t2, $t3 sw $t5, 8($t3) Label:... #assume not What is going on during the

More information

Advanced Computer Architecture Pipelining

Advanced Computer Architecture Pipelining Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,

More information

Multicycle Approach. Designing MIPS Processor

Multicycle Approach. Designing MIPS Processor CSE 675.2: Introduction to Computer Architecture Multicycle Approach 8/8/25 Designing MIPS Processor (Multi-Cycle) Presentation H Slides by Gojko Babić and Elsevier Publishing We will be reusing functional

More information

CS232 Final Exam May 5, 2001

CS232 Final Exam May 5, 2001 CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M CSE 22 Computer Organization Hugh Chesser, CSEB 2U Agenda Topics:. ultiple cycle implementation - complete Patterson: Appendix C, D 2 Breaking the Execution into Clock Cycles Execution of each instruction

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Improve performance by increasing instruction throughput

Improve performance by increasing instruction throughput Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access

More information

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours. This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics

More information

CSE 2021 COMPUTER ORGANIZATION

CSE 2021 COMPUTER ORGANIZATION CSE 22 COMPUTER ORGANIZATION HUGH CHESSER CHESSER HUGH CSEB 2U 2U CSEB Agenda Topics:. Sample Exam/Quiz Q - Review 2. Multiple cycle implementation Patterson: Section 4.5 Reminder: Quiz #2 Next Wednesday

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites,

More information

Chapter 5 Solutions: For More Practice

Chapter 5 Solutions: For More Practice Chapter 5 Solutions: For More Practice 1 Chapter 5 Solutions: For More Practice 5.4 Fetching, reading registers, and writing the destination register takes a total of 300ps for both floating point add/subtract

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle Boris Grot School of Informatics University of Edinburgh Previous lecture: single-cycle processor Inf2C Computer Systems - 2017-2018. Boris

More information

LECTURE 6. Multi-Cycle Datapath and Control

LECTURE 6. Multi-Cycle Datapath and Control LECTURE 6 Multi-Cycle Datapath and Control SINGLE-CYCLE IMPLEMENTATION As we ve seen, single-cycle implementation, although easy to implement, could potentially be very inefficient. In single-cycle, we

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

Designing a Pipelined CPU

Designing a Pipelined CPU Designing a Pipelined CPU CSE 4, S2'6 Review -- Single Cycle CPU CSE 4, S2'6 Review -- ultiple Cycle CPU CSE 4, S2'6 Review -- Instruction Latencies Single-Cycle CPU Load Ifetch /Dec Exec em Wr ultiple

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution

More information

Initial Representation Finite State Diagram. Logic Representation Logic Equations

Initial Representation Finite State Diagram. Logic Representation Logic Equations Control Implementation Alternatives Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently;

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control

Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control Qing-song Shi http://.24.26.3 Email: zjsqs@zju.edu.cn Chapter 5 The processor : Datapath and

More information

Points available Your marks Total 100

Points available Your marks Total 100 CSSE 3 Computer Architecture I Rose-Hulman Institute of Technology Computer Science and Software Engineering Department Exam Name: Section: 3 This exam is closed book. You are allowed to use the reference

More information

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected

More information

Review: Abstract Implementation View

Review: Abstract Implementation View Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

CSE 2021 COMPUTER ORGANIZATION

CSE 2021 COMPUTER ORGANIZATION CSE 2021 COMPUTER ORGANIZATION HUGH LAS CHESSER 1012U HUGH CHESSER CSEB 1012U W10-M Agenda Topics: 1. Multiple cycle implementation review 2. State Machine 3. Control Unit implementation for Multi-cycle

More information