CS/ECE 552: Introduction To Computer Architecture 1
|
|
- Nancy Farmer
- 6 years ago
- Views:
Transcription
1 ECE/CS 552: idterm Review Instructor: ikko H Lipasti Fall 2 University i of Wisconsin-adison i Lecture notes based on notes by ark Hill and John P Shen Updated by ikko Lipasti Computer Architecture Exercise in engineering tradeoff analysis Find the fastest/cheapest/power-efficient/etc solution Optimization problem with s of variables All the variables are changing At non-uniform rates With inflection points Only one guarantee: Today s right answer will be wrong tomorrow Two high-level effects: Technology push Application Pull Abstraction Difference between interface and implementation Interface: WHAT something does Implementation: HOW it does so What s the Big Deal? Tower of abstraction Complex interfaces implemented by layers below Abstraction hides detail Hundreds of engineers build one product Complexity unmanageable otherwise Application Program CS32 Operating System Compiler CS537 CS536 achine Language (ISA) CS354 Digital Logic ECE352 Electronic circuits ECE34 Semiconductor devices ECE335 Performance vs Design Time Time to market is critically important Eg, a new design may take 3 years It will be 3 times faster But if technology improves 5%/year In 3 years 5 3 = 338 So the new design is worse! (unless it also employs new technology) Bottom Line Designers must know BOTH software and hardware Both contribute to layers of abstraction IC costs and performance Compilers and Operating Systems Architecture
2 Performance Time and performance: achine A n times faster than achine B Iff Time(B)/Time(A) = n Iron Law: Performance = Time/program = Instructions Cycles = Time X X Program Instruction Cycle (code size) (CPI) (cycle time) Performance cont d Other etrics: IPS and FLOPS Beware of peak and omitted details Benchmarks: SPEC2 (95 in text) Summarize performance: A for time H for rate G for ratio Speedup Amdahl s Law: f f s Ch 2 Summary Basics Registers and ALU ops emory and load/store Branches and jumps Addressing odes Summary: Instruction Formats R: opcode rs rt rd shamt function I: opcode rs rt address/immediate J: opcode addr 6 26 Instruction decode: instruction bits Activate control signals Conclusions Simple and regular Constant length instructions, fields in same place Small and fast Small number of operands in registers Compromises inevitable Pipelining should not be hindered ake common case fast! Backwards compatibility! Basic Arithmetic and the ALU Number representations: 2 s complement, unsigned Addition/Subtraction Add/Sub ALU Full adder, ripple carry, subtraction Carry-lookahead addition Logical operations and, or, xor, nor, shifts Overflow Architecture 2
3 Unsigned Integers f(b3b) = b3 x b x 2 + b x 2 Treat as normal binary number Eg = x x x x x x 2 + x 2 = = 23 ax f( ) = 2 32 = 4,294,967,295 in f( ) = Range [,2 32 -] => # values (2 32 -) + = 2 32 Signed Integers 2 s complement f(b3 b b) = -b3 x b x 2 + b x 2 ax f( ) = 2 3 = in f( ) = -2 3 = (asymmetric) Range[-2 3,2 3 -] => # values( _ ) = 2 32 Eg 6 => + => Full Adder Full adder (a,b,c in ) => (c out, s) c out = two or more of (a, b, c in ) s = exactly one or three of (a,b,c in ) a b c in c out s Combined Ripple-carry Adder/Subtractor Control = => subtract XOR B with control and set c in to control Full Full Full Full Add Add Add Add er er er er a b a b a 2 b 2 C out a 3 b 3 operation c 4 4-bit Carry Lookahead Adder Carry Lookahead Block c Hierarchical Carry Lookahead for 6 bits c c 5 Carry Lookahead Block g 3 p 3 a 3 b 3 g 2 p 2 a 2 b 2 g p a b g p a b P G a,b 2-5 P G a,b 8- P G a 4-7 b 4-7 G P a -3 b -3 c 3 c 2 c c c 2 c 8 c 4 c s 3 s 2 s s s 2-5 s 8- s 4-7 s -3 Architecture 3
4 CLA: Compute G s and P s G 2,5 G 8, G 4,7 G,3 P 2,5 P 8, P 4,7 P,3 G 8,5 P 8,5 G,5 P,5 G,7 P,7 CLA: Compute Carries g 2 -g 5 g 8 -g g 4 -g 7 g -g 3 p 2 -p 5 p 8 -p p 4 -p 7 p -p 3 c4 c c 2 c 8 G 8 8, P 8, c 8 c G,7 P,7 c G 3,3 P,3 All Together Addition Overflow a invert carry in operation ux result = 5 > 4: + = =? 3 < X is f(2) : + = > Y is ~f(2) Overflow = f(2) * ~(a2)*~(b2) + ~f(2) * a(2) * b(2) b ux Add Subtraction Overflow No overflow on a-b if signs are the same Neg pos => neg ;; overflow otherwise Pos neg => pos ;; overflow otherwise Overflow = f(2) * ~(a2)*(b2) + ~f(2) * a(2) * ~b(2) What to do on Overflow? Ignore! (C language semantics) What about Java? (try/catch?) Flag condition code Sticky flag eg for floating point Otherwise gets in the way of fast hardware Trap possibly maskable IPS has eg add that traps, addu that does not Architecture 4
5 Ch 3 Summary Binary representations, signed/unsigned Arithmetic Full adder, ripple-carry, carry lookahead Carry-select, Carry-save Overflow, negative ore (multiply/divide/fp) later Logical Shift, and, or Ch 4 Processor Implementation Heart of 552 key to project Sequential logic design review (brief) Clock methodology (FSD) Datapath CPI Single instruction, 2 s complement, unsigned Control ultiple cycle implementation (information only) icroprogramming (information only) Exceptions Clocking ethology otivation Design data and control without considering clock Use Fully Synchronous Design (FSD) Just a convention to simplify design process Restricts design freedom Eliminates complexity, can guarantee timing correctness Not really feasible in real designs Even in 554 you will violate FSD Our ethodology Only flip-flops All on the same edge (eg falling) All with same clock No need to draw clock signals All logic finishes in one cycle Logic FFs Logic FFs Our ethodology, cont d No clock gating! Book has bad examples Correct design: new new state write AND clock write state current current Delayed Clocks (Gating) Clock Gated clock X Y X Clock D Delay Delay Problem: Some flip-flops receive gated clock late Data signal may violate setup & hold req t Y D Architecture 5
6 FSD Clocking Rules Clock Y T clock = cycle time T setup = FF setup time requirement T hold = FF hold time requirement T FF = FF combinational delay T comb = Combinational delay FSD Rules: T clock > T FF + T comb + T setup T FF + T comb > T hold Clock D Delay Y D All Together Register File? DFF Bit Slice CE D DFF Control Signals w/jumps Data_C(i) C_Adx 4 C Adx Decoder A_Adx 4 A Adx Decoder 5 i DFF DFF DFF 5 B_Adx 4 B Adx Decoder C = Write Port A,B = Ports Data_A(i) Data_B(i) ulti-cycle Implementation ulti-cycle Ctrl Signals Clock cycle = max(i-mem,reg-read+reg-write, ALU, d-mem) Reuse combination logic on different cycles One memory One ALU without other adders But Control is more complex (later) Need new registers to save values (eg IR) Used again on later cycles Logic that computes signals is reused Architecture 6
7 ulti-cycle Steps Step Description Sample Actions IF Fetch IR=E[PC] PC=PC+4 ID Decode A=RF(IR[25:2]) B=RF(IR[2:6]) Target=PC+SE(IR[5:] << 2) EX Execute ALUout = A + SE(IR[5:]) # lw/sw ALUout = A op B # rrr if (A==B) PC = target # beq em emory E[ALUout] = B # sw DR = E[ALUout] #lw RF(IR[5:]) = ALUout # rrr WB Writeback Reg(IR[2:6]) = DR # lw ulti-cycle Example (lw) EX E ALUSrcA = ALUSrcB = ALUOp = em IorD = WB LW Start LW SW SW RegDst = RegWrite emtoreg = IF em ALUSrcA= IorD = IRWrite ALUSrcB = ALUOp = PCWrite PCSrc = ALUSrcA = ALUSrcB = ALUOp = emwrite IorD = RRR BEQ ALUSrcA = ALUSrcB = ALUOp = PCWriteCond PCSource = WB RegDst = RegWrite emtoreg = ID ALUSrcA = ALUSrcB = ALUOp = J PCWrite PCSource = ulti-cycle Example (lw) icroprogramming Alternative way of specifying control FS State bubble Control signals in bubble Next state given by signals on arc Not a great language for specifying complex events Instead, treat as a programming problem icroprogramming Datapath remains the same Control is specified differently but does the same Each cycle a microprogram field specifies required control signals Label Alu Src Src2 Reg emory Pcwrite Next? Fetch Add Pc 4 pc Alu Alu + Add Pc Extshft Dispatch em Add A Extend Dispatch 2 Lw2 alu + Write mdr fetch Exceptions: Big Picture Two types: Interrupt (asynchronous) or Trap p( (synchronous) Hardware handles initial reaction Then invokes a software exception handler By convention, at eg xc O/S kernel provides code at the handler address Architecture 7
8 Exceptions: Hardware Sets state that identifies cause of exception IPS: in exception_code field of Cause register Changes to kernel mode for dangerous work ahead Disables interrupts IPS: recorded in status register Saves current PC (IPS: exception PC) Jumps to specific address (IPS: x88) Like a surprise JAL so can t clobber $3 Exceptions: Software Exception handler: IPS: ktext at x88 Set flag to detect incorrect entry Nested exception while in handler Save some registers Find exception type Eg I/O interrupt or syscall Jump to specific exception handler Exceptions: Software, cont d Handle specific exception Jump to clean-up to resume user program Restore registers Reset flag that detects incorrect entry Atomically Restore previous mode Enable interrupts Jump back to program (using EPC) Implementing Exceptions We worry only about hardware, not s/w IntCause undefined instruction arithmetic overflow Changes to the datapath New states in control FS FS With Exceptions Review Type Control Datapath Time (CPI, cycle time) Singlecycle ulticycle We want? Comb + end update No reuse cycle, (imem + reg + ALU + dmem) Comb + FS Reuse [3,5] cycles, update ax(imem, reg, ALU, dmem)?? ~ cycle, ax(imem, reg, ALU, dmem We will use pipelining to achieve last row Architecture 8
9 Pipelining (45-49) 49) Summary Big Picture Datapath Control Data Hazards Stalls Forwarding Control Hazards Exceptions Ideal Pipelining L L L n -- Gate 2 Delay n -- Gate 3 Delay L Comb Logic n Gate Delay L n -- Gate 3 Delay BW = ~(/n) n -- Gate 2 Delay BW = ~(2/n) n -- Gate 3 Delay BW = ~(3/n) Bandwidth increases linearly with pipeline depth Latency increases by latch delays L Ideal Pipelining Cycle: Instr: i F D X W i+ F D X W i+2 F D X W i+3 F D X W i+4 F D X W 2 3 Pipelining Idealisms Uniform subcomputations Can pipeline into stages with equal delay Identical computations Can fill pipeline with identical work Independent computations No relationships between work units Are these practical? No, but can get close enough to get significant speedup Complications Datapath Five (or more) instructions in flight Control ust correspond to multiple instructions Instructions may have data and control flow dependences Ie units of work are not independent One may have to stall and wait for another Program Data Dependences True dependence (RAW) j cannot execute until i produces its result Anti-dependence d (WAR) j cannot write its result until i has read its sources Output dependence (WAW) j cannot write its result until i has written its result D( i) R( j) R ( i ) D ( j ) D( i) D( j) Architecture 9
10 Control Dependences Resolution of Pipeline Hazards Conditional branches Branch must execute to determine which instruction to fetch next Instructions following a conditional branch are control dependent on the branch instruction Pipeline hazards Potential violations of program dependences ust ensure program dependences are not violated Hazard resolution Static: compiler/programmer guarantees correctness Dynamic: hardware performs checks at runtime Pipeline interlock Hardware mechanism for dynamic hazard resolution ust detect and enforce dependences at runtime Pipeline Hazards Pipelined Datapath Necessary conditions: WAR: write stage earlier than read stage Is this possible in IF-RD-EX-E-WB? WAW: write stage earlier than write stage Is this possible in IF-RD-EX-E-WB? RAW: read stage earlier than write stage Is this possible in IF-RD-EX-E-WB? If conditions not met, no need to resolve Check for both register and memory PC u x Add IF/ID ID/EX EX/E E/WB 4 Add Address Instruction memory Instruction register data register 2 Registers Write data 2 register Write data 6 32 Sign extend Shift left 2 u x Add result Zero ALU ALU result Address Write data Data memory data u x Pipelined Control PCSrc Pipelined Control PC u x Add IF/ID Add 4 Add result Shift left 2 ALUSrc Address Instruction memory Instruction register register 2 RegWr rite Registers Write register Write data Control data data 2 ID/EX WB EX u x Zero ALU ALU result EX/E WB E/WB WB Branch emwrite emtoreg Address data Data memory u x Write data Controlled by different instructions Decode instructions and pass the signals down the pipe pp Control sequencing is embedded in the pipeline Instruction 6 32 [5 ] Sign extend Instruction [2 6] Instruction [5 ] 6 ALU control ALUOp u x RegDst em Architecture
11 Data Hazards ust first detect hazards ID/EXWriteRegister = IF/IDRegister ID/EXWriteRegister = IF/IDRegister2 EX/EWriteRegister = IF/IDRegister EX/EWriteRegister = IF/IDRegister2 E/WBWriteRegister = IF/IDRegister E/WBWriteRegister = IF/IDRegister2 Forwarding Paths (ALU instructions) c b a ALU FORWARDING PATHS IF ID RD i+: R i+2: R i+3: R ALU E WB i: R (i i+) Forwarding via Path a i+: i: R R (i i+2) Forwarding via Path b i+2: i+: i: R R (i i+3) i writes R before i+3 reads R Implementation of ALU Forwarding Comp Comp Comp Comp Register File ALU Control Flow Hazards What to do? Always stall Easy to implement Performs poorly /6 th instructions are branches, each branch takes 3 cycles CPI = + 3 x /6 = 5 (lower bound) Control Flow Hazards Predict branch not taken Send sequential instructions down pipeline Kill instructions later if incorrect ust stop memory accesses and RF writes Including loads (why?) Late flush of instructions on misprediction Complex Global signal (wire delay) Exceptions Even worse: in one cycle I/O interrupt User trap to OS (EX) Illegal instruction (ID) Arithmetic overflow Hardware error Etc Interrupt priorities must be supported Architecture
12 Review Big Picture Datapath Control Data hazards Stalls Forwarding or bypassing Control flow hazards Branch prediction Exceptions IB RISC Experience [Agerwala and Cocke 987] Internal IB study: Limits of a scalar pipeline? emory Bandwidth Fetch instr/cycle from I-cache 4% of instructions are load/store (D-cache) Code characteristics (dynamic) Loads 25% Stores 5% ALU/RR 4% Branches 2% /3 unconditional (always taken /3 conditional taken, /3 conditional not taken Simplify Branches Assume 9% can be PC-relative 5% Overhead No register indirect, no register access from program Separate adder (like IPS R3) dependences Branch penalty reduced Total CPI: = 5 CPI = 87 IPC PC-relative Schedulable Penalty Yes (9%) Yes (5%) cycle Yes (9%) No (5%) cycle No (%) Yes (5%) cycle No (%) No (5%) 2 cycles Processor Performance Time Processor Performance = Program Instructions Cycles = Time X X Program Instruction Cycle (code size) (CPI) (cycle time) In the 98 s (decade of pipelining): CPI: 5 => 5 In the 99 s (decade of superscalar): CPI: 5 => 5 (best case) Revisit Amdahl s Law Sequential bottleneck lim v f f Even if v is infinite v Performance limited by nonvectorizable portion (-f) N No of Processors h - h - f f Time f Pipelined Performance odel N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) Architecture 2
13 Pipelined Performance odel Pipelined Performance odel N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) N Pipeline Depth -g g Tyranny of Amdahl s Law [Bob Colwell] When g is even slightly below %, a big performance hit will result Stalled cycles are the key adversary and must be minimized as much as possible otivation for Superscalar [Agerwala and Cocke] p Speedup p Speedup jumps from 3 to 43 for N=6, f=8, but s =2 instead of s= (scalar) n=6,s=2 n= n=2 n=6 n=4 Typical Range Vectorizability f Superscalar Proposal oderate tyranny of Amdahl s Law Ease sequential bottleneck ore generally applicable Robust (less sensitive to f) Revised Amdahl s Law: Speedup f s f v Limits on Instruction Level Parallelism (ILP) Weiss and Smith [984] 58 Sohi and Vajapeyam [987] 8 Tjaden and Flynn [97] 86 (Flynn s bottleneck) Tjaden and Flynn [973] 96 Uht [986] 2 Smith et al [989] 2 Jouppi and Wall [988] 24 Johnson [99] 25 Acosta et al [986] 279 Wedig [982] 3 Butler et al [99] 58 elvin and Patt [99] 6 Wall [99] 7 (Jouppi disagreed) Kuck et al [972] 8 Riseman and Foster [972] 5 (no control dependences) Nicolau and Fisher [984] 9 (Fisher s optimism) Superscalar Proposal Go beyond single instruction pipeline, achieve IPC > Dispatch multiple instructions per cycle Provide more generally applicable form of concurrency (not just vectors) Geared for sequential code that is hard to parallelize otherwise Exploit fine-grained or instruction-level parallelism (ILP) Architecture 3
14 Classifying ILP achines [Jouppi, DECWRL 99] Scalar pipelined Superpipelined Superscalar VLIW Superpipelined superscalar Review Summary Ch : Intro & performance Ch 2: Instruction Sets Ch 3: Arithmetic I Ch 4: Data path, control,,pipeliningp Details Fri /29 2:25-3:3 ( hour) in EH237 Closed books/notes/homeworks One page handwritten cheatsheet for quick reference A mix of short answer, design, analysis problems Architecture 4
ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti
ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real
More informationPipelining to Superscalar
Pipelining to Superscalar ECE/CS 752 Fall 207 Prof. Mikko H. Lipasti University of Wisconsin-Madison Pipelining to Superscalar Forecast Limits of pipelining The case for superscalar Instruction-level parallel
More informationPipeline Processor Design
Pipeline Processor Design Beyond Pipeline Architecture Virendra Singh Computer Design and Test Lab. Indian Institute of Science Bangalore virendra@computer.org Advance Computer Architecture Branch Hazard
More informationMicroprogramming. Microprogramming
Microprogramming Alternative way of specifying control FSM State -- bubble control signals in bubble next state given by signals on arc not a great language to specify when things are complex Treat as
More informationBeyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL
Beyond Pipelining Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationECE/CS 552: Pipelining
ECE/CS 552: Pipelining Prof. ikko Lipasti Lecture notes based in part on slides created by ark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Forecast Big Picture Datapath Control Pipelining s Program
More informationECE/CS 552: Pipeline Hazards
ECE/CS 552: Pipeline Hazards Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipeline Hazards Forecast Program Dependences
More informationCOMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions
COP33 - Computer Architecture Lecture ulti-cycle Design & Exceptions Single Cycle Datapath We designed a processor that requires one cycle per instruction RegDst busw 32 Clk RegWr Rd ux imm6 Rt 5 5 Rs
More informationﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control
معماري كامپيوتر پيشرفته معماري MIPS data path and control abbasi@basu.ac.ir Topics Building a datapath support a subset of the MIPS-I instruction-set A single cycle processor datapath all instruction actions
More informationRISC Processor Design
RISC Processor Design Single Cycle Implementation - MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 13 SE-273: Processor Design Feb 07, 2011 SE-273@SERC 1 Courtesy:
More informationCSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University
CSE 533: Advanced Computer Architectures Pipelining Instructor: Gürhan Küçük Yeditepe University Lecture notes based on notes by Mark D. Hill and John P. Shen Updated by Mikko Lipasti Pipelining Forecast
More informationECE369. Chapter 5 ECE369
Chapter 5 1 State Elements Unclocked vs. Clocked Clocks used in synchronous logic Clocks are needed in sequential logic to decide when an element that contains state should be updated. State element 1
More informationLecture 5: The Processor
Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and
More informationChapter 5: The Processor: Datapath and Control
Chapter 5: The Processor: Datapath and Control Overview Logic Design Conventions Building a Datapath and Control Unit Different Implementations of MIPS instruction set A simple implementation of a processor
More informationENE 334 Microprocessors
ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationPipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1
55:3/C:60 Spring 00 Pipelined Design Motivation: Increase processor throughput with modest increase in hardware. Bandwidth or Throughput = Performance Pipelined Processors Chapter Bandwidth (BW) = no.
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationSystems Architecture I
Systems Architecture I Topics A Simple Implementation of MIPS * A Multicycle Implementation of MIPS ** *This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from
More informationSuperscalar Organization
Superscalar Organization Nima Honarmand Instruction-Level Parallelism (ILP) Recall: Parallelism is the number of independent tasks available ILP is a measure of inter-dependencies between insns. Average
More informationEE382A Lecture 3: Superscalar and Out-of-order Processor Basics
EE382A Lecture 3: Superscalar and Out-of-order Processor Basics Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 3-1 Announcements HW1 is due today Hand
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationThe Processor: Datapath & Control
Chapter Five 1 The Processor: Datapath & Control We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions:
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationProcessor: Multi- Cycle Datapath & Control
Processor: Multi- Cycle Datapath & Control (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann, 27) COURSE
More informationRISC Design: Multi-Cycle Implementation
RISC Design: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationRISC Architecture: Multi-Cycle Implementation
RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay
More informationALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR
Microprogramming Exceptions and interrupts 9 CMPE Fall 26 A. Di Blas Fall 26 CMPE CPU Multicycle From single-cycle to Multicycle CPU with sequential control: Finite State Machine Textbook Edition: 5.4,
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationEECS 322 Computer Architecture Improving Memory Access: the Cache
EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow
More informationTopic #6. Processor Design
Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between
More informationCPE 335. Basic MIPS Architecture Part II
CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationEECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont
GAS STATION Pipelining & Hazards II Fall 208 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, artin, Roth, Shen, Smith,
More informationRISC Architecture: Multi-Cycle Implementation
RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay
More informationCC 311- Computer Architecture. The Processor - Control
CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor
More informationSingle-Cycle Examples, Multi-Cycle Introduction
Single-Cycle Examples, ulti-cycle Introduction 1 Today s enu Single cycle examples Single cycle machines vs. multi-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of
More informationLecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón
ICS 152 Computer Systems Architecture Prof. Juan Luis Aragón Lecture 5 and 6 Multicycle Implementation Introduction to Microprogramming Readings: Sections 5.4 and 5.5 1 Review of Last Lecture We have seen
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationWhat do we have so far? Multi-Cycle Datapath (Textbook Version)
What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001
More informationECE 313 Computer Organization EXAM 2 November 9, 2001
ECE 33 Computer Organization EA 2 November 9, 2 This exam is open book and open notes. You have 5 minutes. Credit for problems requiring calculation will be given only if you show your work. Choose and
More informationT = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good
CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle
More informationLets Build a Processor
Lets Build a Processor Almost ready to move into chapter 5 and start building a processor First, let s review Boolean Logic and build the ALU we ll need (Material from Appendix B) operation a 32 ALU result
More informationLecture 10 Multi-Cycle Implementation
Lecture 10 ulti-cycle Implementation 1 Today s enu ulti-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of Datapath and Control icroprogramming 2 ulti-cycle Solution
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationTHE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination
THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination May 23, 2014 Name: Email: Student ID: Lab Section Number: Instructions: 1. This
More informationLecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 8: Data Hazard and Resolution James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L08 S1, James C. Hoe, CU/ECE/CALC, 2018 Your goal today Housekeeping detect and resolve
More informationImplementing the Control. Simple Questions
Simple Questions How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label add $t5, $t2, $t3 sw $t5, 8($t3) Label:... #assume not What is going on during the
More informationAdvanced Computer Architecture Pipelining
Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,
More informationMulticycle Approach. Designing MIPS Processor
CSE 675.2: Introduction to Computer Architecture Multicycle Approach 8/8/25 Designing MIPS Processor (Multi-Cycle) Presentation H Slides by Gojko Babić and Elsevier Publishing We will be reusing functional
More informationCS232 Final Exam May 5, 2001
CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationCSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M
CSE 22 Computer Organization Hugh Chesser, CSEB 2U Agenda Topics:. ultiple cycle implementation - complete Patterson: Appendix C, D 2 Breaking the Execution into Clock Cycles Execution of each instruction
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More informationECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.
This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationCSE 2021 COMPUTER ORGANIZATION
CSE 22 COMPUTER ORGANIZATION HUGH CHESSER CHESSER HUGH CSEB 2U 2U CSEB Agenda Topics:. Sample Exam/Quiz Q - Review 2. Multiple cycle implementation Patterson: Section 4.5 Reminder: Quiz #2 Next Wednesday
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationProcessor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed
Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites,
More informationChapter 5 Solutions: For More Practice
Chapter 5 Solutions: For More Practice 1 Chapter 5 Solutions: For More Practice 5.4 Fetching, reading registers, and writing the destination register takes a total of 300ps for both floating point add/subtract
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationProcessor (I) - datapath & control. Hwansoo Han
Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationInf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle
Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle Boris Grot School of Informatics University of Edinburgh Previous lecture: single-cycle processor Inf2C Computer Systems - 2017-2018. Boris
More informationLECTURE 6. Multi-Cycle Datapath and Control
LECTURE 6 Multi-Cycle Datapath and Control SINGLE-CYCLE IMPLEMENTATION As we ve seen, single-cycle implementation, although easy to implement, could potentially be very inefficient. In single-cycle, we
More informationEECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationDesigning a Pipelined CPU
Designing a Pipelined CPU CSE 4, S2'6 Review -- Single Cycle CPU CSE 4, S2'6 Review -- ultiple Cycle CPU CSE 4, S2'6 Review -- Instruction Latencies Single-Cycle CPU Load Ifetch /Dec Exec em Wr ultiple
More informationMinimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline
Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding
More informationPipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations
Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution
More informationInitial Representation Finite State Diagram. Logic Representation Logic Equations
Control Implementation Alternatives Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently;
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationComputer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control
Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control Qing-song Shi http://.24.26.3 Email: zjsqs@zju.edu.cn Chapter 5 The processor : Datapath and
More informationPoints available Your marks Total 100
CSSE 3 Computer Architecture I Rose-Hulman Institute of Technology Computer Science and Software Engineering Department Exam Name: Section: 3 This exam is closed book. You are allowed to use the reference
More informationPredict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch
branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)
More informationCSEE 3827: Fundamentals of Computer Systems
CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected
More informationReview: Abstract Implementation View
Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationAppendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationCSE 2021 COMPUTER ORGANIZATION
CSE 2021 COMPUTER ORGANIZATION HUGH LAS CHESSER 1012U HUGH CHESSER CSEB 1012U W10-M Agenda Topics: 1. Multiple cycle implementation review 2. State Machine 3. Control Unit implementation for Multi-cycle
More information