You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011

Size: px
Start display at page:

Download "You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011"

Transcription

1 CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz Paallel Theads Assigned to coe e.g., Lookup, Ads Paallel Instuctions >1 one time e.g., 5 pipelined instuctions Paallel Data >1 data one time e.g., Add of 4 pais of wods Hadwae desciptions All gates functioning in paallel at same time You Ae Hee! Haness Paallelism & Achieve High Pefomance Hadwae Today s Lectue Waehouse Scale Compute Coe Memoy Cache Compute Coe (Cache) Input/Output Coe Instuction Unit(s) Functional Unit(s) A 0 +B 0 A 1 +B 1 A 2 +B 2 A 3 +B 3 Smat Phone Logic Gates 2 Review: Hazads Situations that pevent stating the next instuction in the next clock cycle 1. Stuctual hazads A equied esouce is busy (e.g., needed in multiple stages) 2. Data hazad Data dependency between instuctions. Need to wait fo pevious instuction to complete its data ead/wite 3. Contol hazad Flow of execution depends on pevious instuction 3 4 Review: Load / Banch Delay Slots Stall is equivalent to nop lw $t0, 0($t1) nop sub $t3,$t0,$t2 and $t5,$t0,$t4 ALU I$ Reg D$ Reg bub bub bub bub bub ble ble ble ble ble o $t7,$t0,$t6 I$ ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg ALU Reg D$ 5 6 1

2 Geate Instuction-Level Paallelism (ILP) Deepe pipeline (5 => 10 => 15 stages) Less wok pe stage shote clock cycle Multiple issue is supescala Replicate pipeline stages multiple pipelines Stat multiple instuctions pe clock cycle CPI < 1, so use Instuctions Pe Cycle (IPC) E.g., 4 GHz 4-way multiple-issue 16 BIPS, peak CPI = 0.25, peak IPC = 4 But dependencies educe this in pactice 4.10 Paallelism and Advanced Inst uction Level Paallelism Multiple Issue Static multiple issue Compile goups instuctions to be issued togethe Packages them into issue slots Compile detects and avoids hazads Dynamic multiple issue CPU examines instuction steam and chooses instuctions to issue each cycle Compile can help by eodeing instuctions CPU esolves hazads using advanced techniques at untime 7 8 Supescala Laundy: Paallel pe stage 6 PM AM Time T a A (light clothing) s (dak clothing) k B C (vey dity clothing) O D (light clothing) d E (dak clothing) e F (vey dity clothing) Moe esouces, HW to match mix of paallel tasks? Pipeline Depth and Issue Width Intel Pocessos ove Time Micopocesso Yea Clock Rate Pipeline Stages Issue width Coes Powe i MHz W Pentium MHz W Pentium Po MHz W P4 Willamette MHz W P4 Pescott MHz W Coe 2 Conoe MHz W Coe 2 Yokfield MHz W Coe i7 Gulftown MHz W 9 10 Pipeline Depth and Issue Width Static Multiple Issue Clock Powe Pipeline Stages Issue width Coes Compile goups instuctions into issue packets Goup of instuctions that can be issued on a single cycle Detemined by pipeline esouces equied Think of an issue packet as a vey long instuction (VLIW) Specifies multiple concuent opeations

3 Scheduling Static Multiple Issue Compile must emove some/all hazads Reode instuctions into issue packets No dependencies with a packet Possibly some dependencies between packets Vaies between ISAs; compile must know! Pad with nop if necessay MIPS with Static Dual Issue Dual-issue packets One ALU/banch instuction One load/stoe instuction 64-bit aligned ALU/banch, then load/stoe Pad an unused instuction with nop Addess Instuction type Pipeline Stages n ALU/banch IF ID EX MEM WB n + 4 Load/stoe IF ID EX MEM WB n + 8 ALU/banch IF ID EX MEM WB n + 12 Load/stoe IF ID EX MEM WB n + 16 ALU/banch IF ID EX MEM WB n + 20 Load/stoe IF ID EX MEM WB Hazads in the Dual-Issue MIPS Moe instuctions executing in paallel EX data hazad Fowading avoided stalls with single-issue Now can t use ALU esult in load/stoe in same packet add $t0, $s0, $s1 load $s2, 0($t0) Split into two packets, effectively a stall Load-use hazad Still one cycle use latency, but now two instuctions Moe aggessive scheduling equied Scheduling Example Schedule this fo dual-issue MIPS Loop: lw $t0, 0($s1) # $t0=aay element addu $t0, $t0, $s2 # add scala in $s2 sw $t0, 0($s1) # stoe esult addi $s1, $s1, 4 # decement pointe bne $s1, $zeo, Loop # banch $s1!=0 ALU/banch Load/stoe cycle Loop: nop lw $t0, 0($s1) 1 addi $s1, $s1, 4 nop 2 addu $t0, $t0, $s2 nop 3 bne $s1, $zeo, Loop sw $t0, 4($s1) 4 IPC = 5/4 = 1.25 (c.f. peak IPC = 2) Loop Unolling Replicate loop body to expose moe paallelism Use diffeent egistes pe eplication Called egiste it enaming Avoid loop-caied anti-dependencies Stoe followed by a load of the same egiste Aka name dependence Reuse of a egiste name Loop Unolling Example ALU/banch Load/stoe cycle Loop: addi $s1, $s1, 16 lw $t0, 0($s1) 1 nop lw $t1, 12($s1) 2 addu $t0, $t0, $s2 lw $t2, 8($s1) 3 addu $t1, $t1, $s2 lw $t3, 4($s1) 4 addu $t2, $t2, $s2 sw $t0, 16($s1) 5 addu $t3, $t4, $s2 sw $t1, 12($s1) 6 nop sw $t2, 8($s1) 7 bne $s1, $zeo, Loop sw $t3, 4($s1) 8 IPC = 14/8 = 1.75 Close to 2, but at cost of egistes and size

4 Administivia Poject 2 Pat 2 due Sunday. Slides at end of July 12 lectue contain useful info. Lab 12 cancelled! Replaced with fee study session whee you can catch up on labs / wok on poject 2. The TAs will still be thee. Poject 3 will be posted late Sunday (7/31) Two-stage pipelined CPU in Logisim Dynamic Multiple Issue Supescala pocessos CPU decides whethe to issue 0, 1, 2, instuctions each cycle Avoiding stuctual t and ddata hazads Avoids need fo compile scheduling Though it may still help Code semantics ensued by the CPU Dynamic Pipeline Scheduling Allow the CPU to execute instuctions out of ode to avoid stalls But commit esult to egistes in ode Example lw $t0, 20($s2) addu $t1, $t0, $t2 subu $s4, $s4, $t3 slti $t5, $s4, 20 Can stat subu while addu is waiting fo lw Why Do Dynamic Scheduling? Why not just let the compile schedule? Not all stalls ae pedicable e.g., cache misses Can t always schedule aound banches Banch outcome is dynamically detemined Diffeent implementations of an ISA have diffeent latencies and hazads 23 7/28/2011 Summe Lectue #

5 Speculation Guess what to do with an instuction Stat opeation as soon as possible Check whethe guess was ight If so, complete the opeation If not, oll-back and do the ight thing Common to both static and dynamic multiple issue Examples Speculate on banch outcome (Banch Pediction) Roll back if path taken is diffeent Speculate on load Roll back if location is updated 25 T a s k O d e Pipeline Hazad: Matching socks in late load 6 PM AM A bubble B C D E F A depends on D; stall since folde tied up; Time 26 Out-of-Ode Laundy: Don t Wait T a s k 6 PM AM A B C D bubble Time O d E e F A depends on D; est continue; need moe esouces to allow out-of-ode 27 Out-of-Ode Execution (1/3) Basically, unoll loops in hadwae 1. Fetch instuctions in pogam ode ( 4/clock) 2. Pedict banches as taken/untaken 3. To avoid hazads on egistes, ename egistes using a set of intenal egistes (~80 egistes) 28 Out-of-Ode Execution (2/3) Basically, unoll loops in hadwae 4. Collection of enamed instuctions might execute in a window (~60 instuctions) 5. Execute instuctions with eady opeands in 1 of multiple functional units (ALUs, FPUs, Ld/St) 6. Buffe esults of executed instuctions until pedicted banches ae esolved in eode buffe 29 Out-of-Ode Execution (3/3) Basically, unoll loops in hadwae 7. If pedicted banch coectly, commit esults in pogam ode 8. If pedicted banch incoectly, discad all dependent esults and stat with coect PC 30 5

6 Dynamically Scheduled CPU Banch pediction, Registe enaming Peseves dependencies Out-Of-Ode Intel All use O-O-O since 2001 Wait hee until all opeands available Micopocesso Yea Clock Rate Pipeline Stages Issue width Out-of-ode/ Speculation Coes Powe i MHz 5 1 No 1 5W Pentium MHz 5 2 No 1 10W Pentium Po MHz 10 3 Yes 1 29W P4 Willamette MHz 22 3 Yes 1 75W Execute and Hold Reode buffe fo egiste and memoy wites Can supply opeands fo issued instuctions Results also sent to any waiting esevation stations P4 Pescott MHz 31 3 Yes 1 103W Coe MHz 14 4 Yes 2 75W Coe 2 Yokfield MHz 16 4 Yes 4 95W Coe i7 Gulftown MHz 16 4 Yes 6 130W x86 instuctions RISC opeations Queues: RISC ops - 24 intege ops - 36 FP/SSE ops - 44 ld/st AMD Opteon X4 Micoachitectue enaming 16 achitectual egistes 72 physical egistes 4.11 Real Stuff: The AMD Opteon X4 (Bacelona) Pipeline AMD Opteon X4 Pipeline Flow Fo intege opeations 12 stages (Floating Point is 17 stages) Up to 106 RISC-ops in pogess Intel Nehalem is 16 stages fo intege opeations, details not evealed, but likely simila to above. Intel calls RISC opeations Mico opeations o μops

7 Does Multiple Issue Wok? The BIG Pictue Yes, but not as much as we d like Pogams have eal dependencies that limit ILP Some dependencies ae had to eliminate eg e.g., pointe aliasing Some paallelism is had to expose Limited window size duing instuction issue Memoy delays and limited bandwidth Had to keep pipelines full Speculation can help if done well New-School Machine Stuctues (It s a bit moe complicated!) Softwae Paallel Requests Assigned to compute e.g., Seach Katz Paallel Theads Assigned to coe e.g., Lookup, Ads Paallel Instuctions >1 one time e.g., 5 pipelined instuctions Paallel Data >1 data one time e.g., Add of 4 pais of wods Hadwae desciptions All gates functioning in paallel at same time Haness Paallelism & Achieve High Pefomance Hadwae Waehouse Scale Compute Poject 1 Coe Memoy Lab 14 Input/Output Coe Instuction Unit(s) Functional Unit(s) A 0 +B 0 A 1 +B 1 A 2 +B 2 A 3 +B 3 Smat Phone Compute Coe (Cache) Poject 2 Main Memoy Logic Gates Poject 39 3 Big Pictue on Paallelism Two types of paallelism in applications 1. Data-Level Paallelism (DLP): aises because thee ae many data items that can be opeated on at the same time 2. Task-Level Paallelism: aises because tasks of wok ae ceated that can opeate lagely in paallel 40 Big Pictue on Paallelism Hadwae can exploit app DLP and Task LP in fou ways: 1. Instuction-Level Paallelism: Hadwae exploits application DLP using ideas like pipelining and speculative execution 2. SIMD achitectues:exploit p app DLP by applying a single instuction to a collection of data in paallel 3. Thead-Level Paallelism: exploits eithe app DLP o TLP in a tightly-coupled hadwae model that allows fo inteaction among paallel theads 4. Request-Level Paallelism: exploits paallelism among lagely decoupled tasks and is specified by the pogamme of the opeating system 41 Pee Instuction Inst LP, SIMD, Thead LP, Request LP ae examples of Paallelism above the Instuction Set Achitectue Paallelism explicitly at the level of the ISA Paallelism below the level of the ISA Inst. LP SIMD Th. LP Req. LP Red = = = = = Geen = = = Puple = Blue 42 7

8 Pee Answe Inst LP, SIMD, Thead LP, Request LP ae examples of Paallelism above the Instuction Set Achitectue Paallelism explicitly at the level of the ISA Paallelism below the level of the ISA Inst. LP SIMD Th. LP Req. LP Red = = = = = Geen = = = Puple = Blue 43 Pee Question State if the following techniques ae associated pimaily with a softwae- o hadwae-based appoach to exploiting ILP (in some cases, the answe may be both): Supescala, Out-of- Ode execution, Speculation, Registe Renaming Out of Ode Supescala Speculation Registe Renaming Red HW HW HW HW SW SW SW SW Geen Both Both Both Both HW HW Both Both Puple HW HW HW Both Blue HW HW HW SW 44 Pee Answe State if the following techniques ae associated pimaily with a softwae- o hadwae-based appoach to exploiting ILP (in some cases, the answe may be both): Supescala, Out-of- Ode execution, Speculation, Registe Renaming Out of Ode Supescala Speculation Registe Renaming Red HW HW HW HW Oange SW SW SW SW Geen Both Both Both Both HW HW Both Both Pink HW HW HW Both Blue HW HW HW SW 45 And in Conclusion, Big Ideas of Instuction Level Paallelism Pipelining, Hazads, and Stalls Fowading, Speculation to ovecome Hazads Multiple issue to incease pefomance IPC instead of CPI Dynamic Execution: Supescala in-ode issue, banch pediction, egiste enaming, out-ofode execution, in-ode commit unoll loops in HW, hide cache misses 46 But wait thee s moe??? If we have time Review: C Memoy Management C has thee pools of data memoy (+ memoy) Static stoage: global vaiable stoage, basically pemanent, entie pogam un The Stack: local vaiable stoage, paametes, etun addess The Heap (dynamic stoage): malloc() gabs space fom hee, fee() etuns it Common (Dynamic) Memoy Poblems Using uninitialized values Accessing memoy beyond you allocated egion Impope use of fee/ealloc by messing with the pointe handle etuned by malloc Memoy leaks: mismatched malloc/fee pais pevents accesses between and 48 (via vitual memoy) 8

9 Simplest Model Only one pogam unning on the compute Addesses in the pogam ae exactly the physical memoy addesses Extensions to the simple model: What if less physical memoy than full addess space? What if we want to un multiple pogams at the same time? Poblem #1: Physical Memoy Less Than the Full Addess One achitectue, many implementations, with possibly diffeent amounts of memoy Memoy used to vey expensive and physically bulky Whee does the gow fom then? Logical Vitual Real Idea: Level of Indiection to Ceate Illusion of Lage Physical Memoy Hi Ode Bits of Vitual Addess Addess Map o Table Hi Ode Bits of Physical Addess Logical Vitual Physical Real Vitual Page Page Addess Addess 51 Poblem #2: Multiple Pogams Shaing the Machine s Addess How can we un multiple pogams without accidentally stepping on same addesses? How can we potect pogams fom clobbeing each othe? Application 1 52 Application 2 Idea: Level of Indiection to Ceate Illusion of Sepaate Addess s One table pe unning application OR Real swap table contents when switching 53 Extension to the Simple Model Multiple pogams shaing the same addess space E.g., Opeating system uses low end of addess ange shaed with application Multiple pogams in shaed (vitual) addess space Static ti management: fixed patitioning/allocation i of space Dynamic management: pogams come and go, take diffeent amount of time to execute, use diffeent amounts of memoy How can we potect pogams fom clobbeing each othe? How can we allocate memoy to applications on demand? 54 9

10 (4 GB - 64 MB) ~ hex ~ 0FFF FFFF hex 2 28 bytes (64 MB) Static Division of Shaed Addess Application Opeating System E.g., how to manage the caving up of the addess space among and applications? Whee does the end and the application begin? Dynamic management, with potection, would be bette! 55 Fist Idea: Base + Bounds Registes fo Location Independence Max add Location-independent pogams Pogamming and stoage management ease: need fo a base egiste Potection Independent pogams should not affect each othe inadvetently: need fo a bound egiste Histoically, base + bounds egistes wee a vey ealy idea in compute achitectue 0 add pog1 pog2 56 Physical Memoy Simple Base and Bound Tanslation lw X Bound Registe Effective Addess Base Registe Segment Length Bounds Violation? Physical Addess cuent segment Base Physical Addess Pogam Addess Base and bounds egistes ae visible/accessible to pogamme Tap to if bounds violation detected ( seg fault / coe dumped ) 57 + Physica al Memoy pgm 2 Pogams Shaing Memoy pgms 4 & 5 aive pgm 2 pgm 4 pgm 5 8K pgms 2 & 5 leave pgm 4 Why do we want to un multiple pogams? Run othes while waiting fo I/O What pevents pogams fom accessing each othe s data? fee 8K 58 Student Roulette? Restiction on Base + Bounds Regs Want only the Opeating System to be able to change Base and Bound Registes Pocessos need diffeent execution modes 1. Use mode: can use Base and Bound Registes, but cannot change them 2. Supeviso mode: can use and change Base and Bound Registes Also need Mode Bit (0=Use, 1=Supeviso) to detemine pocesso mode Also need way fo pogam in Use Mode to invoke opeating system in Supeviso Mode, and vice vesa 59 pgm 2 Pogams Shaing Memoy pgms 4 & 5 aive pgm 2 pgm 4 pgm 5 8K pgms 2 & 5 leave pgm 4 fee 8K As pogams come and go, the stoage is fagmented. Theefoe, at some stage pogams have to be moved aound to compact the stoage. Easy way to do this? 60 Student Roulette? 10

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design

More information

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned

More information

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruction Level Parallelism: Multiple Instruction Issue

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruction Level Parallelism: Multiple Instruction Issue CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruction Level Parallelism: Multiple Instruction Issue Guest Lecturer: Justin Hsia You Are Here! Software Hardware Parallel Requests

More information

COSC 6385 Computer Architecture. - Pipelining

COSC 6385 Computer Architecture. - Pipelining COSC 6385 Compute Achitectue - Pipelining Sping 2012 Some of the slides ae based on a lectue by David Culle, Pipelining Pipelining is an implementation technique wheeby multiple instuctions ae ovelapped

More information

Lecture #22 Pipelining II, Cache I

Lecture #22 Pipelining II, Cache I inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html

More information

The Processor: Improving Performance Data Hazards

The Processor: Improving Performance Data Hazards The Pocesso: Impoving Pefomance Data Hazads Monday 12 Octobe 15 Many slides adapted fom: and Design, Patteson & Hennessy 5th Edition, 2014, MK and fom Pof. May Jane Iwin, PSU Summay Pevious Class Pipeline

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called

More information

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,

More information

Introduction To Pipelining. Chapter Pipelining1 1

Introduction To Pipelining. Chapter Pipelining1 1 Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?

More information

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture Compute Achitectue Pipelining and nstuction Level Paallelism An ntoduction Adapted fom COD2e by Hennessy & Patteson Slide 1 Outline of This Lectue ntoduction to the Concept of Pipelined Pocesso Pipelined

More information

Computer Science 141 Computing Hardware

Computer Science 141 Computing Hardware Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431

More information

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines 1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20 Administivia CMSC 411 Compute Systems Achitectue Lectue 5 Basic Pipelining (cont.) Alan Sussman als@cs.umd.edu as@csu dedu Homewok poblems fo Unit 1 due today Homewok poblems fo Unit 3 posted soon CMSC

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards) Chapte 4 (Pat III) The Pocesso: Datapath and Contol (Pipeline Hazads) 陳瑞奇 (J.C. Chen) 亞洲大學資訊工程學系 Adapted fom class notes by Pof. M.J. Iwin, PSU and Pof. D. Patteson, UCB 1 吃感冒藥副作用怎麼辦? http://big5.sznews.com/health/images/attachement/jpg/site3/20120319/001558d90b3310d0c1683e.jpg

More information

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1 CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4) PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,

More information

CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu CENG 3420 Compute Oganization and Design Lectue 07: MIPS Pocesso - II Bei Yu CEG3420 L07.1 Sping 2016 Review: Instuction Citical Paths q Calculate cycle time assuming negligible delays (fo muxes, contol

More information

CS 2461: Computer Architecture 1 Program performance and High Performance Processors

CS 2461: Computer Architecture 1 Program performance and High Performance Processors Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks

More information

Lec 25: Parallel Processors. Announcements

Lec 25: Parallel Processors. Announcements Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza

More information

CSE4201. Computer Architecture

CSE4201. Computer Architecture CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading

More information

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: C Multiple Issue Based on P&H

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: C Multiple Issue Based on P&H COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: C Multiple Issue Based on P&H Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in

More information

CENG 3420 Lecture 07: Pipeline

CENG 3420 Lecture 07: Pipeline CENG 3420 Lectue 07: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L07.1 Sping 2017 Outline q Review: Flip-Flop Contol Signals q Pipeline Motivations q Pipeline Hazads q Exceptions CENG3420 L07.2 Sping

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Review from last lecture

Review from last lecture CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect

More information

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

Multicore and Parallel Processing

Multicore and Parallel Processing Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 xkcd/619 2 Pitfall: Amdahl s Law Execution time after improvement

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang

More information

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability

More information

Chapter 4 The Processor 1. Chapter 4D. The Processor

Chapter 4 The Processor 1. Chapter 4D. The Processor Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline

More information

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M

More information

DYNAMIC STORAGE ALLOCATION. Hanan Samet

DYNAMIC STORAGE ALLOCATION. Hanan Samet ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

THE THETA BLOCKCHAIN

THE THETA BLOCKCHAIN THE THETA BLOCKCHAIN Theta is a decentalized video steaming netwok, poweed by a new blockchain and token. By Theta Labs, Inc. Last Updated: Nov 21, 2017 esion 1.0 1 OUTLINE Motivation Reputation Dependent

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing

More information

Any modern computer system will incorporate (at least) two levels of storage:

Any modern computer system will incorporate (at least) two levels of storage: 1 Any moden compute system will incopoate (at least) two levels of stoage: pimay stoage: andom access memoy (RAM) typical capacity 32MB to 1GB cost pe MB $3. typical access time 5ns to 6ns bust tansfe

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

A Memory Efficient Array Architecture for Real-Time Motion Estimation

A Memory Efficient Array Architecture for Real-Time Motion Estimation A Memoy Efficient Aay Achitectue fo Real-Time Motion Estimation Vasily G. Moshnyaga and Keikichi Tamau Depatment of Electonics & Communication, Kyoto Univesity Sakyo-ku, Yoshida-Honmachi, Kyoto 66-1, JAPAN

More information

DYNAMIC STORAGE ALLOCATION. Hanan Samet

DYNAMIC STORAGE ALLOCATION. Hanan Samet ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 074 e-mail: hjs@umiacs.umd.edu

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA

More information

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade EECS 252 Gaduate Compute Achitectue Lectue 2 ℵ 0 Review of Instuction Sets, Pipelines, and Caches Januay 26 th, 2009 Review Mooe s Law John Kubiatowicz Electical Engineeing and Compute Sciences Univesity

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

GARBAGE COLLECTION METHODS. Hanan Samet

GARBAGE COLLECTION METHODS. Hanan Samet gc0 GARBAGE COLLECTION METHODS Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

GCC-AVR Inline Assembler Cookbook Version 1.2

GCC-AVR Inline Assembler Cookbook Version 1.2 GCC-AVR Inline Assemble Cookbook Vesion 1.2 About this Document The GNU C compile fo Atmel AVR isk pocessos offes, to embed assembly language code into C pogams. This cool featue may be used fo manually

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Query Language #1/3: Relational Algebra Pure, Procedural, and Set-oriented

Query Language #1/3: Relational Algebra Pure, Procedural, and Set-oriented Quey Language #1/3: Relational Algeba Pue, Pocedual, and Set-oiented To expess a quey, we use a set of opeations. Each opeation takes one o moe elations as input paamete (set-oiented). Since each opeation

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly 332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

arxiv: v1 [cs.lo] 3 Dec 2018

arxiv: v1 [cs.lo] 3 Dec 2018 A high-level opeational semantics fo hadwae weak memoy models axiv:1812.00996v1 [cs.lo] 3 Dec 2018 Abstact Robet J. Colvin School of Electical Engineeing and Infomation Technology The Univesity of Queensland

More information

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives SPARK: Soot Reseach Kit Ondřej Lhoták Objectives Spak is a modula toolkit fo flow-insensitive may points-to analyses fo Java, which enables expeimentation with: vaious paametes of pointe analyses which

More information

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 5 th Edition Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

Multidimensional Testing

Multidimensional Testing Multidimensional Testing QA appoach fo Stoage netwoking Yohay Lasi Visuality Systems 1 Intoduction Who I am Yohay Lasi, QA Manage at Visuality Systems Visuality Systems the leading commecial povide of

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Parallelism, Multicore, and Synchronization

Parallelism, Multicore, and Synchronization Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore

More information

The Java Virtual Machine. Compiler construction The structure of a frame. JVM stacks. Lecture 2

The Java Virtual Machine. Compiler construction The structure of a frame. JVM stacks. Lecture 2 Compile constuction 2009 Lectue 2 Code geneation 1: Geneating code The Java Vitual Machine Data types Pimitive types, including intege and floating-point types of vaious sizes and the boolean type. The

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Inteconnection Netwoks fo HPC Systems Fall 2016 Avinash Kaanth Kodi School of Electical Engineeing and Compute Science Ohio Univesity Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement: Inteconnection

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #22 CPU Design: Pipelining to Improve Performance II 2007-8-1 Scott Beamer, Instructor CS61C L22 CPU Design : Pipelining to Improve Performance

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Chapter 4. The Processor. Jiang Jiang

Chapter 4. The Processor. Jiang Jiang Chapter 4 The Processor Jiang Jiang jiangjiang@ic.sjtu.edu.cn [Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, MK] Chapter 4 The Processor 2 Introduction CPU performance

More information

(a, b) x y r. For this problem, is a point in the - coordinate plane and is a positive number.

(a, b) x y r. For this problem, is a point in the - coordinate plane and is a positive number. Illustative G-C Simila cicles Alignments to Content Standads: G-C.A. Task (a, b) x y Fo this poblem, is a point in the - coodinate plane and is a positive numbe. a. Using a tanslation and a dilation, show

More information

Module 6 STILL IMAGE COMPRESSION STANDARDS

Module 6 STILL IMAGE COMPRESSION STANDARDS Module 6 STILL IMAE COMPRESSION STANDARDS Lesson 17 JPE-2000 Achitectue and Featues Instuctional Objectives At the end of this lesson, the students should be able to: 1. State the shotcomings of JPE standad.

More information

Monitors. Lecture 6. A Typical Monitor State. wait(c) Signal and Continue. Signal and What Happens Next?

Monitors. Lecture 6. A Typical Monitor State. wait(c) Signal and Continue. Signal and What Happens Next? Monitos Lectue 6 Monitos Summay: Last time A combination of data abstaction and mutual exclusion Automatic mutex Pogammed conditional synchonisation Widely used in concuent pogamming languages and libaies

More information

Lecture 2: Processor and Pipelining 1

Lecture 2: Processor and Pipelining 1 The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient

More information

IF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4

IF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4 12 1 CMPE110 Fall 2006 A. Di Blas 110 Fall 2006 CMPE pipeline concepts Advanced ffl ILP ffl Deep pipeline ffl Static multiple issue ffl Loop unrolling ffl VLIW ffl Dynamic multiple issue Textbook Edition:

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

A Novel Parallel Deadlock Detection Algorithm and Architecture

A Novel Parallel Deadlock Detection Algorithm and Architecture A Novel Paallel Deadlock Detection Aloithm and Achitectue Pun H. Shiu 2, Yudon Tan 2, Vincent J. Mooney III {ship, ydtan, mooney}@ece.atech.ed }@ece.atech.edu http://codesin codesin.ece.atech.eduedu,2

More information

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc - CentOS 5.2 - Linux Uses Guide - Linux Command SYNOPSIS [-V] [--vesion] [-h] [--help] [-e sciptexpession] [--expession=sciptexpession] [-f sciptfile] [--file=sciptfile] [file...] DESCRIPTION is a evese-polish

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design S 152 ompute chitectue and Engineeing Lectue 11 Multicycle ontolle Design Oveview of ontol ontol may be designed using one of seveal initial epesentations. The choice of sequence contol, and how logic

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

How many times is the loop executed? middle = (left+right)/2; if (value == arr[middle]) return true;

How many times is the loop executed? middle = (left+right)/2; if (value == arr[middle]) return true; This lectue Complexity o binay seach Answes to inomal execise Abstact data types Stacks ueues ADTs, Stacks, ueues 1 binayseach(int[] a, int value) { while (ight >= let) { { i (value < a[middle]) ight =

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

Multiple Issue ILP Processors. Summary of discussions

Multiple Issue ILP Processors. Summary of discussions Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information