CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue
|
|
- Raymond Gyles Strickland
- 5 years ago
- Views:
Transcription
1 CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned to compute e.g., Seach Katz Paallel Theads Assigned to coe e.g., Lookup, Ads So7wae Paallel InstucVons >1 one Vme e.g., 5 pipelined instucvons Paallel Data >1 data one Vme e.g., Add of 4 pais of wods Hadwae descipvons All one Vme Pogamming Languages You Ae Hee! Haness Paallelism & Achieve High Pefomance Hadwae Today s Lectue Waehouse Scale Compute Coe Memoy Input/Output InstucVon Unit(s) Main Memoy Coe Smat Phone Logic Gates 2 Compute (Cache) Coe FuncVonal Unit(s) A 0 +B 0 A 1 +B 1 A 2 +B 2 A 3 +B 3 The BIG Pictue Review Pipelining impoves pefomance by inceasing instucvon thoughput: exploits ILP Executes mulvple instucvons in paallel Each instucvon has the same latency Subject to hazads Stuctue, data, contol Stalls educe pefomance But ae equied to get coect esults Compile can aange code to avoid hazads and stalls Requies knowledge of the pipeline stuctue Agenda Moe ILP InstucVon Scheduling Administivia Out of Ode ExecuVon Paallelism Big Pictue And, in Conclusion, 3 4 Agenda Moe ILP InstucVon Scheduling Administivia Out of Ode ExecuVon Paallelism Big Pictue And, in Conclusion, Geate InstucVon- Level Paallelism (ILP) Deepe pipeline (5 => 10 => 15 stages) Less wok pe stage shote clock cycle MulVple issue supescala Replicate pipeline stages mulvple pipelines Stat mulvple instucvons pe clock cycle CPI < 1, so use InstucVons Pe Cycle (IPC) E.g., 4GHz 4- way mulvple- issue 16 BIPS, peak CPI = 0.25, peak IPC = 4 But dependencies educe this in pacvce 5 6 1
2 MulVple Issue StaVc mulvple issue Compile goups instucvons to be issued togethe Packages them into issue slots Compile detects and avoids hazads Dynamic mulvple issue CPU examines instucvon steam and chooses instucvons to issue each cycle Compile can help by eodeing instucvons CPU esolves hazads using advanced techniques at unvme Supescala Laundy: Paallel pe stage T a s k 6 PM AM A B C Time (light clothing) (dak clothing) (vey dity clothing) O D (light clothing) d E (dak clothing) e F (vey dity clothing) Moe esouces, HW to match mix of paallel tasks? 7 8 Pipeline Depth and Issue Width Intel Pocessos ove Time Micopocesso Yea Clock Rate Pipeline Stages Issue width Coes Powe i MHz W Pentium MHz W Pentium Po MHz W P4 Willamette MHz W P4 Pescott MHz W Coe 2 Conoe MHz W Coe 2 Yokfield MHz W Coe i7 Gulftown MHz W Pipeline Depth and Issue Width Clock 1000 Powe 100 Pipeline Stages Issue width 10 Coes StaVc MulVple Issue Compile goups instucvons into issue packets Goup of instucvons that can be issued on a single cycle Detemined by pipeline esouces equied Think of an issue packet as a vey long instucvon Specifies mulvple concuent opeavons Scheduling StaVc MulVple Issue Compile must emove some/all hazads Reode instucvons into issue packets No dependencies with a packet Possibly some dependencies between packets Vaies between ISAs; compile must know! Pad with nop if necessay
3 MIPS with StaVc Dual Issue Two- issue packets One ALU/banch instucvon One load/stoe instucvon 64- bit aligned ALU/banch, then load/stoe Pad an unused instucvon with nop Addess Instuction type Pipeline Stages n ALU/banch IF ID EX MEM WB n + 4 Load/stoe IF ID EX MEM WB n + 8 ALU/banch IF ID EX MEM WB n + 12 Load/stoe IF ID EX MEM WB n + 16 ALU/banch IF ID EX MEM WB n + 20 Load/stoe IF ID EX MEM WB Hazads in the Dual- Issue MIPS Moe instucvons execuvng in paallel EX data hazad Fowading avoided stalls with single- issue Now can t use ALU esult in load/stoe in same packet add $t0, $s0, $s1 load $s2, 0($t0) Split into two packets, effecvvely a stall Load- use hazad SVll one cycle use latency, but now two instucvons Moe aggessive scheduling equied Scheduling Example Schedule this fo dual- issue MIPS Loop: lw $t0, 0($s1) # $t0=aay element addu $t0, $t0, $s2 # add scala in $s2 sw $t0, 0($s1) # stoe esult addi $s1, $s1, 4 # decement pointe bne $s1, $zeo, Loop # banch $s1!=0 ALU/banch Load/stoe cycle Loop: nop lw $t0, 0($s1) 1 addi $s1, $s1, 4 nop 2 addu $t0, $t0, $s2 nop 3 bne $s1, $zeo, Loop sw $t0, 4($s1) 4 n IPC = 5/4 = 1.25 (c.f. peak IPC = 2) Loop Unolling Replicate loop body to expose moe paallelism Reduces loop- contol ovehead Use diffeent egistes pe eplicavon Called egiste enaming Avoid loop- caied anv- dependencies Stoe followed by a load of the same egiste Aka name dependence Reuse of a egiste name Loop Unolling Example ALU/banch Load/stoe cycle Loop: addi $s1, $s1, 16 lw $t0, 0($s1) 1 nop lw $t1, 12($s1) 2 addu $t0, $t0, $s2 lw $t2, 8($s1) 3 addu $t1, $t1, $s2 lw $t3, 4($s1) 4 addu $t2, $t2, $s2 sw $t0, 16($s1) 5 addu $t3, $t4, $s2 sw $t1, 12($s1) 6 nop sw $t2, 8($s1) 7 bne $s1, $zeo, Loop sw $t3, 4($s1) 8 IPC = 14/8 = 1.75 Close to 2, but at cost of egistes and code size Dynamic MulVple Issue Supescala pocessos CPU decides whethe to issue 0, 1, 2, each cycle Avoiding stuctual and data hazads Avoids the need fo compile scheduling Though it may svll help Code semanvcs ensued by the CPU
4 Dynamic Pipeline Scheduling Allow the CPU to execute instucvons out of ode to avoid stalls But commit esult to egistes in ode Example lw $t0, 20($s2) addu $t1, $t0, $t2 subu $s4, $s4, $t3 slti $t5, $s4, 20 Can stat subu while addu is waivng fo lw Why Do Dynamic Scheduling? Why not just let the compile schedule code? Not all stalls ae pedicable e.g., cache misses Can t always schedule aound banches Banch outcome is dynamically detemined Diffeent implementavons of an ISA have diffeent latencies and hazads Agenda Moe ILP InstucVon Scheduling Administivia Out of Ode ExecuVon Paallelism Big Pictue And, in Conclusion, 21 Administivia As of today, made 1 pass ove all Big Ideas in Compute Achitectue Following lectues go into moe depth on topics you ve aleady seen while you wok on pojects 1 lectue in moe depth on Caches 1 on C stoage management 1 on Dependability/Reliability/Redundancy 1 on PotecVon, Vitual Memoy, Vitual Machines 1 on ExcepVons, Taps, Inteupts 2 on Moden Phone and Compute Achitectues 1 on Pogamming Contest (Exta Cedit Poject 5) 1 on Couse Wap- up and Review 22 Agenda Moe ILP InstucVon Scheduling Administivia Out of Ode ExecuVon Paallelism Big Pictue And, in Conclusion, SpeculaVon Guess what to do with an instucvon Stat opeavon as soon as possible Check whethe guess was ight If so, complete the opeavon If not, oll- back and do the ight thing Common to stavc and dynamic mulvple issue Examples Speculate on banch outcome (Banch PedicVon) Roll back if path taken is diffeent Speculate on load Roll back if locavon is updated
5 Pipeline Hazad: Matching socks in late load T a s k O d e 6 PM AM A B C D E F bubble A depends on D; stall since folde Ved up; Time 25 Out- of- Ode Laundy: Don t Wait 6 PM AM T Time a s k A B C bubble O d e D E F A depends on D; est convnue; need moe esouces to allow out- of- ode 26 Out- of- Ode ExecuVon (1/2) Basically, unoll loops in hadwae 1. Fetch instucvons in pogam ode ( 4/clock) 2. Pedict banches as taken/untaken 3. To avoid hazads on egistes, ename egistes using a set of intenal egistes (~80 egistes) 4. CollecVon of enamed instucvons might execute in a window (~60 instucvons) 5. Execute instucvons with eady opeands in 1 of mulvple func(onal units (ALUs, FPUs, Ld/St) 27 Out- of- Ode ExecuVon (2/2) Basically, unoll loops in hadwae 6. Buffe esults of executed instucvons unvl pedicted banches ae esolved in eode buffe 7. If pedicted banch coectly, commit esults in pogam ode 8. If pedicted banch incoectly, discad all dependent esults and stat with coect PC 28 Out Of Ode Intel All use OOO since 2001 AMD Opteon X4 Micoachitectue 72 physical egistes Micopocesso Yea Clock Rate Pipeline Stages Issue width Out-of-ode/ Speculation Coes i MHz 5 1 No 1 5W Powe Pentium MHz 5 2 No 1 10W Pentium Po MHz 10 3 Yes 1 29W P4 Willamette MHz 22 3 Yes 1 75W P4 Pescott MHz 31 3 Yes 1 103W Coe MHz 14 4 Yes 2 75W Coe 2 Yokfield MHz 16 4 Yes 4 95W Coe i7 Gulftown MHz 16 4 Yes 6 130W Chapte 4 The Pocesso
6 AMD Opteon X4 Pipeline Flow Fo intege opeavons Dynamically Scheduled CPU Peseves dependencies Hold pending opeands n n 12 stages (FloaVng Point is 17 stages) Up to 106 RISC- ops in pogess n Intel Nehalem is 16 stages fo intege opeavons, details not evealed, but likely simila to above+ n Intel calls RISC opeavons Mico opeavons o μops Reodes buffe fo egiste wites Can supply opeands fo issued instucvons Results also sent to any waivng esevavon stavons Does MulVple Issue Wok? The BIG Pictue Yes, but not as much as we d like Pogams have eal dependencies that limit ILP Some dependencies ae had to eliminate e.g., pointe aliasing Some paallelism is had to expose Limited window size duing instucvon issue Memoy delays and limited bandwidth Had to keep pipelines full SpeculaVon can help if done well Big Pictue on Paallelism Two types of paallelism in applica(ons 1. Data- Level Paallelism (DLP): aises because thee ae many data items that can be opeated on at the same Vme 2. Task- Level Paallelism (TLP): aises because tasks of wok ae ceated that can opeate lagely in paallel Fall Lectue # Big Pictue on Paallelism Hadwae can exploit app Data LP and Task LP in 4 ways: 1. Instuc(on- Level Paallelism: Hadwae exploits applicavon DLP using ideas like pipelining and speculavve execuvon 2. SIMD achitectues: exploit app DLP by applying a single instucvon to a collecvon of data in paallel 3. Thead- Level Paallelism: exploits eithe app DLP o TLP in a Vghtly- coupled hadwae model that allows fo inteacvon among paallel theads 4. Request- Level Paallelism: exploits paallelism among lagely decoupled tasks and is specified by the pogamme of the opeavng system And in Conclusion.. Pipeline challenge is hazads Fowading helps w/many data hazads Delayed banch helps with contol hazad in 5 stage pipeline Load delay slot / intelock necessay Moe aggessive pefomance: Longe pipelines Supescala Out- of- ode execuvon SpeculaVon
7 Pee QuesVon State if following techniques ae associated pimaily with a softwae- o hadwae-based appoach to exploiting ILP (in some cases, the answe may be both): Supescala, Out-of- Ode execution, Speculation, Registe Renaming Supe- scala Out of Ode Specu- lavon Registe Renaming Oange HW HW HW HW Geen HW HW Both Both Pink HW HW HW Both HW HW HW SW Pee QuesVon State if following techniques ae associated pimaily with a softwae- o hadwae-based appoach to exploiting ILP (in some cases, the answe may be both): Supescala, Out-of- Ode execution, Speculation, Registe Renaming Supe- scala Out of Ode Specu- lavon Registe Renaming Oange HW HW HW HW Geen HW HW Both Both Pink HW HW HW Both HW HW HW SW Pee InstucVon Inst LP, SIMD, Thead LP, Request LP ae examples of Paallelism above ( ) the InstucVon Set Achitectue Paallelism explicitly at (=) the level of the ISA Paallelism below ( ) the level of the ISA Inst. LP SIMD Th. LP Req. LP Oange = = Geen = = Pink = = Pee InstucVon Inst LP, SIMD, Thead LP, Request LP ae examples of Paallelism above ( ) the InstucVon Set Achitectue Paallelism explicitly at (=) the level of the ISA Paallelism below ( ) the level of the ISA Inst. LP SIMD Th. LP Req. LP Oange = = Geen = = Pink = = Pee InstucVon Pee Answe I. Thanks to pipelining, I have educed the time it took me to wash my one shit. II. Longe pipelines ae always a win (since less wok pe stage & a faste clock). I. Thanks to pipelining, I have educed the Vme it took me to wash my one shit. II. Longe pipelines ae always a win (since less wok pe stage & a faste clock). A)(oange) I is Tue and II is Tue B)(geen) I is False and II is Tue C)(pink) I is Tue and II is False Red Oange Geen I is Tue and II is Tue I is False and II is Tue I is Tue and II is False
8 Pee QuesVon Not all instucvons ae acvve in evey stage of the 5- stage pipeline. Ignoing the effects of hazads, which of the following is tue? 1. Allowing jumps, banches, and ALU instucvons to take fewe stages than the 5 equied by the load instucvon will incease pipeline pefomance fo most pogams. 2. You cannot make ALU instucvons take fewe cycles because of the wite back of the esult, but banches and jumps can take fewe cycles, so thee is some oppotunity fo impovement. 3. Instead of tying to make instucvons take fewe cycles, we should exploe making the pipeline longe, so that instucvons take moe cycles, but the cycles ae shote. This could impove pefomance. 4. The numbe of pipe stages pe instucvon affects thoughput, not latency. Oange: 1 Geen: 2 Pink: 3 Pee Answe Not all instucvons ae acvve in evey stage of the 5- stage pipeline. Ignoing the effects of hazads, which of the following is tue? 1. Allowing jumps, banches, and ALU instucvons to take fewe stages than the 5 equied by the load instucvon will incease pipeline pefomance fo most pogams. 2. You cannot make ALU instucvons take fewe cycles because of the wite back of the esult, but banches and jumps can take fewe cycles, so thee is some oppotunity fo impovement. 3. Instead of tying to make instucvons take fewe cycles, we should exploe making the pipeline longe, so that instucvons take moe cycles, but the cycles ae shote. This could impove pefomance. 4. The numbe of pipe stages pe instucvon affects thoughput, not latency. Oange: 1 Geen: 2 Pink: And in Conclusion, Big Ideas of InstucVon Level Paallelism Pipelining, Hazads, and Stalls Fowading, SpeculaVon to ovecome Hazads MulVple issue to incease pefomance IPC instead of CPI Dynamic ExecuVon: Supescala in- ode issue, banch pedicvon, egiste enaming, out- of- ode execuvon, in- ode commit unoll loops in HW, hide cache misses 45 8
You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011
CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism
Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design
More informationCS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia
CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned
More informationLecture 8 Introduction to Pipelines Adapated from slides by David Patterson
Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.
More informationThe Processor: Improving Performance Data Hazards
The Pocesso: Impoving Pefomance Data Hazads Monday 12 Octobe 15 Many slides adapted fom: and Design, Patteson & Hennessy 5th Edition, 2014, MK and fom Pof. May Jane Iwin, PSU Summay Pevious Class Pipeline
More informationECE331: Hardware Organization and Design
ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam
More informationIntroduction To Pipelining. Chapter Pipelining1 1
Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?
More informationUCB CS61C : Machine Structures
inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called
More informationLecture #22 Pipelining II, Cache I
inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html
More informationCOSC 6385 Computer Architecture. - Pipelining
COSC 6385 Compute Achitectue - Pipelining Sping 2012 Some of the slides ae based on a lectue by David Culle, Pipelining Pipelining is an implementation technique wheeby multiple instuctions ae ovelapped
More informationComputer Science 141 Computing Hardware
Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,
More informationComputer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture
Compute Achitectue Pipelining and nstuction Level Paallelism An ntoduction Adapted fom COD2e by Hennessy & Patteson Slide 1 Outline of This Lectue ntoduction to the Concept of Pipelined Pocesso Pipelined
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationCS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction
CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.
More informationCOEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines
1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationChapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)
Chapte 4 (Pat III) The Pocesso: Datapath and Contol (Pipeline Hazads) 陳瑞奇 (J.C. Chen) 亞洲大學資訊工程學系 Adapted fom class notes by Pof. M.J. Iwin, PSU and Pof. D. Patteson, UCB 1 吃感冒藥副作用怎麼辦? http://big5.sznews.com/health/images/attachement/jpg/site3/20120319/001558d90b3310d0c1683e.jpg
More informationAdvanced Instruction-Level Parallelism
Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu
More informationCMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1
CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruction Level Parallelism: Multiple Instruction Issue
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruction Level Parallelism: Multiple Instruction Issue Guest Lecturer: Justin Hsia You Are Here! Software Hardware Parallel Requests
More informationAdministrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20
Administivia CMSC 411 Compute Systems Achitectue Lectue 5 Basic Pipelining (cont.) Alan Sussman als@cs.umd.edu as@csu dedu Homewok poblems fo Unit 1 due today Homewok poblems fo Unit 3 posted soon CMSC
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationCS 2461: Computer Architecture 1 Program performance and High Performance Processors
Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks
More informationCENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu
CENG 3420 Compute Oganization and Design Lectue 07: MIPS Pocesso - II Bei Yu CEG3420 L07.1 Sping 2016 Review: Instuction Citical Paths q Calculate cycle time assuming negligible delays (fo muxes, contol
More informationUser Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)
PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,
More informationLec 25: Parallel Processors. Announcements
Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza
More informationLecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining
EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining
More informationCENG 3420 Lecture 07: Pipeline
CENG 3420 Lectue 07: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L07.1 Sping 2017 Outline q Review: Flip-Flop Contol Signals q Pipeline Motivations q Pipeline Hazads q Exceptions CENG3420 L07.2 Sping
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationCOMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: C Multiple Issue Based on P&H
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: C Multiple Issue Based on P&H Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in
More informationReview from last lecture
CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect
More informationCSE4201. Computer Architecture
CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationChapter 4 The Processor (Part 4)
Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline
More informationMulticore and Parallel Processing
Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 xkcd/619 2 Pitfall: Amdahl s Law Execution time after improvement
More informationAdapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]
Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M
More informationHomework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures
Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationChapter 4 The Processor 1. Chapter 4D. The Processor
Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline
More informationEN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)
EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationChapter 4. The Processor
Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA
More informationChapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationEIE/ENE 334 Microprocessors
EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationPre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly
332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing
More informationReview: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade
EECS 252 Gaduate Compute Achitectue Lectue 2 ℵ 0 Review of Instuction Sets, Pipelines, and Caches Januay 26 th, 2009 Review Mooe s Law John Kubiatowicz Electical Engineeing and Compute Sciences Univesity
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More information5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 5 th Edition Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationTHE THETA BLOCKCHAIN
THE THETA BLOCKCHAIN Theta is a decentalized video steaming netwok, poweed by a new blockchain and token. By Theta Labs, Inc. Last Updated: Nov 21, 2017 esion 1.0 1 OUTLINE Motivation Reputation Dependent
More informationParallelism, Multicore, and Synchronization
Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore
More informationAccelerating Storage with RDMA Max Gurtovoy Mellanox Technologies
Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability
More informationChapter 4. The Processor. Jiang Jiang
Chapter 4 The Processor Jiang Jiang jiangjiang@ic.sjtu.edu.cn [Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, MK] Chapter 4 The Processor 2 Introduction CPU performance
More informationLecture 2: Processor and Pipelining 1
The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient
More informationAny modern computer system will incorporate (at least) two levels of storage:
1 Any moden compute system will incopoate (at least) two levels of stoage: pimay stoage: andom access memoy (RAM) typical capacity 32MB to 1GB cost pe MB $3. typical access time 5ns to 6ns bust tansfe
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationIF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4
12 1 CMPE110 Fall 2006 A. Di Blas 110 Fall 2006 CMPE pipeline concepts Advanced ffl ILP ffl Deep pipeline ffl Static multiple issue ffl Loop unrolling ffl VLIW ffl Dynamic multiple issue Textbook Edition:
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationMultiple Issue ILP Processors. Summary of discussions
Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz
CS 61C: Great Ideas in Computer Architecture Lecture 13: Pipelining Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 RISC-V Pipeline Pipeline Control Hazards Structural Data R-type
More informationA Memory Efficient Array Architecture for Real-Time Motion Estimation
A Memoy Efficient Aay Achitectue fo Real-Time Motion Estimation Vasily G. Moshnyaga and Keikichi Tamau Depatment of Electonics & Communication, Kyoto Univesity Sakyo-ku, Yoshida-Honmachi, Kyoto 66-1, JAPAN
More information1.3 Multiplexing, Time-Switching, Point-to-Point versus Buses
http://achvlsi.ics.foth.g/~kateveni/534 1.3 Multiplexing, Time-Switching, Point-to-Point vesus Buses n R m Aggegation (multiplexing) Distibution (demultiplexing) Simplest Netwoking, like simplest pogamming:
More informationEfficient Execution Path Exploration for Detecting Races in Concurrent Programs
IAENG Intenational Jounal of Compute Science, 403, IJCS_40_3_02 Efficient Execution Path Exploation fo Detecting Races in Concuent Pogams Theodous E. Setiadi, Akihiko Ohsuga, and Mamou Maekaa Abstact Concuent
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More information5008: Computer Architecture HW#2
5008: Computer Architecture HW#2 1. We will now support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be
More informationMultidimensional Testing
Multidimensional Testing QA appoach fo Stoage netwoking Yohay Lasi Visuality Systems 1 Intoduction Who I am Yohay Lasi, QA Manage at Visuality Systems Visuality Systems the leading commecial povide of
More informationEECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)
Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static
More informationFinal Exam Fall 2008
COE 308 Computer Architecture Final Exam Fall 2008 page 1 of 8 Saturday, February 7, 2009 7:30 10:00 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationQuery Language #1/3: Relational Algebra Pure, Procedural, and Set-oriented
Quey Language #1/3: Relational Algeba Pue, Pocedual, and Set-oiented To expess a quey, we use a set of opeations. Each opeation takes one o moe elations as input paamete (set-oiented). Since each opeation
More informationEN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design
EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationOrange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationarxiv: v1 [cs.lo] 3 Dec 2018
A high-level opeational semantics fo hadwae weak memoy models axiv:1812.00996v1 [cs.lo] 3 Dec 2018 Abstact Robet J. Colvin School of Electical Engineeing and Infomation Technology The Univesity of Queensland
More informationDynamic Control Hazard Avoidance
Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationOverview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design
S 152 ompute chitectue and Engineeing Lectue 11 Multicycle ontolle Design Oveview of ontol ontol may be designed using one of seveal initial epesentations. The choice of sequence contol, and how logic
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike
More informationHow many times is the loop executed? middle = (left+right)/2; if (value == arr[middle]) return true;
This lectue Complexity o binay seach Answes to inomal execise Abstact data types Stacks ueues ADTs, Stacks, ueues 1 binayseach(int[] a, int value) { while (ight >= let) { { i (value < a[middle]) ight =
More informationDYNAMIC STORAGE ALLOCATION. Hanan Samet
ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu
More informationA Novel Parallel Deadlock Detection Algorithm and Architecture
A Novel Paallel Deadlock Detection Aloithm and Achitectue Pun H. Shiu 2, Yudon Tan 2, Vincent J. Mooney III {ship, ydtan, mooney}@ece.atech.ed }@ece.atech.edu http://codesin codesin.ece.atech.eduedu,2
More information