Uniprocessors. HPC Prof. Robert van Engelen

Size: px
Start display at page:

Download "Uniprocessors. HPC Prof. Robert van Engelen"

Transcription

1 Uiprocessors HPC Prof. Robert va Egele

2 Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures Pipelie ad vector machies Istructio set architectures Shared memory Istructio schedulig ad executio Distributed memory Message passig Data storage Parallel programmig models Memory hierarchy Shared vs distributed memory Caches, TLB Hybrid Compiler optimizatios BSP HPC 2

3 Levels of Parallelism Superscalar ad multicore processors (fie grai) (stacked, fie grai to coarse grai) HPC 3

4 Processor Families Family CISC Vector RISC VLIW Istructio Set Architecture (ISA) DEC VAX Itel 80x86 (IA-32) CRAY Covex HP PA-RISC SGI MIPS DEC Alpha Su Sparc IBM PowerPC Multiflow Cydrome Itel IA-64 Processors VAX-11/780 Itel Petium Pro CRAY T90 Covex C-4 PA-8600 MIPS R10000 Alpha21264 Su UltraSparc-III IBM Power3 Multiflow Trace Cydrome Cydra-5 Itel Itaium HPC 4

5 CISC, RISC, VLIW, ad Vector Processor Families CISC (complex istructio set computer) CISC ISAs offer specialized istructios of various (ad variable) legth Istructios typically executed i microcode RISC (reduced istructio set computer) No microcode ad relatively few istructios May RISC processors are superscalar (for istructio-level parallelism) Oly load ad store istructios access memory Sigle commo istructio word legth More registers tha CISC VLIW (very log istructio word) Budles multiple istructios for parallel executio Depedeces betwee istructios i a budle are prohibited More registers tha CISC ad RISC Vector machies Sigle istructios operate o vectors of data (ot ecessarily parallel) Multiple vector istructios ca be chaied HPC 5

6 Superscalar Architectures HPC 6

7 Istructio Pipelie Five istructios are i the pipelie at differet stages A istructio pipelie icreases the istructio badwidth Classic 5-stage pipelie: IF istructio fetch ID istructio decode EX execute (fuctioal uits) MEM load/store WB write back to registers ad forward results ito the pipelie whe eeded by aother istructio HPC 7

8 Istructio Pipelie Example Example 4-stage pipelie Four istructios (gree, purple, blue, red) are processed i this order Cycle 0: istructios are waitig Cycle 1: Gree is fetched Cycle 2: Purple is fetched, gree decoded Cycle 3: Blue is fetched, purple decoded, gree executed Cycle 4: Red is fetched, blue decoded, purple executed, gree i write-back Etc. HPC 8

9 Istructio Pipelie Hazards Pipelie hazards From data depedeces Forwardig i the WB stage ca elimiate some data depedece hazards From istructio fetch latecies (e.g. I-cache miss) From memory load latecies (e.g. D-cache miss) A hazard is resolved by stallig the pipelie, which causes a bubble of oe ore more cycles Example Suppose a stall of oe cycle occurs i the IF stage of the purple istructio Cycle 3: Purple caot be decoded ad a o-operatio (NOP) is iserted HPC 9

10 N-way Superscalar RISC 2-way superscalar RISC pipelie RISC istructios have the same word size Ca fetch multiple istructios without havig to kow the istructio cotet of each N-way superscalar RISC processors fetch N istructios each cycle Icreases the istructio-level parallelism HPC 10

11 A CISC Istructio Set i a Superscalar Architecture Petium processors traslate CISC istructios to RISC-like µops Advatages: Higher istructio badwidth Maitais istructio set architecture (ISA) compatibility Petium 4 has a 31 stage pipelie divided ito three mai stages: Fetch ad decode Executio Retiremet Simplified block diagram of the Itel Petium 4 HPC 11

12 Istructio Fetch ad Decode Simplified block diagram of the Itel Petium 4 Petium 4 decodes istructios ito µops ad deposits the µops i a trace cache Allows the processor to fetch the µops trace of a istructio that is executed agai (e.g. i a loop) Istructios are fetched: Normally i the same order as stored i memory Or fetched from brach targets predicted by the brach predictio uit Petium 4 oly decodes oe istructio per cycle, but ca deliver up to three µops per cycle to the executio stage RISC architectures typically fetch multiple istructios per cycle HPC 12

13 Istructio Executio Stage Simplified block diagram of the Itel Petium 4 Executes multiple µops i parallel Istructio-level parallelism (ILP) The scheduler marks a µop for executio whe all operads of the a µop are ready The µops o which a µop depeds must be executed first A µop ca be executed out-oforder i which it appeared Petium 4: a µop is re-executed whe its operads were ot ready O Petium 4 there are 4 ports to sed a µop ito Each port has oe or more fixed executio uits Port 0: ALU0, FPMOV Port 1: ALU1, INT, FPEXE Port 2: LOAD Port 3: STORE HPC 13

14 Retiremet Looks for istructios to mark completed Are all µops of the istructio executed? Are all µops of the precedig istructio retired? (puttig istructios back i order) Notifies the brach predictio uit whe a brach was icorrectly predicted Processor stops executig the wrogly predicted istructios ad discards them (takes»10 cycles) Petium 4 retires up to 3 istructios per clock cycle Simplified block diagram of the Itel Petium 4 HPC 14

15 Software Optimizatio to Icrease CPU Throughput Processors ru at maximum speed (high istructio per cycle rate (IPC)) whe 1. There is a good mix of istructios (with low latecies) to keep the fuctioal uits busy 2. Operads are available quickly from registers or D-cache 3. The FP to memory operatio ratio is high (FP : MEM > 1) 4. Number of data depedeces is low 5. Braches are easy to predict The processor ca oly improve #1 to a certai level with out-of-order schedulig ad partly #2 with hardware prefetchig Compiler optimizatios effectively target #1-3 The programmer ca help improve #1-5 HPC 15

16 Istructio Latecy ad Throughput a=u*v; b=w*x; c=y*z Latecy: the umber of clocks to complete a istructio whe all of its iputs are ready Throughput: the umber of clocks to wait before startig a idetical istructio Idetical istructios are those that use the same executio uit The example shows three multiply operatios, assumig there is oly oe multiply executio uit Actual typical latecies (i cycles) Iteger add: 1 FP add: 3 FP multiplicatio: 3 FP divisio: 31 HPC 16

17 Istructio Latecy Case Study Cosider two versios of Euclid s algorithm 1. Modulo versio 2. Repetitive subtractio versio Which is faster? it fid_gcf1(it a, it b) { while (1) { a = a % b; if (a == 0) retur b; if (a == 1) retur 1; b = b % a; if (b == 0) retur a; if (b == 1) retur 1; } } Modulo versio it fid_gcf2(it a, it b) { while (1) { if (a > b) a = a - b; else if (a < b) b = b - a; else retur a; } } Repetitive subtractio HPC 17

18 Istructio Latecy Case Study Cosider the cycles estimated for the case a=48 ad b=40 Istructio # Latecy Cycles Istructio # Latecy Cycles Modulo Subtract Compare Compare Brach Brach Other Other 0 Total Total Modulo versio Repetitive subtractio Executio time for all values of a ad b i [ ] Modulo versio Repetitive subtractio Bleded versio sec sec sec HPC 18

19 Data Depedeces (w*x)*(y*z) Istructio level parallelism is limited by data depedeces Types of depedeces: RAW: read-after-write also called flow depedece WAR: write-after-read also called ati depedece WAW: write-after-write also called output depedece The example shows a RAW depedece WAR ad WAW depedeces exist because of storage locatio reuse (overwrite with ew value) WAR ad WAW are sometimes called false depedeces RAW is a true depedece HPC 19

20 Data Depedece Case Study Removig redudat operatios by (re)usig (temporary) space may icrease the umber of depedeces Example: two versios to iitialize a fiite differece matrix 1. Recurret versio with lower FP operatio cout 2. No-recurret versio with fewer depedeces Which is fastest depeds o effectiveess of loop optimizatio ad istructio schedulig by compiler (ad processor) to hide latecies ad the umber of distict memory loads dxi=1.0/h(1) do i=1, dxo=dxi dxi=1.0/h(i+1) diag(i)=dxo+dxi offdiag(i)=-dxi eddo With recurrece (cross iter dep ot show) WAR RAW do i=1, dxo=1.0/h(i) dxi=1.0/h(i+1) diag(i)=dxo+dxi offdiag(i)=-dxi eddo Without recurrece (cross iter dep ot show) HPC 20

21 Case Study (1) dxi = 1.0/h[0]; for (i=1; i<; i++) { dxo = dxi; dxi = 1.0/h[i+1]; diag[i] = dxo+dxi; offdiag[i] = -dxi; } Itel Core 2 Duo 2.33 GHz for (i=1; i<; i++) { dxo = 1.0/h[i]; dxi = 1.0/h[i+1]; diag[i] = dxo+dxi; offdiag[i] = -dxi; } gcc O3 fdiit.c time./a.out sec gcc O3 fdiit.c time./a.out sec Oe cross-iteratio flow depedece but fewer memory loads Oe more memory load but fewer deps Depedece spas multiple istructios, so has o big impact HPC 21

22 Timig compariso of o-optimized kerels recurret versio: sec o-recurret versio: sec Case Study (2) Timig compariso of compiler optimized kerels recurret versio: sec o-recurret versio: sec f77 -g fdiit.f -o fdiit collect -o fdiit.er./fdiit UltraSparc IIIi 1.2 GHz f77 -fast -g fdiit.f -o fdiit collect -o fdiit.er./fdiit HPC 22

23 Istructio Schedulig z = (w*x)*(y*z); k = k+2; m = k+ = +4; Istructio schedulig ca hide data depedece latecies With static schedulig the compiler moves idepedet istructios up/dow to fixed positios With dyamic schedulig the processor executes istructios out of order whe they are ready The example shows a optimal istructio schedule, assumig there is oe multiply ad oe add executio uit Note that m=k+ uses the old value of (WAR depedece) Advaced processors remove WAR depedeces from the executio pipelie usig register reamig HPC 23

24 Hardware Register Reamig r0 = r1 + r2 r2 = r7 + r4 Sequetial r0 = r1 + r2; r2 = r7 + r4 r2 = r2 Parallel Pitfall: it does ot help to maually remove WAR depedeces i program source code by itroducig extra scalar variables, because the compiler s register allocator reuses registers assiged to these variables ad thereby reitroduces the WAR depedeces at the register level Register reamig performed by a processor removes uecessary WAR depedeces caused by register reuse Fixed umber of registers are assiged by compiler to hold temporary values A processor s register file icludes a set of hidde registers A hidde register is used i place of a actual register to elimiate a WAR hazard The WB stage stores the result i the destiatio register HPC 24

25 double *p, a[10]; *p = 0; s += a[i]; r0 = 0 r1 = p r2 = a+i store r0 i M[r1] load M[r2] ito r3 Data Speculatio r2 = a+i adv load M[r2] ito r3 r0 = 0 r1 = p store r0 i M[r1] check adv load address: reload r3 A load should be iitiated as far i advace as possible to hide memory latecy Whe a store to address A1 is followed by a load from address A2 the there is a RAW depedece whe A1=A2 A compiler assumes there is a RAW depedece if it caot disprove A1=A2 The advaced load istructio allows a process to igore a potetial RAW depedece ad sort out the coflict at ru time whe the store address is kow ad is the same HPC 25

26 Cotrol Speculatio if (i<) x = a[i] if (i>) jump to skip load a[i] ito r0 skip: speculative load a[i] ito r0 if (i>) jump to skip check spec load exceptios skip: Cotrol speculatio allows coditioal istructios to be executed before the coditioal brach i which the istructio occurs Hide memory latecies A speculative load istructio performs a load operatio Eables loadig early Exceptios are igored A check operatio verifies that whether the load trigged a exceptio (e.g. bus error) Reraise the exceptio HPC 26

27 Hardware Brach Predictio for (a=0; a<100; a++) { if (a % 2 == 0) do_eve(); else do_odd(); } Simple brach patter that is predicted correctly by processor for (a=0; a<100; a++) { if (flip_coi() == HEADS) do_heads(); else do_tails(); } Radom brach patter that is difficult to predict Brach predictio is a architectural feature that eables a processor to fetch istructios of a target brach Whe predicted correctly there is o brach pealty Whe ot predicted correctly, the pealty is typically >10 cycles Brach predictio uses a history mechaism per brach istructio by storig a compressed form of the past brachig patter HPC 27

28 Improvig Brach Predictio I a complicated brach test i C/C++ move the simplest to predict coditio to the frot of the cojuctio if (i == 0 && a[i] > 0) this example also has the added beefit of testig the more costly a[i]>0 less frequetly Rewrite cojuctios to logical expressios if (t1==0 && t2==0 && t3==0) Þ if ( (t1 t2 t3) == 0 ) Use max/mi or arithmetic to avoid braches if (a >= 255) a = 255; Þ a = mi(a, 255); Note that i C/C++ the cod?the:else operator ad the && ad operators result i braches! HPC 28

29 Memory hierarchy Data Storage Performace of storage CPU ad memory Virtual memory, TLB, ad pagig Cache HPC 29

30 Memory Hierarchy faster volatile Storage systems are orgaized i a hierarchy: Speed Cost Volatility cheaper per byte HPC 30

31 Performace of Storage Registers are fast, typically oe clock cycle to access data Cache access takes tes of cycles Memory access takes hudreds of cycles Movemet betwee levels of storage hierarchy ca be explicit or implicit HPC 31

32 CPU ad Memory CPU TLB registers I-cache D-cache L1 Cache O/off chip Uified L2 Cache Memory Bus Mai Memory I/O Bus Disk HPC 32

33 Memory Access A logical address is traslated ito a physical address i virtual memory usig a page table The traslatio lookaside buffer (TLB) is a efficiet o-chip address traslatio cache Memory is divided ito pages Virtual memory systems store pages i memory (RAM) ad o disk Page fault: page is fetched from disk L1 caches (o chip): I-cache stores istructios D-cache stores data L2 cache (E-cache, o/off chip) Is typically uified: stores istructios ad data HPC 33

34 Locality i a Memory Referece Patter The workig set model A process ca be i physical memory if ad oly if all of the pages that it is curretly usig (the most recetly used pages) ca be i physical memory Page thrashig occurs if pages are moved excessively betwee physical memory ad disk Operatig system picks ad adjusts workig set size OS attempts to miimize page faults HPC 34

35 Traslatio Lookaside Buffer Logical address to physical address traslatio with TLB lookup to limit page table reads (page table is stored i physical memory) Logical pages are mapped to physical pages i memory (typical page size is 4KB) HPC 35

36 Fidig TLB Misses collect -o testprog.er -h DTLB_miss,o./testprog HPC 36

37 Caches Direct mapped cache: each locatio i mai memory ca be cached by just oe cache locatio N-way associative cache: each locatio i mai memory ca be cached by oe of N cache locatios Fully associative: each locatio i mai memory ca be cached by ay cache locatio Experimet shows SPEC CPU2000 bechmark cache miss rates HPC 37

38 Cache Details A N-way associative cache has a set of N cache lies per row A cache lie ca be 8 to 512 bytes Loger lies icrease memory badwidth performace, but space ca be wasted whe applicatios access data i a radom order A typical 8-way L1 cache (o-chip) has 64 rows with 64 byte cache lies Cache size = 64 rows x 8 ways x 64 bytes = bytes A typical 8-way L2 cache (o/off chip) has 1024 rows with 128 byte cache lies HPC 38

39 Cache Misses Compulsory miss: readig some_static_data[0] for (i = 0; i < ; i++) X[i] = some_static_data[i]; for (i = 0; i < ; i++) X[i] = X[i] + Y[i]; Capacity miss: X[i] o loger i cache Coflict miss: X[i] ad Y[i] are mapped to the same cache lie (e.g. whe cache is direct mapped) Compulsory misses Caused by the first referece to a datum Misses are effectively reduced by prefetchig Capacity misses Cache size is fiite Misses are reduced by limitig the workig set size of the applicatio Coflict misses Replacemet misses are caused by the choice of victim cache lie to evict by the replacemet policy Mappig misses are caused by level of associativity HPC 39

40 False Sharig O a multi-core processor each core has a L1 cache ad shares the L2 cache with other cores False sharig occurs whe two caches of processors (or cores) cache two differet o-shared data items that reside o the same cache lie Cache coherecy protocol marks the cache lie (o all cores) dirty to force reload To avoid false sharig: Allocate o-shared data o differet cache lies (usig malloc) Limit the use of global variables HPC 40

41 Fidig Cache Misses collect -o test.memory.1.er -S off -p o -h ecstall,o,cycles,o./cachetest -g memory.erg collect -o test.memory.2.er -S off -h icstall,o,dcstall,o./cachetest -g memory.erg HPC 41

42 Further Readig [HPC] pages 7-56 Optioal: [OPT] pages HPC 42

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines This Uit: Damic Schedulig CSE 560 Computer Sstems Architecture Damic Schedulig Slides origiall developed b Drew Hilto (IBM) ad Milo Marti (Uiversit of Peslvaia) App App App Sstem software Mem CPU I/O Code

More information

Instruction and Data Streams

Instruction and Data Streams Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad

More information

Architecture and Compilers PART I. HPC Fall 2008 Prof. Robert van Engelen

Architecture and Compilers PART I. HPC Fall 2008 Prof. Robert van Engelen Architecture and Compilers PART I HPC Fall 2008 Prof. Robert van Engelen Overview PART I Uniprocessors PART II Multiprocessors Processors Parallel models Processor architectures Instruction set architectures

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors

CS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors CS252 Sprig 2017 Graduate Computer Architecture Lecture 6: Out-of-Order Processors Lisa Wu, Krste Asaovic http://ist.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 2 WU UCB CS252 SP17 Last Time i Lecture

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

Programming with Shared Memory PART II. HPC Spring 2017 Prof. Robert van Engelen

Programming with Shared Memory PART II. HPC Spring 2017 Prof. Robert van Engelen Programmig with Shared Memory PART II HPC Sprig 2017 Prof. Robert va Egele Overview Sequetial cosistecy Parallel programmig costructs Depedece aalysis OpeMP Autoparallelizatio Further readig HPC Sprig

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 5: Pipeliig Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab1 Due toight Lab2: out later today; due 2 weeks from ow Review sessio this Friday Turig award

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L24 VM II (1) ist.eecs.berkele.edu/~cs61c/su5 CS61C : Machie Structures Lecture #24: VM II Address Mappig: Virtual Address: VPN offset 25-8-2 Ad Carle idex ito page table located i phsical memor

More information

Operating System Concepts. Operating System Concepts

Operating System Concepts. Operating System Concepts Chapter 4: Mass-Storage Systems Logical Disk Structure Logical Disk Structure Disk Schedulig Disk Maagemet RAID Structure Disk drives are addressed as large -dimesioal arrays of logical blocks, where the

More information

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Review Istructio Set Architecture Istructio Set The repertoire of istructios of a computer Differet computers have differet istructio

More information

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yajig Li Departmet of Computer Sciece Uiversity of Chicago Admiistrative Stuff Lab1 out toight Due Thursday (10/18) Lab1 review sessio Tomorrow, 10/05,

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 16: Out-of-Order Executio Prof. Our Mutlu ETH Zurich Sprig 2018 26 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures Multi-cycle ad Microprogrammed

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG

More information

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The

More information

Arquitectura de Computadores

Arquitectura de Computadores Arquitectura de Computadores Capítulo 2. Procesadores segmetados Based o the origial material of the book: D.A. Patterso y J.L. Heessy Computer Orgaizatio ad Desig: The Hardware/Software Iterface 4 th

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

Chapter 4 The Datapath

Chapter 4 The Datapath The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 4 The Processor Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler CPI ad Cycle time Determied

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information

Cache-Optimal Methods for Bit-Reversals

Cache-Optimal Methods for Bit-Reversals Proceedigs of the ACM/IEEE Supercomputig Coferece, November 1999, Portlad, Orego, U.S.A. Cache-Optimal Methods for Bit-Reversals Zhao Zhag ad Xiaodog Zhag Departmet of Computer Sciece College of William

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

Lecture 1: Introduction and Fundamental Concepts 1

Lecture 1: Introduction and Fundamental Concepts 1 Uderstadig Performace Lecture : Fudametal Cocepts ad Performace Aalysis CENG 332 Algorithm Determies umber of operatios executed Programmig laguage, compiler, architecture Determie umber of machie istructios

More information

Chapter 3. Floating Point Arithmetic

Chapter 3. Floating Point Arithmetic COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 3 Floatig Poit Arithmetic Review - Multiplicatio 0 1 1 0 = 6 multiplicad 32-bit ALU shift product right multiplier add

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

E0-243: Computer Architecture

E0-243: Computer Architecture E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Executio Prof. Our Mutlu ETH Zurich Sprig 2018 27 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 15: Multi-Core Prof. Yajig Li Uiversity of Chicago Course Evaluatio Very importat Please fill out! 2 Lab3 Brach Predictio Competitio 8 teams etered the competitio,

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Algorithm. Counting Sort Analysis of Algorithms

Algorithm. Counting Sort Analysis of Algorithms Algorithm Coutig Sort Aalysis of Algorithms Assumptios: records Coutig sort Each record cotais keys ad data All keys are i the rage of 1 to k Space The usorted list is stored i A, the sorted list will

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

A collection of open-sourced RISC-V processors

A collection of open-sourced RISC-V processors Riscy Processors A collectio of ope-sourced RISC-V processors Ady Wright, Sizhuo Zhag, Thomas Bourgeat, Murali Vijayaraghava, Jamey Hicks, Arvid Computatio Structures Group, CSAIL, MIT 4 th RISC-V Workshop

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

Chapter 8. Strings and Vectors. Copyright 2014 Pearson Addison-Wesley. All rights reserved.

Chapter 8. Strings and Vectors. Copyright 2014 Pearson Addison-Wesley. All rights reserved. Chapter 8 Strigs ad Vectors Overview 8.1 A Array Type for Strigs 8.2 The Stadard strig Class 8.3 Vectors Slide 8-3 8.1 A Array Type for Strigs A Array Type for Strigs C-strigs ca be used to represet strigs

More information

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015. Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative

More information

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units Desig of Digital Circuits Lecture 21: SIMD Processors II ad Graphics Processig Uits Dr. Jua Gómez Lua Prof. Our Mutlu ETH Zurich Sprig 2018 17 May 2018 New Course: Bachelor s Semiar i Comp Arch Fall 2018

More information

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 3: ISA ad Itroductio to Microarchitecture Prof. Yajig Li Uiversity of Chicago Lecture Outlie ISA uarch (hardware implemetatio of a ISA) Logic desig basics Sigle-cycle

More information

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea 5-1 Chapter 5 Processor Desig Advaced Topics Chapter 5: Processor Desig Advaced Topics Topics 5.3 Microprogrammig Cotrol store ad microbrachig Horizotal ad vertical microprogrammig 5- Chapter 5 Processor

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

COP4020 Programming Languages. Compilers and Interpreters Prof. Robert van Engelen

COP4020 Programming Languages. Compilers and Interpreters Prof. Robert van Engelen COP4020 mig Laguages Compilers ad Iterpreters Prof. Robert va Egele Overview Commo compiler ad iterpreter cofiguratios Virtual machies Itegrated developmet eviromets Compiler phases Lexical aalysis Sytax

More information

EECC551 Exam Review 4 questions out of 6 questions

EECC551 Exam Review 4 questions out of 6 questions EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1 CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC

More information

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff Computer rchitecture Microcomputer rchitecture ad Iterfacig Colorado School of Mies Professor William Hoff Computer Hardware Orgaizatio Processor Performs all computatios; coordiates data trasfer Iput

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Chapter 8. Strings and Vectors. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 8. Strings and Vectors. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 8 Strigs ad Vectors Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 8.1 A Array Type for Strigs 8.2 The Stadard strig Class 8.3 Vectors Copyright 2015 Pearso Educatio, Ltd..

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 4 Procedural Abstractio ad Fuctios That Retur a Value Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 4.1 Top-Dow Desig 4.2 Predefied Fuctios 4.3 Programmer-Defied Fuctios 4.4

More information

Examples and Applications of Binary Search

Examples and Applications of Binary Search Toy Gog ITEE Uiersity of Queeslad I the secod lecture last week we studied the biary search algorithm that soles the problem of determiig if a particular alue appears i a sorted list of iteger or ot. We

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

ECE5917 SoC Architecture: MP SoC Part 1. Tae Hee Han: Semiconductor Systems Engineering Sungkyunkwan University

ECE5917 SoC Architecture: MP SoC Part 1. Tae Hee Han: Semiconductor Systems Engineering Sungkyunkwan University ECE5917 SoC Architecture: MP SoC Part 1 Tae Hee Ha: tha@skku.edu Semicoductor Systems Egieerig Sugkyukwa Uiversity Outlie Overview Parallelism Data-Level Parallelism Istructio-Level Parallelism Thread-Level

More information

Lecture 9: Multiple Issue (Superscalar and VLIW)

Lecture 9: Multiple Issue (Superscalar and VLIW) Lecture 9: Multiple Issue (Superscalar and VLIW) Iakovos Mavroidis Computer Science Department University of Crete Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018 Fudametals of Chapter 1 Microprocessor ad Microcotroller Dr. Farid Farahmad Updated: Tuesday, Jauary 16, 2018 Evolutio First came trasistors Itegrated circuits SSI (Small-Scale Itegratio) to ULSI Very

More information

Pipelining Exercises, Continued

Pipelining Exercises, Continued Pipelining Exercises, Continued. Spot all data dependencies (including ones that do not lead to stalls). Draw arrows from the stages where data is made available, directed to where it is needed. Circle

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Chapter 2. C++ Basics. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 2. C++ Basics. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 2 C++ Basics Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 2.1 Variables ad Assigmets 2.2 Iput ad Output 2.3 Data Types ad Expressios 2.4 Simple Flow of Cotrol 2.5 Program

More information

Outline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers

Outline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers Outlie CSCI 4730 s! What is a s?!! System Compoet Architecture s Overview Questios What is a?! What are the major operatig system compoets?! What are basic computer system orgaizatios?! How do you commuicate

More information