CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago

1 CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li, University of Chicago

2 Course Evaluation
- Very important. Please fill out!

3 Lab3 Branch Prediction Competition
- 8 teams entered the competition; extra credit given to all
- Evaluated based on correctness, performance gain, and writeup quality
- Ross Rauber and Oliver Tsag: 39.32% improvement
- Zaye Khouja and Aviash Rao: 32.72% improvement
- Owe Frazier and Jaseph Maues: 31.97% improvement

4 Lecture Outline
- Multi-core continued

5 Topics in Parallel Computer Architecture
- Cache coherence: ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations. What should the programmer expect the hardware to provide?
- Shared memory synchronization: instructions to perform atomic operations (e.g., for locks)

6 Cache Coherence

7 The VI (Valid/Invalid) Protocol
- Write-through, no-write-allocate cache
- Actions of the local processor on the cache block: PrRd, PrWr
- Actions on the bus to communicate to memory and other processors: BusRd, BusWr
- State transitions, labeled ObservedEvent / Action:
  - Invalid → Valid: PrRd / BusRd
  - Valid → Valid: PrRd / --
  - Valid → Valid: PrWr / BusWr
  - Invalid → Invalid: PrWr / BusWr
  - Valid → Invalid: BusWr
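The VI transitions can be written down as a small lookup table. The following is a minimal sketch, not the course's lab code: the names `VI` and `step` are hypothetical, and the mapping of each label to a state follows the usual reading of the VI diagram for a write-through, no-write-allocate cache.

```python
# Hypothetical transition table for the VI protocol.
# Key: (current state, observed event) -> (next state, bus action or None)
VI = {
    ("I", "PrRd"): ("V", "BusRd"),   # read miss: fetch the block over the bus
    ("I", "PrWr"): ("I", "BusWr"),   # no-write-allocate: write goes to memory only
    ("V", "PrRd"): ("V", None),      # read hit: no bus traffic
    ("V", "PrWr"): ("V", "BusWr"),   # write-through: update cache and memory
    ("V", "BusWr"): ("I", None),     # another processor wrote: invalidate our copy
}

def step(state, event):
    """Return (next_state, bus_action); events not listed leave the state unchanged."""
    return VI.get((state, event), (state, None))
```

For example, `step("V", "BusWr")` moves a valid block to invalid without generating any bus action of its own.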

8 A More Sophisticated Protocol: MSI
- Used with writeback caches
- Extend metadata per block to encode three states:
  - M(odified): cache line is the only cached copy and is dirty
  - S(hared): cache line is potentially one of several cached copies
  - I(nvalid): cache line is not present in this cache

9 MSI State Machine
- Write-back, write-allocate cache
- [State diagram: transitions labeled ObservedEvent / Action; annotations mark upgrades and bus-initiated downgrades]
- Abbreviations and actions:
  - PrRd: processor read
  - PrWr: processor write
  - BusRd: bus read
  - BusRdX: bus read exclusive (read with intent to modify; must invalidate all other cache copies)
  - Flush: puts dirty data on the bus to update memory and supply data to other processors

10 MSI Protocol Walkthrough
1. If the cache block is modified
   a. PrRd or PrWr: this is a cache hit. Just return the value or update the cache value. No need to go to memory or talk to other processors, and the block remains modified.

11 MSI Protocol Walkthrough
1. If the cache block is modified
   b. BusRd: others wish to read the block; put dirty data on the bus; the block is downgraded to shared.

12 MSI Protocol Walkthrough
1. If the cache block is modified
   c. BusRdX: others wish to write to the block; put dirty data on the bus; the block is downgraded to invalid.

13 MSI Protocol Walkthrough
2. If the cache block is shared
   a. PrRd: cache hit. BusRd: others are just reading the data. In both cases, nothing needs to be done.

14 MSI Protocol Walkthrough
2. If the cache block is shared
   b. PrWr: we wish to write but other cores are sharing this block, so generate a BusRdX operation to invalidate other copies; the block is upgraded to modified.

15 MSI Protocol Walkthrough
2. If the cache block is shared
   c. BusRdX: another core wants to write to the block and must invalidate our copy; the block is downgraded to invalid.

16 MSI Protocol Walkthrough
3. If the cache block is invalid
   a. PrRd: cache miss and we just want to read. Generate a BusRd operation to get data (from memory or another core). The block is upgraded to shared.

17 MSI Protocol Walkthrough
3. If the cache block is invalid
   b. PrWr: cache miss and we want to write. Generate a BusRdX operation to get data (from memory or another core) and invalidate other copies. The block is upgraded to modified.
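The walkthrough cases above can be collected into one transition table and exercised on a toy system. This is a sketch under stated assumptions: `MSI` and `access` are hypothetical names, the model tracks a single block, and data movement (memory contents, flushed values) is not modeled, only states and bus requests.

```python
# Hypothetical MSI table for one block.
# Key: (state, observed event) -> (next state, action placed on the bus or None)
MSI = {
    ("M", "PrRd"):   ("M", None),      # 1a: hit
    ("M", "PrWr"):   ("M", None),      # 1a: hit
    ("M", "BusRd"):  ("S", "Flush"),   # 1b: supply dirty data, downgrade
    ("M", "BusRdX"): ("I", "Flush"),   # 1c: supply dirty data, invalidate
    ("S", "PrRd"):   ("S", None),      # 2a: hit
    ("S", "BusRd"):  ("S", None),      # 2a: others are only reading
    ("S", "PrWr"):   ("M", "BusRdX"),  # 2b: invalidate other copies, upgrade
    ("S", "BusRdX"): ("I", None),      # 2c: another core writes
    ("I", "PrRd"):   ("S", "BusRd"),   # 3a: read miss
    ("I", "PrWr"):   ("M", "BusRdX"),  # 3b: write miss
}

def access(states, cpu, event):
    """Apply a processor event on `cpu` and let every other cache snoop the bus request.

    `states` is a list with one MSI state per cache; it is updated in place.
    Returns the bus request issued by `cpu`, or None on a hit.
    """
    states[cpu], bus = MSI[(states[cpu], event)]
    if bus in ("BusRd", "BusRdX"):
        for other in range(len(states)):
            if other != cpu:
                # Events with no table entry (e.g. a snoop in state I) change nothing
                states[other], _ = MSI.get((states[other], bus), (states[other], None))
    return bus
```

Starting from `["I", "I"]`, a read followed by a write on cache 0 issues two bus requests (BusRd, then BusRdX) even though no other cache ever held the block.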

18 The Problem with MSI
- A block is in no cache to begin with
- Problem: on a read, the block immediately goes to the Shared state although it may be the only copy to be cached (i.e., no other processor will cache it)
- Why is this a problem?
  - Suppose the cache that read the block wants to write to it at some point
  - It needs to broadcast an invalidate even though it has the only cached copy!
  - If the cache knew it had the only cached copy in the system, it could have written to the block without notifying any other cache → saves unnecessary broadcasts of invalidations

19 The Solution: MESI
- Idea: add another state indicating that this is the only cached copy and it is clean: the Exclusive state
- A block is placed into the Exclusive state if, during BusRd, no other cache had it
  - Requires a shared signal to detect if other caches have a copy of the block; caches assert the signal if they have a copy
- Silent transition Exclusive → Modified is possible on write!
- MESI is also called the Illinois protocol
  - Papamarcos and Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA

20 MESI State Machine
- PrRd and cache miss: depending on whether other caches have a copy, transition from I to S or E
- E to M occurs if PrWr is observed
- E to S occurs if BusRd is observed
- E to I occurs if BusRdX is observed
[Culler, David 97]
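A toy model makes the benefit of the Exclusive state concrete: a read followed by a write to a block that nobody else caches needs one bus request instead of two. The sketch below is illustrative only; `mesi_access` is a hypothetical name, a single block is tracked, and the shared signal is modeled as a simple check of the other caches' states.

```python
def mesi_access(states, cpu, event):
    """Apply PrRd or PrWr on `cpu` for one block; `states` holds one MESI state per cache.

    Updates `states` in place and returns the bus request issued (or None).
    """
    others = [i for i in range(len(states)) if i != cpu]
    bus = None
    if event == "PrRd":
        if states[cpu] == "I":
            bus = "BusRd"
            # Shared signal: asserted if any other cache holds a copy
            shared = any(states[i] != "I" for i in others)
            for i in others:
                if states[i] in ("M", "E"):
                    states[i] = "S"          # observed BusRd: M/E downgrade to S
            states[cpu] = "S" if shared else "E"
    elif event == "PrWr":
        if states[cpu] in ("I", "S"):
            bus = "BusRdX"                   # must invalidate other copies
            for i in others:
                states[i] = "I"
        states[cpu] = "M"                    # E -> M is silent: no bus request
    return bus
```

With `states = ["I", "I"]`, a `PrRd` on cache 0 returns `"BusRd"` and leaves it in E; the following `PrWr` returns `None`, which is the saved invalidation broadcast.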

21 Even More Sophisticated Cache Coherence Protocols?
- The protocol can be optimized with more states and prediction mechanisms to
  + Reduce unnecessary invalidates and transfers of blocks
- However, more states and optimizations
  -- Are more difficult to design and verify (lead to more cases to take care of, race conditions)
  -- Provide diminishing returns

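22 False Sharing
- Cache block/line: word0 word1 word2 word3
- P1: ld word0; st word0; ld word0; st word0
- P2: ld word3; st word3; ld word3; st word3
- P1 and P2 never touch the same word, but both words sit in the same cache block, so each store invalidates the other processor's copy of the block.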

23 Quick Tip to Avoid False Sharing
- DO
  - Map variables written by different processors onto different cache blocks
  - Group variables written by the same processor into the same cache block
- DON'T
  - Group variables written by different processors into the same cache block

24 Which Is Better?

Option 1:
    int sum[NUM_PROCS];
    int product[NUM_PROCS];
    sum[myNum]++;
    product[myNum] *= 2;

Option 2:
    typedef struct { int sum; int product; } Proc;
    Proc x[NUM_PROCS];
    x[myNum].sum++;
    x[myNum].product *= 2;
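One way to compare the two layouts is to count how many different processors write into each cache line. The sketch below uses Python's `ctypes` only to compute C struct sizes and offsets. Everything concrete in it is an assumption for illustration: 64-byte lines, 4-byte `int`, 32 processors, arrays that start on a line boundary, and a third padded layout (`PaddedProc`) that the slide does not show.

```python
import ctypes

LINE = 64        # assumed cache-line size in bytes
NUM_PROCS = 32   # assumed processor count
INT = ctypes.sizeof(ctypes.c_int)

class Proc(ctypes.Structure):
    _fields_ = [("sum", ctypes.c_int), ("product", ctypes.c_int)]

class PaddedProc(ctypes.Structure):
    # Hypothetical variant: pad each element out to a full cache line
    _fields_ = [("sum", ctypes.c_int), ("product", ctypes.c_int),
                ("pad", ctypes.c_char * (LINE - 2 * INT))]

def max_writers_per_line(per_proc_offsets):
    """Largest number of distinct processors writing into any single cache line."""
    writers = {}
    for proc, offsets in enumerate(per_proc_offsets):
        for off in offsets:
            writers.setdefault(off // LINE, set()).add(proc)
    return max(len(procs) for procs in writers.values())

def struct_offsets(struct):
    size = ctypes.sizeof(struct)
    return [[p * size + struct.sum.offset, p * size + struct.product.offset]
            for p in range(NUM_PROCS)]

# Option 1: two separate arrays laid out back to back
separate_arrays = [[p * INT, NUM_PROCS * INT + p * INT] for p in range(NUM_PROCS)]
# Option 2: array of structs
array_of_structs = struct_offsets(Proc)
# Assumed extra variant: one cache line per processor
padded_structs = struct_offsets(PaddedProc)
```

Under these assumptions the separate arrays put 16 writers on each line, the array of structs puts 8, and the padded variant puts 1. The struct layout follows the "group variables written by the same processor" tip, and padding is what fully separates processors.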

25 Takeaway
- Cache coherence is critical for ensuring correctness
  - Software-managed cache coherence is very difficult
  - Hardware coherence protocols help programmers write correct and high-performance programs
- Snooping cache protocols
  - VI
  - MSI
  - MESI (lab5)
  - MOESI (common in practice)
- Directory-based cache coherence
  - More scalable

26 Topics in Parallel Computer Architecture
- Cache coherence: ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations. What should the programmer expect the hardware to provide?
- Shared memory synchronization: instructions to perform atomic operations (e.g., for locks)

27 Memory Consistency

28 Motivational Example
- Dekker's algorithm for critical sections [Adve WRL Research Report 95]
- Can the two processors be in the critical section at the same time, given that they both obey the von Neumann model?

29 Motivational Example
- Intuition: assume P1 is in the critical section, which means Flag2 must be 0, which means P2 cannot have executed Flag2 = 1, which means P2 cannot be in the critical section.
[Adve WRL Research Report 95]

30 Both Processors in Critical Section!
- Consider a store buffer (aka write buffer)
  - Remember this from OoO?
  - Can also be used with in-order execution!
- [Figure: processor and cache; loads go to the cache, stores pass through the store buffer (with load bypassing)]

31 Both Processors in Critical Section!
- Cycle 1 (A): value written into P1's store buffer. P1 thinks A is executed, but memory is not updated until cycle 51
- Cycle 1 (X): value written into P2's store buffer. P2 thinks X is executed, but memory is not updated until cycle 52
- Cycle 2 (B): P1 still sees 0 in Flag2, so it enters the critical section
- Cycle 2 (Y): P2 still sees 0 in Flag1, so it enters the critical section
[Adve WRL Research Report 95]
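The cycle-by-cycle story can be replayed with a tiny simulation. This is a sketch with assumed names (`simulate`, `pending`), and it assumes A is the store `Flag1 = 1` by P1 and X is the store `Flag2 = 1` by P2, with B and Y the loads of the other processor's flag, as the surrounding slides describe.

```python
def simulate():
    """Replay the store-buffer timeline: stores issue in cycle 1, reach memory much later."""
    memory = {"Flag1": 0, "Flag2": 0}
    # (cycle at which the buffered store becomes visible in memory, address, value)
    pending = [
        (51, "Flag1", 1),   # A: issued by P1 in cycle 1, sits in P1's store buffer
        (52, "Flag2", 1),   # X: issued by P2 in cycle 1, sits in P2's store buffer
    ]
    observed = {}
    for cycle in range(1, 53):
        for when, addr, value in pending:
            if when == cycle:
                memory[addr] = value          # store buffer drains to memory
        if cycle == 2:
            observed["P1 reads Flag2"] = memory["Flag2"]   # B
            observed["P2 reads Flag1"] = memory["Flag1"]   # Y
    return observed, memory
```

Both loads in cycle 2 return 0, so both processors enter the critical section, even though memory eventually holds 1 in both flags.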

32 Both Processors in Critical Section!
What happened?
  P1's view of memory operations    P2's view of memory operations
  A (cycle 1)                       X (cycle 1)
  B (cycle 2)                       Y (cycle 2)
  X (cycle 51)                      A (cycle 52)
  A appeared to happen before X     X appeared to happen before A

33 The Problem
- The two processors did NOT see the same order of operations to memory
- The "happened before" relationship between multiple updates to memory was inconsistent between the two processors' points of view
- As a result, each processor thought the other was not in the critical section

34 How Can We Solve The Problem?
- Idea: sequential consistency
  I. All processors see the same order of operations to memory, i.e., all memory operations happen in an order (called the global total order) that is consistent across all processors
  II. Within this global order, each processor's operations appear in sequential order with respect to its own operations.
- Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers

35 Another Way of Interpreting SC
- The whole system (all processors and memory) sees the same order of all four memory operation combinations performed by any processor:
  - Load → load
  - Load → store
  - Store → store
  - Store → load

36 Sequentially Consistent Operation Orders
- Potential correct global orders (all are correct):
  - A B X Y
  - A X B Y
  - A X Y B
  - X A B Y
  - X A Y B
  - X Y A B
- Which order (interleaving) is observed depends on implementation and dynamic latencies
[Adve WRL Research Report 95]
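The sequentially consistent orders are exactly the interleavings of A, B, X, Y that keep each processor's program order (A before B, X before Y). A short enumeration, sketched below with hypothetical helper names, lists them and checks that none lets both processors into the critical section. It assumes A is `Flag1 = 1`, B is P1's test of `Flag2 == 0`, X is `Flag2 = 1`, and Y is P2's test of `Flag1 == 0`.

```python
from itertools import permutations

def sc_orders():
    """All global orders of A, B, X, Y that preserve A-before-B and X-before-Y."""
    return ["".join(p) for p in permutations("ABXY")
            if p.index("A") < p.index("B") and p.index("X") < p.index("Y")]

def who_enters(order):
    """Execute the four operations in the given global order against one shared memory."""
    flag1 = flag2 = 0
    entered = set()
    for op in order:
        if op == "A":
            flag1 = 1                      # P1: Flag1 = 1
        elif op == "X":
            flag2 = 1                      # P2: Flag2 = 1
        elif op == "B":
            if flag2 == 0:                 # P1: if (Flag2 == 0) enter
                entered.add("P1")
        elif op == "Y":
            if flag1 == 0:                 # P2: if (Flag1 == 0) enter
                entered.add("P2")
    return entered
```

Every order returned by `sc_orders()` admits at most one processor, which is the guarantee the store-buffer execution broke.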

37 Issues with Sequential Consistency (SC)?
- Nice abstraction for programming, intuitive
- Two issues
  - Ordering requirements are too conservative
  - Limits the aggressiveness of performance enhancement techniques, e.g., can't use a store buffer

38 Total Store Order (TSO)
- Remember, for sequential consistency, the whole system (all processors and memory) sees the same order of all four memory operation combinations performed by any processor
  - Load → load, load → store, store → store, store → load
- TSO relaxes the store → load ordering requirement
  - Major benefit: a FIFO-based store buffer can be used
- Modern ISAs that use the TSO model
  - SPARC
  - Also similar to x86

39 Total Store Order (TSO) Example
- TSO allows both P1 and P2 to be in the critical section
  - P2 is allowed to see B (load) before A (store)
  - P1 is allowed to see Y (load) before X (store)
- How should a programmer fix Dekker's algorithm?
[Adve WRL Research Report 95]

40 Memory Fence
- All memory operations before a fence must complete and be visible to other processors before the fence is executed
- All memory operations after the fence must wait for the fence to complete
- Fences complete in program order
[Adve WRL Research Report 95]
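A fence between the store and the load of each processor restores the store → load order that TSO relaxes. The sketch below models a fence as "drain this core's FIFO store buffer to memory"; that modeling choice, and the names `dekker_on_tso`, `store`, `load`, and `fence`, are assumptions made for illustration rather than a description of any particular ISA's fence instruction.

```python
def dekker_on_tso(use_fence):
    """Run A, X, then B, Y with per-core FIFO store buffers; return (P1 enters, P2 enters)."""
    memory = {"Flag1": 0, "Flag2": 0}
    store_buffer = {"P1": [], "P2": []}

    def store(core, addr, value):
        store_buffer[core].append((addr, value))      # not yet visible to the other core

    def load(core, addr):
        for a, v in reversed(store_buffer[core]):     # forward from own youngest store
            if a == addr:
                return v
        return memory[addr]

    def fence(core):
        for addr, value in store_buffer[core]:        # drain in FIFO order
            memory[addr] = value
        store_buffer[core].clear()

    store("P1", "Flag1", 1)                # A
    store("P2", "Flag2", 1)                # X
    if use_fence:
        fence("P1")                        # fence between A and B
        fence("P2")                        # fence between X and Y
    p1_enters = load("P1", "Flag2") == 0   # B
    p2_enters = load("P2", "Flag1") == 0   # Y
    return p1_enters, p2_enters
```

Without fences both processors enter; with fences, in this particular interleaving, neither enters, so mutual exclusion holds.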

41 The General Problem of Memory Ordering
- A contract between software and hardware, specified by the ISA
  - The ISA specifies what programmers can assume about memory ordering, e.g., whether sequential consistency (or another memory consistency model) is provided
- Preserving an intuitive model (e.g., sequential consistency) simplifies the programmer's life
  - But makes the hardware designer's life difficult (limits the performance optimizations that can be used)
- Another example of the programmer-microarchitect tradeoff

42 Topics in Parallel Computer Architecture
- Cache coherence: ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations. What should the programmer expect the hardware to provide?
- Shared memory synchronization: instructions to perform atomic operations (e.g., for locks)

43 Synchronization

44 Race Condition
- Unpredictable results, called race conditions, can happen if we don't control access to shared variables
  - A concurrency problem; can occur in single processors also
- E.g., x++ from multiple threads. Assume x is initialized to 0. What is the value of x after the following execution?

  CPU 1             CPU 2
  Ld r1, x          Ld r1, x
  Add r1, r1, 1     Add r1, r1, 1
  St r1, x          St r1, x
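The question on this slide can be answered by brute force: try every interleaving of the two three-instruction sequences and collect the final values of x. The sketch below does that with a hypothetical helper `final_values`; each CPU has its own private register r1, and every instruction is assumed to execute atomically.

```python
from itertools import combinations

def final_values():
    """Final values of x over all interleavings of two Ld/Add/St sequences, starting at x = 0."""
    results = set()
    # Choose which 3 of the 6 time slots run CPU 1's instructions; CPU 2 gets the rest
    for cpu1_slots in combinations(range(6), 3):
        x = 0
        r1 = {1: 0, 2: 0}       # private register r1 of each CPU
        pc = {1: 0, 2: 0}       # next instruction index for each CPU
        for slot in range(6):
            cpu = 1 if slot in cpu1_slots else 2
            if pc[cpu] == 0:
                r1[cpu] = x      # Ld r1, x
            elif pc[cpu] == 1:
                r1[cpu] += 1     # Add r1, r1, 1
            else:
                x = r1[cpu]      # St r1, x
            pc[cpu] += 1
        results.add(x)
    return results
```

The result is `{1, 2}`: when both loads happen before either store, one increment is lost.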

45 Coordinating Access to Shared Data
- Locks: a simple primitive to ensure updates to single variables occur within a critical section
  - Many variations (spinlocks, semaphores, ...)
- CPU 1: LOCK x; Ld r1, x; Add r1, r1, 1; St r1, x; UNLOCK x
- CPU 2: LOCK x; wait; wait; lock acquired; Ld r1, x; Add r1, r1, 1

46 Locks / Critical Sections
- Enforce mutually exclusive access to shared data
- Only one thread can be executing it at a time
- Contended critical sections make threads wait → threads causing serialization can be on the critical path
- Each thread:
    loop {
      Compute
      lock(A)
      Update shared data
      unlock(A)
    }
- [Figure: per-thread timeline with segments labeled N and C]

47 How NOT To Implement Locks
- Lock: while (lock_var == 1); lock_var = 1;
- Unlock: lock_var = 0;
- What's the problem?
  - Testing if lock_var is 1 and setting it to 1 are not atomic, i.e., another processor can set lock_var to 1 in between → multiple processors acquire the lock!
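The failing interleaving can be reproduced deterministically. In the sketch below, a Python generator stands in for a processor running the broken lock code, and each `yield` marks a point where the other processor may run; `broken_lock` and `interleave` are hypothetical names used only for this illustration.

```python
def broken_lock(shared):
    """The non-atomic lock from the slide, with explicit interruption points."""
    while shared["lock_var"] == 1:      # test
        yield "spinning"
    yield "saw lock_var == 0"           # another processor may run between test and set
    shared["lock_var"] = 1              # set
    yield "acquired"

def interleave():
    """Run two processors so that both test lock_var before either sets it."""
    shared = {"lock_var": 0}
    p1, p2 = broken_lock(shared), broken_lock(shared)
    return [next(p1), next(p2), next(p1), next(p2)]
```

Both generators report "acquired", so two processors hold the lock at once.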

48 Atomic Read & Write Instructions
- Aka read-modify-write
- Specify a memory location and a register
  I. The value in the memory location is read into a register
  II. Another value is stored into the location
- Many variants, based on what values are allowed in II
- Simple example: test&set
  - Read the memory location into the specified register
  - Store constant 1 into the location

49 Using Test&Set to Implement a Lock
Initialize location to 0

  lock:   t&s register, location   //atomic read-modify-write
          bnz lock                 //if not 0, try again
          ret                      //locked; value in location is 1

  unlock: st location, #0          //write 0 to location
          ret
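The same lock can be sketched in a high-level language. Python has no test&set instruction, so in the sketch below a `threading.Lock` inside `TestAndSetCell` stands in for the hardware's atomicity guarantee; the class and function names are hypothetical, and the `time.sleep(0)` in the spin loop is only there to let other Python threads run.

```python
import threading
import time

class TestAndSetCell:
    """A memory location with an atomic test&set (atomicity emulated with a threading.Lock)."""
    def __init__(self):
        self._value = 0                 # location initialized to 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        with self._atomic:              # read old value and store 1 as one indivisible step
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._atomic:
            self._value = 0             # st location, #0

class SpinLock:
    def __init__(self):
        self._cell = TestAndSetCell()

    def lock(self):
        while self._cell.test_and_set() != 0:   # if not 0, try again
            time.sleep(0)                        # yield so the lock holder can make progress

    def unlock(self):
        self._cell.clear()

def run(num_threads=4, iterations=500):
    """Increment a shared counter under the spinlock and return its final value."""
    lock = SpinLock()
    counter = {"x": 0}

    def worker():
        for _ in range(iterations):
            lock.lock()
            counter["x"] += 1           # critical section
            lock.unlock()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["x"]
```

Because only the thread that reads 0 from `test_and_set` proceeds, every increment happens inside the critical section and `run()` returns `num_threads * iterations`.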

50 Many Others
- Other read-modify-write primitives
  - Swap
  - Compare&swap
- More fancy implementations to avoid spinning, reduce memory traffic, promote fairness, etc.
- All details are defined in the ISA

51 Course Summary
- ISA
- Uarch
  - Datapath, control
  - Single cycle, multi cycle
  - Pipelining: basic, dependency handling, branch prediction
  - Advanced uarch: OOO, SIMD, VLIW, superscalar
- Caches (advanced)
- Virtual memory
- DRAM
- Multi-core
ALL DONE!


More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Definitions. Error. A wrong decision made during software development

Definitions. Error. A wrong decision made during software development Debuggig Defiitios Error A wrog decisio made durig software developmet Defiitios 2 Error A wrog decisio made durig software developmet Defect bug sometimes meas this The term Fault is also used Property

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods.

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods. Software developmet of compoets for complex sigal aalysis o the example of adaptive recursive estimatio methods. SIMON BOYMANN, RALPH MASCHOTTA, SILKE LEHMANN, DUNJA STEUER Istitute of Biomedical Egieerig

More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information

Lecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Lecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012) Lecture 11: Snooping Cache Coherence: Part II CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Assignment 2 due tonight 11:59 PM - Recall 3-late day policy Assignment

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by

More information

Lecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 11: Cache Coherence: Part II Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Bang Bang (My Baby Shot Me Down) Nancy Sinatra (Kill Bill Volume 1 Soundtrack) It

More information

Processor Architecture

Processor Architecture Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their

More information

A collection of open-sourced RISC-V processors

A collection of open-sourced RISC-V processors Riscy Processors A collectio of ope-sourced RISC-V processors Ady Wright, Sizhuo Zhag, Thomas Bourgeat, Murali Vijayaraghava, Jamey Hicks, Arvid Computatio Structures Group, CSAIL, MIT 4 th RISC-V Workshop

More information

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed

More information

A Basic Snooping-Based Multi-Processor Implementation

A Basic Snooping-Based Multi-Processor Implementation Lecture 11: A Basic Snooping-Based Multi-Processor Implementation Parallel Computer Architecture and Programming Tsinghua has its own ice cream! Wow! CMU / 清华 大学, Summer 2017 Review: MSI state transition

More information

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 4 Procedural Abstractio ad Fuctios That Retur a Value Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 4.1 Top-Dow Desig 4.2 Predefied Fuctios 4.3 Programmer-Defied Fuctios 4.4

More information

CS 11 C track: lecture 1

CS 11 C track: lecture 1 CS 11 C track: lecture 1 Prelimiaries Need a CMS cluster accout http://acctreq.cms.caltech.edu/cgi-bi/request.cgi Need to kow UNIX IMSS tutorial liked from track home page Track home page: http://courses.cms.caltech.edu/courses/cs11/material

More information

Bluespec-3: Modules & Interfaces. Bluespec: State and Rules organized into modules

Bluespec-3: Modules & Interfaces. Bluespec: State and Rules organized into modules Bluespec-3: Modules & Iterfaces Arvid Computer Sciece & Artificial Itelligece Lab Massachusetts Istitute of Techology Based o material prepared by Bluespec Ic, Jauary 2005 February 28, 2005 L09-1 Bluespec:

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Advanced OpenMP. Lecture 3: Cache Coherency

Advanced OpenMP. Lecture 3: Cache Coherency Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable

More information

Computer Architecture

Computer Architecture Computer Architecture Overview Prof. Tie-Fu Che Dept. of Computer Sciece Natioal Chug Cheg Uiv Sprig 2002 Overview- Computer Architecture Course Focus Uderstadig the desig techiques, machie structures,

More information

Uniprocessors. HPC Prof. Robert van Engelen

Uniprocessors. HPC Prof. Robert van Engelen Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures

More information

n Haskell n Syntax n Lazy evaluation n Static typing and type inference n Algebraic data types n Pattern matching n Type classes

n Haskell n Syntax n Lazy evaluation n Static typing and type inference n Algebraic data types n Pattern matching n Type classes Aoucemets Quiz 7 HW 9 is due o Friday Raibow grades HW 1-6 plus 8. Please, read our commets o 8! Exam 1-2 Quiz 1-6 Ay questios/cocers, let us kow ASAP Last Class Haskell Sytax Lazy evaluatio Static typig

More information

Multicore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh

Multicore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth

More information

The MESI State Transition Graph

The MESI State Transition Graph Small-scale shared memory multiprocessors Semantics of the shared address space model (Ch. 5.3-5.5) Design of the M(O)ESI snoopy protocol Design of the Dragon snoopy protocol Performance issues Synchronization

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

Goals of the Lecture UML Implementation Diagrams

Goals of the Lecture UML Implementation Diagrams Goals of the Lecture UML Implemetatio Diagrams Object-Orieted Aalysis ad Desig - Fall 1998 Preset UML Diagrams useful for implemetatio Provide examples Next Lecture Ð A variety of topics o mappig from

More information

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu UH-MEM: Utility-Based Hybrid Memory Maagemet Yag Li, Saugata Ghose, Jogmoo Choi, Ji Su, Hui Wag, Our Mutlu 1 Executive Summary DRAM faces sigificat techology scalig difficulties Emergig memory techologies

More information

Chapter 3. More Flow of Control. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 3. More Flow of Control. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 3 More Flow of Cotrol Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 3.1 Usig Boolea Expressios 3.2 Multiway Braches 3.3 More about C++ Loop Statemets 3.4 Desigig Loops Copyright

More information

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition Lecture Goals Itroductio to Computig Systems: From Bits ad Gates to C ad Beyod 2 d Editio Yale N. Patt Sajay J. Patel Origial slides from Gregory Byrd, North Carolia State Uiversity Modified slides by

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

Cluster Computing Spring 2004 Paul A. Farrell

Cluster Computing Spring 2004 Paul A. Farrell Cluster Computig Sprig 004 3/18/004 Parallel Programmig Overview Task Parallelism OS support for task parallelism Parameter Studies Domai Decompositio Sequece Matchig Work Assigmet Static schedulig Divide

More information

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache

More information

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1 Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)

More information

Snooping-Based Cache Coherence

Snooping-Based Cache Coherence Lecture 10: Snooping-Based Cache Coherence Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Tunes Elle King Ex s & Oh s (Love Stuff) Once word about my code profiling skills

More information

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017 Computer Architecture Lecture 8: SIMD Processors ad GPUs Prof. Our Mutlu ETH Zürich Fall 2017 18 October 2017 Ageda for Today & Next Few Lectures SIMD Processors GPUs Itroductio to GPU Programmig Digitaltechik

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff Computer rchitecture Microcomputer rchitecture ad Iterfacig Colorado School of Mies Professor William Hoff Computer Hardware Orgaizatio Processor Performs all computatios; coordiates data trasfer Iput

More information

% Sun Logo for. X3T10/95-229, Revision 0. April 18, 1998

% Sun Logo for. X3T10/95-229, Revision 0. April 18, 1998 Su Microsystems, Ic. 2550 Garcia Aveue Moutai View, CA 94045 415 960-1300 X3T10/95-229, Revisio 0 April 18, 1998 % Su Logo for Joh Lohmeyer Chairperso, X3T10 Symbios Logic Ic. 1635 Aeroplaza Drive Colorado

More information

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein 068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Τεχνολογία Λογισμικού

Τεχνολογία Λογισμικού ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr

More information

Federated Transaction Management with Snapshot Isolation

Federated Transaction Management with Snapshot Isolation Federated Trasactio Maagemet with Sapshot Isolatio Ralf Schekel, Gerhard Weikum Norbert Weißeberg Xuequ Wu Uiversity of the Saarlad Frauhofer ISST Deutsche Telekom AG email {schekel,weikum}@cs.ui-sb.de

More information