CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago
Course Evaluation
Very important. Please fill out!
Lab 3 Branch Prediction Competition
8 teams entered the competition, and extra credit was given to all. Entries were evaluated on correctness, performance gain, and writeup quality.
- Ross Rauber and Oliver Tsang: 39.32% improvement
- Zayne Khouja and Avinash Rao: 32.72% improvement
- Owen Frazier and Jaseph Maues: 31.97% improvement
Lecture Outline
Multi-core (continued)
Topics in Parallel Computer Architecture
- Cache coherence: ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations. What should the programmer expect the hardware to provide?
- Shared memory synchronization: instructions to perform atomic operations (e.g., for locks)
Cache Coherence
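The VI (Valid/Invalid) Protocol
Write-through, no-write-allocate cache. Transitions are labeled ObservedEvent/Action:
- Invalid → Valid on PrRd / BusRd (a read miss fetches the block)
- Valid → Valid on PrRd / -- (a read hit needs no bus traffic)
- Valid → Valid on PrWr / BusWr (a write hit is written through to memory)
- Invalid → Invalid on PrWr / BusWr (a write miss goes to memory without allocating the block)
- Valid → Invalid on an observed BusWr from another processor
Actions of the local processor on the cache block: PrRd, PrWr. Actions on the bus to communicate with memory and other processors: BusRd, BusWr.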
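The VI transitions can be written as a single C function. This is a minimal sketch that assumes the write-through, no-write-allocate behavior listed above. The names (vi_state, vi_next, and so on) are hypothetical, and the sketch does not model timing or the bus itself.

```c
#include <assert.h>

/* VI protocol for a write-through, no-write-allocate cache (hypothetical names). */
typedef enum { VI_INVALID, VI_VALID } vi_state;
typedef enum { EV_PR_RD, EV_PR_WR, EV_BUS_WR } vi_event;  /* EV_BUS_WR: observed from another cache */
typedef enum { ACT_NONE, ACT_BUS_RD, ACT_BUS_WR } vi_action;

/* Returns the next state; the bus action (if any) is written to *act. */
vi_state vi_next(vi_state s, vi_event ev, vi_action *act) {
    *act = ACT_NONE;
    switch (ev) {
    case EV_PR_RD:
        if (s == VI_INVALID) *act = ACT_BUS_RD;  /* read miss: fetch the block */
        return VI_VALID;                          /* block is valid after any read */
    case EV_PR_WR:
        *act = ACT_BUS_WR;  /* write-through: every write goes on the bus */
        return s;           /* no-write-allocate: an invalid block stays invalid */
    case EV_BUS_WR:
        return VI_INVALID;  /* another processor wrote: drop our copy */
    }
    return s;
}
```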
A More Sophisticated Protocol: MSI
Used with writeback caches. Extend the metadata per block to encode three states:
- M(odified): cache line is the only cached copy and is dirty
- S(hared): cache line is potentially one of several cached copies
- I(nvalid): cache line is not present in this cache
MSI State Machine
[Figure: MSI state machine for a write-back, write-allocate cache. Transitions are labeled ObservedEvent/Action, with upgrades and bus-initiated downgrades.]
Abbreviations and actions:
- PrRd: Processor read
- PrWr: Processor write
- BusRd: Bus read
- BusRdX: Bus read exclusive (read with intent to modify; must invalidate all other cache copies)
- Flush: Puts dirty data on the bus to update memory and supply data to other processors
MSI Protocol Walkthrough
1. If the cache block is Modified:
   a. PrRd or PrWr: this is a cache hit. Just return the value or update the cached value. There is no need to go to memory or talk to other processors, and the block remains Modified.
   b. BusRd: others wish to read the block. Put the dirty data on the bus; the block is downgraded to Shared.
   c. BusRdX: others wish to write to the block. Put the dirty data on the bus; the block is downgraded to Invalid.
2. If the cache block is Shared:
   a. PrRd: cache hit. BusRd: others are just reading the data; nothing to be done.
   b. PrWr: we wish to write, but other cores may be sharing this block. Generate a BusRdX operation to invalidate the other copies; the block is upgraded to Modified.
   c. BusRdX: another core wants to write to the block, so we must invalidate our copy; the block is downgraded to Invalid.
3. If the cache block is Invalid:
   a. PrRd: cache miss, and we just want to read. Generate a BusRd operation to get the data (from memory or another core). The block is upgraded to Shared.
   b. PrWr: cache miss, and we want to write. Generate a BusRdX operation to get the data (from memory or another core) and invalidate other copies. The block is upgraded to Modified.
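The walkthrough can be condensed into one transition function. This sketch makes simplifying assumptions: a single block, bus transactions that complete atomically, and hypothetical names (msi_next, A_FLUSH, and so on) chosen for illustration.

```c
#include <assert.h>

typedef enum { MSI_I, MSI_S, MSI_M } msi_state;
typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX } msi_event;  /* BUS_*: observed from other cores */
typedef enum { A_NONE, A_BUS_RD, A_BUS_RDX, A_FLUSH } msi_action;

/* Returns the next MSI state for one block; the bus action goes to *act. */
msi_state msi_next(msi_state s, msi_event ev, msi_action *act) {
    *act = A_NONE;
    switch (s) {
    case MSI_M:
        if (ev == BUS_RD)  { *act = A_FLUSH; return MSI_S; }  /* 1b: downgrade to S */
        if (ev == BUS_RDX) { *act = A_FLUSH; return MSI_I; }  /* 1c: downgrade to I */
        return MSI_M;                                         /* 1a: PrRd/PrWr hit */
    case MSI_S:
        if (ev == PR_WR)   { *act = A_BUS_RDX; return MSI_M; } /* 2b: invalidate others */
        if (ev == BUS_RDX) return MSI_I;                       /* 2c */
        return MSI_S;                                          /* 2a: PrRd or BusRd */
    case MSI_I:
        if (ev == PR_RD)   { *act = A_BUS_RD;  return MSI_S; } /* 3a */
        if (ev == PR_WR)   { *act = A_BUS_RDX; return MSI_M; } /* 3b */
        return MSI_I;  /* bus traffic for a block we do not hold */
    }
    return s;
}
```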
The Problem with MSI
Suppose a block is in no cache to begin with.
Problem: on a read, the block immediately goes to the Shared state, even though it may be the only cached copy (i.e., no other processor will cache it).
Why is this a problem? Suppose the cache that read the block wants to write to it at some point. It needs to broadcast an invalidation even though it has the only cached copy! If the cache knew it had the only cached copy in the system, it could have written to the block without notifying any other cache → saves unnecessary broadcasts of invalidations.
The Solution: MESI
Idea: add another state indicating that this is the only cached copy and it is clean: the Exclusive state.
- A block is placed into the Exclusive state if, during BusRd, no other cache had it.
- This requires a shared signal to detect whether other caches have a copy of the block; caches assert the signal if they have a copy.
- A silent transition Exclusive → Modified is possible on a write!
MESI is also called the Illinois protocol [Papamarcos and Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA].
MESI State Machine [Culler, David 97]
- PrRd and cache miss: depending on whether other caches have a copy, transition from I to S or E
- E to M occurs if PrWr is observed (silently, with no bus transaction)
- E to S occurs if BusRd is observed
- E to I occurs if BusRdX is observed
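Here is a sketch of the MESI transitions in the same style. It assumes the shared signal is sampled as a boolean during our own BusRd. It also assumes memory supplies the data when another cache reads a clean (E) block, so no flush is modeled there. All names are hypothetical.

```c
#include <assert.h>

typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_state;
typedef enum { E_PR_RD, E_PR_WR, E_BUS_RD, E_BUS_RDX } mesi_event;
typedef enum { M_NONE, M_BUS_RD, M_BUS_RDX, M_FLUSH } mesi_action;

/* shared_line: nonzero if another cache asserted the shared signal
   during our BusRd (only consulted on a read miss). */
mesi_state mesi_next(mesi_state s, mesi_event ev, int shared_line, mesi_action *act) {
    *act = M_NONE;
    switch (s) {
    case MESI_I:
        if (ev == E_PR_RD) { *act = M_BUS_RD; return shared_line ? MESI_S : MESI_E; }
        if (ev == E_PR_WR) { *act = M_BUS_RDX; return MESI_M; }
        return MESI_I;
    case MESI_E:
        if (ev == E_PR_WR)   return MESI_M;  /* silent upgrade: no bus transaction */
        if (ev == E_BUS_RD)  return MESI_S;  /* clean, so nothing to flush in this sketch */
        if (ev == E_BUS_RDX) return MESI_I;
        return MESI_E;
    case MESI_S:
        if (ev == E_PR_WR)   { *act = M_BUS_RDX; return MESI_M; }
        if (ev == E_BUS_RDX) return MESI_I;
        return MESI_S;
    case MESI_M:
        if (ev == E_BUS_RD)  { *act = M_FLUSH; return MESI_S; }
        if (ev == E_BUS_RDX) { *act = M_FLUSH; return MESI_I; }
        return MESI_M;
    }
    return s;
}
```

Compared with the MSI sketch, the only new behavior is the read miss choosing E when no one else holds the block, and the bus-free E → M write.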
Even More Sophisticated Cache Coherence Protocols?
The protocol can be optimized with more states and prediction mechanisms to
+ Reduce unnecessary invalidates and transfers of blocks
However, more states and optimizations
-- Are more difficult to design and verify (they lead to more cases to take care of, and race conditions)
-- Provide diminishing returns
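False Sharing
P1 repeatedly executes ld word0 / st word0. P2 repeatedly executes ld word3 / st word3. Both words sit in the same cache block/line: [word0 | word1 | word2 | word3].
The processors never touch each other's data. Even so, every store invalidates the other processor's copy of the whole block, so the block ping-pongs between the two caches.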
Quick Tip to Avoid False Sharing
DO:
- Map variables written by different processors onto different cache blocks
- Group variables written by the same processor into the same cache block
DON'T:
- Group variables written by different processors into the same cache block
Which Is Better?
Option 1:
    int sum[NUM_PROCS];
    int product[NUM_PROCS];
    sum[mynum]++;
    product[mynum] *= 2;
Option 2:
    typedef struct { int sum; int product; } Proc;
    Proc x[NUM_PROCS];
    x[mynum].sum++;
    x[mynum].product *= 2;
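One way to follow the DO rule fully is to pad each processor's data out to a whole cache block. Without padding, Option 2's 8-byte Proc elements for neighboring processors can still land in the same block. This sketch assumes a 64-byte block (common, but check the target machine) and a hypothetical NUM_PROCS of 4.

```c
#include <assert.h>
#include <stdalign.h>

#define NUM_PROCS 4    /* hypothetical processor count */
#define CACHE_LINE 64  /* assumed cache block size in bytes */

/* Aligning the first member to the block size makes the whole struct
   block-aligned and pads its size to 64 bytes, so each processor's
   sum and product share a block with nothing written by other processors. */
typedef struct {
    alignas(CACHE_LINE) int sum;
    int product;
} PaddedProc;

static PaddedProc x[NUM_PROCS];

/* The per-processor update from Option 2. */
void work(int mynum) {
    x[mynum].sum++;
    x[mynum].product *= 2;
}
```

The trade-off is memory: each processor now uses a full block for 8 bytes of data.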
Takeaway
- Cache coherence is critical for ensuring correctness
- Software-managed cache coherence is very difficult
- Hardware coherence protocols help programmers write correct and high-performance programs
- Snooping cache protocols: VI, MSI, MESI (lab 5), MOESI (common in practice)
- Directory-based cache coherence: more scalable
Memory Consistency
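Motivational Example
Dekker's algorithm for critical sections [Adve WRL Research Report 95]. Both flags start at 0.
- P1 executes A (Flag1 = 1), then B (if Flag2 == 0, enter the critical section).
- P2 executes X (Flag2 = 1), then Y (if Flag1 == 0, enter the critical section).
Can the two processors be in the critical section at the same time, given that they both obey the von Neumann model?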
Motivational Example (continued)
Intuition: assume P1 is in the critical section. Then Flag2 must be 0, which means P2 cannot have executed Flag2 = 1, which means P2 cannot be in the critical section. [Adve WRL Research Report 95]
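Both Processors in Critical Section!
Consider a store buffer (aka write buffer). Remember this from OoO? It can also be used with in-order execution!
[Figure: the processor sends stores into a store buffer in front of the cache; loads go to the cache, bypassing the buffered stores (load bypassing).]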
Both Processors in Critical Section! (continued)
- Cycle 1 (A): value written into P1's store buffer. P1 thinks A is executed, but memory is not updated until cycle 51.
- Cycle 1 (X): value written into P2's store buffer. P2 thinks X is executed, but memory is not updated until cycle 52.
- Cycle 2 (B): P1 still sees 0 in Flag2, so it enters the critical section.
- Cycle 2 (Y): P2 still sees 0 in Flag1, so it enters the critical section.
[Adve WRL Research Report 95]
Both Processors in Critical Section! What happened?
P1's view of memory operations: A (cycle 1), B (cycle 2), X (cycle 52). A appeared to happen before X.
P2's view of memory operations: X (cycle 1), Y (cycle 2), A (cycle 51). X appeared to happen before A.
The Problem
The two processors did NOT see the same order of operations to memory. The "happened-before" relationship between multiple updates to memory was inconsistent between the two processors' points of view. As a result, each processor thought the other was not in the critical section.
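The store-buffer scenario can be replayed deterministically in a toy model. This sketch makes the following assumptions:
- A is Flag1 = 1, B is P1's read of Flag2, X is Flag2 = 1, and Y is P2's read of Flag1, with both flags starting at 0.
- Each processor has a hypothetical one-entry store buffer with store-to-load forwarding.
- The operations run in the cycle order listed above.

```c
#include <assert.h>

/* Toy one-entry store buffer (hypothetical model, not real hardware). */
typedef struct { int valid; int *addr; int value; } store_buf;

static int Flag1, Flag2;  /* shared memory */

/* A store only enters the local store buffer; memory is unchanged. */
static void buffered_store(store_buf *sb, int *addr, int v) {
    sb->valid = 1; sb->addr = addr; sb->value = v;
}

/* A load forwards from the local buffer if it matches, else reads memory. */
static int buffered_load(const store_buf *sb, int *addr) {
    if (sb->valid && sb->addr == addr) return sb->value;
    return *addr;
}

/* The buffered store finally reaches memory. */
static void drain(store_buf *sb) {
    if (sb->valid) { *sb->addr = sb->value; sb->valid = 0; }
}

/* Replays the timeline; returns how many processors entered the critical section. */
int run_dekker_with_store_buffers(void) {
    store_buf sb1 = {0}, sb2 = {0};
    int in_cs = 0;
    Flag1 = Flag2 = 0;
    buffered_store(&sb1, &Flag1, 1);                /* cycle 1, A */
    buffered_store(&sb2, &Flag2, 1);                /* cycle 1, X */
    if (buffered_load(&sb1, &Flag2) == 0) in_cs++;  /* cycle 2, B: memory still has 0 */
    if (buffered_load(&sb2, &Flag1) == 0) in_cs++;  /* cycle 2, Y: memory still has 0 */
    drain(&sb1);                                    /* cycle 51: A reaches memory */
    drain(&sb2);                                    /* cycle 52: X reaches memory */
    return in_cs;
}
```

Both loads read memory before either buffered store drains, so the function reports two processors inside the critical section.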
How Can We Solve the Problem?
Idea: sequential consistency [Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers]
I. All processors see the same order of operations to memory, i.e., all memory operations happen in an order (called the global total order) that is consistent across all processors.
II. Within this global order, each processor's operations appear in sequential (program) order with respect to its own operations.
Another Way of Interpreting SC
The whole system (all processors and memory) sees the same order for all four combinations of memory operations performed by any processor: load → load, load → store, store → store, store → load.
Sequentially Consistent Operation Orders
Potential correct global orders (all are correct) [Adve WRL Research Report 95]:
- A B X Y
- A X B Y
- A X Y B
- X A B Y
- X A Y B
- X Y A B
Which order (interleaving) is observed depends on the implementation and dynamic latencies. In none of these orders do both B and Y read 0, so SC never lets both processors enter the critical section.
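The claim that no SC interleaving admits both processors can be checked by brute force. This sketch assumes A is Flag1 = 1, B is P1's read of Flag2, X is Flag2 = 1, and Y is P2's read of Flag1, with both flags initially 0. It tries every order that keeps A before B and X before Y, against a single memory with no store buffers. Function names are hypothetical.

```c
#include <assert.h>

/* Executes one operation against a single shared memory. */
static void sc_exec(char op, int *flag1, int *flag2, int *r1, int *r2) {
    switch (op) {
    case 'A': *flag1 = 1; break;    /* P1: Flag1 = 1  */
    case 'B': *r1 = *flag2; break;  /* P1: read Flag2 */
    case 'X': *flag2 = 1; break;    /* P2: Flag2 = 1  */
    case 'Y': *r2 = *flag1; break;  /* P2: read Flag1 */
    }
}

/* Enumerates all program-order-preserving interleavings.
   Stores their count in *orders; returns how many let both loads see 0. */
int sc_count_both_enter(int *orders) {
    const char *p1 = "AB", *p2 = "XY";
    int both = 0;
    *orders = 0;
    for (int mask = 0; mask < 16; mask++) {  /* bit i set: slot i runs P1's next op */
        int bits = (mask & 1) + ((mask >> 1) & 1) + ((mask >> 2) & 1) + ((mask >> 3) & 1);
        if (bits != 2) continue;             /* P1 must get exactly 2 of the 4 slots */
        int i1 = 0, i2 = 0, flag1 = 0, flag2 = 0, r1 = -1, r2 = -1;
        for (int slot = 0; slot < 4; slot++) {
            char op = ((mask >> slot) & 1) ? p1[i1++] : p2[i2++];
            sc_exec(op, &flag1, &flag2, &r1, &r2);
        }
        (*orders)++;
        if (r1 == 0 && r2 == 0) both++;
    }
    return both;
}
```

It finds the six orders listed above and none in which both processors enter.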
Issues with Sequential Consistency (SC)?
It is a nice, intuitive abstraction for programming. But it has two issues:
- The ordering requirements are too conservative.
- They limit the aggressiveness of performance enhancement techniques. For example, we can't use a store buffer that lets later loads bypass earlier stores.
Total Store Order (TSO)
Remember that for sequential consistency, the whole system (all processors and memory) sees the same order for all four combinations of memory operations performed by any processor: load → load, load → store, store → store, store → load.
TSO relaxes the store → load ordering requirement.
- Major benefit: a FIFO-based store buffer can be used
- Modern ISAs that use the TSO model: SPARC; x86 is also similar
Total Store Order (TSO) Example
TSO allows both P1 and P2 to be in the critical section:
- P2 is allowed to see B (load) before A (store)
- P1 is allowed to see Y (load) before X (store)
How should a programmer fix Dekker's algorithm? [Adve WRL Research Report 95]
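Memory Fence
- All memory operations before a fence must complete and be visible to other processors before the fence is executed
- All memory operations after the fence must wait for the fence to complete
- Fences complete in program order
Applied to Dekker's algorithm, a fence between A and B (and between X and Y) ensures that each processor's flag store is visible before it reads the other flag. [Adve WRL Research Report 95]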
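In C11, a fence between each store and the following load can be expressed with atomic_thread_fence(memory_order_seq_cst); on x86 this typically compiles to an MFENCE-style full barrier. The sketch below applies it to the simplified Dekker entry code. The function names are hypothetical. The fences guarantee that at most one processor enters, but not that one always does: both may back off.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int flag1, flag2;  /* static storage: both start at 0 */

/* P1 entry: A, fence, B. Returns 1 if P1 may enter the critical section. */
int p1_try_enter(void) {
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);         /* A: Flag1 = 1 */
    atomic_thread_fence(memory_order_seq_cst);                       /* A visible before B */
    return atomic_load_explicit(&flag2, memory_order_relaxed) == 0;  /* B: Flag2 == 0? */
}

/* P2 entry: X, fence, Y. Returns 1 if P2 may enter the critical section. */
int p2_try_enter(void) {
    atomic_store_explicit(&flag2, 1, memory_order_relaxed);         /* X: Flag2 = 1 */
    atomic_thread_fence(memory_order_seq_cst);                       /* X visible before Y */
    return atomic_load_explicit(&flag1, memory_order_relaxed) == 0;  /* Y: Flag1 == 0? */
}
```

With both seq_cst fences in place, C11 rules out the outcome where both loads read 0, which is exactly the TSO outcome shown above.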
The General Problem of Memory Ordering
Memory ordering is a contract between software and hardware, specified by the ISA.
- The ISA specifies what programmers can assume about memory ordering, e.g., whether sequential consistency (or another memory consistency model) is provided
- Preserving an intuitive model (e.g., sequential consistency) simplifies the programmer's life
- But it makes the hardware designer's life difficult (it limits the performance optimizations that can be used)
This is another example of the programmer-microarchitect tradeoff.
Synchronization
Race Condition
Unpredictable results, called race conditions, can happen if we don't control access to shared variables. This is a concurrency problem; it can occur on single processors too.
E.g., x++ from multiple threads: assume x is initialized to 0. What is the value of x after the following execution?
    CPU 1              CPU 2
    Ld  r1, x          Ld  r1, x
    Add r1, r1, 1      Add r1, r1, 1
    St  r1, x          St  r1, x
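Enumerating every interleaving of the two three-instruction sequences shows which final values are possible. This sketch treats each instruction as atomic and gives each CPU its own private r1. The function name is hypothetical.

```c
#include <assert.h>

/* Tries all 20 interleavings of two CPUs each running
       Ld r1, x ; Add r1, r1, 1 ; St r1, x
   with x starting at 0. Records the smallest and largest final x. */
void race_outcomes(int *min_x, int *max_x) {
    *min_x = 100;
    *max_x = -1;
    for (int mask = 0; mask < 64; mask++) {
        int bits = 0;
        for (int b = 0; b < 6; b++) bits += (mask >> b) & 1;
        if (bits != 3) continue;               /* each CPU runs exactly 3 of the 6 slots */
        int x = 0, r[2] = {0, 0}, pc[2] = {0, 0};
        for (int slot = 0; slot < 6; slot++) {
            int cpu = (mask >> slot) & 1;      /* which CPU executes in this slot */
            switch (pc[cpu]++) {
            case 0: r[cpu] = x; break;         /* Ld  r1, x     */
            case 1: r[cpu] += 1; break;        /* Add r1, r1, 1 */
            case 2: x = r[cpu]; break;         /* St  r1, x     */
            }
        }
        if (x < *min_x) *min_x = x;
        if (x > *max_x) *max_x = x;
    }
}
```

The answer is 1 or 2. When both loads happen before either store, one increment is lost.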
Coordinating Access to Shared Data
Locks: a simple primitive to ensure that updates to single variables occur within a critical section. Many variations exist (spinlocks, semaphores, ...).
CPU 1: LOCK x; Ld r1, x; Add r1, r1, 1; St r1, x; UNLOCK x
CPU 2: LOCK x; wait; wait; lock acquired; Ld r1, x; Add r1, r1, 1
Locks / Critical Sections
- Enforce mutually exclusive access to shared data
- Only one thread can be executing the critical section at a time
- Contended critical sections make threads wait → the threads causing serialization can be on the critical path
Each thread:
    loop {
        Compute
        lock(a)
        Update shared data
        unlock(a)
    }
How NOT to Implement Locks
    Lock:
        while (lock_var == 1);
        lock_var = 1;
    Unlock:
        lock_var = 0;
What's the problem? Testing whether lock_var is 1 and setting it to 1 are not atomic. Two processors can both see 0 before either sets lock_var to 1 → multiple processors acquire the lock!
Atomic Read & Write Instructions
Aka read-modify-write. Specify a memory location and a register:
I. The value in the memory location is read into the register
II. Another value is stored into the location
Many variants exist, based on what values are allowed in II.
Simple example: test&set
- Read the memory location into the specified register
- Store the constant 1 into the location
Using Test&Set to Implement a Lock
Initialize location to 0.
    lock:   t&s register, location   // atomic read-modify-write
            bnz lock                 // if not 0, try again
            ret                      // locked; value in location is 1
    unlock: st location, #0          // write 0 to location
            ret
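The same test&set loop can be written portably with C11's atomic_flag. Its atomic_flag_test_and_set atomically sets the flag and returns the old value. The lock type and helper names below are hypothetical. Acquire/release orderings are used so the critical section's memory accesses stay inside the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A clear flag corresponds to "location holds 0" (lock free). */
typedef struct { atomic_flag held; } tas_lock;
#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Like the t&s / bnz loop: set the flag, and retry while the old value was already set. */
void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin: someone else holds the lock */
    }
}

/* A single attempt: returns true if the lock was free and is now ours. */
bool tas_try_acquire(tas_lock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Like "st location, #0": release the lock. */
void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```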
Many Others
- Other read-modify-write primitives: swap, compare&swap
- Fancier implementations to avoid spinning, reduce memory traffic, promote fairness, etc.
- All details are defined in the ISA
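Here is a sketch of the two other primitives using C11 atomics. atomic_compare_exchange_weak provides compare&swap, used here in a retry loop that makes the earlier x++ example safe. atomic_exchange provides swap, used here to build a lock. Function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* compare&swap retry loop: an atomic x++ that cannot lose updates. */
int cas_increment(atomic_int *x) {
    int old = atomic_load(x);
    /* Succeeds only if *x still equals old. On failure, old is reloaded
       with the current value of *x and the loop tries again. */
    while (!atomic_compare_exchange_weak(x, &old, old + 1)) {
        /* retry */
    }
    return old + 1;  /* the value this call installed */
}

/* swap-based spinlock: the old value tells us whether the lock was free. */
void swap_lock(atomic_int *lk) {
    while (atomic_exchange(lk, 1) != 0) {
        /* spin */
    }
}

void swap_unlock(atomic_int *lk) {
    atomic_store(lk, 0);
}
```

The weak form of compare&swap may fail spuriously, which is harmless inside a retry loop and can be cheaper on some ISAs.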
Course Summary
- ISA
- Uarch: datapath, control
- Single-cycle, multi-cycle
- Pipelining: basic, dependency handling, branch prediction
- Advanced uarch: OOO, SIMD, VLIW, superscalar
- Caches (advanced)
- Virtual memory
- DRAM
- Multi-core
ALL DONE!
More informationChapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 4 Procedural Abstractio ad Fuctios That Retur a Value Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 4.1 Top-Dow Desig 4.2 Predefied Fuctios 4.3 Programmer-Defied Fuctios 4.4
More informationCS 11 C track: lecture 1
CS 11 C track: lecture 1 Prelimiaries Need a CMS cluster accout http://acctreq.cms.caltech.edu/cgi-bi/request.cgi Need to kow UNIX IMSS tutorial liked from track home page Track home page: http://courses.cms.caltech.edu/courses/cs11/material
More informationBluespec-3: Modules & Interfaces. Bluespec: State and Rules organized into modules
Bluespec-3: Modules & Iterfaces Arvid Computer Sciece & Artificial Itelligece Lab Massachusetts Istitute of Techology Based o material prepared by Bluespec Ic, Jauary 2005 February 28, 2005 L09-1 Bluespec:
More informationThe University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.
Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive
More informationAdvanced OpenMP. Lecture 3: Cache Coherency
Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable
More informationComputer Architecture
Computer Architecture Overview Prof. Tie-Fu Che Dept. of Computer Sciece Natioal Chug Cheg Uiv Sprig 2002 Overview- Computer Architecture Course Focus Uderstadig the desig techiques, machie structures,
More informationUniprocessors. HPC Prof. Robert van Engelen
Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures
More informationn Haskell n Syntax n Lazy evaluation n Static typing and type inference n Algebraic data types n Pattern matching n Type classes
Aoucemets Quiz 7 HW 9 is due o Friday Raibow grades HW 1-6 plus 8. Please, read our commets o 8! Exam 1-2 Quiz 1-6 Ay questios/cocers, let us kow ASAP Last Class Haskell Sytax Lazy evaluatio Static typig
More informationMulticore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh
Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth
More informationThe MESI State Transition Graph
Small-scale shared memory multiprocessors Semantics of the shared address space model (Ch. 5.3-5.5) Design of the M(O)ESI snoopy protocol Design of the Dragon snoopy protocol Performance issues Synchronization
More informationPage 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!
Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio
More informationGoals of the Lecture UML Implementation Diagrams
Goals of the Lecture UML Implemetatio Diagrams Object-Orieted Aalysis ad Desig - Fall 1998 Preset UML Diagrams useful for implemetatio Provide examples Next Lecture Ð A variety of topics o mappig from
More informationUH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu
UH-MEM: Utility-Based Hybrid Memory Maagemet Yag Li, Saugata Ghose, Jogmoo Choi, Ji Su, Hui Wag, Our Mutlu 1 Executive Summary DRAM faces sigificat techology scalig difficulties Emergig memory techologies
More informationChapter 3. More Flow of Control. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 3 More Flow of Cotrol Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 3.1 Usig Boolea Expressios 3.2 Multiway Braches 3.3 More about C++ Loop Statemets 3.4 Desigig Loops Copyright
More informationIntroduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition
Lecture Goals Itroductio to Computig Systems: From Bits ad Gates to C ad Beyod 2 d Editio Yale N. Patt Sajay J. Patel Origial slides from Gregory Byrd, North Carolia State Uiversity Modified slides by
More informationCS2410 Computer Architecture. Flynn s Taxonomy
CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)
More informationCluster Computing Spring 2004 Paul A. Farrell
Cluster Computig Sprig 004 3/18/004 Parallel Programmig Overview Task Parallelism OS support for task parallelism Parameter Studies Domai Decompositio Sequece Matchig Work Assigmet Static schedulig Divide
More informationCache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri
Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache
More informationSwitching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1
Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)
More informationSnooping-Based Cache Coherence
Lecture 10: Snooping-Based Cache Coherence Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Tunes Elle King Ex s & Oh s (Love Stuff) Once word about my code profiling skills
More informationComputer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017
Computer Architecture Lecture 8: SIMD Processors ad GPUs Prof. Our Mutlu ETH Zürich Fall 2017 18 October 2017 Ageda for Today & Next Few Lectures SIMD Processors GPUs Itroductio to GPU Programmig Digitaltechik
More informationOnes Assignment Method for Solving Traveling Salesman Problem
Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:
More informationComputer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff
Computer rchitecture Microcomputer rchitecture ad Iterfacig Colorado School of Mies Professor William Hoff Computer Hardware Orgaizatio Processor Performs all computatios; coordiates data trasfer Iput
More information% Sun Logo for. X3T10/95-229, Revision 0. April 18, 1998
Su Microsystems, Ic. 2550 Garcia Aveue Moutai View, CA 94045 415 960-1300 X3T10/95-229, Revisio 0 April 18, 1998 % Su Logo for Joh Lohmeyer Chairperso, X3T10 Symbios Logic Ic. 1635 Aeroplaza Drive Colorado
More informationLecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein
068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig
More informationChapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig
More informationΤεχνολογία Λογισμικού
ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr
More informationFederated Transaction Management with Snapshot Isolation
Federated Trasactio Maagemet with Sapshot Isolatio Ralf Schekel, Gerhard Weikum Norbert Weißeberg Xuequ Wu Uiversity of the Saarlad Frauhofer ISST Deutsche Telekom AG email {schekel,weikum}@cs.ui-sb.de
More information