Intro to Multiprocessors

Size: px
Start display at page:

Download "Intro to Multiprocessors"

Transcription

1 The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple processors with a single shared address space Cluster multiple computers (each with their own address space) connected over a local area network (LN) functioning as a single system.1.2 pplications Needing Supercomputing Energy (plasma physics (simulating fusion reactions), geophysical (petroleum) exploration) DoE stockpile stewardship (to ensure the safety and reliability of the nation s stockpile of nuclear weapons) Earth and climate (climate and weather prediction, earthquake, tsunami prediction and mitigation of risks) Transportation (improving vehicles airflow dynamics, fuel consumption, crashworthiness, noise reduction) Bioinformatics and computational biology (genomics, protein folding, designer drugs) Societal health and safety (pollution reduction, disaster planning, terrorist action detection) Encountering mdahl s Law Speedup due to enhancement E is Exec time w/o E Speedup w/ E = Exec time w/ E Suppose that enhancement E accelerates a fraction F (F <1) of the task by a factor S (S>1) and the remainder of the task is unaffected ExTime w/ E = ExTime w/o E Speedup w/ E =.3.4

2 Encountering mdahl s Law Speedup due to enhancement E is Exec time w/o E Speedup w/ E = Exec time w/ E Suppose that enhancement E accelerates a fraction F (F <1) of the task by a factor S (S>1) and the remainder of the task is unaffected ExTime w/ E = ExTime w/o E ((1-F) + F/S) Speedup w/ E = 1 / ((1-F) + F/S) Examples: mdahl s Law Speedup w/ E = Consider an enhancement which runs 20 times faster but which is only usable 25% of the time. Speedup w/ E = What if its usable only 15% of the time? Speedup w/ E = mdahl s Law tells us that to achieve linear speedup with 100 processors, none of the original computation can be scalar! To get a speedup of 99 from 100 processors, the percentage of the original program that could be scalar would have to be 0.01% or less.5.6 Examples: mdahl s Law Speedup w/ E = 1 / ((1-F) + F/S) Consider an enhancement which runs 20 times faster but which is only usable 25% of the time. Speedup w/ E = 1/( /20) = 1.31 What if its usable only 15% of the time? Speedup w/ E = 1/( /20) = 1.17 mdahl s Law tells us that to achieve linear speedup with 100 processors, none of the original computation can be scalar! To get a speedup of 99 from 100 processors, the percentage of the original program that could be scalar would have to be 0.01% or less.7 Supercomputer Style Migration (Top500) Nov data Clusters Constellations SIMDs MPPs s Uniproc's Cluster whole computers interconnected using their I/O bus Constellation a cluster that uses an multiprocessor as the building block In the last 8 years uniprocessor and SIMDs disappeared while Clusters and Constellations grew from 3% to 80%.8

3 Multiprocessor/Clusters Key Questions Q1 How do they share data? Q2 How do they coordinate? Q3 How scalable is the architecture? How many processors can be supported? Flynn s Classification Scheme SISD single instruction, single data stream aka uniprocessor - what we have been talking about all semester SIMD single instruction, multiple data streams single control unit broadcasting operations to multiple datapaths MISD multiple instruction, single data no such machine (although some people put vector machines in this category) MIMD multiple instructions, multiple data streams aka multiprocessors (s, MPPs, clusters, NOWs) Now obsolete except for SIMD s Example SIMD Machines Single control unit Multiple datapaths (processing elements s) running in parallel Q1 s are interconnected (usually via a mesh or torus) and exchange/share data as directed by the control unit Q2 Each performs the same operation on its own local data Illiac IV DP MPP CM-2 MP-1216 ICL Maker UIUC Goodyear Thinking Machines MasPar Year # s 64 4,096 16,384 65,536 16,384 # b/ Max memory (MB) clock (MHz) System BW (MB/s) 2,560 2,560 20,480 16,384 23,

4 Multiprocessor Basic Organizations s connected by a single bus s connected by a network Communication model Physical connection Message passing address Network Bus UM # of Proc 8 to to to 64 8 to to 36 ddress ( ) Multi s Q1 Single address space shared by all the processors Q2 s coordinate/communicate through shared variables in memory (via loads and stores) Use of shared data must be coordinated via synchronization primitives (locks) UMs (uniform memory access) aka (symmetric multiprocessors) all accesses to main memory take the same amount of time no matter which processor makes the request or which location is requested s (nonuniform memory access) some main memory accesses are faster than others depending on the processor making the request and which location is requested can scale to larger sizes than UMs so are potentially higher performance N/UM Remote ccess Times (RMT) Single Bus ( ddress UM) Multi s Year Type Max Proc Interconnection Network RMT (ns) Sun Starfire ddress buses, data switch 500 Cache Cache Cache Cray 3TE HP V way 3D torus 8 x 8 crossbar Single Bus SGI Origin 3000 Compaq lphaserver GS Sun V Fat tree Switched bus Switched bus Caches are used to reduce latency and to lower bus traffic I/O HP Superdome 9000 NS Columbia Switched bus Fat tree 275??? Must provide hardware to ensure that caches and memory are consistent (cache coherency) Must provide a hardware mechanism to support process synchronization.15.16

5 Message Passing Multiprocessors Each processor has its own private address space Q1 s share data by explicitly sending and receiving information (messages) Q2 Coordination is built into message passing primitives (send and receive) Summary Flynn s classification of processors SISD, SIMD, MIMD Q1 How do processors share data? Q2 How do processors coordinate their activity? Q3 How scalable is the architecture (what is the maximum number of processors)? address multis UMs and s Bus connected (shared address UMs) multis Cache coherency hardware to ensure data consistency Synchronization primitives for synchronization Bus traffic limits scalability of architecture (< ~ 36 processors) Message passing multis Review: Where are We Now? Multiprocessor Basics Q1 How do they share data? Output Output Q2 How do they coordinate? Datapath Input Input Datapath Q3 How scalable is the architecture? How many processors? # of Proc Multiprocessor multiple processors with a single shared address space Cluster multiple computers (each with their own address space) connected over a local area network (LN) functioning as a single system Communication model Physical connection Message passing address Network Bus UM 8 to to to 64 8 to to

6 Single Bus ( ddress UM) Multi s Proc1 Proc2 Proc3 Proc4 Caches Caches Caches Caches Multiprocessor Cache Coherency Cache coherency protocols Bus snooping cache controllers monitor shared bus traffic with duplicate address tag hardware (so they don t interfere with processor s access to the cache) Single Bus I/O Proc1 Proc2 ProcN Caches are used to reduce latency and to lower bus traffic Write-back caches used to keep bus traffic at a minimum Must provide hardware to ensure that caches and memory are consistent (cache coherency) Must provide a hardware mechanism to support process synchronization.21 Snoop DCache Snoop DCache Snoop DCache Single Bus I/O.22 Bus Snooping Protocols Multiple copies are not a problem when reading must have exclusive access to write a word What happens if two processors try to write to the same shared data word in the same clock cycle? The bus arbiter decides which processor gets the bus first (and this will be the processor with the first exclusive access). Then the second processor will get exclusive access. Thus, bus arbitration forces sequential behavior. This sequential consistency is the most conservative of the memory consistency models. With it, the result of any execution is the same as if the accesses of each processor were kept in order and the accesses among different processors were interleaved. ll other processors sharing that data must be informed of writes Handling Writes Ensuring that all other processors sharing data are informed of writes can be handled two ways: 1. Write-update (write-broadcast) writing processor broadcasts new data over the bus, all copies are updated ll writes go to the bus higher bus traffic Since new values appear in caches sooner, can reduce latency 2. Write-invalidate writing processor issues invalidation signal on bus, cache snoops check to see if they have a copy of the data, if so they invalidate their cache block containing the word (this allows multiple readers but only one writer) Uses the bus only on the first write lower bus traffic, so better use of bus bandwidth.23.24

7 Write-Invalidate CC Protocol write (miss) Invalid Modified (dirty) read (hit) or write (hit or miss) read (miss) write (hit or miss) (clean) write-back caching protocol in black read (hit or miss) Write-Invalidate CC Protocol write (miss) Invalid send invalidate write-back due to read (miss) by another processor to this block Modified (dirty) read (hit) or write (hit) read (miss) receives invalidate (write by another processor to this block) send invalidate write (hit or miss) (clean) read (hit or miss) write-back caching protocol in black signals from the processor coherence additions in red signals from the bus coherence additions in blue Write-Invalidate CC Examples I = invalid (many), S = shared (many), M = modified (only one) S I S I Write-Invalidate CC Examples I = invalid (many), S = shared (many), M = modified (only one) 3. snoop sees 1. read miss for 1. write miss for read request for & lets MM 4. gets from MM 2. writes & supply & changes its state 4. change changes its state S I to S state to I S I to M 2. read request for 3. P2 sends invalidate for M I M I 3. snoop sees read 1. read miss for request Proc for, 1writes- back to MM 4. gets from MM 5. change & changes its state M I state to I to M 5. P2 sends invalidate for 2. read request for 4. change state to I M 1. write miss for 2. writes & changes its state I to M 3. P2 sends invalidate for.27.28

8 Data Miss Rates data has lower spatial and temporal locality Share data misses often dominate cache behavior even though they may only be 10% to 40% of the data accesses 64KB 2-way set associative data cache with 32B blocks Hennessy & Patterson, Computer rchitecture: Quantitative pproach Capacity miss rate Coherence miss rate FFT Capacity miss rate Coherence miss rate Ocean.29 Block Size Effects Writes to one word in a multi-word block mean either the full block is invalidated (write-invalidate) or the full block is exchanged between processors (write-update) - lternatively, could broadcast only the written word Multi-word blocks can also result in false sharing: when two processors are writing to two different variables in the same cache block With write-invalidate false sharing increases cache miss rates Proc1 Proc2 Compilers can help reduce false sharing by allocating highly correlated data to the same cache block B 4 word cache block.30 Other Coherence Protocols There are many variations on cache coherence protocols MESI Cache Coherency Protocol shared read nother write-invalidate protocol used in the Pentium 4 (and many other micro s) is MESI with four states: Modified same Exclusive only one copy of the shared data is allowed to be cached; memory has an up-to-date copy - Since there is only one copy of the block, write hits don t need to send invalidate signal multiple copies of the shared data may be cached (i.e., data permitted to be cached with more than one processor); memory has an up-to-date copy Invalid same nother processor has read/write miss for this block Invalid (not valid block) [Write back block] Modified (dirty) write or read hit write miss Invalidate for this block write [Write back block] [Write back block] shared read miss [Send invalidate signal] exclusive read miss exclusive read miss (clean) shared read Exclusive (clean) exclusive read exclusive read.31.32

9 Commercial Single Backplane Multiprocessors Compaq PL IBM R40 lphaserver SGI Pow Chal Sun 6000 Pentium Pro PowerPC lpha MIPS R10000 UltraSPRC # proc MHz BW/ system Summary Key questions Q1 - How do processors share data? Q2 - How do processors coordinate their activity? Q3 - How scalable is the architecture (what is the maximum number of processors)? Bus connected (shared address UM s( s)) multi s Cache coherency hardware to ensure data consistency Synchronization primitives for synchronization Scalability of bus connected UMs limited (< ~ 36 processors) because the three desirable bus characteristics - high bandwidth - low latency - long length are incompatible Network connected s are more scalable.33.34

Chapter Seven. Idea: create powerful computers by connecting many smaller ones

Chapter Seven. Idea: create powerful computers by connecting many smaller ones Chapter Seven Multiprocessors Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) vector processing may be coming back bad news:

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors? Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing

More information

Computer Architecture

Computer Architecture Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors

More information

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Flynn s Classification

Flynn s Classification Flynn s Classification SISD (Single Instruction Single Data) Uniprocessors MISD (Multiple Instruction Single Data) No machine is built yet for this type SIMD (Single Instruction Multiple Data) Examples:

More information

Chapter 18 Parallel Processing

Chapter 18 Parallel Processing Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Multiprocessors 1. Outline

Multiprocessors 1. Outline Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples

More information

Aleksandar Milenkovich 1

Aleksandar Milenkovich 1 Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Aleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville

Aleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville Lecture 18: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Parallel Computers Definition: A parallel computer is a collection

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

Parallel Architecture. Sathish Vadhiyar

Parallel Architecture. Sathish Vadhiyar Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate

More information

3/13/2008 Csci 211 Lecture %/year. Manufacturer/Year. Processors/chip Threads/Processor. Threads/chip 3/13/2008 Csci 211 Lecture 8 4

3/13/2008 Csci 211 Lecture %/year. Manufacturer/Year. Processors/chip Threads/Processor. Threads/chip 3/13/2008 Csci 211 Lecture 8 4 Outline CSCI Computer System Architecture Lec 8 Multiprocessor Introduction Xiuzhen Cheng Department of Computer Sciences The George Washington University MP Motivation SISD v. SIMD v. MIMD Centralized

More information

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid Chapter 5 Thread-Level Parallelism Abdullah Muzahid 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors is saturating + Modern multiple issue processors are becoming very complex

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

Chapter 5. Thread-Level Parallelism

Chapter 5. Thread-Level Parallelism Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Organisasi Sistem Komputer

Organisasi Sistem Komputer LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple

More information

Computer Organization. Chapter 16

Computer Organization. Chapter 16 William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

Limitations of parallel processing

Limitations of parallel processing Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors

More information

COSC4201 Multiprocessors

COSC4201 Multiprocessors COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore

More information

CSCI 4717 Computer Architecture

CSCI 4717 Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel

More information

Chapter 18. Parallel Processing. Yonsei University

Chapter 18. Parallel Processing. Yonsei University Chapter 18 Parallel Processing Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2 Types

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11

More information

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

Chapter-4 Multiprocessors and Thread-Level Parallelism

Chapter-4 Multiprocessors and Thread-Level Parallelism Chapter-4 Multiprocessors and Thread-Level Parallelism We have seen the renewed interest in developing multiprocessors in early 2000: - The slowdown in uniprocessor performance due to the diminishing returns

More information

Parallel Architecture. Hwansoo Han

Parallel Architecture. Hwansoo Han Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors

EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

CS/COE1541: Intro. to Computer Architecture

CS/COE1541: Intro. to Computer Architecture CS/COE1541: Intro. to Computer Architecture Multiprocessors Sangyeun Cho Computer Science Department Tilera TILE64 IBM BlueGene/L nvidia GPGPU Intel Core 2 Duo 2 Why multiprocessors? For improved latency

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel

More information

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will

More information

BlueGene/L (No. 4 in the Latest Top500 List)

BlueGene/L (No. 4 in the Latest Top500 List) BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside

More information

SMD149 - Operating Systems - Multiprocessing

SMD149 - Operating Systems - Multiprocessing SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction

More information

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system

More information

CSE 260 Introduction to Parallel Computation

CSE 260 Introduction to Parallel Computation CSE 260 Introduction to Parallel Computation Larry Carter carter@cs.ucsd.edu Office Hours: AP&M 4101 MW 10:00-11 or by appointment 9/20/2001 Topics Instances Principles Theory Hardware specific machines

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence

ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

EE382 Processor Design. Processor Issues for MP

EE382 Processor Design. Processor Issues for MP EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Scalable Multiprocessors

Scalable Multiprocessors Scalable Multiprocessors [ 11.1] scalable system is one in which resources can be added to the system without reaching a hard limit. Of course, there may still be economic limits. s the size of the system

More information

Parallel Processing & Multicore computers

Parallel Processing & Multicore computers Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

Chapter 4 Data-Level Parallelism

Chapter 4 Data-Level Parallelism CS359: Computer Architecture Chapter 4 Data-Level Parallelism Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University 1 Outline 4.1 Introduction 4.2 Vector Architecture

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

CS650 Computer Architecture. Lecture 10 Introduction to Multiprocessors and PC Clustering

CS650 Computer Architecture. Lecture 10 Introduction to Multiprocessors and PC Clustering CS650 Computer Architecture Lecture 10 Introduction to Multiprocessors and PC Clustering Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 10: Intro to Multiprocessors/Clustering

More information

Page 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence

Page 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

A Multiprocessor system generally means that more than one instruction stream is being executed in parallel.

A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. Multiprocessor Systems A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. However, Flynn s SIMD machine classification, also called an array processor,

More information

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011 CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:

More information

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer

More information

Dr. Joe Zhang PDC-3: Parallel Platforms

Dr. Joe Zhang PDC-3: Parallel Platforms CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy

More information

Computer parallelism Flynn s categories

Computer parallelism Flynn s categories 04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) 1 MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) Chapter 5 Appendix F Appendix I OUTLINE Introduction (5.1) Multiprocessor Architecture Challenges in Parallel Processing Centralized Shared Memory

More information

PARALLEL COMPUTER ARCHITECTURES

PARALLEL COMPUTER ARCHITECTURES 8 ARALLEL COMUTER ARCHITECTURES 1 CU Shared memory (a) (b) Figure 8-1. (a) A multiprocessor with 16 CUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different

More information

CMPE 511 TERM PAPER. Distributed Shared Memory Architecture. Seda Demirağ

CMPE 511 TERM PAPER. Distributed Shared Memory Architecture. Seda Demirağ CMPE 511 TERM PAPER Distributed Shared Memory Architecture by Seda Demirağ 2005701688 1. INTRODUCTION: Despite the advances in processor design, users still demand more and more performance. Eventually,

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

Cache Coherence. Introduction to High Performance Computing Systems (CS1645) Esteban Meneses. Spring, 2014

Cache Coherence. Introduction to High Performance Computing Systems (CS1645) Esteban Meneses. Spring, 2014 Cache Coherence Introduction to High Performance Computing Systems (CS1645) Esteban Meneses Spring, 2014 Supercomputer Galore Starting around 1983, the number of companies building supercomputers exploded:

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM

More information

Parallel Architectures

Parallel Architectures Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s

More information

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University COSC4201 Multiprocessors and Thread Level Parallelism Prof. Mokhtar Aboelaze York University COSC 4201 1 Introduction Why multiprocessor The turning away from the conventional organization came in the

More information

CS Parallel Algorithms in Scientific Computing

CS Parallel Algorithms in Scientific Computing CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan

More information

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

Review: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology

Review: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology Review: Multiprocessor CPE 631 Session 21: Multiprocessors (Part 2) Department of Electrical and Computer Engineering University of Alabama in Huntsville Basic issues and terminology Communication: share

More information

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 7: Support for Parallelism

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 7: Support for Parallelism 5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner Topic 7: Support for Parallelism These slides are mostly taken verbatim, or with minor changes,

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information