INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing


UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides Version 3.0 - English

Lecture 11
Title: Multiprocessors - Classification and Shared Memory Architectures
Summary: Multiprocessor classification; MIMD architectures (shared memory and distributed memory); coherency and consistency.
2010/2011 Nuno.Roma@ist.utl.pt

Architectures for Embedded Computing
Multiprocessors: Classification and Shared Memory Architectures
Prof. Nuno Roma
ACE 2010/11 - DEI-IST

Previous Class
In the previous class:
- Multiple-issue processors;
- Superscalar processors;
- Very Long Instruction Word (VLIW) processors;
- Code optimization for multiple-issue processors;
- Multi-threading.

Road Map

Summary
Today:
- Multiprocessor classification;
- MIMD architectures:
  - Shared memory;
  - Distributed memory: distributed shared memory; multi-computers;
- Coherency and consistency.
Bibliography: Computer Architecture: a Quantitative Approach, Chapter 4.

Parallel Processing
Objectives:
- Greater performance;
- Efficient use of silicon resources;
- Reduction of power consumption.
Implementation:
- Better use of the silicon space, by integrating several processors (cores) in a single chip: Chip Multiprocessor (CMP);
- Interconnection of several independent processors (e.g., clusters, grids, etc.).
Difficulties:
- Parallelizing the software...

Parallel Processing
Example: Homogeneous multi-core processor.

Classification of Multi-Processor Systems

Type                                 | Architecture          | Management          | Examples
General Purpose Processor (GPP)      | Homogeneous           | Hardware            | Intel, AMD, IBM Power, SUN, etc. multi-core families
Dedicated Processors / Accelerators  | Heterogeneous (misc.) | Hardware + Software | Cell (PS3); GPUs (NVidia); FPGA/ASIC dedicated accelerators

Parallelism Levels
- Simultaneous execution of several sequential instruction phases: pipelining;
- Parallel execution of the instructions of a given application in a single processor: superscalar and VLIW processors;
- Parallel execution in several processors in a single computer: multiprocessors;
- Parallel execution in several computers: clusters, grids.

Multiprocessor Classes
- SISD (Single Instruction, Single Data): the uniprocessor case;
- SIMD (Single Instruction, Multiple Data): the same instruction is executed in the several processors, but each processor operates on an independent data set: vector architectures;
- MISD (Multiple Instruction, Single Data): each processor executes a different instruction, but all process the same data set: there isn't any commercial solution of this type;
- MIMD (Multiple Instruction, Multiple Data): each processor executes independent instructions over an independent data set.

MIMD Architectures
MIMD architectures are more popular due to:
- Greater flexibility;
- Same components as the uni-processors.
MIMD architectures can be divided into two classes:
- Shared memory (e.g., multi-core processors);
- Distributed memory (e.g., clusters, grids, etc.).

Shared Memory
Shared memory architecture, also known as: Uniform Memory Access (UMA) or Symmetric Shared-Memory Multiprocessors (SMP).

Distributed Memory
Distributed memory architecture, also known as: Non-Uniform Memory Access (NUMA).

Shared vs Distributed Memory
In distributed architectures, most memory accesses are done in the local memory:
- Allows greater memory access bandwidths;
- Reduction of the access time.
However:
- Communication between processors is more complex;
- Increased access time to the data stored in the other processors' local memories.

Shared Memory
Uniform Memory Access (UMA) or Symmetric Shared-Memory Multiprocessors (SMP).

MIMD Processing
Example: Homogeneous multi-core processor. Memory sharing:
- Level 1 caches (L1): private;
- Level 2 caches (L2): private (e.g., AMD) or shared (e.g., Intel);
- Level 3 cache (L3): shared;
- Main memory: shared.

Coherency
Example, considering a write-through cache:

Time | Event                   | Cache uP A | Cache uP B | Memory M[X]
  0  |                         |            |            | 1
  1  | uP A reads M[X]         | 1          |            | 1
  2  | uP B reads M[X]         | 1          | 1          | 1
  3  | uP A writes 0 into M[X] | 0          | 1          | 0

After the write, uP B still holds the stale value in its cache. In multi-processors, the migration and the replication of data are normal and expected events.

Coherency
A memory system is said to be coherent when a read operation from a given memory position returns the most recent value that was written into that memory position.
Coherency:
- Defines which values can be returned by a read;
- Read and write access behavior to a certain memory position by a given processor.
Consistency:
- Defines when a given written value is returned by a subsequent read;
- Read and write access behavior to a certain memory position by several different processors (synchronization).

Coherency
A given memory system is said to be coherent if:
- A read of M[X] by P, after a write to M[X] by P, always returns the value that was written by P, provided that no writes have been done by other processors between the write and the read operations;
- A read of M[X] by Pi, after a write to M[X] by Pj, always returns the value written by Pj, if the read and the write are sufficiently separated in time and no other writes to M[X] occur between the two accesses;
- Writes to the same location are serialized: two writes to the same location by any two processors are seen in the same order by all processors.

Consistency
The consistency model of a memory system defines when a change to a memory position will be seen by all the processors.

P1: A = 0;            P2: B = 0;
    ...                   ...
    A = 1;                B = 1;
L1: if (B == 0) ...   L2: if (A == 0) ...

What happens if a given processor is allowed to proceed while the (slower) write operation is still taking place (e.g., by using write buffers)? It is possible that both P1 and P2 do not have access to the most recent values of B and A before the evaluation of the test condition.
Sequential consistency: the program only proceeds after all processors have been informed about the write operation.
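The following C sketch (an illustration added here, using POSIX threads; it is not part of the original slides) runs the P1/P2 fragment above. Under sequential consistency, at most one of the two branches can be taken; on hardware with store buffers and no fences, executions where both messages are printed are possible.

    #include <pthread.h>
    #include <stdio.h>

    volatile int A = 0, B = 0;

    static void *p1(void *arg)
    {
        (void)arg;
        A = 1;          /* this store may linger in a write buffer... */
        if (B == 0)     /* ...while this load already executes */
            puts("P1 took its branch");
        return NULL;
    }

    static void *p2(void *arg)
    {
        (void)arg;
        B = 1;
        if (A == 0)
            puts("P2 took its branch");
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

(Compile with cc -pthread. The volatile qualifier keeps the compiler from reordering the accesses, but it does not constrain the hardware.)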

Coherency Protocols
The coherency protocols keep and check the status of the shared memory blocks:
- Snooping protocols: each cache has a copy of the shared block's data and of the corresponding sharing status; there isn't any centralized status. Each cache controller listens to the memory bus, to determine whether or not it has a copy of the block that is being requested on the bus:
  - Write-invalidate protocols;
  - Write-update (or broadcast) protocols.
- Directory based protocols: the status of each shared block is kept in a centralized directory.

Snooping + Write-Invalidate Protocols
Example, considering a write-through cache:

uP action             | Bus action        | Cache uP A | Cache uP B | Memory M[X]
                      |                   |            |            | 0
uP A reads M[X]       | Miss in X         | 0          |            | 0
uP B reads M[X]       | Miss in X         | 0          | 0          | 0
uP A writes 1 to M[X] | Invalidation of X | 1          |            | 1
uP B reads M[X]       | Miss in X         | 1          | 1          | 1

Snooping + Write-Invalidate Protocols
- With write-back caches, snooping also has to be used in memory reads, since the cache that holds the most recent data of the block has to transfer it onto the bus;
- The access to the memory bus imposes a natural serialization of the simultaneous write operations;
- The invalidation can be optimized by using an extra bit in the cache that indicates whether that block's data is being shared or not:

Valid | Shared | Meaning
  0   |   -    | Invalid: the most recent value of that block is not present
  1   |   1    | Shared: that block is currently stored in several caches
  1   |   0    | Exclusive: currently, that block is only stored in this cache

Snooping + Write-Update Protocols
Also known as the Broadcast Protocol:

uP action             | Bus action     | Cache uP A | Cache uP B | Memory M[X]
                      |                |            |            | 0
uP A reads M[X]       | Miss in X      | 0          |            | 0
uP B reads M[X]       | Miss in X      | 0          | 0          | 0
uP A writes 1 to M[X] | Broadcast of X | 1          | 1          | 1
uP B reads M[X]       |                | 1          | 1          | 1
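A small C simulation (my own sketch, not from the slides) replays the two tables above: two caches snoop the writes to a single write-through memory position M[X], under either a write-invalidate or a write-update policy. The final read by uP B misses under invalidation and hits under update.

    #include <stdio.h>

    enum policy { WRITE_INVALIDATE, WRITE_UPDATE };

    struct cache { int valid; int data; };

    static int mem_X;                        /* memory position M[X] */

    static int read_X(struct cache *c, const char *name)
    {
        if (!c->valid) {                     /* read miss: fetch from memory */
            printf("%s reads M[X]: miss in X\n", name);
            c->data = mem_X;
            c->valid = 1;
        } else {
            printf("%s reads M[X]: hit\n", name);
        }
        return c->data;
    }

    static void write_X(struct cache *c, struct cache *other, int v,
                        enum policy p, const char *name)
    {
        c->data = v;
        c->valid = 1;
        mem_X = v;                           /* write-through: memory updated */
        if (p == WRITE_INVALIDATE) {
            other->valid = 0;                /* snoop: invalidation of X */
            printf("%s writes %d to M[X]: invalidation of X\n", name, v);
        } else {
            if (other->valid)
                other->data = v;             /* snoop: broadcast of X */
            printf("%s writes %d to M[X]: broadcast of X\n", name, v);
        }
    }

    static void run(enum policy p, const char *label)
    {
        struct cache cA = {0, 0}, cB = {0, 0};
        mem_X = 0;
        printf("-- %s --\n", label);
        read_X(&cA, "uP A");                 /* miss */
        read_X(&cB, "uP B");                 /* miss */
        write_X(&cA, &cB, 1, p, "uP A");
        printf("uP B now reads %d\n", read_X(&cB, "uP B"));
    }

    int main(void)
    {
        run(WRITE_INVALIDATE, "write-invalidate");  /* final read: miss */
        run(WRITE_UPDATE, "write-update");          /* final read: hit  */
        return 0;
    }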

Comparison of Protocols
- Multiple writes to a given address cause:
  - Broadcast protocol: multiple broadcasts;
  - Write-invalidate snooping protocol: only one invalidation.
- Each write to a given shared block causes:
  - Broadcast protocol: one broadcast;
  - Write-invalidate snooping protocol: only one invalidation, corresponding to the first word that is written in that block.
- The delay between a write and a subsequent read (by another processor) is smaller with the broadcast protocol.
- Invalidation protocols are by far the most used, since they require a much smaller bandwidth on the memory bus.

Directory Based Protocols
In a directory based protocol, the status of each block is kept in a centralized directory.
Operations (just as before):
- Handle read misses;
- Handle writes to shared blocks.
(Write misses correspond to these two, in sequence.)

Directory Based Protocols
Block status definition:
- Uncached: no processor has a copy of the cache block;
- Shared: one or more processors have the block cached, and the value in memory is up to date (as well as in all the caches);
- Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date. That processor is called the owner of the block.
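One common way to encode these states (an assumed encoding, added for illustration; the slides do not prescribe one) is a directory entry per memory block, holding the block state plus a bit-vector of sharers:

    #include <stdint.h>
    #include <stdio.h>

    enum block_state { UNCACHED, SHARED, EXCLUSIVE };

    struct dir_entry {
        enum block_state state;
        uint64_t sharers;   /* bit i set => processor i holds a copy */
        int owner;          /* meaningful only in the EXCLUSIVE state */
    };

    /* read miss by processor p: the block becomes SHARED, p is recorded */
    static void read_miss(struct dir_entry *e, int p)
    {
        if (e->state == EXCLUSIVE)
            printf("fetch dirty block back from owner %d\n", e->owner);
        e->state = SHARED;
        e->sharers |= (1ULL << p);
    }

    /* write by processor p: invalidate the other sharers, p becomes owner */
    static void write_block(struct dir_entry *e, int p)
    {
        uint64_t others = e->sharers & ~(1ULL << p);
        if (others)
            printf("send invalidations to sharers 0x%llx\n",
                   (unsigned long long)others);
        e->state = EXCLUSIVE;
        e->sharers = (1ULL << p);
        e->owner = p;
    }

    int main(void)
    {
        struct dir_entry x = { UNCACHED, 0, -1 };
        read_miss(&x, 0);    /* P0 reads:  SHARED, sharers = {0}     */
        read_miss(&x, 1);    /* P1 reads:  SHARED, sharers = {0, 1}  */
        write_block(&x, 1);  /* P1 writes: invalidate P0, EXCLUSIVE  */
        return 0;
    }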

Performance of UMA Architectures
In multi-processors with shared central memory, the contention to access the memory bus reduces the performance of each processor. In systems with write-invalidate snooping protocols:
- Increase of the number of invalidated cache positions;
- Greater miss-rate;
- Increase of the number of accesses to the central memory.

Performance of UMA Architectures
Cache misses:
- Compulsory;
- Capacity;
- Conflict;
- Coherency:
  - Real: the word is really shared;
  - False: miss due to simultaneous accesses by different processors to distinct words that belong to the same block (see the sketch below).
Global performance depends on:
- Number of processors;
- Capacity of each cache;
- Cache block size.
(to be seen in the next classes)
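False sharing can be reproduced in a few lines of C (a hedged sketch added here, not from the slides; the 64-byte block size is an assumption): two threads increment different counters, first packed into the same cache block and then padded onto separate blocks. On an SMP, the padded version typically runs noticeably faster.

    #include <pthread.h>
    #include <stdio.h>

    #define LINE  64          /* assumed cache block size, in bytes */
    #define ITERS 50000000L

    /* packed: the two counters fall in the same cache block */
    static struct { volatile long a, b; } packed;

    /* padded: each counter occupies its own cache block */
    static struct { volatile long v; char pad[LINE - sizeof(long)]; } pa, pb;

    static void *bump(void *arg)
    {
        volatile long *c = arg;
        for (long i = 0; i < ITERS; i++)
            (*c)++;          /* each store invalidates the other copy */
        return NULL;
    }

    static void run(volatile long *x, volatile long *y, const char *label)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, (void *)x);
        pthread_create(&t2, NULL, bump, (void *)y);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%s: done\n", label);   /* time each phase externally */
    }

    int main(void)
    {
        run(&packed.a, &packed.b, "same block (false sharing)");
        run(&pa.v, &pb.v, "separate blocks (padded)");
        return 0;
    }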

Distributed Memory Architecture
Distributed memory architecture, also known as: Non-Uniform Memory Access (NUMA).

MIMD Processing
Examples: Clusters & Grids.

Distributed Memory Architecture
In distributed memory architectures, it is necessary to transfer data between the several different memories. Two approaches are usually adopted to manage this transfer:
- Distributed Shared Memory (DSM): the memories are physically separated, but they are logically accessed in the same addressing space;
- Multi-computers: logically separated addressing spaces; each node is just like an independent computer, with its own resources, which are not accessed by the remaining processing nodes.

Distributed Shared Memory (DSM)
Processors share the same addressing space:
- A given physical address points to the same memory position in all the existing processors;
- Memory accesses use load and store instructions, independently of the target memory device (either local or remote);
- The access time depends on the target memory device: local or remote (NUMA).

Multi-Computers
Each processor has its own resources and addressing space, working just like an independent computer:
- It is not different from a cluster;
- Data transfer between processors requires a specific communication system to exchange messages between the processors (e.g., remote procedure calls, RPC), as sketched below.
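As a minimal illustration of explicit message passing (my own sketch; POSIX pipes stand in here for the interconnect between two nodes), the sender ships its local data to the receiver instead of letting the receiver issue a remote load:

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int link[2];
        char buf[64];

        if (pipe(link) != 0)              /* the "interconnect" */
            return 1;

        if (fork() == 0) {                /* node 1: the receiver */
            ssize_t n = read(link[0], buf, sizeof buf - 1);
            if (n < 0)
                _exit(1);
            buf[n] = '\0';
            printf("node 1 received: %s\n", buf);
            _exit(0);
        }

        /* node 0: the sender explicitly ships its local data */
        const char *msg = "block M[X] = 1";
        write(link[1], msg, strlen(msg));
        wait(NULL);                       /* join node 1 */
        return 0;
    }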

DSM vs Multi-Computers
Advantages of DSM:
- Ease of programming and simplification of the compiler;
- Lower communication cost when reduced data volumes are transferred;
- Natural use of the caches.
Advantages of Multi-Computers:
- Simpler hardware;
- Explicit communication;
- Ease of emulating DSM.

Coherency in DSM
Snooping protocols are not viable! Solutions:
- Only private data is stored in cache;
- Directory based protocols.

Coherency in DSM
Implications of only storing private data in cache:
- Reduction of the cache hit-rate;
- By software, it is possible to convert shared data into private data (by copying the block from the remote memory):
  - Simplified hardware;
  - There is little support in current compilers: it is left to the programmer's responsibility!
- However:
  - Very complex implementation;
  - Conservative approach: in case of doubt, the block is considered to be shared.

Coherency in DSM
Implications of adopting directory based protocols:
- Information about the whole set of shared blocks: where they are and whether they have been modified;
- Alternative: distribute the directory in order to reduce the contention in accessing it: each processor keeps local information concerning the set of shared blocks that are stored in its memory;
- Optimization for massive parallel systems (>200 processors): only keep information about the blocks that are effectively in use.

Directory Based Protocols
Block status definition:
- Uncached: no processor has a copy of the cache block;
- Shared: one or more processors have the block cached;
- Exclusive: exactly one processor has a copy of the cache block, and it has written the block.
Operations:
- Handle read misses;
- Handle writes to shared blocks.
(Write misses correspond to these two, in sequence.)

New Problems in DSM Architectures
There is no common bus:
- The bus cannot be used to arbitrate (serialize) the accesses;
- The operations are no longer atomic.
The protocol is implemented with messages:
- All requests must have explicit answers (see the message set sketched below).
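A plausible message set for such a protocol (an assumption, added for illustration; the slides do not enumerate one) makes the request/answer pairing explicit:

    #include <stdio.h>

    /* hypothetical message types for a bus-less directory protocol */
    enum msg_type {
        READ_MISS,    /* node -> home directory: request a shared copy */
        WRITE_MISS,   /* node -> home directory: request ownership     */
        INVALIDATE,   /* directory -> sharer: drop your copy           */
        INV_ACK,      /* sharer -> directory: explicit answer          */
        FETCH,        /* directory -> owner: write the block back      */
        DATA_REPLY    /* directory/owner -> requester: block data      */
    };

    struct msg {
        enum msg_type type;
        int src, dst;        /* node identifiers */
        unsigned long addr;  /* block address    */
    };

    int main(void)
    {
        /* a write miss by node 2 on block 0x40, sent to home node 0 */
        struct msg m = { WRITE_MISS, 2, 0, 0x40 };
        printf("msg: type=%d %d -> %d, addr=0x%lx\n",
               m.type, m.src, m.dst, m.addr);
        return 0;
    }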

Next Class
- Synchronization and Multi-Processor Systems;
- SIMD Architectures (examples):
  - Cell (STI - Sony, Toshiba, IBM);
  - GPUs (NVidia, ATI).
