INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides (English)

Lecture 11
Title: Multiprocessors - Classification and Shared Memory Architectures
Summary: Multiprocessor classification; MIMD architectures (shared memory and distributed memory); memory coherency and consistency.

2010/2011
Nuno.Roma@ist.utl.pt
Architectures for Embedded Computing
Multiprocessors: Classification and Shared Memory Architectures
Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class

In the previous class...
- Multiple-issue processors;
- Superscalar processors;
- Very Long Instruction Word (VLIW) processors;
- Code optimization for multiple-issue processors;
- Multi-threading.
Summary

Today:
- Multiprocessor classification;
- MIMD architectures:
  - Shared memory;
  - Distributed memory:
    - Distributed shared memory;
    - Multi-computers;
- Memory coherency and consistency.

Bibliography: Computer Architecture: a Quantitative Approach, Chapter 4.
Parallel Processing

Objectives:
- Greater performance;
- Efficient use of silicon resources;
- Reduction of power consumption.

Implementation:
- Better use of the silicon area, by integrating several processors (cores) on a single chip: Chip Multiprocessor (CMP);
- Interconnection of several independent processors (e.g.: clusters, grids, etc.).

Difficulties:
- Parallelizing the software...

Example: homogeneous multi-core processor.
Classification of Multi-Processor Systems

Type                                | Architecture  | Management          | Examples
General Purpose Processors (GPP)    | Homogeneous   | Hardware            | Intel, AMD, IBM Power, SUN, etc. multi-core families
Dedicated Processors / Accelerators | Heterogeneous | Hardware + Software | Cell (PS3); GPUs (NVidia); FPGA/ASIC dedicated accelerators

Parallelism Levels

- Simultaneous execution of several sequential instruction phases: Pipelining;
- Parallel execution of the instructions of a given application in a single processor: Superscalar and VLIW processors;
- Parallel execution on several processors in a single computer: Multiprocessors;
- Parallel execution on several computers: Clusters, Grids.
Multiprocessor Classes

- SISD (Single Instruction, Single Data): the uniprocessor case;
- SIMD (Single Instruction, Multiple Data): the same instruction is executed by the several processors, but each processor operates on an independent data set: Vector Architectures;
- MISD (Multiple Instruction, Single Data): each processor executes a different instruction, but all process the same data set: there is no commercial solution of this type;
- MIMD (Multiple Instruction, Multiple Data): each processor executes independent instructions over an independent data set.
MIMD Architectures

More popular due to:
- Greater flexibility;
- Same components as the uni-processors.

MIMD architectures can be divided into two classes:
- Shared memory (e.g.: multi-core processors);
- Distributed memory (e.g.: clusters, grids, etc.).

Shared Memory

Shared memory architecture, also known as:
- Uniform Memory Access (UMA), or
- Symmetric Shared-Memory Multiprocessors (SMP).
Distributed Memory

Distributed memory architecture, also known as: Non-Uniform Memory Access (NUMA).

Shared vs Distributed Memory

In distributed architectures, most memory accesses are done in the local memory:
- Allows greater memory access bandwidth;
- Reduction of the access time.

However:
- Communication between processors is more complex;
- Increased access time to the data stored in the other processors' local memories.
Shared Memory Architectures

Uniform Memory Access (UMA) or Symmetric Shared-Memory Multiprocessors (SMP).

Example: homogeneous multi-core processor. Memory sharing:
- Level 1 Caches (L1) - private;
- Level 2 Caches (L2) - private (e.g.: AMD) or shared (e.g.: Intel);
- Level 3 Cache (L3) - shared;
- Main memory - shared.
Memory Coherency

Example, considering a write-through cache:

Time | Event
1    | uP A reads M[X]
2    | uP B reads M[X]
3    | uP A writes 0 into M[X]   (the cache of uP B now holds a stale copy of M[X])

In multi-processors, the migration and the replication of data are normal and expected events.

A memory system is said to be coherent when a read from a given memory position returns the most recent value that was written into that position.

Coherency:
- Defines which values can be returned by a read;
- Concerns the read and write behavior for a given memory position by a given processor.

Consistency:
- Defines when a written value is returned by a subsequent read;
- Concerns the read and write behavior for a given memory position by several different processors (synchronization).
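The stale-copy scenario above can be sketched as a toy model (all class and variable names here are hypothetical, for illustration only): two private write-through caches in front of a shared memory, with no coherency mechanism at all.

```python
# Minimal sketch (hypothetical model): two private write-through caches
# over a shared memory, with NO coherency protocol. After uP A writes,
# uP B's cached copy of M[X] is stale.

class Cache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store
        self.lines = {}        # address -> cached value

    def read(self, addr):
        if addr not in self.lines:     # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):      # write-through: update cache and memory
        self.lines[addr] = value
        self.memory[addr] = value

memory = {'X': 1}
cache_a, cache_b = Cache(memory), Cache(memory)

cache_a.read('X')         # uP A reads M[X] -> 1
cache_b.read('X')         # uP B reads M[X] -> 1
cache_a.write('X', 0)     # uP A writes 0 into M[X]

print(cache_b.read('X'))  # stale hit: prints 1, although...
print(memory['X'])        # ...M[X] in memory is now 0
```

This incoherent outcome is precisely what the protocols discussed next are designed to prevent.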
A memory system is said to be coherent if:
- A read of M[X] by processor P, after a write to M[X] by P, always returns the value written by P, provided that no other processor has written to M[X] between the write and the read;
- A read of M[X] by Pi, after a write to M[X] by Pj, always returns the value written by Pj, if the read and write are sufficiently separated in time and no other writes to M[X] occur between the two accesses;
- Writes to the same location are serialized: two writes to the same location by any two processors are seen in the same order by all processors.

Memory Consistency

The consistency model of a memory system defines when a change to a memory position will be seen by all processors.

P1: A = 0;            P2: B = 0;
    ...                   ...
    A = 1;                B = 1;
L1: if (B == 0) ...   L2: if (A == 0) ...
What happens if a processor is allowed to proceed while the (slower) write operation is still taking place (e.g.: by using write-buffers)? It is possible that both P1 and P2 fail to observe the most recent values of B and A before evaluating the test conditions, so both branches may be taken.

Sequential consistency: the program only proceeds after all processors have been informed about the write operation.
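The P1/P2 example can be checked exhaustively with a small sketch (the operation names and `run` helper are hypothetical, for illustration): under sequential consistency each processor's write precedes its own read, and enumerating all such interleavings shows that both `if` conditions can never hold at once; with write-buffers, a read may overtake the buffered write, and both conditions become true.

```python
# Minimal sketch (hypothetical model) of the P1/P2 consistency example.
from itertools import permutations

def run(schedule):
    """Execute one global order of the four operations; return (L1, L2)."""
    mem = {'A': 0, 'B': 0}
    obs = {}
    for op in schedule:
        if op == 'P1_write':   mem['A'] = 1
        elif op == 'P2_write': mem['B'] = 1
        elif op == 'P1_read':  obs['L1'] = (mem['B'] == 0)
        elif op == 'P2_read':  obs['L2'] = (mem['A'] == 0)
    return obs['L1'], obs['L2']

# Sequential consistency: each processor's write precedes its own read.
sc_outcomes = {run(s) for s in permutations(
                   ['P1_write', 'P1_read', 'P2_write', 'P2_read'])
               if s.index('P1_write') < s.index('P1_read')
               and s.index('P2_write') < s.index('P2_read')}

# Under SC, (True, True) - both ifs taken - is impossible:
print((True, True) in sc_outcomes)   # False

# With write-buffers, both reads may bypass the buffered writes:
buffered = run(['P1_read', 'P2_read', 'P1_write', 'P2_write'])
print(buffered)                      # (True, True)
```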
Coherency Protocols

The coherency protocols keep and check the status of the shared memory blocks:
- Snooping protocols: each cache holds a copy of the shared block's data and of the corresponding sharing status; there is no centralized status. Each cache controller listens to the memory bus, to determine whether or not it holds a copy of the block that is being requested on the bus:
  - Write-invalidate protocols;
  - Write-update (or broadcast) protocols;
- Directory-based protocols: the status of each shared block is kept in a centralized directory.

Snooping + Write-Invalidate Protocols

Example, considering a write-through cache:

uP Action             | Bus Action        | Cache uP A | Cache uP B
uP A reads M[X]       | Miss in X         | copy of X  |
uP B reads M[X]       | Miss in X         | copy of X  | copy of X
uP A writes 1 to M[X] | Invalidation of X | 1          | (invalid)
uP B reads M[X]       | Miss in X         | 1          | 1
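The invalidation sequence in the table above can be sketched as a toy simulator (class and variable names are hypothetical): a write goes through to memory and broadcasts an invalidation on the bus, so the other processor's next read misses and fetches the up-to-date value.

```python
# Minimal sketch (hypothetical model) of a snooping write-invalidate
# protocol over write-through caches.

class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):         # write-through + invalidate
        self.bus.invalidate(addr, source=self)
        self.lines[addr] = value
        self.memory[addr] = value

class Bus:
    def __init__(self):
        self.caches = []
    def invalidate(self, addr, source):   # every other cache drops its copy
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(addr, None)

memory = {'X': 0}
bus = Bus()
cache_a, cache_b = SnoopingCache(bus, memory), SnoopingCache(bus, memory)

cache_a.read('X')        # uP A reads M[X]  -> miss, loads 0
cache_b.read('X')        # uP B reads M[X]  -> miss, loads 0
cache_a.write('X', 1)    # uP A writes 1    -> invalidation of X in uP B
print(cache_b.read('X')) # miss again: prints 1 (no stale copy survives)
```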
With write-back caches, snooping also has to be used on memory reads, since the cache that holds the most recent data of the block has to transfer it onto the bus;
The access to the memory bus imposes a natural serialization of the simultaneous write operations;
The invalidation can be optimized by using an extra bit in the cache that indicates whether that block's data is being shared or not:

Valid | Shared | Meaning
0     | -      | Invalid: the most recent value of that block is not present
1     | 1      | Shared: that block is currently stored in several caches
1     | 0      | Exclusive: currently, that block is only stored in this cache

Snooping + Write-Update Protocols

Also known as Broadcast Protocols:

uP Action             | Bus Action     | Cache uP A | Cache uP B
uP A reads M[X]       | Miss in X      | copy of X  |
uP B reads M[X]       | Miss in X      | copy of X  | copy of X
uP A writes 1 to M[X] | Broadcast of X | 1          | 1
uP B reads M[X]       | (hit)          | 1          | 1
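For contrast with the invalidate variant, a write-update protocol can be sketched the same way (again, hypothetical names): instead of dropping remote copies, a write pushes the new value to every cache that holds the block, so the other processor's next read hits.

```python
# Minimal sketch (hypothetical model) of a snooping write-update
# (broadcast) protocol over write-through caches.

class UpdateCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):             # write-through + broadcast
        self.bus.broadcast(addr, value, source=self)
        self.lines[addr] = value
        self.memory[addr] = value

class Bus:
    def __init__(self):
        self.caches = []
    def broadcast(self, addr, value, source):  # update every other copy
        for cache in self.caches:
            if cache is not source and addr in cache.lines:
                cache.lines[addr] = value

memory = {'X': 0}
bus = Bus()
cache_a, cache_b = UpdateCache(bus, memory), UpdateCache(bus, memory)

cache_a.read('X'); cache_b.read('X')   # both load 0
cache_a.write('X', 1)                  # broadcast of X: uP B's copy updated
print(cache_b.read('X'))               # hit: prints 1, no extra bus miss
```

The trade-off discussed next follows directly: every write here costs a broadcast, while the invalidate version pays a single invalidation and only a later miss if the block is actually re-read.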
Comparison of Protocols

- Multiple writes to a given address cause:
  - Broadcast protocol: multiple broadcasts;
  - Write-invalidate snooping protocol: only one invalidation.
- Each write to a given shared block causes:
  - Broadcast protocol: one broadcast;
  - Write-invalidate snooping protocol: only one invalidation, corresponding to the first word that is written in that block.
- The delay between a write and a subsequent read (by another processor) is smaller with the broadcast protocol.
- Invalidation protocols are by far the most used, since they require a much smaller bandwidth on the memory bus.
Directory-Based Protocols

In a directory-based protocol, the status of each block is kept in a centralized directory.

Operations (just as before):
- Handle read misses;
- Handle writes to shared blocks;
- (Write misses correspond to these two, in sequence.)

Block status definition:
- Uncached: no processor has a copy of the cache block;
- Shared: one or more processors have the block cached, and the value in memory is up to date (as well as in all the caches);
- Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date. That processor is called the owner of the block.
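The three block states can be exercised with a small sketch of one directory entry (a simplified, hypothetical model: write-backs and the actual data transfers are abstracted away, only the state bookkeeping is shown).

```python
# Minimal sketch (hypothetical model) of a directory entry tracking one
# block through the Uncached / Shared / Exclusive states.

class DirectoryEntry:
    def __init__(self):
        self.state = 'Uncached'
        self.sharers = set()    # processors holding a copy
        self.owner = None       # meaningful only in the Exclusive state

    def read_miss(self, proc):
        if self.state == 'Exclusive':
            # owner writes the block back; both now share a clean copy
            self.sharers = {self.owner, proc}
            self.owner = None
        else:
            self.sharers.add(proc)
        self.state = 'Shared'

    def write(self, proc):
        # invalidate every other sharer; the requester becomes the owner
        self.sharers = {proc}
        self.owner = proc
        self.state = 'Exclusive'

entry = DirectoryEntry()
entry.read_miss('P0'); entry.read_miss('P1')
print(entry.state, sorted(entry.sharers))  # Shared ['P0', 'P1']
entry.write('P1')
print(entry.state, entry.owner)            # Exclusive P1
entry.read_miss('P0')
print(entry.state, sorted(entry.sharers))  # Shared ['P0', 'P1']
```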
Performance of UMA Architectures

In multi-processors with a shared central memory, the contention to access the memory bus reduces the performance of each processor. In systems with write-invalidate snooping protocols:
- Increase of the number of invalidated cache positions;
- Greater miss-rate;
- Increase of the number of accesses to the central memory.

Cache misses:
- Compulsory;
- Capacity;
- Conflict;
- Coherency:
  - True sharing: the word is really shared;
  - False sharing: miss due to simultaneous accesses by different processors to distinct words that belong to the same block.

Global performance depends on:
- Number of processors;
- Capacity of each cache;
- Cache block size.
(to be seen in the next classes)
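False sharing, in particular, is worth a concrete sketch (a hypothetical block-granularity model: only block presence and miss counts are tracked): two processors repeatedly write distinct words of the same block, and every single write misses because each write invalidates the other's copy of the whole block.

```python
# Minimal sketch (hypothetical model) of false sharing: coherency acts at
# block granularity, so writes by two processors to DISTINCT words of the
# SAME block keep invalidating each other's copy.

BLOCK_SIZE = 2                       # words per cache block

class Cache:
    def __init__(self, all_caches):
        self.blocks = set()          # block numbers currently cached
        self.misses = 0
        self.all_caches = all_caches
        all_caches.append(self)

    def write(self, word_addr):
        block = word_addr // BLOCK_SIZE
        if block not in self.blocks:         # coherency (or cold) miss
            self.misses += 1
            self.blocks.add(block)
        for other in self.all_caches:        # write-invalidate, block level
            if other is not self:
                other.blocks.discard(block)

caches = []
p0, p1 = Cache(caches), Cache(caches)

for _ in range(4):       # P0 writes word 0, P1 writes word 1: same block!
    p0.write(0)
    p1.write(1)

print(p0.misses, p1.misses)   # 4 4 -> every write misses, yet no word
                              # is ever actually shared
```

This is also why the cache block size listed above matters for global performance: larger blocks raise the chance that unrelated words collide in one block.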
Distributed Memory Architectures

Distributed memory architecture, also known as: Non-Uniform Memory Access (NUMA).
Examples: Clusters & Grids.

In distributed memory architectures, it is necessary to transfer the data between the several different memories. Two approaches are usually adopted to manage this transfer:
- Distributed Shared Memory (DSM): the memories are physically separated, but they are logically accessed in the same addressing space;
- Multi-computers: logically separated addressing spaces; each node is just like an independent computer, with its own resources, which are not accessed by the remaining processing nodes.
Distributed Shared Memory (DSM)

Processors share the same addressing space:
- A given physical address points to the same memory position in all the existing processors;
- Memory accesses are done with load and store instructions, independently of the target memory device (either local or remote);
- Access time depends on the target memory device, either local or remote (NUMA).

Multi-Computers

Each processor has its own resources and addressing space, working just like an independent computer:
- It is not different from a cluster;
- Data transfer between processors requires a specific communication system to exchange messages between the processors (e.g.: remote procedure calls, RPC).
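The multi-computer approach can be sketched with a toy message-passing model (all names are hypothetical, and the "RPC" here is a deliberately simplified synchronous request/reply over in-process queues): no address is visible across nodes, so remote data is obtained only through an explicit message exchange.

```python
# Minimal sketch (hypothetical model) of the multi-computer approach:
# each node owns a private memory; remote data travels only in messages.
import queue

class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}            # private addressing space
        self.inbox = queue.Queue()  # messages from other nodes

    def remote_read(self, other, addr):
        # send a request, let the remote node serve it, wait for the reply
        other.inbox.put(('read', addr, self))
        other.serve_one()
        return self.inbox.get()

    def serve_one(self):
        kind, addr, sender = self.inbox.get()
        if kind == 'read':
            sender.inbox.put(self.memory[addr])

n0, n1 = Node('n0'), Node('n1')
n1.memory['X'] = 42            # 'X' lives only in n1's private memory

print('X' in n0.memory)        # False: no shared addressing space
value = n0.remote_read(n1, 'X')
print(value)                   # 42, fetched via explicit message exchange
```

Under DSM, by contrast, the same access would be an ordinary load to a remote address, with the extra latency hidden in the memory system rather than in explicit messages.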
DSM vs Multi-Computers

Advantages of DSM:
- Ease of programming and simplification of the compiler;
- Lower communication cost when small data volumes are transferred;
- Natural use of the caches.

Advantages of Multi-Computers:
- Simpler hardware;
- Explicit communication;
- Ease of emulating DSM.

Coherency in DSM

Snooping protocols are not viable! Solutions:
- Only private data is stored in cache;
- Directory-based protocols.
Implications of only caching private data:
- Reduction of the cache hit-rate.
- In software, it is possible to convert shared data into private data (by copying the block from the remote memory):
  - Simplified hardware;
  - There is little support in current compilers: it is left to the programmer's responsibility!
- However:
  - Very complex implementation;
  - Conservative approach: in case of doubt, the block is considered to be shared.

Implications of adopting directory-based protocols:
- Information must be kept about the whole set of shared blocks: where they are and whether they have been modified;
- Alternative: distribute the directory, in order to reduce the contention in accessing it: each processor keeps local information concerning the set of shared blocks that are stored in its memory;
- Optimization for massively parallel systems (>200 processors): only keep information about the blocks that are effectively in use.
Directory-Based Protocols

Block status definition:
- Uncached: no processor has a copy of the cache block;
- Shared: one or more processors have the block cached;
- Exclusive: exactly one processor has a copy of the cache block, and it has written the block.

Operations:
- Handle read misses;
- Handle writes to shared blocks;
- (Write misses correspond to these two, in sequence.)

New Problems in DSM Architectures

There is no common bus:
- The bus cannot be used to arbitrate (serialize) the accesses;
- The operations are no longer atomic.

The protocol is implemented with messages:
- All requests must have explicit answers.
Next class:
- Synchronization and Multi-Processor Systems;
- SIMD Architectures (examples):
  - Cell (STI - Sony, Toshiba, IBM);
  - GPUs (NVidia, ATI).
More informationModule 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:
The Lecture Contains: Four Organizations Hierarchical Design Cache Coherence Example What Went Wrong? Definitions Ordering Memory op Bus-based SMP s file:///d /...audhary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture10/10_1.htm[6/14/2012
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationOrganisasi Sistem Komputer
LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple
More informationMul$processor Architecture. CS 5334/4390 Spring 2014 Shirley Moore, Instructor February 4, 2014
Mul$processor Architecture CS 5334/4390 Spring 2014 Shirley Moore, Instructor February 4, 2014 1 Agenda Announcements (5 min) Quick quiz (10 min) Analyze results of STREAM benchmark (15 min) Mul$processor
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message
More informationMultiprocessors - Flynn s Taxonomy (1966)
Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The
More information3/13/2008 Csci 211 Lecture %/year. Manufacturer/Year. Processors/chip Threads/Processor. Threads/chip 3/13/2008 Csci 211 Lecture 8 4
Outline CSCI Computer System Architecture Lec 8 Multiprocessor Introduction Xiuzhen Cheng Department of Computer Sciences The George Washington University MP Motivation SISD v. SIMD v. MIMD Centralized
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationCMPE 511 TERM PAPER. Distributed Shared Memory Architecture. Seda Demirağ
CMPE 511 TERM PAPER Distributed Shared Memory Architecture by Seda Demirağ 2005701688 1. INTRODUCTION: Despite the advances in processor design, users still demand more and more performance. Eventually,
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationIntroduction II. Overview
Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 03 Title: Processor
More informationParallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence
Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationMulti-Processor / Parallel Processing
Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms
More informationLect. 2: Types of Parallelism
Lect. 2: Types of Parallelism Parallelism in Hardware (Uniprocessor) Parallelism in a Uniprocessor Pipelining Superscalar, VLIW etc. SIMD instructions, Vector processors, GPUs Multiprocessor Symmetric
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 24 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 2: More Multiprocessors Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory
More informationNon-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.
CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationLecture 17: Multiprocessors. Topics: multiprocessor intro and taxonomy, symmetric shared-memory multiprocessors (Sections )
Lecture 17: Multiprocessors Topics: multiprocessor intro and taxonomy, symmetric shared-memory multiprocessors (Sections 4.1-4.2) 1 Taxonomy SISD: single instruction and single data stream: uniprocessor
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 04
More informationFlynn s Classification
Flynn s Classification SISD (Single Instruction Single Data) Uniprocessors MISD (Multiple Instruction Single Data) No machine is built yet for this type SIMD (Single Instruction Multiple Data) Examples:
More informationComp. Org II, Spring
Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel
More informationParallel Processing & Multicore computers
Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationLecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections
Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationComp. Org II, Spring
Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer
More informationPage 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence
SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationDr. Joe Zhang PDC-3: Parallel Platforms
CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model
More informationLimitations of parallel processing
Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors
More informationMultiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.
Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network
More informationCOSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University
COSC4201 Multiprocessors and Thread Level Parallelism Prof. Mokhtar Aboelaze York University COSC 4201 1 Introduction Why multiprocessor The turning away from the conventional organization came in the
More informationMultiprocessors 1. Outline
Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 21
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationDISTRIBUTED SHARED MEMORY
DISTRIBUTED SHARED MEMORY COMP 512 Spring 2018 Slide material adapted from Distributed Systems (Couloris, et. al), and Distr Op Systems and Algs (Chow and Johnson) 1 Outline What is DSM DSM Design and
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationCSCI 4717 Computer Architecture
CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationAleksandar Milenkovich 1
Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)
1 MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) Chapter 5 Appendix F Appendix I OUTLINE Introduction (5.1) Multiprocessor Architecture Challenges in Parallel Processing Centralized Shared Memory
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationMultiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University
A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor
More informationCache Coherence in Bus-Based Shared Memory Multiprocessors
Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationInterconnect Routing
Interconnect Routing store-and-forward routing switch buffers entire message before passing it on latency = [(message length / bandwidth) + fixed overhead] * # hops wormhole routing pipeline message through
More informationLecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More information