Cache Coherency and Interconnection Networks
1 Cache Coherency and Interconnection Networks. Cluster and Grid Computing, Autumn Semester ( ), 7th August 2006. Umang Jain, Kumar Puspesh, Pankaj Jajoo, Amar Kumar Dani. 03CS CS CS CS304
2 CACHE COHERENCY
3 The Cache Coherence Problem: Caches improve performance by keeping frequently used data in faster memory. Since all processors share the same address space, more than one cache may hold a copy of the same block of data. If one processor updates a data item without informing the others, the copies become inconsistent and programs may execute incorrectly.
4 The Cache Coherence Problem: For correct execution, coherence must be enforced between the caches. The primary design issues are:
- the coherence detection strategy (or rather, the incoherence detection strategy)
- the coherence enforcement strategy
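5 Enforcement Strategies: Write Invalidate Strategy. After a write, all other caches effectively no longer contain the data block. (Figure: the block X is replaced by the new value where it is written, while the copies in the other caches go from X to I, invalid.)
6 Enforcement Strategies: Write Update Strategy. After a write, all other caches are updated. (Figure: the copies of X are replaced by the new value; one transition in the figure is marked ??.)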
7 Cache Coherence Protocols: For invalidation or update, consistency commands have to be issued to the various processor caches. Two options:
- Broadcast the commands for all caches to listen to
- Multicast the commands only to those caches holding a copy of the data block in question
8 Snoopy Cache Protocol: All caches snoop on the bus for all consistency commands. (Figure: MEMORY and three processors P, each with its own cache, attached to a shared bus.)
9 Snoopy Cache Protocol: Snooping protocols rely on a shared bus between the processors for coherence. On a processor write, the write is passed through the cache to main memory over the bus, and invalidation or update commands are broadcast on the bus. The cache controller of any processor caching that address updates or invalidates its cache entry as appropriate.
10 Directory Based Schemes: Consistency commands are issued only to those caches holding a copy of the data block, which requires book-keeping. (Figure: three nodes, each with a processor P, a cache, and a memory module M with its directory DIR.)
11 Directory Based Schemes: A central directory may contain copies of the local cache directories, each holding the state information for the different blocks. A presence flag vector associated with each memory block has one bit for each cache, which eliminates the need for an exhaustive search. The flag vector and state information, along with the identity of the current owner, may also be stored locally; this reduces the directory contention problem.
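The presence flag vector can be pictured with a small sketch. This is only an illustration under assumed names (`DirectoryEntry`, `record_read`, `record_write` are hypothetical, not taken from the slides): one boolean per cache, plus the identity of the current owner, so that a write only has to contact the caches whose bit is set.

```python
class DirectoryEntry:
    """Directory state for one memory block (hypothetical toy model)."""

    def __init__(self, n_caches):
        self.presence = [False] * n_caches  # one presence bit per cache
        self.owner = None                   # cache holding the only updated copy

    def sharers(self):
        return [c for c, bit in enumerate(self.presence) if bit]

    def record_read(self, cache):
        """Mark `cache` as a sharer; return the owner that must supply data, if any."""
        if self.owner == cache:
            return None                     # the owner reads its own copy
        supplier = self.owner
        self.owner = None                   # the block is shared again
        self.presence[cache] = True
        return supplier

    def record_write(self, cache):
        """Return the caches that must be invalidated; `cache` becomes the owner."""
        targets = [c for c in self.sharers() if c != cache]
        self.presence = [c == cache for c in range(len(self.presence))]
        self.owner = cache
        return targets


entry = DirectoryEntry(8)
entry.record_read(1)
entry.record_read(4)
print(entry.record_write(4))   # only cache 1 receives an invalidation
```

With eight caches and two sharers, the write sends one multicast invalidation instead of seven broadcast messages, which is the saving the directory is meant to provide.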
12 Write Invalidate: An Example. We can associate a state with each block of data in a cache:
- Invalid: inconsistent
- Valid: has not been updated locally; consistent
- Reserved: updated locally once; consistent only with the memory copy
- Dirty: consistent with none; it is the only updated copy
13 Write Invalidate: An Example. Each cache controller executes a simple FSM, switching states on receiving messages.
Commands: mem_rd, mem_wr, p_rd, p_wr, wr_inv, rd_inv
Cases:
- Read miss: read from memory (valid)
- Write hit: update; all other copies are invalidated
- Write miss: read and update; all other copies are invalidated
- Replacement: if the state is dirty, write back
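The four states and four cases can be put together as a small simulation. This is a minimal sketch for a single shared block, not the slides' own implementation: the class name and helper methods are hypothetical, and the exact transitions (first write hit goes to Reserved with a write-through, later writes go to Dirty, a write miss goes straight to Dirty) are assumptions read off the state descriptions above.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    VALID = "V"
    RESERVED = "R"
    DIRTY = "D"


class WriteInvalidateSystem:
    """One shared block, one memory copy, n caches (hypothetical toy model)."""

    def __init__(self, n_caches, initial_value):
        self.memory = initial_value
        self.state = [State.INVALID] * n_caches
        self.data = [None] * n_caches

    def _fetch(self, requester):
        # Bus read: a DIRTY holder writes back first; RESERVED/DIRTY copies become VALID.
        for c, s in enumerate(self.state):
            if c != requester and s is State.DIRTY:
                self.memory = self.data[c]
            if c != requester and s in (State.DIRTY, State.RESERVED):
                self.state[c] = State.VALID
        return self.memory

    def _invalidate_others(self, writer):
        for c in range(len(self.state)):
            if c != writer:
                self.state[c], self.data[c] = State.INVALID, None

    def read(self, c):
        if self.state[c] is State.INVALID:      # read miss
            self.data[c] = self._fetch(c)
            self.state[c] = State.VALID
        return self.data[c]

    def write(self, c, value):
        s = self.state[c]
        if s is State.INVALID:                  # write miss: rd_inv, then update
            self._fetch(c)
            self._invalidate_others(c)
            self.state[c] = State.DIRTY
        elif s is State.VALID:                  # first write hit: wr_inv + write-through
            self._invalidate_others(c)
            self.memory = value
            self.state[c] = State.RESERVED
        else:                                   # RESERVED or DIRTY: purely local write
            self.state[c] = State.DIRTY
        self.data[c] = value

    def replace(self, c):
        if self.state[c] is State.DIRTY:        # write back before eviction
            self.memory = self.data[c]
        self.state[c], self.data[c] = State.INVALID, None


system = WriteInvalidateSystem(3, "X")
for cache in range(3):
    system.read(cache)                          # all three copies VALID
system.write(0, "Y")                            # cache 0 RESERVED, others INVALID
system.write(0, "Z")                            # cache 0 DIRTY, memory still holds Y
print([s.value for s in system.state], system.memory)
```

Only one block is modelled, so address lookup and the bus itself are left out; the point is the order of state changes, not timing.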
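14 Write Invalidate: An Example. (Figure: all caches start with X in state VALID; one processor writes Y and issues wr_inv, after which its copy is RESERVED and the other copies are INVALID.)
15 Write Invalidate: An Example. (Figure: a p_wr changing Y to Z and a mem_rd by P2; states shown: DIRTY, VALID, INVALID, INVALID.)
16 Write Invalidate: An Example. (Figure: P3 issues rd_inv and then p_wr; its copy goes I -> Y and then Y -> Z; states shown: RESERVED, INVALID, DIRTY, INVALID.)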
17 Snoopy vs Directory Schemes: Snoopy protocols are not suited to general topologies. As the number of processors increases, bus traffic begins to pose serious problems. Directory based schemes are much better for large numbers of processors: they are more scalable and generate much less bus traffic. The price is ease of implementation, because of the increased hardware complexity.
18 Cache Coherent Network Architectures: Hierarchical Bus/Cache Architecture. (Figure: three memory modules M, each above a cache that serves a group of three processor/cache nodes P/C.)
19 INTERCONNECTION NETWORKS
20 Interconnection Networks: Tree, Mesh, Hypercube, Tree of Meshes, Mesh of Trees, Fat-Trees, 2D-Torus
21 TREE TOPOLOGY
22 Trees: A General Purpose Topology
Advantages:
1. Easy to implement.
2. For any irregular topology, it is easy to define a tree that spans the whole graph.
Disadvantage:
1. The root and the nodes close to it become a bottleneck.
23 Trees (contd.) Binary Tree Networks (a) Static (b) Dynamic
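24 Trees (contd.)
Diameter: for static trees, $d = 2\log((p+1)/2)$; for dynamic trees, $d = 2\log p$, where $p$ is the total number of nodes.
Bisection width: clearly, it is equal to 1.
Degree: 1, 2 or 3 (for binary trees).
25 Trees (contd.) Parallel Algorithm for Matrix-Vector Multiplication
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$$
Or, we can write it as
$$v_i = \sum_{j=1}^{n} a_{ij} \times u_j, \qquad 1 \le i \le m$$
26 Trees (contd.) (Figure: a binary tree of seven processors. The leaves P1 to P4 hold u1 to u4 and receive the entries of A row by row; P5 and P6 are intermediate nodes; v1, v2, v3 emerge from the root P7.)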
27 Trees (contd.) Example A = 2 and U =
28 Trees (contd.)
procedure TREE_MULTIPLICATION (A, U, V)
do steps 1 and 2 in parallel
(1) for i = 1 to n do in parallel        // for leaf nodes
      for j = 1 to m do
        (1.1) compute u_i x a_ji
        (1.2) send result to parent
      end for
    end for
(2) for i = n+1 to 2n-1 do in parallel   // for intermediate nodes
      while P_i receives two inputs do
        (2.1) compute sum of both inputs
        (2.2) if i < 2n-1 then send result to parent
              else produce result as output
              end if
      end while
    end for
29 Trees (contd.) Analysis:
1. It takes $\log n$ steps for $v_1$ to emerge from the root after the first row of A has entered at the leaves.
2. After $m-1$ further steps, $v_m$ emerges from the root.
Hence, TREE_MULTIPLICATION takes $(m + \log n)$ steps. The cost is $O(n^2)$ when $m < n$.
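The pipelined behaviour behind this step count can be checked with a short simulation. It is a sketch under stated assumptions: n is a power of two, the function name `tree_multiply` is hypothetical, and each tree level is modelled as a list of values that moves up by one level per step, with the leaf multiplication counted as a step of its own.

```python
import math


def tree_multiply(A, u):
    """Simulate pipelined matrix-vector multiplication on a binary tree.

    Returns (v, steps). Assumes len(u) is a power of two.
    """
    m, n = len(A), len(u)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    depth = int(math.log2(n))
    # levels[d] holds the values sitting at tree level d (0 = leaves, depth = root).
    levels = [None] * (depth + 1)
    v, steps, row = [], 0, 0
    while len(v) < m:
        steps += 1
        # Update from the root downwards so each level consumes what its
        # children produced in the previous step.
        for d in range(depth, 0, -1):
            below = levels[d - 1]
            levels[d] = None if below is None else [
                below[2 * i] + below[2 * i + 1] for i in range(len(below) // 2)
            ]
        # Leaves multiply the next row of A by their element of u.
        if row < m:
            levels[0] = [A[row][j] * u[j] for j in range(n)]
            row += 1
        else:
            levels[0] = None
        if levels[depth] is not None:
            v.append(levels[depth][0])
    return v, steps


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
u = [1, 2, 3, 4]
print(tree_multiply(A, u))   # ([30, 70, 110], 5)
```

For m = 3 and n = 4 the run takes 5 steps, which agrees with m + log n under this way of counting.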
30 Trees (contd.) Due to heavy traffic through the root node, the links near the root become a bottleneck. So some modifications are made to the standard tree network: the Fat-Tree.
31 MESH TOPOLOGY
32 Mesh: Array (1-D Mesh), Ring (1-D Torus), 2-D Mesh, 2-D Torus
33 Mesh (Contd.) Properties (a two-dimensional mesh):
- No. of processors = $k \times k$
- Distance: 1 to $2k-2$
- Diameter: $2k-2$
- Degree: 2 to 4
- Bisection width: $k$ if $k$ is even, $k+1$ if $k$ is odd
34 Torus Properties (a two-dimensional torus):
- No. of processors = $k \times k$
- Distance: 1 to $k$
- Diameter: $k$ if $k$ is even, $k-1$ if $k$ is odd (each dimension contributes at most $\lfloor k/2 \rfloor$ hops)
- Degree: 4
- Bisection width: $2k$ or $2k+2$
35 Mesh (Contd.) Matrix Multiplication :
36 to 44 Mesh (Contd.) (Figures: a worked example of the product A × B computed on the mesh.)
45 Mesh (Contd.) Matrix Multiplication
Procedure matrix_multiplication
  for each processor P_ij in parallel do
    c := 0
  end for
  repeat 3*N^(1/2) - 2 times
    for each processor P_ij in parallel do
      receive a from top
      receive b from left
      c := c + a*b
      send a down
      send b to right
    end for
  end repeat
46 Mesh (Contd.) Analysis: For an $m \times m$ matrix, the total computation time required is $3m-2$ steps. Each step takes constant time for the multiplication and transfer operations, so the time complexity is $O(m)$. Cost = $O(m^3)$.
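The $3m-2$ step count can be reproduced with a small simulation of the mesh. This is a sketch under assumptions that are not spelled out on the slides: the inputs are skewed so that row i of A enters i steps late and column j of B enters j steps late, rows of A enter from the left edge and columns of B from the top edge (the orientation under which P_ij accumulates C[i][j]), and the function name `mesh_multiply` is hypothetical.

```python
def mesh_multiply(A, B):
    """Simulate matrix multiplication on an m x m mesh; returns (C, steps)."""
    m = len(A)
    C = [[0] * m for _ in range(m)]
    a = [[None] * m for _ in range(m)]   # A-stream value currently held by P_ij
    b = [[None] * m for _ in range(m)]   # B-stream value currently held by P_ij
    steps = 3 * m - 2
    for t in range(steps):
        new_a = [[None] * m for _ in range(m)]
        new_b = [[None] * m for _ in range(m)]
        for i in range(m):
            for j in range(m):
                if j == 0:
                    k = t - i            # skew: row i of A starts i steps late
                    new_a[i][j] = A[i][k] if 0 <= k < m else None
                else:
                    new_a[i][j] = a[i][j - 1]      # passed on by the left neighbour
                if i == 0:
                    k = t - j            # skew: column j of B starts j steps late
                    new_b[i][j] = B[k][j] if 0 <= k < m else None
                else:
                    new_b[i][j] = b[i - 1][j]      # passed on by the upper neighbour
                if new_a[i][j] is not None and new_b[i][j] is not None:
                    C[i][j] += new_a[i][j] * new_b[i][j]
        a, b = new_a, new_b
    return C, steps


print(mesh_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# ([[19, 22], [43, 50]], 4)
```

The last processor, in the bottom-right corner, receives its final pair of operands at step 3m - 2, which is why the loop cannot stop earlier.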
47 Mesh (Contd.) Advantages:
1. There are multiple paths between any two nodes, so the network tolerates the failure of a specific node.
2. The topology supports many simultaneous messages, thanks to the multiplicity of paths.
3. The growth complexity is $2N^{1/2} + 1$, and the hardware of existing nodes does not need to change.
4. It is advantageous for problems involving calculations in n dimensions, for example image processing and finite element analysis.
48 Mesh (Contd.) Disadvantages:
1. Large diameter, $2N^{1/2} - 2$.
2. It is a non-uniform topology, i.e. there is a range of degrees, which increases the complexity of the routing algorithm.
49 Combination: Mesh of Trees, Tree of Meshes (quite similar to Fat-Trees)
50 HYPERCUBE TOPOLOGY
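51 Construction (Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D and 4-D, each built by joining two copies of the previous one.)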
52 Routing Algorithm
- Each node is given a node ID; an n-dimensional cube has n-bit node IDs.
- Sending a message from node A to node B takes at most n cycles.
- On cycle i, the node holding the message compares bit i of its own ID with bit i of the destination ID.
- If the bits match, the node holds on to the message for that cycle.
- If the bits do not match, it forwards the message along dimension i.
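The routing rule translates almost directly into bit operations. A minimal sketch, assuming node IDs are plain integers with bit 0 as the first dimension examined (the function name `route` is hypothetical):

```python
def route(src, dst, n):
    """Return the list of nodes visited when routing from src to dst
    in an n-dimensional hypercube, examining one bit per cycle."""
    path = [src]
    node = src
    for i in range(n):
        if ((node ^ dst) >> i) & 1:   # bit i differs from the destination
            node ^= 1 << i            # forward along dimension i
            path.append(node)
        # otherwise the node keeps the message for this cycle
    return path


print(route(0b000, 0b101, 3))   # [0, 1, 5]
```

The number of hops equals the number of bit positions in which the two IDs differ, so it can never exceed n.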
53 Properties
Advantages:
1. For a hypercube with $2^d$ nodes, the number of steps needed to send a message to any node is at most $d$.
2. The hypercube topology is highly scalable and node symmetric.
Disadvantages:
1. Difficult to implement.
2. Cannot be scaled up to include an arbitrary number of computers.
54 Metrics ($N = 2^n$ is the number of nodes)
- Diameter = $\log_2 N = n$
- Bisection width = $N/2$
- Cost = number of links = $(N/2)\log_2 N$
- Degree = $\log_2 N$
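These metrics can be checked by building a small hypercube explicitly. The sketch below is an illustration only (the function name `hypercube_metrics` is hypothetical, and n is assumed to be at least 1): it counts links, finds the diameter with a breadth-first search, and counts the links cut when the cube is split on its top dimension.

```python
from collections import deque


def hypercube_metrics(n):
    """Measure an n-dimensional hypercube (n >= 1) by brute force."""
    N = 1 << n
    neighbours = {x: [x ^ (1 << i) for i in range(n)] for x in range(N)}
    links = sum(len(v) for v in neighbours.values()) // 2
    degree = max(len(v) for v in neighbours.values())

    # Breadth-first search from node 0; by node symmetry every node
    # has the same greatest distance to the others.
    dist = {0: 0}
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for y in neighbours[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    diameter = max(dist.values())

    # Links crossing the cut that separates the two halves on the top bit.
    top = 1 << (n - 1)
    cut = sum(1 for x in range(N) for y in neighbours[x]
              if x < y and (x & top) != (y & top))
    return {"nodes": N, "links": links, "degree": degree,
            "diameter": diameter, "cut": cut}


print(hypercube_metrics(4))
# {'nodes': 16, 'links': 32, 'degree': 4, 'diameter': 4, 'cut': 8}
```

For n = 4 the measured values are 32 links, degree 4, diameter 4 and a cut of 8 links, matching the formulas with N = 16.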
55 Topological Properties useful for parallel algorithms: The recursive structure of hypercubes makes them ideal for recursive and divide-and-conquer problems. Multiple node-disjoint and edge-disjoint paths exist between many pairs of nodes in a hypercube.
56 Algorithm for Hypercube topology. Matrix Multiplication:
- To multiply two $m \times m$ matrices, $p = m^3 = 2^q$ processors are required.
- Each processor has 3 registers: $R_A$, $R_B$ and $R_C$.
- Each processor is labeled by 3 indices i, j, k, where each index is a q/3-bit binary number.
- Initially, processor (0, j, k) holds A(j, k) and B(j, k) in its $R_A$ and $R_B$ registers.
- At the end of the computation, register $R_C$ of processor (0, j, k) holds element C(j, k) of the product matrix C.
57 Example: A = 2 B = C (Expected) = A*B =
58 Example (Figure: register pairs (R_A, R_B) held by the eight processors at successive steps of the algorithm; the initial pairs are 1,5 2,6 3,7 4,8.)
59 Example (contd..) R(C) = R(A)*R(B)
60 Algorithm (processor x = ijk, 0 ≤ i, j, k < m; N_l(x) is the neighbour of x across dimension l, i.e. x with bit l complemented):
1. for l = q/3 - 1 downto 0, every processor x:
     if bit l of i is 1 { R_A[x] := R_A[N_{l+2q/3}(x)]; R_B[x] := R_B[N_{l+2q/3}(x)] }
2. for l = q/3 - 1 downto 0, every processor x:
     if bit l of i and bit l of k are different { R_A[x] := R_A[N_l(x)] }
61 Algorithm (contd..)
3. for l = q/3 - 1 downto 0, every processor x:
     if bit l of i and bit l of j are different { R_B[x] := R_B[N_{l+q/3}(x)] }
4. every processor x, 0 ≤ x < p: R_C[x] := R_A[x] × R_B[x]   { p = m³ parallel multiplications in one step }
5. for l = q/3 - 1 downto 0, every processor x:
     if bit l of i is 0 { R_C[x] := R_C[N_{l+2q/3}(x)] + R_C[x] }
62 Algorithm Complexity Analysis: For an $m \times m$ matrix, the total computation time required is $4(q/3)$ steps, where $q = \log_2 p = 3\log_2 m$. Each step takes constant time for the multiplication and transfer operations, so the time complexity is $O(q)$, i.e. $O(\log m)$. Cost = $O(m^3 \log m)$.
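The five phases can be traced with a register-level simulation. This is a sketch under several assumptions rather than the slides' own code: m is a power of two, processor x packs its indices as i (high bits), j, k (low bits), each loop runs over the q/3 bits of one index, the final summation runs along the i dimensions because the result is collected in the processors with i = 0, and the single multiplication is counted as one extra step. The function name `hypercube_multiply` is hypothetical.

```python
def hypercube_multiply(A, B):
    """Simulate matrix multiplication on m^3 hypercube processors; returns (C, steps)."""
    m = len(A)
    b = m.bit_length() - 1                 # bits per index, i.e. q/3
    assert 1 << b == m, "m must be a power of two"
    p = m ** 3

    def split(x):                          # x = ijk -> (i, j, k)
        return x >> (2 * b), (x >> b) & (m - 1), x & (m - 1)

    RA, RB = [None] * p, [None] * p
    for j in range(m):                     # processors (0, j, k) hold the inputs
        for k in range(m):
            RA[(j << b) | k] = A[j][k]
            RB[(j << b) | k] = B[j][k]
    steps = 0

    for l in reversed(range(b)):           # 1. copy A and B to every i-layer
        d = 1 << (l + 2 * b)
        old_a, old_b = RA[:], RB[:]
        for x in range(p):
            if x & d:
                RA[x], RB[x] = old_a[x ^ d], old_b[x ^ d]
        steps += 1

    for l in reversed(range(b)):           # 2. R_A of (i, j, k) becomes A(j, i)
        old = RA[:]
        for x in range(p):
            i, j, k = split(x)
            if ((i >> l) & 1) != ((k >> l) & 1):
                RA[x] = old[x ^ (1 << l)]
        steps += 1

    for l in reversed(range(b)):           # 3. R_B of (i, j, k) becomes B(i, k)
        old = RB[:]
        for x in range(p):
            i, j, k = split(x)
            if ((i >> l) & 1) != ((j >> l) & 1):
                RB[x] = old[x ^ (1 << (l + b))]
        steps += 1

    RC = [RA[x] * RB[x] for x in range(p)] # 4. all products in one step
    steps += 1

    for l in reversed(range(b)):           # 5. sum the products over i
        d = 1 << (l + 2 * b)
        old = RC[:]
        for x in range(p):
            if not x & d:
                RC[x] = old[x] + old[x ^ d]
        steps += 1

    C = [[RC[(j << b) | k] for k in range(m)] for j in range(m)]
    return C, steps


print(hypercube_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# ([[19, 22], [43, 50]], 5)
```

Each phase copies the registers before updating them, so that all processors appear to act simultaneously on the values from the previous step.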
63 Comparison of properties of Tree, Mesh and Hypercube Topologies
Property        | Tree                                | Mesh                                             | Hypercube
Diameter        | $2\log((p+1)/2)$ or $2\log p$       | $2p^{1/2} - 2$                                   | $\log_2 p$
Bisection width | 1                                   | $p^{1/2}$ ($p$ even), $p^{1/2} + 1$ ($p$ odd)    | $p/2$
Degree          | 1, 2 or 3                           | 2, 3 or 4                                        | $\log_2 p$
64 Comparison of Algorithm complexity for matrix multiplication
                     | Mesh     | Hypercube
Number of processors | $m^2$    | $m^3$
Time complexity      | $O(m)$   | $O(\log m)$
Cost                 | $O(m^3)$ | $O(m^3 \log m)$
65 Thank You!
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationDr e v prasad Dt
Dr e v prasad Dt. 12.10.17 Contents Characteristics of Multiprocessors Interconnection Structures Inter Processor Arbitration Inter Processor communication and synchronization Cache Coherence Introduction
More informationCS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2
Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationLecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections
Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System
More informationCS Parallel Algorithms in Scientific Computing
CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan
More informationLecture: Interconnection Networks
Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet
More informationDirectory Implementation. A High-end MP
ectory Implementation Distributed memory each processor (or cluster of processors) has its own memory processor-memory pairs are connected via a multi-path interconnection network snooping with broadcasting
More informationNetwork Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami
Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami Dept. Electrical & Computer Eng. Univ. of California, Santa Barbara Parallel Computer Architecture
More informationLecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control
Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection
More informationCache Coherence in Scalable Machines
ache oherence in Scalable Machines SE 661 arallel and Vector Architectures rof. Muhamed Mudawar omputer Engineering Department King Fahd University of etroleum and Minerals Generic Scalable Multiprocessor
More informationScientific Applications. Chao Sun
Large Scale Multiprocessors And Scientific Applications Zhou Li Chao Sun Contents Introduction Interprocessor Communication: The Critical Performance Issue Characteristics of Scientific Applications Synchronization:
More informationComputer Organization. Chapter 16
William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data
More informationCS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationCSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing
Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed
More informationLecture 4: Principles of Parallel Algorithm Design (part 4)
Lecture 4: Principles of Parallel Algorithm Design (part 4) 1 Mapping Technique for Load Balancing Minimize execution time Reduce overheads of execution Sources of overheads: Inter-process interaction
More informationLecture 3: Topology - II
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and
More informationParallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence
Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly
More information