Shared Memory. SMP Architectures and Programming
|
|
- Harold Melvin Logan
- 5 years ago
- Views:
Transcription
1 Shared Memory SMP Architectures and Programming 1
2 Why work with shared memory parallel programming? Speed Ease of use CLUMPS Good starting point 2
3 Shared Memory Processes or threads share memory No explicit communication is needed, only synchronization Easy to program 3
4 No explicit communication needed Can a process work at any time? No we still need communication Barriers, mutex and signals are much more important in SMP than in distributed memory architectures Swap Test-and-set Interrupts/systemcalls 4
5 Shared Memory Architecture UMA Uniform Memory Architecture NUMA Non Uniform Memory Architecture 5
6 SMP Architectures Shared bus Very simple Fairly Cheap Poor scalability Crossbar Switch More complex More expensive Scales better 6
7 SMP HW M M M M $ $ $ $ P P P P (a) (b) 7
8 SHMP a quick refresh Shared bus Rather simple Very cheap Only scales to a few processors Maintains the standard memory view 8
9 Shared Memoy Shared Bus... Processor Memory Bus Interrupt Controller Communications Bus Frame Buffer Interrupt Controller... I/O Interface I/O Bus Interrupt Controller I/O Interface I/O Bus 9 Interrupt Controller Memory Bus Controller Cache Controller Cache Memory Interrupt Controller Memory Bus Controller Cache Controller Cache Memory CPU CPU Interrupt Controller Memory Bus Controller Cache Controller Cache Memory CPU
10 SHMP a quick refresh Crossbar switched Rather complex Quite expensive Can scale to tens of processors Needs a relaxed memory consistency protocol 10
11 Crossbar switch CPU CPU CPU Cache CPU Cache Memory CPU Cache Cache CPU Cache Cache Cache Cache Cache Memory Cache Memory Cache Mem. Ctl. Mem. Ctl. Mem. Ctl. UPA Interface UPA Interface UPA Interface GigaPlane BUS I/O Extension I/O Extension 11
12 COMA Architectures Eliminates all main memory Cache entries are the only thing in the system Problems: Invalidating all copies on write Not eliminating the last copy 12
13 SHMP a quick refresh MP on a chip Extremely simple Extremely cheap Only very few processors per chip Allows the CPUs to work together more closely 13
14 MP on a chip (Majc) North UPA Instruction Cache Graphics Preprocessor Memory Controller PCI Instruction Cache CPU 0 Data Cache CPU 1 South UPA 14
15 SHMP a quick refresh Virtual MP on a chip Named Hyper-threading Extremely cheap only an extra register-file per VP and some control logic Virtual depth can be quite large but few applications may take advantage of it Allows us much better utilization of the CPU area 15
16 MP on a chip First Virtual CPU Second Virtual CPU INT Unit FP Unit 16
17 Processes and Threads A thread is simply a process which shares the address space of the process it resides in with the other threads in that process What is the importance of Threads vs Processes in SMP? 17
18 The MESI Protocol Common protocol for ensuring sequential consistency States are Modified Exclusive Shared Invalid 18
19 MESI Protocol M E S I 19
20 False Sharing Unfortunate memory layout may cause performance problems due to false sharing Int cnt[2]; A: cnt[0]++; B: cnt[1]++; cnt[0] cacheline cnt[1] 20
21 False Sharing: A test! volatile int count[2]; #ifdef FALSE worker(void * arg) { int i,index=(int)arg; for(i=0;i< ; i++) count[index]++; } #else worker(void * arg) { int i,index=(int)arg; int temp=0; for(i=0;i< ; i++) temp++; count[index]+=temp; } #endif main() { pthread_t t; } pthread_create(&t,null,worker,null); worker((void *)1); printf("%d %d\n",count[0],count[1]); False Sharing u 0.020s 0: % 0+0k 0+0io 245pf+0w No False Sharing 2.690u 0.000s 0: % 0+0k 0+0io 245pf+0w 21
22 Scaling Shared Memory MESI requires broadcast of memory operations To scale SHM architectures above a few tens of processors we may relax memory consistency 22
23 Defining Memory Consistency A Memory Consistency Model defines a set of constraints that must be meet by a system to conform to the given consistency model. These constraints define a set of rules that define how memory operations are viewed relative to: Real time Each other Different nodes 23
24 Why Relax the Consistency Model To simplify bus design on SMP systems More relaxed consistency models requires less bus bandwidth More relaxed consistency requires less cache synchronization To lower contention on DSM systems More relaxed consistency models allows better sharing More relaxed consistency models requires less interconnect bandwidth 24
25 Strict Consistency Any read to a memory location x returns the value stored by the most resent write to x. Performs correctly with race conditions Can t be implemented in systems with more than one CPU 25
26 Strict Consistency P 0 : P 1 : R(x)0 W(x)1 R(x)1 P 0 : P 1 : W(x)1 R(x)0 R(x)1 26
27 Sequential Consistency [A multiprocessor system is sequentially consistent if ] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appears in this sequence in the order specified by its program. Handles all correct code, except race conditions Can be implemented with more than one CPU 27
28 Sequential Consistency P 0 : P 1 : R(x)0 W(x)1 R(x)1 P 0 : P 1 : W(x)1 R(x)0 R(x)1 P 0 : W(x)1 P 0 : W(x)1 W(y)1 P 1 : R(x)0 P 1 : R(x)0 P 2 : R(x)1 P 2 : R(y)1 28
29 Causal Consistency Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. Still fits programmers idea of sequential memory accesses Hard to make an efficient implementation 29
30 Causal Consistency P 0 W(X)1 P 1 R(X)1 W(Y)1 P 2 R(Y)1 R(X)1 P 0 W(X)1 P 1 R(X)1 W(Y)1 P 2 R(Y)1 R(X)0 30
31 PRAM Consistency Writes done by a single process are received by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes. Operations from one node can be grouped for better performance Does not comply with ordinary memory conception 31
32 W(X)1 W(y(1) W(X)2 W(y(1) W(X)3 W(y(1) PRAM Model CPU CPU... CPU Memory 32
33 PRAM Consistency P 0 W(X)1 P 1 R(X)1 W(Y)1 P 2 R(Y)1 R(X)0 P 0 W(X)1 W(X)2 P 1 R(X)2 R(X)1 33
34 Processor Consistency 1. Before a read is allowed to perform with respect to any other processor, all previous reads must be performed. 2. Before a write is allowed to perform with respect to any other processor, all other accesses (read and write) must be performed. Slightly stronger than PRAM Slightly easier than PRAM 34
35 Weak Consistency 1. Accesses to synchronization variables are sequentially consistent. 2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere. 3. No data access ( read or write ) is allowed to be performed until all previous accesses to synchronization variables have been performed. Synchronization variables are different from ordinary variables Lends itself to natural synchronization based parallel programming 35
36 Weak Consistency P 0 W(X)1 W(X)2 S P 1 R(X)1 S R(X)2 P 0 W(X)1 W(X)2 S P 1 S R(X)1 R(X)2 36
37 Release Consistency 1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully. 2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed. 3. The acquire and release accesses must be processor consistent. Synchronization's now differ between Acquire and Release Lends itself directly to semaphore synchronized parallel programming 37
38 Release Consistency P 0 : Acq(L) W(x)0 W(x)1 Rel(L) P 1 : R(x)0 P 2 : Acq(L) R(x)1 Rel(L) P 0 : Acq(L) W(x)0 W(x)1 Rel(L) P 1 : Acq(L) R(x)0 Rel(L) P 2 : Acq(L) R(x)1 Rel(L) 38
39 Lazy Release Consistency Differs only slightly from Release Consistency Release dependent variables are not propagated at release, but rather at the following acquire This allows Release Consistency to be used with smaller granularity 39
40 Entry Consistency 1. An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process. 2. Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in non-exclusive mode. 3. After an exclusive mode access to a synchronization variable has been performed, any other process next non-exclusive mode access to that synchronization variable may not be performed until it has been performed with respect to that variable s owner. Associates specific synchronization variables with specific data variables 40
41 Automatic Update Automatic update consistency has the same semantics as lazy release consistency, and adding: Before performing a release all automatic updates must be performed. Lends itself to hardware support Efficient when two nodes are sharing the same data often 41
42 Comparing Consistency models Added Semantics Entry Automatic Update Release Lazy Release Weak PRAM Processor Causal Strict Sequential Efficiency 42
43 Working with Relaxed Consistency Models Natural tradeoff between efficiency and added work Anything beyond Causal Consistency requires the consistency model to be explicitly known Compiler knowledge of the consistency model can hide the relaxation from the programmer 43
44 CASE Studies Intel MP Specification Sequent NUMA-Q SGI Origin-2000 Cray T3E 44
45 Intel MP Specification Shared bus approach Sequential Consistency MESI Protocol Scales to 4 processors directly 8 processors by gluing two 4 processor busses together UMA 45
46 Sequent NUMA-Q Based on Intel MP Spec and SCI bus Scalable Coherent Bus 4 way SMP boards keept coherent by MESI Boards keeps coherent by SCI The SCI ring is implemented in one chip UMA/NUMA 46
47 SGI Origin-2000 Extention to Stanford DASH Provides sequential consistency Boards with one R12000 or two R10000 processors (MESI) LARGE interface to the world keeps memory coherent NUMA (cc-numa) 47
48 Cray T3E Alpha processors with local memory May access the memory on other processors Memory is note keept coherent in any way! Special functionalities for synchronization NUMA 48
49 Summary Relaxing memory consistency is necessary for any system with more than one processor Simple relaxation can be hidden Strong relaxation can achieve better performance 49
Distributed Shared Memory
Distributed Shared Memory History, fundamentals and a few examples Coming up The Purpose of DSM Research Distributed Shared Memory Models Distributed Shared Memory Timeline Three example DSM Systems The
More information1. Memory technology & Hierarchy
1. Memory technology & Hierarchy Back to caching... Advances in Computer Architecture Andy D. Pimentel Caches in a multi-processor context Dealing with concurrent updates Multiprocessor architecture In
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit
More informationDISTRIBUTED SHARED MEMORY
DISTRIBUTED SHARED MEMORY COMP 512 Spring 2018 Slide material adapted from Distributed Systems (Couloris, et. al), and Distr Op Systems and Algs (Chow and Johnson) 1 Outline What is DSM DSM Design and
More informationMultiprocessors and Locking
Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access
More informationReplication of Data. Data-Centric Consistency Models. Reliability vs. Availability
CIS 505: Software Systems Lecture Note on Consistency and Replication Instructor: Insup Lee Department of Computer and Information Science University of Pennsylvania CIS 505, Spring 2007 Replication of
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P
More informationData-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.
Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some
More informationLecture 13. Shared memory: Architecture and programming
Lecture 13 Shared memory: Architecture and programming Announcements Special guest lecture on Parallel Programming Language Uniform Parallel C Thursday 11/2, 2:00 to 3:20 PM EBU3B 1202 See www.cse.ucsd.edu/classes/fa06/cse260/lectures/lec13
More informationThe Cache-Coherence Problem
The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference
More informationMemory Consistency. Minsoo Ryu. Department of Computer Science and Engineering. Hanyang University. Real-Time Computing and Communications Lab.
Memory Consistency Minsoo Ryu Department of Computer Science and Engineering 2 Distributed Shared Memory Two types of memory organization in parallel and distributed systems Shared memory (shared address
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationPortland State University ECE 588/688. Directory-Based Cache Coherence Protocols
Portland State University ECE 588/688 Directory-Based Cache Coherence Protocols Copyright by Alaa Alameldeen and Haitham Akkary 2018 Why Directory Protocols? Snooping-based protocols may not scale All
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message
More informationMultiprocessor Synchronization
Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationEN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors
EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationCache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem
Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationPCS - Part Two: Multiprocessor Architectures
PCS - Part Two: Multiprocessor Architectures Institute of Computer Engineering University of Lübeck, Germany Baltic Summer School, Tartu 2008 Part 2 - Contents Multiprocessor Systems Symmetrical Multiprocessors
More information9. Distributed Shared Memory
9. Distributed Shared Memory Provide the usual programming model of shared memory in a generally loosely coupled distributed environment. Shared Memory Easy to program Difficult to build Tight coupling
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationChap. 4 Multiprocessors and Thread-Level Parallelism
Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,
More informationParallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence
Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly
More informationChapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST
Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will
More informationMultiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University
A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationShared Symmetric Memory Systems
Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationDistributed Shared Memory and Memory Consistency Models
Lectures on distributed systems Distributed Shared Memory and Memory Consistency Models Paul Krzyzanowski Introduction With conventional SMP systems, multiple processors execute instructions in a single
More informationCMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today
More informationComputer Architecture
18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University
More informationMore on Concurrency and Threads
More on Concurrency and Threads Knut Omang Ifi/Oracle 12 Feb, 2014 (with slides from several people) Today: Thread implementation user/kernel/hybrid communication between threads? Understanding the hardware
More informationComputer parallelism Flynn s categories
04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationOverview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware
Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and
More informationOverview: Shared Memory Hardware
Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing
More informationCSE 513: Distributed Systems (Distributed Shared Memory)
CSE 513: Distributed Systems (Distributed Shared Memory) Guohong Cao Department of Computer & Engineering 310 Pond Lab gcao@cse.psu.edu Distributed Shared Memory (DSM) Traditionally, distributed computing
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationParallel Architecture. Hwansoo Han
Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range
More informationPage 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence
SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationLecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory
Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory 1 Capacity Limitations P P P P B1 C C B1 C C Mem Coherence Monitor Mem Coherence Monitor B2 In a Sequent NUMA-Q design above,
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationCache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O
6.823, L21--1 Cache Coherence Protocols: Implementation Issues on SMP s Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Coherence Issue in I/O 6.823, L21--2 Processor Processor
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core
More informationParallel Computer Architecture Spring Memory Consistency. Nikos Bellas
Parallel Computer Architecture Spring 2018 Memory Consistency Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture 1 Coherence vs Consistency
More informationCS5460: Operating Systems
CS5460: Operating Systems Lecture 9: Implementing Synchronization (Chapter 6) Multiprocessor Memory Models Uniprocessor memory is simple Every load from a location retrieves the last value stored to that
More informationChapter 2: Computer-System Structures. Hmm this looks like a Computer System?
Chapter 2: Computer-System Structures Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure of a computer system and understanding
More informationUniprocessor Computer Architecture Example: Cray T3E
Chapter 2: Computer-System Structures MP Example: Intel Pentium Pro Quad Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure
More informationChapter 6. Parallel Processors from Client to Cloud Part 2 COMPUTER ORGANIZATION AND DESIGN. Homogeneous & Heterogeneous Multicore Architectures
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Part 2 Homogeneous & Heterogeneous Multicore Architectures Intel XEON 22nm
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer
More informationEE382 Processor Design. Illinois
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors Part II EE 382 Processor Design Winter 98/99 Michael Flynn 1 Illinois EE 382 Processor Design Winter 98/99 Michael Flynn 2 1 Write-invalidate
More informationLecture 24: Virtual Memory, Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large
More informationAleksandar Milenkovich 1
Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection
More informationCS377P Programming for Performance Multicore Performance Cache Coherence
CS377P Programming for Performance Multicore Performance Cache Coherence Sreepathi Pai UTCS October 26, 2015 Outline 1 Cache Coherence 2 Cache Coherence Awareness 3 Scalable Lock Design 4 Transactional
More informationPage 1. Cache Coherence
Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale
More informationCONSISTENCY MODELS IN DISTRIBUTED SHARED MEMORY SYSTEMS
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 9, September 2014,
More informationAleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture 18: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Parallel Computers Definition: A parallel computer is a collection
More informationCache Coherence. Introduction to High Performance Computing Systems (CS1645) Esteban Meneses. Spring, 2014
Cache Coherence Introduction to High Performance Computing Systems (CS1645) Esteban Meneses Spring, 2014 Supercomputer Galore Starting around 1983, the number of companies building supercomputers exploded:
More informationLecture 24: Multiprocessing Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 24: Multiprocessing Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Most of the rest of this
More informationComputer Architecture
Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationParallel Computing Introduction
Parallel Computing Introduction Bedřich Beneš, Ph.D. Associate Professor Department of Computer Graphics Purdue University von Neumann computer architecture CPU Hard disk Network Bus Memory GPU I/O devices
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationNOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.
Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which
More informationEE382 Processor Design. Processor Issues for MP
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationCSE 5306 Distributed Systems
CSE 5306 Distributed Systems Consistency and Replication Jia Rao http://ranger.uta.edu/~jrao/ 1 Reasons for Replication Data is replicated for the reliability of the system Servers are replicated for performance
More informationCOMP Parallel Computing. CC-NUMA (2) Memory Consistency
COMP 633 - Parallel Computing Lecture 11 September 26, 2017 Memory Consistency Reading Patterson & Hennesey, Computer Architecture (2 nd Ed.) secn 8.6 a condensed treatment of consistency models Coherence
More information3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:
BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General
More informationLect. 6: Directory Coherence Protocol
Lect. 6: Directory Coherence Protocol Snooping coherence Global state of a memory line is the collection of its state in all caches, and there is no summary state anywhere All cache controllers monitor
More informationIntroduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization
Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationMultiprocessor Systems. Chapter 8, 8.1
Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor
More informationCOMP Parallel Computing. SMM (1) Memory Hierarchies and Shared Memory
COMP 633 - Parallel Computing Lecture 6 September 6, 2018 SMM (1) Memory Hierarchies and Shared Memory 1 Topics Memory systems organization caches and the memory hierarchy influence of the memory hierarchy
More informationMemory Consistency Models
Calcolatori Elettronici e Sistemi Operativi Memory Consistency Models Sources of out-of-order memory accesses... Compiler optimizations Store buffers FIFOs for uncommitted writes Invalidate queues (for
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationLecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols
Lecture: Memory, Coherence Protocols Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols 1 Modern Memory System...... PROC.. 4 DDR3 channels 64-bit data channels
More informationDistributed Shared Memory: Concepts and Systems
Distributed Shared Memory: Concepts and Systems Jelica Protić, Milo Toma sević and Veljko Milutinović IEEE Parallel & Distributed Technology, Summer 1996 Context: distributed memory management high-performance
More informationLecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More informationChapter 5. Thread-Level Parallelism
Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated
More informationParallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationConsistency: Relaxed. SWE 622, Spring 2017 Distributed Software Engineering
Consistency: Relaxed SWE 622, Spring 2017 Distributed Software Engineering Review: HW2 What did we do? Cache->Redis Locks->Lock Server Post-mortem feedback: http://b.socrative.com/ click on student login,
More informationLecture 16/17: Distributed Shared Memory. CSC 469H1F Fall 2006 Angela Demke Brown
Lecture 16/17: Distributed Shared Memory CSC 469H1F Fall 2006 Angela Demke Brown Outline Review distributed system basics What is distributed shared memory? Design issues and tradeoffs Distributed System
More informationShared Virtual Memory. Programming Models
Shared Virtual Memory Arvind Krishnamurthy Fall 2004 Programming Models Shared memory model Collection of threads Sharing the same address space Reads/writes on shared address space visible to all other
More information