Distributed Shared Memory

Similar documents
9. Distributed Shared Memory

Shared Virtual Memory. Programming Models

CSE 513: Distributed Systems (Distributed Shared Memory)

Lecture 16/17: Distributed Shared Memory. CSC 469H1F Fall 2006 Angela Demke Brown

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski

Multiprocessors & Thread Level Parallelism

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Chapter 5. Multiprocessors and Thread-Level Parallelism

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

Distributed Systems. Lec 11: Consistency Models. Slide acks: Jinyang Li, Robert Morris

Distributed Shared Memory and Memory Consistency Models

DISTRIBUTED SYSTEMS [COMP9243] Lecture 3b: Distributed Shared Memory DISTRIBUTED SHARED MEMORY (DSM) DSM consists of two components:

Distributed Shared Memory: A Survey

Consistency and Replication (part b)

DISTRIBUTED SHARED MEMORY

Distributed shared memory - an overview

Distributed Shared Memory

Distributed Shared Memory: Concepts and Systems

1. Memory technology & Hierarchy

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB

Lect. 6: Directory Coherence Protocol

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 17: Distributed Systems (DS)

Distributed Shared Memory (DSM) Introduction Shared Memory Systems Distributed Shared Memory Systems Advantage of DSM Systems

Distributed Object-Based Systems The WWW Architecture Web Services Handout 11 Part(a) EECS 591 Farnam Jahanian University of Michigan.

Flynn s Classification

Implementing caches. Example. Client. N. America. Client System + Caches. Asia. Client. Africa. Client. Client. Client. Client. Client.

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Distributed Memory and Cache Consistency. (some slides courtesy of Alvin Lebeck)

Distributed Systems. Catch-up Lecture: Consistency Model Implementations

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

Multiprocessor Cache Coherency. What is Cache Coherence?

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

What is consistency?

SMD149 - Operating Systems - Multiprocessing

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Memory Coherence in Shared Virtual

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University

Scalable Cache Coherence

Indirect Communication

Lecture 9: MIMD Architectures

Givy, a software DSM runtime using raw pointers

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Consistency: Strict & Sequential. SWE 622, Spring 2017 Distributed Software Engineering

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])

1. Memory technology & Hierarchy

Portland State University ECE 588/688. Cache-Only Memory Architectures

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

ECSE 425 Lecture 30: Directory Coherence

Multi-Process Systems: Memory (2) Memory & paging structures: free frames. Memory & paging structures. Physical memory

G Disco. Robert Grimm New York University

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

Memory Management. Dr. Yingwu Zhu

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation

CMSC 611: Advanced. Distributed & Shared Memory

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Chapter 11: File System Implementation. Objectives

CMSC 611: Advanced Computer Architecture

CMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3

Chap. 4 Multiprocessors and Thread-Level Parallelism

SDSM Progression. Implementing Shared Memory on Distributed Systems. Software Distributed Shared Memory. Why a Distributed Shared Memory (DSM) System?

06-Dec-17. Credits:4. Notes by Pritee Parwekar,ANITS 06-Dec-17 1

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Directory Implementation. A High-end MP

Memory Management. Dr. Yingwu Zhu

殷亚凤. Processes. Distributed Systems [3]

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

Selected Questions. Exam 2 Fall 2006

Multiprocessors - Flynn s Taxonomy (1966)

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

ADRIAN PERRIG & TORSTEN HOEFLER Networks and Operating Systems ( ) Chapter 6: Demand Paging

Lecture 3: Directory Protocol Implementations. Topics: coherence vs. msg-passing, corner cases in directory protocols

Computer Architecture

VIRTUAL MEMORY II. Jo, Heeseung

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

The memory of a program. Paging and Virtual Memory. Swapping. Paging. Physical to logical address mapping. Memory management: Review

Distributed Systems. Thoai Nam Faculty of Computer Science and Engineering HCMC University of Technology

Chapter 6: Demand Paging

Lecture 9: MIMD Architectures

Multitasking and Multithreading on a Multiprocessor With Virtual Shared Memory. Created By:- Name: Yasein Abdualam Maafa. Reg.

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Shared Memory. SMP Architectures and Programming

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Interconnect Routing

Virtualization. Virtualization

Transcription:

Distributed Shared Memory EECS 498 Farnam Jahanian University of Michigan Reading List Supplemental Handout: pp. 312-313, 333-353 from Tanenbaum Dist. OS text (dist. in class) What DSM? Concept & Design Issues Comparison of Shared Memory Systems IVY (page-based DSM) Munin (shared variable DSM)

What s DSM? First introduced by Li & Hudak in 1986 Adv. of multi-processors: easier to program Adv. Of multi-computers: easier to build Idea: provide the mechanism for a set of networked workstations to share a single, paged virtual address space. A ref. to local memory is done in hardware fast Emulate multi-processor caches using MMU and OS: an attempt to ref. an address that is not local causes a page fault, trap to OS, msg. to remote node to fetch the page, and restart faulting instruction Idea is similar to traditional VM systems Sounds great! What s the problem? Performance!!! Simple but potential poor performance (similar to thrashing in uni-processors) What to do? Later, but first various forms of shared memory systems. Comparison of Shared Memory Systems Figure 6-10 (supp. handout) Spectrum: (tightly-coupled to loosely-coupled architectures) Single-bus multi-processor Switched multi-processor NUMA Paged-based DSM Shared-variable DSM Object-based DSM Hardware-controlled caching vs. software-controlled caching Cache (replication) managed by MMU, OS or language run-time system Remote access: enabled by hardware vs. software Transfer unit: cache block, page, data structure, object

Distributed Shared Memory IVY system (page-based DSN) used to illustrate DSM concept and issues: Granularity of transfer Replication Consistency semantics Write-update vs. invalidation Finding the owner Finding the copies Replacement policy Distributed Shared Memory Systems a) Pages of address space distributed among four machines b) Situation after CPU 1 references page 10 c) Situation if page 10 is read only and replication is used

Granularity of Transfer How large should the unit of transfer (sharing) be? Word, block, page, segments of pages Trade-off: Adv. large blocks: fewer # of page transfers by taking adv. of locality of ref. Disadv. large blocks: false sharing False Sharing False sharing of a page between two independent processes. 1.18

Replication and Consistency No replication exactly one copy of each page and it moves around sequential consistency is easy to maintain Replicated read-only copies, single read-write copy sequential consistency is easy to maintain Replicated read-write copies potential consistency problem (similar to a multi-processor case when a processor attempts to modify a word that is present in caches) Cache invalidation vs. write-thru (Figure 6-27 in supp. handout) DSMs typically use cache invalidation: write-thru is expensive, very inefficient if many consecutive writes to the same page, difficult to enforce sequential consistency if multiple concurrent updates originating from several processors. Page Ownership Stationary (fixed) Migrating (Dynamic) Centralized Mgr. * Distributed Mgr. Fixed Mgr. * * Dynamic Mgr. * Considered in IVY systems. Manager = Directory Server

Finding the Owner Broadcast: asking for the owner to respond Optimization: ask for copy of page in r/w mode Overhead and scalability issues Designate a centralized manager (i.e. directory server) Figure 6-28 (supp. handout) Potential bottleneck Page Mgr. Page Mgr. 1. Request 2. Reply 1. Request 2. Request Forwarded P 3. Request 4. Reply Owner P 3. Reply Owner Finding the Owner (cont.) Fixed Distributed Manager Multiple managers in the system Split the work by partitioning the pages among managers How to find the right manager? user lower-order bits of the page as an index into a manager s table, or use a hash function. The manager uses the incoming requests to keep track of changes in the ownership. Dynamic Distributed Manager In this scheme, manager = owner Each process keeps track of the probable owner Request sent to a probable owner is forwarded to the new owner Periodically, location of new owner is broadcast to everyone to ensure updated tables

Finding all Copies to be Invalidated Broadcast: requires reliable msg. delivery Copyset maintained by owner/manager to invalidate, send a msg. to each processor in the copyset; must wait for acks from everyone SEE HANDOUT FOR DETAILS ON READ & WRITE PROTOCOLS (DYNAMIC DISTRIBUTED MANAGER) What about performance problems of DSMs? Don t share the entire page. Share selected variables and data structures higher level of abstraction, reduce false sharing Exploit data types or semantics of sharing Relax consistency SEE HANDOUT FOR OVERVIEW OF MUNIN SYSTEM