DISTRIBUTED SYSTEMS [COMP9243] Lecture 3b: Distributed Shared Memory


Slide 1
Lecture 3b: Distributed Shared Memory
  ➀ DSM
  ➁ Case study
  ➂ Design issues
  ➃ Implementation issues

Slide 2
DISTRIBUTED SHARED MEMORY (DSM)
DSM: shared memory + multicomputer
Shared global address space
  [Figure: a global address space of pages 0 to 16, with the pages spread over the local memories of CPU 1, CPU 2, CPU 3 and CPU 4]

Slide 3
SHARED ADDRESS SPACE
DSM consists of two components:
  ➀ Shared address space
  ➁ Replication and consistency of memory objects
Shared address space:
  [Figure: address 0x2000 shown in two processes]
  Shared addresses are valid in all processes

Slide 4
Transparent remote access:
  [Figure: address 0x2000 shown in two processes]
Properties:
  Remote access is expensive compared to local memory access
  Individual operations can have very low overhead
  Threads can distinguish between local and remote access

Slide 5
Why DSM?:
  Shared memory model: easiest to program to
  Physical shared memory not possible on multicomputer
  DSM emulates shared memory
Benefits of DSM:
  Ease of programming (shared memory model)
  Eases porting of existing code
  Pointer handling
    Shared pointers refer to shared memory
    Share complex data (lists, etc.)
  No marshalling

Slide 6
DSM IMPLEMENTATIONS
Hardware:
  Multiprocessor
  Example: MIT Alewife, DASH
OS with hardware support:
  SCI network cards (SCI = Scalable Coherent Interconnect)
  SCI maps extended physical address space to remote nodes
  OS maps shared virtual address space to SCI range
OS and Virtual Memory:
  Virtual memory (page faults, paging)
  Local address space vs Large address space

Slide 7
Middleware:
  Library:
    Library routines to create/access shared memory
    Example: MPI-2, CRL
  Language:
    Shared memory encapsulated in language constructs
    Extend language with annotations
    Example: Orca, Linda, JavaSpaces, JavaParty, Jackal

Slide 8
Typical Implementation:
  Most often implemented in user space (e.g., TreadMarks, CVM)
  User space: what's needed from the kernel?
    User-level fault handler [e.g., Unix signals]
    User-level VM page mapping and protection [e.g., mmap() and mprotect()]
    Message passing layer [e.g., socket API]
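The first two kernel facilities listed on Slide 8 can be tried out in a few lines. The following is a minimal sketch, assuming Linux with glibc: it maps one page with no access rights, lets the first write fault, and has a SIGSEGV handler grant access at the point where a real DSM would first fetch the page from a remote node. The names on_fault and demo_fault_on_first_touch are made up for this illustration, and calling mprotect() from a signal handler is common practice in user-level DSM but is not guaranteed async-signal-safe by POSIX.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;                   /* start of the "shared" region     */
static size_t region_len;              /* its length in bytes              */
static long page_size;                 /* cached so the handler stays small */
static volatile sig_atomic_t faults;   /* number of faults handled         */

/* User-level fault handler. A real DSM would request the page contents
 * from another node here (message passing layer) before granting access. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < region || addr >= region + region_len)
        _exit(1);                      /* a genuine crash, not our page */
    uintptr_t base = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)base, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    faults++;
}

/* Maps one inaccessible page, writes to it twice and returns the number
 * of faults taken, or -1 if the setup failed. */
int demo_fault_on_first_touch(void)
{
    struct sigaction sa, old;

    page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    region_len = (size_t)page_size;
    region = mmap(NULL, region_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    faults = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    volatile char *p = region;
    p[0] = 42;   /* faults; handler unprotects the page; the store is retried */
    p[1] = 43;   /* same page, already accessible: no further fault */
    int ok = (p[0] == 42 && p[1] == 43);

    sigaction(SIGSEGV, &old, NULL);    /* restore the previous handler */
    munmap(region, region_len);
    return ok ? (int)faults : -1;
}
```

Calling demo_fault_on_first_touch() from main() should report a single fault on Linux: only the first touch of the page traps, after which the page behaves like ordinary memory.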

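Slide 9
Example: two processes sharing memory pages:
  [Figure: two processes sharing memory pages]

Slide 10
Occurrence of a read fault:
  [Figure: a read of a page that is not present locally; label "Fault!"]

Slide 11
Page migration and replication:
  [Figure: page migration and replication]

Slide 12
Recovery from read fault:
  [Figure: the faulting process continues after the fault is handled; label "Resume"]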
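The read fault and recovery on Slides 10 and 12 can be mimicked inside one process. The sketch below is a simulation under stated assumptions, not the mechanism of any particular DSM: two "nodes" are plain structs, each page has one owner holding the master copy, and a read of a page that is not present locally counts as a fault, copies the page from its owner (replication) and then resumes. All names (sim_node, sim_read, and so on) are made up, and replicas are never invalidated, so only the owner writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { SIM_NODES = 2, SIM_PAGES = 4, SIM_PAGE_SIZE = 16 };

struct sim_node {
    bool present[SIM_PAGES];             /* is a valid local copy mapped? */
    char mem[SIM_PAGES][SIM_PAGE_SIZE];  /* local page frames             */
    int  read_faults;                    /* read faults taken by the node */
};

static struct sim_node nodes[SIM_NODES];
static int owner_of[SIM_PAGES];          /* node holding the master copy  */

/* Reset the simulation; pages are placed on owners round robin
 * (an arbitrary choice for this sketch). */
void sim_init(void)
{
    memset(nodes, 0, sizeof nodes);
    for (int p = 0; p < SIM_PAGES; p++) {
        owner_of[p] = p % SIM_NODES;
        nodes[owner_of[p]].present[p] = true;
    }
}

/* Write through the owner's copy only, which keeps the sketch free of
 * write conflicts and invalidation logic. */
void sim_owner_write(int page, int offset, char value)
{
    assert(page >= 0 && page < SIM_PAGES);
    assert(offset >= 0 && offset < SIM_PAGE_SIZE);
    nodes[owner_of[page]].mem[page][offset] = value;
}

/* Read one byte on behalf of a node, handling a read fault if needed. */
char sim_read(int node, int page, int offset)
{
    assert(node >= 0 && node < SIM_NODES);
    assert(page >= 0 && page < SIM_PAGES);
    assert(offset >= 0 && offset < SIM_PAGE_SIZE);

    struct sim_node *n = &nodes[node];
    if (!n->present[page]) {
        /* Fault: the page is not available locally. */
        n->read_faults++;
        /* Replication: fetch a copy of the page from its owner. */
        memcpy(n->mem[page], nodes[owner_of[page]].mem[page], SIM_PAGE_SIZE);
        n->present[page] = true;
        /* Resume: fall through and complete the read. */
    }
    return n->mem[page][offset];
}

int sim_read_faults(int node)
{
    assert(node >= 0 && node < SIM_NODES);
    return nodes[node].read_faults;
}
```

After sim_init(), a first sim_read(1, 0, 3) faults once because node 0 owns page 0; repeating the read is served from the local replica without a further fault.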

Slide 13
DSM MODELS
Shared page (coarse-grained):
  Traditional model
  Ideal page size?
  ✗ False sharing
  Examples: Ivy, TreadMarks
Shared region (fine-grained):
  More fine grained than sharing pages
  ✓ Prevent false sharing
  ✗ Not regular memory access (transparency)
  Examples: CRL (C Region Library), MPI-2 one-sided communication, Shasta

Slide 14
Shared variable:
  Release and Entry based consistency
  Annotations
  ✓ Fine grained
  ✗ More complex for programmer
  Examples: Munin, Midway
Shared structure:
  Encapsulate shared data
  Access only through predefined procedures (e.g., methods)
  ✓ Tightly integrated synchronisation
  ✓ Encapsulate (hide) consistency model
  ✗ Lose familiar shared memory model
  Examples: Orca (shared object), Linda (tuple space)

Slide 15
Tuple Space:
  [Figure: a JavaSpace holding tuple instances A, B and C]
  Write A: insert a copy of A
  Write B: insert a copy of B
  Read T: look for a tuple that matches T, return C (and optionally remove it)

Slide 16
LINDA EXAMPLE

    main() {
        ...
        eval("function", f());
        eval("function", f());
        ...
        for (i = 0; i < 100; i++)
            out("data", i);
        ...
    }

    f() {
        in("data", ?x);
        y = g(x);
        out("function", x, y);
    }

What's good about this?
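The tuple space operations on Slides 15 and 16 can be sketched in plain C. This is a single-process illustration under simplifying assumptions, not Linda itself: tuples are restricted to the form (tag, integer), the template ("data", ?x) matches on the tag only, and in/rd return false when nothing matches where a real Linda in() would block. The ts_ names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TS_CAPACITY = 128, TS_TAG_LEN = 16 };

struct ts_tuple {
    bool used;               /* slot holds a tuple           */
    char tag[TS_TAG_LEN];    /* first field, e.g. "data"     */
    int  value;              /* second field                 */
};

static struct ts_tuple space[TS_CAPACITY];

/* out: insert a copy of the tuple (tag, value) into the space.
 * Returns false if the space is full or the tag is too long. */
bool ts_out(const char *tag, int value)
{
    assert(tag != NULL);
    if (strlen(tag) >= TS_TAG_LEN)
        return false;
    for (int i = 0; i < TS_CAPACITY; i++) {
        if (!space[i].used) {
            space[i].used = true;
            strcpy(space[i].tag, tag);
            space[i].value = value;
            return true;
        }
    }
    return false;
}

/* Find a tuple matching the template (tag, ?x); -1 if there is none. */
static int ts_find(const char *tag)
{
    for (int i = 0; i < TS_CAPACITY; i++)
        if (space[i].used && strcmp(space[i].tag, tag) == 0)
            return i;
    return -1;
}

/* rd: copy the value of a matching tuple into *x and leave it in place. */
bool ts_rd(const char *tag, int *x)
{
    assert(tag != NULL && x != NULL);
    int i = ts_find(tag);
    if (i < 0)
        return false;
    *x = space[i].value;
    return true;
}

/* in: like rd, but also remove the matching tuple from the space. */
bool ts_in(const char *tag, int *x)
{
    assert(tag != NULL && x != NULL);
    int i = ts_find(tag);
    if (i < 0)
        return false;
    *x = space[i].value;
    space[i].used = false;
    return true;
}
```

A producer calling ts_out("data", i) in a loop and workers calling ts_in("data", &x) never name each other, which is the decoupling the Linda example relies on. This sketch happens to return the matching tuple in the lowest slot; a tuple space makes no ordering promise.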

Slide 17
APPLICATIONS OF DSM
  Scientific parallel computing
    Bioinformatics (gene sequence analysis)
    Simulations (climate modeling, economic modeling)
    Data processing (physics, astronomy)
  Graphics (image processing, rendering)
  Data server (distributed FS, Web server)
  Data storage

Slide 18
DSM ENVIRONMENTS
  Multiprocessor
    NUMA
  Multicomputer
    Supercomputer
    Cluster of Workstations
    Wide-area

Slide 19
REQUIREMENTS OF DSM
Transparency:
  Location, migration, replication, concurrency
Reliability:
  Computations depend on availability of data
Performance:
  Important in high-performance computing
  Important for transparency
Scalability:
  Important in wide-area
  Important for large computations

Slide 20
Consistency:
  Access to DSM should be consistent
  According to a consistency model
Programmability:
  Easy to program
  Communication transparency

Slide 21
CASE STUDY
TreadMarks:
  1992 Rice University
  Page based DSM library
  C, C++, Java, Fortran
  Lazy release consistency model
  Heterogeneous environment

Slide 22
DESIGN ISSUES
  Granularity: Page based, Page size: minimum system page size
  Replication: Lazy release consistency
  Scalability: Meant for cluster or NOW (Network of Workstations)
  Synchronisation primitives: Locks (acquire and release), Barrier
  Heterogeneity: Limited (doesn't address endianness or mismatched word sizes)
  Fault Tolerance: Research
  Security: No

Slide 23
USING TREADMARKS
Compiling:
  Compile
  Link with TreadMarks libraries
Starting a TreadMarks Application:

    app -- -h host1 -h host2 -h host3 -h host4

Anatomy of a TreadMarks Program:
  Starting remote processes

    Tmk_startup(argc, argv);

  Allocating and sharing memory

    /* allocate the struct in shared memory, then distribute the pointer */
    shared = (struct shared *) Tmk_malloc(sizeof(*shared));
    Tmk_distribute(&shared, sizeof(shared));

Slide 24
  Barriers

    Tmk_barrier(0);

  Acquire/Release

    Tmk_lock_acquire(0);
    shared->sum += mysum;
    Tmk_lock_release(0);

Slide 25
TREADMARKS IMPLEMENTATION
Consistency Protocol:
  Multiple writer
  Twins
  Reduce false sharing
  [Figure: twin and diff creation on P1, which executes x=1]
    1. Write causes page fault: page is read-only (R) and holds x(0)
    2. After page fault: page is read-write (RW) and holds x(0); twin holds x(0)
    3. Write is executed: page (RW) holds x(1); twin holds x(0)
    4. At release or barrier: diff records x: 0 -> 1

Slide 26
Update Propagation:
  Modified pages invalidated at acquire
  Page is updated at access time
  Updates are transferred as diffs
Lazy Diffs:
  Normally make diffs at release time
  Lazy: make diffs only when they are requested
Communication:
  UDP/IP or AAL3/4 (ATM)
  Light-weight, user-level protocols to ensure message delivery
  Use SIGIO for message receive notification

Slide 27
Data Location:
  Know who has diffs because of invalidations
  Each page has a statically assigned manager
Modification Detection:
  Page Fault
  If page is read-only then do consistency protocol
  If not in local memory, get from manager
Memory Management:
  Garbage collection of diffs

Slide 28
Initialisation:
  Processes set up communication channels between themselves
  Register SIGIO handler for communication
  Allocate large block of memory
    Same (virtual) address on each machine
    Mark as non-accessible
  Assign manager process for each page, lock, barrier (round robin)
  Register SEGV handler
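The twin and diff steps on Slide 25 translate into three small routines. The sketch below is an illustration under assumptions rather than TreadMarks' actual code: the page size is fixed at 4096 bytes, a diff is a plain list of (offset, new byte) pairs, and the names make_twin, make_diff and apply_diff are made up. The real diff encoding may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { DSM_PAGE_SIZE = 4096 };   /* assumed page size for this sketch */

/* One changed byte: where it is and what it now holds. */
struct diff_entry {
    size_t        offset;
    unsigned char value;
};

/* Step 2 (after the write fault): keep an unmodified copy of the page.
 * Returns NULL if no memory is available; the caller frees the twin. */
unsigned char *make_twin(const unsigned char *page)
{
    unsigned char *twin = malloc(DSM_PAGE_SIZE);
    if (twin != NULL)
        memcpy(twin, page, DSM_PAGE_SIZE);
    return twin;
}

/* Step 4 (at release or barrier, or lazily on request): compare the page
 * with its twin and record every byte that changed. The out array must
 * have room for DSM_PAGE_SIZE entries. Returns the number of entries. */
size_t make_diff(const unsigned char *page, const unsigned char *twin,
                 struct diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < DSM_PAGE_SIZE; i++) {
        if (page[i] != twin[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Receiver side: patch a copy of the page with somebody else's diff. */
void apply_diff(unsigned char *page, const struct diff_entry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(diff[i].offset < DSM_PAGE_SIZE);
        page[diff[i].offset] = diff[i].value;
    }
}
```

Because a diff carries only the bytes its writer changed, two processes that write different variables on the same page produce diffs that can both be applied to a third copy without overwriting each other, which is how the multiple-writer protocol reduces the cost of false sharing.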

Slide 29
READING LIST
  Distributed Shared Memory: A Survey of Issues and Algorithms
    An overview of DSM and key issues as well as older DSM implementations.
  TreadMarks: Shared Memory Computing on Networks of Workstations
    An overview of TreadMarks, design decisions and implementation.

Slide 30
HOMEWORK
  Do Assignment 1!