COS 318: Operating Systems Message Passing Kai Li and Andy Bavier Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/)

Quizzes
Quiz 1: Most of you did very well.
Quiz 2: Mesa-style monitor:
- Continue the current thread after Signal()
- Allows Signal() to wake up more than one thread
- After Wait() returns, the condition may not be true
Quiz 3: Most of you did very well.

Revisit Mesa-Style Monitor

Waiting for a resource:
    Acquire(mutex);
    while (no resource)
        Wait(mutex, cond);
    ... (use the resource) ...
    Release(mutex);

Making a resource available:
    Acquire(mutex);
    ... (make resource available) ...
    Signal(cond);  /* or Broadcast(cond) */
    Release(mutex);
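The wait-loop pattern above can be sketched with POSIX threads; the resource counter and function names here are illustrative, not from the slides:

```c
#include <assert.h>
#include <pthread.h>

/* A counted resource guarded by a Mesa-style monitor: a woken
 * waiter must re-check the condition, hence the while loop. */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static int resources = 0;

void acquire_resource(void) {
    pthread_mutex_lock(&mutex);
    while (resources == 0)                 /* while, not if: Mesa semantics */
        pthread_cond_wait(&cond, &mutex);
    resources--;                           /* use the resource */
    pthread_mutex_unlock(&mutex);
}

void release_resource(void) {
    pthread_mutex_lock(&mutex);
    resources++;                           /* make resource available */
    pthread_cond_signal(&cond);            /* or pthread_cond_broadcast() */
    pthread_mutex_unlock(&mutex);
}
```

Because Signal() merely moves a waiter to the ready queue, another thread may grab the resource first; the while loop makes that harmless.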

About the Midterm Exam
The midterm may include these topics:
- OS structure, processes, and threads
- Synchronization
- Scheduling
- Deadlocks
- I/O devices
Help: office hours today, 3pm-5pm and 7:30-8:30pm.
Logistics: in class this Thursday, 80 minutes. No book, no notes, no cheat sheet, no devices, and no online access.

Today's Topics
- Message passing
- Indirect communication
- Examples: mailbox, socket, Message Passing Interface (MPI), Remote Procedure Call (RPC)
- Exceptions

Sending a Message
Within a system: Send() and Recv() pass the message through the OS kernel.
Across a network: Send() on one machine goes through its OS, over the network, to the OS on the receiving machine, and up to Recv() (the networking details are covered in COS 461).

Synchronous Message Passing (Within a System)
Synchronous send: call the send system call with message M. If no kernel buffer is available, block; otherwise copy M into a kernel buffer.
Synchronous recv: call the recv system call. If no message is in the kernel, block; otherwise copy the message into the user buffer.
Open question: how should the kernel buffer be managed?

API Issues
Message: type, buffer, and size.
Destination or source:
- Direct address: node id, process id
- Indirect address: mailbox, socket, channel, ...
Sender S calls send(dest, msg); receiver R calls recv(src, msg).

Direct Addressing Example

Producer() {
    ...
    while (1) {
        produce item;
        recv(consumer, &credit);
        send(consumer, item);
    }
}

Consumer() {
    ...
    for (i = 0; i < n; i++)
        send(producer, credit);
    while (1) {
        recv(producer, &item);
        send(producer, credit);
        consume item;
    }
}

Does this work? Would it work with multiple producers and one consumer? With one producer and multiple consumers? What about multiple producers and multiple consumers?

Indirect Addressing Example

Producer() {
    ...
    while (1) {
        produce item;
        recv(prodmbox, &credit);
        send(consmbox, item);
    }
}

Consumer() {
    ...
    for (i = 0; i < n; i++)
        send(prodmbox, credit);
    while (1) {
        recv(consmbox, &item);
        send(prodmbox, credit);
        consume item;
    }
}

Would it work with multiple producers and one consumer? With one producer and multiple consumers? What about multiple producers and multiple consumers?
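As a concrete (hypothetical) instance of this credit protocol, the two mailboxes can be simulated with two Unix pipes: one carries items from producer to consumer, the other carries credits back. The function and variable names are invented for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a producer in a child process. Items flow through item_fd,
 * credits flow back through credit_fd. Returns # of items consumed. */
int run_credit_protocol(int n_credits, int n_items) {
    int item_fd[2], credit_fd[2];
    pipe(item_fd);
    pipe(credit_fd);

    if (fork() == 0) {                        /* producer process */
        char credit, item = 'x';
        for (int i = 0; i < n_items; i++) {
            read(credit_fd[0], &credit, 1);   /* wait for a credit */
            write(item_fd[1], &item, 1);      /* send an item */
        }
        _exit(0);
    }

    /* consumer: grant the initial credits, then one credit per item */
    char item, credit = 'c';
    int consumed = 0;
    for (int i = 0; i < n_credits; i++)
        write(credit_fd[1], &credit, 1);
    for (int i = 0; i < n_items; i++) {
        read(item_fd[0], &item, 1);
        if (i + n_credits < n_items)          /* keep n_credits in flight */
            write(credit_fd[1], &credit, 1);
        consumed++;
    }
    wait(NULL);
    return consumed;
}
```

The credits bound how many items can be in flight at once, which is exactly the flow control the slide's example encodes.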

Indirect Communication
Names: mailbox, socket, channel, ...
Properties:
- Some allow only one-to-one communication (e.g. pipe)
- Some allow many-to-one or one-to-many communication (e.g. mailbox)

Mailbox Message Passing
Message-oriented one-way communication.
Like a real mailbox: the sender deposits letters/messages without knowing when (or whether) the receiver picks them up.
Data structure: mutex, condition variable(s), buffer for messages.
Operations: init, open, close, send, receive, ...
Does the sender know when the receiver gets a message?
    mbox_send(m)  ->  [mbox]  ->  mbox_recv(m)

Example: Keyboard Input
Interrupt handler: gets the input characters, hands them to the device thread, then V(s) to wake it.
Device thread:
    while (1) {
        P(s);
        Acquire(m);
        convert characters into a message and deposit it in mbox;
        Release(m);
    }
The input process reads the message from the mailbox, e.g. via getchar().

Sockets
Bidirectional (unlike a mailbox).
- Unix domain sockets: IPC within one machine
- Network sockets: communication over a network, with the same send/recv APIs
Two types:
- Datagram socket (UDP): a collection of messages; best effort; connectionless
- Stream socket (TCP): a stream of bytes (like a pipe); reliable; connection-oriented

Network Socket Address Binding
A network socket binds to:
- Host: IP address (e.g. 128.112.9.1)
- Protocol: UDP or TCP
- Port: well-known ports (0..1023), e.g. port 80 for the Web; unused ports (1025..65535) are available for clients
Why ports (indirection again)?
- No need to know which process to communicate with
- Updating software on one side won't affect the other side

Communication with Stream Sockets
Client: create a socket; connect to the server; send the request; receive the response.
Server: create a socket; bind to a port; listen on the port; accept the connection; receive the request; send the response.
The connect/accept pair establishes the connection; the request and reply then flow over it.

Sockets API
Create and close a socket:
    sockid = socket(af, type, protocol);
    sockerr = close(sockid);
Bind a socket to a local address:
    sockerr = bind(sockid, localaddr, addrlength);
Negotiate the connection:
    listen(sockid, length);
    accept(sockid, addr, length);
Connect a socket to a destination:
    connect(sockid, destaddr, addrlength);
Message passing:
    send(sockid, buf, size, flags);
    recv(sockid, buf, size, flags);
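The whole call sequence can be exercised end-to-end on the loopback interface. The "ping" payload, the function name, and the use of an ephemeral port (bind to port 0, then getsockname() to learn the assigned port) are illustrative choices, not from the slides:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forked client sends "ping"; server accepts and reads it.
 * Returns the number of request bytes received into reply. */
int exchange_once(char *reply, int cap) {
    struct sockaddr_in addr = {0};
    socklen_t len = sizeof(addr);

    int srv = socket(AF_INET, SOCK_STREAM, 0);            /* create socket */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                    /* any free port */
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));    /* bind to port  */
    listen(srv, 1);                                       /* listen        */
    getsockname(srv, (struct sockaddr *)&addr, &len);     /* learn port #  */

    if (fork() == 0) {                                    /* client side   */
        int cli = socket(AF_INET, SOCK_STREAM, 0);
        connect(cli, (struct sockaddr *)&addr, sizeof(addr)); /* connect   */
        send(cli, "ping", 4, 0);                          /* send request  */
        close(cli);
        _exit(0);
    }

    int conn = accept(srv, NULL, NULL);                   /* accept conn   */
    int n = 0, k;
    while (n < cap - 1 && (k = recv(conn, reply + n, cap - 1 - n, 0)) > 0)
        n += k;                                           /* receive req   */
    close(conn);
    close(srv);
    wait(NULL);
    return n;
}
```

Binding to port 0 lets the kernel pick an unused ephemeral port, which avoids hard-coding a port number in examples like this.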

Message Passing Interface (MPI)
A message-passing library for parallel machines, implemented at user level for high-performance computing; portable.
MPI and MPI-2:
- Basic (6 functions): works for most parallel programs
- Large (125 functions): blocking (synchronous) message passing, non-blocking (asynchronous) message passing, and collective communication
Reference: http://www.mpi-forum.org/

Hello World using MPI

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* initialize MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* return my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* return # of processes */
    printf("I am %d of %d\n", rank, size);
    MPI_Finalize();                          /* last call, to clean up */
    return 0;
}

Blocking Send
MPI_Send(buf, count, datatype, dest, tag, comm)
- buf: address of send buffer
- count: # of elements in buffer
- datatype: data type of each send buffer element
- dest: rank of destination
- tag: message tag
- comm: communicator
This routine may block until the message is received by the destination process, depending on the implementation. More about message tags later.

Blocking Receive
MPI_Recv(buf, count, datatype, source, tag, comm, status)
- buf: address of receive buffer (output)
- count: maximum # of elements in receive buffer
- datatype: datatype of each receive buffer element
- source: rank of source
- tag: message tag
- comm: communicator
- status: status object (output)
Receives a message with the specified tag from the specified communicator and source process. MPI_Get_count(status, datatype, count) returns the actual count of the received data.

More on Blocking Send and Recv
A sender's MPI_Send(..., dest=1, tag=1, comm=X) is matched by the receiver's MPI_Recv(..., source=0, tag=1, comm=X).
- The message can go from source to destination directly
- Send can block until recv picks up the message
Message passing must match on:
- Source rank (can be MPI_ANY_SOURCE)
- Tag (can be MPI_ANY_TAG)
- Communicator (e.g. MPI_COMM_WORLD)

Buffered Send
MPI_Bsend(buf, count, datatype, dest, tag, comm)
- buf: address of send buffer
- count: # of elements in buffer
- datatype: type of each send element
- dest: rank of destination
- tag: message tag
- comm: communicator
MPI_Buffer_attach() and MPI_Buffer_detach() create and destroy the buffer that MPI_Bsend() copies into.
Related modes: MPI_Ssend (synchronous, uses no buffer); MPI_Rsend (ready send: the matching recv must be posted first).

Non-Blocking Send
MPI_Isend(buf, count, datatype, dest, tag, comm, *request)
Same as MPI_Send except for request, which is a handle. Returns as soon as possible; it is unsafe to reuse buf right away.
MPI_Wait(*request, *status): block until the send is done.
MPI_Test(*request, *flag, *status): return the status without blocking.
Usage pattern 1: MPI_Isend(...); do other work; MPI_Wait(&request, &status).
Usage pattern 2: MPI_Isend(...); MPI_Test(&request, &flag, &status); while (flag == FALSE) { do more work; MPI_Test(&request, &flag, &status); }

Non-Blocking Recv
MPI_Irecv(buf, count, datatype, source, tag, comm, *request)
Returns right away.
MPI_Wait(): block until the receive finishes.
MPI_Test(): return the status without blocking.
MPI_Probe(source, tag, comm, *status): block until a matching message is available (MPI_Iprobe adds a flag argument and returns immediately).
Usage pattern 1: MPI_Irecv(...); do other work; MPI_Wait(&request, &status).
Usage pattern 2: probe in a loop, doing more work until a matching message arrives, then call MPI_Irecv() or MPI_Recv().

Remote Procedure Call (RPC)
Make remote procedure calls that look like local procedure calls. Examples: SunRPC, Java RMI.
Restrictions:
- Call by value, or call by object reference (to maintain consistency)
- Not call by reference
Different from mailbox, socket, or MPI: remote execution, not just data transfer.
References:
- B. J. Nelson, Remote Procedure Call, PhD dissertation, 1981
- A. D. Birrell and B. J. Nelson, Implementing Remote Procedure Calls, ACM Transactions on Computer Systems, 1984

RPC Model
The caller (client) issues an RPC call; a request message carrying the arguments goes to the server; the server executes the function with the passed arguments; a reply message carrying the return value comes back; the caller then returns, just as from a local call.
Compile-time type checking and interface generation are possible.

RPC Mechanism
Client side: the client program makes a call; the client stub encodes/marshals the call and its arguments; the RPC runtime sends the message.
Server side: the RPC runtime receives the message; the server stub decodes/unmarshals it and calls the server program; the reply travels back the same way (encode/marshal in the server stub, decode/unmarshal in the client stub).
The request message carries the ClientId, RPCId, call, and arguments; the reply message carries the results.
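The encode/marshal step the stubs perform can be sketched as packing a call header and arguments into a flat byte buffer. The header fields follow the slide (ClientId, RPCId, arguments), but the exact layout, the fixed two-int argument list, and the "add" procedure are invented for illustration; a real RPC system would also handle byte order and variable-length data:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: client id, rpc id, then two int32 arguments. */
typedef struct {
    int32_t client_id, rpc_id, arg1, arg2;
} call_msg_t;

/* Client stub: encode/marshal the call into a byte buffer. */
size_t marshal_call(uint8_t *buf, int32_t client, int32_t rpc,
                    int32_t a, int32_t b) {
    call_msg_t m = {client, rpc, a, b};
    memcpy(buf, &m, sizeof(m));
    return sizeof(m);
}

/* Server stub: decode/unmarshal, then call the real procedure. */
int32_t dispatch_add(const uint8_t *buf) {
    call_msg_t m;
    memcpy(&m, buf, sizeof(m));
    assert(m.rpc_id == 1);          /* 1 = "add" in this sketch */
    return m.arg1 + m.arg2;         /* function execution with passed args */
}
```

The stubs are what make the remote call look local: the client program only sees marshal_call() hidden behind a normal function signature, and the server program only sees its own procedure being invoked.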

Message-Passing Implementation Issues
- R waits for a message from S, but S has terminated: R may be blocked forever.
- S sends a message to R, but R has terminated: if S has no buffer, S will be blocked forever.

Exception: Message Loss
Use acks and timeouts to detect and retransmit a lost message:
- The receiver sends an ack for each message
- The sender blocks until an ack comes back, or until a timeout:
      status = send(dest, msg, timeout);
- If the timeout expires with no ack, retransmit the message
Issues: duplicates, and lost ack messages.

Exception: Message Loss, cont'd
Retransmission must handle duplicate messages on the receiver side and out-of-sequence acks on the sender side:
- Use a sequence number for each message to identify duplicates; remove duplicates on the receiver side
- The sender retransmits on an out-of-sequence ack
Reducing ack traffic:
- Bundle ack messages
- Have the receiver send negative acks (nacks) instead: can be complex
- Piggy-back acks on messages sent in the other direction
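The receiver-side duplicate filter can be sketched with a single expected-sequence counter, as in a stop-and-wait protocol; the structure and function names are illustrative:

```c
#include <assert.h>

/* Stop-and-wait receiver: deliver a message only if it carries the
 * sequence number we expect; duplicates are acked again but dropped. */
typedef struct {
    int expected;    /* next sequence number we will accept */
    int delivered;   /* how many distinct messages reached the app */
} receiver_t;

/* Process an arriving message; returns the ack number to send back. */
int on_message(receiver_t *r, int seq) {
    if (seq == r->expected) {    /* new message: deliver it */
        r->delivered++;
        r->expected++;
    }
    /* else: a retransmitted duplicate; re-ack it but don't redeliver */
    return seq;
}
```

Re-acking a duplicate is essential: the duplicate usually means the original ack was lost, so the sender is still waiting for it.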

Exception: Message Corruption
Detection:
- Compute a checksum over the entire message and send the checksum (e.g. a CRC code) as part of the message
- Recompute the checksum on receive and compare it with the checksum in the message
Correction:
- Trigger a retransmission
- Or use error-correcting codes to recover
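The detection step can be sketched with a bitwise CRC-32 (the common reflected polynomial 0xEDB88320); production stacks typically use table-driven or hardware CRC instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 over a message. The sender appends the result to the message;
 * the receiver recomputes it and compares to detect corruption. */
uint32_t crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)               /* one bit at a time */
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}
```

A CRC detects corruption but cannot fix it; recovery needs either a retransmission, as above, or a true error-correcting code.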

Summary
Message passing:
- Moves data between processes, with implicit synchronization
- Many API design alternatives (sockets, MPI)
- Indirection is helpful
RPC:
- Remote execution that looks like a local procedure call, with constraints on how data is passed
Issues:
- The synchronous method is the most common
- The asynchronous method allows overlapping communication with computation
- Exceptions need to be handled carefully