CMSC 330: Organization of Programming Languages
Multithreading 2: Ruby Threads, OCaml Threads, and MapReduce

Ruby Threads: Thread Creation

Create a thread using Thread.new. The new method takes a code block argument:

    t = Thread.new { body of thread }
    t = Thread.new(arg) { |arg| body of thread }

The join method waits for the thread to complete:

    t.join

Example:

    mythread = Thread.new {
      sleep 1                   # sleep for 1 second
      puts "New thread awake!"
      $stdout.flush             # flush makes sure output is seen
    }

Ruby Threads: Locks

Monitor and Mutex are the core synchronization mechanisms; both enforce mutual exclusion. Mutex is not reentrant; Monitor is. Create a lock using Monitor.new; its synchronize method takes a code block argument:

    require 'monitor'
    mylock = Monitor.new
    mylock.synchronize {
      # mylock held during this code block
    }

Ruby Threads: Conditions

A condition is derived from a Monitor: create it from the lock using new_cond. Sleep while waiting using wait_while or wait_until; wake up waiting threads using broadcast.

Example:

    mylock = Monitor.new                 # new lock
    mycondition = mylock.new_cond        # new condition

    mylock.synchronize {
      mycondition.wait_while { y > 0 }   # wait as long as y > 0
      mycondition.wait_until { x != 0 }  # wait as long as x == 0
    }

    mylock.synchronize {
      mycondition.broadcast              # wake up all waiting threads
    }
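
Putting the three pieces above together, here is a minimal runnable sketch (not from the slides; the worker count and loop bounds are made up for illustration): two threads increment a shared counter under a Monitor, and the main thread uses a condition to wait until both have finished.

    require 'monitor'

    lock = Monitor.new
    done = lock.new_cond
    count = 0
    finished = 0

    workers = 2.times.map do
      Thread.new {
        5.times {
          lock.synchronize { count += 1 }   # mutual exclusion on the shared counter
        }
        lock.synchronize {
          finished += 1
          done.broadcast                    # wake the waiting main thread
        }
      }
    end

    lock.synchronize {
      done.wait_until { finished == 2 }     # sleep until both workers have signaled
    }
    puts count                              # => 10
    workers.each(&:join)

Note that the threads read and write the local variables count and finished directly, which illustrates the scoping rule discussed under "Differences from Java" below.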

Parking Lot Example

    require 'monitor'

    class ParkingLot
      def initialize
        # initialize synchronization
        @numcars = 0
        @mylock = Monitor.new
        @mycondition = @mylock.new_cond
      end

      def addcar ... end
      def removecar ... end
    end

Parking Lot Example (continued)

    def addcar
      # do work not requiring synchronization
      @mylock.synchronize {
        @mycondition.wait_until { @numcars < MaxCars }   # MaxCars: lot capacity
        @numcars = @numcars + 1
        @mycondition.broadcast
      }
    end

    def removecar
      # do work not requiring synchronization
      @mylock.synchronize {
        @mycondition.wait_until { @numcars > 0 }
        @numcars = @numcars - 1
        @mycondition.broadcast
      }
    end

Parking Lot Example (driver)

    garage = ParkingLot.new

    valet1 = Thread.new {   # valet 1 drives cars into the parking lot
      while ...             # loop condition elided on the slide
        # do work not requiring synchronization
        garage.addcar
      end
    }

    valet2 = Thread.new {   # valet 2 drives cars out of the parking lot
      while ...             # loop condition elided on the slide
        # do work not requiring synchronization
        garage.removecar
      end
    }

    valet1.join             # returns when valet 1 exits
    valet2.join             # returns when valet 2 exits

(A self-contained runnable variant, with made-up constants, follows the Java comparison below.)

Ruby Threads: Comparing to Java

Ruby:
    m = Monitor.new          # monitor
    c = m.new_cond           # condition
    c.wait_until { e }
    c.broadcast

Java (1.5):
    ReentrantLock m = new ReentrantLock();
    Condition c = m.newCondition();
    while (!e) c.await();
    c.signalAll();
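
As mentioned above, here is a runnable variant of the parking lot example (a sketch, not from the slides): the capacity of 3, the 10 trips per valet, and the tiny sleeps are assumed values chosen so the program terminates.

    require 'monitor'

    MaxCars = 3                       # assumed lot capacity (not given on the slides)

    class ParkingLot
      def initialize
        @numcars = 0
        @mylock = Monitor.new
        @mycondition = @mylock.new_cond
      end

      def addcar
        @mylock.synchronize {
          @mycondition.wait_until { @numcars < MaxCars }   # wait for free space
          @numcars += 1
          @mycondition.broadcast
        }
      end

      def removecar
        @mylock.synchronize {
          @mycondition.wait_until { @numcars > 0 }          # wait for a parked car
          @numcars -= 1
          @mycondition.broadcast
        }
      end
    end

    garage = ParkingLot.new
    valet1 = Thread.new { 10.times { garage.addcar; sleep 0.01 } }
    valet2 = Thread.new { 10.times { garage.removecar; sleep 0.01 } }
    valet1.join
    valet2.join
    puts "all cars parked and retrieved"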

Ruby Threads: Differences from Java

A Ruby thread can access all variables in scope when the thread is created, including local variables; Java threads can only access object fields or final local variables.

Exiting: all threads exit when the main Ruby thread exits; Java continues until all non-daemon threads exit.

When a thread throws an exception, Ruby by default aborts only the current thread. Ruby can also abort all threads (better for debugging) by setting Thread.abort_on_exception = true. (A short Ruby sketch of this setting appears after the OCaml sections below.)

OCaml Threads: Thread Creation

Create a thread using Thread.create, which takes a closure as its argument:

    let t = Thread.create (fun x -> (* body of thread *)) arg;;

Thread.join waits for a thread to complete:

    Thread.join t

Example:

    let mythread = Thread.create (fun _ ->
      Unix.sleep 1;                     (* sleep for 1 second *)
      print_string "New thread awake!";
      flush Pervasives.stdout           (* flush ensures output is seen *)
    ) ();;

OCaml Threads: Locks

The Mutex module provides locks. They are not reentrant, and the module has lock and unlock functions:

    let my_lock = Mutex.create ();;

    Mutex.lock my_lock;
    (* my_lock held here *)
    Mutex.unlock my_lock

OCaml Threads: Conditions

The Condition module provides condition variables. Create a condition directly:

    let my_cond = Condition.create ()

Sleep while waiting using wait, which takes the mutex as an argument:

    while (e) do
      Condition.wait my_cond my_lock
    done;
    (* condition now satisfied *)

Wake up waiting threads using broadcast:

    Condition.broadcast my_cond
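
As promised above, a minimal Ruby sketch (not from the slides) of the exception-handling default and the abort_on_exception switch:

    # By default an unhandled exception aborts only the thread that raised it;
    # the main thread keeps running.
    t = Thread.new { raise "oops" }
    sleep 0.1                            # give the new thread time to die
    puts "main thread is still alive"

    # Setting this global flag makes an unhandled exception in any thread
    # abort the whole program instead, which is often easier to debug:
    # Thread.abort_on_exception = true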

OCaml Threads: Comparing to Java

OCaml:
    (* monitor *)    let m = Mutex.create ()
    (* condition *)  let c = Condition.create ()
    while (not e) do Condition.wait c m done
    Condition.broadcast c

Java (1.5):
    // the slide uses a hypothetical NonReentrantLock, since an OCaml Mutex is not reentrant
    (Non)ReentrantLock m = new NonReentrantLock();
    Condition c = m.newCondition();
    while (!e) c.await();
    c.signalAll();

These ideas have been around a long time:
http://www.cs.princeton.edu/courses/archive/fall09/cos318/reading/birrell_threads.pdf

Shared-memory Multithreading

+ Portable, high degree of control
- Low-level and unstructured:
  Thread management and synchronization via locks and signals are essentially manual
  Blocking synchronization is not compositional, which inhibits nested parallelism
  Easy to get wrong and hard to debug: data races and deadlocks are all too common

Looking Beyond Shared-memory Multithreading

Parallel Language Extensions

MPI is expressive and portable, but:
  It is hard to partition data and get good performance; the temptation is to hardcode data locations and the number of processors
  It is hard to write the program correctly; the code bears little relation to the sequential algorithm

OpenMP and HPF parallelize certain code patterns (e.g., loops), but:
  They are limited to built-in types (e.g., arrays)
  The code patterns and scheduling policies are brittle

MPI C Example

    int count_primes ( int n ) {     /* argc, argv, and the helper prime_number are elided on the slide */
      int p, id, primes, primes_part;
      MPI_Init ( &argc, &argv );
      MPI_Comm_size ( MPI_COMM_WORLD, &p );    // p = total # of processes
      MPI_Comm_rank ( MPI_COMM_WORLD, &id );   // id = my process #
      if ( id == 0 )
        printf ( " # of processes is %d\n", p );
      MPI_Bcast ( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );     // broadcast n
      primes_part = prime_number ( n, id, p );             // do my portion of work
      MPI_Reduce ( &primes_part, &primes, 1, MPI_INT, MPI_SUM,
                   0, MPI_COMM_WORLD );                    // global sum for # primes
      if ( id == 0 )
        printf ( " %d %d \n", n, primes );                 // proc 0 prints answer
      MPI_Finalize ( );
    }

SPMD model (single program, multiple data): all processors execute the same program.
But see: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/

OpenMP C Example

    int count_primes ( int n ) {
      int i, j, prime, total = 0;
      # pragma omp parallel shared ( n ) private ( i, j, prime )
      # pragma omp for reduction ( + : total )
      for ( i = 2; i <= n; i++ ) {
        prime = 1;
        for ( j = 2; j < i; j++ ) {
          if ( i % j == 0 ) { prime = 0; break; }
        }
        total = total + prime;
      }
      return total;
    }

Fork-join model: the main thread assigns iterations of the parallel loop to workers.
(A Ruby sketch of the same work-partitioning idea follows the next slide.)

Two Directions To A Solution

Start with clean, but limited, languages/abstractions and generalize:
  MapReduce (Google, 2004)
  StreamIt (MIT, 2002)
  Cilk (MIT, 1994)

Start with full-featured languages and add cleanliness:
  Software transactional memory
  Static analyzers (Locksmith, Chord, ...)
  Threaded Building Blocks (Intel)
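
As referenced above, here is a Ruby sketch (not from the slides) of the same prime-counting task, partitioned across threads the way the MPI version partitions it across processes. The thread count of 4 is an arbitrary choice, and on standard MRI the global VM lock prevents true CPU parallelism, so this only illustrates the partitioning pattern.

    # trial-division primality test, matching the naive loop in the C examples
    def prime?(i)
      return false if i < 2
      (2...i).none? { |j| i % j == 0 }
    end

    def count_primes(n, num_threads = 4)
      threads = (0...num_threads).map do |id|
        Thread.new do
          # thread id handles the numbers congruent to id modulo num_threads
          (2 + id).step(n, num_threads).count { |i| prime?(i) }
        end
      end
      threads.sum(&:value)   # Thread#value joins and returns each block's result
    end

    puts count_primes(100)   # => 25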

Space of Solutions

(Chart: approaches plotted on a cleanliness vs. flexibility axis, from most clean to most flexible: MapReduce, Dryad, StreamIt, CML, Cilk, TBB + threads, STM + threads, OpenMP, MPI, POSIX threads. Some are research systems and some are used in practice; the clean-and-flexible corner is labeled "yet to come?".)

Kinds of Parallelism

Data parallelism: divide the data between different tasks and perform the same action on each part in parallel.
Task parallelism: different tasks running on the same data.
Hybrid data/task parallelism: a parallel pipeline of tasks, each of which might be data parallel.
Unstructured: an ad hoc combination of threads with no obvious top-level structure.

MapReduce: Programming the Pipeline

The pattern is inspired by Lisp, ML, etc., and many problems can be phrased this way. It results in clean code that is:
  Easy to program, debug, and maintain (simple programming model, nice retry/failure semantics)
  Efficient and portable (easy to distribute across nodes)

Map & Reduce

In OCaml, the building blocks are map f list and fold_right f list acc:

    map square [1; 2; 3; 4]
      => [1; 4; 9; 16]

    fold_right (+) [1; 4; 9; 16] 0
      => 1 + (4 + (9 + (16 + 0)))
      => 30

    fold_right (+) (map square [1; 2; 3; 4]) 0
      => 30

(Thanks to Google, Inc. for some of the slides that follow.)
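
The same pipeline in Ruby (a sketch, not from the slides): Enumerable#map plays the role of map, and inject, a left fold, plays the role of the fold. Since + is associative, folding left rather than right gives the same sum.

    squares = [1, 2, 3, 4].map { |x| x * x }           # => [1, 4, 9, 16]
    total   = squares.inject(0) { |acc, x| acc + x }   # => 30

    # or as one pipeline:
    [1, 2, 3, 4].map { |x| x * x }.inject(0, :+)       # => 30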

MapReduce a la Google

map(key, val) is run on each item in the input set and emits new-key / new-value pairs.
reduce(key, vals) is run for each unique key emitted by map() and emits the final output.

Count Words In Documents

The input consists of (url, contents) pairs.

    map(key=url, val=contents):
      for each word w in contents, emit (w, "1")

    reduce(key=word, values=uniq_counts):
      sum all the "1"s in the values list
      emit the result (word, sum)

(A sequential Ruby sketch of this word count appears below.)

Count, Illustrated

Using the same map and reduce as above, on the inputs "see bob throw" and "see spot run":

    map emits:       (see, 1) (bob, 1) (throw, 1) (see, 1) (spot, 1) (run, 1)
    reduce produces: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)

Execution

(Slide: diagram of the MapReduce execution flow across map and reduce workers.)
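
A minimal sequential Ruby sketch (not from the slides) of the word-count map/reduce; the document names are hypothetical.

    docs = {
      "doc1" => "see bob throw",
      "doc2" => "see spot run"
    }

    # map: emit (word, 1) for every word in every document
    pairs = docs.flat_map { |_url, contents| contents.split.map { |w| [w, 1] } }

    # shuffle: group the intermediate pairs by key (the word)
    grouped = pairs.group_by { |word, _| word }

    # reduce: sum the counts for each word
    counts = grouped.transform_values { |vs| vs.sum { |_, c| c } }

    p counts   # => {"see"=>2, "bob"=>1, "throw"=>1, "spot"=>1, "run"=>1}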

Parallel Execution

(Diagram: the map tasks run independently, giving data parallelism, and feed the reduce tasks, giving pipeline parallelism.)
Key: there are no implicit dependencies between map or reduce tasks.

The Model Is Widely Applicable

(Chart: the number of MapReduce programs in the Google source tree, growing rapidly from 2004 on.)

Example uses: distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, clustering for Google News, popularity for Google Trends, statistical machine translation for Google Translate.

The Programming Model Is Key

Simple control makes dependencies evident, so the scheduling of tasks and optimization can be automated:
  map and reduce calls for different keys are embarrassingly parallel
  the pipeline between mappers and reducers is evident

map and reduce are pure functions, so they can be rerun to get the same answer:
  in the case of failure, or
  to use idle resources toward faster completion
(A tiny Ruby sketch of this determinism argument appears after the next slide.)

There is no worry about data races, deadlocks, etc., since there is no shared state.

Compare to Dedicated Supercomputers

According to Wikipedia (in 2009), Google used about 450,000 servers (as of 2006), mostly commodity Intel boxes, each with a 2 TB drive and at least 16 GB of memory; more recent details are kept secret by Google. That is more computing power than even the most powerful supercomputer.
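
As noted above, a tiny Ruby sketch (not from the slides) of why purity makes the retry semantics safe: re-executing a map task on the same input, whether after a failure or as a speculative duplicate, produces identical intermediate pairs, so the duplicate can simply be discarded.

    # a pure map function: its output depends only on its inputs, not on shared state
    map_words = ->(_url, contents) { contents.split.map { |w| [w, 1] } }

    first_run  = map_words.call("doc1", "see bob throw")
    second_run = map_words.call("doc1", "see bob throw")   # simulated re-execution
    raise "map was not deterministic!" unless first_run == second_run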

Apache Hadoop

An open-source framework for large-scale data processing:
  Storage based on the Google File System
  Computation based on MapReduce
  Mostly written in Java

Widely used commercially:
  Used by over half of the Fortune 50 (the largest companies)
  Applications include marketing analytics, machine learning, web crawling, and image processing
  Yahoo (2008) used a 10K-core system for web indexing
  Facebook (2012) uses the system to manage a 100+ PB database