Concurrency: what, why, how


May 28, 2009

Lecture about everything and nothing

- Explain the basic idea of concurrency ((pseudo)parallel vs. truly parallel execution)
- Give reasons for using concurrency
- Present briefly different classifications, approaches, models and languages

What: basic notions

- Basic idea
- Dependency idea
- Some terms (1)
- Some terms (2)

Basic idea

- Intuitively, concurrency is the simultaneous execution of
  - instructions (with CPU pipelines)
  - actions (functions within a program)
  - programs (a distributed application)
- But what counts as simultaneous?
  - physically at the same time?
  - nearly at the same time? (2 threads on a single-core CPU?)

Dependency idea

- If two actions
  - do not need each other's results (are independent), and
  - do not otherwise interfere (e.g. do not write to the same file),
  then the order of their execution does not matter:

      a = c+d
      b = c+e

- The idea of dependency between actions is important (see the sketch below).
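A minimal sketch of this idea in Haskell, assuming GHC's parallel package (compile with -threaded): since a and b do not depend on each other, we may ask the runtime to evaluate them side by side. The function sums and its arguments are made up for illustration.

    import Control.Parallel (par, pseq)

    -- a and b depend on c, d and e, but not on each other,
    -- so the runtime may evaluate them in parallel
    sums :: Int -> Int -> Int -> (Int, Int)
    sums c d e =
      let a = c + d
          b = c + e
      in  a `par` (b `pseq` (a, b))   -- spark a, evaluate b, return both

    main :: IO ()
    main = print (sums 1 2 3)

Note that par is only a hint that evaluation may proceed in parallel; removing it changes performance, never the result.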

Some terms (1)

Not a rule, just to extend the understanding of concurrency:

- Parallel: execute simultaneously
- Concurrent: the order of execution does not matter

From Wikipedia:

- Parallel computing is a form of computation in which many calculations are carried out simultaneously.
- Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel; sometimes referred to as pseudoparallel.

Some terms (2)

The Haskell community's variants:

- Parallel: deterministic data crunching; simultaneous execution of tasks of the same type
- Concurrent: non-deterministic execution of unrelated communicating processes

From Chapter 24, "Concurrent and multicore programming" (Real World Haskell): "A concurrent program needs to perform several possibly unrelated tasks at the same time. In contrast, a parallel program solves a single problem."

Why: reasons for concurrency

- List of reasons
- Some analogies
- Faster programs
- Hiding latency
- Better structure

List of reasons

- Faster programs: running on several cores/CPUs/computers
- More responsive programs: GUI interfaces, hiding disk/network latency
- Programs with natural concurrency: distributed programs (client-server, etc.)
- Fault-tolerant programs: using redundancy
- Better structured programs

Some analogies

- Speed of a process: with 1 axe, one friend can chop wood while the other collects it; with 2 axes, both friends can chop wood in parallel
- Hiding latency: when we turn on a kettle we do not wait until it boils; e.g. we go and take the cups out of the cupboard, then return to the kettle
- Better structure: doing ironing and cooking concurrently (interleaved by one person) is messy; assign them to different people

Faster programs

- Calculate the elements of an array in parallel
- Perform calculations on several processors/nodes
- Serve YouTube videos from multiple servers
- The end of Moore's law? "The number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years."
- Every new laptop comes with (at least) dual-core technology, yet a sequential program is usually stuck at 50% CPU usage

Hiding latency

- Disk/network operations take time
- Either work asynchronously or use a dedicated thread (sketched below)
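A sketch of the dedicated-thread variant in Concurrent Haskell: forkIO starts the slow read in the background and an MVar delivers the result, so the main thread keeps working in the meantime. The file name is hypothetical.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      resultVar <- newEmptyMVar
      -- dedicated thread performs the slow disk read
      _ <- forkIO $ do
        contents <- readFile "big-input.txt"   -- hypothetical input file
        putMVar resultVar (length contents)
      putStrLn "doing other useful work while the read is in flight..."
      n <- takeMVar resultVar   -- block only when the result is needed
      print n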

Better structure

- Assign different threads to unrelated tasks (if reasonable)
- Example: a data-sharing server, decomposed either
  - vertically: one thread per request, or
  - horizontally (a conveyor/pipeline):
    - dedicated thread(s) for reading requests
    - dedicated thread(s) for searching data
    - a new thread for sending data
- Mixing the tasks of all threads into one thread of asynchronous behavior is a structural nightmare (a toy pipeline follows)
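A toy version of the horizontal (pipeline) structure in Haskell: one dedicated thread per stage, connected by channels. The stage names mirror the slide; the "search" is a stand-in.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.Chan (newChan, readChan, writeChan)
    import Control.Monad (forever)

    -- one dedicated thread per stage, connected by channels
    main :: IO ()
    main = do
      requests <- newChan
      results  <- newChan
      _ <- forkIO $ forever $ do          -- "searching" stage
             req <- readChan requests
             writeChan results ("result for " ++ req)  -- stand-in for a real search
      _ <- forkIO $ forever $ do          -- "sending" stage
             res <- readChan results
             putStrLn res
      mapM_ (writeChan requests) ["req1", "req2", "req3"]  -- simulated "reading" stage
      threadDelay 100000                  -- crude: let the pipeline drain before exit

Each stage sees only its input and output channel, which keeps the stages independent, in contrast to the one-thread "structural nightmare".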

How: classifications

- Task and data concurrency
- Coarse- and fine-grained
- High and low level
- Explicitness (1)
- Explicitness (2)
- Formalizations
- By application areas
- By computation model

Task and data concurrency

- Task concurrency: different operations performed concurrently
  - calculate g and h in f(g(x), h(y)) concurrently
  - threads in the same program
  - several programs running on the same computer
- Data concurrency: the same operation on different data (SIMD)
  - loop operations: forall i=1..n do a[i]=a[i]+1
  - vectorised operations: MMX, SSE, etc.
- A program may benefit from both! (A data-concurrency sketch follows.)
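The forall loop above has a close counterpart in Haskell's evaluation strategies; a small sketch, again assuming the parallel package: the same operation (+1) is applied to every element, with the elements evaluated in parallel.

    import Control.Parallel.Strategies (parList, rseq, using)

    -- data concurrency: the same operation (+1) applied to every
    -- element, with the elements evaluated in parallel
    incrAll :: [Int] -> [Int]
    incrAll xs = map (+ 1) xs `using` parList rseq

    main :: IO ()
    main = print (sum (incrAll [1 .. 1000]))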

Coarse- and fine-grained

The ratio of computation to communication:

- coarse-grained parallel programs compute most of the time, e.g. distribute data, calculate, collect the results (Google MapReduce)
- fine-grained parallel programs communicate frequently: lots of dependencies between distributed data
- medium-grained, e.g. DOUG: lots of computation interleaved with lots of communication

High and low level

Different granularity (the unit of concurrency):

- instruction level: conveyors and pipelines in the CPU; MMX
- expression level: run an expression in a separate thread
- function level
- process level

A source of confusion: this is sometimes also referred to as fine/coarse grained.

Explicitness (1)

From "Models and Languages for Parallel Computation" by David B. Skillicorn and Domenico Talia, 1998. What can be explicit:

- Parallelism explicit (hints for possible parallelism)
  - loops: forall i in 1..N do a[i]=i
  - Fortran 90 matrix sum: C=A+B
- Decomposition explicit (specify the parallel pieces)
- Mapping explicit (map pieces to processors)
- Communication explicit (specify sends/recvs)
- Synchronization explicit (handle the details of message-passing)

Explicitness (2)

Possibilities:

1. nothing explicit (OBJ, P3L)
2. parallelism explicit, decomposition implicit (loops: Fortran variants, Id, APL, NESL)
3. decomposition explicit, mapping implicit (BSP, LogP)
4. mapping explicit, communication implicit (Linda)
5. communication explicit, synchronization implicit (Actors, Smalltalk)
6. everything explicit (PVM, MPI, fork)

Formalizations

How to describe (concurrent) computations?

- Operational semantics: describe operations on a Virtual Machine (VM); the Oz way; reasoning for a programmer
- Denotational semantics: describe algebraic rules; concurrent lambda calculus, pi calculus, CSP, Petri nets, DDA (Data Dependency Algebra); reasoning for a mathematician
- Axiomatic semantics: describe logical rules; TLA (Temporal Logic of Actions); reasoning for a machine (a prover)

By application area

- Scientific computing: High-Performance Computing (HPC), High-Throughput Computing (HTC)
- Distributed applications: clients, servers, P2P, telephone exchanges (the Erlang language)
- Desktop applications: responsive user interfaces, utilizing multiple cores

By computation model

What style of concurrency is supported?

- Declarative concurrent model: (pure) functional, logic
- Message-passing model: synchronous, asynchronous, RPC; active objects, passive objects
- Shared-state (shared-memory) model: locks, transactions

Languages

- Why a language?
- Oz
- Erlang
- Scala
- Clojure
- High-Performance Fortran
- NESL
- Concurrent and Parallel Haskell
- Intel TBB
- Ct

Why a language?

Why not just a library?

- cleaner syntax
- forces usage patterns
- control over the compilation process

In the 1980s there were hundreds of programming languages for concurrency; now there are thousands. The following slides describe some of them.

Oz

- roots in logic programming
- dataflow variables (logical variables with suspension)
- multiparadigm, advertising different styles of programming: functional, object-oriented, constraint (logic)
- explicit task concurrency (the thread statement)
- synchronization and communication through dataflow variables
- for distributed and desktop applications
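Dataflow variables have no exact Haskell counterpart, but as a rough analogy (an illustration only, not Oz semantics) a write-once MVar behaves similarly: a reader suspends until some thread binds the variable.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)

    main :: IO ()
    main = do
      x <- newEmptyMVar              -- an unbound "dataflow variable"
      _ <- forkIO $ do
        threadDelay 100000
        putMVar x (42 :: Int)        -- binding the variable wakes the reader
      v <- readMVar x                -- suspends until x is bound
      print v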

Erlang

- an Ericsson project from ~1990 for telecom applications: handling thousands of phone calls; robustness, distribution
- Concurrency: processes with message-passing (actors); focus on fault tolerance

    loop(State) ->
        receive
            {circle, R} ->
                io:format("area is ~p~n", [3.14*R*R]),
                loop(State+1);
            {rectangle, Width, Ht} ->
                ...
        end.

Scala

- a hot topic in 2008
- interoperable with Java (runs on the JVM)
- syntax similar to Java; object-oriented, functional, etc.; static typing
- Concurrency: task concurrency; processes with message-passing (actors)

Clojure

- a hot topic in 2008
- targets the Java Virtual Machine
- Lisp syntax; functional, with macros
- Concurrency: task concurrency; a reactive agent system; software transactional memory

High-Performance Fortran

- since 1993; an extension of Fortran 90
- Concurrency: data concurrency

          REAL A(16,16), B(14,14)
    !HPF$ ALIGN B(I,J) WITH A(I+1,J+1)
    !HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS()/3,3)
    !HPF$ DISTRIBUTE A(CYCLIC,BLOCK) ONTO P

NESL

- since 1995; available only on rare platforms
- a way to handle nested data concurrency: sparse matrix storage, the quicksort algorithm
- Concurrency: nested data concurrency

Concurrent and Parallel Haskell

- Parallel Haskell: with par and pseq; deterministic; speculative execution
- Concurrent Haskell: with forkIO; locks, monitors, etc.; synchronization variables (MVars); STM (software transactional memory) with atomically (sketched below)
- and more: mHaskell; Data Parallel Haskell with parallel arrays, NDP (nested data parallelism)
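A small illustration of the STM point, assuming the stm package: a hypothetical transfer between two TVar accounts runs inside one atomically block, so no other thread can observe the money "in flight".

    import Control.Concurrent.STM
      (TVar, atomically, newTVarIO, readTVar, writeTVar)

    -- the transfer runs as one transaction: all-or-nothing
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to amount = atomically $ do
      a <- readTVar from
      b <- readTVar to
      writeTVar from (a - amount)
      writeTVar to   (b + amount)

    main :: IO ()
    main = do
      alice <- newTVarIO 100
      bob   <- newTVarIO 0
      transfer alice bob 30
      balances <- atomically ((,) <$> readTVar alice <*> readTVar bob)
      print balances   -- expected: (70,30)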

Intel TBB

- Intel Threading Building Blocks: a recent C++ library
- Concurrency: task concurrency

Ct

- Intel C for Throughput Computing
- compiler not yet publicly available (as of 2009)
- Concurrency: immutable data (declarative model); (nested) data concurrency