Concurrency: what, why, how (May 28, 2009)
Lecture about everything and nothing
- explain the basic idea of concurrency: (pseudo)parallel vs. parallel execution
- give reasons for using concurrency
- present briefly different classifications, approaches, models, and languages
- Basic idea
- Dependency idea
- Some terms (1)
- Some terms (2)
Basic idea
- intuitively: simultaneous execution of
  - instructions (within CPU pipelines)
  - actions (functions within a program)
  - programs (a distributed application)
- but what is "simultaneous"?
  - physically at the same time?
  - nearly at the same time? 2 threads on a single-core CPU?
Dependency idea
- If 2 actions
  - do not need each other's results (are independent)
  - do not interfere otherwise (e.g. do not write to the same file)
  then the order of their execution does not matter.

      a = c+d
      b = c+e

- The idea of dependency between actions is important.
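The dependency idea can be sketched in code. This is a minimal Python illustration (Python and the concrete values of c, d, e are not from the slides): a and b are computed by separate threads, and because both actions only read shared data and write different variables, either execution order gives the same result.

```python
import threading

# The two assignments from the slide: both read shared inputs but write
# different variables, so they are independent and may run in any order.
c, d, e = 10, 2, 3
results = {}

def calc_a():
    results["a"] = c + d   # a = c+d

def calc_b():
    results["b"] = c + e   # b = c+e

t1 = threading.Thread(target=calc_a)
t2 = threading.Thread(target=calc_b)
t1.start(); t2.start()
t1.join(); t2.join()
print(results["a"], results["b"])   # always 12 13, whichever thread ran first
```

Run it repeatedly: the thread scheduling may differ, but the results never do, which is exactly what independence buys us.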
Some terms (1)
- Not a rule, just to extend the understanding of the terms.
- Parallel: execute simultaneously; the order of execution does not matter.
- From Wikipedia:
  - Parallel computing is a form of computation in which many calculations are carried out simultaneously.
  - Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel.
- Concurrent execution is sometimes referred to as pseudoparallel.
Some terms (2)
- Haskell community variants:
  - Parallel: deterministic data crunching, simultaneous execution of tasks of the same type.
  - Concurrent: non-deterministic execution of unrelated communicating processes.
- From Chapter 24, "Concurrent and multicore programming": "A concurrent program needs to perform several possibly unrelated tasks at the same time. In contrast, a parallel program solves a single problem."
- List of reasons
- Some analogies
- Faster programs
- Hiding latency
- Better structure
List of reasons
- Faster programs: running on several cores/CPUs/computers.
- More responsive programs: GUI interfaces hiding disk/network latency.
- Programs with natural concurrency: distributed programs (client-server, etc.).
- Fault-tolerant programs: using redundancy.
- Better structured programs.
Some analogies
- Speed of a process
  - with 1 axe, one friend can chop wood and the other collect it
  - with 2 axes, both friends can chop wood in parallel
- Hiding latency
  - when we turn on a kettle we do not wait until it boils; e.g. we go and take the cups out of the cupboard, then return to the kettle
- Better structure
  - doing ironing and cooking concurrently is messy; assign them to different people
Faster programs
- Calculate elements of an array in parallel.
- Perform calculations on several processors/nodes.
- Serving YouTube videos from multiple servers.
- End of Moore's law?
  - "The number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years."
  - Every new laptop comes with (at least) dual-core technology; a single-threaded program is usually stuck at 50% CPU usage.
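Calculating elements of an array in parallel can be sketched as follows; this Python version (an illustration, not from the slides) updates each element independently with a small thread pool. Note that in CPython the GIL prevents real CPU speedup for numeric loops via threads, so a production version would use processes or NumPy; the point here is only that the per-element updates are independent.

```python
from concurrent.futures import ThreadPoolExecutor

# Data-parallel sketch of "update every element of an array": each
# element can be updated independently, so the updates may run in any
# order or in parallel.
a = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    a = list(pool.map(lambda x: x + 1, a))

print(a)   # [1, 2, 3, 4, 5, 6, 7, 8]
```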
Hiding latency
- Disk/network operations take time.
- Either work asynchronously or use a dedicated thread.
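The dedicated-thread approach mirrors the kettle analogy; here is a minimal Python sketch (the sleep stands in for a slow disk/network call, and all names are invented for the example): start the slow operation, do useful work while it runs, then block only when the result is actually needed.

```python
import threading, time

done = threading.Event()
result = {}

def boil_kettle():           # stands in for a slow disk/network operation
    time.sleep(0.2)
    result["water"] = "boiled"
    done.set()

threading.Thread(target=boil_kettle).start()
cups = ["cup1", "cup2"]      # useful work done while the "kettle" runs
done.wait()                  # only now do we block on the slow task
print(result["water"], cups)
```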
Better structure
- Assign different threads to unrelated tasks (if reasonable).
- Data-sharing server:
  - vertically: one thread per request
  - horizontally (conveyor):
    - dedicated thread(s) for reading requests
    - dedicated thread(s) for searching data
    - a new thread for sending data
- Mixing the tasks of all threads in one thread: asynchronous behavior, a structural nightmare.
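The horizontal ("conveyor") structure can be sketched with one thread per stage, connected by queues; this Python sketch is illustrative only (the requests, the DATA table, and the stage functions are invented for the example).

```python
import queue, threading

requests, found = queue.Queue(), queue.Queue()
DATA = {"q1": "answer1", "q2": "answer2"}   # toy data store
sent = []

def reader():                     # dedicated thread: read requests
    for req in ["q1", "q2"]:
        requests.put(req)
    requests.put(None)            # sentinel: no more requests

def searcher():                   # dedicated thread: search data
    while (req := requests.get()) is not None:
        found.put(DATA[req])
    found.put(None)

def sender():                     # dedicated thread: send results
    while (item := found.get()) is not None:
        sent.append(item)

threads = [threading.Thread(target=f) for f in (reader, searcher, sender)]
for t in threads: t.start()
for t in threads: t.join()
print(sent)   # ['answer1', 'answer2']
```

Each stage has one clear job, and the queues make the hand-offs explicit; mixing all three loops into one thread would tangle them together.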
- Task and data parallelism
- Coarse and fine grained
- High and low level
- By explicitness (1)
- By explicitness (2)
- Formalizations
- By application areas
- By computation model
Task and data parallelism
- Task parallelism: different operations concurrently
  - calculate g and h in f(g(x), h(y)) concurrently
  - threads in the same program
  - several programs running on the same computer
- Data parallelism: the same operation for different data (SIMD)
  - loop operations: forall i=1..n do a[i]=a[i]+1
  - vectorised operations: MMX, SSE, etc.
- A program may benefit from both!
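The f(g(x), h(y)) example of task parallelism can be sketched directly; in this Python illustration the bodies of f, g, h are invented toy functions, the point being only that g and h are independent and may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def g(x): return x * x    # toy body
def h(y): return y + 1    # toy body
def f(a, b): return a - b # toy body

with ThreadPoolExecutor() as pool:
    fg = pool.submit(g, 3)    # g(x) in one thread
    fh = pool.submit(h, 4)    # h(y) in another, concurrently
    print(f(fg.result(), fh.result()))   # f(9, 5) = 4
```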
Coarse and fine grained
- Ratio of computation to communication.
- Coarse-grained parallel programs compute most of the time
  - e.g. distribute data, calculate, collect results (Google MapReduce)
- Fine-grained parallel programs communicate frequently
  - lots of dependencies between distributed data
- Medium-grained
  - DOUG: lots of computation interchanged with lots of communication
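The coarse-grained "distribute, calculate, collect" pattern can be sketched in miniature (a Python illustration in the MapReduce spirit the slide mentions, not MapReduce itself): each worker does much local computation and sends back a single small result.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))
chunks = [data[i::4] for i in range(4)]          # distribute the data

with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(sum, chunks))        # calculate: one sum per worker

print(sum(partial))   # collect: 4950
```

Communication is one number per worker regardless of chunk size, which is what makes the grain coarse.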
High and low level
- Different granularity (unit of concurrency):
  - instruction level: conveyors and pipelines in the CPU; MMX
  - expression level: run an expression in a separate thread
  - function level
  - process level
- Source of confusion: this is sometimes referred to as fine/coarse grained.
By explicitness (1)
- Models and Languages for Parallel Computation; David B. Skillicorn, Domenico Talia; 1998.
- Parallelism explicit (hints for possible parallelism)
  - loops: forall i in 1..N do a[i]=i
  - Fortran 90 matrix sum: C=A+B
- Decomposition explicit (specify the parallel pieces)
- Mapping explicit (map pieces to processors)
- Communication explicit (specify sends/receives)
- Synchronization explicit (handle the details of message-passing)
By explicitness (2)
Possibilities:
1. nothing explicit (OBJ, P3L)
2. parallelism explicit, decomposition implicit: loops - Fortran variants, Id, APL, NESL
3. decomposition explicit, mapping implicit (BSP, LogP)
4. mapping explicit, communication implicit (Linda)
5. communication explicit, synchronization implicit: Actors, Smalltalk
6. everything explicit: PVM, MPI, fork
Formalizations
How to describe (concurrent) computations?
- Operational semantics: describe operations in a virtual machine (VM)
  - the Oz way
  - reasoning for a programmer
- Denotational semantics: describe algebraic rules
  - concurrent lambda calculus, pi-calculus, CSP, Petri nets, DDA (Data Dependency Algebra)
  - reasoning for a mathematician
- Axiomatic semantics: describe logical rules
  - TLA (Temporal Logic of Actions)
  - reasoning for a machine (a prover)
By application areas
- Scientific computing
  - High-Performance Computing (HPC)
  - High-Throughput Computing (HTC)
- Distributed applications
  - clients, servers
  - P2P
  - telephone stations (the Erlang programming language)
- Desktop applications
  - responsive user interfaces
  - utilizing multiple cores
By computation model
What style of concurrency is supported?
- Declarative concurrent model
  - (pure) functional
  - logical
- Message-passing model
  - synchronous, asynchronous, RPC
  - active objects, passive objects
- Shared-state (shared-memory) model
  - locks
  - transactions
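As a small illustration of the message-passing model (a Python sketch in the actor spirit; the mailbox protocol and message names are invented for the example): each process owns a mailbox, sends are asynchronous, and the receiver loops over incoming messages.

```python
import queue, threading

mailbox = queue.Queue()   # the server's mailbox
replies = queue.Queue()   # where the server sends answers

def server():
    while True:
        msg = mailbox.get()          # receive
        if msg == "stop":
            break
        replies.put(msg.upper())     # reply asynchronously

t = threading.Thread(target=server)
t.start()
mailbox.put("ping")                  # asynchronous send: does not block
mailbox.put("stop")
t.join()
reply = replies.get()
print(reply)   # PING
```

No memory is shared except the queues themselves, so no explicit locks appear in user code; contrast this with the shared-state model, where the threads would guard common data with locks or transactions.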
- Why language?
- Oz
- Erlang
- Scala
- Clojure
- High-Performance Fortran
- NESL
- Concurrent and Parallel Haskell
- Intel TBB
- Ct
Why language?
- Why not just a library?
  - cleaner syntax
  - forces usage patterns
  - control over the compilation process
- In the 1980s there were hundreds of programming languages for concurrent programming; now there are thousands.
- The following slides describe some of these languages.
Oz
- roots in logic programming
- dataflow variables (logical variables with suspension)
- multiparadigm (advertises different styles of concurrency)
  - functional
  - object-oriented
  - constraint (logic)
- explicit task creation (the thread statement)
- explicit synchronization and communication (through dataflow variables)
- for distributed and desktop applications
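A dataflow variable is a single-assignment cell: readers suspend until it is bound. The following is a rough Python analogy, not Oz itself (the DataflowVar class and its API are invented for this sketch).

```python
import threading

class DataflowVar:
    """Single-assignment cell: read() suspends until bind() happens."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None
    def bind(self, value):           # at most one bind, as in Oz
        assert not self._bound.is_set(), "already bound"
        self._value = value
        self._bound.set()
    def read(self):                  # suspends the caller until bound
        self._bound.wait()
        return self._value

x = DataflowVar()
out = []
t = threading.Thread(target=lambda: out.append(x.read() + 1))
t.start()                            # the thread suspends on x
x.bind(41)                           # binding x wakes it up
t.join()
print(out[0])   # 42
```

Binding doubles as both communication (the value) and synchronization (the wake-up), which is why Oz needs no separate lock here.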
Erlang
- Ericsson project from ~1990 for telecom applications
  - handle thousands of phone calls
  - robustness, distribution
- Concurrency: processes with message-passing (actors)
- focus on fault tolerance

      loop(State) ->
          receive
              {circle, R} ->
                  io:format("area is ~p~n", [3.14*R*R]),
                  loop(State+1);
              {rectangle, Width, Ht} ->
                  ...
          end.
Scala
- hot topic of 2008
- interoperable with Java (runs on the JVM)
- syntax similar to Java
- object-oriented, functional, etc.
- static typing
- Concurrency: task parallelism, processes with message-passing (actors)
Clojure
- hot topic of 2008
- targets the Java Virtual Machine
- Lisp syntax
- functional, macros
- Concurrency:
  - task parallelism
  - reactive Agent system
  - software transactional memory
High-Performance Fortran
- since 1993, an extension of Fortran 90
- Concurrency: data parallelism

      REAL A(16,16), B(14,14)
      !HPF$ ALIGN B(I,J) WITH A(I+1,J+1)
      !HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS()/3, 3)
      !HPF$ DISTRIBUTE A(CYCLIC,BLOCK) ONTO P
NESL
- since 1995
- available only on rare platforms
- a way to handle nested data parallelism
  - e.g. sparse matrix storage, the quicksort algorithm
- Concurrency: nested data parallelism
Concurrent and Parallel Haskell
- Parallel Haskell: with par and pseq
  - deterministic speculative execution
- Concurrent Haskell: with forkIO
  - locks, monitors, etc.
  - synchronization variables: MVars
  - STM (software transactional memory) with atomically
  - and more: mHaskell
- Data Parallel Haskell: with parallel arrays
  - NDP (nested data parallelism)
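An MVar is a one-slot synchronization variable: putting into a full MVar blocks, taking from an empty one blocks. The following Python sketch approximates that behavior with a one-slot blocking queue (an analogy for illustration, not Haskell's actual implementation).

```python
import queue, threading

mvar = queue.Queue(maxsize=1)   # one slot: put blocks if full, get if empty
received = []

def consumer():
    received.append(mvar.get())  # like takeMVar: blocks until the slot fills

t = threading.Thread(target=consumer)
t.start()
mvar.put("hello")                # like putMVar: fills the slot, wakes consumer
t.join()
print(received[0])   # hello
```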
Intel TBB
- Intel Threading Building Blocks
- a recent C++ library
- Concurrency: task parallelism
Ct
- Intel "C for Throughput" computing
- compiler not yet publicly available
- Concurrency:
  - immutable data (declarative model)
  - (nested) data parallelism