Concurrency: what, why, how

1 Concurrency: what, why, how
May 28

2 Lecture about everything and nothing
- Explain the basic idea: concurrency (pseudoparallelism) vs. parallelism
- Give reasons for using concurrency
- Present briefly different classifications, approaches, models and languages

3 Outline
- Basic idea
- Dependency idea
- Some terms (1)
- Some terms (2)

4 Basic idea
Intuitively: simultaneous execution of
- instructions (within CPU pipelines)
- actions (functions within a program)
- programs (a distributed application)
What is "simultaneous"?
- physically at the same time?
- nearly at the same time?
- 2 threads on a single-core CPU?

5 Dependency idea
If 2 actions
- do not need each other's results (are independent), and
- do not interfere otherwise (e.g. do not write to the same file),
then the order of their execution does not matter:
    a = c+d
    b = c+e
The idea of dependency between actions is important.
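A minimal Python sketch of the dependency idea above. The function names (compute_a, compute_b, run_in_order, run_reversed, run_concurrently) are hypothetical and chosen only for illustration; the two actions correspond to a = c+d and b = c+e. Because neither action reads the other's result, all three schedules should produce the same pair.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_a(c, d):
    # Action 1: reads c and d only.
    return c + d


def compute_b(c, e):
    # Action 2: reads c and e only; it never needs the result of action 1.
    return c + e


def run_in_order(c, d, e):
    a = compute_a(c, d)
    b = compute_b(c, e)
    return a, b


def run_reversed(c, d, e):
    b = compute_b(c, e)
    a = compute_a(c, d)
    return a, b


def run_concurrently(c, d, e):
    # Both actions are submitted at once; the scheduler picks the order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(compute_a, c, d)
        future_b = pool.submit(compute_b, c, e)
        return future_a.result(), future_b.result()
```

If compute_b instead needed the value of a, only run_in_order would remain valid, which is exactly what a dependency means here.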

6 Some terms (1)
Not a rule, just to extend the understanding of concurrency.
- Parallel: execute simultaneously
- Concurrent: the order of execution does not matter
From Wikipedia:
- "Parallel computing is a form of computation in which many calculations are carried out simultaneously."
- "Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel."
- Concurrent execution is sometimes referred to as pseudoparallel.

7 Some terms (2)
Haskell community variants.
- Parallel: deterministic data crunching; simultaneous execution of tasks of the same type
- Concurrent: non-deterministic execution of unrelated communicating processes
From Chapter 24, "Concurrent and multicore programming" (Real World Haskell):
"A concurrent program needs to perform several possibly unrelated tasks at the same time. In contrast, a parallel program solves a single problem."

8 Outline
- List of reasons
- Some analogies
- Faster programs
- Hiding latency
- Better structure

9 List of reasons
- Faster programs
  - running on several cores/CPUs/computers
- More responsive programs
  - GUI
  - hiding disk/network latency
- Programs with natural concurrency
  - distributed programs (client-server, etc.)
- Fault-tolerant programs
  - using redundancy
- Better structured programs

10 Some analogies
Speed of a process
- with 1 axe, one friend can chop wood and the other collect it
- with 2 axes, both friends can chop wood in parallel
Hiding latency
- when we turn on a kettle we do not wait until it boils
- e.g. we go and take cups out of the cupboard, then return to the kettle
Better structure
- doing ironing and cooking concurrently is messy
- assign them to different people

11 Faster programs
- Calculate elements of an array in parallel
- Perform calculations on several processors/nodes
- Serve YouTube videos from multiple servers
End of Moore's law
- "The number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years."
- Every new laptop comes with (at least) dual-core technology
- a single-threaded program is usually stuck at 50% CPU usage

12 Hiding latency
Disk and network operations take time. Either
- work asynchronously, or
- use a dedicated thread
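A small Python sketch of the dedicated-thread option. Everything here is an assumption made for illustration: slow_read stands in for a disk or network call and simulates the delay with time.sleep, and fetch_while_working is a hypothetical name. The point is only that the main thread keeps computing while the worker waits.

```python
import threading
import time


def slow_read(result):
    # Stand-in for a disk/network operation; the sleep simulates latency.
    time.sleep(0.2)
    result.append("data")


def fetch_while_working():
    result = []
    worker = threading.Thread(target=slow_read, args=(result,))
    worker.start()                              # start the slow operation
    squares = sum(i * i for i in range(1000))   # useful work in the meantime
    worker.join()                               # wait only when the data is needed
    return result[0], squares
```

This mirrors the kettle analogy: switch the kettle on, fetch the cups, then come back to the kettle.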

13 Better structure
Assign different threads to unrelated tasks (if reasonable).
Data sharing server
- vertically: one thread per request
- horizontally (conveyor):
  - dedicated thread(s) for reading requests
  - dedicated thread(s) for searching data
  - new thread for sending data
Mixing the tasks of all threads in one thread
- asynchronous behavior
- structural nightmare
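The horizontal (conveyor) layout can be sketched in Python with one thread per stage and queues between them. All names (read_requests, search_data, send_replies, serve, STOP) are hypothetical, the "data" is a plain dictionary, and "sending" just appends to a list. As a simplification, the sketch uses a single dedicated sender thread rather than a new thread per reply.

```python
import queue
import threading

STOP = object()  # sentinel passed down the conveyor to shut each stage down


def read_requests(requests, to_search):
    # Stage 1: read incoming requests and hand them to the search stage.
    for key in requests:
        to_search.put(key)
    to_search.put(STOP)


def search_data(store, to_search, to_send):
    # Stage 2: look each key up and pass the result to the send stage.
    while (key := to_search.get()) is not STOP:
        to_send.put((key, store.get(key)))
    to_send.put(STOP)


def send_replies(to_send, sent):
    # Stage 3: "send" replies; here they are only collected in a list.
    while (reply := to_send.get()) is not STOP:
        sent.append(reply)


def serve(requests, store):
    to_search, to_send, sent = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=read_requests, args=(requests, to_search)),
        threading.Thread(target=search_data, args=(store, to_search, to_send)),
        threading.Thread(target=send_replies, args=(to_send, sent)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return sent
```

Each stage has one job and one input queue, which is the structural benefit the slide argues for; with a single thread per stage and FIFO queues, replies come out in request order.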

14 Outline
- Task and data parallelism
- Coarse and fine grained
- High and low level
- (1)
- (2)
- Formalizations
- By application areas
- By computation model

15 Task and data parallelism
Task parallelism: different operations run concurrently
- calculate g and h in f(g(x), h(y)) concurrently
- threads in the same program
- several programs running on the same computer
Data parallelism: the same operation applied to different data (SIMD)
- loop operations: forall i=1..n do a[i]=a[i]+1
- vectorised operations: MMX, SSE, etc.
A program may benefit from both!
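Both kinds can be sketched in Python with a thread pool. The functions f, g, h and increment are placeholders invented for this example; task_parallel evaluates g(x) and h(y) concurrently before calling f, and data_parallel applies the same operation to every array element, like the forall loop above. In standard CPython, threads show the structure but do not speed up CPU-bound work; a process pool would be the usual substitute for that.

```python
from concurrent.futures import ThreadPoolExecutor


def g(x):
    return x * 2


def h(y):
    return y + 10


def f(u, v):
    return u + v


def task_parallel(x, y):
    # Task parallelism: two different operations run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_g = pool.submit(g, x)
        future_h = pool.submit(h, y)
        return f(future_g.result(), future_h.result())


def increment(value):
    return value + 1


def data_parallel(a):
    # Data parallelism: the same operation applied to every element.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(increment, a))
```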

16 Coarse and fine grained
Ratio of computation to communication
- coarse-grained: parallel programs compute most of the time
  - e.g. distribute data, calculate, collect result (Google MapReduce)
- fine-grained: parallel programs communicate frequently
  - lots of dependencies between distributed data
- medium-grained
  - DOUG: lots of computation interleaved with lots of communication

17 High and low level
Different granularity (unit of parallelism)
- instruction level: conveyors and pipelines in the CPU; MMX
- expression level: run an expression in a separate thread
- function level
- process level
Source of confusion: this is sometimes referred to as fine/coarse grained.

18 (1)
Models and Languages for Parallel Computation; David B. Skillicorn, Domenico Talia; 1998
- Parallelism explicit (hints for possible parallelism)
  - loops: forall i in 1..N do a[i]=i
  - Fortran 90 matrix sum: C=A+B
- Decomposition explicit (specify parallel pieces)
- Mapping explicit (map pieces to processors)
- Communication explicit (specify sends/recvs)
- Synchronization explicit (handle details of message-passing)

19 (2)
Possibilities
1. nothing explicit, parallelism implicit (OBJ, P3L)
2. parallelism explicit, decomposition implicit (loops: Fortran variants, Id, APL, NESL)
3. decomposition explicit, mapping implicit (BSP, LogP)
4. mapping explicit, communication implicit (Linda)
5. communication explicit, synchronization implicit (Actors, Smalltalk)
6. everything explicit (PVM, MPI, fork)

20 Formalizations
How to describe (concurrent) computations?
- operational semantics: describe operations in a Virtual Machine (VM)
  - the Oz way
  - reasoning for a programmer
- denotational semantics: describe algebraic rules
  - concurrent lambda calculus, Pi calculus, CSP, Petri nets, DDA (Data Dependency Algebra)
  - reasoning for a mathematician
- axiomatic semantics: describe logical rules
  - TLA (Temporal Logic of Actions)
  - reasoning for a machine (a prover)

21 By application areas
- Scientific computing
  - High-Performance Computing (HPC)
  - High-Throughput Computing (HTC)
- Distributed applications
  - clients, servers
  - P2P
  - telephone stations (Erlang PL)
- Desktop applications
  - responsive user interfaces
  - utilizing multiple cores

22 By computation model
What style of concurrency is supported?
- Declarative concurrent model
  - (pure) functional
  - logical
- Message-passing model
  - synchronous, asynchronous, RPC
  - active objects, passive objects
- Shared-state (shared memory) model
  - locks
  - transactions
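To make two of these models concrete, here is a hedged Python sketch that computes the same sum in a shared-state style (a lock guarding one variable) and in a message-passing style (workers post partial sums to a queue). The function names and the choice of four workers are assumptions made for the example, not part of the lecture.

```python
import queue
import threading


def shared_state_sum(values, workers=4):
    # Shared-state model: all threads update one variable, guarded by a lock.
    total = 0
    lock = threading.Lock()

    def add(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # without the lock, concurrent updates could be lost
                total += value

    threads = [threading.Thread(target=add, args=(values[i::workers],))
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


def message_passing_sum(values, workers=4):
    # Message-passing model: workers share nothing and send results as messages.
    mailbox = queue.Queue()

    def add(chunk):
        mailbox.put(sum(chunk))

    threads = [threading.Thread(target=add, args=(values[i::workers],))
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(mailbox.get() for _ in range(workers))
```

The results are identical; what differs is where the coordination lives: in the lock around shared data, or in the messages themselves.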

23 Outline
- Why language?
- Oz
- Erlang
- Scala
- Clojure
- High-Performance Fortran
- NESL
- Concurrent and Parallel Haskell
- Intel TBB
- Ct

24 Why language?
Why not just a library?
- cleaner syntax
- forces usage patterns
- control over the compilation process
In 198x there were hundreds of PLs for concurrent programming; now there are thousands.
The following slides describe some languages.

25 Oz
- roots in logic
- dataflow variables (logical variables with suspension)
- multiparadigm (advertises different styles of )
  - functional
  - object oriented
  - constraint (logic)
- explicit task (thread statement)
- explicit and communication (through dataflow variables)
- for distributed and desktop

26 Erlang
- Ericsson project from ~1990 for telecom applications
  - handle thousands of phone calls
  - robustness, distribution
- Concurrency
  - processes with message-passing (actors)
  - focus on fault tolerance

    loop(State) ->
        receive
            {circle, R} ->
                io:format("area is ~p~n", [3.14*R*R]),
                loop(State+1);
            {rectangle, Width, Ht} ->
                ...
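For readers who do not know Erlang, the receive loop above can be approximated in Python with a thread and a mailbox queue. This is only an analogy under stated assumptions: area_server, the "stop" message and the replies queue are hypothetical, and the rectangle branch (width times height) is a guess at the clause whose body is missing from the transcript.

```python
import math
import queue
import threading


def area_server(mailbox, replies):
    # The actor's state is the number of messages handled so far,
    # playing the role of State in the Erlang loop.
    handled = 0
    while True:
        message = mailbox.get()          # blocks like Erlang's receive
        if message[0] == "stop":         # hypothetical shutdown message
            replies.put(("handled", handled))
            return
        if message[0] == "circle":
            _, radius = message
            replies.put(("area", math.pi * radius * radius))
        elif message[0] == "rectangle":  # assumed behavior: width * height
            _, width, height = message
            replies.put(("area", width * height))
        handled += 1


def start_area_server():
    mailbox, replies = queue.Queue(), queue.Queue()
    server = threading.Thread(target=area_server, args=(mailbox, replies))
    server.start()
    return server, mailbox, replies
```

A caller would use start_area_server(), put tuples such as ("circle", 1.0) on the mailbox, and read answers from replies. The server owns its state and is reached only through messages, which is the actor idea the slide refers to.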

27 Scala
- a hot topic in 2008
- interoperable with Java (runs on the JVM)
- syntax similar to Java
- object oriented, functional, etc.
- static typing
- Concurrency
  - task parallelism
  - processes with message-passing (actors)

28 Clojure
- a hot topic in 2008
- targets the Java Virtual Machine
- Lisp syntax
- functional, macros
- Concurrency
  - task parallelism
  - reactive Agent system
  - software transactional memory

29 High-Performance Fortran
- since 1993, extension of Fortran 90
- Concurrency
  - data parallelism

          REAL A(16,16), B(14,14)
    !HPF$ ALIGN B(I,J) WITH A(I+1,J+1)
    !HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS()/3,3)
    !HPF$ DISTRIBUTE A(CYCLIC,BLOCK) ONTO P

30 NESL
- since 1995
- available only on rare platforms
- a way to handle nested data parallelism
  - sparse matrix storage
  - in the quicksort algorithm
- Concurrency
  - nested data parallelism

31 Concurrent and Parallel Haskell
- Parallel Haskell with par and pseq
  - deterministic
  - speculative execution
- Concurrent Haskell with forkIO
  - locks, monitors, etc.
  - synchronization variables: MVars
  - STM (software transactional memory) with atomically
- and more:
  - mHaskell
  - Data Parallel Haskell with parallel arrays
  - NDP (nested data parallelism)

32 Intel TBB
- Intel Threading Building Blocks
- recent C++ library
- Concurrency
  - task parallelism

33 Ct
- Intel C for Throughput Computing
- compiler not yet publicly available
- Concurrency
  - immutable data (declarative model)
  - (nested) data parallelism


More information

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 Introduction to Parallel Computing CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 1 Definition of Parallel Computing Simultaneous use of multiple compute resources to solve a computational

More information

Abstraction: Distributed Ledger

Abstraction: Distributed Ledger Bitcoin 2 Abstraction: Distributed Ledger 3 Implementation: Blockchain this happened this happened this happen hashes & signatures hashes & signatures hashes signatu 4 Implementation: Blockchain this happened

More information

Stop coding Pascal. Saturday, April 6, 13

Stop coding Pascal. Saturday, April 6, 13 Stop coding Pascal...emotional sketch about past, present and future of programming languages, Python, compilers, developers, Life, Universe and Everything Alexey Kachayev CTO at KitApps Inc. Open source

More information

Message Passing. Frédéric Haziza Summer Department of Computer Systems Uppsala University

Message Passing. Frédéric Haziza Summer Department of Computer Systems Uppsala University Message Passing Frédéric Haziza Department of Computer Systems Uppsala University Summer 2009 MultiProcessor world - Taxonomy SIMD MIMD Message Passing Shared Memory Fine-grained Coarse-grained

More information

Computer Architecture Crash course

Computer Architecture Crash course Computer Architecture Crash course Frédéric Haziza Department of Computer Systems Uppsala University Summer 2008 Conclusions The multicore era is already here cost of parallelism is dropping

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples

More information

7. System Design: Addressing Design Goals

7. System Design: Addressing Design Goals 7. System Design: Addressing Design Goals Outline! Overview! UML Component Diagram and Deployment Diagram! Hardware Software Mapping! Data Management! Global Resource Handling and Access Control! Software

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing

More information

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms Complexity and Advanced Algorithms Introduction to Parallel Algorithms Why Parallel Computing? Save time, resources, memory,... Who is using it? Academia Industry Government Individuals? Two practical

More information

Compiler Design Spring 2018

Compiler Design Spring 2018 Compiler Design Spring 2018 Thomas R. Gross Computer Science Department ETH Zurich, Switzerland 1 Logistics Lecture Tuesdays: 10:15 11:55 Thursdays: 10:15 -- 11:55 In ETF E1 Recitation Announced later

More information

Lecture 3: Intro to parallel machines and models

Lecture 3: Intro to parallel machines and models Lecture 3: Intro to parallel machines and models David Bindel 1 Sep 2011 Logistics Remember: http://www.cs.cornell.edu/~bindel/class/cs5220-f11/ http://www.piazza.com/cornell/cs5220 Note: the entire class

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing Designing Parallel Programs This review was developed from Introduction to Parallel Computing Author: Blaise Barney, Lawrence Livermore National Laboratory references: https://computing.llnl.gov/tutorials/parallel_comp/#whatis

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

Shared-Memory Programming Models

Shared-Memory Programming Models Shared-Memory Programming Models Parallel Programming Concepts Winter Term 2013 / 2014 Dr. Peter Tröger, M.Sc. Frank Feinbube Cilk C language combined with several new keywords Different approach to OpenMP

More information

IBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave

IBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave 6.827 Multithreaded Parallelism: Languages and Compilers Fall 2006 Lecturer: TA: Assistant: Arvind Nirav Dave Sally Lee L01-1 IBM Power 5 130nm SOI CMOS with Cu 389mm 2 2GHz 276 million transistors Dual

More information

Parallella: A $99 Open Hardware Parallel Computing Platform

Parallella: A $99 Open Hardware Parallel Computing Platform Inventing the Future of Computing Parallella: A $99 Open Hardware Parallel Computing Platform Andreas Olofsson andreas@adapteva.com IPDPS May 22th, Cambridge, MA Adapteva Achieves 3 World Firsts 1. First

More information

What are compilers? A Compiler

What are compilers? A Compiler What are compilers? Dr. Barbara G. Ryder Dept of Computer Science ryder@cs.rutgers.edu http://www.cs.rutgers.edu/~ryder 1 A Compiler A program that translates computer programs that people write, into

More information

Parallel Programming Environments. Presented By: Anand Saoji Yogesh Patel

Parallel Programming Environments. Presented By: Anand Saoji Yogesh Patel Parallel Programming Environments Presented By: Anand Saoji Yogesh Patel Outline Introduction How? Parallel Architectures Parallel Programming Models Conclusion References Introduction Recent advancements

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline

More information

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co-

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Shaun Lindsay CS425 A Comparison of Unified Parallel C, Titanium and Co-Array Fortran The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Array Fortran s methods of parallelism

More information

TensorFlow: A System for Learning-Scale Machine Learning. Google Brain

TensorFlow: A System for Learning-Scale Machine Learning. Google Brain TensorFlow: A System for Learning-Scale Machine Learning Google Brain The Problem Machine learning is everywhere This is in large part due to: 1. Invention of more sophisticated machine learning models

More information

Programming Languages, Summary CSC419; Odelia Schwartz

Programming Languages, Summary CSC419; Odelia Schwartz Programming Languages, Summary CSC419; Odelia Schwartz Chapter 1 Topics Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation Criteria Influences on Language Design

More information

Kernel Synchronization I. Changwoo Min

Kernel Synchronization I. Changwoo Min 1 Kernel Synchronization I Changwoo Min 2 Summary of last lectures Tools: building, exploring, and debugging Linux kernel Core kernel infrastructure syscall, module, kernel data structures Process management

More information

DATA PARALLEL PROGRAMMING IN HASKELL

DATA PARALLEL PROGRAMMING IN HASKELL DATA PARALLEL PROGRAMMING IN HASKELL An Overview Manuel M T Chakravarty University of New South Wales INCLUDES JOINT WORK WITH Gabriele Keller Sean Lee Roman Leshchinskiy Ben Lippmeier Trevor McDonell

More information

Functional Programming Principles in Scala. Martin Odersky

Functional Programming Principles in Scala. Martin Odersky Functional Programming Principles in Scala Martin Odersky Programming Paradigms Paradigm: In science, a paradigm describes distinct concepts or thought patterns in some scientific discipline. Main programming

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

Transactifying Apache s Cache Module

Transactifying Apache s Cache Module H. Eran O. Lutzky Z. Guz I. Keidar Department of Electrical Engineering Technion Israel Institute of Technology SYSTOR 2009 The Israeli Experimental Systems Conference Outline 1 Why legacy applications

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

All you need is fun. Cons T Åhs Keeper of The Code

All you need is fun. Cons T Åhs Keeper of The Code All you need is fun Cons T Åhs Keeper of The Code cons@klarna.com Cons T Åhs Keeper of The Code at klarna Architecture - The Big Picture Development - getting ideas to work Code Quality - care about the

More information

Functional Programming Patterns And Their Role Instructions

Functional Programming Patterns And Their Role Instructions Functional Programming Patterns And Their Role Instructions In fact, the relabelling function is precisely the same as before! Phil Wadler's Chapter 7 of The Implementation of Functional Programming Languages.

More information

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Architectural Styles Software Architecture Lecture 5 Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Object-Oriented Style Components are objects Data and associated

More information