Programming in Chapel. Kenjiro Taura. University of Tokyo

1 Programming in Chapel
  Kenjiro Taura
  University of Tokyo

2 Contents
  1 Chapel
    Chapel overview
    Minimum introduction to syntax
    Task parallelism
    Locales
    Data parallel constructs
    Ranges, domains, and arrays
    Other nice things about Chapel

3 Chapel: brief history
  2003: started as a DARPA-funded project under the HPCS program
    HPCS: High Productivity Computing Systems
    DARPA: the Defense Advanced Research Projects Agency
  2008: first public release

4 References
  section numbers (§) below refer to those in the Chapel Specification (version 0.92)
  tutorials:
    concise, cut-and-pastable:
    extensive: SC11-Chapel.tar.gz
  cheat sheet:

5 the implementation from Cray
  this tutorial uses Chapel 1.6.0 (the newest as of Nov 2012)

6 Compiling and running Chapel programs
  compile with the chpl command, with the following environment variables:
    CHPL_COMM, CHPL_TASKS, CHPL_COMM_SUBSTRATE
  e.g. with bash,
    $ CHPL_COMM=gasnet CHPL_COMM_SUBSTRATE=udp CHPL_TASKS=fifo chpl program.chpl
  run the executable giving the number of nodes (locales) with -nl; the command line
  depends on the choice of CHPL_COMM_SUBSTRATE. e.g. with udp,
    $ ./a.out -nl 1
    $ SSH_SERVERS="oooka000 oooka001" ./a.out -nl 2
  see Appendix 2 for more details

7 Hello world in Chapel
    proc main() {
      writeln("hello");
    }
  proc introduces a function
  a function called main is the entry point
  writeln is a versatile print-with-newline function

8 Chapel programming model basics
  there is only one main thread (or "task") at the start, as opposed to the SPMD model of MPI/UPC
  a task can create another task at an arbitrary point (task parallelism) (§24)
    begin, sync variables, sync, cobegin, coforall
  a node is represented as a locale object and can be used to specify task/data distribution
    Locales, on (§26)
  objects and arrays can be remotely referenced and shared (global address space)
  higher-level data parallel constructs on top of them (§25)
    forall

9 Chapel and other languages
               distributed memory   global address   arbitrary nested
               support              space            parallelism
    OpenMP     n                    N/A              n
    TBB        n                    N/A              y
    MPI        y                    n                n
    UPC        y                    y                n
    Chapel     y                    y                y

10 Primitive types
  int(8), int(16), int(32), int(64); similar for uint; int = int(64), uint = uint(64)
  real(32), real(64); real = real(64)
  complex(64), complex(128); complex = complex(128)
  bool (true/false)
  string
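
  for illustration, a minimal sketch of declarations with these types (the names are just
  examples; default widths follow from int = int(64), real = real(64), complex = complex(128)):
    var i8 : int(8) = 100;          // explicitly 8-bit
    var n  = 100;                   // inferred as int, i.e. int(64)
    var x  : real(32) = 1.5;
    var z  : complex = 1.0 + 2.0i;  // imaginary literal; complex = complex(128)
    var ok : bool = true;
    var s  = "chapel";              // inferred as string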

11 Variable declaration
  three kinds of variables: param, const, var
    param : compile-time constant
    const : run-time constant (initialized and never assigned again)
    var   : general variable (assigned an arbitrary number of times)
  you are advised to make constant-ness explicit
  types are automatically inferred from the initializing expression
    param x = 2;
    const r = rs.getNext();
    var s = 0.0;
  types can/should be given explicitly when necessary
    param x : real = 2;
    var s : string;

12 Procedure definition
  begins with the proc keyword; the return type is automatically inferred from returned expressions
    proc f(x : int) { return x + 1; }
  it can/should be specified explicitly when necessary
    proc g(x : int) : real { return x + 1; }
  in particular, it is mandatory for recursive procedures
    proc fib(n : int) : int {
      if (n < 2) then return 1;
      else return fib(n-1) + fib(n-2);
    }

13 For loop
  simplest examples of for loops:
    for i in 1..n { ... }

    var A : [1..n] real;
    for i in A.domain { A[i] = i; }
  a similar syntax is used for parallel loops (coforall task-parallel loops and forall
  data-parallel loops)
  more about loops later

14 Overview of task parallelism in Chapel
  begin (§24.2) creates a new task
    cf. TBB's task_group.run
  synchronization variables (§24.3) can be used for synchronization
    cf. TBB's task_group.wait
  cobegin (§24.4), coforall (§24.5), and sync (§24.6) are higher-level constructs built on
  top of the two

15 begin statement and synchronization variables
  example:
    proc fib(n : int) : int {
      if (n < 2) then return 1;
      else {
        var a : single int;
        begin { a = fib(n-1); }
        const b = fib(n-2);
        return a + b;
      }
    }
  the begin statement creates a new task executing the statement
  variables are shared between the parent and the new task
  reading a synchronization variable blocks until it is written
  (diagram: the parent task blocks on a + ... until the child task executes a = fib(n-1))

16 Synchronization variables (§24.3)
  there are two kinds of synchronization variables (sync and single)
    single : write once, read many
    sync   : a bounded buffer of capacity one
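
  a minimal sketch of the difference (the values 42 and 7 are just for illustration):
    var buf : sync int;          // capacity-one buffer, starts empty
    begin { buf = 42; }          // a write fills it (and would block if it were already full)
    const x = buf;               // a read blocks until it is full, then empties it
    writeln(x);                  // 42

    var once : single int;       // write once ...
    begin { once = 7; }
    writeln(once + once);        // ... read many times; reads block until it has been written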

17 cobegin, sync
  cobegin { A_1, ..., A_n } will begin each of A_1, ..., A_n and wait for them to finish; e.g.
    var a, b : int;
    cobegin {
      a = fib(n - 1);
      b = fib(n - 2);
    }
  sync { ... } will wait for all tasks begin-ed inside it to finish; e.g.
    var primes : [1..n] bool;
    sync {
      for i in 1..n {
        begin { primes[i] = is_prime(i); }
      }
    }

18 coforall
  coforall var in ... { ... } will begin each iteration of the loop; e.g.
    coforall i in 1..n {
      primes[i] = is_prime(i);
    }

19 Chapel and distributed memory: basics
  Chapel does not automatically distribute tasks
    tasks are created on the local node
    they do not automatically migrate to other nodes
  Chapel does not automatically distribute data either
    variables are allocated on the local node
    objects and arrays are allocated on the local node
  it is (ultimately) the programmer who distributes tasks/data across nodes

20 Locale
  Chapel abstracts a compute node as a locale object
  Locales is an array of all participating nodes
  the on statement explicitly moves a task:
    on (locale) statement
  executes the statement on the given locale
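
  a minimal sketch combining coforall, Locales, and on to run one task per node
  (numLocales is the number of participating locales):
    coforall loc in Locales {
      on loc {
        writeln("hello from locale ", here.id, " of ", numLocales,
                " (", here.name, ")");
      }
    }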

21 Inter-node communication in Chapel
  it happens as a result of:
  the on statement (RPC style)
  accessing remote non-const/param variables
    var a : int;
    on (Locales[1]) { a = a + 1; }
  referencing remote object fields
    class foo { var x : int; }
    var f : foo;
    on (Locales[1]) { f = new foo(); }
    f.x = f.x + 1;
  Note: objects are assigned by reference
  similar for arrays, but things are more complex due to value-assignment semantics (later)

22 Querying locales
  useful to understand what's going on ...
  here is the locale you are currently executing on
  expression.locale gives you the locale hosting that location (variable, array element, etc.)
  locale.id is an integer identifier of the locale
  locale.name is a symbolic name (hostname) of the locale
    var a : int;
    on (Locales[1]) {
      writeln("accessing a at locale ", a.locale.id,
              " from locale ", here.id, " (", here.name, ")");
    }

23 A quick latency/overhead test
    var a = 0;
    on (Locales[1]) {
      for i in 1..n { a = i; }
    }
  network round-trip latency by stack:
    UDP / 10G Ethernet          :      ns
    OpenMPI / Infiniband        :      ns (?)
    OpenFabric / Infiniband     : 3740 ns
    within a node               :    3 ns

24 forall: data-parallel for loop
  forall var in ... { ... } will partition iterations among tasks; e.g.
    forall i in 1..n {
      primes[i] = is_prime(i);
    }
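
  reduce expressions are another data-parallel construct; a minimal sketch that uses forall
  to fill an array and + reduce to sum it:
    var A : [1..10] real;
    forall i in A.domain do A[i] = i;
    const total = + reduce A;    // data-parallel sum: 55.0
    writeln(total);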

25 coforall and forall
  coforall creates a task for each iteration
    not good for fine-grain iterations; may simply not run with some tasking layers (fifo)
    you may synchronize between tasks
  forall partitions iterations into (a small number of) tasks
    OK for fine-grain iterations
    iterations may be serialized in an arbitrary way; synchronization between iterations is
    not allowed (it may lead to deadlocks)
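
  a minimal sketch of why synchronization is fine under coforall but risky under forall:
    var s : sync int;              // starts empty
    coforall i in 1..2 {
      if i == 1 then s = 10;       // iteration 1 fills s
      else writeln("got ", s);     // iteration 2 blocks until s is full
    }
  each coforall iteration is its own task, so the reader can block safely; under forall the
  two iterations may be serialized onto one task, and if the reader runs first it deadlocks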

26 More about loops
    for x in ... { statements }
  three kinds of loops:
    for      : serial
    forall   : data parallel
    coforall : task parallel (one task per iteration)
  ... can take various things, including:
    range  : 1..n
    domain : {1..n, 1..m}
    array  : [ 1, 2, 3, 4 ]
  they are all first-class entities
  most generally, it can take an iterator:
    a function-like object defined with iter instead of proc
    any object implementing the iter these() method (iterator); see the sketch below
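
  a minimal sketch of an object made iterable via a these() method (the Counter class is
  just for illustration):
    class Counter {
      var n : int;
      iter these() {
        for i in 1..n do yield i;
      }
    }
    const c = new Counter(3);
    for x in c do write(x, " ");   // 1 2 3
    writeln();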

27 Iterator
  syntax is similar to a procedure definition (proc)
  it may call yield to generate the next value
  a trivial example:
    iter gen_prime() {
      yield 2; yield 3; yield 5; yield 7;
    }
    for x in gen_prime() { write(x, " "); }
    writeln();   // prints: 2 3 5 7

28 Slightly more useful iterator
    iter gen_fib(n : int) {
      var a : int = 1, b : int = 1, c : int;
      yield a;  /* fib 0 */
      yield b;  /* fib 1 */
      for i in 2..n {
        c = a + b;
        a = b;
        b = c;
        yield c;
      }
    }
    for x in gen_fib(10) { write(x, " "); }
    writeln();

29 Distributed arrays and loops
  we've so far covered:
    task creation within a node via begin etc.
    data parallelism on top of task creation via forall (still within a node)
    explicit migration of tasks via on
    remote references to objects
  not yet covered: parallelism over distributed memory
    distributed arrays (how to distribute arrays over multiple locales?)
    distributed parallel loops (e.g. how to distribute the executions of a loop over
    distributed arrays, without using on clauses every time?)

30 Chapel design goals around distributed arrays/loops
  build and abstract them within Chapel, on top of the low-level machinery of locales and
  parallelism within a node
  1. users are able to write something as simple as:
       for x in A { ... }
     for distributed arrays, with executions automagically distributed
  2. distribution of elements/iterations to locales is implementable within Chapel

31 Ranges, domains, and arrays
  range  : an interval
  domain : a multidimensional rectangular region (a set of multidimensional indexes)
  array  : a mapping from indexes in a domain to values
  both ranges and domains are first-class data
    const r : range = 1..n;
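
  a minimal sketch of how the three relate (n is assumed declared, e.g. config const n = 4):
    const r = 1..n;                  // a range: an interval
    const D = {1..3, r};             // a 2-D domain built from ranges
    var A : [D] real;                // an array: maps each index of D to a real
    for (i, j) in D do A[i, j] = i*10 + j;
    writeln(A.domain == D);          // true: an array always knows its domain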

32 Distributed domains and arrays (§27)
  a domain can be distributed (dmapped)
  an array whose domain is dmapped is a distributed array

33 Distributed domains/arrays example (§33)
    const r = 1..n;                  // range literal (includes n!)
    const d1 = {1..n, 1..n};         // domain literal
    const d2 = {r, r};               // range is first class
    const a : [1..n, 1..n] real;     // an array specifies its domain and the value type
    const b : [d2] real;             // domain is first class too
    const c = [ 1.0, 2.0, 3.0 ];     // array literal
  Note: type declarations were omitted where possible

34 Dmapped domains yield distributed execution
    use BlockDist;
    const blocked_dom = {0..9} dmapped Block({0..9});
    forall x in blocked_dom { write(x, ":", here.id, " "); }
    writeln();
  executed with 2 locales:
    0:0 1:0 2:0 3:0 4:0 5:1 6:1 7:1 8:1 9:1
  it is implemented in an iterator (the these() method) of the Block distribution class
  (presumably using task parallelism and on)
  the Chapel compiler does not have any built-in policy about where to execute particular
  iterations

35 Distributed arrays, too
    use BlockDist;
    const blocked_dom = {0..9} dmapped Block({0..9});
    var A : [blocked_dom] real;
    forall i in A.domain {
      writeln(i, ": executing at ", here.id, ", accessing ", A[i].locale.id);
      A[i] = i;
    }
    forall a in A { a = ...; }

36 Other distributions
  cyclic:
    use CyclicDist;
    const cyclic_dom = {0..9} dmapped Cyclic(startIdx=1);
  block-cyclic:
    const block_cyclic_dom = {0..9} dmapped BlockCyclic(startIdx=1, blocksize=3);
  all are flexible enough to accommodate:
    multidimensional domains
    distributing to a subset of all locales
  you are able to define your own distribution (I haven't mastered it yet)

37 Calling external C functions (§31)
  Chapel is designed to make it easy to call external C functions
  all you need to do to call C's system function getpid():
    extern proc getpid() : int(32);
  it is as straightforward as this to call many system-supplied functions
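
  a minimal sketch of using it (assuming a POSIX system that provides getpid):
    extern proc getpid() : int(32);
    writeln("running as pid ", getpid());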

38 Calling external C functions (§31)
  want to call a C function you wrote?
  1. write a C file containing the function (func.c)
       int foo(int x) { return x + 1; }
  2. write a corresponding header file containing its declaration (func.h)
       int foo(int x);
  3. write this in your Chapel program (you did this for getpid())
       extern proc foo(x : int(32)) : int(32);
  4. include all files on the command line
       $ chpl func.h func.c program.chpl

39 Configuration variables (§8.5)
  writing a program that takes command line options is very straightforward and scalable
  1. declare your global variable as config
       config const n = 10;   // 10 is the default
  2. run your program with -svar=value
       $ ./a.out -nl 2 -sn=...
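
  a minimal sketch with two config constants (the names n and verbose are just examples;
  both the -sname=value and --name=value forms set them on the command line):
    config const n = 10;
    config const verbose = false;
    proc main() {
      if verbose then writeln("n = ", n);
    }
  run e.g. as: ./a.out -nl 1 -sn=1000 -sverbose=true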

40 Appendix
  a detailed note for those who work on Chapel
  now under construction, stay tuned

41 Chapel configurations: summary
  a Chapel installation can choose:
    the tasking layer implementation
    the communication layer implementation underlying GASNet
  you must build Chapel (i.e. run make) for each combination, but you can keep all modules
  in a single build tree
  you choose the configuration to use when you compile your Chapel program into an
  executable, through environment variables

42 Tasking layer choices
  see chapel-1.6.0/doc/README.tasks for the full list
  what we have experience with:
    fifo           : the default
    massivethreads : U-Tokyo's MassiveThreads library
    qthreads       : Sandia Lab's Qthreads library
  you choose one of them through the environment variable CHPL_TASKS, both when you build
  Chapel and when you compile your Chapel program

43 Communication layer choices
  Chapel currently uses the GASNet library, which in turn allows us to choose an underlying
  communication substrate from several
  see chapel-1.6.0/doc/README.multilocale for an overview
  what we have experience with:
    udp : UDP sockets supported by the OS (portable but slow)
    mpi : MPI; see chapel-1.6.0/third-party/gasnet/gasnet/mpi-conduit/ for further details
    ibv : Infiniband; see chapel-1.6.0/third-party/gasnet/gasnet/vapi-conduit/ for further details
  you choose one of them through the environment variable CHPL_COMM_SUBSTRATE, both when you
  build Chapel and when you compile your Chapel program
  you also set CHPL_COMM=gasnet, common to all substrates

44 Building Chapel
  so the basic procedure is to run make with the three environment variables:
    CHPL_COMM=gasnet
    CHPL_COMM_SUBSTRATE={udp,mpi,ibv}
    CHPL_TASKS={fifo,massivethreads,qthreads}
  to use massivethreads or qthreads, you must
    cd third-party; make massivethreads   (or qthreads)
  before compiling Chapel
  when using MPI, you might want to set MPI_CC and MPIRUN_CMD to point to the MPI compiler
  and launcher you want to use
