Chapel: Locality and Affinity


Chapel: Locality and Affinity
Brad Chamberlain
PRACE Winter School
12 February 2009

Outline

  Basics of Multi-Locale Chapel
    the locale type and Locales array
    the on statement, the here locale, and communication
    the local block
  Domain and Array Distributions
  Sample Uses of Distributed Domains/Arrays

                                                          Cray Proprietary

The locale Type

Definition:
  an abstract unit of the target architecture
  supported to permit reasoning about locality
  has capacity for processing and storage

Properties:
  threads within a locale have ~uniform access to local memory
  memory within other locales is accessible, but at a price

Locales are defined for a given architecture by a Chapel compiler
  e.g., a multicore processor or SMP node could be a locale

Locales and Program Startup

Chapel users specify the number of locales on the executable's command line:

  prompt> mychapelprog -nl 8    # run using 8 locales

The Chapel launcher bootstraps program execution:
  obtains the necessary machine resources
    e.g., requests 8 nodes from the job scheduler
  loads a copy of the executable onto those resources
  starts the program running

Conceptually:
  locale #0 starts running the program's entry point (main())
  the other locales wait for work to arrive
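Putting the startup model together with the Locales array described next, a minimal multi-locale sketch (illustrative; the file name is made up, but coforall, on, here, and numLocales are as described in these slides) has every locale introduce itself:

```chapel
// helloLocales.chpl -- illustrative sketch; run with, e.g., ./helloLocales -nl 8
coforall loc in Locales do   // create one task per locale
  on loc do                  // migrate each task to its locale
    writeln("Hello from locale ", here.id, " of ", numLocales);
```

Locale 0 starts main(), the coforall fans one task out per locale, and each task prints from wherever it ends up running.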

Locale Variables

Built-in variables represent a program's set of locales:

  config const numLocales: int;              // number of locales
  const LocaleSpace = [0..numLocales-1],     // locale indices
        Locales: [LocaleSpace] locale;       // locale values

For example, with numLocales = 8, LocaleSpace is [0..7] and Locales holds
the eight locale values L0 through L7.

Locale Views

Using standard array operations, users can create their own locale views:

  var TaskALocs = Locales[..numTaskALocs];
  var TaskBLocs = Locales[numTaskALocs+1..];

  var CompGrid = Locales.reshape([1..gridRows, 1..gridCols]);

Locale Methods

The locale type supports built-in methods:

  def locale.id: int;                 // index in LocaleSpace
  def locale.name: string;            // similar to uname -n
  def locale.numCores: int;           // # of processor cores
  def locale.physicalMemory(...);     // amount of memory

Executing on Remote Locales

Syntax:
  on-stmt:
    on expr { stmt }

Semantics:
  executes stmt on the locale specified by expr
  does not introduce concurrency

Example:
  var A: [LocaleSpace] int;

  coforall loc in Locales do
    on loc {
      A(loc.id) = computation(loc.id);
    }
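As a hedged sketch of how these methods might be used together (the output formatting is invented, but the methods are the ones listed above):

```chapel
// Query each locale's properties using the built-in locale methods
for loc in Locales do
  writeln("locale ", loc.id, " (", loc.name, "): ",
          loc.numCores, " cores");
```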

Querying a Variable's Locale

Syntax:
  locale-query-expr:
    var-expr . locale

Semantics:
  returns the locale on which var-expr is allocated

Example:
  var i: int;
  write(i.locale.id);
  on Locales(1) do
    write(i.locale.id);

Output:
  00

(i is allocated on locale 0, so both queries print 0, even when the
querying task is running on locale 1.)

Serial on-Clause Example

on clauses indicate where code should execute:

  // Chapel programs begin running on locale 0 by default
  var x, y: real;            // allocate x & y on locale 0
  on Locales(1) {            // migrate task to locale 1
    var z: real;             // allocate z on locale 1
    writeln(x.locale.id);    // prints 0
    writeln(z.locale.id);    // prints 1
    z = x + y;               // requires a get for x and y
    on Locales(0) do         // migrate back to locale 0
      z = x + y;             // requires a put for z
                             // return to locale 1
  }                          // return to locale 0

Serial on-Clause Example (Data-Driven)

Optionally, an on clause can name a variable instead, in which case the
task migrates to whatever locale stores that variable:

  var x, y: real;            // allocate x & y on locale 0
  on Locales(1) {            // migrate task to locale 1
    var z: real;             // allocate z on locale 1
    z = x + y;               // requires a get for x and y
    on x do                  // data-driven: migrate to x's locale (locale 0)
      z = x + y;             // requires a put for z
                             // return to locale 1
  }                          // return to locale 0

Parallel on-Clause Examples

By naming locales explicitly:

  cobegin {
    on TaskALocs do computeTaskA(...);
    on TaskBLocs do computeTaskB(...);
    on Locales[0] do computeTaskC(...);
  }

(computeTaskA() runs on L0-L1, computeTaskB() on L2-L7, and computeTaskC()
on L0.)

Or in a data-driven manner:

  computePivot(data, lo, hi);
  cobegin {
    on data[lo]    do Quicksort(data, lo, pivot);
    on data[pivot] do Quicksort(data, pivot, hi);
  }

here

Built-in locale function:

  def here: locale;

Semantics:
  returns the locale on which the current task is executing

Example:
  writeln(here.id);
  on Locales(1) do
    writeln(here.id);

Output:
  0
  1

Remote Reads and Writes

Example:
  var i = 0;
  on Locales(1) {
    writeln((here.id, i.locale.id, i));
    i = 1;
    writeln((here.id, i.locale.id, i));
  }
  writeln((here.id, i.locale.id, i));

Output:
  (1, 0, 0)
  (1, 0, 1)
  (0, 0, 1)

(i stays allocated on locale 0; the task on locale 1 reads and writes it
remotely.)

Remote Classes

Example:
  class C { var x: int; }

  var c: C;
  on Locales(1) do
    c = new C();
  writeln((here.id, c.locale.id, c));

Output:
  (0, 1, {x = 0})

(The variable c lives on locale 0, but the object it refers to is
allocated on locale 1.)

Local Blocks

Syntax:
  local-stmt:
    local stmt

Semantics:
  asserts that there are no remote references in stmt
  checked at runtime by default; the check can be disabled for performance

Examples:
  c = Root.child(1);
  on c do
    local {
      traverseTree(c);
    }

  local {
    A[D] = B[D];
  }

Outline

  Basics of Multi-Locale Chapel
  Domain and Array Distributions
    overview
    a case study: Block1D
  Sample Uses of Distributed Domains/Arrays

Chapel Distributions

Distributions are recipes for parallel, distributed arrays. They help the
compiler map from the computation's global view, e.g.

  A = B + alpha * C

down to the fragmented, per-processor implementation, in which each locale
stores and computes on only its own chunk of the arrays in its local memory.

Domain Distribution

Domains may be distributed across locales:

  var D: domain(2) distributed Block on CompGrid;

A distribution implies:
  ownership of the domain's indices (and of its arrays' elements)
  the default work ownership for operations on the domains/arrays
    e.g., forall loops or promoted operations over domains/arrays

Authoring Distributions (Advanced)

Programmers can write distributions in Chapel:
  Chapel will support a standard library of distributions
  research goal: build them using the same mechanism that users would
    the compiler should have no knowledge of specific distributions, only
    of their structural interface:
      how to create domains and arrays using that distribution
      how to map indices to locales
      how to access array elements
      how to iterate over indices/array elements
        sequentially
        in parallel
        in parallel and zippered with other parallel iterable types
      and so forth

Distributions are built using the concepts we've already seen:
  on clauses to express what data & tasks each locale owns
  begins, cobegins, and coforalls to express inter- and intra-locale
  parallelism

Distributions

All the domain types we've seen will support distributions. Domain/array
semantics are independent of distribution, though performance and
parallelism may vary greatly as distributions change. (The slides
illustrate this with the same array of names, George through William,
distributed across the locales in two different ways.)

A Simple Distribution: Block1D

Goal: block a 1D index space across a set of locales (L0, L1, L2), using a
bounding box to compute the blocking.

Distributions vs. Domains

Q1: Why distinguish between distributions and domains?
Q2: Why do distributions map an index space rather than a fixed index set?

A: To permit several domains to share a single distribution:
  amortizes the overheads of storing a distribution
  supports trivial domain/array alignment and compiler optimizations

  const D:      distributed B1 = [1..8],
        outerD: distributed B1 = [0..9],
        innerD: subdomain(D) = [2..7],
        slideD: subdomain(D) = [4..6];

Sharing a distribution supports trivial alignment of these domains.

Distributions vs. Domains (continued)

By contrast, when each domain is given its own distribution:

  const D:      distributed B1 = [1..8],
        outerD: distributed B2 = [0..9],
        innerD: distributed B3 = [2..7],
        slideD: distributed B4 = [4..6];

the alignment between indices is less obvious.

Chapel's Distribution Architecture

Global descriptors (one global instance, or replicated per locale):
  distribution descriptor: mapping of indices to locales
  domain descriptor: how to store and iterate over the domain's indices
  array descriptor: how to store, access, and iterate over array elements

Local descriptors (one instance per locale):
  local domain descriptor: how to store and iterate over local domain
  indices
  local array descriptor: how to store, access, and iterate over local
  array elements

Chapel's Distribution Architecture (descriptor detail)

Global descriptors (state / methods):
  distribution descriptor: target locale set, distribution params /
    map index to locale
  domain descriptor: global index info, index type /
    index iteration, allocate
  array descriptor: array element type /
    global value iteration, random access

Local descriptors (state / methods):
  local domain segment descriptor: stores local indices /
    local index iteration, add new indices
  local array segment descriptor: stores local values /
    local value iteration, local random access

Block1D Distribution Classes

  const B1 = new Block1D(bbox=[1..8]);   // with LocaleSpace = [0..2]
  const D: domain(1) distributed B1 = [1..8];
  var A: [D] real;

The global descriptors record the distribution's boundingBox (1..8), its
targetLocales, and the domain's whole index set (1..8). Per locale, the
local descriptors record that locale's myChunk of the index space (the end
chunks extend to min(idxType) and max(idxType)), its myBlock of domain
indices, and its myElems array elements.

Block1D Distribution Classes (sliced domain)

  const B1 = new Block1D(bbox=[1..8]);   // with LocaleSpace = [0..2]
  const sliceD: domain(1) distributed B1 = [4..6];
  var A2: [sliceD] real;

The global descriptors are as before, except that the domain's whole index
set is now 4..6; each locale's myChunk is unchanged, while its myBlock and
myElems cover only its portion of 4..6.

Outline

  Basics of Multi-Locale Chapel
  Domain and Array Distributions
  Sample Uses of Distributed Domains/Arrays
    HPCC STREAM Triad
    HPCC Random Access (RA)

Introduction to STREAM Triad

Given: m-element vectors A, B, C
Compute: for all i in 1..m, A(i) = B(i) + alpha * C(i)

Each element of A is computed independently from the corresponding
elements of B and C, so the elementwise multiply-adds can all proceed in
parallel.

STREAM Triad in Chapel

  const BlockDist = new Block1D(bbox=[1..m], tasksPerLocale=tasksPerLocale);

  const ProblemSpace: domain(1, int(64)) distributed BlockDist = [1..m];

  var A, B, C: [ProblemSpace] real;

  forall (a, b, c) in (A, B, C) do
    a = b + alpha * c;

Introduction to Random Access (RA)

Given: m-element table T (where m = 2**n and initially T(i) = i)
Compute: N_U random updates to the table using bitwise xor

Introduction to Random Access (continued)

Each update takes a pseudo-random 64-bit value r and xors r into the table
entry indexed by r's low bits (i.e., entry r mod m); for example, the
value 21 would be xor'd into T(21 mod m). This is repeated N_U times.

Introduction to Random Access (continued)

In parallel, many updates are in flight at once, each targeting its own
pseudo-random table location.

Random numbers: not actually generated using lotto ping-pong balls!
Instead, implement a pseudo-random stream:
  the kth random value can be generated at some cost
  given the kth random value, the (k+1)-st can be generated much more
  cheaply
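To illustrate only the "cheap successor" property (this is NOT the actual HPCC RA generator; the function name and linear congruential constants are invented for the sketch):

```chapel
// Illustrative only: a toy linear congruential pseudo-random stream.
// The multiplier and increment are arbitrary, not the HPCC RA polynomial.
def nextRandom(r: uint(64)): uint(64) {
  return r * 6364136223846793005: uint(64)
           + 1442695040888963407: uint(64);
}

var r: uint(64) = 1;     // seed: the "0th" value
for k in 1..4 {
  r = nextRandom(r);     // the (k+1)-st value is one multiply-add from the kth
  writeln("r_", k, " = ", r);
}
```

Jumping to an arbitrary kth value costs more (repeated stepping, or a log-time skip-ahead), which is why each task starts its portion of the stream once and then steps cheaply.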

Introduction to Random Access (continued)

Conflicts: when two parallel updates conflict, an update may be lost; the
benchmark permits a certain number of lost updates.

RA Declarations in Chapel

  const TableDist  = new Block1D(bbox=[0..m-1],   tasksPerLocale=tasksPerLocale),
        UpdateDist = new Block1D(bbox=[0..N_U-1], tasksPerLocale=tasksPerLocale);

  const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
        Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];

  var T: [TableSpace] uint(64);

RA Computation in Chapel

  const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
        Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];

  var T: [TableSpace] uint(64);

  forall (_, r) in (Updates, RAStream()) do
    on T(r & indexMask) do
      T(r & indexMask) ^= r;

Each forall iteration zips a (block-distributed) update index with the next
value r from the pseudo-random stream RAStream(). The simplest version
omits the on clause and lets each update execute wherever its iteration
runs:

  forall (_, r) in (Updates, RAStream()) do
    T(r & indexMask) ^= r;

RA Computation in Chapel: tune for affinity

Adding an on clause migrates each update to the locale that owns the target
table entry:

  forall (_, r) in (Updates, RAStream()) do
    on T(r & indexMask) do
      T(r & indexMask) ^= r;

RA Computation in Chapel: fire and forget

Making each remote update a begin task lets the initiating task move on
without waiting; the enclosing sync statement blocks until all of the
updates have completed:

  sync {
    forall (_, r) in (Updates, RAStream()) do
      on T(r & indexMask) do
        begin T(r & indexMask) ^= r;
  }

Locality and Affinity Status

Stable features:
  locale types, methods, and variables
  on clauses

Incomplete features:
  the local block has not been stress-tested
  we've only just started getting our first distributions working
    this is the reason that foralls/promotions don't yet result in
    parallelism
    only Block1D, and only for basic domain/array operations
    see examples/hpcc/stream.chpl and ra.chpl for sample uses

Future directions:
  improved support for replicated, symmetric data
  distributions as a mechanism for software resiliency
  richer locale types: multiple flavors and hierarchical locales, to better
  represent machine structure and heterogeneity

Questions?


More information

Note: There is more in this slide deck than we will be able to cover, so consider it a reference and overview

Note: There is more in this slide deck than we will be able to cover, so consider it a reference and overview Help you understand code in subsequent slide decks Give you the basic skills to program in Chapel Provide a survey of Chapel s base language features Impart an appreciation for the base language design

More information

Chap. 4 Part 1. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1

Chap. 4 Part 1. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1 Chap. 4 Part 1 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 Part 2 of textbook: Parallel Abstractions How can we think about conducting computations in parallel before getting down to coding?

More information

1 GF 1988: Cray Y-MP; 8 Processors. Static finite element analysis. 1 TF 1998: Cray T3E; 1,024 Processors. Modeling of metallic magnet atoms

1 GF 1988: Cray Y-MP; 8 Processors. Static finite element analysis. 1 TF 1998: Cray T3E; 1,024 Processors. Modeling of metallic magnet atoms 1 GF 1988: Cray Y-MP; 8 Processors Static finite element analysis 1 TF 1998: Cray T3E; 1,024 Processors Modeling of metallic magnet atoms 1 PF 2008: Cray XT5; 150,000 Processors Superconductive materials

More information

Principles of Parallel Algorithm Design: Concurrency and Decomposition

Principles of Parallel Algorithm Design: Concurrency and Decomposition Principles of Parallel Algorithm Design: Concurrency and Decomposition John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 2 12 January 2017 Parallel

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

CPU Scheduling: Objectives

CPU Scheduling: Objectives CPU Scheduling: Objectives CPU scheduling, the basis for multiprogrammed operating systems CPU-scheduling algorithms Evaluation criteria for selecting a CPU-scheduling algorithm for a particular system

More information

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017 CS 471 Operating Systems Yue Cheng George Mason University Fall 2017 Outline o Process concept o Process creation o Process states and scheduling o Preemption and context switch o Inter-process communication

More information

Optimizing for Bugs Fixed

Optimizing for Bugs Fixed Optimizing for Bugs Fixed The Design Principles behind the Clang Static Analyzer Anna Zaks, Manager of Program Analysis Team @ Apple What is This Talk About? LLVM/clang project Overview of the Clang Static

More information

Recap: Functions as first-class values

Recap: Functions as first-class values Recap: Functions as first-class values Arguments, return values, bindings What are the benefits? Parameterized, similar functions (e.g. Testers) Creating, (Returning) Functions Iterator, Accumul, Reuse

More information

The Mother of All Chapel Talks

The Mother of All Chapel Talks The Mother of All Chapel Talks Brad Chamberlain Cray Inc. CSEP 524 May 20, 2010 Lecture Structure 1. Programming Models Landscape 2. Chapel Motivating Themes 3. Chapel Language Features 4. Project Status

More information

The Cascade High Productivity Programming Language

The Cascade High Productivity Programming Language The Cascade High Productivity Programming Language Hans P. Zima University of Vienna, Austria and JPL, California Institute of Technology, Pasadena, CA CMWF Workshop on the Use of High Performance Computing

More information

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Riyaz Haque and David F. Richards This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore

More information

Foundations of the C++ Concurrency Memory Model

Foundations of the C++ Concurrency Memory Model Foundations of the C++ Concurrency Memory Model John Mellor-Crummey and Karthik Murthy Department of Computer Science Rice University johnmc@rice.edu COMP 522 27 September 2016 Before C++ Memory Model

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou (  Zhejiang University Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: Static Data Structures Outline of Lecture 18 Recap:

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

Affine Loop Optimization Based on Modulo Unrolling in Chapel

Affine Loop Optimization Based on Modulo Unrolling in Chapel Affine Loop Optimization Based on Modulo Unrolling in Chapel ABSTRACT Aroon Sharma, Darren Smith, Joshua Koehler, Rajeev Barua Dept. of Electrical and Computer Engineering University of Maryland, College

More information

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between

More information

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:

More information

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic

More information

Chap. 6 Part 3. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1

Chap. 6 Part 3. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1 Chap. 6 Part 3 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 OpenMP popular for decade Compiler-based technique Start with plain old C, C++, or Fortran Insert #pragmas into source file You

More information

Qualifying Exam in Programming Languages and Compilers

Qualifying Exam in Programming Languages and Compilers Qualifying Exam in Programming Languages and Compilers University of Wisconsin Fall 1991 Instructions This exam contains nine questions, divided into two parts. All students taking the exam should answer

More information

OpenMP 4.0/4.5. Mark Bull, EPCC

OpenMP 4.0/4.5. Mark Bull, EPCC OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all

More information

C++11 and Compiler Update

C++11 and Compiler Update C++11 and Compiler Update John JT Thomas Sr. Director Application Developer Products About this Session A Brief History Features of C++11 you should be using now Questions 2 Bjarne Stroustrup C with Objects

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

Parallel Databases C H A P T E R18. Practice Exercises

Parallel Databases C H A P T E R18. Practice Exercises C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks

More information

CSE544 Database Architecture

CSE544 Database Architecture CSE544 Database Architecture Tuesday, February 1 st, 2011 Slides courtesy of Magda Balazinska 1 Where We Are What we have already seen Overview of the relational model Motivation and where model came from

More information

Transformer Looping Functions for Pivoting the data :

Transformer Looping Functions for Pivoting the data : Transformer Looping Functions for Pivoting the data : Convert a single row into multiple rows using Transformer Looping Function? (Pivoting of data using parallel transformer in Datastage 8.5,8.7 and 9.1)

More information

Parallel and High Performance Computing CSE 745

Parallel and High Performance Computing CSE 745 Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Steve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin

Steve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin Steve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin A new parallel language Under development at Cray Inc. Supported through the

More information

Performance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016

Performance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016 Performance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016 Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.

More information

Graduate Examination. Department of Computer Science The University of Arizona Spring March 5, Instructions

Graduate Examination. Department of Computer Science The University of Arizona Spring March 5, Instructions Graduate Examination Department of Computer Science The University of Arizona Spring 2004 March 5, 2004 Instructions This examination consists of ten problems. The questions are in three areas: 1. Theory:

More information

COMP-520 GoLite Tutorial

COMP-520 GoLite Tutorial COMP-520 GoLite Tutorial Alexander Krolik Sable Lab McGill University Winter 2019 Plan Target languages Language constructs, emphasis on special cases General execution semantics Declarations Types Statements

More information

Scan Algorithm Effects on Parallelism and Memory Conflicts

Scan Algorithm Effects on Parallelism and Memory Conflicts Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n

More information

Relaxed Memory-Consistency Models

Relaxed Memory-Consistency Models Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memoryconsistency models. In this lecture, we will cover some of them in more detail. Why isn t sequential consistency

More information

ABSTRACT AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL

ABSTRACT AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL ABSTRACT Title of thesis: AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL Aroon Sharma, Master of Science, 2014 Thesis directed by: Professor Rajeev Barua Department of Electrical and Computer

More information

Chapter 12. File Management

Chapter 12. File Management Operating System Chapter 12. File Management Lynn Choi School of Electrical Engineering Files In most applications, files are key elements For most systems except some real-time systems, files are used

More information

Abstraction Elimination for Kappa Calculus

Abstraction Elimination for Kappa Calculus Abstraction Elimination for Kappa Calculus Adam Megacz megacz@cs.berkeley.edu 9.Nov.2011 1 Outline Overview: the problem and solution in broad strokes Detail: kappa calculus and abstraction elimination

More information

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

New Programming Paradigms: Partitioned Global Address Space Languages

New Programming Paradigms: Partitioned Global Address Space Languages Raul E. Silvera -- IBM Canada Lab rauls@ca.ibm.com ECMWF Briefing - April 2010 New Programming Paradigms: Partitioned Global Address Space Languages 2009 IBM Corporation Outline Overview of the PGAS programming

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.

More information

COSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)

COSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

The APGAS Programming Model for Heterogeneous Architectures. David E. Hudak, Ph.D. Program Director for HPC Engineering

The APGAS Programming Model for Heterogeneous Architectures. David E. Hudak, Ph.D. Program Director for HPC Engineering The APGAS Programming Model for Heterogeneous Architectures David E. Hudak, Ph.D. Program Director for HPC Engineering dhudak@osc.edu Overview Heterogeneous architectures and their software challenges

More information

Intel Thread Building Blocks, Part II

Intel Thread Building Blocks, Part II Intel Thread Building Blocks, Part II SPD course 2013-14 Massimo Coppola 25/03, 16/05/2014 1 TBB Recap Portable environment Based on C++11 standard compilers Extensive use of templates No vectorization

More information

System Call. Preview. System Call. System Call. System Call 9/7/2018

System Call. Preview. System Call. System Call. System Call 9/7/2018 Preview Operating System Structure Monolithic Layered System Microkernel Virtual Machine Process Management Process Models Process Creation Process Termination Process State Process Implementation Operating

More information