Chapel: Locality and Affinity
Brad Chamberlain
PRACE Winter School, 12 February 2009
(Cray Proprietary)

Outline
- Basics of Multi-Locale Chapel
  - the locale type and Locales array
  - the on statement, the here locale, and communication
  - the local block
- Domain and Array Distributions
- Sample Uses of Distributed Domains/Arrays
The locale Type

Definition:
- an abstract unit of the target architecture
- supported to permit reasoning about locality
- has capacity for processing and storage

[figure: locales L0-L3, each pairing processors with local memory]

Properties:
- threads within a locale have ~uniform access to local memory
- memory within other locales is accessible, but at a price
- locales are defined for a given architecture by a Chapel compiler
  - e.g., a multicore processor or SMP node could be a locale

Locales and Program Startup

Chapel users specify the number of locales on the executable's command line:

    prompt> mychapelprog -nl 8    # run using 8 locales

The Chapel launcher bootstraps program execution:
- obtains the necessary machine resources
  - e.g., requests 8 nodes from the job scheduler
- loads a copy of the executable onto those resources
- starts running the program

Conceptually:
- locale #0 starts running the program's entry point (main())
- the other locales wait for work to arrive
Locale Variables

Built-in variables represent a program's set of locales:

    config const numLocales: int;              // number of locales
    const LocaleSpace = [0..numLocales-1],     // locale indices
          Locales: [LocaleSpace] locale;       // locale values

For example, with numLocales = 8, LocaleSpace is [0..7] and Locales holds locales L0 through L7.

Locale Views

Using standard array operations, users can create their own locale views:

    var TaskALocs = Locales[..numTaskALocs];
    var TaskBLocs = Locales[numTaskALocs+1..];

    var CompGrid = Locales.reshape([1..gridRows, 1..gridCols]);
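The slicing and reshaping above can be mimicked in plain Python (not Chapel). This is only an analogy for how views select subsets of the locale set; the 8-locale array, the split point, and the 2x4 grid shape are illustrative values taken from the slides.

```python
# Plain-Python analogue of the locale views above.
locales = [f"L{i}" for i in range(8)]
num_task_a_locs = 1   # split point (assumed value for illustration)

task_a_locs = locales[:num_task_a_locs + 1]   # ~ Locales[..numTaskALocs]
task_b_locs = locales[num_task_a_locs + 1:]   # ~ Locales[numTaskALocs+1..]

# ~ Locales.reshape([1..gridRows, 1..gridCols])
grid_rows, grid_cols = 2, 4
comp_grid = [locales[r * grid_cols:(r + 1) * grid_cols]
             for r in range(grid_rows)]

print(task_a_locs)   # ['L0', 'L1']
print(comp_grid[1])  # ['L4', 'L5', 'L6', 'L7']
```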
Locale Methods

The locale type supports built-in methods:

    def locale.id: int;                  // index in LocaleSpace
    def locale.name: string;             // similar to uname -n
    def locale.numCores: int;            // # of processor cores
    def locale.physicalMemory(...): ...; // amount of memory

Executing on Remote Locales

Syntax:

    on-stmt:
      on expr { stmt }

Semantics:
- executes stmt on the locale specified by expr
- does not introduce concurrency

Example:

    var A: [LocaleSpace] int;

    coforall loc in Locales do
      on loc {
        A(loc.id) = computation(loc.id);
      }
Querying a Variable's Locale

Syntax:

    locale-query-expr:
      var-expr . locale

Semantics:
- returns the locale on which var-expr is allocated

Example:

    var i: int;
    write(i.locale.id);
    on Locales(1) do
      write(i.locale.id);

Output:

    00

(i is allocated on locale 0, so both queries print 0.)

Serial on-Clause Example

on clauses indicate where code should execute:

    // Chapel programs begin running on locale 0 by default
    var x, y: real;            // allocate x & y on locale 0
    on Locales(1) {            // migrate task to locale 1
      var z: real;             // allocate z on locale 1
      writeln(x.locale.id);    // prints 0
      writeln(z.locale.id);    // prints 1
      z = x + y;               // requires a get for x and y
      on Locales(0) do         // migrate back to locale 0
        z = x + y;             // requires a get for z
                               // return to locale 1
    }                          // return to locale 0
Serial on-Clause Example (Data-Driven)

Optionally, the target locale can be named in a data-driven manner, using a variable's locale rather than an explicit locale value:

      z = x + y;               // requires a get for x and y
      on x do                  // migrate back to x's locale (locale 0)
        z = x + y;             // requires a get for z
                               // return to locale 1

Parallel on-Clause Examples

By naming locales explicitly:

    cobegin {
      on TaskALocs do computeTaskA(...);
      on TaskBLocs do computeTaskB(...);
      on Locales[0] do computeTaskC(...);
    }

(computeTaskA() runs on L0-L1, computeTaskB() on L2-L7, and computeTaskC() on L0.)

Or in a data-driven manner:

    computePivot(data, lo, hi);
    cobegin {
      on data[lo] do Quicksort(data, lo, pivot);
      on data[pivot] do Quicksort(data, pivot, hi);
    }
here

Built-in locale function:

    def here: locale;

Semantics:
- returns the locale on which the current task is executing

Example:

    writeln(here.id);
    on Locales(1) do
      writeln(here.id);

Output:

    0
    1

Remote Reads and Writes

Example:

    var i = 0;
    on Locales(1) {
      writeln((here.id, i.locale.id, i));
      i = 1;
      writeln((here.id, i.locale.id, i));
    }
    writeln((here.id, i.locale.id, i));

Output:

    (1, 0, 0)
    (1, 0, 1)
    (0, 0, 1)

(i remains allocated on locale 0; the task running on locale 1 reads and writes it remotely.)
Remote Classes

Example:

    class C { var x: int; }

    var c: C;
    on Locales(1) do
      c = new C();
    writeln((here.id, c.locale.id, c));

Output:

    (0, 1, {x = 0})

(The class instance is allocated on locale 1, while the reference c lives on locale 0.)

Local Blocks

Syntax:

    local-stmt:
      local stmt

Semantics:
- asserts that there are no remote references in stmt
- checked at runtime by default; the checks can be disabled for performance

Examples:

    c = Root.child(1);
    on c do local {
      traverseTree(c);
    }

    local {
      A[D] = B[D];
    }
Outline
- Basics of Multi-Locale Chapel
- Domain and Array Distributions
  - overview
  - a case study: Block1D
- Sample Uses of Distributed Domains/Arrays

Chapel Distributions

Distributions are recipes for parallel, distributed arrays. They help the compiler map from the computation's global view (e.g., A = B + α·C over whole arrays) down to the fragmented, per-processor implementation, where each locale computes on the slices of A, B, and C stored in its own memory.
Domain Distribution

Domains may be distributed across locales:

    var D: domain(2) distributed Block on CompGrid = ...;

[figure: a 2D domain D block-distributed across the locales of CompGrid (L0-L7)]

A distribution implies:
- ownership of the domain's indices (and its arrays' elements)
- the default work ownership for operations on the domains/arrays
  - e.g., forall loops or promoted operations over domains/arrays

Authoring Distributions (Advanced)

- Programmers can write distributions in Chapel
- Chapel will support a standard library of distributions
  - research goal: implemented using the same mechanisms that users would use
- Our compiler should have no knowledge of specific distributions, only their structural interface:
  - how to create domains and arrays using that distribution
  - how to map indices to locales
  - how to access array elements
  - how to iterate over indices/array elements
    - sequentially
    - in parallel
    - in parallel and zippered with other parallel iterable types
  - and so forth
- Distributions are built using the concepts we've already seen:
  - on clauses for expressing what data & tasks each locale owns
  - begins, cobegins, and coforalls to express inter- and intra-locale parallelism
Distributions

- All the domain types we've seen will support distributions
- Domain/array semantics are independent of distribution
  - performance and parallelism may vary greatly as distributions change

[figure: an array of names (George, John, Thomas, James, Andrew, Martin, William) shown under two different distributions]
A Simple Distribution: Block1D

Goal: block a 1D index space across a set of locales.

[figure: a 1D index space blocked across L0, L1, L2]

A bounding box is used to compute the blocking.

Distributions vs. Domains

Q1: Why distinguish between distributions and domains?
Q2: Why do distributions map an index space rather than a fixed index set?

A: To permit several domains to share a single distribution:
- amortizes the overhead of storing a distribution
- supports trivial domain/array alignment and compiler optimizations

    const D:      distributed B1 = [1..8],
          outerD: distributed B1 = [0..9],
          innerD: subdomain(D) = [2..7],
          slideD: subdomain(D) = [4..6];

Sharing a distribution supports trivial alignment of these domains.
Distributions vs. Domains (continued)

By contrast, when each domain is given its own distribution, the alignment between indices is less obvious:

    const D:      distributed B1 = [1..8],
          outerD: distributed B2 = [0..9],
          innerD: distributed B3 = [2..7],
          slideD: distributed B4 = [4..6];

Chapel's Distribution Architecture

Global descriptors (one global instance, or replicated per locale):
- distribution: responsible for the mapping of indices to locales
- domain: responsible for how to store and iterate over the domain's indices
- array: responsible for how to store, access, and iterate over the array's elements

Local descriptors (one instance per locale):
- domain: responsible for how to store and iterate over the local domain indices
- array: responsible for how to store, access, and iterate over the local array elements
Chapel's Distribution Architecture (detail)

Global descriptors:
- distribution descriptor: state is the target locale set and distribution parameters; its method maps an index to its locale
- domain descriptor: state is the global index info and index type; its methods provide index iteration and allocation
- array descriptor: state is the element type; its methods provide global value iteration and random access

Local descriptors:
- local domain-segment descriptor: stores the locale's indices; provides local index iteration and adding new indices
- local array-segment descriptor: stores the locale's values; provides local value iteration and local random access

Block1D Distribution Classes

    const B1 = new Block1D(bbox=[1..8]);   // target locales: LocaleSpace = [0..2]

    const D: domain(1) distributed B1 = [1..8];
    var A: [D] real;

Global descriptors: the distribution stores boundingBox = [1..8] and targetLocales; the domain stores whole = [1..8].
Local descriptors (one per locale): myChunk (the locale's slice of the index space, extending to min(idxType) and max(idxType) at the ends), myBlock (its indices of D), and myElems (its elements of A).
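The index-to-locale mapping these descriptors implement can be sketched in plain Python. The formula below is one common bounding-box blocking rule, assumed here for illustration rather than taken from Chapel's actual implementation:

```python
def block_owner(i, lo, hi, num_locales):
    """Map index i to a locale, blocking the [lo..hi] bounding box evenly
    (hypothetical helper; one common blocking formula)."""
    n = hi - lo + 1
    loc = (i - lo) * num_locales // n
    # indices outside the bounding box clamp to the first/last locale,
    # mirroring the min(idxType)/max(idxType) chunks at the ends
    return max(0, min(num_locales - 1, loc))

# Block [1..8] across 3 locales, as in the Block1D example:
owners = [block_owner(i, 1, 8, 3) for i in range(1, 9)]
print(owners)  # [0, 0, 0, 1, 1, 1, 2, 2]
```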
Block1D Distribution Classes (sliced domain)

    const B1 = new Block1D(bbox=[1..8]);   // target locales: LocaleSpace = [0..2]

    const sliceD: domain(1) distributed B1 = [4..6];
    var A2: [sliceD] real;

Here the global domain descriptor stores whole = [4..6]; each locale's myBlock and myElems cover only its intersection with [4..6], while the distribution's boundingBox remains [1..8].

Outline
- Basics of Multi-Locale Chapel
- Domain and Array Distributions
- Sample Uses of Distributed Domains/Arrays
  - HPCC STREAM Triad
  - HPCC Random Access (RA)
Introduction to STREAM Triad

Given: m-element vectors A, B, C
Compute: for all i in 1..m, A(i) = B(i) + α·C(i)

Pictorially: every element of A is computed as the corresponding element of B plus alpha times the corresponding element of C; in parallel, the elementwise multiplies and adds for different chunks of the vectors proceed concurrently.
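The triad itself is just an elementwise loop; a sequential plain-Python sketch, with arbitrary illustrative values for m and alpha:

```python
# Sequential sketch of the triad computation (values are illustrative only).
m = 8
alpha = 3.0
B = [float(i) for i in range(1, m + 1)]    # B = 1.0, 2.0, ..., 8.0
C = [1.0] * m
A = [b + alpha * c for b, c in zip(B, C)]  # A(i) = B(i) + alpha * C(i)
print(A)  # [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```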
STREAM Triad in Chapel

    const BlockDist = new Block1D(bbox=[1..m], tasksPerLocale=...);

    const ProblemSpace: domain(1, int(64)) distributed BlockDist = [1..m];

    var A, B, C: [ProblemSpace] real;

    forall (a, b, c) in (A, B, C) do
      a = b + alpha * c;

Introduction to Random Access

Given: m-element table T (where m = 2^n and initially T(i) = i)
Compute: N_U random updates to the table using bitwise-xor
Introduction to Random Access (continued)

For each pseudo-random value, e.g. 21, xor it into T(21 mod m); repeat N_U times.
Introduction to Random Access (continued)

Pictorially, in parallel: many updates are in flight at once, each targeting an unpredictable table entry.

Random numbers:
- not actually generated using lotto ping-pong balls!
- instead, implement a pseudo-random stream:
  - the kth random value can be generated at some cost
  - given the kth random value, the (k+1)-st can be generated much more cheaply
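The jump-ahead property just described can be illustrated with a toy multiplicative generator in Python. The modulus and multiplier are illustrative choices, and this is not the actual HPCC RA generator (which works over GF(2) polynomials):

```python
MOD = 2**61 - 1   # a Mersenne prime modulus (illustrative choice)
MULT = 48271      # a classic Lehmer multiplier (illustrative choice)

def kth_value(k, seed=1):
    # "generated at some cost": one modular exponentiation, O(log k) multiplies
    return (pow(MULT, k, MOD) * seed) % MOD

def next_value(r):
    # "much more cheaply": a single multiply
    return (MULT * r) % MOD

print(next_value(kth_value(5)) == kth_value(6))  # True: the streams agree
```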
Introduction to Random Access (continued)

Conflicts: when a conflict occurs, an update may be lost; a certain number of lost updates is permitted.

RA Declarations in Chapel

    const TableDist  = new Block1D(bbox=[0..m-1],   tasksPerLocale=...),
          UpdateDist = new Block1D(bbox=[0..N_U-1], tasksPerLocale=...);

    const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
          Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];

    var T: [TableSpace] uint(64);
RA Computation in Chapel

    const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
          Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];
    var T: [TableSpace] uint(64);

    forall (_, r) in (Updates, RAStream()) do
      T(r & indexMask) ^= r;

RAStream() yields the pseudo-random values r0, r1, r2, ... zippered with the Updates domain, so each iteration receives one update value.
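A sequential plain-Python sketch of the same update loop; m, index_mask, and the toy generator below stand in for the slide's m, indexMask, and RAStream() (the real HPCC stream is different):

```python
m = 16               # table size, a power of two (illustrative value)
index_mask = m - 1   # r & index_mask == r mod m when m is a power of two
T = list(range(m))   # initially T(i) = i

def ra_stream(n_u, seed=12345):
    # toy 32-bit LCG standing in for RAStream(); not the HPCC generator
    r = seed
    for _ in range(n_u):
        r = (r * 1103515245 + 12345) & 0xFFFFFFFF
        yield r

for r in ra_stream(100):
    T[r & index_mask] ^= r   # xor the update into the selected table slot
print(len(T))  # 16: updates change entries, not the table size
```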
RA Computation in Chapel: Tune for Affinity

An on clause migrates each update to the locale that owns its table entry:

    forall (_, r) in (Updates, RAStream()) do
      on T(r & indexMask) do
        T(r & indexMask) ^= r;

RA Computation in Chapel: Fire and Forget

A begin launches each update as an asynchronous task; the enclosing sync waits for all of them to complete:

    sync {
      forall (_, r) in (Updates, RAStream()) do
        on T(r & indexMask) do
          begin T(r & indexMask) ^= r;
    }
Locality and Affinity Status

Stable features:
- locale types, methods, and variables
- on clauses

Incomplete features:
- the local block has not been stress-tested
- we've only just started getting our first distributions working
  - this is the reason that foralls/promotions don't yet result in parallelism
  - only Block1D, and only for basic domain/array operations
  - see examples/hpcc/stream.chpl and ra.chpl for sample uses

Future directions:
- improved support for replicated, symmetric data
- distributions as a mechanism for software resiliency
- richer locale types: multiple flavors and hierarchical locales, to better represent machine structure and heterogeneity

Questions?
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: Static Data Structures Outline of Lecture 18 Recap:
More informationThe SPL Programming Language Reference Manual
The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming
More informationAffine Loop Optimization Based on Modulo Unrolling in Chapel
Affine Loop Optimization Based on Modulo Unrolling in Chapel ABSTRACT Aroon Sharma, Darren Smith, Joshua Koehler, Rajeev Barua Dept. of Electrical and Computer Engineering University of Maryland, College
More informationMorsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin
Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between
More informationChapter 3. Describing Syntax and Semantics
Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:
More informationRuntime. The optimized program is ready to run What sorts of facilities are available at runtime
Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic
More informationChap. 6 Part 3. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1
Chap. 6 Part 3 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 OpenMP popular for decade Compiler-based technique Start with plain old C, C++, or Fortran Insert #pragmas into source file You
More informationQualifying Exam in Programming Languages and Compilers
Qualifying Exam in Programming Languages and Compilers University of Wisconsin Fall 1991 Instructions This exam contains nine questions, divided into two parts. All students taking the exam should answer
More informationOpenMP 4.0/4.5. Mark Bull, EPCC
OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all
More informationC++11 and Compiler Update
C++11 and Compiler Update John JT Thomas Sr. Director Application Developer Products About this Session A Brief History Features of C++11 you should be using now Questions 2 Bjarne Stroustrup C with Objects
More informationWhy do we care about parallel?
Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationCSE544 Database Architecture
CSE544 Database Architecture Tuesday, February 1 st, 2011 Slides courtesy of Magda Balazinska 1 Where We Are What we have already seen Overview of the relational model Motivation and where model came from
More informationTransformer Looping Functions for Pivoting the data :
Transformer Looping Functions for Pivoting the data : Convert a single row into multiple rows using Transformer Looping Function? (Pivoting of data using parallel transformer in Datastage 8.5,8.7 and 9.1)
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationIntermediate Code Generation
Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target
More informationSteve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin
Steve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin A new parallel language Under development at Cray Inc. Supported through the
More informationPerformance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016
Performance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016 Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
More informationGraduate Examination. Department of Computer Science The University of Arizona Spring March 5, Instructions
Graduate Examination Department of Computer Science The University of Arizona Spring 2004 March 5, 2004 Instructions This examination consists of ten problems. The questions are in three areas: 1. Theory:
More informationCOMP-520 GoLite Tutorial
COMP-520 GoLite Tutorial Alexander Krolik Sable Lab McGill University Winter 2019 Plan Target languages Language constructs, emphasis on special cases General execution semantics Declarations Types Statements
More informationScan Algorithm Effects on Parallelism and Memory Conflicts
Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memoryconsistency models. In this lecture, we will cover some of them in more detail. Why isn t sequential consistency
More informationABSTRACT AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL
ABSTRACT Title of thesis: AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL Aroon Sharma, Master of Science, 2014 Thesis directed by: Professor Rajeev Barua Department of Electrical and Computer
More informationChapter 12. File Management
Operating System Chapter 12. File Management Lynn Choi School of Electrical Engineering Files In most applications, files are key elements For most systems except some real-time systems, files are used
More informationAbstraction Elimination for Kappa Calculus
Abstraction Elimination for Kappa Calculus Adam Megacz megacz@cs.berkeley.edu 9.Nov.2011 1 Outline Overview: the problem and solution in broad strokes Detail: kappa calculus and abstraction elimination
More informationDistributed Systems. Distributed Shared Memory. Paul Krzyzanowski
Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
More informationCS558 Programming Languages
CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from
More informationNew Programming Paradigms: Partitioned Global Address Space Languages
Raul E. Silvera -- IBM Canada Lab rauls@ca.ibm.com ECMWF Briefing - April 2010 New Programming Paradigms: Partitioned Global Address Space Languages 2009 IBM Corporation Outline Overview of the PGAS programming
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationThe APGAS Programming Model for Heterogeneous Architectures. David E. Hudak, Ph.D. Program Director for HPC Engineering
The APGAS Programming Model for Heterogeneous Architectures David E. Hudak, Ph.D. Program Director for HPC Engineering dhudak@osc.edu Overview Heterogeneous architectures and their software challenges
More informationIntel Thread Building Blocks, Part II
Intel Thread Building Blocks, Part II SPD course 2013-14 Massimo Coppola 25/03, 16/05/2014 1 TBB Recap Portable environment Based on C++11 standard compilers Extensive use of templates No vectorization
More informationSystem Call. Preview. System Call. System Call. System Call 9/7/2018
Preview Operating System Structure Monolithic Layered System Microkernel Virtual Machine Process Management Process Models Process Creation Process Termination Process State Process Implementation Operating
More information