Chapel: Locality and Affinity
Brad Chamberlain
PRACE Winter School, 12 February 2009
(Cray Proprietary)

Outline
- Basics of Multi-Locale Chapel
  - the locale type and Locales array
  - the on statement, the here locale, and communication
  - the local block
- Domain and Array Distributions
- Sample Uses of Distributed Domains/Arrays
The locale Type

Definition:
- an abstract unit of the target architecture
- supported to permit reasoning about locality
- has capacity for processing and storage

[figure: locales L0-L3, each pairing processors with local memory]

Properties:
- threads within a locale have ~uniform access to local memory
- memory within other locales is accessible, but at a price
- locales are defined for a given architecture by a Chapel compiler
  - e.g., a multicore processor or SMP node could be a locale

Locales and Program Startup

Chapel users specify the number of locales on the executable's command line:

    prompt> mychapelprog -nl 8    # run using 8 locales

The Chapel launcher bootstraps program execution:
- obtains the necessary machine resources
  - e.g., requests 8 nodes from the job scheduler
- loads a copy of the executable onto those resources
- starts running the program

Conceptually:
- locale #0 starts running the program's entry point (main())
- the other locales wait for work to arrive
Locale Variables

Built-in variables represent a program's set of locales:

    config const numLocales: int;              // number of locales
    const LocaleSpace = [0..numLocales-1],     // locale indices
          Locales: [LocaleSpace] locale;       // locale values

For example, with numLocales = 8, LocaleSpace is [0..7] and Locales holds locales L0 through L7.

Locale Views

Using standard array operations, users can create their own locale views:

    var TaskALocs = Locales[..numTaskALocs];
    var TaskBLocs = Locales[numTaskALocs+1..];

    var CompGrid = Locales.reshape([1..gridRows, 1..gridCols]);
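The slicing and reshaping above can be mimicked in plain Python (not Chapel). This is only an analogy for how views select subsets of the locale set; the 8-locale array, the split point, and the 2x4 grid shape are illustrative values taken from the slides.

```python
# Plain-Python analogue of the locale views above.
locales = [f"L{i}" for i in range(8)]
num_task_a_locs = 1   # split point (assumed value for illustration)

task_a_locs = locales[:num_task_a_locs + 1]   # ~ Locales[..numTaskALocs]
task_b_locs = locales[num_task_a_locs + 1:]   # ~ Locales[numTaskALocs+1..]

# ~ Locales.reshape([1..gridRows, 1..gridCols])
grid_rows, grid_cols = 2, 4
comp_grid = [locales[r * grid_cols:(r + 1) * grid_cols]
             for r in range(grid_rows)]

print(task_a_locs)   # ['L0', 'L1']
print(comp_grid[1])  # ['L4', 'L5', 'L6', 'L7']
```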
Locale Methods

The locale type supports built-in methods:

    def locale.id: int;                  // index in LocaleSpace
    def locale.name: string;             // similar to uname -n
    def locale.numCores: int;            // # of processor cores
    def locale.physicalMemory(...): ...; // amount of memory

Executing on Remote Locales

Syntax:

    on-stmt:
      on expr { stmt }

Semantics:
- executes stmt on the locale specified by expr
- does not introduce concurrency

Example:

    var A: [LocaleSpace] int;

    coforall loc in Locales do
      on loc {
        A(loc.id) = computation(loc.id);
      }
Querying a Variable's Locale

Syntax:

    locale-query-expr:
      var-expr . locale

Semantics:
- returns the locale on which var-expr is allocated

Example:

    var i: int;
    write(i.locale.id);
    on Locales(1) do
      write(i.locale.id);

Output:

    00

(i is allocated on locale 0, so both queries print 0.)

Serial on-Clause Example

on clauses indicate where code should execute:

    // Chapel programs begin running on locale 0 by default
    var x, y: real;            // allocate x & y on locale 0
    on Locales(1) {            // migrate task to locale 1
      var z: real;             // allocate z on locale 1
      writeln(x.locale.id);    // prints 0
      writeln(z.locale.id);    // prints 1
      z = x + y;               // requires a get for x and y
      on Locales(0) do         // migrate back to locale 0
        z = x + y;             // requires a get for z
                               // return to locale 1
    }                          // return to locale 0
Serial on-Clause Example (Data-Driven)

Optionally, the target locale can be named in a data-driven manner, using a variable's locale rather than an explicit locale value:

      z = x + y;               // requires a get for x and y
      on x do                  // migrate back to x's locale (locale 0)
        z = x + y;             // requires a get for z
                               // return to locale 1

Parallel on-Clause Examples

By naming locales explicitly:

    cobegin {
      on TaskALocs do computeTaskA(...);
      on TaskBLocs do computeTaskB(...);
      on Locales[0] do computeTaskC(...);
    }

(computeTaskA() runs on L0-L1, computeTaskB() on L2-L7, and computeTaskC() on L0.)

Or in a data-driven manner:

    computePivot(data, lo, hi);
    cobegin {
      on data[lo] do Quicksort(data, lo, pivot);
      on data[pivot] do Quicksort(data, pivot, hi);
    }
here

Built-in locale function:

    def here: locale;

Semantics:
- returns the locale on which the current task is executing

Example:

    writeln(here.id);
    on Locales(1) do
      writeln(here.id);

Output:

    0
    1

Remote Reads and Writes

Example:

    var i = 0;
    on Locales(1) {
      writeln((here.id, i.locale.id, i));
      i = 1;
      writeln((here.id, i.locale.id, i));
    }
    writeln((here.id, i.locale.id, i));

Output:

    (1, 0, 0)
    (1, 0, 1)
    (0, 0, 1)

(i remains allocated on locale 0; the task running on locale 1 reads and writes it remotely.)
Remote Classes

Example:

    class C { var x: int; }

    var c: C;
    on Locales(1) do
      c = new C();
    writeln((here.id, c.locale.id, c));

Output:

    (0, 1, {x = 0})

(The class instance is allocated on locale 1, while the reference c lives on locale 0.)

Local Blocks

Syntax:

    local-stmt:
      local stmt

Semantics:
- asserts that there are no remote references in stmt
- checked at runtime by default; the checks can be disabled for performance

Examples:

    c = Root.child(1);
    on c do local {
      traverseTree(c);
    }

    local {
      A[D] = B[D];
    }
Outline
- Basics of Multi-Locale Chapel
- Domain and Array Distributions
  - overview
  - a case study: Block1D
- Sample Uses of Distributed Domains/Arrays

Chapel Distributions

Distributions are recipes for parallel, distributed arrays. They help the compiler map from the computation's global view (e.g., A = B + α·C over whole arrays) down to the fragmented, per-processor implementation, where each locale computes on the slices of A, B, and C stored in its own memory.
Domain Distribution

Domains may be distributed across locales:

    var D: domain(2) distributed Block on CompGrid = ...;

[figure: a 2D domain D block-distributed across the locales of CompGrid (L0-L7)]

A distribution implies:
- ownership of the domain's indices (and its arrays' elements)
- the default work ownership for operations on the domains/arrays
  - e.g., forall loops or promoted operations over domains/arrays

Authoring Distributions (Advanced)

- Programmers can write distributions in Chapel
- Chapel will support a standard library of distributions
  - research goal: implemented using the same mechanisms that users would use
- Our compiler should have no knowledge of specific distributions, only their structural interface:
  - how to create domains and arrays using that distribution
  - how to map indices to locales
  - how to access array elements
  - how to iterate over indices/array elements
    - sequentially
    - in parallel
    - in parallel and zippered with other parallel iterable types
  - and so forth
- Distributions are built using the concepts we've already seen:
  - on clauses for expressing what data & tasks each locale owns
  - begins, cobegins, and coforalls to express inter- and intra-locale parallelism
Distributions

- All the domain types we've seen will support distributions
- Domain/array semantics are independent of distribution
  - performance and parallelism may vary greatly as distributions change

[figure: an array of names (George, John, Thomas, James, Andrew, Martin, William) shown under two different distributions]
A Simple Distribution: Block1D

Goal: block a 1D index space across a set of locales.

[figure: a 1D index space blocked across L0, L1, L2]

A bounding box is used to compute the blocking.

Distributions vs. Domains

Q1: Why distinguish between distributions and domains?
Q2: Why do distributions map an index space rather than a fixed index set?

A: To permit several domains to share a single distribution:
- amortizes the overhead of storing a distribution
- supports trivial domain/array alignment and compiler optimizations

    const D:      distributed B1 = [1..8],
          outerD: distributed B1 = [0..9],
          innerD: subdomain(D) = [2..7],
          slideD: subdomain(D) = [4..6];

Sharing a distribution supports trivial alignment of these domains.
Distributions vs. Domains (continued)

By contrast, when each domain is given its own distribution, the alignment between indices is less obvious:

    const D:      distributed B1 = [1..8],
          outerD: distributed B2 = [0..9],
          innerD: distributed B3 = [2..7],
          slideD: distributed B4 = [4..6];

Chapel's Distribution Architecture

Global descriptors (one global instance, or replicated per locale):
- distribution: responsible for the mapping of indices to locales
- domain: responsible for how to store and iterate over the domain's indices
- array: responsible for how to store, access, and iterate over the array's elements

Local descriptors (one instance per locale):
- domain: responsible for how to store and iterate over the local domain indices
- array: responsible for how to store, access, and iterate over the local array elements
Chapel's Distribution Architecture (detail)

Global descriptors:
- distribution descriptor: state is the target locale set and distribution parameters; its method maps an index to its locale
- domain descriptor: state is the global index info and index type; its methods provide index iteration and allocation
- array descriptor: state is the element type; its methods provide global value iteration and random access

Local descriptors:
- local domain-segment descriptor: stores the locale's indices; provides local index iteration and adding new indices
- local array-segment descriptor: stores the locale's values; provides local value iteration and local random access

Block1D Distribution Classes

    const B1 = new Block1D(bbox=[1..8]);   // target locales: LocaleSpace = [0..2]

    const D: domain(1) distributed B1 = [1..8];
    var A: [D] real;

Global descriptors: the distribution stores boundingBox = [1..8] and targetLocales; the domain stores whole = [1..8].
Local descriptors (one per locale): myChunk (the locale's slice of the index space, extending to min(idxType) and max(idxType) at the ends), myBlock (its indices of D), and myElems (its elements of A).
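The index-to-locale mapping these descriptors implement can be sketched in plain Python. The formula below is one common bounding-box blocking rule, assumed here for illustration rather than taken from Chapel's actual implementation:

```python
def block_owner(i, lo, hi, num_locales):
    """Map index i to a locale, blocking the [lo..hi] bounding box evenly
    (hypothetical helper; one common blocking formula)."""
    n = hi - lo + 1
    loc = (i - lo) * num_locales // n
    # indices outside the bounding box clamp to the first/last locale,
    # mirroring the min(idxType)/max(idxType) chunks at the ends
    return max(0, min(num_locales - 1, loc))

# Block [1..8] across 3 locales, as in the Block1D example:
owners = [block_owner(i, 1, 8, 3) for i in range(1, 9)]
print(owners)  # [0, 0, 0, 1, 1, 1, 2, 2]
```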
Block1D Distribution Classes (sliced domain)

    const B1 = new Block1D(bbox=[1..8]);   // target locales: LocaleSpace = [0..2]

    const sliceD: domain(1) distributed B1 = [4..6];
    var A2: [sliceD] real;

Here the global domain descriptor stores whole = [4..6]; each locale's myBlock and myElems cover only its intersection with [4..6], while the distribution's boundingBox remains [1..8].

Outline
- Basics of Multi-Locale Chapel
- Domain and Array Distributions
- Sample Uses of Distributed Domains/Arrays
  - HPCC STREAM Triad
  - HPCC Random Access (RA)
Introduction to STREAM Triad

Given: m-element vectors A, B, C
Compute: for all i in 1..m, A(i) = B(i) + α·C(i)

Pictorially: every element of A is computed as the corresponding element of B plus alpha times the corresponding element of C; in parallel, the elementwise multiplies and adds for different chunks of the vectors proceed concurrently.
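The triad itself is just an elementwise loop; a sequential plain-Python sketch, with arbitrary illustrative values for m and alpha:

```python
# Sequential sketch of the triad computation (values are illustrative only).
m = 8
alpha = 3.0
B = [float(i) for i in range(1, m + 1)]    # B = 1.0, 2.0, ..., 8.0
C = [1.0] * m
A = [b + alpha * c for b, c in zip(B, C)]  # A(i) = B(i) + alpha * C(i)
print(A)  # [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```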
STREAM Triad in Chapel

    const BlockDist = new Block1D(bbox=[1..m], tasksPerLocale=...);

    const ProblemSpace: domain(1, int(64)) distributed BlockDist = [1..m];

    var A, B, C: [ProblemSpace] real;

    forall (a, b, c) in (A, B, C) do
      a = b + alpha * c;

Introduction to Random Access

Given: m-element table T (where m = 2^n and initially T(i) = i)
Compute: N_U random updates to the table using bitwise-xor
Introduction to Random Access (continued)

For each pseudo-random value, e.g. 21, xor it into T(21 mod m); repeat N_U times.
Introduction to Random Access (continued)

Pictorially, in parallel: many updates are in flight at once, each targeting an unpredictable table entry.

Random numbers:
- not actually generated using lotto ping-pong balls!
- instead, implement a pseudo-random stream:
  - the kth random value can be generated at some cost
  - given the kth random value, the (k+1)-st can be generated much more cheaply
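The jump-ahead property just described can be illustrated with a toy multiplicative generator in Python. The modulus and multiplier are illustrative choices, and this is not the actual HPCC RA generator (which works over GF(2) polynomials):

```python
MOD = 2**61 - 1   # a Mersenne prime modulus (illustrative choice)
MULT = 48271      # a classic Lehmer multiplier (illustrative choice)

def kth_value(k, seed=1):
    # "generated at some cost": one modular exponentiation, O(log k) multiplies
    return (pow(MULT, k, MOD) * seed) % MOD

def next_value(r):
    # "much more cheaply": a single multiply
    return (MULT * r) % MOD

print(next_value(kth_value(5)) == kth_value(6))  # True: the streams agree
```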
Introduction to Random Access (continued)

Conflicts: when a conflict occurs, an update may be lost; a certain number of lost updates is permitted.

RA Declarations in Chapel

    const TableDist  = new Block1D(bbox=[0..m-1],   tasksPerLocale=...),
          UpdateDist = new Block1D(bbox=[0..N_U-1], tasksPerLocale=...);

    const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
          Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];

    var T: [TableSpace] uint(64);
RA Computation in Chapel

    const TableSpace: domain(1, uint(64)) distributed TableDist  = [0..m-1],
          Updates:    domain(1, uint(64)) distributed UpdateDist = [0..N_U-1];
    var T: [TableSpace] uint(64);

    forall (_, r) in (Updates, RAStream()) do
      T(r & indexMask) ^= r;

RAStream() yields the pseudo-random values r0, r1, r2, ... zippered with the Updates domain, so each iteration receives one update value.
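A sequential plain-Python sketch of the same update loop; m, index_mask, and the toy generator below stand in for the slide's m, indexMask, and RAStream() (the real HPCC stream is different):

```python
m = 16               # table size, a power of two (illustrative value)
index_mask = m - 1   # r & index_mask == r mod m when m is a power of two
T = list(range(m))   # initially T(i) = i

def ra_stream(n_u, seed=12345):
    # toy 32-bit LCG standing in for RAStream(); not the HPCC generator
    r = seed
    for _ in range(n_u):
        r = (r * 1103515245 + 12345) & 0xFFFFFFFF
        yield r

for r in ra_stream(100):
    T[r & index_mask] ^= r   # xor the update into the selected table slot
print(len(T))  # 16: updates change entries, not the table size
```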
RA Computation in Chapel: Tune for Affinity

An on clause migrates each update to the locale that owns its table entry:

    forall (_, r) in (Updates, RAStream()) do
      on T(r & indexMask) do
        T(r & indexMask) ^= r;

RA Computation in Chapel: Fire and Forget

A begin launches each update as an asynchronous task; the enclosing sync waits for all of them to complete:

    sync {
      forall (_, r) in (Updates, RAStream()) do
        on T(r & indexMask) do
          begin T(r & indexMask) ^= r;
    }
Locality and Affinity Status

Stable features:
- locale types, methods, and variables
- on clauses

Incomplete features:
- the local block has not been stress-tested
- we've only just started getting our first distributions working
  - this is the reason that foralls/promotions don't yet result in parallelism
  - only Block1D, and only for basic domain/array operations
  - see examples/hpcc/stream.chpl and ra.chpl for sample uses

Future directions:
- improved support for replicated, symmetric data
- distributions as a mechanism for software resiliency
- richer locale types: multiple flavors and hierarchical locales, to better represent machine structure and heterogeneity

Questions?
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: Static Data Structures Outline of Lecture 18 Recap:
More informationThe SPL Programming Language Reference Manual
The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming
More informationAffine Loop Optimization Based on Modulo Unrolling in Chapel
Affine Loop Optimization Based on Modulo Unrolling in Chapel ABSTRACT Aroon Sharma, Darren Smith, Joshua Koehler, Rajeev Barua Dept. of Electrical and Computer Engineering University of Maryland, College
More informationMorsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin
Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between
More informationChapter 3. Describing Syntax and Semantics
Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:
More informationRuntime. The optimized program is ready to run What sorts of facilities are available at runtime
Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic
More informationChap. 6 Part 3. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1
Chap. 6 Part 3 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 OpenMP popular for decade Compiler-based technique Start with plain old C, C++, or Fortran Insert #pragmas into source file You
More informationQualifying Exam in Programming Languages and Compilers
Qualifying Exam in Programming Languages and Compilers University of Wisconsin Fall 1991 Instructions This exam contains nine questions, divided into two parts. All students taking the exam should answer
More informationOpenMP 4.0/4.5. Mark Bull, EPCC
OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all
More informationC++11 and Compiler Update
C++11 and Compiler Update John JT Thomas Sr. Director Application Developer Products About this Session A Brief History Features of C++11 you should be using now Questions 2 Bjarne Stroustrup C with Objects
More informationWhy do we care about parallel?
Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationCSE544 Database Architecture
CSE544 Database Architecture Tuesday, February 1 st, 2011 Slides courtesy of Magda Balazinska 1 Where We Are What we have already seen Overview of the relational model Motivation and where model came from
More informationTransformer Looping Functions for Pivoting the data :
Transformer Looping Functions for Pivoting the data : Convert a single row into multiple rows using Transformer Looping Function? (Pivoting of data using parallel transformer in Datastage 8.5,8.7 and 9.1)
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationIntermediate Code Generation
Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target
More informationSteve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin
Steve Deitz Cray Inc. Download Chapel v0.9 h1p://sourceforge.net/projects/chapel/ Compa&ble with Linux/Unix, Mac OS X, Cygwin A new parallel language Under development at Cray Inc. Supported through the
More informationPerformance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016
Performance Optimizations Chapel Team, Cray Inc. Chapel version 1.14 October 6, 2016 Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
More informationGraduate Examination. Department of Computer Science The University of Arizona Spring March 5, Instructions
Graduate Examination Department of Computer Science The University of Arizona Spring 2004 March 5, 2004 Instructions This examination consists of ten problems. The questions are in three areas: 1. Theory:
More informationCOMP-520 GoLite Tutorial
COMP-520 GoLite Tutorial Alexander Krolik Sable Lab McGill University Winter 2019 Plan Target languages Language constructs, emphasis on special cases General execution semantics Declarations Types Statements
More informationScan Algorithm Effects on Parallelism and Memory Conflicts
Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memoryconsistency models. In this lecture, we will cover some of them in more detail. Why isn t sequential consistency
More informationABSTRACT AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL
ABSTRACT Title of thesis: AFFINE LOOP OPTIMIZATION BASED ON MODULO UNROLLING IN CHAPEL Aroon Sharma, Master of Science, 2014 Thesis directed by: Professor Rajeev Barua Department of Electrical and Computer
More informationChapter 12. File Management
Operating System Chapter 12. File Management Lynn Choi School of Electrical Engineering Files In most applications, files are key elements For most systems except some real-time systems, files are used
More informationAbstraction Elimination for Kappa Calculus
Abstraction Elimination for Kappa Calculus Adam Megacz megacz@cs.berkeley.edu 9.Nov.2011 1 Outline Overview: the problem and solution in broad strokes Detail: kappa calculus and abstraction elimination
More informationDistributed Systems. Distributed Shared Memory. Paul Krzyzanowski
Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
More informationCS558 Programming Languages
CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from
More informationNew Programming Paradigms: Partitioned Global Address Space Languages
Raul E. Silvera -- IBM Canada Lab rauls@ca.ibm.com ECMWF Briefing - April 2010 New Programming Paradigms: Partitioned Global Address Space Languages 2009 IBM Corporation Outline Overview of the PGAS programming
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationThe APGAS Programming Model for Heterogeneous Architectures. David E. Hudak, Ph.D. Program Director for HPC Engineering
The APGAS Programming Model for Heterogeneous Architectures David E. Hudak, Ph.D. Program Director for HPC Engineering dhudak@osc.edu Overview Heterogeneous architectures and their software challenges
More informationIntel Thread Building Blocks, Part II
Intel Thread Building Blocks, Part II SPD course 2013-14 Massimo Coppola 25/03, 16/05/2014 1 TBB Recap Portable environment Based on C++11 standard compilers Extensive use of templates No vectorization
More informationSystem Call. Preview. System Call. System Call. System Call 9/7/2018
Preview Operating System Structure Monolithic Layered System Microkernel Virtual Machine Process Management Process Models Process Creation Process Termination Process State Process Implementation Operating
More information