Heuristic-based techniques for mapping irregular communication graphs to mesh topologies

Size: px

Start display at page:

Download "Heuristic-based techniques for mapping irregular communication graphs to mesh topologies"

Henry Lynch
6 years ago
Views:

1 Heuristic-based techniques for mapping irregular communication graphs to mesh topologies Abhinav Bhatele and Laxmikant V. Kale University of Illinois at Urbana-Champaign LLNL-PRES , P. O. Box 808, Livermore, CA 94551

2 Introduction Various kinds of interconnect networks deployed today for supercomputers mesh, fat-tree, Kautz graph, dragonfly Link sharing leads to contention and performance slowdowns Communication optimization becoming increasingly important 2

3 Topology aware mapping Mapping the communication graph of an application to the physical topology can optimize communication Diverse set of parameters from the application graph and the processor topology one solution may not do well in all cases Specific heuristics for mesh topologies and regular/ irregular communication graphs 3

Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate

4 Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate information MXOVLP, MXOV_AL, EXC, COCE, AFFN EXC, COCE, AFFN BFT, MHT, Infer structure AFFN, COCE, COCE+MHT Choose best heuristic depending on hop-bytes Processor topology information Output: Mapping file used for the next run 4

5 Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate information MXOVLP, MXOV_AL, EXC, COCE, AFFN EXC, COCE, AFFN BFT, MHT, Infer structure AFFN, COCE, COCE+MHT Choose best heuristic depending on hop-bytes Processor topology information Output: Mapping file used for the next run 4

6 Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate information MXOVLP, MXOV_AL, EXC, COCE, AFFN EXC, COCE, AFFN BFT, MHT, Infer structure AFFN, COCE, COCE+MHT Choose best heuristic depending on hop-bytes Processor topology information Output: Mapping file used for the next run 4

7 Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate information MXOVLP, MXOV_AL, EXC, COCE, AFFN EXC, COCE, AFFN BFT, MHT, Infer structure AFFN, COCE, COCE+MHT Relieve the application developer of the mapping burden Choose best heuristic depending on hop-bytes Output: Mapping file used for the next run Processor topology information 4

8 Automatic Mapping Framework Application communication graph Pattern Matching Framework Regular Graphs Irregular Graphs 2D Object Graph 3D Object Graph W/o coordinate information W/ coordinate information MXOVLP, MXOV_AL, EXC, COCE, AFFN EXC, COCE, AFFN BFT, MHT, Infer structure AFFN, COCE, COCE+MHT Relieve the application developer of the mapping burden Choose best heuristic depending on hop-bytes Output: Mapping file used for the next run Processor topology information No change to the application code 4

9 Two different scenarios There is no spatial information associated with the node Option 1: Work without it Option 2: If we know that the simulation has a geometric configuration, try to infer the structure of the graph We have geometric coordinate information for each node Use coordinate information to avoid crossing of edges and for other optimizations 5

10 Finding the nearest available Algorithms in this paper: processor pick a vertex in the graph to map find a desirable processor to place it on Spiraling Quadtree 6

11 Finding the nearest available Algorithms in this paper: processor pick a vertex in the graph to map find a desirable processor to place it on Spiraling Quadtree 6

12 Finding the nearest available Algorithms in this paper: processor pick a vertex in the graph to map find a desirable processor to place it on Spiraling Quadtree 6

13 Finding the nearest available Algorithms in this paper: processor pick a vertex in the graph to map find a desirable processor to place it on Spiraling Quadtree 6

14 Finding the nearest available processor Algorithms in this paper: pick a vertex in the graph to map find a desirable processor to place it on Spiraling Quadtree 6

15 Finding the nearest available processor 7

16 Finding the nearest available processor consecutive calls to the spiraling and quadtree algorithms for findnearest from the AFFN mapping regular communi- (2D) mesh. We then discuss mapping heuristics for different 7

17 Spiraling vs Quadtree Execution Time (ms) Spiral (corner) Quadtree (corner) Spiral (center) Quadtree (center) Number of nodes in the communication graph 8

18 Spiraling vs Quadtree Performance when called from AFFN: a mapping algorithm Execution Time (ms) Spiraling Quadtree Number of nodes in the communication graph 9

19 Mapping Irregular Graphs Object graph: 90 nodes Processor Mesh: 10 x 9 10

20 No coordinate information 11

21 No coordinate information Breadth first traversal (BFT) Start with a random node and one end of the processor mesh Map nodes as you encounter them close to their parent 11

22 No coordinate information Breadth first traversal (BFT) Start with a random node and one end of the processor mesh Map nodes as you encounter them close to their parent Max heap traversal (MHT) Start with a random node and one end/center of the mesh Put neighbors of a mapped node into the heap (node at the top is the one with maximum number of mapped neighbors) Map elements in the heap one by one around the centroid of their mapped neighbors 11

23 Mapping visualization BFT: 2.89 MHT:

24 Inferring the spatial placement 13

25 Inferring the spatial placement Graph layout algorithms Force-based layout to reduce the total energy in the system Use the graphviz library to obtain coordinates of the nodes 13

26 Inferring the spatial placement Graph layout algorithms Force-based layout to reduce the total energy in the system Use the graphviz library to obtain coordinates of the nodes 13

27 With coordinate information Affine Mapping (AFFN) Stretch/shrink the object graph (based on coordinates of nodes) to map it on to the processor grid In case of conflicts for the same processor, find the nearest available processor Corners to Center (COCE) Use four corners of the object graph based on coordinates Start mapping simultaneously from all sides Place nodes encountered during a BFT close to their parents 14

28 Mapping visualization AFFN: 3.17 COCE:

29 COCE+MHT Hybrid: We fix four nodes at geometric corners of the mesh to four processors in 2D Put neighbors of these nodes into a max heap Map from all sides inwards Starting from centroid of mapped neighbors COCE:

30 Time Complexity 17

31 Time Complexity All algorithms discussed above choose a desired processor and spiral around it to find the nearest available processor Heuristics generally applicable to any topology 17

32 Time Complexity All algorithms discussed above choose a desired processor and spiral around it to find the nearest available processor Heuristics generally applicable to any topology Depending on the running time of findnearest: BFT COCE AFFN MHT COCE+MHT O(n) O(n) O(n) O(n logn) O(n logn) O(n (logn) 2 ) O(n (logn) 2 ) O(n (logn) 2 ) O(n (logn) 2 ) O(n (logn) 2 ) 17

33 Running Time Time (ms) COCE+MHT MHT AFFN COCE BFT Number of nodes 18

34 Evaluation Metric for comparison: hop-bytes ere Indicates amount of traffic and hence contention on the network average hops per byte =! nx d i b i Previously used metric: maximum dilation i=1! nx b i i=1 is the number of hops traversed by a message of 19

35 Mapping Simple2D to a 2D mesh Average hops per byte Default (METIS) BFT MHT COCE COCE+MHT AFFN PAIRS Number of nodes in the communication graph 20

36 Mapping Simple2D to a 2D mesh Average hops per byte Default (METIS) BFT MHT COCE COCE+MHT AFFN 512 x X X 8 64 X X 32 Different aspect ratios of a 2D mesh 21

37 Mapping Simple2D to a 3D mesh Average hops per byte Random Default (METIS) BFT MHT COCE+MHT Number of nodes in the communication graph 22

38 Summary Heuristics for mapping irregular graphs to mesh topologies best heuristic chosen at runtime (based on hop-bytes) Extensible to other topologies also Mapping library to help the application developer 23

39 Questions? Abhinav Bhatele, Automating Topology Aware Mapping for Supercomputers, PhD Thesis, Department of Computer Science, University of Illinois. LLNL-PRES , P. O. Box 808, Livermore, CA 94551

Automated Mapping of Regular Communication Graphs on Mesh Interconnects

Automated Mapping of Regular Communication Graphs on Mesh Interconnects Abhinav Bhatele, Gagan Gupta, Laxmikant V. Kale and I-Hsin Chung Motivation Running a parallel application on a linear array of processors: