A Classiﬁcaon of Scienﬁc Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard

Size: px

Start display at page:

Download "A Classiﬁca*on of Scien*ﬁc Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard"

Marybeth Booker
5 years ago
Views:

Moreland Berk Geveci Kwan- Liu Ma Robert

managed and operated by Sandia Corporation,

1 A Classiﬁca*on of Scien*ﬁc Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard Sandia Na*onal Laboratories Kitware, Inc. University of California at Davis Kitware, Inc. November 17, 2013 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy s National Nuclear Security Administration under contract DE-AC04-94AL SAND NO P

Cielo AMD x86 Full x86 Core + Associated

2 Cielo AMD x86 Full x86 Core + Associated Cache 8 cores per die MPI- Only feasible 1mm 1 x86 core 1 Kepler core Trinity (op*on 1) NVIDIA GPU 2,880 cores collected in 15 SMX Shared PC, Cache, Mem Fetches Reduced control logic MPI- Only not feasible

3 Extreme Scale is Threads, Threads, Threads! Jaguar XT5 Titan XK7 Exascale* Cores 224, ,008 and 18,688 gpu To succeed at extreme scale, you need to consider the finest possible level of concurrency Expect each thread to process exactly one element Disallow communica*on among threads 1 billion Concurrency 224,256 way million way billion way Memory 300 Terabytes 700 Terabytes 128 Petabytes *Source: Scien*fic Discovery at the Exascale, Ahern, Shoshani, Ma, et al.

4 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

5 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

6 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

7 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

8 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

9 Key Principles [Law et al., 1999] Data Separability Mappable Input Filter Result Invariant Filter Filter Filter

10 Mappable Input Can Break Down Coarse Parallelism Result reasonably divided among par**ons. Massive Parallelism Most par**ons are empty.

11 Result Invariant Can Break Down

12 Result Invariant Can Break Down Repeated Ver*ces

13 Result Invariant Can Break Down

14 New Key Principles Coarse Parallel Massive Parallel Data Separability Data Separability Mappable Input Discoverable Input Mapping Result Invariant Collec*ve Work

15 Characteriza*on and Classifica*on Key Principles Parameters Data Separability Separable Element Discoverable Input Mapping Point Mapping Cell Mapping Field Mapping Collec*ve Work Collec*ve Work

16 Basic Mapping Input Output

17 Basic Mapping f( ) x f(x) Input Output

18 Basic Mapping Separable Element Any Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping 1 to 1 Collec@ve Work None

19 Basic Mapping functor()

20 Map by Cell Input Output

21 Map by Cell f(,,, ) x 4 x 3 f(x 1, x 2, x 3, x 4 ) x 1 x 2 Input Output

22 Map by Cell Separable Element Cell Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping Points on cell to cell Work None

23 Map by Cell functor()

24 Reconnect Cell Input Output

25 Reconnect Cell Input Output

26 Reconnect Cell Input Output

27 Reconnect Cell Separable Element Cell Point Mapping 1 to 0 or 1 Cell Mapping 1 to 0 or more Field Mapping Iden*ty Collec@ve Work None (or find unused)

28 Reconnect Cell Keep these Remove these Output

29 Build Independent Topology Input Output

30 Build Independent Topology Input Output

31 Build Independent Topology Separable Element Any Point Mapping 1 element to many points Cell Mapping 1 element to many cells (constant number) Field Mapping Iden*ty Collec@ve Work None

32 Build Connected Topology Input Output

33 Build Connected Topology Input Output

34 Build Connected Topology Input Output

35 Build ConnectedTopology Separable Element Cell Point Mapping 1 cell to 0 or more points Cell Mapping 1 to 0 or more Field Mapping Interpolated points Collec@ve Work Resolve duplicate points

36 Build Connected Topology Merge points Output

37 Capture Cell Adjacencies Input Output

38 Capture Cell Adjacencies f(,,, ) x 4 x 3 f(x 1, x 2, x 3, x 4 ) x 1 x 2 Input Output

39 Capture Cell Adjacencies Separable Element Point, edge, or face Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping Interpolated incident fields Work Find incidence rela*onships

40 Capture Cell Adjacencies x 4 x 3 x 1 x 2 Incidence rela*onships may not be explicitly captured by topology structure. Input

41 Globally Reduce f/(x 1,x 2,x 3, ) Input

42 Globally Reduce f/(x 1,x 2,x 3, ) Input

43 Globally Reduce f/(x 1,x 2,x 3, ) Input

44 Globally Reduce Separable Element Any Point Mapping Cell Mapping Field Mapping All to 1 Collec@ve Work Algorithms None in output None in output Global Reduc*on Histogram Integrate Outline (find bounds) Sta*s*cs

45 Query Data x f(x) Search Structure Output

46 Query Data Separable Element Point, key, or query Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping 1 query to 1 output Collec@ve Work Building query structure

47 Summary

48 Conclusion Accelerators and other emerging architectures use massive threading that takes parallelism to a whole different scale Visualiza*on algorithms can be characterized by: How data is separated How input elements are mapped to output elements What collec*ve work is performed Algorithms that share all three characteriza*ons can be implemented with similar mechanisms We can exploit this observa*on to implement new algorithms

Dax: A Massively Threaded Visualiza5on and Analysis Toolkit for Extreme Scale

Dax: A Massively Threaded Visualiza5on and Analysis Toolkit for Extreme Scale GPU Technology Conference March 26, 2014 Kenneth Moreland Sandia Na5onal Laboratories Robert Maynard Kitware, Inc. Sandia National