A Classifica*on of Scien*fic Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard

Size: px
Start display at page:

Download "A Classifica*on of Scien*fic Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard"

Transcription

1 A Classifica*on of Scien*fic Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard Sandia Na*onal Laboratories Kitware, Inc. University of California at Davis Kitware, Inc. November 17, 2013 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy s National Nuclear Security Administration under contract DE-AC04-94AL SAND NO P

2 Cielo AMD x86 Full x86 Core + Associated Cache 8 cores per die MPI- Only feasible 1mm 1 x86 core 1 Kepler core Trinity (op*on 1) NVIDIA GPU 2,880 cores collected in 15 SMX Shared PC, Cache, Mem Fetches Reduced control logic MPI- Only not feasible

3 Extreme Scale is Threads, Threads, Threads! Jaguar XT5 Titan XK7 Exascale* Cores 224, ,008 and 18,688 gpu To succeed at extreme scale, you need to consider the finest possible level of concurrency Expect each thread to process exactly one element Disallow communica*on among threads 1 billion Concurrency 224,256 way million way billion way Memory 300 Terabytes 700 Terabytes 128 Petabytes *Source: Scien*fic Discovery at the Exascale, Ahern, Shoshani, Ma, et al.

4 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

5 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

6 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

7 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

8 Data Parallel Visualiza*on Duplicate processes run independently on different par**ons of data.

9 Key Principles [Law et al., 1999] Data Separability Mappable Input Filter Result Invariant Filter Filter Filter

10 Mappable Input Can Break Down Coarse Parallelism Result reasonably divided among par**ons. Massive Parallelism Most par**ons are empty.

11 Result Invariant Can Break Down

12 Result Invariant Can Break Down Repeated Ver*ces

13 Result Invariant Can Break Down

14 New Key Principles Coarse Parallel Massive Parallel Data Separability Data Separability Mappable Input Discoverable Input Mapping Result Invariant Collec*ve Work

15 Characteriza*on and Classifica*on Key Principles Parameters Data Separability Separable Element Discoverable Input Mapping Point Mapping Cell Mapping Field Mapping Collec*ve Work Collec*ve Work

16 Basic Mapping Input Output

17 Basic Mapping f( ) x f(x) Input Output

18 Basic Mapping Separable Element Any Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping 1 to 1 Collec@ve Work None

19 Basic Mapping functor()

20 Map by Cell Input Output

21 Map by Cell f(,,, ) x 4 x 3 f(x 1, x 2, x 3, x 4 ) x 1 x 2 Input Output

22 Map by Cell Separable Element Cell Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping Points on cell to cell Work None

23 Map by Cell functor()

24 Reconnect Cell Input Output

25 Reconnect Cell Input Output

26 Reconnect Cell Input Output

27 Reconnect Cell Separable Element Cell Point Mapping 1 to 0 or 1 Cell Mapping 1 to 0 or more Field Mapping Iden*ty Collec@ve Work None (or find unused)

28 Reconnect Cell Keep these Remove these Output

29 Build Independent Topology Input Output

30 Build Independent Topology Input Output

31 Build Independent Topology Separable Element Any Point Mapping 1 element to many points Cell Mapping 1 element to many cells (constant number) Field Mapping Iden*ty Collec@ve Work None

32 Build Connected Topology Input Output

33 Build Connected Topology Input Output

34 Build Connected Topology Input Output

35 Build ConnectedTopology Separable Element Cell Point Mapping 1 cell to 0 or more points Cell Mapping 1 to 0 or more Field Mapping Interpolated points Collec@ve Work Resolve duplicate points

36 Build Connected Topology Merge points Output

37 Capture Cell Adjacencies Input Output

38 Capture Cell Adjacencies f(,,, ) x 4 x 3 f(x 1, x 2, x 3, x 4 ) x 1 x 2 Input Output

39 Capture Cell Adjacencies Separable Element Point, edge, or face Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping Interpolated incident fields Work Find incidence rela*onships

40 Capture Cell Adjacencies x 4 x 3 x 1 x 2 Incidence rela*onships may not be explicitly captured by topology structure. Input

41 Globally Reduce f/(x 1,x 2,x 3, ) Input

42 Globally Reduce f/(x 1,x 2,x 3, ) Input

43 Globally Reduce f/(x 1,x 2,x 3, ) Input

44 Globally Reduce Separable Element Any Point Mapping Cell Mapping Field Mapping All to 1 Collec@ve Work Algorithms None in output None in output Global Reduc*on Histogram Integrate Outline (find bounds) Sta*s*cs

45 Query Data x f(x) Search Structure Output

46 Query Data Separable Element Point, key, or query Point Mapping Iden*ty Cell Mapping Iden*ty Field Mapping 1 query to 1 output Collec@ve Work Building query structure

47 Summary

48 Conclusion Accelerators and other emerging architectures use massive threading that takes parallelism to a whole different scale Visualiza*on algorithms can be characterized by: How data is separated How input elements are mapped to output elements What collec*ve work is performed Algorithms that share all three characteriza*ons can be implemented with similar mechanisms We can exploit this observa*on to implement new algorithms

Dax: A Massively Threaded Visualiza5on and Analysis Toolkit for Extreme Scale

Dax: A Massively Threaded Visualiza5on and Analysis Toolkit for Extreme Scale Dax: A Massively Threaded Visualiza5on and Analysis Toolkit for Extreme Scale GPU Technology Conference March 26, 2014 Kenneth Moreland Sandia Na5onal Laboratories Robert Maynard Kitware, Inc. Sandia National

More information

Oh, Exascale! The effect of emerging architectures on scien1fic discovery. Kenneth Moreland, Sandia Na1onal Laboratories

Oh, Exascale! The effect of emerging architectures on scien1fic discovery. Kenneth Moreland, Sandia Na1onal Laboratories Photos placed in horizontal posi1on with even amount of white space between photos and header Oh, $#*@! Exascale! The effect of emerging architectures on scien1fic discovery Ultrascale Visualiza1on Workshop,

More information

Visual Analysis of Lagrangian Particle Data from Combustion Simulations

Visual Analysis of Lagrangian Particle Data from Combustion Simulations Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang

More information

VTK-m: Uniting GPU Acceleration Successes. Robert Maynard Kitware Inc.

VTK-m: Uniting GPU Acceleration Successes. Robert Maynard Kitware Inc. VTK-m: Uniting GPU Acceleration Successes Robert Maynard Kitware Inc. VTK-m Project Supercomputer Hardware Advances Everyday More and more parallelism High-Level Parallelism The Free Lunch Is Over (Herb

More information

Responsive Large Data Analysis and Visualization with the ParaView Ecosystem. Patrick O Leary, Kitware Inc

Responsive Large Data Analysis and Visualization with the ParaView Ecosystem. Patrick O Leary, Kitware Inc Responsive Large Data Analysis and Visualization with the ParaView Ecosystem Patrick O Leary, Kitware Inc Hybrid Computing Attribute Titan Summit - 2018 Compute Nodes 18,688 ~3,400 Processor (1) 16-core

More information

Integrating Analysis and Computation with Trios Services

Integrating Analysis and Computation with Trios Services October 31, 2012 Integrating Analysis and Computation with Trios Services Approved for Public Release: SAND2012-9323P Ron A. Oldfield Scalable System Software Sandia National Laboratories Albuquerque,

More information

Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk SAND: P

Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk SAND: P GO 08012011 Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary

More information

Large Scale Visualization on the Cray XT3 Using ParaView

Large Scale Visualization on the Cray XT3 Using ParaView Large Scale Visualization on the Cray XT3 Using ParaView Cray User s Group 2008 May 8, 2008 Kenneth Moreland David Rogers John Greenfield Sandia National Laboratories Alexander Neundorf Technical University

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all

More information

Robert Maynard Kitware, Inc. Kenneth Moreland Sandia National Laboratories

Robert Maynard Kitware, Inc. Kenneth Moreland Sandia National Laboratories Robert Maynard Kitware, Inc. Kenneth Moreland Sandia National Laboratories Data Analysis at Extreme Data Analysis and Visualization library Multi- and Many-core processor support Allows users to create

More information

Perspectives form US Department of Energy work on parallel programming models for performance portability

Perspectives form US Department of Energy work on parallel programming models for performance portability Perspectives form US Department of Energy work on parallel programming models for performance portability Jeremiah J. Wilke Sandia National Labs Livermore, CA IMPACT workshop at HiPEAC Sandia National

More information

Outline. In Situ Data Triage and Visualiza8on

Outline. In Situ Data Triage and Visualiza8on In Situ Data Triage and Visualiza8on Kwan- Liu Ma University of California at Davis Outline In situ data triage and visualiza8on: Issues and strategies Case study: An earthquake simula8on Case study: A

More information

Implementing Many-Body Potentials for Molecular Dynamics Simulations

Implementing Many-Body Potentials for Molecular Dynamics Simulations Official Use Only Implementing Many-Body Potentials for Molecular Dynamics Simulations Using large scale clusters for higher accuracy simulations. Christian Trott, Aidan Thompson Unclassified, Unlimited

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Developing Integrated Data Services for Cray Systems with a Gemini Interconnect

Developing Integrated Data Services for Cray Systems with a Gemini Interconnect Developing Integrated Data Services for Cray Systems with a Gemini Interconnect Ron A. Oldfield Scalable System So4ware Sandia Na9onal Laboratories Albuquerque, NM, USA raoldfi@sandia.gov Cray User Group

More information

CUDA. Matthew Joyner, Jeremy Williams

CUDA. Matthew Joyner, Jeremy Williams CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid

More information

PULP: Fast and Simple Complex Network Partitioning

PULP: Fast and Simple Complex Network Partitioning PULP: Fast and Simple Complex Network Partitioning George Slota #,* Kamesh Madduri # Siva Rajamanickam * # The Pennsylvania State University *Sandia National Laboratories Dagstuhl Seminar 14461 November

More information

Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales

Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales Margaret Lawson, Jay Lofstead Sandia National Laboratories is a multimission laboratory managed and operated

More information

What is DARMA? DARMA is a C++ abstraction layer for asynchronous many-task (AMT) runtimes.

What is DARMA? DARMA is a C++ abstraction layer for asynchronous many-task (AMT) runtimes. DARMA Janine C. Bennett, Jonathan Lifflander, David S. Hollman, Jeremiah Wilke, Hemanth Kolla, Aram Markosyan, Nicole Slattengren, Robert L. Clay (PM) PSAAP-WEST February 22, 2017 Sandia National Laboratories

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN

Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN Venkatram Vishwanath, Mark Hereld and Michael E. Papka Argonne Na

More information

Large Scale Visualization on the Cray XT3 Using ParaView

Large Scale Visualization on the Cray XT3 Using ParaView Large Scale Visualization on the Cray XT3 Using ParaView Kenneth Moreland, Sandia National Laboratories David Rogers, Sandia National Laboratories John Greenfield, Sandia National Laboratories Berk Geveci,

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

Simple Parallel Biconnectivity Algorithms for Multicore Platforms

Simple Parallel Biconnectivity Algorithms for Multicore Platforms Simple Parallel Biconnectivity Algorithms for Multicore Platforms George M. Slota Kamesh Madduri The Pennsylvania State University HiPC 2014 December 17-20, 2014 Code, presentation available at graphanalysis.info

More information

Harnessing GPU speed to accelerate LAMMPS particle simulations

Harnessing GPU speed to accelerate LAMMPS particle simulations Harnessing GPU speed to accelerate LAMMPS particle simulations Paul S. Crozier, W. Michael Brown, Peng Wang pscrozi@sandia.gov, wmbrown@sandia.gov, penwang@nvidia.com SC09, Portland, Oregon November 18,

More information

Improving Uintah s Scalability Through the Use of Portable

Improving Uintah s Scalability Through the Use of Portable Improving Uintah s Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks John Holmen1, Alan Humphrey1, Daniel Sunderland2, Martin Berzins1 University of Utah1 Sandia National Laboratories2

More information

In situ visualization is the coupling of visualization

In situ visualization is the coupling of visualization Visualization Viewpoints Editor: Theresa-Marie Rhyne The Tensions of In Situ Visualization Kenneth Moreland Sandia National Laboratories In situ is the coupling of software with a simulation or other data

More information

Security Metrics. Mark Torgerson Sandia National Laboratories 6/20/2007. Entry I-108

Security Metrics. Mark Torgerson Sandia National Laboratories 6/20/2007. Entry I-108 Security Metrics Mark Torgerson Sandia National Laboratories 6/20/2007 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

More information

Hobbes: Composi,on and Virtualiza,on as the Founda,ons of an Extreme- Scale OS/R

Hobbes: Composi,on and Virtualiza,on as the Founda,ons of an Extreme- Scale OS/R Hobbes: Composi,on and Virtualiza,on as the Founda,ons of an Extreme- Scale OS/R Ron Brightwell, Ron Oldfield Sandia Na,onal Laboratories Arthur B. Maccabe, David E. Bernholdt Oak Ridge Na,onal Laboratory

More information

ECP Alpine: Algorithms and Infrastructure for In Situ Visualization and Analysis

ECP Alpine: Algorithms and Infrastructure for In Situ Visualization and Analysis ECP Alpine: Algorithms and Infrastructure for In Situ Visualization and Analysis Presented By: Matt Larsen LLNL-PRES-731545 This work was performed under the auspices of the U.S. Department of Energy by

More information

MRPB: Memory Request Priori1za1on for Massively Parallel Processors

MRPB: Memory Request Priori1za1on for Massively Parallel Processors MRPB: Memory Request Priori1za1on for Massively Parallel Processors Wenhao Jia, Princeton University Kelly A. Shaw, University of Richmond Margaret Martonosi, Princeton University Benefits of GPU Caches

More information

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program C.T. Vaughan, D.C. Dinge, P.T. Lin, S.D. Hammond, J. Cook, C. R. Trott, A.M. Agelastos, D.M. Pase, R.E. Benner,

More information

War Stories : Graph Algorithms in GPUs

War Stories : Graph Algorithms in GPUs SAND2014-18323PE War Stories : Graph Algorithms in GPUs Siva Rajamanickam(SNL) George Slota, Kamesh Madduri (PSU) FASTMath Meeting Exceptional service in the national interest is a multi-program laboratory

More information

Qualita've DNS Measurement Perspec'ves

Qualita've DNS Measurement Perspec'ves Qualita've DNS Measurement Perspec'ves Casey Deccio Sandia Na/onal Laboratories ISC/CAIDA Data Collabora/on Workshop Oct 22, 2012 Sandia National Laboratories is a multi-program laboratory managed and

More information

ECP VTK-m: Updating HPC Visualization Software for Exascale-Era Processors

ECP VTK-m: Updating HPC Visualization Software for Exascale-Era Processors ECP VTK-m: Updating HPC Visualization Software for Exascale-Era Processors Kenneth Moreland Sandia National Laboratories kmorel@sandia.gov David Pugmire Oak Ridge National Laboratory pugmire@ornl.gov Christopher

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Using the Cray Gemini Performance Counters

Using the Cray Gemini Performance Counters Photos placed in horizontal position with even amount of white space between photos and header Using the Cray Gemini Performance Counters 0 1 2 3 4 5 6 7 Backplane Backplane 8 9 10 11 12 13 14 15 Backplane

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

Shallow Water Simulations on Graphics Hardware

Shallow Water Simulations on Graphics Hardware Shallow Water Simulations on Graphics Hardware Ph.D. Thesis Presentation 2014-06-27 Martin Lilleeng Sætra Outline Introduction Parallel Computing and the GPU Simulating Shallow Water Flow Topics of Thesis

More information

Massively Parallel Graph Analytics

Massively Parallel Graph Analytics Massively Parallel Graph Analytics Manycore graph processing, distributed graph layout, and supercomputing for graph analytics George M. Slota 1,2,3 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia

More information

GPU Debugging Made Easy. David Lecomber CTO, Allinea Software

GPU Debugging Made Easy. David Lecomber CTO, Allinea Software GPU Debugging Made Easy David Lecomber CTO, Allinea Software david@allinea.com Allinea Software HPC development tools company Leading in HPC software tools market Wide customer base Blue-chip engineering,

More information

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM 25th March, GTC 2014, San Jose CA AnE- Pekka Hynninen ane.pekka.hynninen@nrel.gov NREL is a na*onal laboratory of the U.S. Department of Energy,

More information

Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs

Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs The 34 th IEEE International Conference on Computer Design Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs Shin-Ying Lee and Carole-Jean Wu Arizona State University October

More information

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Photos placed in horizontal position with even amount of white space between photos and header Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Christopher Forster,

More information

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department

More information

Challenges in HPC I/O

Challenges in HPC I/O Challenges in HPC I/O Universität Basel Julian M. Kunkel German Climate Computing Center / Universität Hamburg 10. October 2014 Outline 1 High-Performance Computing 2 Parallel File Systems and Challenges

More information

Hypergraph Exploitation for Data Sciences

Hypergraph Exploitation for Data Sciences Photos placed in horizontal position with even amount of white space between photos and header Hypergraph Exploitation for Data Sciences Photos placed in horizontal position with even amount of white space

More information

Aras Innovator Active Directory Sync

Aras Innovator Active Directory Sync Photos placed in horizontal position with even amount of white space between photos and header Aras Innovator Active Directory Sync Sandia National Laboratories is a multi-program laboratory managed and

More information

Introduction II. Overview

Introduction II. Overview Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and

More information

Parallel Programming Futures: What We Have and Will Have Will Not Be Enough

Parallel Programming Futures: What We Have and Will Have Will Not Be Enough Parallel Programming Futures: What We Have and Will Have Will Not Be Enough Michael Heroux SOS 22 March 27, 2018 Sandia National Laboratories is a multimission laboratory managed and operated by National

More information

EMPRESS Extensible Metadata PRovider for Extreme-scale Scientific Simulations

EMPRESS Extensible Metadata PRovider for Extreme-scale Scientific Simulations EMPRESS Extensible Metadata PRovider for Extreme-scale Scientific Simulations Photos placed in horizontal position with even amount of white space between photos and header Margaret Lawson, Jay Lofstead,

More information

Asynchronous Termination Detection Module User s Guide

Asynchronous Termination Detection Module User s Guide Asynchronous Termination Detection Module User s Guide William McLon III Sandia National Laboratories wcmclen@sandia.gov 1 Introduction Interprocessor communications are performed through point-to-point

More information

Oak Ridge National Laboratory Computing and Computational Sciences

Oak Ridge National Laboratory Computing and Computational Sciences Oak Ridge National Laboratory Computing and Computational Sciences OFA Update by ORNL Presented by: Pavel Shamis (Pasha) OFA Workshop Mar 17, 2015 Acknowledgments Bernholdt David E. Hill Jason J. Leverman

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language Martin C. Rinard (martin@cs.ucsb.edu) Department of Computer Science University

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt

More information

PuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks

PuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks PuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania

More information

Microgrid Design Toolkit (MDT)

Microgrid Design Toolkit (MDT) SAND2016-8151 C Microgrid Design Toolkit (MDT) 24 October 2016 John Eddy, Ph.D. Microgrid Design Toolkit (MDT) Principal Investigator System Sustainment & Readiness Technologies Department Sandia National

More information

HPCGraph: Benchmarking Massive Graph Analytics on Supercomputers

HPCGraph: Benchmarking Massive Graph Analytics on Supercomputers HPCGraph: Benchmarking Massive Graph Analytics on Supercomputers George M. Slota 1, Siva Rajamanickam 2, Kamesh Madduri 3 1 Rensselaer Polytechnic Institute 2 Sandia National Laboratories a 3 The Pennsylvania

More information

Parallel Graph Coloring For Many- core Architectures

Parallel Graph Coloring For Many- core Architectures Parallel Graph Coloring For Many- core Architectures Mehmet Deveci, Erik Boman, Siva Rajamanickam Sandia Na;onal Laboratories Sandia National Laboratories is a multi-program laboratory managed and operated

More information

Portable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc.

Portable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. Portable and Productive Performance with OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 Cray: Leadership in Computational Research Earth Sciences

More information

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Irregular Graph Algorithms on Parallel Processing Systems

Irregular Graph Algorithms on Parallel Processing Systems Irregular Graph Algorithms on Parallel Processing Systems George M. Slota 1,2 Kamesh Madduri 1 (advisor) Sivasankaran Rajamanickam 2 (Sandia mentor) 1 Penn State University, 2 Sandia National Laboratories

More information

Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James Ahrens

Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James Ahrens LA-UR- 14-25437 Approved for public release; distribution is unlimited. Title: Portable Parallel Halo and Center Finders for HACC Author(s): Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James

More information

SST + MacSim. Case Studies Using SST MacSim. Genie Hsieh Sandia National Labs

SST + MacSim. Case Studies Using SST MacSim. Genie Hsieh Sandia National Labs Photos placed in horizontal position with even amount of white space between photos and header SST + MacSim Case Studies Using SST MacSim Genie Hsieh Sandia National Labs Sandia National Laboratories is

More information

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Hashing Strategies for the Cray XMT MTAAP 2010

Hashing Strategies for the Cray XMT MTAAP 2010 Hashing Strategies for the Cray XMT MTAAP 2010 Eric Goodman (SNL) David Haglin (PNNL) Chad Scherrer (PNNL) Daniel Chavarría-Miranda (PNNL) Jace Mogill (PNNL) John Feo (PNNL) Sandia is a multiprogram laboratory

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability 1 History of GPU

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU

More information

Spatio-temporal Feature Exploration in Combined Particle/Volume Reference Frames

Spatio-temporal Feature Exploration in Combined Particle/Volume Reference Frames Spatio-temporal Feature Eploration in Combined Particle/Volume Reference Frames Franz Sauer and Kwan-Liu Ma VIDI Labs University of California, Davis 2 Introduction: Problem Statement More scientific simulations

More information

MPICH: A High-Performance Open-Source MPI Implementation. SC11 Birds of a Feather Session

MPICH: A High-Performance Open-Source MPI Implementation. SC11 Birds of a Feather Session MPICH: A High-Performance Open-Source MPI Implementation SC11 Birds of a Feather Session Schedule MPICH2 status and plans Presenta

More information

Challenges and Opportunities for HPC Interconnects and MPI

Challenges and Opportunities for HPC Interconnects and MPI Challenges and Opportunities for HPC Interconnects and MPI Ron Brightwell, R&D Manager Scalable System Software Department Sandia National Laboratories is a multi-mission laboratory managed and operated

More information

Extracting Hidden Messages in Steganographic Images

Extracting Hidden Messages in Steganographic Images DIGITAL FORENSIC RESEARCH CONFERENCE Extracting Hidden Messages in Steganographic Images By Tu-Thach Quach Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver, CO (Aug 3 rd - 6

More information

CUDA GPGPU Workshop 2012

CUDA GPGPU Workshop 2012 CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline

More information

8. Solving Stochastic Programs

8. Solving Stochastic Programs 8. Solving Stochastic Programs Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.

More information

CUDA (Compute Unified Device Architecture)

CUDA (Compute Unified Device Architecture) CUDA (Compute Unified Device Architecture) Mike Bailey History of GPU Performance vs. CPU Performance GFLOPS Source: NVIDIA G80 = GeForce 8800 GTX G71 = GeForce 7900 GTX G70 = GeForce 7800 GTX NV40 = GeForce

More information

Timothy Lanfear, NVIDIA HPC

Timothy Lanfear, NVIDIA HPC GPU COMPUTING AND THE Timothy Lanfear, NVIDIA FUTURE OF HPC Exascale Computing will Enable Transformational Science Results First-principles simulation of combustion for new high-efficiency, lowemision

More information

Computational and Statistical Tradeoffs in VoI-Driven Learning

Computational and Statistical Tradeoffs in VoI-Driven Learning ARO ARO MURI MURI on on Value-centered Theory for for Adaptive Learning, Inference, Tracking, and and Exploitation Computational and Statistical Tradefs in VoI-Driven Learning Co-PI Michael Jordan University

More information

Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks

Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks Kevin Deweese 1 Erik Boman 2 1 Department of Computer Science University of California, Santa Barbara 2 Scalable Algorithms

More information

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors Weifeng Liu and Brian Vinter Niels Bohr Institute University of Copenhagen Denmark {weifeng, vinter}@nbi.dk March 1, 2014 Weifeng

More information

GPU Performance Nuggets

GPU Performance Nuggets GPU Performance Nuggets Simon Garcia de Gonzalo & Carl Pearson PhD Students, IMPACT Research Group Advised by Professor Wen-mei Hwu Jun. 15, 2016 grcdgnz2@illinois.edu pearson@illinois.edu GPU Performance

More information

Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System

Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins, Todd Harman Scientific Computing and Imaging Institute & University of Utah I. Uintah

More information

The Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center

The Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.

More information

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating

More information

Sandia National Laboratories Solutions for Aras Innovator

Sandia National Laboratories Solutions for Aras Innovator Photos placed in horizontal position with even amount of white space between photos and header Sandia National Laboratories Solutions for Aras Innovator Sandia National Laboratories is a multi-program

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

Simulation of Workflow and Threat Characteristics for Cyber Security Incident Response Teams

Simulation of Workflow and Threat Characteristics for Cyber Security Incident Response Teams Simulation of Workflow and Threat Characteristics for Cyber Security Incident Response Teams Theodore Reed, Robert G. Abbott, Benjamin Anderson, Kevin Nauer & Chris Forsythe Sandia National Laboratories

More information

Risk Informed Cyber Security for Nuclear Power Plants

Risk Informed Cyber Security for Nuclear Power Plants Risk Informed Cyber Security for Nuclear Power Plants Phillip L. Turner, Timothy A. Wheeler, Matt Gibson Sandia National Laboratories Electric Power Research Institute Albuquerque, NM USA Charlotte, NC

More information

Efficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford

Efficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford Efficiency and Programmability: Enablers for ExaScale Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford Scientific Discovery and Business Analytics Driving an Insatiable

More information

Supercomputer Field Data. DRAM, SRAM, and Projections for Future Systems

Supercomputer Field Data. DRAM, SRAM, and Projections for Future Systems Supercomputer Field Data DRAM, SRAM, and Projections for Future Systems Nathan DeBardeleben, Ph.D. (LANL) Ultrascale Systems Research Center (USRC) 6 th Soft Error Rate (SER) Workshop Santa Clara, October

More information

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World

More information

Turbostream: A CFD solver for manycore

Turbostream: A CFD solver for manycore Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware

More information

An Introduction to OpenACC

An Introduction to OpenACC An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15

More information

Parallelism. Parallel Hardware. Introduction to Computer Systems

Parallelism. Parallel Hardware. Introduction to Computer Systems Parallelism We have been discussing the abstractions and implementations that make up an individual computer system in considerable detail up to this point. Our model has been a largely sequential one,

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information