High End Computing Is Bringing Atomistic Simulation To Macroscopic

Size: px
Start display at page:

Download "High End Computing Is Bringing Atomistic Simulation To Macroscopic"

Transcription

1 Virtualization-aware Application Framework for High-end Classical-quantum Atomistic Simulations of Nanosystems Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of Computer Science, Department of Physics & Astronomy, Department of Materials Science & Engineering University of Southern California Collaborators: Rajiv K. Kalia, Ashish Sharma, Priya Vashishta (CACS), Hiroshi Iyetomi (Niigata), Hideaki Kikuchi (Louisiana), Shuji Ogata (Nagoya Inst. Tech.), Fuyuki Shimojo (Kumamoto), Kenji Tsuruta (Okayama) High End Computing Is Bringing Atomistic Simulation To Macroscopic 1.5 billion atom molecular dynamics simulation on 1,024 IBM SP3 processors at NAVO-MSRC 0.3 µm Color code: Si 3 N 4 ; SiC; SiO 2 Computing beyond Teraflop: Grid of distributed supercomputers?

2 Parallel Benchmark Platforms Earth Simulator DoD-NAVO IBM SP4 LSU/Atipa Linux cluster SuperMike Computing Beyond Teraflop: Grid of Globally Distributed PC Clusters? NASA Information Power Grid Grid of globally distributed supercomputers LSU/Atipa Linux cluster SuperMike Commodity Xeon-based multi-teraflop Linux cluster

3 Virtualization-aware Application Framework Atomistic materials simulation methods Molecular Dynamics (MD) Quantum Mechanics (QM) r ij i j r ik k Atom Electron density Virtualized applications Grid Scalability Portable performance Adaptation Data-locality principles Molecular Dynamics: N-Body Problem Newton s equation of motion d 2 r m i i dt 2 = V r N ( ) (i =1,..., N) r i N-body problem O(N 2 ) Long-range electrostatic interaction N q i V es (x) = x = x j (j = 1,...,N) i=1 x x i Application: drug design, robotics, entertainment, etc. A scene from the movie Twister

4 Spatial Locality: Fast Multipole Method 1. Clustering: Encapsulate far-field information using multipoles l N V ( x)= q i r l i Y *m Y l ( θ i,φ i ) m l ( θ,φ) r l+1 (r i,θ i,φ i ) l=0m= l i=1 2. Hierarchical abstraction: Octree data structure 3. O(N) algorithm: Constant number of interactive cells per octree node l = 0 l = 1 (r,θ,φ) delegate compute inherit l = 2 l = 3 Temporal Locality: Multiple Time Stepping Different force-update schedules for different force components i) Reduced computation ii) Enhanced data locality & parallel efficiency Far field: every steps Near field Tertiary: every steps Secondary: every 5-15 steps Reversible symplectic integrator Simulation-loop invariant: phase-space volume Long-time stability T (p N n t,r n t (p N 0,r N 0 ) N ) Primary: every step 0 I ( p N n t,r N n t ) I 0 (p N 0,r N 0 ) = 0 I I 0

5 Oxide Growth in an Al Nanoparticle Unique metal/ceramic nanocomposite Al AlO x 70 Å 110 Å Oxide thickness saturates at 40 Å after 0.5 ns Excellent agreement with experiments Quantum N-Body Problem Challenge: Exponential complexity Density functional theory (DFT) (Kohn, Nobel Chemistry Prize, 98) { ψ n (r) n =1,K,N el } O(C N ) O(N 3 ) ψ(r 1,r 2,K,r N el ) > Pseudopotential (Troullier & Martins, 91) > Generalized gradient approximation (Perdew, 96) Minimize: N E[ { ψ n }]= d 3 * h 2 2 el rψ n (r) 2m e r 2 +V ion (r) ψ n (r) + e2 2 n=1 Constrained minimization problem: with orthonormal constraints: d 3 * rψ m (r)ψ n (r) =δ mn N el Charge density: ρ(r) = ψ n (r) 2 n=1 d 3 rd 3 r ρ(r)ρ( r ) r r + E XC [ ρ(r) ]

6 Real-Space DFT on Hierarchical Grids Efficient parallelization of DFT: real-space approaches High-order finite difference (Chelikowsky, Troullier, Saad, 94) Multigrid acceleration (Bernholc et al., 96; Beck, 00) Double-grid method (Ono, Hirose, 99) Spatial decomposition/divide-&-conquer Multigrid acceleration of preconditioned conjugate gradient Adaptive high-resolution grid for atomic pseudopotentials Quantum-Nearsightedness Locality Principle O(N) DFT algorithm (Mauri & Galli, 94) Asymptotic decay of density matrix: Localized functions: N el ρ(r, r * ) ψ n (r)ψ n ( r ) n=1 ( ) exp C r r ψ n (r) φ m (r) φ m (r) = n ψ n (r)u nm R cut Unconstrained minimization: N wf N wf m=1 n=1 ρ(r, r ) ( ) E [{ φ n }]= d 3 rφ * m (r)(h ηi)φ n (r) 2δ nm d 3 rφ * n (r)φ m (r) + ηn el

7 Data Locality in Parallelization Challenge: Load balancing for irregular data structures Irregular data-structures/ processor-speed Map Parallel computer Optimization problem: Minimize the load-imbalance cost Minimize the communication cost Topology-preserving spatial decomposition structured 6-step message passing minimizes latency E = t comp ( max p {i r i p} )+ t comm max p {i r i p < r c } +t latency max p N message ( p) ( [ ]) ( ) Computational-Space Decomposition Topology-preserving computational-space decomposition in curved space Curvilinear coordinate transformation ξ = x + u(x) Particle-processor mapping: regular 3D mesh topology ( ) p( ξ i )= p x ( ξ ix )P y P z + p y ( ξ iy )P z + p z ξ iz p α ( ξ iα )= ξ iα P α /L α ( α = x, y,z) Regular mesh topology in computational space, ξ Curved partition in physical space, x

8 Wavelet-based Adaptive Load Balancing Simulated annealing to minimize the load-imbalance & communication costs, E[ξ(x)] Wavelet representation speeds up the optimization ξ(x) = x + d lm ψ lm (x) l,m 3000 Total cost Plane wave Wavelet CPU time (min) Locality in Data Compression Massive data transfer via wide area network: 75 GB/step of data for 1.5 billion-atom MD! Compressed software pipeline Scalable encoding: Store relative positions on spacefilling curve: O(NlogN) O(N) Result: Data size, 50 Bytes/atom 6 Bytes/atom

9 Spacefilling-curve Data Compression Algorithm: 1. Sort particles along the spacefilling curve 2. Store relative positions: O(NlogN) O(N) Adaptive variable-length encoding to handle outliers User-controlled error bound Result: An order-of-magnitude reduction of I/O size: 50 6 Bytes/atom Scalable Scientific Algorithm Suite Design-space diagram on 1,024 IBM SP3 & Cray T3E processors On 1,024 IBM SP3 processors: 6.44-billion-atom MD of SiO 2 444,000-electron DFT of GaAs Supercomputing 2001 Best Technical Paper Award

10 Grid Enabling: Multiple QM Clustering system E = E MD + cluster cluster [EQM ({rqm },{r HS }) E MD ({rqm },{r HS })] cluster Divide-&-conquer QM2 MD QM1 QM3 Global Collaborative Simulation Hybrid MD/QM simulation on a Grid of distributed PC clusters in the US & Japan Japan:Yamaguchi 65 P4 2.0GHz Hiroshima, Okayama, Niigata 3 24 P4 1.8GHz US: Louisiana 17 Athlon XP 1900+

11 Preliminary Benchmark Results MD 91,256 atoms QM (DFT) 76n atoms on n clusters Scaled speedup, P = 1 (for MD) + 8n (for QM) Efficiency = 94.0% on 25 processors over 3 PC clusters in the US & Japan Environmental Effect on Fracture Reaction of H 2 O molecules at a Si crack tip MD QM Blue: Oxygen White: Hydrogen Green: Silicon

12 Data Locality in Visualization Octree-based fast view-frustum culling Probabilistic occlusion culling Parallel/distributed processing Reduced data Graphics server W A N D User position PC cluster ImmersaDesk Interactive visualization of a billion-atom dataset in immersive environment Octree-based View-Frustum Culling Use the octree data structure to efficiently select only visible atoms Complexity Insertion into octree: O(N) Data extraction: O(logN)

13 Probabilstic Occlusion Culling Remove atoms that are occluded by other atoms closer to the viewer Draw fewer atoms per region as the distance of a region from the viewer increases Without occlusion With occlusion 68% fewer atoms & 3 times higher frame rate Distributed Architecture RENDERING & VISUALIZATION MODULE RENDERING SYSTEM PER-ATOM OCCLUDER USER POSITION NEAR COMPLETE LIST OF VIEWABLE ATOMS OCTREE BASED DATA EXTRACTION MODULE PROBABALISTIC OCCLUSION CULLING MODULE REGIONS OF INTEREST TCP/IP SOCKET

14 Parallel & Distributed Atomsviewer Real-time walkthrough for a billion atoms on an SGI Onyx2 (2 MIPS R10K, 4GB RAM) connected to a PC cluster (4 800MHz P3) IEEE Virtual Reality 2002 Best Paper 209 Million Atom MD of Hypervelocity Impact AlN plate with impact velocity 15 km/s

15 Application of Multiscale Simulations Oxidation on Si Surface MD FE QM cluster QM O MD Si QM Si Handshake H Interfacing quantum-dot devices & biological cells (e.g. neural implant to restore vision) Computational Science Education Dual-Degree Program Ph.D. in physical/biological sciences & MS from computer science Computational Science Workshop for Underrepresented Groups Intercontinental Courses The Netherlands Delft Univ. T3E Origin SP Alpha cluster USA Louisiana State Univ. VR workbench Video Conferencing Virtual Classroom ImmersaDesk Chat Tool Whiteboard Tool

16 Conclusion Multiscale simulation approach: Can be virtualized on a Grid of distributed PC clusters through data-locality principles Will allow global collaboration of scientists to increase the scope & size of simulation study

Massive Dataset Visualization

Massive Dataset Visualization Massive Dataset Visualization Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Dept. of Computer Science, Dept. of Physics & Astronomy, Dept. of Chemical Engineering & Materials Science,

More information

Grid Computing: Application to Science

Grid Computing: Application to Science Grid Computing: Application to Science Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Dept. of Computer Science, Dept. of Physics & Astronomy, Dept. of Chemical Engineering & Materials

More information

How does a crack propagate in a composite

How does a crack propagate in a composite H IGH-D IMENSIONAL D ATA LARGE MULTIDIMENSIONAL DATA VISUALIZATION FOR MATERIALS SCIENCE Materials scientists use scientific visualization to explore very large multidimensional data sets. The Atomsviewer

More information

Optimizing Molecular Dynamics

Optimizing Molecular Dynamics Optimizing Molecular Dynamics This chapter discusses performance tuning of parallel and distributed molecular dynamics (MD) simulations, which involves both: (1) intranode optimization within each node

More information

NAMD Serial and Parallel Performance

NAMD Serial and Parallel Performance NAMD Serial and Parallel Performance Jim Phillips Theoretical Biophysics Group Serial performance basics Main factors affecting serial performance: Molecular system size and composition. Cutoff distance

More information

Techniques and algorithms for immersive and interactive visualization of large datasets

Techniques and algorithms for immersive and interactive visualization of large datasets Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2002 Techniques and algorithms for immersive and interactive visualization of large datasets Ashish Sharma Louisiana State

More information

Scalable and portable visualization of large atomistic datasets

Scalable and portable visualization of large atomistic datasets Computer Physics Communications 163 (2004) 53 64 www.elsevier.com/locate/cpc Scalable and portable visualization of large atomistic datasets Ashish Sharma, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta

More information

International Journal of Computational Science (Print) (Online) Global Information Publisher 2007, Vol. 1, No.

International Journal of Computational Science (Print) (Online) Global Information Publisher 2007, Vol. 1, No. International Journal of Computational Science 1992-6669 (Print) 1992-6677 (Online) Global Information Publisher 2007, Vol. 1, No. 4, 407-421 ParaViz: A Spatially Decomposed Parallel Visualization Algorithm

More information

Immersive and Interactive Exploration of Billion-Atom Systems

Immersive and Interactive Exploration of Billion-Atom Systems Ashish Sharma sharmaa@usc.edu Collaboratory for Advanced Computing and Simulations Dept. of Computer Science Immersive and Interactive Exploration of Billion-Atom Systems Aiichiro Nakano anakano@usc.edu

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor * Some names and brands may be claimed as the property of others. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor E.J. Bylaska 1, M. Jacquelin

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Intermediate Parallel Programming & Cluster Computing

Intermediate Parallel Programming & Cluster Computing High Performance Computing Modernization Program (HPCMP) Summer 2011 Puerto Rico Workshop on Intermediate Parallel Programming & Cluster Computing in conjunction with the National Computational Science

More information

Performance Analysis of Cluster based Interactive 3D Visualisation

Performance Analysis of Cluster based Interactive 3D Visualisation Performance Analysis of Cluster based Interactive 3D Visualisation P.D. MASELINO N. KALANTERY S.WINTER Centre for Parallel Computing University of Westminster 115 New Cavendish Street, London UNITED KINGDOM

More information

Parallel Molecular Dynamics

Parallel Molecular Dynamics Parallel Molecular Dynamics Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials

More information

Lecture 7: Introduction to HFSS-IE

Lecture 7: Introduction to HFSS-IE Lecture 7: Introduction to HFSS-IE 2015.0 Release ANSYS HFSS for Antenna Design 1 2015 ANSYS, Inc. HFSS-IE: Integral Equation Solver Introduction HFSS-IE: Technology An Integral Equation solver technology

More information

Scientific Visualization. CSC 7443: Scientific Information Visualization

Scientific Visualization. CSC 7443: Scientific Information Visualization Scientific Visualization Scientific Datasets Gaining insight into scientific data by representing the data by computer graphics Scientific data sources Computation Real material simulation/modeling (e.g.,

More information

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Benzi John* and M. Damodaran** Division of Thermal and Fluids Engineering, School of Mechanical and Aerospace

More information

A Kernel-independent Adaptive Fast Multipole Method

A Kernel-independent Adaptive Fast Multipole Method A Kernel-independent Adaptive Fast Multipole Method Lexing Ying Caltech Joint work with George Biros and Denis Zorin Problem Statement Given G an elliptic PDE kernel, e.g. {x i } points in {φ i } charges

More information

Scalable I/O of large-scale molecular dynamics simulations: A data-compression algorithm

Scalable I/O of large-scale molecular dynamics simulations: A data-compression algorithm Computer Physics Communications 131 (2000) 78 85 www.elsevier.nl/locate/cpc Scalable I/O of large-scale molecular dynamics simulations: A data-compression algorithm Andrey Omeltchenko, Timothy J. Campbell,

More information

Multi-Domain Pattern. I. Problem. II. Driving Forces. III. Solution

Multi-Domain Pattern. I. Problem. II. Driving Forces. III. Solution Multi-Domain Pattern I. Problem The problem represents computations characterized by an underlying system of mathematical equations, often simulating behaviors of physical objects through discrete time

More information

Implementation of an integrated efficient parallel multiblock Flow solver

Implementation of an integrated efficient parallel multiblock Flow solver Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes

More information

Simulation Advances. Antenna Applications

Simulation Advances. Antenna Applications Simulation Advances for RF, Microwave and Antenna Applications Presented by Martin Vogel, PhD Application Engineer 1 Overview Advanced Integrated Solver Technologies Finite Arrays with Domain Decomposition

More information

Parallel Rendering. Jongwook Jin Kwangyun Wohn VR Lab. CS Dept. KAIST MAR

Parallel Rendering. Jongwook Jin Kwangyun Wohn VR Lab. CS Dept. KAIST MAR Optimal Partition for Parallel Rendering Jongwook Jin Kwangyun Wohn VR Lab. CS Dept. KAIST Culture Technology Research Center MAR.06.2008 Optimal Partition for Parallel Rendering Load balanced partition

More information

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances) HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access

More information

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA

More information

Petascale Multiscale Simulations of Biomolecular Systems. John Grime Voth Group Argonne National Laboratory / University of Chicago

Petascale Multiscale Simulations of Biomolecular Systems. John Grime Voth Group Argonne National Laboratory / University of Chicago Petascale Multiscale Simulations of Biomolecular Systems John Grime Voth Group Argonne National Laboratory / University of Chicago About me Background: experimental guy in grad school (LSCM, drug delivery)

More information

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea

More information

Goal. Interactive Walkthroughs using Multiple GPUs. Boeing 777. DoubleEagle Tanker Model

Goal. Interactive Walkthroughs using Multiple GPUs. Boeing 777. DoubleEagle Tanker Model Goal Interactive Walkthroughs using Multiple GPUs Dinesh Manocha University of North Carolina- Chapel Hill http://www.cs.unc.edu/~walk SIGGRAPH COURSE #11, 2003 Interactive Walkthrough of complex 3D environments

More information

Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation

Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation Heterogeneous Multi-Computer System A New Platform for Multi-Paradigm Scientific Simulation Taisuke Boku, Hajime Susa, Masayuki Umemura, Akira Ukawa Center for Computational Physics, University of Tsukuba

More information

Molecular Dynamics and Quantum Mechanics Applications

Molecular Dynamics and Quantum Mechanics Applications Understanding the Performance of Molecular Dynamics and Quantum Mechanics Applications on Dell HPC Clusters High-performance computing (HPC) clusters are proving to be suitable environments for running

More information

Fast Multipole Method on the GPU

Fast Multipole Method on the GPU Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical

More information

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System

More information

Dynamic Spatial Partitioning for Real-Time Visibility Determination. Joshua Shagam Computer Science

Dynamic Spatial Partitioning for Real-Time Visibility Determination. Joshua Shagam Computer Science Dynamic Spatial Partitioning for Real-Time Visibility Determination Joshua Shagam Computer Science Master s Defense May 2, 2003 Problem Complex 3D environments have large numbers of objects Computer hardware

More information

Benchmark 1.a Investigate and Understand Designated Lab Techniques The student will investigate and understand designated lab techniques.

Benchmark 1.a Investigate and Understand Designated Lab Techniques The student will investigate and understand designated lab techniques. I. Course Title Parallel Computing 2 II. Course Description Students study parallel programming and visualization in a variety of contexts with an emphasis on underlying and experimental technologies.

More information

DiFi: Distance Fields - Fast Computation Using Graphics Hardware

DiFi: Distance Fields - Fast Computation Using Graphics Hardware DiFi: Distance Fields - Fast Computation Using Graphics Hardware Avneesh Sud Dinesh Manocha UNC-Chapel Hill http://gamma.cs.unc.edu/difi Distance Fields Distance Function For a site a scalar function f:r

More information

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation

More information

Harnessing GPU speed to accelerate LAMMPS particle simulations

Harnessing GPU speed to accelerate LAMMPS particle simulations Harnessing GPU speed to accelerate LAMMPS particle simulations Paul S. Crozier, W. Michael Brown, Peng Wang pscrozi@sandia.gov, wmbrown@sandia.gov, penwang@nvidia.com SC09, Portland, Oregon November 18,

More information

Visualization and VR for the Grid

Visualization and VR for the Grid Visualization and VR for the Grid Chris Johnson Scientific Computing and Imaging Institute University of Utah Computational Science Pipeline Construct a model of the physical domain (Mesh Generation, CAD)

More information

Parallel and Distributed Systems Lab.

Parallel and Distributed Systems Lab. Parallel and Distributed Systems Lab. Department of Computer Sciences Purdue University. Jie Chi, Ronaldo Ferreira, Ananth Grama, Tzvetan Horozov, Ioannis Ioannidis, Mehmet Koyuturk, Shan Lei, Robert Light,

More information

Indirect Volume Rendering

Indirect Volume Rendering Indirect Volume Rendering Visualization Torsten Möller Weiskopf/Machiraju/Möller Overview Contour tracing Marching cubes Marching tetrahedra Optimization octree-based range query Weiskopf/Machiraju/Möller

More information

Accelerating Finite Element Analysis in MATLAB with Parallel Computing

Accelerating Finite Element Analysis in MATLAB with Parallel Computing MATLAB Digest Accelerating Finite Element Analysis in MATLAB with Parallel Computing By Vaishali Hosagrahara, Krishna Tamminana, and Gaurav Sharma The Finite Element Method is a powerful numerical technique

More information

Improving the Scalability of Comparative Debugging with MRNet

Improving the Scalability of Comparative Debugging with MRNet Improving the Scalability of Comparative Debugging with MRNet Jin Chao MeSsAGE Lab (Monash Uni.) Cray Inc. David Abramson Minh Ngoc Dinh Jin Chao Luiz DeRose Robert Moench Andrew Gontarek Outline Assertion-based

More information

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?

More information

Center Extreme Scale CS Research

Center Extreme Scale CS Research Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek

More information

ANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011

ANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011 ANSYS HPC Technology Leadership Barbara Hutchings barbara.hutchings@ansys.com 1 ANSYS, Inc. September 20, Why ANSYS Users Need HPC Insight you can t get any other way HPC enables high-fidelity Include

More information

VMD Key Features of Recent Release

VMD Key Features of Recent Release VMD 1.8.7 Key Features of Recent Release John Stone Research/vmd/ Quick Summary of Features Broader platform support Updated user interfaces Accelerated analysis, rendering, display: multi-core CPUs, GPUs

More information

by Virtual Reality System

by Virtual Reality System Particle Simulation Analysis by Virtual Reality System Hiroaki Ohtani 12) 1,2), Nobuaki Ohno 3), Ritoku Horiuchi 12) 1,2) 1 National Institute for Fusion Science (NIFS), Japan 2 The Graduate University

More information

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Architectural constraints to attain 1 Exaflop/s for three scientific application classes

Architectural constraints to attain 1 Exaflop/s for three scientific application classes Architectural constraints to attain Exaflop/s for three scientific application classes Abhinav Bhatele, Pritish Jetley, Hormozd Gahvari, Lukasz Wesolowski, William D. Gropp, Laxmikant V. Kalé Department

More information

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization Kun Yuan, Jae-Seo Yang, David Z. Pan Dept. of Electrical and Computer Engineering The University of Texas at Austin

More information

Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application

Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application Esteban Meneses Patrick Pisciuneri Center for Simulation and Modeling (SaM) University of Pittsburgh University of

More information

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

CLUSTER-BASED MOLECULAR DYNAMICS PARALLEL SIMULATION IN THERMOPHYSICS

CLUSTER-BASED MOLECULAR DYNAMICS PARALLEL SIMULATION IN THERMOPHYSICS CLUSTER-BASED MOLECULAR DYNAMICS PARALLEL SIMULATION IN THERMOPHYSICS JIWU SHU * BING WANG JINZHAO WANG 2 MIN CHEN 2 WEIMIN ZHENG ( Dept. of Computer Science and Technology, Tsinghua Univ., Beijing, 00084)

More information

Large Scale Data Visualization. CSC 7443: Scientific Information Visualization

Large Scale Data Visualization. CSC 7443: Scientific Information Visualization Large Scale Data Visualization Large Datasets Large datasets: D >> 10 M D D: Hundreds of gigabytes to terabytes and even petabytes M D : 1 to 4 GB of RAM Examples: Single large data set Time-varying data

More information

Commodity Cluster Computing

Commodity Cluster Computing Commodity Cluster Computing Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne http://capawww.epfl.ch Commodity Cluster Computing 1. Introduction 2. Characterisation of nodes, parallel machines,applications

More information

ENABLING NEW SCIENCE GPU SOLUTIONS

ENABLING NEW SCIENCE GPU SOLUTIONS ENABLING NEW SCIENCE TESLA BIO Workbench The NVIDIA Tesla Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a

More information

Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters

Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters J Supercomput (2011) 57: 20 33 DOI 10.1007/s11227-011-0560-1 Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters Liu Peng Manaschai Kunaseth Hikmet Dursun Ken-ichi

More information

Splotch: High Performance Visualization using MPI, OpenMP and CUDA

Splotch: High Performance Visualization using MPI, OpenMP and CUDA Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,

More information

Multiscale Techniques: Wavelet Applications in Volume Rendering

Multiscale Techniques: Wavelet Applications in Volume Rendering Multiscale Techniques: Wavelet Applications in Volume Rendering Michael H. F. Wilkinson, Michel A. Westenberg and Jos B.T.M. Roerdink Institute for Mathematics and University of Groningen The Netherlands

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,

More information

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr

More information

ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION

ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION Vignesh Adhinarayanan Ph.D. (CS) Student Synergy Lab, Virginia Tech INTRODUCTION Supercomputers are constrained by power Power

More information

Storage Hierarchy Management for Scientific Computing

Storage Hierarchy Management for Scientific Computing Storage Hierarchy Management for Scientific Computing by Ethan Leo Miller Sc. B. (Brown University) 1987 M.S. (University of California at Berkeley) 1990 A dissertation submitted in partial satisfaction

More information

ANSYS HPC Technology Leadership

ANSYS HPC Technology Leadership ANSYS HPC Technology Leadership 1 ANSYS, Inc. November 14, Why ANSYS Users Need HPC Insight you can t get any other way It s all about getting better insight into product behavior quicker! HPC enables

More information

Architecture Conscious Data Mining. Srinivasan Parthasarathy Data Mining Research Lab Ohio State University

Architecture Conscious Data Mining. Srinivasan Parthasarathy Data Mining Research Lab Ohio State University Architecture Conscious Data Mining Srinivasan Parthasarathy Data Mining Research Lab Ohio State University KDD & Next Generation Challenges KDD is an iterative and interactive process the goal of which

More information

Overview. Pipeline implementation I. Overview. Required Tasks. Preliminaries Clipping. Hidden Surface removal

Overview. Pipeline implementation I. Overview. Required Tasks. Preliminaries Clipping. Hidden Surface removal Overview Pipeline implementation I Preliminaries Clipping Line clipping Hidden Surface removal Overview At end of the geometric pipeline, vertices have been assembled into primitives Must clip out primitives

More information

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics I. Pantle Fachgebiet Strömungsmaschinen Karlsruher Institut für Technologie KIT Motivation

More information

Using GPUs to compute the multilevel summation of electrostatic forces

Using GPUs to compute the multilevel summation of electrostatic forces Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of

More information

Center for Computational Science

Center for Computational Science Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,

More information

MODEL-FREE, STATISTICAL DETECTION AND TRACKING OF MOVING OBJECTS

MODEL-FREE, STATISTICAL DETECTION AND TRACKING OF MOVING OBJECTS MODEL-FREE, STATISTICAL DETECTION AND TRACKING OF MOVING OBJECTS University of Koblenz Germany 13th Int. Conference on Image Processing, Atlanta, USA, 2006 Introduction Features of the new approach: Automatic

More information

Efficient use of hybrid computing clusters for nanosciences

Efficient use of hybrid computing clusters for nanosciences International Conference on Parallel Computing ÉCOLE NORMALE SUPÉRIEURE LYON Efficient use of hybrid computing clusters for nanosciences Luigi Genovese CEA, ESRF, BULL, LIG 16 Octobre 2008 with Matthieu

More information

GPU-based Distributed Behavior Models with CUDA

GPU-based Distributed Behavior Models with CUDA GPU-based Distributed Behavior Models with CUDA Courtesy: YouTube, ISIS Lab, Universita degli Studi di Salerno Bradly Alicea Introduction Flocking: Reynolds boids algorithm. * models simple local behaviors

More information

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

NVIDIA Update and Directions on GPU Acceleration for Earth System Models NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,

More information

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Building scalable 3D applications. Ville Miettinen Hybrid Graphics Building scalable 3D applications Ville Miettinen Hybrid Graphics What s going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game

More information

Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Dr. Michael Wimmer wimmer@cg.tuwien.ac.at Visibility Overview Basics about visibility Basics about occlusion culling View-frustum culling / backface culling Occlusion

More information

Large Data Visualization

Large Data Visualization Large Data Visualization Seven Lectures 1. Overview (this one) 2. Scalable parallel rendering algorithms 3. Particle data visualization 4. Vector field visualization 5. Visual analytics techniques for

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

Defense Technical Information Center Compilation Part Notice

Defense Technical Information Center Compilation Part Notice UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP023800 TITLE: A Comparative Study of ARL Linux Cluster Performance DISTRIBUTION: Approved for public release, distribution unlimited

More information

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea. Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences

More information

Victory Advanced Structure Editor. 3D Process Simulator for Large Structures

Victory Advanced Structure Editor. 3D Process Simulator for Large Structures Victory Advanced Structure Editor 3D Process Simulator for Large Structures Applications Victory Advanced Structure Editor is designed for engineers who need to create layout driven 3D process based structures

More information

Row Tracing with Hierarchical Occlusion Maps

Row Tracing with Hierarchical Occlusion Maps Row Tracing with Hierarchical Occlusion Maps Ravi P. Kammaje, Benjamin Mora August 9, 2008 Page 2 Row Tracing with Hierarchical Occlusion Maps Outline August 9, 2008 Introduction Related Work Row Tracing

More information

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer 3-D TERRAIN RECONSTRUCTION

More information

Fast Multipole and Related Algorithms

Fast Multipole and Related Algorithms Fast Multipole and Related Algorithms Ramani Duraiswami University of Maryland, College Park http://www.umiacs.umd.edu/~ramani Joint work with Nail A. Gumerov Efficiency by exploiting symmetry and A general

More information

CS-235 Computational Geometry

CS-235 Computational Geometry CS-235 Computational Geometry Computer Science Department Fall Quarter 2002. Computational Geometry Study of algorithms for geometric problems. Deals with discrete shapes: points, lines, polyhedra, polygonal

More information

simulation framework for piecewise regular grids

simulation framework for piecewise regular grids WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 02 130124 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Basics Image Formation Image Processing 3 Intelligent

More information

HPCS HPCchallenge Benchmark Suite

HPCS HPCchallenge Benchmark Suite HPCS HPCchallenge Benchmark Suite David Koester, Ph.D. () Jack Dongarra (UTK) Piotr Luszczek () 28 September 2004 Slide-1 Outline Brief DARPA HPCS Overview Architecture/Application Characterization Preliminary

More information

Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs DOE Visiting Faculty Program Project Report

Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs DOE Visiting Faculty Program Project Report Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs 2013 DOE Visiting Faculty Program Project Report By Jianting Zhang (Visiting Faculty) (Department of Computer Science,

More information

Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview

Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Prof. Wayne Wolf Overview Why Programmable Media Processors?

More information

Parallel Computing: From Inexpensive Servers to Supercomputers

Parallel Computing: From Inexpensive Servers to Supercomputers Parallel Computing: From Inexpensive Servers to Supercomputers Lyle N. Long The Pennsylvania State University & The California Institute of Technology Seminar to the Koch Lab http://www.personal.psu.edu/lnl

More information

Particle-Based Volume Rendering of Unstructured Volume Data

Particle-Based Volume Rendering of Unstructured Volume Data Particle-Based Volume Rendering of Unstructured Volume Data Takuma KAWAMURA 1)*) Jorji NONAKA 3) Naohisa SAKAMOTO 2),3) Koji KOYAMADA 2) 1) Graduate School of Engineering, Kyoto University 2) Center for

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Introduction. Summary. Why computer architecture? Technology trends Cost issues Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have

More information

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens Ray Tracing with Multi-Core/Shared Memory Systems Abe Stephens Real-time Interactive Massive Model Visualization Tutorial EuroGraphics 2006. Vienna Austria. Monday September 4, 2006 http://www.sci.utah.edu/~abe/massive06/

More information

The Center for Computational Research

The Center for Computational Research The Center for Computational Research Russ Miller Director, Center for Computational Research UB Distinguished Professor, Computer Science & Engineering Senior Research Scientist, Hauptman-Woodward Medical

More information

Partitioning and Partitioning Tools. Tim Barth NASA Ames Research Center Moffett Field, California USA

Partitioning and Partitioning Tools. Tim Barth NASA Ames Research Center Moffett Field, California USA Partitioning and Partitioning Tools Tim Barth NASA Ames Research Center Moffett Field, California 94035-00 USA 1 Graph/Mesh Partitioning Why do it? The graph bisection problem What are the standard heuristic

More information