Ryan C. Hulguin TACC-Intel Highly Parallel Computing Symposium April 10th-11th, 2012 Austin, TX

1 Ryan C. Hulguin TACC-Intel Highly Parallel Computing Symposium April 10th-11th, 2012 Austin, TX

2 Outline
- Introduction
- Knights Ferry Technical Specifications
- CFD Governing Equations
- Numerical Algorithm
- Solver Pseudocode
- Test Problems
- Intel MIC Performance of Test Problems
- Intel MIC Performance after Optimization
- General Observations

3 Introduction
- CPU clock speeds have recently hit a plateau, and further improvements are now achieved through parallel computing
- Intel has introduced the Intel Many Integrated Core (Intel MIC) architecture, which combines many Intel CPU cores onto a single chip
- Two CFD solvers using OpenMP are developed for use on the Intel MIC software development platform known as Knights Ferry (KNF)
- The solvers currently run in native mode only, meaning the entire application is launched on the Knights Ferry card itself
- A strong scaling study is done to determine how effectively these OpenMP applications can use the many cores
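A strong scaling study keeps the problem size fixed and varies the thread count. As a minimal sketch of how such results are usually reduced, the snippet below computes speedup (one-thread time divided by n-thread time) and parallel efficiency (speedup divided by n). The function name and the timings are hypothetical placeholders, not measurements from this talk.

```python
def strong_scaling(times_by_threads):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    t1 = times_by_threads[1]
    return {
        n: (t1 / t, t1 / (t * n))
        for n, t in sorted(times_by_threads.items())
    }


# Hypothetical wall-clock times in seconds (illustration only).
timings = {1: 128.0, 2: 64.0, 4: 33.0, 8: 17.0}

for n, (speedup, efficiency) in strong_scaling(timings).items():
    print(f"{n:3d} threads: speedup {speedup:6.2f}, efficiency {efficiency:5.2f}")
```

Plotting speedup against thread count on log-log axes gives the kind of curve shown on the performance slides.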

4 Knights Ferry
Knights Ferry (KNF) is the software development platform (SDP) for the Intel Many Integrated Core (Intel MIC) architecture.

Core Count         32 cores
Hardware Threads   4 per core
IO Bus             PCIe Gen2
Memory Type        GDDR5
Memory Size        2 Gigabytes

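5 Euler Equations
\[
\begin{aligned}
\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} + \frac{\partial (\rho v)}{\partial y} + \frac{\partial (\rho w)}{\partial z} &= 0 \\
\frac{\partial (\rho u)}{\partial t} + \frac{\partial (\rho u^2 + P)}{\partial x} + \frac{\partial (\rho u v)}{\partial y} + \frac{\partial (\rho u w)}{\partial z} &= 0 \\
\frac{\partial (\rho v)}{\partial t} + \frac{\partial (\rho u v)}{\partial x} + \frac{\partial (\rho v^2 + P)}{\partial y} + \frac{\partial (\rho v w)}{\partial z} &= 0 \\
\frac{\partial (\rho w)}{\partial t} + \frac{\partial (\rho u w)}{\partial x} + \frac{\partial (\rho v w)}{\partial y} + \frac{\partial (\rho w^2 + P)}{\partial z} &= 0 \\
\frac{\partial E}{\partial t} + \frac{\partial [(E + P) u]}{\partial x} + \frac{\partial [(E + P) v]}{\partial y} + \frac{\partial [(E + P) w]}{\partial z} &= 0
\end{aligned}
\]
where \(\rho\) is the density, \(u, v, w\) are the velocities, \(P\) is the pressure, and \(E\) is the total energy per unit volume.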

6 BGK Model Boltzmann Equation
[Equation: BGK model Boltzmann equation; the equation and its symbols did not survive transcription]
The quantities defined on this slide are:
- the probability distribution function at each discrete velocity
- the discrete molecular velocities
- the bulk (averaged) velocities
- the number density
- the temperature
- the gas viscosity index
- Kn, the Knudsen number, which characterizes the flow; Kn is the ratio between the mean free path and the characteristic length
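The Knudsen number is defined on this slide as the ratio between the mean free path and the characteristic length. The sketch below computes it and maps it to a flow regime. The function names and the regime thresholds (0.01, 0.1, 10) are assumptions: they follow a commonly used textbook convention and are not taken from the talk.

```python
def knudsen_number(mean_free_path, characteristic_length):
    """Kn = mean free path / characteristic length (same length units)."""
    return mean_free_path / characteristic_length


def flow_regime(kn):
    """Classify a flow by Knudsen number (assumed textbook thresholds)."""
    if kn < 0.01:
        return "continuum"
    if kn < 0.1:
        return "slip"
    if kn < 10.0:
        return "transitional"
    return "free molecular"


# Kn = 1.199 is the value quoted for the Couette test problem in this talk.
print(flow_regime(1.199))
```

Under these assumed thresholds, the Couette test problem falls well outside the continuum range, which is where a Boltzmann-type model rather than the Euler equations is the appropriate choice.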

7 Comparison of the CFD solvers
- Two separate CFD solvers are developed to showcase the capability of the Intel MIC coprocessors
- The first is based on the Euler equations
- The second is based on the Boltzmann equation
- Both are developed using a Newton-based iterative algorithm to converge the solutions
- Data parallelism on the Intel MIC coprocessor cores is achieved through the use of OpenMP threads

                                              Euler Solver          Boltzmann Solver
Number of equations per physical grid point   5                     Hundreds of thousands
Target applications                           Inviscid fluid flow   Rarefied gas flow

8 Numerical Algorithm
Given a nonlinear set of equations, \(F(q) = 0\), application of Newton's method results in
\[ F(q^{s}) + J(q^{s})\,(q^{s+1} - q^{s}) = 0 \]
where \(s\) is the Newton iteration and \(J\) is the Jacobian as defined below
\[ J(q^{s}) = \left.\frac{\partial F}{\partial q}\right|_{q^{s}} \]
Rearranging terms leads to
\[ J(q^{s})\,\Delta q = -F(q^{s}) \]
where \(\Delta q = q^{s+1} - q^{s}\). Solving this linearized system of equations gives
\[ q^{s+1} = q^{s} + \Delta q \]
More details can be found in the dissertation of Glenn Brook:
Brook, R. Glenn, A Parallel, Matrix-Free Newton Method for Solving Approximate Boltzmann Equations on Unstructured Topologies, PhD Dissertation, University of Tennessee at Chattanooga, December [year missing]

9 Numerical Algorithm Continued
The Jacobi iterative update equation can be cast into a delta formulation as shown below
\[ D\,\Delta(\Delta q) = -F(q^{s}) - J(q^{s})\,\Delta q^{m}, \qquad \Delta q^{m+1} = \Delta q^{m} + \Delta(\Delta q) \]
where \(D\) is the diagonal of \(J\) and \(m\) is the Jacobi iteration.
The Jacobian does not need to be explicitly calculated and stored. It can be calculated implicitly through the use of dual numbers and Taylor series expansions in the dual space:
\[ F(q + \varepsilon h e_k) = F(q) + \varepsilon h\,J(q)\,e_k \]
where \(\varepsilon^2 = 0\), \(h\) is an arbitrary perturbation, and \(e_k\) is the \(k\)th vector in the standard basis for \(\mathbb{R}^n\). The dual part of \(F(q + \varepsilon h e_k)\), divided by \(h\), is therefore the \(k\)th column of \(J\).
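The slide describes obtaining Jacobian information implicitly through dual numbers, for which the Taylor series terminates after the first-order term because the dual unit squares to zero. The following is a minimal sketch of that idea, not the author's implementation: the `Dual` class, the two-equation `residual`, and `jacobian_column` are hypothetical names introduced for illustration, and only addition, subtraction, and multiplication are supported.

```python
class Dual:
    """Dual number re + du*eps, where eps**2 == 0."""

    def __init__(self, re, du=0.0):
        self.re = re
        self.du = du

    @staticmethod
    def lift(x):
        """Promote a plain number to a Dual with zero dual part."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.re + o.re, self.du + o.du)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.re - o.re, self.du - o.du)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps^2 = 0
        return Dual(self.re * o.re, self.re * o.du + self.du * o.re)

    __rmul__ = __mul__


def residual(q):
    """Hypothetical two-equation nonlinear system F(q)."""
    x, y = q
    return [x * x * y - 3.0, x + y * y * y - 5.0]


def jacobian_column(func, q, k, h=1.0):
    """k-th column of dF/dq, read from the dual part of F(q + eps*h*e_k)."""
    perturbed = [Dual(v, h if i == k else 0.0) for i, v in enumerate(q)]
    return [Dual.lift(f).du / h for f in func(perturbed)]


print(jacobian_column(residual, [2.0, 1.0], 0))
print(jacobian_column(residual, [2.0, 1.0], 1))
```

Unlike a finite-difference approximation, the result carries no truncation error, so the size of the perturbation `h` does not affect accuracy.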

10 Solver Pseudocode
Parallel for loop over grid points
    Set initial values
End parallel for loop over grid points
Loop over timesteps
    Loop over number of Newton iterations
        Apply boundary conditions
        Parallel for loop over grid points
            Reset values of dq to zero
        End parallel for loop over grid points
        Loop over number of Jacobi iterations
            Parallel for loop over grid points
                Solve for ddq via Jacobi iteration
            End parallel for loop over grid points
            Parallel for loop over grid points
                Update dq using ddq
            End parallel for loop over grid points
        End loop over number of Jacobi iterations
        Parallel for loop over grid points
            Apply dq
        End parallel for loop over grid points
    End loop over number of Newton iterations
End loop over timesteps
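The nested Newton and Jacobi loops in the pseudocode can be sketched on a small toy problem. This is an illustration under stated assumptions, not the solver from the talk: the residual is a hypothetical diagonally dominant system, the Jacobian product is written out analytically rather than obtained matrix-free, there is no time stepping or boundary-condition step, and the loops that the slide runs as OpenMP parallel loops over grid points are plain serial Python here.

```python
def residual(q, b):
    """Toy nonlinear residual F(q); a hypothetical stand-in for the flow equations."""
    n = len(q)
    out = []
    for i in range(n):
        left = q[i - 1] if i > 0 else 0.0
        right = q[i + 1] if i < n - 1 else 0.0
        out.append(4.0 * q[i] - left - right + q[i] ** 3 - b[i])
    return out


def jacobian_times(q, v):
    """Product J(q) v for the toy residual (tridiagonal: -1, 4 + 3 q_i^2, -1)."""
    n = len(q)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((4.0 + 3.0 * q[i] ** 2) * v[i] - left - right)
    return out


def newton_jacobi(b, newton_iters=20, jacobi_iters=50):
    n = len(b)
    q = [0.0] * n                          # set initial values
    for _ in range(newton_iters):          # loop over Newton iterations
        f = residual(q, b)
        dq = [0.0] * n                     # reset values of dq to zero
        for _ in range(jacobi_iters):      # loop over Jacobi iterations
            jdq = jacobian_times(q, dq)
            # Solve for ddq point by point: D * ddq = -F - J * dq
            ddq = [(-f[i] - jdq[i]) / (4.0 + 3.0 * q[i] ** 2) for i in range(n)]
            # Update dq using ddq
            dq = [dq[i] + ddq[i] for i in range(n)]
        q = [q[i] + dq[i] for i in range(n)]   # apply dq
    return q


solution = newton_jacobi([1.0, 2.0, 3.0])
print(max(abs(r) for r in residual(solution, [1.0, 2.0, 3.0])))
```

Each list comprehension over `range(n)` touches one grid point independently of the others, which is what makes the corresponding loops in the pseudocode natural candidates for a parallel for loop.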

11 Test Problems Run on the Intel MIC
Sod Shock: unsteady flow problem using the Euler equations
Initial conditions:
    High-pressure side:  ρ = 1.0, u = 0.0, P = 1.0
    Low-pressure side:   ρ = [value missing], u = 0.0, P = 0.1
The first test problem uses the Euler solver to simulate a Sod Shock. In a shock wave, the properties of a fluid change almost instantaneously. The standard Sod Shock starts off with a fluid at rest with the initial conditions shown above. The Sod Shock is a popular test case for verifying a solver's ability to appropriately capture shocks and contact discontinuities in unsteady fluid flows.
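A one-dimensional Sod initial state can be set up in a few lines. The sketch below converts the primitive values (density, velocity, pressure) into the conserved quantities an Euler solver advances. Several things here are assumptions rather than facts from the slide: the ratio of specific heats of 1.4, the diaphragm placed at the midpoint of the domain, and the low-pressure-side density of 0.125, which is the usual textbook Sod value but is not legible in this transcript.

```python
GAMMA = 1.4  # assumed ratio of specific heats (ideal diatomic gas)


def conserved(rho, u, p, gamma=GAMMA):
    """Return (rho, rho*u, E), with E the total energy per unit volume."""
    energy = p / (gamma - 1.0) + 0.5 * rho * u * u
    return (rho, rho * u, energy)


def sod_initial_state(n_cells, rho_low=0.125):
    """High-pressure state in the first half of the cells, low-pressure in the rest.

    rho_low defaults to the textbook Sod value; this is an assumption.
    """
    high = conserved(1.0, 0.0, 1.0)
    low = conserved(rho_low, 0.0, 0.1)
    half = n_cells // 2
    return [high] * half + [low] * (n_cells - half)


state = sod_initial_state(10)
print(state[0], state[-1])
```

Because the fluid starts at rest, the momentum component is zero everywhere and the total energy is purely internal energy at the initial time.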

12 Test Problems Run on the Intel MIC
Couette Flow: steady-state flow problem using the BGK model Boltzmann equation
The second test problem uses the Boltzmann solver to simulate a Couette flow. In Couette flows, gas is initially at rest between two infinitely long parallel plates. For this problem, the left plate is stationary while the right plate moves. Over time, the gas settles into a solution that does not change. Couette flow makes a great test problem to verify a solver's ability to handle solid surfaces and moving boundary conditions.
    Left plate (stationary):  u_wall = 0 m/s, T_wall = 273.0 K
    Right plate (moving):     u_wall = 300 m/s, T_wall = 273.0 K
    Initial gas state:        ρ_0 = 9.28 × 10^-8 kg/m^3, u_x0 = 0.0 m/s, u_y0 = 0.0 m/s, T_0 = 273.0 K, Kn = 1.199
This particular test problem used 27 grid cells in physical space and 36x36x36 grid points in velocity space.

13 Euler Solver Solution

14 BGK Model Boltzmann Solver Solution

15 Intel MIC Performance
[Figure: Strong Scaling Study of Euler Solver on KNF. Speedup versus Number of OpenMP Threads; series: Single Precision, Double Precision, Double Precision (2x Problem Size)]

16 Intel MIC Performance Continued
[Figure: Strong Scaling Study of BGK Solver on KNF. Speedup versus Number of OpenMP Threads; series: Single Precision]

17 Initial Intel MIC Performance Remarks
- Previous speedup results were gathered using untuned and unoptimized versions of the solvers
- Rob van der Wijngaart, a senior Intel software engineer, put the BGK model Boltzmann code through phases of optimization
- His optimizations include:
    - fusing loops to expose more parallelism
    - vectorizing loops through alignment and compiler directives
    - using intrinsics where appropriate

18 Intel MIC Performance Revisited
[Figure: Strong Scaling Study of Optimized BGK Solver on KNF. Speedup versus Number of OpenMP Threads; series: No affinity, Compact, Scatter]

19 Couette Problem Revisited
- In the previous Euler solution, doubling the problem size improved the overall speedup plot
- Here, the Couette Problem is rerun using the optimized solver provided by Rob, but this time using 37 grid cells and 46x46x46 grid points in velocity space
- That is over a 200% increase in the number of state variables to be solved
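The two problem sizes can be compared with a short calculation. The sketch below assumes one unknown per physical grid cell per velocity-space grid point, which is a guess at how the slide counts state variables; the function name is hypothetical.

```python
def state_variable_count(physical_cells, velocity_points_per_axis):
    """Unknowns = physical cells x (velocity points per axis)^3 (assumed count)."""
    return physical_cells * velocity_points_per_axis ** 3


small = state_variable_count(27, 36)   # original Couette problem
large = state_variable_count(37, 46)   # larger Couette problem
ratio = large / small

print(f"original: {small:,} unknowns")
print(f"larger:   {large:,} unknowns")
print(f"ratio:    {ratio:.2f}x")
```

Under this simple count the larger problem has 3,601,432 unknowns against 1,259,712, about 2.86 times as many, which is an increase of roughly 186%. The slide's figure of over 200% may rest on a different count that the transcript does not specify.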

20 [Figure: Strong Scaling Study of Optimized BGK Solver on KNF with Larger Problem Size. Speedup versus Number of OpenMP Threads; series: No affinity, Compact, Scatter]

21 Observations from CFD Applications developed for Intel MIC
- Performance results indicate that iterative algorithms using OpenMP threads can effectively use the Intel MIC cores
- General optimization, vectorization of loops, and making sure that each thread has plenty of work are all key to using the Intel MIC cards effectively
- Parallel speedups greater than the total number of cores available can be achieved

22 Acknowledgments
- Thanks to Intel and NICS for making this research possible
- Special thanks to Rob van der Wijngaart for optimizing the BGK model Boltzmann solver and explaining the optimizations
- Particular thanks to R. Glenn Brook for coordinating this research and providing guidance and support

23 Contact Ryan C. Hulguin


More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics July 11, 2016 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

New Technologies in CST STUDIO SUITE CST COMPUTER SIMULATION TECHNOLOGY

New Technologies in CST STUDIO SUITE CST COMPUTER SIMULATION TECHNOLOGY New Technologies in CST STUDIO SUITE 2016 Outline Design Tools & Modeling Antenna Magus Filter Designer 2D/3D Modeling 3D EM Solver Technology Cable / Circuit / PCB Systems Multiphysics CST Design Tools

More information

Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf

Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf PADC Anual Workshop 20 Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture Alexander Berreth RECOM Services GmbH, Stuttgart Markus Bühler, Benedikt Anlauf IBM Deutschland

More information

Oblique Shock Reflection From Wall

Oblique Shock Reflection From Wall Reflected Waves Already examined what happens when normal shock hits a boundary if incident shock hits solid wall, get reflected (normal) shock - required to satisfy velocity (bc) boundary condition (v=0)

More information

Steady Flow: Lid-Driven Cavity Flow

Steady Flow: Lid-Driven Cavity Flow STAR-CCM+ User Guide Steady Flow: Lid-Driven Cavity Flow 2 Steady Flow: Lid-Driven Cavity Flow This tutorial demonstrates the performance of STAR-CCM+ in solving a traditional square lid-driven cavity

More information

OPTIMIZATION OF THE CODE OF THE NUMERICAL MAGNETOSHEATH-MAGNETOSPHERE MODEL

OPTIMIZATION OF THE CODE OF THE NUMERICAL MAGNETOSHEATH-MAGNETOSPHERE MODEL Journal of Theoretical and Applied Mechanics, Sofia, 2013, vol. 43, No. 2, pp. 77 82 OPTIMIZATION OF THE CODE OF THE NUMERICAL MAGNETOSHEATH-MAGNETOSPHERE MODEL P. Dobreva Institute of Mechanics, Bulgarian

More information

Computational Fluid Dynamics (CFD) using Graphics Processing Units

Computational Fluid Dynamics (CFD) using Graphics Processing Units Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores

More information

Experiences with ENZO on the Intel Many Integrated Core Architecture

Experiences with ENZO on the Intel Many Integrated Core Architecture Experiences with ENZO on the Intel Many Integrated Core Architecture Dr. Robert Harkness National Institute for Computational Sciences April 10th, 2012 Overview ENZO applications at petascale ENZO and

More information

Fluent User Services Center

Fluent User Services Center Solver Settings 5-1 Using the Solver Setting Solver Parameters Convergence Definition Monitoring Stability Accelerating Convergence Accuracy Grid Independence Adaption Appendix: Background Finite Volume

More information

Missile External Aerodynamics Using Star-CCM+ Star European Conference 03/22-23/2011

Missile External Aerodynamics Using Star-CCM+ Star European Conference 03/22-23/2011 Missile External Aerodynamics Using Star-CCM+ Star European Conference 03/22-23/2011 StarCCM_StarEurope_2011 4/6/11 1 Overview 2 Role of CFD in Aerodynamic Analyses Classical aerodynamics / Semi-Empirical

More information

S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS

S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir

More information

Preliminary Experiences with the Uintah Framework on on Intel Xeon Phi and Stampede

Preliminary Experiences with the Uintah Framework on on Intel Xeon Phi and Stampede Preliminary Experiences with the Uintah Framework on on Intel Xeon Phi and Stampede Qingyu Meng, Alan Humphrey, John Schmidt, Martin Berzins Thanks to: TACC Team for early access to Stampede J. Davison

More information

Non-Newtonian Transitional Flow in an Eccentric Annulus

Non-Newtonian Transitional Flow in an Eccentric Annulus Tutorial 8. Non-Newtonian Transitional Flow in an Eccentric Annulus Introduction The purpose of this tutorial is to illustrate the setup and solution of a 3D, turbulent flow of a non-newtonian fluid. Turbulent

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics May 24, 2015 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

Intel Xeon Phi Coprocessors

Intel Xeon Phi Coprocessors Intel Xeon Phi Coprocessors Reference: Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, by A. Vladimirov and V. Karpusenko, 2013 Ring Bus on Intel Xeon Phi Example with 8 cores Xeon

More information

CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality

CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality Judd Kaiser ANSYS Inc. judd.kaiser@ansys.com 2005 ANSYS, Inc. 1 ANSYS, Inc. Proprietary Overview

More information

Heat transfer and Transient computations

Heat transfer and Transient computations Lecture Heat transfer and Transient computations 12-1 Introduction to TRANSIENT calculation 10-2 Motivation Nearly all flows in nature are transient! Steady-state assumption is possible if we: Ignore transient

More information

Express Introductory Training in ANSYS Fluent Workshop 06 Using Moving Reference Frames and Sliding Meshes

Express Introductory Training in ANSYS Fluent Workshop 06 Using Moving Reference Frames and Sliding Meshes Express Introductory Training in ANSYS Fluent Workshop 06 Using Moving Reference Frames and Sliding Meshes Dimitrios Sofialidis Technical Manager, SimTec Ltd. Mechanical Engineer, PhD PRACE Autumn School

More information

Faculty of Mechanical and Manufacturing Engineering, University Tun Hussein Onn Malaysia (UTHM), Parit Raja, Batu Pahat, Johor, Malaysia

Faculty of Mechanical and Manufacturing Engineering, University Tun Hussein Onn Malaysia (UTHM), Parit Raja, Batu Pahat, Johor, Malaysia Applied Mechanics and Materials Vol. 393 (2013) pp 305-310 (2013) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/amm.393.305 The Implementation of Cell-Centred Finite Volume Method

More information

CFD MODELING FOR PNEUMATIC CONVEYING

CFD MODELING FOR PNEUMATIC CONVEYING CFD MODELING FOR PNEUMATIC CONVEYING Arvind Kumar 1, D.R. Kaushal 2, Navneet Kumar 3 1 Associate Professor YMCAUST, Faridabad 2 Associate Professor, IIT, Delhi 3 Research Scholar IIT, Delhi e-mail: arvindeem@yahoo.co.in

More information

A new multidimensional-type reconstruction and limiting procedure for unstructured (cell-centered) FVs solving hyperbolic conservation laws

A new multidimensional-type reconstruction and limiting procedure for unstructured (cell-centered) FVs solving hyperbolic conservation laws HYP 2012, Padova A new multidimensional-type reconstruction and limiting procedure for unstructured (cell-centered) FVs solving hyperbolic conservation laws Argiris I. Delis & Ioannis K. Nikolos (TUC)

More information

Parallel Poisson Solver in Fortran

Parallel Poisson Solver in Fortran Parallel Poisson Solver in Fortran Nilas Mandrup Hansen, Ask Hjorth Larsen January 19, 1 1 Introduction In this assignment the D Poisson problem (Eq.1) is to be solved in either C/C++ or FORTRAN, first

More information

Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders

Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders Objective: The objective of this laboratory is to introduce how to use FLUENT to solve both transient and natural convection problems.

More information

Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform

Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform Karl W. Schulz Rhys Ulerich Nicholas Malaya Paul T. Bauman Roy Stogner Chris Simmons Abstract This paper presents

More information

A Hybrid Cartesian Grid and Gridless Method for Compressible Flows

A Hybrid Cartesian Grid and Gridless Method for Compressible Flows rd AIAA Aerospace Sciences Meeting and Exhibit,, January 5, Reno, Nevada A Hybrid Cartesian Grid and Gridless Method for Compressible Flows Hong Luo and Joseph D. Baum Science Applications International

More information

Code modernization of Polyhedron benchmark suite

Code modernization of Polyhedron benchmark suite Code modernization of Polyhedron benchmark suite Manel Fernández Intel HPC Software Workshop Series 2016 HPC Code Modernization for Intel Xeon and Xeon Phi February 18 th 2016, Barcelona Approaches for

More information

Turbostream: A CFD solver for manycore

Turbostream: A CFD solver for manycore Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware

More information

Adaptive Mesh Refinement of Supersonic Channel Flows on Unstructured Meshes*

Adaptive Mesh Refinement of Supersonic Channel Flows on Unstructured Meshes* International Journal of Computational Fluid Dynamics, 2003 Vol. not known (not known), pp. 1 10 Barron Special Issue Adaptive Mesh Refinement of Supersonic Channel Flows on Unstructured Meshes* R.C. RIPLEY

More information

EVALUATE SHOCK CAPTURING CAPABILITY WITH THE NUMERICAL METHODS IN OpenFOAM

EVALUATE SHOCK CAPTURING CAPABILITY WITH THE NUMERICAL METHODS IN OpenFOAM THERMAL SCIENCE: Year 2013, Vol. 17, No. 4, pp. 1255-1260 1255 Open forum EVALUATE SHOCK CAPTURING CAPABILITY WITH THE NUMERICAL METHODS IN OpenFOAM by Reza KHODADADI AZADBONI a*, Mohammad Rahim MALEKBALA

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information