Optimizing TELEMAC-2D for Large-scale Flood Simulations

Available on-line at www.prace-ri.eu

Partnership for Advanced Computing in Europe

Charles Moulinec a,*, Yoann Audouin a, Andrew Sunderland a

a STFC Daresbury Laboratory, UK
* Corresponding author. E-mail: charles.moulinec@stfc.ac.uk

Abstract

This report details optimization undertaken on the Computational Fluid Dynamics (CFD) software suite TELEMAC, a modelling system for free-surface waters with over 200 installations worldwide. The main focus of the work has been the elimination of memory bottlenecks occurring at the pre-processing stage, which have historically limited the size of the simulations that can be processed. This has been achieved by localizing global arrays in the pre-processing tool, known as PARTEL. Parallelism in the partitioning stage has also been improved by replacing the serial partitioning tool with a new parallel implementation. These optimizations have enabled massively parallel runs of TELEMAC-2D, a Shallow Water Equations based code, involving over 200 million elements to be undertaken on Tier-0 systems. Such runs simulate extreme flooding events on very fine meshes (locally less than one metre). Simulations at this scale are crucial for predicting and understanding flooding events occurring, e.g., in the region of the Rhine river.

Project ID: PRA4IC

1. Introduction

An increasing number of the world's population inhabit areas, such as river basins, that are at significant risk of serious flooding. It is therefore essential that tools are developed to assess the impact of flooding on wetted regions, and ultimately to better warn people of serious events. Numerical tools are of vital importance in aiding a better understanding of flooding impact. TELEMAC [1, 2] enables, among other applications, the simulation of river systems, and can model free-surface flows, including flooding, wetting and drying. The system is highly portable and has been under development for over 20 years by EDF R&D. The whole system will go Open-Source in August 2011, with TELEMAC-2D (GPL licensing), the BIEF (Bibliothèque d'Éléments Finis) library (LGPL licensing) and the pre-processing libraries already available to any user. TELEMAC-2D is based on the depth-integrated Shallow Water (hydrostatic) Equations, which are valid when the horizontal length scale of the flow is much greater than the vertical scale.

A research project between the Bundesanstalt für Wasserbau (BAW, Karlsruhe, Germany) [3] and the Science and Technology Facilities Council (STFC, Daresbury, UK) [4] has recently been agreed to investigate flooding of the Rhine river from Bonn to the North Sea. The originality of this work resides in the fact that the flooding of this long section of river (about 250 km) will be treated in a single simulation by a 2D approach (TELEMAC-2D), with a fine resolution of less than a metre in some parts of the mesh. It is expected that simulations involving grids of finer resolution will produce more accurate results, thus enabling a better understanding of the likelihood and extent of flooding effects at any place and time. This geometry has been meshed with 5M elements. Some results already exist for portions of the Rhine river between Bonn and the North Sea, which have been studied by BAW, but the complete mesh has not yet been run. These intermediate data will be used for comparison. Two larger meshes have been identified to investigate the quality of the results and their sensitivity to the grid size.
The first mesh (20M elements) will be built by applying one level of refinement to the 5M element mesh, and the second by refining it twice (80M elements). Tier-1 systems can be used for the smaller cases, but simulations on Tier-0 systems are required to run calculations involving meshes of 80M elements and beyond. Introducing the capability to prepare and compute these larger problem sizes with TELEMAC-2D is the focus of this project.

TELEMAC-2D has already been run successfully on the Argonne BlueGene/P (BG/P) Intrepid [5] (a machine with the same architecture as Jugene [6]) on up to 16,384 cores for a straight-channel demonstration case of 25M elements. Here, performance was shown to scale very well up to 4,096 cores (VN mode). To generate the input data for these cases, a parallel prototype of the pre-processor was designed, which performs the pre-processing for the 25M case much more quickly, but whose memory overheads, due to replicated data structures, limit the problem sizes that can be addressed.

Another limitation in using this prototype for pre-processing runs is that the number of parallel tasks must match the number used in the subsequent target calculation with TELEMAC-2D (the 25M case required 16K cores of BG/P, but in SMP mode, to pre-process, whereas TELEMAC-2D is run in VN mode). The pre-processing of any grid with more than 25M elements for subsequent Tier-0 simulations therefore requires a new memory-optimized parallel version based on the original serial pre-processor, PARTEL. The number of parallel tasks used by this new tool, called PARTEL_P, is independent of the number of sub-domains used by TELEMAC-2D.

This paper is arranged as follows: Section 2 describes the computational approach used by TELEMAC-2D, with a special focus on the formats of the I/O; Sections 3 and 4 explain the strategy used to redesign the pre-processor; Section 5 presents the timings obtained with the new pre-processor; Section 6 measures TELEMAC-2D performance for a 200-million element grid test case.

2. TELEMAC-2D Computational Approach

The TELEMAC system is a multi-scale free-surface hydrodynamics suite able to solve the Shallow Water Equations (TELEMAC-2D) and the Navier-Stokes Equations (TELEMAC-3D), depending on the topology of the configuration and the approximation used in the calculation of the vertical velocity. The system relies on the BIEF Finite Element library, which contains basic operations, a few linear solvers, and some of the discretisation schemes used in the hydrodynamics solvers. As the scientific project aims at solving the Shallow Water Equations, the following description is restricted to the computational properties of TELEMAC-2D. The steps to perform a simulation with the TELEMAC system proceed as follows:

- Generation of the grid (triangular elements) with a mesh generator, taking into account the bathymetry. This step is performed in serial with the currently existing tools. It is also possible to globally refine an existing mesh to increase resolution; this too is performed in serial, with a tool that has recently been optimised.
- Pre-processing, including mesh partitioning by METIS 5.0pre2 (serial version) [7] and calculation of the connectivities, boundary conditions, halo cells, and the pre-processing required by the method of characteristics for advection (if used). The mesh partitioning and all other pre-processing tasks are performed by the same tool, i.e. PARTEL. Serial mesh partitioning is limited by memory availability, whereas the remaining pre-processing tasks are limited by time constraints. Two versions of PARTEL exist: a fully serial one, and a partially parallel prototype which runs the partitioning serially but performs the rest of the pre-processing in parallel, on the same number of processors as the number of sub-domains. This prototype uses global arrays, because it was designed to speed up the pre-processing; to date, no optimization in terms of memory had been undertaken.
- Solution of the shallow water equations (recalled below) using TELEMAC-2D. The equations may be solved coupled or with the help of a wave equation, depending on the option chosen. The space discretisation is, in general, linear. Several advection schemes are available and used depending on the flow, namely the method of characteristics, the Streamline-Upwind Petrov-Galerkin (SUPG) scheme, and Residual Distributive Schemes (N-scheme and PSI-scheme). Matrix storage is edge-based. Several linear solvers are available in the BIEF library, e.g. Conjugate Gradient, Conjugate Residual, CGSTAB and GMRES.
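For reference, the depth-averaged equations referred to in the last step above can be written in the non-conservative form commonly used for TELEMAC-2D (see [2]); the exact treatment of the diffusion and source terms depends on the options selected:

\[
\frac{\partial h}{\partial t} + \mathbf{u}\cdot\nabla h + h\,\nabla\cdot\mathbf{u} = S_h,
\]
\[
\frac{\partial u}{\partial t} + \mathbf{u}\cdot\nabla u = -g\,\frac{\partial Z}{\partial x} + S_x + \frac{1}{h}\,\nabla\cdot\!\left(h\,\nu_t\,\nabla u\right),
\]
\[
\frac{\partial v}{\partial t} + \mathbf{u}\cdot\nabla v = -g\,\frac{\partial Z}{\partial y} + S_y + \frac{1}{h}\,\nabla\cdot\!\left(h\,\nu_t\,\nabla v\right),
\]

where h is the water depth, Z the free-surface elevation, u = (u, v) the depth-averaged velocity, g the gravitational acceleration, \nu_t an effective diffusion coefficient, and S_h, S_x, S_y source terms (e.g. friction, wind, rain).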
TELEMAC-2D is fully parallelised with MPI. The input files consist of a parameter file (ASCII) read by all the processors, plus a geometry file (binary, SELAFIN format) and a boundary file (ASCII) per MPI task, each read by its own processor. Those files are generated by one of the PARTEL tools (either serial or parallel), which prepares the initial geometry (binary, SELAFIN format) and boundary (ASCII) files for each MPI task. Output files are handled in the same way, with a result file (binary, SELAFIN format) per processor, as well as another output file (ASCII) per processor showing the evolution of the simulation. The result file can also be used to restart a simulation.

3. Outline of PARTEL_P

To overcome the 2-10 million element grid limit of the serial pre-processor, PARTEL, a parallel version called PARTEL_P has been developed within the PRACE-1IP project; it runs on NPROCS cores and partitions grids into NSUBS sub-domains. It should be noted that the current version of PARTEL_P supports neither parallel I/O nor the method of characteristics at the pre-processing stage.

3.1. Description of files output by PARTEL

Two files per sub-domain are output by PARTEL: a geometry file in SELAFIN format and a boundary file in ASCII format following the TELEMAC-2D standard. The geometry file contains a header, the numbers of elements, nodes, physical boundaries and interfaces for a given sub-domain, the local connectivities of the nodes, the local-to-global node table and, finally, the coordinates and/or other quantities known at each node. The boundary file contains the information for the physical boundaries, the number of interfaces with other sub-domains, and the information required to handle the interfaces. Each physical boundary requires knowledge of the neighbouring nodes located in a different sub-domain. The treatment of the interfaces is more complex, as the number of contiguous sub-domains has to be known, as well as their partition index. The interfaces also need to be sorted into ascending order to comply with the TELEMAC-2D standard.
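As a purely illustrative sketch, the per-sub-domain information listed above can be pictured as the following Fortran derived type. The field names are invented for this sketch, are not identifiers from the TELEMAC or PARTEL sources, and the real SELAFIN records are laid out differently.

module subdomain_sketch
  implicit none
  ! Hypothetical container mirroring what is written per sub-domain.
  type :: subdomain_files
     ! --- geometry file (binary, SELAFIN format) ---
     integer :: nelem                              ! number of elements in the sub-domain
     integer :: npoin                              ! number of local nodes
     integer :: nbnd                               ! number of physical boundary nodes
     integer :: nitf                               ! number of interface nodes shared with other sub-domains
     integer,          allocatable :: connec(:,:)  ! local connectivity (3 nodes per triangle)
     integer,          allocatable :: loc2glob(:)  ! local-to-global node table
     double precision, allocatable :: x(:), y(:)   ! coordinates and/or other nodal quantities
     ! --- boundary file (ASCII, TELEMAC-2D standard) ---
     integer, allocatable :: bnd_code(:)           ! boundary-condition type per physical boundary node
     integer, allocatable :: itf_glob(:)           ! global index of each interface node
     integer, allocatable :: itf_ndom(:)           ! number of sub-domains sharing each interface node
  end type subdomain_files
end module subdomain_sketch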

4. Description of PARTEL_P

PARTEL_P is actually split into two parts: PARTEL_P1 is used to generate NPROCS files to be read by PARTEL_P2, so as to reduce memory consumption. These two programs will be merged in the future, since both PARTEL_P1 and PARTEL_P2 are run on the same number of cores.

PARTEL_P1 is used to distribute the information of NSUBS/NPROCS sub-domains over the NPROCS cores. The initial stage is mainly serial, as no attempt has yet been made to improve the I/O operations. The input parameters are read, i.e. the name of the geometry file, the name of the boundary file, NSUBS and the library used to partition the grid. Each processor reads the geometry and the boundary files, and calls the subroutines VOISIN_PARTEL, ELEBD_PARTEL and FRONT2_PARTEL. The partitioning is performed using either METIS or SCOTCH [8] on the master node (ParMETIS and PT-SCOTCH have yet to be tested), and its output is broadcast to the other cores. NSUBS/NPROCS sub-domains are gathered over the NPROCS cores in order to reduce the array sizes in PARTEL_P2. Two files per core are written: one for the geometry, and the other for the boundary conditions, with some additional information compared to a regular boundary condition file. PARTEL_P1 transmits to PARTEL_P2 the information about the adjacent neighbouring nodes located in a different sub-domain that resides on a different core. Interfaces are not dealt with at this stage.

PARTEL_P2 is run on NPROCS cores. It first reads the input parameters, i.e. the name of the original geometry file, the name of the original boundary file, NSUBS and NPROCS. Each core then reads the files output by PARTEL_P1, which contain information for NSUBS/NPROCS sub-domains. The number of elements, nodes, physical boundaries and interfaces per sub-domain is easily computed. This information, together with the knowledge of the sub-domain local connectivity and coordinates, helps build the NSUBS geometry files that are read by TELEMAC-2D. The local-to-global node table is also easily accessible.

The information relating to the interfaces of all NSUBS sub-domains then has to be computed. First, the neighbouring nodes located on the same core but in a different sub-domain have to be identified. Working on a given core, all the physical boundaries are gathered in an array containing their global index. This array is sorted in ascending order, and global indices that occur twice or more indicate that the corresponding nodes belong to several sub-domains. Their neighbours are easily identified, and the array is sorted back to its original structure to comply with the TELEMAC-2D standard. The interfaces are then treated globally. A loop over all the NSUBS/NPROCS sub-domains allows the code to gather the interfaces of all the NPROCS cores before using MPI_Allgatherv to obtain their global index, as well as the index of the sub-domain they belong to. This array is sorted by global index in ascending order. The number of consecutive occurrences, NINTERF, of a given global index indicates that the same interface belongs to NINTERF sub-domains, and these partition indices have to be saved. To comply with the TELEMAC-2D standard, the information per interface also has to be sorted. All this information is then distributed in two stages, first onto the NPROCS cores, using an MPI_Scatterv call, and then to the NSUBS sub-domains. The information relating to the physical boundaries and the interfaces is finally copied into the boundary files that are read by TELEMAC-2D.
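The gather-and-detect step just described can be illustrated with the following minimal Fortran/MPI sketch. It assumes, for simplicity, one sub-domain per MPI rank and uses toy data; the variable names and the insertion sort are illustrative and are not taken from PARTEL_P.

program interface_sketch
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i, j, nloc, ntot
  integer :: sendn(1)
  integer, allocatable :: loc_idx(:), counts(:), displs(:), all_idx(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Toy data: each rank (sub-domain) holds a few global node indices,
  ! chosen so that neighbouring ranks overlap.
  nloc = 3
  allocate(loc_idx(nloc))
  loc_idx = [rank + 1, rank + 2, rank + 3]

  ! Gather the per-rank list sizes, then the lists themselves, on every rank.
  allocate(counts(nprocs), displs(nprocs))
  sendn(1) = nloc
  call MPI_Allgather(sendn, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, MPI_COMM_WORLD, ierr)
  displs(1) = 0
  do i = 2, nprocs
     displs(i) = displs(i-1) + counts(i-1)
  end do
  ntot = sum(counts)
  allocate(all_idx(ntot))
  call MPI_Allgatherv(loc_idx, nloc, MPI_INTEGER, all_idx, counts, displs, &
                      MPI_INTEGER, MPI_COMM_WORLD, ierr)

  ! Sort the gathered global indices in ascending order.
  call sort_ascending(all_idx)

  ! A global index occurring more than once is an interface node shared
  ! by that many sub-domains (NINTERF in the text above).
  if (rank == 0) then
     i = 1
     do while (i <= ntot)
        j = i
        do while (j < ntot)
           if (all_idx(j+1) /= all_idx(i)) exit
           j = j + 1
        end do
        if (j > i) write(*,'(a,i0,a,i0,a)') 'global node ', all_idx(i), &
             ' shared by ', j - i + 1, ' sub-domains'
        i = j + 1
     end do
  end if

  call MPI_Finalize(ierr)

contains

  subroutine sort_ascending(a)   ! simple insertion sort, sufficient for a sketch
    integer, intent(inout) :: a(:)
    integer :: k, m, tmp
    do k = 2, size(a)
       tmp = a(k)
       m = k - 1
       do while (m >= 1)
          if (a(m) <= tmp) exit
          a(m+1) = a(m)
          m = m - 1
       end do
       a(m+1) = tmp
    end do
  end subroutine sort_ascending

end program interface_sketch

In PARTEL_P itself, the gathered entries also carry the index of the sub-domain they belong to, so that the partition indices of the NINTERF owners can be saved at this point before the information is scattered back.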
5. Timings for PARTEL_P1 and PARTEL_P2

PARTEL_P1 and PARTEL_P2 have been run to pre-process a 200-million element grid, and the output is used in the next section to test TELEMAC-2D. Tables 1 and 2 indicate the total time spent by PARTEL_P to pre-process the grid into 4096, 8192, 16384 and 32768 sub-domains, using METIS and SCOTCH respectively as the partitioner. Overall, partitioning with METIS allows a faster pre-processing. All PARTEL_P1 runs are faster when METIS rather than SCOTCH is used as the partitioner; however, PARTEL_P2 is normally faster when SCOTCH is used. A more thorough study should be able to confirm whether this is because the edge-cut is smaller with SCOTCH, which has a direct impact on the global communications used in PARTEL_P2.

Table 1. CPU time (s) (IBM POWER7) for PARTEL_P1 and PARTEL_P2 to pre-process the 200-million element grid, using METIS as the partitioner (columns: PARTEL_P1, PARTEL_P2, PARTEL_P).

Table 2. CPU time (s) (IBM POWER7) for PARTEL_P1 and PARTEL_P2 to pre-process the 200-million element grid, using SCOTCH as the partitioner (columns: PARTEL_P1, PARTEL_P2, PARTEL_P).

6. Scaling Performance of TELEMAC-2D

The 200-million element grid has been used to evaluate the performance of TELEMAC-2D on up to 32,768 cores of Argonne's IBM Blue Gene/P [5]. PARTEL_P was used to perform the pre-processing, with both METIS and SCOTCH being used as partitioners. The positive stream-wise implicit (PSI) advection scheme was selected, since PARTEL_P does not yet support the method of characteristics. The scaling performance of TELEMAC-2D was evaluated using simulations of 60 seconds of simulated time (1200 time steps). The CPU time is reported both as the time for the executable to complete (T_TOTAL) and as the time difference between the end and the beginning of the main program, homere_telemac2d.f (T_SOLVER).

Fig. 1. Scaling performance of TELEMAC-2D for the 200-million element grid on the IBM BG/P.

Figure 1 shows that T_SOLVER decreases linearly as a function of the number of cores, whether METIS or SCOTCH is used as the partitioner. Good performance is observed with about 6,100 elements per core. A 65,536 sub-domain simulation would help assess the performance of TELEMAC-2D with half this number (about 3,000 elements) assigned to each core. However, T_TOTAL shows a different behaviour, with no real speedup for the 32,768-core simulations. This non-scaling behaviour is probably due to the time spent opening and closing files, along with aspects of the way the system manages the simulations.
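Both timings above are wall-clock measurements; a solver-only figure such as T_SOLVER can be obtained with a pattern like the minimal sketch below, which wraps the main program body with MPI_Wtime. This is only an illustration of the measurement, not the actual instrumentation of homere_telemac2d.f.

program timing_sketch
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: t_begin, t_end

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  t_begin = MPI_Wtime()
  ! ... the time-stepping loop of the solver would run here ...
  t_end = MPI_Wtime()

  if (rank == 0) write(*,'(a,f12.3,a)') 'solver time: ', t_end - t_begin, ' s'

  call MPI_Finalize(ierr)
end program timing_sketch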

7. Summary of Results

The pre-processing stage of PARTEL has been re-written and optimized into PARTEL_P, which runs on NPROCS MPI tasks (to date typically up to three 256 GB RAM multi-core nodes of an IBM POWER7 cluster) and can deal with up to 100K NSUBS sub-domains. This optimized pre-processing stage now enables very large scale TELEMAC-2D simulations on the Jugene IBM BG/P. The pre-processing still takes place in two stages, with Fortran data files in between, and has been optimized as follows:

1. The first stage, run on NPROCS cores, reads the original mesh, partitions it into NSUBS sub-domains and writes two files per core. These two files contain information for NSUBS/NPROCS sub-domains: the first contains the geometry quantities (positions and connectivity between elements and nodes), and the second the information concerning boundary conditions and interfaces between sub-domains.
2. The second stage is also run on NPROCS cores and reads the output of the first stage. Improvements made to this stage mean that data are now distributed, rather than replicated, thereby markedly reducing the local memory consumption. The outputs of this stage are the geometry and boundary files readable by TELEMAC-2D.

First results of the new pre-processing stage applied to a 200M element grid partitioned into 32,768 sub-domains on 8 and 24 MPI tasks on the IBM POWER7 cluster are now obtained in close to three hours (9524 s), rather than the previous run-times of several days with the serial PARTEL. The results from the new tool have been verified using different parallel runs with NPROCS=8 and NPROCS=24. METIS 5.0, ParMETIS, SCOTCH and PT-SCOTCH have all been implemented in the pre-processing stage. A demonstration run on the IBM POWER7 cluster has shown that serial SCOTCH is able to partition a 400M element demonstration case into 294,912 sub-domains. This has fully prepared suitable partitioned grids for future parallel runs using large numbers of cores (up to the largest available job size) on Jugene for the large datasets described in this project.

Acknowledgements

This work was financially supported by the PRACE project, funded in part by the EU's 7th Framework Programme (FP7) under grant agreements no. RI and FP. The work was carried out using the PRACE Research Infrastructure resource Jugene at Jülich, Germany. This research also used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH. The authors would also like to thank the UK Engineering and Physical Sciences Research Council (EPSRC) for their support of Collaborative Computational Project 12 (CCP12) and the Distributed Computing Group at STFC Daresbury Laboratory.

References

1. The TELEMAC system.
2. J.-M. Hervouet, Hydrodynamics of Free Surface Flows: Modelling with the Finite Element Method, Wiley.
3. Bundesanstalt für Wasserbau (BAW).
4. STFC Computational Engineering Group.
5. Argonne Blue Gene/P Intrepid.
6. Jülich Blue Gene/P Jugene.
7. METIS 5.0.
8. SCOTCH.
