OpenFOAM on BG/Q porting and performance
1 OpenFOAM on BG/Q: porting and performance
Paride Dagna, SCAI Department, CINECA
2 SYSTEM OVERVIEW
OpenFOAM: selected application inside the PRACE project
Fermi: PRACE Tier-0 system
Model: IBM BlueGene/Q
Architecture: 10 BG/Q frames with 2 midplanes each
Front-end nodes OS: Red Hat EL 6.2
Compute node kernel: lightweight Linux-like kernel
Processor type: IBM PowerA2, 16 cores, 1.6 GHz
Computing nodes: 10,240
Computing cores: 163,840
RAM: 16 GB/node
Internal network: network interface with 11 links -> 5D torus
Disk space: more than 2 PB of scratch space
Peak performance: 2.1 PFlop/s
3 SYSTEM OVERVIEW
Compute node (back-end): each compute node comprises 17 cores on a single chip with 16 GB of dedicated physical memory. Applications run on 16 of the cores, with the 17th core reserved for system software; nearly the full 16 GB of physical memory is available to the application. On each core it is possible to run up to 4 processes/threads, for a total of 64 processes/threads per node.
Compute card: one single-chip module, 16 GB DDR3 memory.
Applications: applications are submitted to the compute nodes by the batch scheduler; to run on the compute nodes (back-end), applications must be cross-compiled.
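As a rough illustration of how a cross-compiled solver is launched on the BG/Q back-end through the batch scheduler, the sketch below shows a minimal LoadLeveler script using runjob. The partition size, rank count, solver and case directory are placeholder values, not settings taken from these benchmarks.

  #!/bin/bash
  # Minimal LoadLeveler + runjob sketch for a BG/Q machine such as Fermi.
  # bg_size, rank counts and the case directory are illustrative only.
  # @ job_type         = bluegene
  # @ bg_size          = 64
  # @ wall_clock_limit = 01:00:00
  # @ queue

  cd /path/to/cavity3D            # placeholder case directory

  # 16 MPI ranks per node on 64 nodes = 1024 ranks; up to 64 ranks/threads
  # per node are possible by using the 4 hardware threads of each core.
  runjob --np 1024 --ranks-per-node 16 : icoFoam -parallel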
4 Porting of OpenFOAM on BG/Q
Compiling OpenFOAM for the back-end nodes on BG/Q requires some system-specific changes to the configuration scripts of OpenFOAM and of the ThirdParty package. It is not possible to use the ThirdParty MPI: rules for the BG/Q MPI must be inserted.
Environment configuration: configure the environment with compilers and zlib using modules
  module load bgq-gnu
  module load zlib
OpenFOAM configuration scripts and rules: the files bashrc and settings.sh must be changed to insert the rules for the BG/Q MPI, and the c/c++ files in the wmake/rules folders must be modified for dynamic linking.
Scotch library build: before running Allwmake in the OpenFOAM main folder, some changes are needed to the compile and dynamic-linking rules in the file Makefile.inc of the scotch library. Cross-compile and execute on the back-end the scotch dummysizes utility so that the header files scotch.h and scotchf.h are built properly.
Compile: go into $WM_PROJECT/$WM_PROJECT_VERSION and compile with ./Allwmake
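A condensed shell sketch of these porting steps is given below, assuming a Fermi-like environment; module names, paths and the exact files to edit are indicative and would have to be adapted to the specific BG/Q installation and OpenFOAM version.

  # Environment for cross-compilation (module names as on Fermi)
  module load bgq-gnu
  module load zlib

  # 1) MPI rules: edit etc/bashrc and etc/settings.sh to select the BG/Q MPI
  #    (the ThirdParty MPI cannot be used), and adapt the c/c++ files under
  #    wmake/rules so that dynamic linking works on the back-end.

  # 2) Scotch: adjust the compile and dynamic-linking rules in Makefile.inc,
  #    then cross-compile the dummysizes utility and run it on a compute node
  #    so that the generated scotch.h and scotchf.h match the back-end sizes.
  cd $WM_THIRD_PARTY_DIR/scotch*/src   # exact path depends on the scotch version
  vi Makefile.inc

  # 3) Build OpenFOAM itself
  cd $WM_PROJECT_DIR
  ./Allwmake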
5 Performance of OpenFOAM on BG/Q - Test cases
Cavity 3D: isothermal, incompressible flow - solver: icoFoam
BoxTurb 3D: homogeneous isotropic turbulence, compressible flow - solver: sonicFoam
Airfoil wing section: external aerodynamics - solver: simpleFoam
DTMB hull: marine hydrodynamics - solver: interFoam
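Each case is run in parallel following the standard OpenFOAM workflow; a minimal sketch is shown below, using the Cavity 3D case and a 1,024-way run purely as an example (on BG/Q the solver is launched with runjob through the batch scheduler rather than with mpirun).

  cd cavity3D                       # example case directory
  blockMesh                         # generate the structured mesh
  decomposePar                      # partition the mesh (system/decomposeParDict)
  mpirun -np 1024 icoFoam -parallel > log.icoFoam 2>&1
  reconstructPar                    # reassemble the solution for post-processing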
6 Performance of OpenFOAM on BG/Q - Systems
Model: IBM BlueGene/Q (Fermi)
Processor type: IBM PowerA2, 1.6 GHz
Computing node: 16 cores
RAM: 16 GB/node; 1 GB/core
Internal network: network interface with 11 links -> 5D torus
Model: Hewlett Packard C7000 (Lagrange)
Processor type: Intel Xeon Westmere, 2.8 GHz
Computing node: 12 cores
RAM: 24 GB/node; 2 GB/core
Internal network: InfiniBand QDR/DDR Voltaire, fat tree
7 Cavity 3D
Flow: laminar, isothermal, incompressible
Mesh: fully structured 3D; mesh elements: cubes
Test matrix (decomposition settings sketched below):
  1,000,000 elements - decomposition: simple, scotch - solver: icoFoam
  2,000,000 elements - decomposition: simple, scotch - solver: icoFoam
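The simple and scotch decompositions used in the tests are selected through system/decomposeParDict; a minimal sketch with illustrative values (FoamFile header omitted) is shown below.

  numberOfSubdomains  1024;

  // geometric decomposition: split the domain into nx x ny x nz blocks
  method              simple;
  simpleCoeffs
  {
      n               (16 8 8);   // product must equal numberOfSubdomains
      delta           0.001;
  }

  // graph-based decomposition: switch to "method scotch;" (no mandatory coefficients)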
8 Cavity 3D - Speed-up and efficiency - Mesh: 1,000,000 elements
Solution saved at the final time step
[Charts: speed-up and efficiency vs. number of cores - Fermi, Lagrange, Ideal]
9 Cavity 3D - Speed-up and efficiency - Mesh: 1,000,000 elements
Solution saved every 10 time steps
[Chart: speed-up vs. number of cores - Fermi, Lagrange, Ideal]
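The two I/O regimes compared here, writing only the final solution versus writing every few time steps, correspond to the write settings in system/controlDict; a minimal excerpt with illustrative values follows.

  writeControl     timeStep;    // write every writeInterval time steps
  writeInterval    10;          // frequent writes stress the I/O subsystem;
                                // a very large value writes only the last step
  purgeWrite       0;           // keep all written time directories
  writeFormat      ascii;
  writeCompression off;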
10 Cavity 3D - Profiling
Number of iterations: 1
Files per core: 3
MPI_Allreduce average message size per core: 8 B at 1024 cores
Average message size sent and received per core: 4.6 KB at 1024 cores
[Table: cumulative I/O (GB) and file size per core (MB) at 64 to 1024 cores]
[Charts: MPI and I/O profiling at 512 and 1024 cores; I/O overhead on simulation time (% increment), Fermi vs. Lagrange]
11 Cavity 3D - Speed-up and efficiency - Mesh: 2,000,000 elements
Solution saved at the final time step
[Charts: speed-up and efficiency vs. number of cores - Fermi, Lagrange, Ideal]
12 Cavity 3D - Speed-up and efficiency - Mesh: 2,000,000 elements
Solution saved every 10 time steps
[Chart: speed-up vs. number of cores - Fermi, Lagrange, Ideal]
13 Cavity 3D - Profiling
Number of iterations: 1
Files per core: 3
MPI_Allreduce average message size per core: 8 B at 1024 cores
Average message size sent and received per core: 6.4 KB at 1024 cores
[Table: cumulative I/O (GB) and file size per core (MB) at 64 to 1024 cores]
[Charts: MPI and I/O profiling at 512 and 1024 cores; I/O overhead on simulation time (% increment), Fermi vs. Lagrange]
14 BoxTurb 3D
Flow: compressible
Case study: homogeneous, isotropic turbulence
Mesh: uniform 3D
Number of cells: 17,000,000
Solver: sonicFoam
Partition method: simple
Courtesy of: Matteo Cerminara (INGV), Pisa
15 BoxTurb 3D - Speed-up and efficiency
Solution saved at the final time step
Partition method: simple
[Charts: speed-up and efficiency vs. number of cores - Fermi, Lagrange, Ideal]
16 BoxTurb 3D - Speed-up and efficiency
Solution saved every 10 time steps
Partition method: simple
[Charts: speed-up and efficiency vs. number of cores - Fermi, Lagrange, Ideal]
17 BoxTurb 3D - Profiling
Number of iterations: 18
Files per core: 4
MPI_Allreduce average message size per core: 8 B at 1024 cores
Average message size sent and received per core: 9.3 KB at 1024 cores
[Table: cumulative I/O (GB) and file size per core (MB) at 64 to 1024 cores]
[Charts: MPI and I/O profiling at 512 and 1024 cores; I/O overhead on simulation time (% increment), Fermi vs. Lagrange]
18 Airfoil wing section
Flow: turbulent, incompressible
Case study: steady state, extruded NACA airfoil
Mesh: fully structured 3D
Number of cells: 9,000,000
Solver: simpleFoam
Decomposition method: simple, scotch
19 Airfoil wing section - Speed-up and efficiency
Solution saved at the final time step
20 Airfoil wing section - Profiling
[Charts: MPI profiling with the simple and scotch decompositions, each at two core counts]
21 Airfoil wing section - Speed-up and efficiency
Solution saved every 10 time steps
[Chart: speed-up vs. number of cores - Fermi, Lagrange, Ideal]
22 Airfoil wing section - Profiling
Decomposition method: scotch
Number of iterations: 1
Files per core: 6
MPI_Allreduce average message size per core: 8 B at 512 cores
Average message size sent and received per core: 4.2 KB at 512 cores
[Table: cumulative I/O (GB) and file size per core (MB) at 64 to 1024 cores]
[Charts: MPI and I/O profiling at 512 and 1024 cores; I/O overhead on simulation time (% increment), Fermi vs. Lagrange]
23 Free surface - DTMB hull 3D
Flow: turbulent, incompressible
Case study: unsteady, multiphase
Mesh: unstructured 3D
Number of cells: 5,500,000
Solver: interFoam
Decomposition method: simple, scotch
24 Free surface - DTMB hull 3D - Speed-up and efficiency
Solution saved at the final time step
[Charts: speed-up and efficiency vs. number of cores - Fermi, Lagrange, Ideal]
25 Free surface - DTMB hull 3D - Speed-up and efficiency
Solution saved every 10 time steps
[Charts: speed-up and efficiency vs. number of cores]
26 Free surface - DTMB hull 3D - Profiling
Number of iterations: 1
Files per core: 8
MPI_Allreduce average message size per core: 8 B at 512 cores
Average message size sent and received per core: 29.4 KB at 512 cores
[Table: cumulative I/O (GB) and file size per core (MB) at 64 to 512 cores]
[Charts: MPI and I/O profiling at 256 and 512 cores; I/O overhead on simulation time (% increment), Fermi vs. Lagrange]
27 Conclusions
OpenFOAM scaling and efficiency on Fermi and on classic HPC systems are comparable; for well-suited case studies, with a good balance between computation, I/O and MPI communication, the larger number of cores available on Fermi can be exploited.
OpenFOAM efficiency and scaling are constrained by its poor I/O design and by inter-process communication.
A new I/O scheme based on MPI parallel I/O routines, or on existing parallel I/O libraries able to exploit the parallel file system efficiently, should dramatically reduce the I/O overhead.
A hybrid multi-threaded MPI/OpenMP version of the solvers would mitigate the time spent in MPI routines as the number of cores increases.
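To illustrate the kind of collective I/O scheme the conclusions point to, the sketch below shows every rank writing its partition of a field to a single shared file with MPI-IO. This is a generic illustration, not OpenFOAM code; the field size and file name are made up.

  // Minimal MPI-IO sketch: each rank writes its local part of a field into one
  // shared file at a disjoint offset, using a collective call so the MPI library
  // can aggregate requests for the parallel file system. Illustrative only.
  #include <mpi.h>
  #include <vector>

  int main(int argc, char** argv)
  {
      MPI_Init(&argc, &argv);
      int rank, nranks;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nranks);

      const MPI_Offset cellsPerRank = 100000;           // hypothetical partition size
      std::vector<double> p(cellsPerRank, 1.0 * rank);  // local part of the pressure field

      MPI_File fh;
      MPI_File_open(MPI_COMM_WORLD, "p.bin",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

      MPI_Offset offset = rank * cellsPerRank * sizeof(double);
      MPI_File_write_at_all(fh, offset, p.data(), cellsPerRank,
                            MPI_DOUBLE, MPI_STATUS_IGNORE);

      MPI_File_close(&fh);
      MPI_Finalize();
      return 0;
  }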
28 Acknowledgements
Bob Danani - VLSCI, Carlton, Melbourne
Matteo Cerminara - INGV
Massimiliano Culpo - CINECA
Piero Lanucara - CINECA
Andrea Penza - CINECA
Francesco Salvadore - CINECA
Ivan Spisso - CINECA
29 Questions?