ANSYS HPC Technology Leadership
|
|
- Miles Dixon
- 6 years ago
- Views:
Transcription
1 ANSYS HPC Technology Leadership 1 ANSYS, Inc. November 14,
2 Why ANSYS Users Need HPC Insight you can t get any other way It s all about getting better insight into product behavior quicker! HPC enables high-fidelity Include details - for reliable results Getting it right the first time Innovate with confidence HPC enables design exploration & optimization Consider multiple design ideas Optimize the design Ensure performance across range of conditions 2 ANSYS, Inc. November 14,
3 15 % spent on R&D 570 software developers Partner relationships ANSYS HPC Leadership A History of HPC Performance Parallel dynamic moving/deforming mesh Distributed memory particle tracking Integration with load management systems Support for Linux clusters, low latency interconnects 10M cell fluids simulations, 128 processors Ideal scaling to 2048 cores (fluids) Teraflop performance at 512 core (structures) Parallel I/O (fluids) Domain Decomposition introduced (HFSS 12) Parallel meshing (fluids) Support for clusters using Windows HPC Parallel dynamic mesh refinement and coarsening Dynamic load balancing st general-purpose parallel CFD with interactive client-server user environment s Vector Processing on Mainframes Ideal scaling to 3072 cores (fluids) Hybrid parallel for sustained multicore performance (fluids) Architecture-aware partitioning (fluids) GPU acceleration (structures) Optimized performance on multicore processors 1 st One Billion cell fluids simulation Distributed sparse solver Distributed PCG solver Variational Technology DANSYS released Distributed Solve (DSO) HFSS st company to solve 100M structural DOF Today s multi-core / many-core 64bit hardware large memory addressing evolution Shared memory multiprocessing (HFSS 7) makes HPC a software development imperative. ANSYS 1990 is committed to maintaining performance 1990 Shared Memory Multiprocessing for structural simulations leadership Iterative PCG Solver Introduced for large structural analysis 3 ANSYS, Inc. November 14, ANSYS, Inc. Proprietary
4 HPC Hardware Terminology Machine 1 (or Node 1) Core1 Core2 Core3 Core4 Processor 1 (or Socket 1) Core1 Core2 Core3 Core4 Processor 2 (or Socket 2) Interconnect (gige or Infiniband) Memory (RAM) Machine N (or Node N) Core1 Core2 Core3 Core4 Processor 1 (or Socket 1) Core1 Core2 Core3 Core4 Processor 2 (or Socket 2) Memory (RAM) GPU 4 ANSYS, Inc. November 14,
5 HPC A Software Development Imperative Clock Speed Leveling off Core Counts Growing Exploding (GPUs) Future performance depends on highly scalable parallel software 5 Source: ANSYS, Inc. November 14,
6 ANSYS Fluent Scaling Hardware (Intel Harpertown, DDR IB) 2010 Hardware (Intel Westmere, QDR IB) RATING IDEAL RATING IDEAL Number of Cores Systems keep improving: faster processors, more cores Ideal rating (speed) doubled in two years! ANSYS, Inc. November 14, Number of Cores Memory bandwidth per core and network latency/bw stress scalability 2008 release (12.0) re-architected MPI huge scaling improvement, for a while 2010 release (13.0) introduces hybrid parallelism and scaling continues!
7 ANSYS Fluent Scaling Extreme CFD Scaling s of cores Enabled by ongoing software innovation Hybrid parallel: fast shared memory communication (OpenMP) within a machine to speed up overall solver performance; distributed memory (MPI) between machines 7 ANSYS, Inc. November 14,
8 ANSYS Fluent Scaling 14.0 Improved HPC performance through Faster auto-partitioning for multi-core clusters Improved scalability for simulations with monitors enabled Optimized writing of cluster-to-cluster view factor data Cores: 1.1 million surface clusters R13 R million surface clusters ANSYS, Inc. November 14, Sample clusterto-cluster view factor data writing using 32-way parallel and InfiniBand Example data for scaling with R14 monitors
9 ANSYS Fluent Scaling 14.0 sedan_4m truck_14m Rating Rating NCores NCores truck_111m Rating HPC configuration 2-socket Intel Westmere hex-core 2.93 GHz 2 X 6= 12 cores per node QDR InfiniBand NCores 9 ANSYS, Inc. November 14,
10 ANSYS Fluent Scaling sedan_4m 2500 truck_poly_14m Rating Rating NCores NCores 350 truck_111m Architecture aware partitioning to minimize inter-machine communications Rating Hierarchy based hybrid AMG at coarsest levels 4-socket AMD Magny-Cour 2.3GHz 4 X 12 = 48 cores per node QDR InfiniBand NCores 10 ANSYS, Inc. November 14,
11 ANSYS CFD Scaling 14.0 Network-aware partitioning Original partitions are remapped to the cluster considering the network topology and latencies Minimizes inter-machine traffic reducing load on network switches Improves performance, particularly on slow interconnects and/or large clusters Partition Graph 3 machines, 8 cores each Colors indicate machines Original mapping New mapping 11 ANSYS, Inc. November 14,
12 12 ANSYS Fluent File I/O Performance 13.0 Case file IO Both read and write significantly faster in R13 A combination of serial-io optimizations as well as parallel-io techniques, where available Parallel-IO (.pdat) Significant speedup of parallel IO, particularly for cases with large number of zones Support for Lustre, EMC/MPFS, AIX/GPFS file systems added Data file IO (.dat) Performance in R12 was highly optimized. Further incremental improvements done in R13 ANSYS, Inc. November 14, 91.2 Parallel Data write truck_14m, case read R13 vs. R12 BMW -68% FL5L2 4M -63% Circuit -97% Truck 14M -64%
13 ANSYS Mechanical Scaling Sparse Solver (Parallel Re-Ordering) Focus on bottlenecks in Solution Rating Solution Rating R12.1 R13.0 Number of cores PCG Solver (Pre-Conditioner Scaling) R12.1 R13.0 the distributed memory solvers (DANSYS) sparse direct solver Parallelized equation ordering 40% faster w/ updated Intel MKL Preconditioned Conjugate Gradient (PCG) iterative solver Parallelized preconditioning step ANSYS, Inc. November 14, Number of cores
14 What about GPU Computing? CPUs and GPUs work in a collaborative fashion CPU GPU PCI Express channel Multi-core processors Typically 4-6 cores Powerful, general purpose Many-core processors Typically hundreds of cores Great for highly parallel code, within memory constraints 14 ANSYS, Inc. November 14,
15 ANSYS Mechanical SMP GPU 13.0 Solver Kernel Speedups Overall Speedups From NAFEMS World Congress May Boston, MA, USA Accelerate FEA Simulations with a GPU -by Jeff Beisheim, ANSYS 15 ANSYS, Inc. November 14, Tesla C2050 and Intel Xeon 5560
16 Distributed ANSYS Supports MDOF, Nonlinear Structural Analysis using the Distributed Sparse Solver GPU Acceleration can now be used with Distributed ANSYS to combine the advantage of GPU technology and the power of distributed ANSYS 16 ANSYS, Inc. November 14,
17 Distributed ANSYS Supports Distributed ANSYS 14.0 Total Simulation Speedups (ANSYS 13.0 benchmark problems) 4 CPU cores CPU cores + 1 GPU V13cg-1 (JCG, 1100k) V13sp-1 (sparse, 430k) V13sp-2 (sparse, 500k) V13sp-3 (sparse, 2400k) V13sp-4 (sparse, 1000k) V13sp-5 (sparse, 2100k) Windows workstation : Two Intel Xeon 5560 processors (2.8 GHz, 8 cores total), 32 GB RAM, NVIDIA Tesla C2070, Windows 7, TCC driver mode 17 ANSYS, Inc. November 14,
18 ANSYS Mechanical Multi-Node GPU Solder Joint Benchmark (4 MDOF, Creep Strain Analysis) Linux cluster : Each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, NVIDIA Tesla M2070, InfiniBand Solder balls Mold PCB Total Speedup R14 Distributed ANSYS w/wo GPU Without GPU With GPU 1.7x 1.9x 3.4x 3.2x 4.4x 16 cores 32 cores 64 cores 18 ANSYS, Inc. November 14, Results Courtesy of MicroConsult Engineering, GmbH
19 ANSYS Mechanical Multi-Node GPU Solder Joint Benchmark (4 MDOF, Creep Strain Analysis) Linux cluster : Each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, NVIDIA Tesla M2070, InfiniBand Solder balls 1 8 cores 1 8 cores, 1 GPU 2 4 cores, 2 GPU 8 1 core, 8 GPU 19 ANSYS, Inc. November 14, Mold PCB Results Courtesy of MicroConsult Engineering, GmbH
20 GPU Acceleration for CFD Radiation View Factor Calculation (ANSYS Fluent 14 - beta) First capability for specialty physics view factors, ray tracing, reaction rates, etc. R&D focus on linear solvers, smoothers but potential limited by Amdahl s Law 20 ANSYS, Inc. November 14,
21 Case Study HPC for High Fidelity CFD 8M to 12M element turbocharger models (ANSYS CFX) Previous practice (8 nodes HPC) Full stage compressor runs hours Turbine simulations up to 72 hours Current practice (160 nodes) 32 nodes per simulation Full stage compressor 4 hours Turbine simulations 5-6 hours Simultaneous consideration of 5 ideas Ability to address design uncertainty clearance tolerance ANSYS HPC technology is enabling Cummins to use larger models with greater geometric details and more-realistic treatment of physical phenomena ANSYS, Inc. November 14,
22 Case Study HPC for High Fidelity CFD EURO/CFD Model sizes up to 200M cells (ANSYS FLUENT) cluster of 700 cores cores per simulation 25 Millions (4 Days) 50 Millions (2 Days) 3 Millions of Cells (6 Days) 10 Millions (5 Days) Compressibility Conduction/Convection Supersonic Multiphase Radiation Increase of : Transient Optimisation / DOE Dynamic Mesh Requirement for Accuracy Complexity of models LES Combustion Aeroacoustic Fluid Structure Interaction 22 ANSYS, Inc. November 14,
23 Microconsult GmbH Case Study HPC for High Fidelity Mechanical Solder joint failure analysis Thermal stress 7.8 MDOF Creep strain 5.5 MDOF Simulation time reduced from 2 weeks to 1 day From 8 26 cores (past) to 128 cores (present) HPC is an important competitive advantage for companies looking to optimize the performance of their products and reduce time to market. 23 ANSYS, Inc. November 14,
24 Case Study HPC for Desktop Productivity Cognity Limited steerable conductors for oil recovery ANSYS Mechanical simulations to determine load carrying capacity 750K elements, many contacts 12 core workstations / 24 GB RAM 6X speedup / results in 1 hour or less 5-10 design iterations per day Parallel processing makes it possible to evaluate five to 10 design iterations per day, enabling Cognity to rapidly improve their design ANSYS, Inc. November 14,
25 Case Study Desktop Productivity Cautionary Tale NVIDIA - Case study on the value of HW refresh and SW best-practice Deflection and bending of 3D glasses ANSYS Mechanical 1M DOF models Optimization of: Solver selection (direct vs iterative) Machine memory (in core execution) Multicore (8-way) parallel with GPU acceleration Before/After: 77x speedup from 60 hours per simulation to 47 minutes. Most importantly: HPC tuning added scope for design exploration and optimization. 25 ANSYS, Inc. November 14,
26 Take Home Points / Discussion ANSYS HPC performance enables scaling for high-fidelity What could you learn from a 10M (or 100M) cell / DOF model? What could you learn if you had time to consider 10 x more design ideas? Scaling applies to all physics, all hardware (desktop and cluster) ANSYS continually invests in software development for HPC Maximized value from your HPC investment This creates differentiated competitive advantage for ANSYS users Comments / Questions / Discussion 26 ANSYS, Inc. November 14,
HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011
ANSYS HPC Technology Leadership Barbara Hutchings barbara.hutchings@ansys.com 1 ANSYS, Inc. September 20, Why ANSYS Users Need HPC Insight you can t get any other way HPC enables high-fidelity Include
More informationSolving Large Complex Problems. Efficient and Smart Solutions for Large Models
Solving Large Complex Problems Efficient and Smart Solutions for Large Models 1 ANSYS Structural Mechanics Solutions offers several techniques 2 Current trends in simulation show an increased need for
More informationStan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA
Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA NVIDIA and HPC Evolution of GPUs Public, based in Santa Clara, CA ~$4B revenue ~5,500 employees Founded in 1999 with primary business in
More informationWhy HPC for. ANSYS Mechanical and ANSYS CFD?
Why HPC for ANSYS Mechanical and ANSYS CFD? 1 HPC Defined High Performance Computing (HPC) at ANSYS: An ongoing effort designed to remove computing limitations from engineers who use computer aided engineering
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More information2008 International ANSYS Conference
28 International ANSYS Conference Maximizing Performance for Large Scale Analysis on Multi-core Processor Systems Don Mize Technical Consultant Hewlett Packard 28 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationMaximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs
Presented at the 2014 ANSYS Regional Conference- Detroit, June 5, 2014 Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs Bhushan Desam, Ph.D. NVIDIA Corporation 1 NVIDIA Enterprise
More informationTFLOP Performance for ANSYS Mechanical
TFLOP Performance for ANSYS Mechanical Dr. Herbert Güttler Engineering GmbH Holunderweg 8 89182 Bernstadt www.microconsult-engineering.de Engineering H. Güttler 19.06.2013 Seite 1 May 2009, Ansys12, 512
More informationComputer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc.
Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters 2006 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Our Business Simulation Driven Product Development Deliver superior
More informationUnderstanding Hardware Selection to Speedup Your CFD and FEA Simulations
Understanding Hardware Selection to Speedup Your CFD and FEA Simulations 1 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 2 Agenda Why Talking
More informationMaking Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010
Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Windows HPC Server 2008 R2 Windows HPC Server 2008 R2 makes supercomputing
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationSIMPLIFYING HPC SIMPLIFYING HPC FOR ENGINEERING SIMULATION WITH ANSYS
SIMPLIFYING HPC SIMPLIFYING HPC FOR ENGINEERING SIMULATION WITH ANSYS THE DELL WAY We are an acknowledged leader in academic supercomputing including major HPC systems installed at the Cambridge University
More informationRecent Advances in ANSYS Toward RDO Practices Using optislang. Wim Slagter, ANSYS Inc. Herbert Güttler, MicroConsult GmbH
Recent Advances in ANSYS Toward RDO Practices Using optislang Wim Slagter, ANSYS Inc. Herbert Güttler, MicroConsult GmbH 1 Product Development Pressures Source: Engineering Simulation & HPC Usage Survey
More informationThe Cray CX1 puts massive power and flexibility right where you need it in your workgroup
The Cray CX1 puts massive power and flexibility right where you need it in your workgroup Up to 96 cores of Intel 5600 compute power 3D visualization Up to 32TB of storage GPU acceleration Small footprint
More informationACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016
ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016 Challenges What is Algebraic Multi-Grid (AMG)? AGENDA Why use AMG? When to use AMG? NVIDIA AmgX Results 2
More informationIBM Information Technology Guide For ANSYS Fluent Customers
IBM ISV & Developer Relations Manufacturing IBM Information Technology Guide For ANSYS Fluent Customers A collaborative effort between ANSYS and IBM 2 IBM Information Technology Guide For ANSYS Fluent
More informationANSYS High. Computing. User Group CAE Associates
ANSYS High Performance Computing User Group 010 010 CAE Associates Parallel Processing in ANSYS ANSYS offers two parallel processing methods: Shared-memory ANSYS: Shared-memory ANSYS uses the sharedmemory
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationAccelerated ANSYS Fluent: Algebraic Multigrid on a GPU. Robert Strzodka NVAMG Project Lead
Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU Robert Strzodka NVAMG Project Lead A Parallel Success Story in Five Steps 2 Step 1: Understand Application ANSYS Fluent Computational Fluid Dynamics
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox, Altair Compute
More informationQLogic TrueScale InfiniBand and Teraflop Simulations
WHITE Paper QLogic TrueScale InfiniBand and Teraflop Simulations For ANSYS Mechanical v12 High Performance Interconnect for ANSYS Computer Aided Engineering Solutions Executive Summary Today s challenging
More informationHYCOM Performance Benchmark and Profiling
HYCOM Performance Benchmark and Profiling Jan 2011 Acknowledgment: - The DoD High Performance Computing Modernization Program Note The following research was performed under the HPC Advisory Council activities
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationReal Application Performance and Beyond
Real Application Performance and Beyond Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400 Fax: 408-970-3403 http://www.mellanox.com Scientists, engineers and analysts
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationOverview of Parallel Computing. Timothy H. Kaiser, PH.D.
Overview of Parallel Computing Timothy H. Kaiser, PH.D. tkaiser@mines.edu Introduction What is parallel computing? Why go parallel? The best example of parallel computing Some Terminology Slides and examples
More informationLS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationPerformance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA
Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationMaximizing Memory Performance for ANSYS Simulations
Maximizing Memory Performance for ANSYS Simulations By Alex Pickard, 2018-11-19 Memory or RAM is an important aspect of configuring computers for high performance computing (HPC) simulation work. The performance
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute
More informationA Comprehensive Study on the Performance of Implicit LS-DYNA
12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationEngineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary
white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs
More informationPerformance Benefits of NVIDIA GPUs for LS-DYNA
Performance Benefits of NVIDIA GPUs for LS-DYNA Mr. Stan Posey and Dr. Srinivas Kodiyalam NVIDIA Corporation, Santa Clara, CA, USA Summary: This work examines the performance characteristics of LS-DYNA
More informationHPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms. Author: Correspondence: ABSTRACT:
HPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms Author: Stan Posey Panasas, Inc. Correspondence: Stan Posey Panasas, Inc. Phone +510 608 4383 Email sposey@panasas.com
More informationBig Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures
Procedia Computer Science Volume 51, 2015, Pages 2774 2778 ICCS 2015 International Conference On Computational Science Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid
More informationHPC 2 Informed by Industry
HPC 2 Informed by Industry HPC User Forum October 2011 Merle Giles Private Sector Program & Economic Development mgiles@ncsa.illinois.edu National Center for Supercomputing Applications University of Illinois
More informationOptimizing LS-DYNA Productivity in Cluster Environments
10 th International LS-DYNA Users Conference Computing Technology Optimizing LS-DYNA Productivity in Cluster Environments Gilad Shainer and Swati Kher Mellanox Technologies Abstract Increasing demand for
More informationLarge scale Imaging on Current Many- Core Platforms
Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,
More information2008 International ANSYS Conference
2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
More informationParticleworks: Particle-based CAE Software fully ported to GPU
Particleworks: Particle-based CAE Software fully ported to GPU Introduction PrometechVideo_v3.2.3.wmv 3.5 min. Particleworks Why the particle method? Existing methods FEM, FVM, FLIP, Fluid calculation
More informationEnhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations
Performance Brief Quad-Core Workstation Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations With eight cores and up to 80 GFLOPS of peak performance at your fingertips,
More informationNVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU
NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated
More informationIndustrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009
Industrial finite element analysis: Evolution and current challenges Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Dr. Chief Numerical Analyst Office of Architecture and
More informationJ. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst
Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function
More informationDELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS)
DELIVERABLE D5.5 Report on ICARUS visualization cluster installation John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) 02 May 2011 NextMuSE 2 Next generation Multi-mechanics Simulation Environment Cluster
More informationMPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA
MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA Gilad Shainer 1, Tong Liu 1, Pak Lui 1, Todd Wilde 1 1 Mellanox Technologies Abstract From concept to engineering, and from design to
More informationFaster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs. Baskar Rajagopalan Accelerated Computing, NVIDIA
Faster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs Baskar Rajagopalan Accelerated Computing, NVIDIA 1 Engineering & IT Challenges/Trends NVIDIA GPU Solutions AGENDA Abaqus GPU
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationHigh Performance Computing with Accelerators
High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing
More informationANSYS Fluent 14 Performance Benchmark and Profiling. October 2012
ANSYS Fluent 14 Performance Benchmark and Profiling October 2012 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information
More informationGPU-Acceleration of CAE Simulations. Bhushan Desam NVIDIA Corporation
GPU-Acceleration of CAE Simulations Bhushan Desam NVIDIA Corporation bdesam@nvidia.com 1 AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications
More informationDell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance
Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance This Dell EMC technical white paper discusses performance benchmarking results and analysis for Simulia
More informationMSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures
MSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures Presented By: Dr. Olivier Schreiber, Application Engineering, SGI Walter Schrauwen, Senior Engineer, Finite Element Development, MSC
More informationA Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids
A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011
More informationDell EMC Ready Bundle for HPC Digital Manufacturing ANSYS Performance
Dell EMC Ready Bundle for HPC Digital Manufacturing ANSYS Performance This Dell EMC technical white paper discusses performance benchmarking results and analysis for ANSYS Mechanical, ANSYS Fluent, and
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationInvestigation and Feasibility Study of Linux and Windows in the Computational Processing Power of ANSYS Software
Science Arena Publications Specialty Journal of Electronic and Computer Sciences Available online at www.sciarena.com 2017, Vol, 3 (1): 40-46 Investigation and Feasibility Study of Linux and Windows in
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview
More informationCenter Extreme Scale CS Research
Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek
More informationBirds of a Feather Presentation
Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard
More informationMellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007
Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise
More informationIntel Cluster Toolkit Compiler Edition 3.2 for Linux* or Windows HPC Server 2008*
Intel Cluster Toolkit Compiler Edition. for Linux* or Windows HPC Server 8* Product Overview High-performance scaling to thousands of processors. Performance leadership Intel software development products
More informationHybrid Implementation of 3D Kirchhoff Migration
Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation
More informationSplotch: High Performance Visualization using MPI, OpenMP and CUDA
Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,
More informationD036 Accelerating Reservoir Simulation with GPUs
D036 Accelerating Reservoir Simulation with GPUs K.P. Esler* (Stone Ridge Technology), S. Atan (Marathon Oil Corp.), B. Ramirez (Marathon Oil Corp.) & V. Natoli (Stone Ridge Technology) SUMMARY Over the
More informationLS-DYNA Productivity and Power-aware Simulations in Cluster Environments
LS-DYNA Productivity and Power-aware Simulations in Cluster Environments Gilad Shainer 1, Tong Liu 1, Jacob Liberman 2, Jeff Layton 2 Onur Celebioglu 2, Scot A. Schultz 3, Joshua Mora 3, David Cownie 3,
More informationAlgorithms, System and Data Centre Optimisation for Energy Efficient HPC
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
More informationAccelerating Implicit LS-DYNA with GPU
Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,
More informationUsing an HPC Cloud for Weather Science
Using an HPC Cloud for Weather Science Provided By: Transforming Operational Environmental Predictions Around the Globe Moving EarthCast Technologies from Idea to Production EarthCast Technologies produces
More informationArchitectures for Scalable Media Object Search
Architectures for Scalable Media Object Search Dennis Sng Deputy Director & Principal Scientist NVIDIA GPU Technology Workshop 10 July 2014 ROSE LAB OVERVIEW 2 Large Database of Media Objects Next- Generation
More informationFEMAP/NX NASTRAN PERFORMANCE TUNING
FEMAP/NX NASTRAN PERFORMANCE TUNING Chris Teague - Saratech (949) 481-3267 www.saratechinc.com NX Nastran Hardware Performance History Running Nastran in 1984: Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)
More informationABySS Performance Benchmark and Profiling. May 2010
ABySS Performance Benchmark and Profiling May 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC
More informationIndustrial achievements on Blue Waters using CPUs and GPUs
Industrial achievements on Blue Waters using CPUs and GPUs HPC User Forum, September 17, 2014 Seattle Seid Korić PhD Technical Program Manager Associate Adjunct Professor koric@illinois.edu Think Big!
More informationThe Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center
The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.
More informationImproving Your Structural Mechanics Simulations with Release 14.0
Improving Your Structural Mechanics Simulations with Release 14.0 1 What will Release 14.0 bring you? 2 Let s now take a closer look at some topics 3 MAPDL/WB Integration Finite Element Information Access
More informationStructural Mechanics With ANSYS Pierre THIEFFRY Lead Product Manager ANSYS, Inc.
Structural Mechanics With ANSYS Pierre THIEFFRY Lead Product Manager ANSYS, Inc. 1 ANSYS Structural Mechanics provides highend solver solutions within a highly productive user environment to reliably and
More informationGROMACS Performance Benchmark and Profiling. August 2011
GROMACS Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource
More informationAMBER 11 Performance Benchmark and Profiling. July 2011
AMBER 11 Performance Benchmark and Profiling July 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -
More informationInfiniBand-based HPC Clusters
Boosting Scalability of InfiniBand-based HPC Clusters Asaf Wachtel, Senior Product Manager 2010 Voltaire Inc. InfiniBand-based HPC Clusters Scalability Challenges Cluster TCO Scalability Hardware costs
More informationSun Lustre Storage System Simplifying and Accelerating Lustre Deployments
Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964
More informationNAMD GPU Performance Benchmark. March 2011
NAMD GPU Performance Benchmark March 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Dell, Intel, Mellanox Compute resource - HPC Advisory
More informationGPU PROGRESS AND DIRECTIONS IN APPLIED CFD
Eleventh International Conference on CFD in the Minerals and Process Industries CSIRO, Melbourne, Australia 7-9 December 2015 GPU PROGRESS AND DIRECTIONS IN APPLIED CFD Stan POSEY 1*, Simon SEE 2, and
More informationFuture Routing Schemes in Petascale clusters
Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract
More informationHP GTC Presentation May 2012
HP GTC Presentation May 2012 Today s Agenda: HP s Purpose-Built SL Server Line Desktop GPU Computing Revolution with HP s Z Workstations Hyperscale the new frontier for HPC New HPC customer requirements
More informationTowards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationDell HPC System for Manufacturing System Architecture and Application Performance
Dell HPC System for Manufacturing System Architecture and Application Performance This Dell technical white paper describes the architecture of the Dell HPC System for Manufacturing and discusses performance
More informationS8901 Quadro for AI, VR and Simulation
S8901 Quadro for AI, VR and Simulation Carl Flygare, PNY Quadro Product Marketing Manager Allen Bourgoyne, NVIDIA Senior Product Marketing Manager The question of whether a computer can think is no more
More informationHigh-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs
High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea
More informationAlternative GPU friendly assignment algorithms. Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield
Alternative GPU friendly assignment algorithms Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield Graphics Processing Units (GPUs) Context: GPU Performance Accelerated
More informationAerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project
Workshop HPC enabling of OpenFOAM for CFD applications Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project A. De Maio (1), V. Krastev (2), P. Lanucara (3),
More information