Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering


1 State-of-the-art distributed parallel computational techniques in industrial finite element analysis. Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, France, April -5. Dr., Siemens PLM Software, USA. PARENG-

2 Scope of presentation: introduction to industrial analysis; geometric domain decomposition; distributed computational solutions; parallel computational kernels; application case studies; conclusions and future work.

3 Industrial complexity constantly increasing: jet engine (parts), engine block (elements), car (parts), factory (machines).

4 Computer hardware constantly changing: a Cray computer ($5 million, O() gigaflops, sold) compared with a multi-core CPU ($5, O() gigaflops, million sold).

5 Lifecycle simulations: designer view and analyst view.

6 Multidisciplinary solutions: designer view and analyst view.

7 High performance requirements. The constrained stiffness matrix of an analysis problem:
Number of rows: 35,734,79
Nonzero terms: ,384,35,995
Nonzero terms in sparse factor matrix: 43,87,4,
Memory used during factorization: ,8,73, (4-byte) words
Actual elapsed time of sparse factorization on a single high performance processor: 335 minutes

8 Scope of presentation: geometric domain decomposition.

9 Single-level geometric domain decomposition: subdivide large geometry domains into a limited number of partitions, one per processor (Proc 1, Proc 2, …, Proc k). The computations in the geometry partitions depend on one another. Therefore minimize the boundary size of each partition relative to its interior, and minimize the total boundary size, because communication is needed across it.
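The two minimization goals on this slide can be measured directly. The Python sketch below assumes the model is given as an undirected graph (a list of node pairs) and a node-to-partition map. The function name `partition_metrics` is hypothetical. The function counts cut edges, which serve as a proxy for communication volume, and tallies boundary and interior nodes per partition.

```python
from collections import defaultdict


def partition_metrics(edges, part):
    """Return (edge_cut, boundary_count, interior_count) for a partitioned graph.

    edges: iterable of (u, v) pairs of an undirected graph.
    part: dict mapping every node to its partition id.
    boundary_count / interior_count: dicts partition id -> number of nodes.
    """
    edge_cut = 0
    boundary = set()
    for u, v in edges:
        if part[u] != part[v]:
            edge_cut += 1          # edge crosses partitions: needs communication
            boundary.update((u, v))
    boundary_count = defaultdict(int)
    interior_count = defaultdict(int)
    for node, p in part.items():
        if node in boundary:
            boundary_count[p] += 1
        else:
            interior_count[p] += 1
    return edge_cut, dict(boundary_count), dict(interior_count)


# A path 0-1-2-3 split into {0, 1} and {2, 3}: one cut edge, one boundary node each.
print(partition_metrics([(0, 1), (1, 2), (2, 3)], {0: 0, 1: 0, 2: 1, 3: 1}))
```

A good partition keeps `edge_cut` small while also keeping each partition's boundary-to-interior ratio low.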

10 Multi-level geometric domain decomposition: as in the single-level case, subdivide large geometry domains into a limited number of partitions. Then subdivide the partitions into sub-partitions and dynamically reduce them to their collectors, and assemble the multilevel substructures to obtain the engineering solution. The total number of substructures may exceed the number of processors.
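One simple way to obtain such a hierarchy is recursive coordinate bisection. It is shown here only as an illustration, because the slides do not say which partitioner is used. The sketch splits node coordinates at the median along the widest axis, level by level. It then maps the resulting substructures round-robin onto processors, so that, as the slide notes, there can be more substructures than processors. All function names are hypothetical.

```python
def recursive_bisection(points, ids, levels):
    """Recursive coordinate bisection into up to 2**levels substructures.

    points: list of coordinate tuples; ids: indices of the points to split.
    Returns a nested tuple tree whose leaves are sorted lists of point ids.
    """
    if levels == 0 or len(ids) < 2:
        return sorted(ids)
    dims = range(len(points[ids[0]]))
    # Split along the axis with the largest extent, at the median point.
    extents = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
               for d in dims]
    axis = extents.index(max(extents))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2
    return (recursive_bisection(points, ordered[:half], levels - 1),
            recursive_bisection(points, ordered[half:], levels - 1))


def leaves(tree):
    """Flatten the hierarchy into its leaf substructures, left to right."""
    if isinstance(tree, list):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def assign_round_robin(n_substructures, n_procs):
    """Map substructure i to processor i mod n_procs."""
    return [i % n_procs for i in range(n_substructures)]


pts = [(float(i), 0.0) for i in range(8)]
subs = leaves(recursive_bisection(pts, list(range(8)), 3))  # 8 substructures
print(subs, assign_round_robin(len(subs), 4))               # on 4 processors
```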

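11 Finite element problem domain decomposition, based on the model or on the matrices. Graph vertices correspond to matrix diagonal terms and to FE node points. Graph edges correspond to off-diagonal terms and to elements. An undirected graph corresponds to a symmetric matrix and to a linear FE model.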
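This correspondence can be made concrete by deriving the graph, and thus the matrix sparsity pattern, from element connectivity. The Python sketch below assumes one degree of freedom per node and assumes that every element couples all of its nodes. The function names are hypothetical.

```python
def element_graph(elements):
    """Node adjacency from element connectivity (one DOF per node assumed).

    Every pair of nodes sharing an element becomes an undirected graph edge,
    i.e. an off-diagonal nonzero K[i, j] = K[j, i].
    """
    adj = {}
    for conn in elements:
        for a in conn:
            adj.setdefault(a, set())
        for i, a in enumerate(conn):
            for b in conn[i + 1:]:
                if a != b:
                    adj[a].add(b)
                    adj[b].add(a)  # symmetric matrix <-> undirected edge
    return adj


def sparsity_pattern(adj):
    """Nonzero (row, col) positions of the symmetric matrix, diagonal included."""
    pattern = set()
    for i, nbrs in adj.items():
        pattern.add((i, i))  # each node point contributes a diagonal term
        for j in nbrs:
            pattern.add((i, j))
    return pattern


# Two 4-node elements sharing the edge 1-4.
adj = element_graph([[0, 1, 4, 3], [1, 2, 5, 4]])
print(sorted(adj[1]), len(sparsity_pattern(adj)))
```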

12 Graphs and matrices: a graph model and its Laplacian matrix $L$; a finite element model built from membrane elements and its stiffness matrix $K$.
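For a graph, the Laplacian is $L = D - A$, where $D$ is the diagonal matrix of vertex degrees and $A$ is the adjacency matrix. The minimal NumPy sketch below assumes unit edge weights, and `graph_laplacian` is a hypothetical name. It illustrates the properties $L$ shares with an unconstrained stiffness matrix: it is symmetric, positive semidefinite, and singular. The singularity follows because each row sums to zero, much as rigid-body motion lies in the null space of a free structure's stiffness matrix.

```python
import numpy as np


def graph_laplacian(n, edges):
    """Laplacian L = D - A of an undirected graph with nodes 0..n-1 (unit weights)."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, v] -= 1.0  # off-diagonal: -1 per edge
        L[v, u] -= 1.0
        L[u, u] += 1.0  # diagonal: vertex degree
        L[v, v] += 1.0
    return L


L = graph_laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(L @ np.ones(4))  # rows sum to zero: the constant vector is in the null space
```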

13 Partitioning technology: spectral bisection method based on the eigenproblem $L u = \lambda u$, with the vertex cut result obtained from $u$.
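The following is a minimal sketch of spectral bisection, assuming a connected graph whose Laplacian is given as a dense NumPy array. It computes the eigenvector $u$ that belongs to the second-smallest eigenvalue of $L u = \lambda u$ (the Fiedler vector) and splits the vertices at the median of $u$. This produces an edge cut. A vertex separator like the one on the slide could then be taken from the endpoints of the cut edges on one side. Production partitioners use sparse eigensolvers rather than a dense `eigh`.

```python
import numpy as np


def spectral_bisection(L):
    """Split the vertices into two halves using the Fiedler vector of L.

    Assumes a connected graph, so the second eigenvector is meaningful.
    """
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = vecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    median = np.median(fiedler)
    part_a = [i for i, x in enumerate(fiedler) if x <= median]
    part_b = [i for i, x in enumerate(fiedler) if x > median]
    return part_a, part_b


# Path graph 0-1-2-3-4-5: the cut falls in the middle.
A = np.diag(np.ones(5), 1)
A = A + A.T
print(spectral_bisection(np.diag(A.sum(axis=1)) - A))
```

The sign of an eigenvector is arbitrary, so the two halves may come back in either order.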

14 Recursive graph partitioning: coarsening, partitioning and refining phases (figure panels: coarsening; partitioning into two partitions; refining).
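The coarsening phase is often implemented by heavy-edge matching, in which each vertex is merged with the unmatched neighbor it shares its heaviest edge with. The slide does not name a specific scheme, so this choice is an assumption. The Python sketch below performs one coarsening step, plus the projection that starts the refining phase. The refinement itself (for example Kernighan-Lin style vertex swaps) is left out to keep the sketch short. The function names are hypothetical.

```python
def coarsen(adj):
    """One coarsening step by greedy heavy-edge matching.

    adj: dict node -> dict neighbor -> edge weight (symmetric, no self-loops).
    Returns (coarse_adj, fine_to_coarse); matched pairs share a coarse node.
    """
    fine_to_coarse = {}
    n_coarse = 0
    for u in sorted(adj):
        if u in fine_to_coarse:
            continue  # already matched
        # Unmatched neighbors; max() picks the heaviest edge, ties -> larger id.
        free = [(w, v) for v, w in adj[u].items() if v not in fine_to_coarse]
        fine_to_coarse[u] = n_coarse
        if free:
            _, v = max(free)
            fine_to_coarse[v] = n_coarse
        n_coarse += 1
    # Coarse edge weights are the sums of the fine edge weights they replace.
    coarse = {c: {} for c in range(n_coarse)}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = fine_to_coarse[u], fine_to_coarse[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, fine_to_coarse


def project(coarse_part, fine_to_coarse):
    """Carry a coarse-graph partition back to the fine graph (start of refining)."""
    return {u: coarse_part[c] for u, c in fine_to_coarse.items()}


path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
coarse, f2c = coarsen(path)
print(coarse, project({0: "A", 1: "B"}, f2c))
```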

15 Scope of presentation: distributed computational solutions.

16 Distributed memory parallel architecture: a cluster of high performance workstations, each a distributed memory workstation with dedicated I/O devices; high-level parallelism. Feasible number of nodes:

17 Recursive matrix partitioning: the geometric problem and its partitioning hierarchy.

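18 Distributed normal modes analysis. Physical problem:
$$(K - \lambda M)\,\Phi = 0$$
Partitioned form:
$$
\begin{bmatrix}
K^{1}_{oo}-\lambda M^{1}_{oo} & & K^{1,3}_{ot}-\lambda M^{1,3}_{ot} & & & & \\
 & K^{2}_{oo}-\lambda M^{2}_{oo} & K^{2,3}_{ot}-\lambda M^{2,3}_{ot} & & & & \\
 & & K^{3}_{tt}-\lambda M^{3}_{tt} & & & & K^{3,7}_{tt}-\lambda M^{3,7}_{tt} \\
 & & & K^{4}_{oo}-\lambda M^{4}_{oo} & & K^{4,6}_{ot}-\lambda M^{4,6}_{ot} & \\
 & & & & K^{5}_{oo}-\lambda M^{5}_{oo} & K^{5,6}_{ot}-\lambda M^{5,6}_{ot} & \\
 & & & & & K^{6}_{tt}-\lambda M^{6}_{tt} & K^{6,7}_{tt}-\lambda M^{6,7}_{tt} \\
 & \text{sym.} & & & & & K^{7}_{tt}-\lambda M^{7}_{tt}
\end{bmatrix}
\begin{bmatrix} \phi^{1}_{o} \\ \phi^{2}_{o} \\ \phi^{3}_{t} \\ \phi^{4}_{o} \\ \phi^{5}_{o} \\ \phi^{6}_{t} \\ \phi^{7}_{t} \end{bmatrix}
= 0
$$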

19 Phase 1: start processors 1, 2, 3 and 4; communicate.

20 Phase 2: start processors 1-2 and processors 3-4; communicate.
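The phase structure of slides 18 to 21 can be reproduced from the partitioning hierarchy. The sketch assumes a complete binary tree of substructures numbered in postorder (leaves 1, 2, 4, 5; collectors 3 and 6; root 7). It also assumes one processor per leaf and that each collector is handled jointly by its children's processors one phase later. The function name is hypothetical.

```python
def build_schedule(depth):
    """Postorder-number a complete binary substructure tree and assign work.

    Returns a list of (node_number, phase, processors) in postorder.
    Leaves run in phase 1 on one processor each; an interior node runs one
    phase after its children, on the union of their processors.
    """
    schedule = []
    node = 0
    proc = 0

    def visit(level):
        nonlocal node, proc
        if level == 0:
            proc += 1
            node += 1
            schedule.append((node, 1, [proc]))
            return [proc], 1
        left, phase_left = visit(level - 1)
        right, phase_right = visit(level - 1)
        node += 1
        phase = max(phase_left, phase_right) + 1
        procs = left + right
        schedule.append((node, phase, procs))
        return procs, phase

    visit(depth)
    return schedule


for entry in build_schedule(2):
    print(entry)  # (node, phase, processors)
```

With depth 2 this gives the seven-block structure of the partitioned form on slide 18. The root (block 7) falls in phase 3 on all four processors.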

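21 Phase 3: processors start. Solve the reduced-order problem $(\tilde{K} - \lambda \tilde{M})\,\tilde{\Phi} = 0$. Then recover the physical solution $\Phi$ (interior components $\Phi_o$ and boundary components $\Phi_t$) from the reduced solution through the generalized coordinates $q$.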
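The slides do not spell out how the reduced-order problem is formed. One standard technique that fits the interior/boundary split $o$/$t$ is Craig-Bampton component mode synthesis. It is sketched below with NumPy and SciPy as a stand-in, not as the method used in the presented software. Each component keeps a few fixed-boundary interior modes plus static constraint modes. The reduced eigenproblem is then solved, and the physical solution is recovered as $\Phi = T q$. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def craig_bampton(K, M, o, t, n_modes):
    """Craig-Bampton reduction of (K - lam*M) phi = 0.

    o, t: integer index arrays of interior and boundary DOFs.
    Keeps n_modes fixed-boundary interior modes plus one static constraint
    mode per boundary DOF. Returns (Kr, Mr, T) with physical phi = T @ q.
    """
    Koo, Kot, Moo = K[np.ix_(o, o)], K[np.ix_(o, t)], M[np.ix_(o, o)]
    # Interior modes with the boundary held fixed (M-normalized, ascending).
    _, modes = eigh(Koo, Moo)
    modes = modes[:, :n_modes]
    # Constraint modes: static interior response to unit boundary displacements.
    psi = -np.linalg.solve(Koo, Kot)
    m, nt = n_modes, len(t)
    T = np.zeros((K.shape[0], m + nt))
    T[np.ix_(o, np.arange(m))] = modes
    T[np.ix_(o, np.arange(m, m + nt))] = psi
    T[np.ix_(t, np.arange(m, m + nt))] = np.eye(nt)
    return T.T @ K @ T, T.T @ M @ T, T


def reduced_modes(K, M, o, t, n_modes):
    """Solve the reduced eigenproblem and recover physical mode shapes."""
    Kr, Mr, T = craig_bampton(K, M, o, t, n_modes)
    lam, q = eigh(Kr, Mr)
    return lam, T @ q


# Fixed-fixed spring-mass chain; DOF 3 plays the role of the boundary t.
n = 6
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n)
lam, phi = reduced_modes(K, M, np.array([0, 1, 2, 4, 5]), np.array([3]), n_modes=2)
print(lam)
```

With fewer retained modes than interior DOFs, this is a Rayleigh-Ritz approximation, so each reduced eigenvalue bounds the corresponding exact one from above. Keeping all interior modes reproduces the full spectrum.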

22 Scope of presentation: parallel computational kernels.

23 Shared memory parallel architecture: multi-core processors with shared cache and shared memory; low-level parallelism. Feasible number of cores: -6

24 Sparse factorization: matrix connectivity, reordering, elimination tree, factorization.
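The elimination tree records, for each factor column $j$, the row of its first below-diagonal nonzero, $\text{parent}(j) = \min\{i > j : L_{ij} \neq 0\}$. Columns in disjoint subtrees can be eliminated independently, which is what parallel sparse solvers exploit. The Python sketch below implements Liu's algorithm with path compression. It assumes the reordering step has already been applied to the pattern, and the function name is hypothetical.

```python
def elimination_tree(n, pattern):
    """Elimination tree of a symmetric sparse matrix (Liu's algorithm).

    pattern: iterable of (i, j) nonzero positions; either triangle may be given.
    Returns parent, where parent[j] is the parent of column j or -1 for a root.
    """
    # For each row i, the columns k < i with A[i, k] != 0 (lower triangle).
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        if i != j:
            rows[max(i, j)].add(min(i, j))
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed links toward the current subtree root
    for i in range(n):
        for k in rows[i]:
            # Climb from k to the root of its current subtree, redirecting
            # every visited node's ancestor link to i (path compression).
            while k != -1 and k < i:
                nxt = ancestor[k]
                ancestor[k] = i
                if nxt == -1:
                    parent[k] = i  # k was a root: i becomes its parent
                k = nxt
    return parent


# A[1,0] and A[2,0] nonzero: eliminating column 0 creates fill at L[2,1].
print(elimination_tree(3, {(1, 0), (2, 0)}))
```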

25 Multifrontal factorization: sparsity pattern, frontal steps, front amalgamation.
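In the multifrontal method, the frontal matrix of column $j$ is dense. Its order is one plus the number of below-diagonal nonzeros in that factor column, so the symbolic phase must first determine the sparsity pattern of $L$. The sketch below assumes the matrix has already been reordered and is given as a set of nonzero positions, and it computes the elimination tree as a by-product. The function names are hypothetical.

```python
def symbolic_factorization(n, pattern):
    """Below-diagonal row structure of every column of the Cholesky factor L.

    pattern: iterable of (i, j) nonzero positions of the (already reordered)
    symmetric matrix. Uses struct(L[:, j]) = struct(A[j+1:, j]) united with
    struct(L[:, c]) minus {j} for each child c of j in the elimination tree.
    Returns (cols, parent): cols[j] is a sorted list of row indices.
    """
    lower = [set() for _ in range(n)]
    for i, j in pattern:
        if i != j:
            lower[min(i, j)].add(max(i, j))
    children = [[] for _ in range(n)]
    cols, parent = [[] for _ in range(n)], [-1] * n
    for j in range(n):  # children always precede their parent
        s = set(lower[j])
        for c in children[j]:
            s |= set(cols[c])
        s.discard(j)
        cols[j] = sorted(s)
        if cols[j]:
            parent[j] = cols[j][0]  # first below-diagonal nonzero
            children[parent[j]].append(j)
    return cols, parent


def front_orders(cols):
    """Order of the dense frontal matrix for each column."""
    return [1 + len(c) for c in cols]


cols, parent = symbolic_factorization(3, {(1, 0), (2, 0)})
print(cols, parent, front_orders(cols))  # fill-in appears at L[2, 1]
```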

26 Supernodal approach: symbolic reordering; consecutive columns with the same sparsity pattern; cache-fitting size.
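The supernode criterion on this slide can be sketched as follows. Column $j+1$ joins the supernode of column $j$ when $\text{struct}(L_{*,j}) = \{j+1\} \cup \text{struct}(L_{*,j+1})$, so the group forms one dense trapezoidal block. A width cap stands in for the "cache-fitting size". The input format (sorted below-diagonal row lists per column) and the name `find_supernodes` are assumptions.

```python
def find_supernodes(cols, max_size):
    """Group consecutive factor columns with nested identical structure.

    cols[j]: sorted below-diagonal row indices of column j of L.
    max_size: maximum number of columns per supernode (e.g. to fit in cache).
    Returns a list of supernodes, each a list of column indices.
    """
    if not cols:
        return []
    supernodes = []
    current = [0]
    for j in range(len(cols) - 1):
        # struct(L[:, j]) == {j+1} | struct(L[:, j+1]) ?
        same = bool(cols[j]) and cols[j][0] == j + 1 and cols[j][1:] == cols[j + 1]
        if same and len(current) < max_size:
            current.append(j + 1)
        else:
            supernodes.append(current)
            current = [j + 1]
    supernodes.append(current)
    return supernodes


dense = [[1, 2, 3], [2, 3], [3], []]  # factor of a dense 4x4 matrix
print(find_supernodes(dense, max_size=2))
```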

27 Matrix update: panel selection; downstream columns with a different sparsity pattern; BLAS.5 operation.
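The panel-then-update pattern can be illustrated on a dense matrix. The sketch factors a panel of columns and then updates all downstream columns with one matrix-matrix product. It is a right-looking blocked Cholesky and only a dense stand-in: a supernodal sparse code would additionally scatter each update into downstream columns that have a different sparsity pattern, which this sketch omits. The name `blocked_cholesky` and the `panel` parameter are hypothetical.

```python
import numpy as np


def blocked_cholesky(A, panel=2):
    """Right-looking blocked Cholesky of a symmetric positive definite matrix.

    Each step factors one panel of `panel` columns, then applies a single
    matrix-matrix update A22 -= L21 @ L21.T to all downstream columns.
    Returns the lower-triangular factor L with A = L @ L.T.
    """
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    for k in range(0, n, panel):
        e = min(k + panel, n)
        # Factor the diagonal block of the panel.
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        Lkk = A[k:e, k:e]
        if e < n:
            # Below-diagonal part of the panel: L21 = A21 @ inv(Lkk).T
            A[e:, k:e] = np.linalg.solve(Lkk, A[e:, k:e].T).T
            L21 = A[e:, k:e]
            # Update the downstream (trailing) columns with the whole panel.
            A[e:, e:] -= L21 @ L21.T
    return np.tril(A)  # discard stale entries above the diagonal


rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)
print(np.allclose(blocked_cholesky(A), np.linalg.cholesky(A)))
```

`np.linalg.solve` is used here to keep the sketch NumPy-only. A triangular solver such as `scipy.linalg.solve_triangular` would be the usual choice.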

28 Scope of presentation: application case studies.

29 High performance workstation cluster: IBM P575 nodes with .9 GHz; 4 dual-core POWER5 CPUs per node; 3.5 terabyte aggregate memory; terabyte total disk space; IBM High Performance Switch (HPS) with 8 GB/s bidirectional bandwidth; AIX OS Version 5.3; Parallel Environment (PE) V4.

30 Trimmed car body application. Shell element model: .3 M grid points; . M shell elements; 7.9 M degrees of freedom. Normal modes analysis: frequency 3 Hz; ~ normal modes; 5 partitions.

31 Shortening solution time (chart: speed-up over the serial run versus the number of DMP (distributed memory parallel) processes).
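Speed-up on these charts is conventionally the serial elapsed time divided by the elapsed time with $p$ processes. Dividing again by $p$ gives the parallel efficiency. The timings in the usage line are made-up numbers for illustration only, not values read from the charts, and the function name is hypothetical.

```python
def speedup_table(serial_time, parallel_times):
    """Rows of (p, speed-up S_p = T_1 / T_p, efficiency E_p = S_p / p).

    serial_time: elapsed time of the serial run.
    parallel_times: dict mapping process count p -> elapsed time T_p.
    """
    rows = []
    for p, t in sorted(parallel_times.items()):
        s = serial_time / t
        rows.append((p, s, s / p))
    return rows


# Hypothetical timings (minutes), not taken from the case studies.
for p, s, e in speedup_table(100.0, {2: 50.0, 4: 40.0}):
    print(f"{p} processes: speed-up {s:.2f}, efficiency {e:.2f}")
```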

32 Increased fidelity of analysis (chart: normalized solution time and normalized number of modes versus frequency range in Hz).

33 Distributed memory workstation: HP ProLiant DL3G5 server; 64 dual-core (.85 GHz) Xeon CPUs; 5GB local SATA disks per node; 4 GB memory per node; GigE interconnect with HP MPI; SUSE Linux Version .3.

34 Automotive engine application. Solid element model: 3.6 M grid points; .3 M tetrahedral elements; .8 M degrees of freedom. Normal modes analysis: frequency , Hz; ~ 5 normal modes; 56 partitions.

35 Shortening solution time (chart: speed-up over the serial run versus the number of DMP processes).

36 Increased fidelity of analysis (chart: normalized solution time and normalized number of modes versus frequency range in Hz).

37 Scope of presentation: conclusions and future work.

38 Conclusions: Geometric domain decomposition technologies provide the basis for distributed solutions on modern hardware. Recursive computational solutions can support a wide range of engineering analyses with practically acceptable accuracy. Handling the local matrix operations with multi-core processors contributes to the overall performance gain. The performance advantages of distributed computational solutions are significant and greatly accelerate engineering work.

39 Future work: extending the distributed finite element technology to a grid computing environment. This requires overcoming the lack of a node-to-node communication mechanism over a high speed network; minimizing the need for a high bandwidth connection between the local nodes and the storage devices; and synchronizing the completion of components of similar computational complexity in a non-homogeneous grid environment.

40 Thank you for your attention! Siemens and the Siemens logo are registered trademarks of Siemens AG. NX is a registered trademark of Siemens PLM Software Inc. in the United States and in other countries. NASTRAN is a registered trademark of the National Aeronautics and Space Administration. SpaceShip One pictures by courtesy and permission of Quartus Engineering Inc.


More information

MSC Software: Release Overview - MSC Nastran MSC Nastran 2014 RELEASE OVERVIEW

MSC Software: Release Overview - MSC Nastran MSC Nastran 2014 RELEASE OVERVIEW MSC Nastran 2014 Welcome to MSC Nastran 2014! Welcome to MSC Nastran 2014! The MSC Nastran 2014 release is focused on delivering new capabilities and performance required to solve multidisciplinary problems.

More information

MSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures

MSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures MSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures Presented By: Dr. Olivier Schreiber, Application Engineering, SGI Walter Schrauwen, Senior Engineer, Finite Element Development, MSC

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Exploring unstructured Poisson solvers for FDS

Exploring unstructured Poisson solvers for FDS Exploring unstructured Poisson solvers for FDS Dr. Susanne Kilian hhpberlin - Ingenieure für Brandschutz 10245 Berlin - Germany Agenda 1 Discretization of Poisson- Löser 2 Solvers for 3 Numerical Tests

More information

Parallel Unstructured Mesh Generation by an Advancing Front Method

Parallel Unstructured Mesh Generation by an Advancing Front Method MASCOT04-IMACS/ISGG Workshop University of Florence, Italy Parallel Unstructured Mesh Generation by an Advancing Front Method Yasushi Ito, Alan M. Shih, Anil K. Erukala, and Bharat K. Soni Dept. of Mechanical

More information

Generic Topology Mapping Strategies for Large-scale Parallel Architectures

Generic Topology Mapping Strategies for Large-scale Parallel Architectures Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous

More information

A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES. A Thesis XUYANG GUO

A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES. A Thesis XUYANG GUO A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES A Thesis by XUYANG GUO Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

Real Parallel Computers

Real Parallel Computers Real Parallel Computers Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel Computing 2005 Short history

More information

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances) HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access

More information

Additive manufacturing with NX

Additive manufacturing with NX Additive manufacturing with processes. By using you have the power to drive the latest additive manu facturing equipment, including powder bed 3D printers. Delivering design, simulation and manufacturing

More information

NX CAM 9.0.2: Contact Tool Position on Area Milling Boundaries

NX CAM 9.0.2: Contact Tool Position on Area Milling Boundaries Siemens PLM Software NX CAM 9.0.2: Contact Tool Position on Area Milling Boundaries Using a contact tool position for trim boundaries. Answers for industry. About NX CAM NX TM CAM software has helped many

More information

Hierarchical Multi level Approach to graph clustering

Hierarchical Multi level Approach to graph clustering Hierarchical Multi level Approach to graph clustering by: Neda Shahidi neda@cs.utexas.edu Cesar mantilla, cesar.mantilla@mail.utexas.edu Advisor: Dr. Inderjit Dhillon Introduction Data sets can be presented

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem

Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem Sergey Solovyev 1, Dmitry Vishnevsky 1, Hongwei Liu 2 Institute of Petroleum Geology and Geophysics SB RAS 1 EXPEC ARC,

More information

Parallel Numerics, WT 2013/ Introduction

Parallel Numerics, WT 2013/ Introduction Parallel Numerics, WT 2013/2014 1 Introduction page 1 of 122 Scope Revise standard numerical methods considering parallel computations! Required knowledge Numerics Parallel Programming Graphs Literature

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton

More information

ANSYS HPC Technology Leadership

ANSYS HPC Technology Leadership ANSYS HPC Technology Leadership 1 ANSYS, Inc. November 14, Why ANSYS Users Need HPC Insight you can t get any other way It s all about getting better insight into product behavior quicker! HPC enables

More information

Kofax Capture. Technical Specifications. Version: Date:

Kofax Capture. Technical Specifications. Version: Date: Kofax Capture Technical Specifications Version: 11.0.0 Date: 2017-10-31 2017 Kofax. All rights reserved. Kofax is a trademark of Kofax, Inc., registered in the U.S. and/or other countries. All other trademarks

More information

Application Performance on Dual Processor Cluster Nodes

Application Performance on Dual Processor Cluster Nodes Application Performance on Dual Processor Cluster Nodes by Kent Milfeld milfeld@tacc.utexas.edu edu Avijit Purkayastha, Kent Milfeld, Chona Guiang, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER Thanks Newisys

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

Performance of Multicore LUP Decomposition

Performance of Multicore LUP Decomposition Performance of Multicore LUP Decomposition Nathan Beckmann Silas Boyd-Wickizer May 3, 00 ABSTRACT This paper evaluates the performance of four parallel LUP decomposition implementations. The implementations

More information

Partitioning Effects on MPI LS-DYNA Performance

Partitioning Effects on MPI LS-DYNA Performance Partitioning Effects on MPI LS-DYNA Performance Jeffrey G. Zais IBM 138 Third Street Hudson, WI 5416-1225 zais@us.ibm.com Abbreviations: MPI message-passing interface RISC - reduced instruction set computing

More information

Behavioral Data Mining. Lecture 12 Machine Biology

Behavioral Data Mining. Lecture 12 Machine Biology Behavioral Data Mining Lecture 12 Machine Biology Outline CPU geography Mass storage Buses and Networks Main memory Design Principles Intel i7 close-up From Computer Architecture a Quantitative Approach

More information

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one

More information

Analysis of the Out-of-Core Solution Phase of a Parallel Multifrontal Approach

Analysis of the Out-of-Core Solution Phase of a Parallel Multifrontal Approach Analysis of the Out-of-Core Solution Phase of a Parallel Multifrontal Approach P. Amestoy I.S. Duff A. Guermouche Tz. Slavova April 25, 200 Abstract We consider the parallel solution of sparse linear systems

More information

Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture

Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture K. Akbudak a, C.Aykanat

More information