
University of Quebec at Chicoutimi (UQAC) Friction Stir Welding Research Group A CUDA Fortran Story

Summary

At the University of Quebec at Chicoutimi (UQAC) friction stir welding (FSW) research group, we were looking for a way to simulate the FSW process with improved realism and functionality. The welding process, which is ideal for high strength aluminum alloys, has become increasingly mainstream since its inception in the early 1990s. Simulating the process is complicated by the large amount of material movement and by the importance of keeping track of the thermal and mechanical history of the aluminum. We have chosen an advanced modelling approach based on the smoothed particle hydrodynamics (SPH) method. Although the numerical algorithm is very computationally expensive, excellent speed-ups can be achieved using NVIDIA's graphics processing units (GPUs). For our research work on the FSW process, we have implemented a fully coupled thermo-mechanical large deformation solid mechanics code using SPH on the GPU. We have achieved full code speed-ups of over 45x compared to the same code on a central processing unit (CPU). All this was accomplished without any previous knowledge of CUDA Fortran.

The Research Group

The friction stir welding research group (FSWRG) is focused on developing important advancements for the aluminum industry. UQAC is at the center of the "aluminum valley" in the heart of the province of Quebec, one of the largest aluminum-producing regions in North America and the world. We have access to three different FSW machines, the biggest being able to perform full penetration welds up to 38 mm thick on an 18 meter long beam. The FSWRG works closely with industry through a technology transfer center. The main goal of this center is to give companies and entrepreneurs access to world class FSW machines as well as to the knowledge of the members of the FSWRG. Our branch of the FSWRG is working on simulating the welding process.

The Challenge

The welding method is well suited for high strength aluminum alloys.
The process is capable of producing full penetration welds in aluminum plates in a fraction of the time required by conventional methods, and with fewer defects compared to conventional MIG and TIG welding. A hardened steel welding tool is used to form the weld (see adjacent process image, http://www.caranddriver.com). The tool rotates and is forced (~1-10 tons of force) into the aluminum plates to be welded. Friction and plastic deformation cause the aluminum to heat up, making the material highly plastic and easy to deform. Once the material is hot enough (about 80% of the melting temperature), the tool starts to advance and join the plates together. The FSW process is very difficult to simulate. The main complications are:

- Enormous levels of plastic deformation (materials physically mix during welding)
- Complicated friction behavior between the welding tool and the aluminum
- Material heating and softening
- Complex microstructure changes

The first of these challenges turns out to be the greatest of all. Because of the mixing behavior, most mesh-based simulation approaches cannot easily simulate the whole FSW process. A number of research groups have focused on simulating the FSW process with other numerical methods, but so far none has been able to simulate all the physics of the process from the start to the end of the weld, including cool-down to predict residual stresses. The simulation approach should be driven by the underlying phenomena, without making an inordinate number of assumptions that compromise the simulation results. Most importantly, the model should be able to predict:

- Material mixing, allowing for changes in the free surface of the aluminum work pieces
- The temperature distribution
- Macroscopic post-weld defects such as incomplete weld penetration
- Residual stresses and distortions
- Microstructure changes

All these criteria must be accounted for during the entire welding process; not an easy feat by any means. You can imagine that taking all these aspects into consideration leads to very complex simulation models with long computational times.

The Solution

In order to simulate the full FSW process, an advanced simulation method was needed. We had previously tried many different techniques: the finite element method (FEM), arbitrary Lagrangian Eulerian (ALE), element free Galerkin (EFG), computational fluid dynamics (CFD) and even the discrete element method (DEM). Each method showed promising results for certain aspects of the welding process. However, none of the methods allowed us to accomplish all our simulation goals.
Smoothed Particle Hydrodynamics

We then turned our attention towards the smoothed particle hydrodynamics (SPH) method to solve these difficulties. SPH is a meshfree collocation method (a bit like finite difference, but much more powerful and versatile). Since there is no mesh, the large plastic deformation difficulties can easily be handled. The method works by weakening a set of partial differential equations into a set of ordinary differential equations (ODEs). This is done using an SPH interpolation technique: the set of ODEs is solved at each calculation point by interpolating to the points within that point's neighborhood. The interpolation is accomplished using a smoothing function (also called an SPH kernel); points outside the neighborhood are not used in the calculation. For solid mechanics problems, the set of field equations that we deal with are those from continuum mechanics, namely conservation of mass, momentum and energy, each with a corresponding SPH approximation (more details on SPH theory can be found in Liu and Liu [1], Violeau [2] and Hoover [3]). The description of the variables is not of key importance here. What is important is to recognize that for each SPH element (subscript i) a summation is performed over all the neighboring elements (subscript j). In 3D, a typical element will have ~56 neighbors; therein lies the computational complexity of the SPH method.

Implementation on the GPU

We needed a simple way to improve the performance of the simulation code. We started testing CUDA Fortran and quickly saw the incredible power of the parallelization strategy for SPH: the natural fine-grained parallelism of the GPU is a perfect fit for the method. The FSWRG's background is in simulation; we are not computer programmers. Fortran was a natural choice for us due to the relative simplicity of the programming model (see Ruetsch and Fatica [4]). In order to really take advantage of the GPU, we needed to have all the calculations performed on the device. This way, the only data transfer between the host and device is during initialization and, occasionally, to send results to the host for post-processing. The main idea behind the GPU implementation is that each SPH element is assigned to a CUDA thread. The sums over the neighbors are performed in sequence within the thread. This has proven to be a very efficient and straightforward approach.
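The continuum conservation equations and their SPH approximations referred to above can be written, in one common form for solid mechanics (following Liu and Liu [1]; here rho is density, v velocity, sigma the Cauchy stress, e internal energy, m particle mass, W the smoothing kernel and epsilon-dot the strain rate), as:

```latex
% Conservation of mass: continuum form  ->  SPH approximation
\frac{d\rho}{dt} = -\rho\,\nabla\cdot\mathbf{v}
\quad\longrightarrow\quad
\frac{d\rho_i}{dt} = \sum_j m_j\,(\mathbf{v}_i-\mathbf{v}_j)\cdot\nabla_i W_{ij}

% Conservation of momentum
\frac{d\mathbf{v}}{dt} = \frac{1}{\rho}\,\nabla\cdot\boldsymbol{\sigma}
\quad\longrightarrow\quad
\frac{d\mathbf{v}_i}{dt} = \sum_j m_j\left(\frac{\boldsymbol{\sigma}_i}{\rho_i^2}+\frac{\boldsymbol{\sigma}_j}{\rho_j^2}\right)\cdot\nabla_i W_{ij}

% Conservation of energy
\frac{de}{dt} = \frac{1}{\rho}\,\boldsymbol{\sigma}:\dot{\boldsymbol{\varepsilon}}
\quad\longrightarrow\quad
\frac{de_i}{dt} = \frac{1}{2}\sum_j m_j\,(\mathbf{v}_i-\mathbf{v}_j)\cdot\left(\frac{\boldsymbol{\sigma}_i}{\rho_i^2}+\frac{\boldsymbol{\sigma}_j}{\rho_j^2}\right)\cdot\nabla_i W_{ij}
```

In each SPH form, the summation over the neighbors j of element i is exactly the per-element work that is assigned to a single CUDA thread.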
The flow of the SPH code is:

1. Initialize the geometry and all parameters on the host
2. Transfer all data to the GPU
3. Move any boundaries that have a prescribed motion
4. Calculate the neighbor lists for all the elements
5. Conservation of mass
6. Calculate stresses from the material model
7. Conservation of momentum
8. Calculate contact forces
9. Conservation of energy

Steps 3 to 9 are repeated until the simulation end time is reached. The heaviest part of the code is the neighbor search (step 4). We have used a fixed-size cell search (commonly called a bucket search); this algorithm is ideal for solid mechanics SPH. We have also developed an adaptive search method that improves on the standard cell search algorithm (details in Fraser [5]). In our SPH program, one of the least complex subroutines is conservation of mass. In the Fortran CPU implementation, the first loop ranges over all the elements in the model (ntotal) and the second loop ranges over all the neighbors (Neib) of element i. The equivalent device kernel for conservation of mass is very similar to the CPU code: the only change is that the first loop is replaced with a parallel thread index for i. It's really not any more complicated than that. There are certainly other parallelization approaches, but this one is simple and efficient.
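The conservation-of-mass kernel described above can be sketched as follows. This is a minimal CUDA Fortran sketch, not the group's actual code: the array names (mass, vx, dWdx, Neib, numNeib, drhodt) and the neighbor-list layout are assumptions for illustration.

```fortran
module sph_data
  use cudafor
  real,    allocatable, device :: mass(:), vx(:), vy(:), vz(:), drhodt(:)
  real,    allocatable, device :: dWdx(:,:), dWdy(:,:), dWdz(:,:)  ! kernel gradients per neighbor pair
  integer, allocatable, device :: Neib(:,:), numNeib(:)            ! neighbor lists from the cell search
contains
  ! Device kernel: one CUDA thread per SPH element i. The sum over the
  ! neighbors j of i (the SPH continuity equation) runs sequentially
  ! inside the thread. This is the CPU double loop with the outer
  ! "do i = 1, ntotal" replaced by a parallel thread index.
  attributes(global) subroutine con_mass_kernel(ntotal)
    integer, value :: ntotal
    integer :: i, j, k
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= ntotal) then
      drhodt(i) = 0.0
      do k = 1, numNeib(i)
        j = Neib(i, k)                    ! index of the k-th neighbor of i
        drhodt(i) = drhodt(i) + mass(j) *              &
            ( (vx(i) - vx(j)) * dWdx(i, k)             &
            + (vy(i) - vy(j)) * dWdy(i, k)             &
            + (vz(i) - vz(j)) * dWdz(i, k) )
      end do
    end if
  end subroutine con_mass_kernel
end module sph_data

! Host-side launch, e.g. with 256 threads per block:
!   call con_mass_kernel<<<(ntotal + 255) / 256, 256>>>(ntotal)
```

The CPU version is the same code with the thread-index line replaced by an outer `do i = 1, ntotal` loop, which is what makes the port so direct.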

Memory Model

We have kept the code as simple as possible. Global memory is used exclusively throughout the whole program. We did some testing using textures but, unfortunately, there was no performance improvement and some kernels were actually slower. Switching to a shared memory model requires a completely different parallelization strategy: instead of assigning each SPH element to a thread, in 3D we would have to assign a 3x3x3 group of cells of SPH elements (from the cell search) to a block of threads. Each cell typically holds 8 elements (optimally there would be 216 elements in the 27 cells). The problem is that although most cells will have 8 elements, some cells will have more or fewer than that. We could use 256 threads per block and store all the required variables for the block in shared memory. This approach is much more complicated than the global memory approach: it requires very complicated indexing, and we run the risk of not having enough threads in a block to handle cases where there are more than 8 elements in some of the 27 cells. All of the parameters that do not change throughout the simulation (such as mass, material and thermal properties, etc.) are assigned to constant memory on the device side and declared as parameters on the host side.

The Results and Benefits

While developing the SPH code on the GPU, we performed a number of tests to see just how much faster the code is. In all cases, comparisons are with an equivalent commercial SPH code (LS-DYNA in this case) running on the CPU. The GPU used for the test cases is a GeForce Titan Black. One of the first tests is a simple heat transfer example: a block of aluminum is heated (500 °C) on the left-most face and the temperature distribution is calculated using the SPH code. This is a special case, since the neighbor search only has to be performed once at the start of the simulation (elements do not move, so the neighbors do not change). Furthermore, we don't need to calculate conservation of mass and momentum. For this case, we were able to achieve speed-ups of over 120x. But this is only a small part of the whole code.

For the next example, a fully coupled thermo-mechanical problem is tested: an aluminum cylinder is compressed to the point of plastic deformation (shown on the left), which causes the temperature in the cylinder to increase (energy is dissipated as heat). The aluminum is initially at 20 °C; through the process of plastic deformation, the maximum temperature reaches 46 °C. This is a good test case for the adaptive search algorithm and allows us to achieve speed-ups of 36.8x. The cylinder compression problem also allows us to use the total Lagrangian approach, in which the SPH neighbors are only calculated at the start of the simulation (similar to the heat transfer example). We are able to get a speed-up of 46.9x using the total Lagrangian technique.

Of course, the main goal of our research work is to simulate the FSW process. The case that we have tested is a bobbin tool weld. Details of the model setup can be found in Fraser et al. [6] (in the publication, the model is run in LS-DYNA on the CPU). The numerical model gives excellent results in comparison with experimental data. We are able to predict the temperature history (shown to the right), stresses and defects throughout the entire welding procedure. Using the GPU code, the FSW model achieves speed-ups of 25x. A simulation that took 6 days before can now be run in under 6 hours. As far as success stories go, that is a complete success by just about any measure! Although we have not yet tried the adaptive search method for the FSW simulation, we expect speed-ups over 32x based on experience with other models.

Summary of the speed-ups for the different models (solid mechanics SPH code on the GPU):

- Thermal model: 120x
- Cylinder compression: 46.9x
- FSW model: 25x

Acknowledgements

The FSWRG would like to acknowledge the support of PGI for providing a license for the CUDA Fortran compiler and NVIDIA for providing the GeForce Titan Black GPU. Funding for the research work is provided by GRIPS, CURAL, REGAL, CQRDA and FQRNT. A special thank you to Mathew Colgrove at PGI for all his help and guidance.

- Kirk Fraser, P.Eng
UQAC, FSWRG, Chicoutimi, Quebec, Canada
kirk.fraser1@uqac.ca
https://www.researchgate.net/profile/kirk_fraser2
https://ca.linkedin.com/pub/kirk-fraser/b8/3a9/533

References

[1] Liu GR, Liu MB. Smoothed Particle Hydrodynamics: A Meshfree Particle Method. Hackensack, New Jersey: World Scientific; 2003.
[2] Violeau D. Fluid Mechanics and the SPH Method. United Kingdom: Oxford University Press; 2014.
[3] Hoover WG. Smooth Particle Applied Mechanics: The State of the Art (Advanced Series in Nonlinear Dynamics). Singapore: World Scientific Publishing; 2006.
[4] Ruetsch G, Fatica M. CUDA Fortran for Scientists and Engineers. Waltham, MA, USA: Elsevier Inc.; 2014.
[5] Fraser K. Adaptive smoothed particle hydrodynamics neighbor search algorithm for large plastic deformation computational solid mechanics. 13th International LS-DYNA Users Conference. Dearborn, Michigan: LSTC; 2014.
[6] Fraser K, St-Georges L, Kiss LI. Smoothed Particle Hydrodynamics Numerical Simulation of Bobbin Tool Friction Stir Welding. 10th International Friction Stir Welding Symposium. Beijing, China: TWI; 2014.