User Guide for DualSPHysics code


DualSPHysics_v2.0
March 2012

A.J.C. Crespo, J.M. Dominguez, M.G. Gesteira, A. Barreiro, B.D. Rogers

DualSPHysics is part of the SPHysics project: University of Vigo, The University of Manchester and Johns Hopkins University


Acknowledgements

The development of DualSPHysics was partially supported by:
- Xunta de Galicia under project PGIDIT06PXIB383285PR.
- Xunta de Galicia under project Programa de Consolidación e Estructuración de Unidades de Investigación Competitivas (Grupos de Referencia Competitiva), also financed by the European Regional Development Fund (FEDER).
- The ESPHI (A European Smooth Particle Hydrodynamics Initiative) project supported by the Commission of the European Communities (Marie Curie Actions, contract number MTKI-CT ).
- A Research Councils UK (RCUK) Research Fellowship.
- EPSRC Grant EP/H003045/1.

We want to acknowledge Orlando Garcia Feal for his technical help.


Abstract

This guide documents the DualSPHysics code, based on the Smoothed Particle Hydrodynamics model named SPHysics. This manuscript describes how to compile and run the DualSPHysics code (a set of C++ and CUDA files). New pre-processing tools are implemented to create more complex geometries and new post-processing tools are developed to easily analyse numerical results. Several working examples are documented to enable the user to run the codes and understand how they work.


Contents

1. Introduction
2. SPH formulation
3. CPU and GPU implementation
   3.1 CPU optimizations
   3.2 GPU implementation
   3.3 GPU optimizations
4. DualSPHysics open-source code
   4.1 CPU source files
   4.2 GPU source files
5. Compiling DualSPHysics
   5.1 Windows compilation
   5.2 Linux compilation
6. Running DualSPHysics
7. Format Files
8. Processing
9. Pre-processing
10. Post-processing
   10.1 Visualization of boundaries
   10.2 Visualization of all particles output data
   10.3 Analysis of numerical measurements
   10.4 Surface representation
11. Testcases
   CaseDambreak
   CaseWavemaker
   CaseRealface
   CasePump
   CaseDambreak2D
   CaseWavemaker2D
   CaseFloating
12. How to modify DualSPHysics for your application
   Creating new cases
   Source files
13. FAQ
14. DualSPHysics future
15. References
16. Licenses


1. Introduction

Smoothed Particle Hydrodynamics (SPH) is a Lagrangian meshless method that has been used in an expanding range of applications within the field of Computational Fluid Dynamics [Gómez-Gesteira et al. 2010a] where fluids present discontinuities in the flow, interact with structures, and exhibit large deformation with moving boundaries. The SPH model is approaching a mature stage, with continuing improvements and modifications such that the accuracy, stability and reliability of the model are reaching an acceptable level for practical engineering applications.

SPHysics is an open-source SPH model developed by researchers at the Johns Hopkins University (US), the University of Vigo (Spain), the University of Manchester (UK) and the University of Rome, La Sapienza. The software is freely available for download. A complete guide of the FORTRAN code can be found in [Gómez-Gesteira et al. 2010b]. The SPHysics FORTRAN code was validated for different problems: wave breaking [Dalrymple and Rogers, 2006], dam-break behaviour [Crespo et al. 2008], and interaction with coastal structures [Gómez-Gesteira and Dalrymple, 2004] or with a moving breakwater [Rogers et al. 2010].

Although SPHysics allows problems to be modelled with a fine description, the main obstacle to its application to real engineering problems is the long computational runtime, meaning that SPHysics is rarely applied to large domains. Hardware acceleration and parallel computing are required to make SPHysics more useful and versatile.

Graphics Processing Units (GPUs) appear as a cheap alternative for High Performance Computing (HPC) in numerical modelling. GPUs are designed to manage huge amounts of data and their computing power has developed in recent years much faster than that of conventional central processing units (CPUs). Compute Unified Device Architecture (CUDA) is a parallel programming framework and language for GPU computing based on extensions to the C/C++ language. Researchers and engineers in different fields are achieving high speedups by implementing their codes with the CUDA language. Thus, the parallel computing power of GPUs can also be applied to SPH methods, where the same loops over particles can be parallelised at every step of the simulation. The first work where a classical SPH approach was executed on the GPU belongs to Harada et al. [2007]; a remarkable element of their work is that the implementation predates the appearance of CUDA.

The first GPU model based on the SPHysics formulation was developed by Hérault et al. [2010], where SPH was applied to study free-surface flows. The code DualSPHysics has been developed starting from the SPH formulation implemented in SPHysics. That FORTRAN code is robust and reliable, but it is not properly optimised for huge simulations. DualSPHysics is implemented in C++ and CUDA to carry out simulations on the CPU and GPU respectively. The new CPU code presents some advantages, such as more optimised use of the memory. The object-oriented programming paradigm provides the means to develop a code that is easy to understand, maintain and modify, with sophisticated error control available. Furthermore, better approaches are implemented: for example, particles are reordered to give faster access to memory, symmetry is considered in the force computation to reduce the number of particle interactions, and the best approach to create the neighbour list is implemented [Dominguez et al. 2010]. The CUDA language manages the parallel execution of threads on the GPU. The most efficient approaches were implemented as an extension of the C++ code, applying the best optimizations to parallelise the particle interaction on the GPU [Crespo et al. 2009]. Preliminary results were presented in Crespo et al. [2010] and the first rigorous validations were presented in Crespo et al. [2011].

The DualSPHysics code has been developed to simulate real-life engineering problems using SPH models. In the following sections we describe the SPH formulation available in DualSPHysics, the implementation and optimization techniques, how to compile and run the different codes of the DualSPHysics package, and future developments.

The user can download the following files from the DualSPHysics website:
- DUALSPHYSICS DOCUMENTATION:
  o DualSPHysics_v2.0_GUIDE.pdf
  o ExternalModelsConversion_GUIDE.pdf
  o GenCase_XML_GUIDE.pdf
- DUALSPHYSICS PACKAGE:
  o DualSPHysics_v2.0_linux_32bit.zip
  o DualSPHysics_v2.0_linux_64bit.zip
  o DualSPHysics_v2.0_windows_32bit.zip
  o DualSPHysics_v2.0_windows_64bit.zip
- DUALSPHYSICS SOURCE FILES:
  o DualSPHysics_v2.0_SourceFiles_linux.zip
  o DualSPHysics_v2.0_SourceFiles_windows.zip

2. SPH formulation

All the SPH theory implemented in DualSPHysics is taken directly from the SPHysics code. Therefore, a more detailed description, equations and references are collected in [Gómez-Gesteira et al. 2012a, 2012b]. Here we only summarise the SPH formulation already available in the new DualSPHysics code:

Time integration scheme:
- Verlet [Verlet, 1967].
- Symplectic [Leimkuhler, 1997].
Variable time step [Monaghan and Kos, 1999].
Kernel functions:
- Cubic Spline kernel [Monaghan and Lattanzio, 1985].
- Quintic Wendland kernel [Wendland, 1995].
Kernel gradient correction [Bonet and Lok, 1999].
Shepard density filter [Panizzo, 2004].
Viscosity treatments:
- Artificial viscosity [Monaghan, 1992].
- Laminar viscosity + SPS turbulence model [Dalrymple and Rogers, 2006].
Weakly compressible approach using Tait's equation of state [Monaghan et al., 1999].
Dynamic boundary conditions [Crespo et al. 2007].
Floating objects [Monaghan et al. 2003] (not implemented for Laminar+SPS viscosity and Kernel Gradient Correction).

Features that will soon be integrated into the CPU-GPU solver as future improvements:
- Periodic open boundaries.
- SPH-ALE with Riemann Solver [Rogers et al. 2010].
- Primitive-variable Riemann Solver.
- Variable particle resolution [Omidvar et al. 2012].
- Multiphase (gas-solid-water).
- Inlet/outlet flow conditions.
- Modified virtual boundary conditions [Vacondio et al. 2011].


3. CPU and GPU implementation

The DualSPHysics code is the result of an optimised implementation using the best approaches for CPU and GPU, with the accuracy, robustness and reliability shown by the SPHysics code. SPH simulations such as those in the SPHysics and DualSPHysics codes can be split into three main steps: (i) generation of the neighbour list, (ii) computation of the forces between particles (solving momentum and continuity equations) and (iii) the update of the physical quantities at the next time step. Thus, running a simulation means executing these steps in an iterative manner:

1st STEP: Neighbour list (cell-linked list described in Dominguez et al. [2010]):
- The domain is divided into square cells of side 2h (the size of the kernel domain).
- A list of particles, ordered according to the cell they belong to, is generated.
- All the physical variables of the particles are reordered accordingly.

2nd STEP: Force computation:
- Particles of the same cell and adjacent cells are candidates to be neighbours.
- Each particle interacts with all its neighbouring particles (at a distance < 2h).

3rd STEP: System update:
- A new time step is computed.
- The physical quantities for the next step are updated starting from the current magnitudes, the interaction forces and the new time step value.
- Particle data are stored occasionally.

3.1 CPU optimizations

The optimizations applied to the CPU implementation of DualSPHysics are:

Applying symmetry to particle-particle interaction
When the force f_ab exerted by a particle a on a neighbour particle b is computed, the force exerted by the neighbouring particle on the first one is also known, since it has the same magnitude but opposite direction (f_ba = -f_ab). Thus, the number of interactions to be evaluated can be halved, which decreases the computational time.
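The following C++ fragment is a minimal sketch of such a symmetric pair loop (illustrative only: the names, data layout and placeholder force term do not correspond to the actual DualSPHysics sources, and the real code restricts the pair search to adjacent cells instead of testing every pair):

#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

void ComputeForcesSymmetric(const std::vector<Vec3>& pos,
                            std::vector<Vec3>& force, float h) {
  const float kernelSize2 = (2.0f * h) * (2.0f * h);   // interaction radius 2h
  for (std::size_t a = 0; a < pos.size(); ++a) {
    for (std::size_t b = a + 1; b < pos.size(); ++b) { // b > a: each pair once
      const float dx = pos[a].x - pos[b].x;
      const float dy = pos[a].y - pos[b].y;
      const float dz = pos[a].z - pos[b].z;
      const float r2 = dx * dx + dy * dy + dz * dz;
      if (r2 < kernelSize2) {                          // real neighbours only
        const float fab = 1.0f / (r2 + 1e-6f);         // placeholder magnitude
        force[a].x += fab * dx; force[a].y += fab * dy; force[a].z += fab * dz;
        force[b].x -= fab * dx; force[b].y -= fab * dy; // f_ba = -f_ab
        force[b].z -= fab * dz;
      }
    }
  }
}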

Splitting the domain into smaller cells
The domain is split into cells of size (2h × 2h × 2h) to reduce the neighbour search to only the adjacent cells. However, only about 30% of the particles in the adjacent cells are real neighbours (at a distance < 2h). A suitable technique to diminish the number of false neighbours is to reduce the volume of the cell; thus, the domain can instead be split into cells of size (h × h × h).

Figure 3-1. The volume searched using cells of side 2h (left panels) is bigger than using cells of side h (right panels). Note that symmetry is applied in the interaction.

Multi-core programming with OpenMP
Current CPUs have several cores or processing units, so it is essential to distribute the computation load among them to maximize the CPU performance and to accelerate the SPH code. OpenMP is a portable and flexible programming interface whose implementation does not involve major changes in the code. Using OpenMP, multiple threads for a process can easily be created. These threads are distributed among all the cores of the CPU, sharing the memory, so there is no need to duplicate data or to transfer information between threads.
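A rough illustration of this idea follows (a minimal sketch assuming a hypothetical per-particle force routine on simplified 1-D data; the real DualSPHysics OpenMP code also combines symmetry in the interaction with dynamic or static load balancing):

#include <omp.h>
#include <vector>
#include <cmath>

// Hypothetical per-particle force routine: sums a placeholder contribution
// from every particle closer than 2h (the real code sweeps only the cells).
static float ComputeForceOnParticle(int a, const std::vector<float>& x, float h) {
  float f = 0.0f;
  for (int b = 0; b < (int)x.size(); ++b)
    if (b != a && std::fabs(x[b] - x[a]) < 2.0f * h)
      f += x[b] - x[a];
  return f;
}

// Each OpenMP thread processes a subset of particles; the memory is shared,
// so no data duplication or transfers between threads are needed.
void ComputeForcesParallel(std::vector<float>& force,
                           const std::vector<float>& x, float h) {
  #pragma omp parallel for schedule(dynamic, 64)
  for (int a = 0; a < (int)force.size(); ++a)
    force[a] = ComputeForceOnParticle(a, x, h);
}

The schedule(dynamic, 64) clause corresponds to the dynamic load balancing offered by the solver (-ompdynamic, see Section 6); schedule(static) would give the static variant (-ompstatic).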

3.2 GPU Implementation

A GPU implementation initially focused on the force computation since, following Dominguez et al. [2010], this is the most time-consuming part of the runtime. However, the most efficient technique is to minimise the CPU-GPU communications needed for data transfers. If the neighbour list and the system update are also implemented on the GPU, only one CPU-GPU transfer is needed at the beginning of the simulation, while relevant data are transferred back to the CPU only when saving output data is required (usually infrequently). Crespo et al. [2011] used an execution of DualSPHysics performed entirely on the GPU to run a numerical experiment whose results are in close agreement with the experimental ones.

The GPU implementation presents some key differences in comparison to the CPU version. The main difference is the parallel execution of all tasks that can be parallelised, such as all loops over particles. One GPU execution thread computes the resulting force on one particle, performing all the interactions with its neighbours. The symmetry of the particle interaction is employed on the CPU to reduce the runtime, but it is not applied in the GPU implementation since it is not efficient there due to memory coalescence issues.

DualSPHysics is unique in that the same application can be run using either the CPU or the GPU implementation; this facilitates the use of the code not only on workstations with an Nvidia GPU but also on machines without a CUDA-enabled GPU. The main code has a common core for both the CPU and GPU implementations, with only minor source code differences implemented for the two devices to apply the specific optimizations for CPU and GPU. Thus, debugging and maintenance are easier, and comparisons of results and computational time are more direct. Figure 3-2 shows a flow diagram representing the differences between the CPU and GPU implementations and the different steps involved in a complete execution.

Figure 3-2. Flow diagram of the CPU (left) and total GPU implementation (right).

3.3 GPU Optimizations

Making efficient and full use of all the capabilities of the GPU is not straightforward; moreover, the GPU implementation presents several limitations, mainly due to the Lagrangian nature of the SPH method. Thus, the CPU-GPU transfers must be minimised, code divergence must be reduced, non-coalescent memory accesses (which degrade performance) must be limited and an unbalanced workload must be avoided. The following optimizations were therefore developed to avoid or minimise these problems:

Full implementation on GPU
To minimise the CPU-GPU data transfers, the entire SPH computation can be implemented on the GPU, keeping the data in the GPU memory (see right panel of Fig. 3-2). The CPU-GPU communications are thereby drastically reduced and only specific results are recovered from the GPU at some time steps. Moreover, when the other two main stages of the SPH method (neighbour list generation and system update) are implemented on the GPU, the computational time devoted to these processes decreases.

Maximizing the occupancy of GPU

Since the access to the GPU global memory is irregular during the particle interaction, it is essential to have the largest number of active warps in order to hide the latencies of memory access and keep the hardware as busy as possible. The number of active warps depends on the registers required by the CUDA kernel (listed in the file DualSPHysics_ptxasinfo), the GPU specifications and the number of threads per block. With this optimization, the size of the block is adjusted according to the registers of the kernel and the hardware specifications to maximise the occupancy.

Simplifying the neighbour search
During the GPU execution of the interaction kernel, each thread has to look for the neighbours of its particle, sweeping through the particles that belong to its own cell and to the surrounding cells, a total of 27 cells in 3-D since symmetry cannot be applied. However, this procedure can be optimized by simplifying the neighbour search. The search can be removed from the interaction kernel if the range of particles that could interact with the target particle is known in advance. Since particles are reordered according to the cells, and cells follow the order of the X, Y and Z axes, the range of particles of three consecutive cells along the X-axis (cell_(x,y,z), cell_(x+1,y,z) and cell_(x+2,y,z)) is equal to the range from the first particle of cell_(x,y,z) to the last particle of cell_(x+2,y,z). Thus, the 27 cells can be described as 9 ranges of particles. The interaction kernel is significantly simplified when these ranges are known in advance: the memory accesses decrease and the number of divergent warps is reduced. A sketch of this range construction is given at the end of this section.

Figure 3-3. Interaction cells using 9 ranges of three consecutive cells (right) instead of 27 cells (left). Note that symmetry is not applied in the GPU interaction.

Division of the domain into smaller cells
As described for the CPU implementation, the procedure consists of dividing the domain into cells of size h instead of 2h in order to increase the percentage of real neighbours. Using cells of size h in the GPU implementation, the number of pair-wise interactions decreases.
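The construction of the 9 ranges mentioned above can be sketched as follows (a sketch under assumed names: cellBegin is taken to be a prefix array holding the index of the first particle of each cell in the sorted particle array; this is not the actual DualSPHysics data structure):

// Cells are numbered X-first, so three consecutive cells along X occupy one
// contiguous span of the sorted particle array.
const int NCX = 16, NCY = 16, NCZ = 16;        // cells per axis (example values)
int cellBegin[NCX * NCY * NCZ + 1];            // filled after sorting particles

inline int CellId(int cx, int cy, int cz) { return cx + NCX * (cy + NCY * cz); }

// Builds up to 9 particle ranges covering the 27 neighbouring cells of
// cell (cx,cy,cz); each (begs[i], ends[i]) spans three cells along X.
void NeighbourRanges(int cx, int cy, int cz, int begs[9], int ends[9], int& n) {
  n = 0;
  const int x0 = (cx > 0 ? cx - 1 : 0);                // clamp at domain edges
  const int x1 = (cx < NCX - 1 ? cx + 1 : NCX - 1);
  for (int dz = -1; dz <= 1; ++dz) {
    const int z = cz + dz; if (z < 0 || z >= NCZ) continue;
    for (int dy = -1; dy <= 1; ++dy) {
      const int y = cy + dy; if (y < 0 || y >= NCY) continue;
      begs[n] = cellBegin[CellId(x0, y, z)];           // first particle of range
      ends[n] = cellBegin[CellId(x1, y, z) + 1];       // one past the last one
      ++n;
    }
  }
}

Each (begs[i], ends[i]) pair is then a contiguous span of the sorted particle array, so the interaction kernel needs at most 9 linear sweeps instead of 27 cell lookups.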

4. DualSPHysics open-source code

This section gives a brief description of the source files of the SPH solver released with DualSPHysics v2.0. The source code is freely redistributable under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation; users can download the files from the section DUALSPHYSICS SOURCE FILES.

First, note that more complete documentation is provided in the directory DOXY. This documentation has been created using the documentation system Doxygen. To navigate through the full documentation, open the HTML file index.html, as shown in Figure 4-1.

Figure 4-1. Documentation for DualSPHysics code generated with Doxygen.

The source files are in the directory SOURCE and a complete list of the code files is summarised in Table 4-1.

COMMON:
JException (.h .cpp), JLog (.h .cpp), JObject (.h .cpp), JPartData (.h .cpp), JPtxasInfo (.h .cpp), JSpaceCtes (.h .cpp), JSpaceEParms (.h .cpp), JSpaceParts (.h .cpp), JVarsAscii (.h .cpp), TypesDef.h, Functions.h, JTimer.h, JTimerCuda.h, JFormatFiles2.lib (libjformatfiles2.a), JSphMotion.lib (libjsphmotion.a), JXml.lib (libjxml.a)

SPH SOLVER (CPU and GPU):
main.cpp, JCfgRun (.h .cpp), JSph (.h .cpp), Types.h

SPH ON CPU:
JSphCpu (.h .cpp), JSphTimersCpu.h, JDivideCpu (.h .cpp)

SPH ON GPU:
JSphGpu (.h .cpp), JSphTimersGpu.h, CudaSphApi (.h .cu), CudaSphFC.cu, CudaSphNL.cu, CudaSphSU.cu

Table 4-1. List of source files of DualSPHysics code.

The common files are also used by other codes such as GenCase, BoundaryVTK, PartVTK, MeasureTool and IsoSurface. The rest of the files implement the SPH solver; some of them are used for both CPU and GPU executions while others are device-specific:

COMMON FILES:
JException.h & JException.cpp Declares/implements the class that defines exceptions with the information of the class and method.
JLog.h & JLog.cpp Declares/implements the class that allows creating a log file with information on the execution.
JObject.h & JObject.cpp Declares/implements the parent class that defines objects with methods that throw exceptions.
JPartData.h & JPartData.cpp Declares/implements the class that allows reading/writing files with particle data in different formats.
JPtxasInfo.h & JPtxasInfo.cpp Declares/implements the class that returns the number of registers of each CUDA kernel.
JSpaceCtes.h & JSpaceCtes.cpp Declares/implements the class that manages the information on constants from the input XML file.
JSpaceEParms.h & JSpaceEParms.cpp Declares/implements the class that manages the information on execution parameters from the input XML file.
JSpaceParts.h & JSpaceParts.cpp Declares/implements the class that manages the information on particles from the input XML file.
JVarsAscii.h & JVarsAscii.cpp Declares/implements the class that reads variables from a text file in ASCII format.
TypesDef.h Declares general types and functions for the entire application.
Functions.h Declares basic/general functions for the entire application.
JTimer.h Declares a class to measure short time intervals.

JTimerCuda.h Declares a class to measure short time intervals using cudaEvent.
JFormatFiles2.lib (libjformatfiles2.a) Precompiled library that provides functions to store particle data in VTK, CSV and ASCII formats.
JSphMotion.lib (libjsphmotion.a) Precompiled library that provides the displacement of moving objects during a time interval.
JXml.lib (libjxml.a) Precompiled library that helps to manage the XML document using the TinyXML library.

SPH SOLVER:
main.cpp Main file of the project that executes the code on CPU or GPU.
JCfgRun.h & JCfgRun.cpp Declares/implements the class responsible for collecting the execution parameters from the command line.
JSph.h & JSph.cpp Declares/implements the class that defines all the attributes and functions that CPU and GPU simulations share.
Types.h Defines specific types for the SPH application.

SPH SOLVER ONLY FOR CPU EXECUTIONS:
JSphCpu.h & JSphCpu.cpp Declares/implements the class that defines the attributes and functions used only in CPU simulations.
JSphTimersCpu.h Measures time intervals during CPU execution.
JDivideCpu.h & JDivideCpu.cpp Declares/implements the class responsible for computing the neighbour list.

SPH SOLVER ONLY FOR GPU EXECUTIONS:
JSphGpu.h & JSphGpu.cpp Declares/implements the class that defines the attributes and functions used only in GPU simulations.
JSphTimersGpu.h Measures time intervals during GPU execution.
CudaSphApi.h & CudaSphApi.cu Declares/implements all the functions used in GPU execution.
CudaSphFC.cu Functions to compute forces on the GPU.
CudaSphNL.cu Functions to compute the neighbour list on the GPU.
CudaSphSU.cu Functions to update the system on the GPU.

4.1 CPU source files

The source file JSphCpu.cpp can be better understood with the help of the outline represented in Figure 4-2:

RUN                          Starts simulation.
  AllocMemory                Allocates memory of main data.
  DivideConfig               Configuration for the neighbour list.
  LoadPartBegin              Loads PART to restart a simulation.
  InitVars                   Initialization of arrays and variables for the execution.
  PrintMemoryAlloc           Prints out the allocated memory.
  RunDivideBoundary          Computes neighbour list of boundary particles.
  RunDivideFluid             Computes neighbour list of fluid particles.
  SaveData                   Creates files with output data.
  MAIN LOOP                  Main loop of the simulation.
    ComputeStep_Ver          Computes particle interaction and updates system using Verlet.
      Interaction_Forces     Computes particle interaction.
        PreInteraction_Forces  Prepares variables for particle interaction.
        CallInteractionCells   Interaction between cells.
        SPSCalcTau             Computes sub-particle stress tensor for SPS model.
      DtVariable             Computes a variable time step.
      ComputeVerletVars      Computes new values of position and velocity using Verlet.
      ComputeRhop            Computes new values of density.
      RunFloating            Processes movement of particles of floating objects.
    RunMotion                Processes movement of moving boundary particles.
    ComputeStepDivide        Computes the new neighbour list.
      RunDivideBoundary      Computes neighbour list of boundary particles.
      RunDivideFluid         Computes neighbour list of fluid particles.
    RunShepard               Applies the Shepard density filter.
    SaveData                 Creates files with output data.

Figure 4-2. Outline of JSphCpu.cpp when using Verlet time algorithm.

The previous outline corresponds to the Verlet algorithm; if Symplectic is used, the step is split into predictor and corrector. Thus, Figure 4-3 shows the structure of the CPU code using this time scheme and correcting forces with the kernel gradient correction (KGC):

ComputeStep_Sym              Computes particle interaction and updates system with Symplectic.
  PREDICTOR                  Predictor step.
    Interaction_Forces       Computes particle interaction.
      Interaction_MatrixKgc  Computes matrix for kernel gradient correction.
        CallInteractionCells <INTER_MatrixKgc>    Interaction between cells to compute elements of the matrix KGC.
      PreInteraction_Forces  Prepares variables for particle interaction.
      CallInteractionCells <INTER_ForcesKgc>      Interaction between cells using corrected force in predictor.
      SPSCalcTau             Computes sub-particle stress tensor for SPS model.
    DtVariable               Computes a variable time step.
    ComputeVars              Computes new values of position and velocity.
    ComputeRhop              Computes new values of density.
    RunFloating              Processes movement of particles of floating objects.
  CORRECTOR                  Corrector step.
    RunDivideFluid           Computes neighbour list of fluid particles.
    Interaction_Forces       Computes particle interaction.
      Interaction_MatrixKgc  Computes matrix for kernel gradient correction.
        CallInteractionCells <INTER_MatrixKgc>    Interaction between cells to compute elements of the matrix KGC.
      PreInteraction_Forces  Prepares variables for particle interaction.
      CallInteractionCells <INTER_ForcesCorrKgc>  Interaction between cells using corrected force in corrector.
      SPSCalcTau             Computes sub-particle stress tensor for SPS model.
    DtVariable               Computes a variable time step.
    ComputeVars              Computes new values of position and velocity.
    ComputeRhopEpsilon       Computes new values of density in corrector.
    RunFloating              Processes movement of particles of floating objects.

(The rest of the main loop, i.e. RunMotion, ComputeStepDivide, RunShepard with CallInteractionCells <INTER_Shepard> and SaveData, is as in Figure 4-2.)

Figure 4-3. Outline of JSphCpu.cpp when using Symplectic time algorithm and KGC.

Note that JSphCpu::CallInteractionCells is implemented using a template declared in the file JSphCpu.h, which calls the function InteractionCells (a minimal sketch of this template mechanism is given after Table 4-2):

CallInteractionCells <INTER_Forces | INTER_ForcesCorr | INTER_ForcesKgc | INTER_ForcesCorrKgc | INTER_MatrixKgc | INTER_Shepard>
  -> InteractionCells
       -> InteractionCells_Single | InteractionCells_Dynamic | InteractionCells_Static
            -> InteractSelf | InteractCelij
                 -> ComputeMatrixKgc | ComputeForces | ComputeForcesShepard

Figure 4-4. Call graph for the function CallInteractionCells.

The interaction between cells can be performed with a single execution thread (InteractionCells_Single) or using different threads thanks to the OpenMP implementation (InteractionCells_Dynamic & InteractionCells_Static). Furthermore, particles of a cell can interact with particles of the cell itself (InteractSelf) or with particles of the adjacent cells (InteractCelij). Thus, depending on the parameter <INTER_XXXX>, the interaction between cells is carried out to compute different options, as can be seen in Table 4-2:

CallInteractionCells<INTER_MatrixKgc> -> ComputeMatrixKgc: interaction between cells to compute elements of the matrix KGC.
CallInteractionCells<INTER_ForcesKgc> -> ComputeForces: interaction between cells using corrected force in predictor.
CallInteractionCells<INTER_ForcesCorrKgc> -> ComputeForces: interaction between cells using corrected force in corrector.
CallInteractionCells<INTER_Forces> -> ComputeForces: interaction between cells without corrected force in predictor.
CallInteractionCells<INTER_ForcesCorr> -> ComputeForces: interaction between cells without corrected force in corrector.
CallInteractionCells<INTER_Shepard> -> ComputeForcesShepard: interaction between cells for Shepard filter.

Table 4-2. Different options that can be computed during cell interactions in CPU executions.
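A minimal, self-contained sketch of this kind of compile-time dispatch follows (illustrative only; the enum values mirror the names above, but none of this is the actual DualSPHysics source):

#include <cstdio>

enum TpInter { INTER_Forces, INTER_MatrixKgc, INTER_Shepard };

// The compiler generates one specialised function per interaction mode, so
// the inner loops contain no runtime branching on the mode.
template <TpInter tinter>
void InteractionCells(int cella, int cellb) {
  if (tinter == INTER_MatrixKgc)
    std::printf("KGC matrix terms for cells %d-%d\n", cella, cellb);
  else if (tinter == INTER_Shepard)
    std::printf("Shepard filter for cells %d-%d\n", cella, cellb);
  else
    std::printf("forces for cells %d-%d\n", cella, cellb);
}

int main() {
  InteractionCells<INTER_MatrixKgc>(0, 1); // kernel gradient correction pass
  InteractionCells<INTER_Forces>(0, 1);    // force computation pass
  InteractionCells<INTER_Shepard>(0, 1);   // Shepard density filter pass
  return 0;
}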

As mentioned before, more complete documentation was generated using Doxygen. For example, the call graph of the ComputeStep_Ver function can be seen in Figure 4-5:

Figure 4-5. Call graph for the function ComputeStep_Ver in Doxygen.

The call graph of the ComputeStepDivide function can be seen in Figure 4-6:

Figure 4-6. Call graph for the function ComputeStepDivide in Doxygen.

4.2 GPU source files

The source file JSphGpu.cpp can also be understood better with the outline represented in Figure 4-7, which includes the functions implemented in the CUDA files:

RUN                        Starts simulation.
  CsInitCuda               Initialisation of the CUDA device.
  AllocMemory              Memory allocation. DivideConfig is now called in AllocMemory since it depends on the memory allocated on the GPU.
  LoadPartBegin            Loads PART to restart a simulation.
  InitVars                 Initialisation of arrays and variables for the execution.
  PrintMemoryAlloc         Prints out the allocated memory.
  CsCallDivide             Calls CsDivide() to compute the initial neighbour list.
  SaveData                 Creates files with output data.
  MAIN LOOP                Main loop of the simulation.
    CsCallComputeStep        Calls CsComputeStep() depending on the time algorithm.
    RunMotion                Processes movement of moving boundary particles.
    CsCallComputeStepDivide  Calls CsDivide() to compute the new neighbour list.
    CsCallRunShepard         Calls CsRunShepard() to apply the Shepard density filter.
    SaveData                 Creates files with output data.
  GetVarsDevice            Recovers values of variables used in CUDA files.

Figure 4-7. Outline of JSphGpu.cpp when using Verlet time algorithm. The CUDA functions involved in each step (CsInteraction_Forces, CsPreInteraction_Forces, KerComputeForcesFluid, KerComputeStepVerlet, CsRunFloating, CsDivide, CsRunShepard, among others) are detailed in Table 4-3.

The colour of each box indicates the CUDA file where the functions are implemented: CudaSphApi.cu implements all the functions used in GPU execution and includes CudaSphNL.cu, which implements all the functions to compute the neighbour list on the GPU; CudaSphFC.cu, which implements all the functions to compute forces on the GPU; and CudaSphSU.cu, which implements all the functions to update the system on the GPU. Dashed boxes indicate that the functions inside are CUDA kernels.

The following table summarises the calls of the CsCallComputeStep function:

CsCallComputeStep              Computes particle interaction and updates system.
  CsComputeStep_Ver            Updates system using the Verlet algorithm.
    CsCallInteraction_Forces / CsInteraction_Forces   Interaction between particles to compute forces.
      CsPreInteraction_Forces -> KerPreInteraction_Forces   Prepares variables for the force computation, and its CUDA kernel.
      KerComputeForcesFluid -> KerComputeForcesFluidBox     CUDA kernel to compute interactions between particles (Fluid-Fluid & Fluid-Bound), and the CUDA kernel to compute the interaction of a particle with a set of particles.
      KerComputeForcesBound -> KerComputeForcesBoundBox     CUDA kernel to compute interactions between particles (Bound-Fluid), and the CUDA kernel to compute the interaction of a particle with a set of particles.
    CsDtVariable -> KerCalcFa2   Computes a variable DT, and the CUDA kernel to compute Fa2.
    KerComputeStepVerlet         CUDA kernel to update pos, vel and rhop using Verlet.
    CsRunFloating                Processes movement of particles of floating objects.
      CsCalcRidpft -> KerCalcRidp  Computes positions of particles of floating objects, and its CUDA kernel.
      KerCalcFtOmega               CUDA kernel to compute values for a floating body.
      KerFtUpdate                  CUDA kernel to update particles of a floating body.

Table 4-3. Outline of CsCallComputeStep when using Verlet time algorithm.

Two different types of CUDA kernels can be launched to compute the Fluid-Fluid and Fluid-Bound interactions in GPU executions:
- KerComputeForcesFluid: this kernel is launched in the basic configuration where artificial viscosity is used.
- KerComputeForcesFullFluid: this kernel is used with the Laminar+SPS viscosity treatment and/or when the Kernel Gradient Correction is applied. This CUDA kernel is less optimised and its performance is degraded by the higher number of registers it requires.
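To make the one-thread-per-particle pattern of these interaction kernels concrete, the following heavily simplified CUDA sketch may help (assumed names and a placeholder pair term; the real kernels sweep only the 9 particle ranges of the surrounding cells and evaluate the full SPH momentum and continuity terms):

#include <cuda_runtime.h>

// One execution thread computes the resulting force on one particle.
// Launch example: KerForcesSketch<<<(n + 255) / 256, 256>>>(n, h, pos, ace);
// (in the real code the block size is tuned from DualSPHysics_ptxasinfo).
__global__ void KerForcesSketch(unsigned n, float h,
                                const float4* pos, float3* ace)
{
  unsigned p = blockIdx.x * blockDim.x + threadIdx.x;
  if (p >= n) return;
  const float4 pa = pos[p];
  float3 acep = make_float3(0.0f, 0.0f, 0.0f);
  // This sketch visits every particle to stay short; the real kernel only
  // sweeps the particle ranges of the neighbouring cells.
  for (unsigned q = 0; q < n; ++q) {
    if (q == p) continue;
    const float4 pb = pos[q];
    const float dx = pa.x - pb.x, dy = pa.y - pb.y, dz = pa.z - pb.z;
    const float r2 = dx * dx + dy * dy + dz * dz;
    if (r2 < 4.0f * h * h) {                 // neighbour: distance < 2h
      const float f = 1.0f / (r2 + 1e-6f);   // placeholder pair interaction
      acep.x += f * dx; acep.y += f * dy; acep.z += f * dz;
    }
  }
  ace[p] = acep;  // no symmetry on the GPU: each pair is computed twice
}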


5. Compiling DualSPHysics

The code can be compiled for either CPU or GPU executions (but not both simultaneously). All computations have been implemented both in C++ for CPU simulations and in CUDA for GPU simulations. Most of the source code is common to CPU and GPU, which allows the code to be run on workstations without a CUDA-enabled GPU, using only the CPU implementation.

To run DualSPHysics on a GPU, only an Nvidia CUDA-enabled GPU card is needed and the latest version of the driver must be installed. However, to compile the source code, the GPU programming language CUDA must be installed on your computer. CUDA Toolkit 4.0 can be downloaded from the Nvidia website. Once the nvcc compiler has been installed on your machine, you can download the relevant files from the section DUALSPHYSICS SOURCE FILES:
o DualSPHysics_v2.0_SourceFiles_linux.zip
o DualSPHysics_v2.0_SourceFiles_windows.zip

5.1 Windows compilation

After unzipping DualSPHysics_v2.0_SourceFiles_windows.zip, the project file DualSPHysics_R80.sln is provided to be opened with Visual Studio. Different configurations can be chosen for compilation:
a) Release: for CPU and GPU
b) ReleaseCPU: only for CPU
c) x64 / Win32: choose between the two Windows platforms

The project is always created including the libraries for OpenMP in the executable. To exclude them, the user can modify Props config -> C/C++ -> Language -> OpenMp. The use of OpenMP can also be deactivated by commenting out the code line in Types.h:
#define USE_OPENMP ///<Enables/Disables OpenMP.

The .zip also contains several folders:
- Source: contains all the source files;
- Libs: precompiled libraries for x64 and Win32;
- Doxy: includes the settings for Doxygen and the generated documentation;
- Run: will be created once the solution is built.

5.2 Linux compilation

After unzipping DualSPHysics_v2.0_SourceFiles_linux.zip, different Makefiles can be used to compile the code in Linux:
a) make (-f Makefile): full compilation just using the make command
b) make -f Makefile_CPU: only for CPU

To choose the correct architecture you must set the variable ARCH=32 or ARCH=64 (for 32-bit and 64-bit respectively) in the Makefile. To include the OpenMP libraries you must set the variable WITHOMP=1 or 0 (if WITHOMP=0, the code line #define USE_OPENMP in Types.h must be commented out).

Figure 5-1. Content of the file Makefile for Linux.

Other compilation parameters, for both Windows and Linux, are:
- sm_10,compute_10: compiles for GPUs with compute capability 1.0;
- sm_12,compute_12: compiles for GPUs with compute capability 1.2;
- sm_20,compute_20: compiles for GPUs with compute capability 2.0;
- -use_fast_math: allows using mathematical functions with less accuracy but faster.

By default, the output from compiling the CUDA code is stored in the file DualSPHysics_ptxasinfo. Thus, any possible error in the compilation can be identified in this file.

6. Running DualSPHysics

To start using DualSPHysics, users have to follow these instructions:
1) First, download and read the DUALSPHYSICS DOCUMENTATION:
- DualSPHysics_v2.0_GUIDE: this manuscript.
- GenCase_XML_GUIDE: helps to create a new case.
- ExternalModelsConversion_GUIDE: describes how to convert the file format of any external geometry of a 3D model to VTK, PLY or STL using open-source codes.
2) Download the DUALSPHYSICS PACKAGE (see Figure 6-1):
- EXECS
- HELP
- MOTION
- RUN_DIRECTORY:
o CASEDAMBREAK & CASEDAMBREAK2D
o CASEWAVEMAKER & CASEWAVEMAKER2D
o CASEREALFACE
o CASEPUMP
3) The user has to choose the package depending on the operating system. There are versions for Windows/Linux and 32/64 bits of the following codes:
- GenCase
- DualSPHysics
- BoundaryVTK
- PartVTK
- MeasureTool
- IsoSurface
4) The appropriate BAT files in the CASES directories must be used, depending on whether the execution is on CPU or GPU.
5) The open-source software Paraview is recommended to visualise the results.

In Figure 6-1, the following directories can be observed:
EXECS:
- Contains all the executable codes.
- Some libraries needed by the codes are also included.
- The file DualSPHysics_ptxasinfo is used to optimise the block size for the different CUDA kernels in GPU executions.

HELP:
- Contains CaseTemplate.xml, an XML example with all the different labels and formats that can be used in the input XML file.
- The folder XML_examples includes more examples of the XML files used to run some of the simulations.
- The information about the execution parameters of the different codes is presented in HELP_NameCode.out.
MOTION:
- Contains the bat file Motion.bat to perform the examples with the different types of movement that can be described with DualSPHysics. Eight examples can be carried out (Motion01.xml to Motion08.xml).
- The text file motion08mov_f3.out describes the prescribed motion used in the eighth example.
RUN_DIRECTORY:
- The directory created to run the working cases, where the output files will be written.

DualSPHysics
  EXECS: GenCase, DualSPHysics (+ dll's or lib's), BoundaryVTK, PartVTK, MeasureTool, DualSPHysics_ptxasinfo
  HELP: CaseTemplate.xml, XML_examples, HELP_GenCase.out, HELP_DualSPHysics.out, HELP_BoundaryVTK.out, HELP_PartVTK.out, HELP_MeasureTool.out
  MOTION: Motion.bat, Motion01.xml, Motion02.xml, ..., Motion08.xml, motion08mov_f3.out
  RUN_DIRECTORY:
    CASEDAMBREAK: CaseDambreak.bat, CaseDambreak_Def.xml, FastCaseDambreak.bat, FastCaseDambreak_Def.xml, PointsVelocity.txt
    CASEWAVEMAKER: CaseWavemaker.bat, CaseWavemaker_Def.xml, FastCaseWavemaker.bat, FastCaseWavemaker_Def.xml, PointsHeights.txt
    CASEREALFACE: CaseRealface.bat, CaseRealface_Def.xml, FastCaseRealface.bat, FastCaseRealface_Def.xml, face.vtk, PointsPressure.txt
    CASEPUMP: CasePump.bat, CasePump_Def.xml, pump_fixed.vtk, pump_moving.vtk

Figure 6-1. Directory tree and provided files.

7. Format Files

The codes provided with this guide present some improvements in comparison to the codes available within SPHysics. One of them is related to the format of the files that are used as input and output data during the execution of DualSPHysics and the pre-processing and post-processing codes. Three new formats are introduced in comparison to SPHysics: XML, binary and VTK-binary.

XML File
The EXtensible Markup Language is a textual data format compatible with any hardware and software. It is based on a set of labels that organise the information and can be loaded or written easily using any standard text editor. This format is used for the input files of the pre-processing code and the SPH solver.

BINARY File
The output data in the SPHysics code is written in text files, i.e. ASCII format. ASCII files present some interesting advantages, such as readability and portability, but they also present important disadvantages: data stored in text format consumes at least six times more memory than the same data stored in binary format, precision is reduced when figures are converted from real numbers to text, and reading or writing data in ASCII is more expensive (by about two orders of magnitude). Since DualSPHysics allows simulations with a high number of particles to be performed, a binary file format is necessary to avoid these problems. The binary format reduces the volume of the files and the time dedicated to generating them.
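The storage overhead of text is easy to see with a few lines of C++ (a stand-alone illustration, not part of the DualSPHysics sources):

#include <cstdio>

int main() {
  float rhop = 1000.123456f;   // e.g. one particle density value
  char text[64];
  // The same 4-byte float printed as decimal text takes several times more
  // space; separators and full-precision formatting add further overhead.
  int textBytes = std::snprintf(text, sizeof(text), "%f", rhop);
  std::printf("binary: %zu bytes, text: %d bytes (\"%s\")\n",
              sizeof(rhop), textBytes, text);
  return 0;
}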

The file format used in DualSPHysics is named BINX2 (.bi2). These files contain the minimum information of the particle properties; some variables can be removed, e.g. the pressure is not stored since it can be calculated from the density using the equation of state. The mass values are constant for fluid particles and for boundaries, so only two values are used instead of an array. The position of fixed boundary particles is only stored in the first file since it remains unchanged throughout the simulation. Data for particles that leave the limits of the domain are stored in an independent file, which leads to an additional saving.

Advantages of BINX2:
- Memory storage reduction: an important reduction in the file size is obtained using the binary format, by saving only the values of the particles that change along the simulation.
- Fast access: read-write time is decreased since binary conversion is faster than ASCII conversion.
- No loss of precision.
- Portability.
- Additional information: a header is stored at the beginning of each file with information about the number of particles, mass value, smoothing length, inter-particle spacing, time instant...

VTK File
This format is basically used for visualization. In the case of the tools for DualSPHysics, this format is used for the results; that is, not only the particle positions, but also the physical magnitudes obtained numerically for the particles involved in the simulations. VTK supports many data types, such as scalar, vector, tensor and texture, and also supports different algorithms such as polygon reduction, mesh smoothing, cutting, contouring and Delaunay triangulation. Essentially, a VTK file consists of a header that describes the data and includes any other useful information, the dataset structure with the geometry and topology of the dataset, and the dataset attributes. Here we use VTK files of POLYDATA type with legacy-binary format, which is easy for read-write operations.

8. Processing

In this section and the two following ones, the running and execution of the different codes in the DualSPHysics package will be described. The main code, which performs the SPH simulation, is named DualSPHysics. The input files to run the DualSPHysics code include one XML file (Case.xml) and a binary file (Case.bi2). Case.xml contains all the parameters of the system configuration and its execution, such as key variables (smoothing length, reference density, gravity, coefficient to calculate pressure, speed of sound...), the number of particles in the system, the movement definition of moving boundaries and the properties of moving bodies. The binary file Case.bi2 contains the particle data: arrays of position, velocity and density, and headers with mass values, time instant... The output files consist of binary format files with the particle information at different instants of the simulation (Part0000.bi2, Part0001.bi2, Part0002.bi2...).

The configuration of the execution is defined in the XML file, but it can also be defined or changed using execution parameters. Furthermore, new options and possibilities for the execution can be imposed using more execution parameters. Typing DualSPHysics.exe -h in the command window will display a brief HELP manual:

DualSPHysics
================================
Information about execution parameters:
DualSPHysics [name_case [dir_out]] [options]
Options:
  -h                 Shows information about parameters
  -opt <file>        Load a file configuration
  -cpu               Execution on Cpu (option by default)
  -gpu[:id]          Execution on Gpu and id of the device
  -ompthreads:<int>  Only for Cpu execution, indicates the number of threads by host for parallel execution; it takes the number of cores of the device by default (or using zero value)
  -ompdynamic        Parallel execution with symmetry in interaction and dynamic load balancing
  -ompstatic         Parallel execution with symmetry in interaction and static load balancing
  -symplectic        Symplectic algorithm as time step algorithm
  -verlet[:steps]    Verlet algorithm as time step algorithm and number of time steps to switch equations
  -cubic             Cubic spline kernel
  -wendland          Wendland kernel
  -kgc:<0/1>         Kernel Gradient Correction
  -viscoart:<float>      Artificial viscosity [0-1]
  -viscolamsps:<float>   Laminar+SPS viscosity [order of 1E-6]
  -shepard:steps     Shepard filter and number of steps to be applied
  -dbc:steps         Hughes and Graham correction and number of steps to update the density of the boundaries

  -sv:[formats,...]  Specify the output formats:
      none           No particle files are generated
      binx2          Binary files (option by default)
      ascii          ASCII files (PART_xxxx of SPHysics)
      vtk            VTK files
      csv            CSV files
  -svdt:<0/1>        Generate file with information about the time step dt
  -svres:<0/1>       Generate file that summarises the execution process
  -svtimers:<0/1>    Obtain timing for each individual process
  -name <string>     Specify path and name of the case
  -runname <string>  Specify name for case execution
  -dirout <dir>      Specify the output directory
  -partbegin:begin[:first] dir  Specify the beginning of the simulation starting from a given PART (begin) located in the directory (dir); (first) indicates the number of the first PART to be generated
  -incz:<float>      Allowed increase in Z+ direction
  -rhopout:min:max   Exclude fluid particles out of these density limits
  -tmax:<float>      Maximum time of simulation
  -tout:<float>      Time between output files
  -cellmode:<mode>   Specify the cell division mode; by default, the fastest approach is chosen:
      hneigs         fastest and the most expensive in memory (only for gpu)
      h              intermediate
      2h             slowest and the least expensive in memory
  -ptxasfile <file>  Indicate the file with information about the compilation of CUDA kernels, used to adjust the size of the blocks depending on the registers needed by each kernel (only for gpu)
Examples:
  DualSPHysics case out_case -sv:binx2,csv
  DualSPHysics case -gpu -svdt:1

This help summarises the execution parameters that can be changed: the time stepping algorithm, specifying Symplectic or Verlet (-symplectic, -verlet[:steps]); the choice of kernel function, which can be Cubic or Wendland (-cubic, -wendland); whether kernel gradients are corrected or not (-kgc:<0/1>); the value of the artificial viscosity (-viscoart:) or the laminar+SPS viscosity treatment (-viscolamsps:); the activation of the Shepard density filter and how often it must be applied (-shepard:steps); the Hughes and Graham [2010] correction and the number of steps to update the density of the boundaries (-dbc:steps); and the maximum simulation time and the time intervals to save the output data (-tmax:, -tout:). To run the code, it is also necessary to specify whether the simulation is going to run in CPU or GPU mode (-cpu, -gpu[:id]) and the format of the output files (-sv:[formats,...]: none, binx2, ascii, vtk, csv). It is possible to generate a file with information on the time steps (-svdt:<0/1>), a file that summarises the execution process (-svres:<0/1>) with the computational time of each individual process (-svtimers:<0/1>), and information about density (-svrhop:<0/1>). It is also possible to exclude particles whose density falls outside given limits (-rhopout:min:max).

For CPU executions, a multi-core implementation using OpenMP enables executions in parallel using the different cores of the machine. It takes the maximum number of cores of the device by default, or users can specify it (-ompthreads:<int>). In addition, the parallel execution can use dynamic (-ompdynamic) or static (-ompstatic) load balancing. On the other hand, for GPU executions, different cell divisions of the domain can be used (-cellmode:<mode>), which differ in memory usage and efficiency.

9. Pre-processing

A new C++ code named GenCase has been implemented to define the initial configuration of the simulation, the movement description of moving objects and the parameters of the execution in DualSPHysics. All this information is contained in an input file in XML format: Case_Def.xml. Two output files are created after running GenCase: Case.xml and Case.bi2 (the input files for the DualSPHysics code).

GenCase employs a 3D mesh to locate particles. The idea is to build any object using particles, which are created at the nodes of the 3D mesh. Firstly, the mesh nodes around the object are defined, and then particles are created only at the nodes needed to draw the desired geometry. Figure 9-1 illustrates how this mesh is used; in this case a triangle is generated in 2D. First the nodes of a mesh are defined starting from the maximum dimensions of the desired triangle, then the three lines connecting the three points of the triangle are defined, and finally particles are created at the available nodes under the three lines.

Figure 9-1. Generation of a 2-D triangle formed by particles using GenCase.
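This node-selection idea can be condensed into a few lines of C++ (purely illustrative; GenCase is far more general and none of these names come from its sources):

#include <vector>

struct Pt { float x, y; };

// Signed area test: on which side of edge a->b does point c lie?
static float Cross(Pt a, Pt b, Pt c) {
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// A mesh node is kept when it lies on the same side of all three edges.
static bool InsideTriangle(Pt p, Pt a, Pt b, Pt c) {
  const float d1 = Cross(a, b, p), d2 = Cross(b, c, p), d3 = Cross(c, a, p);
  const bool neg = d1 < 0 || d2 < 0 || d3 < 0;
  const bool pos = d1 > 0 || d2 > 0 || d3 > 0;
  return !(neg && pos);
}

// Sweeps a uniform mesh of spacing dp and creates a particle at every node
// inside the triangle (a,b,c).
std::vector<Pt> DrawTriangle2D(Pt a, Pt b, Pt c, float dp,
                               Pt minCorner, Pt maxCorner) {
  std::vector<Pt> particles;
  for (float y = minCorner.y; y <= maxCorner.y; y += dp)
    for (float x = minCorner.x; x <= maxCorner.x; x += dp)
      if (InsideTriangle({x, y}, a, b, c)) particles.push_back({x, y});
  return particles;
}

Note how the spacing dp is independent of the geometry: halving dp refines the particle resolution without changing the shape, which is exactly the property described below.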

Independently of the complexity of the case geometry, all particles are placed with an equidistant spacing between them. The geometry of the case is defined independently of the inter-particle distance, which allows the number of particles to be varied, while maintaining the size of the problem, by defining a different distance among particles. Furthermore, GenCase is able to generate millions of particles in only seconds on the CPU.

Very complex geometries can be created easily since a wide variety of commands (labels in the XML file) are available to create different objects: points, lines, triangles, quadrilaterals, polygons, pyramids, prisms, boxes, beaches, spheres, ellipsoids, cylinders, waves... (<drawpoint />, <drawpoints />, <drawline />, <drawlines />, <drawtriangle />, <drawquadri />, <drawtrianglesstrip />, <drawtrianglesfan />, <drawpolygon>, <drawtriangles />, <drawpyramid />, <drawprism />, <drawbox />, <drawbeach />, <drawsphere />, <drawellipsoid />, <drawcylinder />, <drawwave />). Once the mesh nodes that represent the desired object are selected, these points are stored as a matrix of nodes. The shape of the object can be transformed using a translation (<move />), a scaling (<scale />) or a rotation (<rotate />, <rotateline />).

The main process is creating particles at the nodes, and different types of particles can be created: a fluid particle (<setmkfluid />), a boundary particle (<setmkbound />) or none (<setmkvoid />). So, mk is the value used to mark a set of particles with a common feature in the simulation. Particles can be located only at the boundary nodes of the object (<setdrawmode mode="face" />), at the nodes inside the bounds (<setdrawmode mode="solid" />) or at both (<setdrawmode mode="full" />). A set of fluid particles can be labelled with features or special behaviours (<initials />): for example, an initial velocity (<velocity />) can be imposed on fluid particles, or a solitary wave can be defined (<velwave />). Furthermore, particles can be defined as part of a floating object (<floatings />). Once the boundaries have been defined, filling a region with particles is particularly useful when working with complex geometries (<fillpoint />, <fillbox />, <fillfigure />, <fillprism />).

For more complex geometries, objects can be imported from 3DS files (Figure 9-2) or CAD files (Figure 9-3). This allows the use of realistic geometries generated by 3D design applications to be combined with the drawing commands of GenCase. These files (3DS or CAD) must be converted to STL format (<drawfilestl />), PLY format (<drawfileply />) or VTK format (<drawfilevtk />), formats that are easily loaded by GenCase. Any object in STL, PLY or VTK can be split into different triangles, and any triangle can be converted into particles using the GenCase code.
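As a brief illustration of how these labels combine, consider the following hypothetical fragment written in the style of Case_Def.xml (the exact structure and attributes should be taken from CaseTemplate.xml and GenCase_XML_GUIDE.pdf, not from this sketch):

<!-- Hypothetical fragment: a container drawn with boundary particles and a
     block of fluid inside it; attributes may differ from CaseTemplate.xml. -->
<mainlist>
  <setmkbound mk="0" />
  <drawbox>
    <boxfill>bottom|left|right|front|back</boxfill>  <!-- walls only -->
    <point x="0.0" y="0.0" z="0.0" />
    <size x="1.6" y="0.7" z="0.4" />
  </drawbox>
  <setmkfluid mk="0" />
  <drawbox>
    <boxfill>solid</boxfill>                         <!-- filled fluid block -->
    <point x="0.1" y="0.1" z="0.0" />
    <size x="0.4" y="0.5" z="0.3" />
  </drawbox>
</mainlist>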

Figure 9-2. Example of a 3D file that can be imported by GenCase and converted into particles.

Figure 9-3. Example of a CAD file that can be imported by GenCase and converted into particles.

Different kinds of movement can be imposed on a set of particles: linear, rotational, circular, sinusoidal... To help users define movements, a directory with some examples is included in the DualSPHysics package. Thus, the directory MOTION includes:
- Motion01: uniform rectilinear motion (<mvrect />) that also includes pauses (<wait />)
- Motion02: combination of two uniform rectilinear motions (<mvrect />)
- Motion03: movement of an object depending on the movement of another (hierarchy of objects)
- Motion04: accelerated rectilinear motion (<mvrectace />)
- Motion05: rotational motion (<mvrot />). See Figure 9-4.
- Motion06: accelerated rotation motion (<mvrotace />) and accelerated circular motion (<mvcirace />). See Figure 9-5.
- Motion07: sinusoidal movement (<mvrectsinu />, <mvrotsinu />, <mvcirsinu />)
- Motion08: prescribed movement with data from an external file (<mvpredef />)

Figure 9-4. Example of rotational motion.

Figure 9-5. Example of accelerated rotation motion and accelerated circular motion.
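As a hypothetical sketch of how such movements are declared (the label names follow the list above, but the real structure and attributes should be taken from the Motion01.xml to Motion08.xml examples rather than from this fragment):

<!-- Hypothetical fragment: the object with reference 10 moves for 2 s at a
     constant velocity and then pauses, in the spirit of Motion01.xml. -->
<motion>
  <objreal ref="10">
    <mvrect id="1" duration="2.0" next="2">
      <vel x="0.1" y="0.0" z="0.0" />
    </mvrect>
    <wait id="2" duration="1.0" />
  </objreal>
</motion>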

Typing GenCase.exe -h in the command window generates a brief HELP manual:

GenCase
===========================
Execution parameters information:
GenCase config_in config_out [options]
Options:
  -h              Shows information about the different parameters
  -template       Generates the example file 'CaseTemplate.xml'
  -dp:<float>     Distance between particles
  -save:<values>  Indicates the format of output files
      +/-all:      To choose or reject all options
      +/-binx2:    Binary format for the initial configuration
      +/-vtkall:   VTK with all particles of the initial configuration
      +/-vtkbound: VTK with boundary particles of the initial configuration
      +/-vtkfluid: VTK with fluid particles of the initial configuration
      (preselected formats: binx2)
  -debug:<int>    Debug level (-1:Off, 0:Explicit, n:debug level)
Examples:
  GenCase case_def case
  GenCase case case -dp:0.01

The user is invited to execute GenCase.exe -template and read carefully the generated CaseTemplate.xml, which includes all the different labels, with the corresponding format, that can be used in the input XML file. It is readily available in the files to be downloaded. A more complete description of the code can be found in the PDF files GenCase_XML_GUIDE.pdf and ExternalModelsConversion_GUIDE.pdf, already available for download from the DualSPHysics website. The GenCase code facilitates generating arbitrarily complex geometries in a user-friendly way, making DualSPHysics useful for examining realistic cases.


10. Post-processing

10.1 Visualization of boundaries

In order to visualise the boundary shapes formed by the boundary particles, different geometry files can be generated using the BoundaryVTK code. As input data, shapes can be loaded from a VTK file (-loadvtk), a PLY file (-loadply) or an STL file (-loadstl), while boundary movement can be imported from an XML file (-loadxml file.xml) using the timing of the simulation (-motiontime) or the exact instants of the output data (-motiondatatime). The movement of the boundaries can also be determined starting from the particle positions (-motiondata). The output files consist of VTK files (-savevtk), PLY files (-saveply) or STL files (-savestl) with the loaded information and the moving boundary positions at different instants; for example, the output files can be named motion_0000.vtk, motion_0001.vtk, motion_0002.vtk... These files can also be generated for only a selected object, defined by mk (-onlymk:) or by the id of the particles (-onlyid:).

Typing BoundaryVTK.exe -h in the command window, a brief HELP manual is available with the different parameters of execution:

BoundaryVtk
===============================
Information about parameters of execution:
BoundaryVtk <options>
Basic options:
  -h            Shows information about parameters
  -opt <file>   Load a file configuration
Load shapes:
  -loadvtk <file.vtk>   Load shapes of vtk files (PolyData)
  -onlymk:<values>      Indicates the mk of the shapes to be loaded; affects the previous -loadvtk and cannot be used with -onlyid
  -onlyid:<values>      Indicates the code of object of the shapes to be loaded; only affects the previous -loadvtk
  -changemk:<value>     Changes the mk of the loaded shapes to a given value; only affects the previous -loadvtk
  -loadply:<mkvalue> <file.ply>  Load shapes of ply files with an indicated mk
  -loadstl:<mkvalue> <file.stl>  Load shapes of stl files with an indicated mk
Load configuration for a mobile boundary:
  -filexml file.xml          Load xml file with information of the movement and kind of particles
  -motiontime:<time>:<step>  Configure the duration and time step to simulate the movement defined in the xml file; cannot be used with -motiondatatime and -motiondata

  -motiondatatime <dir>  Indicates the directory where the real times of the simulation are loaded for the mobile boundary; cannot be used with -motiontime and -motiondata
  -motiondata <dir>      Indicates the directory where the positions of particles are loaded to generate the mobile boundary; cannot be used with -motiontime and -motiondatatime
Define output files:
  -savevtk <file.vtk>      Generates vtk(polydata) files with the loaded shapes
  -saveply <file.ply>      Generates ply file with the loaded information
  -savestl <file.stl>      Generates stl file with the loaded information
  -savevtkdata <file.vtk>  Generates vtk(polydata) file with the loaded shapes including mk and shape code
  -onlymk:<values>         Indicates the mk of the shapes to be stored; affects the previous output option and cannot be used with -onlyid
  -onlyid:<values>         Indicates the code of the object of the shapes to be stored; affects the previous output option
Examples:
  BoundaryVtk -loadvtk bound.vtk -savevtk box.vtk -onlymk:10,12
  BoundaryVtk -loadvtk bound.vtk -filexml case.xml -motiondata. -saveply motion.ply

10.2 Visualization of all particles output data

The BoundaryVTK code allows triangles or planes to be created to represent the boundaries. Conversely, the PartVTK code is used to handle all particles (fixed boundaries, moving boundaries or fluid particles) and to work with the different physical quantities computed in DualSPHysics. Thus, the output files of DualSPHysics, i.e. the binary files (.bi2), are the input files for the post-processing code PartVTK, which generates files in different formats for the selected particles. The output files can be VTK-binary (-savevtk), CSV (-savecsv) or ASCII (-saveascii); for example, PartVtkBin_0000.vtk, PartVtkBin_0001.vtk, PartVtkBin_0002.vtk...

These files can be generated for only a selected set of particles, defined by mk (-onlymk:), by the id of the particles (-onlyid:) or by the type of the particle (-onlytype:), so we can check or uncheck all the particles (+/-all), the boundaries (+/-bound), the fixed boundaries (+/-fixed), the moving boundaries (+/-moving), the floating bodies (+/-floating), the fluid particles (+/-fluid) or the fluid particles excluded during the simulation (+/-fluidout). The output files can contain different particle data (-vars:): all the magnitudes (+/-all), velocity (+/-vel), density (+/-rhop), pressure (+/-press), mass (+/-mass), acceleration (+/-ace), vorticity (+/-vor), the id of the particle (+/-id), the type (+/-type) and mk (+/-mk). Acceleration values can be positive (AcePos) or negative (AceNeg) according to the direction of the axis. To compute the acceleration or vorticity of particles, some parameters can also be defined, such as the kernel to compute the interactions (-acekercubic, -acekerwendland), the value of the viscosity (-aceviscoart:) or even the gravity (-acegravity:).

10.2 Visualization of all particles output data

The BoundaryVTK code allows triangles or planes to be created to represent the boundaries. Conversely, the PartVTK code is used to process all particles (fixed boundaries, moving boundaries or fluid particles) and to work with the different physical quantities computed in DualSPHysics. The output files of DualSPHysics, i.e. the binary files (.bi2), are therefore the input files of the post-processing code PartVTK, which generates files in different formats for the selected particles. The output files can be VTK-binary (-savevtk), CSV (-savecsv) or ASCII (-saveascii); for example, PartVtkBin_0000.vtk, PartVtkBin_0001.vtk, PartVtkBin_0002.vtk, etc. These files can be generated for only a selected set of particles defined by mk (-onlymk:), by the id of the particles (-onlyid:) or by the type of the particles (-onlytype:), so that all particles (+/-all), the boundaries (+/-bound), the fixed boundaries (+/-fixed), the moving boundaries (+/-moving), the floating bodies (+/-floating), the fluid particles (+/-fluid) or the fluid particles excluded during the simulation (+/-fluidout) can be selected or deselected. The output files can contain different particle data (-vars:): all the magnitudes (+/-all), velocity (+/-vel), density (+/-rhop), pressure (+/-press), mass (+/-mass), acceleration (+/-ace), vorticity (+/-vor), the id of the particle (+/-id), the type (+/-type) and the mk (+/-mk). Acceleration values can be positive (AcePos) or negative (AceNeg) according to the direction of the axis. To compute the acceleration or vorticity of the particles, some parameters can also be defined, such as the kernel used to compute the interactions (-acekercubic, -acekerwendland), the value of the viscosity (-aceviscoart:) or even the gravity (-acegravity:).

Typing PartVTK.exe -h in the command window, the HELP manual with the available parameters of execution is displayed:

PartVtk
===========================
Information about parameters of execution:

  PartVtk <options>

Basic options:
  -h                 Shows information about parameters
  -opt <file>        Loads a configuration file

Define input file:
  -dirin <dir>       Directory with particle data
  -filein <file>     File with particle data
  -filexml file.xml  Loads an xml file with information about the mk and the type
                     of the particles; this is needed for the filter -onlymk and
                     for the variable -vars:mk
  -first:<int>       Indicates the first file to be computed
  -last:<int>        Indicates the last file to be computed
  -jump:<int>        Indicates the number of files to be skipped
  -threads:<int>     Indicates the number of threads for the parallel execution of
                     the interpolation; by default (or with the value zero) it
                     takes the number of cores of the device

Define parameters for acceleration or vorticity calculation:
  -acekercubic       Use the Cubic spline kernel
  -acekerwendland    Use the Wendland kernel
  -aceviscoart:<float>  Artificial viscosity [0-1] (only for acceleration)
  -acegravity:<float:float:float>  Gravity value (only for acceleration)

Define output files:
  -savevtk <file.vtk>    Generates vtk(PolyData) files with the particles selected
                     by the options -onlymk, -onlyid and -onlytype
  -savecsv <file.csv>    Generates CSV files to use with spreadsheets
  -saveascii <file.asc>  Generates ASCII files without headers

Configuration for each output file:
  -onlymk:<values>   Indicates the mk of the selected particles
  -onlyid:<values>   Indicates the id of the selected particles
  -onlytype:<values> Indicates the type of the selected particles
     +/-all:       To choose or reject all options
     +/-bound:     Boundary particles (fixed, moving and floating)
     +/-fixed:     Fixed boundary particles
     +/-moving:    Moving boundary particles
     +/-floating:  Floating body particles
     +/-fluid:     Fluid particles (not excluded)
     +/-fluidout:  Excluded fluid particles
     (preselected types: all)
  -vars:<values>     Indicates the variables stored for each particle
     +/-all:   To choose or reject all options
     +/-vel:   Velocity
     +/-rhop:  Density
     +/-press: Pressure
     +/-mass:  Mass
     +/-id:    Id of the particle
     +/-type:  Type (fixed, moving, floating, fluid, fluidout)
     +/-mk:    Value of mk associated with the particles
     +/-ace:   Acceleration
     +/-vor:   Vorticity
     (preselected variables: id, vel, rhop, type)

Examples:
  PartVtk -savevtk partfluid.vtk -onlytype:-bound -savevtk partbound.vtk -onlytype:-all,bound -vars:+mk
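As a further illustrative sketch (the directory and file names here are placeholders, not part of any testcase), the pressure of the fluid particles can be exported to CSV for every fourth saved instant:

  PartVTK.exe -dirin Case_out -first:0 -jump:4 -savecsv Case_out/FluidPress -onlytype:-all,+fluid -vars:-all,+press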

10.3 Analysis of numerical measurements

To compare experimental and numerical values, a tool to analyse the numerical measurements is needed. The MeasureTool code allows different physical quantities to be computed at a set of given points. The binary files (.bi2) generated by DualSPHysics are the input files of the MeasureTool code, and the output files are again VTK-binary (-savevtk), CSV (-savecsv) or ASCII (-saveascii). The numerical values are computed as an interpolation of the values of the neighbouring particles around a given position. The interpolation can be computed using different kernels (-kercubic, -kerwendland). A kernel correction is also applied when the summation of the kernel values around the position is higher than a given value (-kclimit:), and a dummy value is assigned when the correction is not applied (-kcdummy:). The positions where the interpolation is performed are given in a text file (-points <file>), and the distance of interpolation can be 2h (the size of the kernel) or can be changed (-distinter_2h:, -distinter:). The computation can be restricted to a selected set of particles, using the same commands as PartVTK (-onlymk:, -onlyid:, -onlytype:). Different interpolated variables (-vars) can be calculated numerically: all available ones (+/-all), velocity (+/-vel), density (+/-rhop), pressure (+/-press), mass (+/-mass), vorticity (+/-vor), acceleration (+/-ace), the id of the particle (+/-id), the type (+/-type) and the mk (+/-mk). The maximum water depth can also be computed. Height values (-height:) are calculated according to the interpolated mass: if the nodal mass is higher than a given reference mass, that Z-position is considered the maximum height. The reference value can be calculated relative to the mass values of the selected particles (-height:0.5, half the mass, by default) or can be given in an absolute way (-heightlimit:).

Typing MeasureTool.exe -h in the command window, a brief HELP manual with the different parameters of execution is displayed:

MeasureTool
===============================
Information about parameters of execution:

  MeasureTool <options>

Basic options:
  -h                 Shows information about parameters
  -opt <file>        Loads a configuration file

Define input file:
  -dirin <dir>       Directory with particle data
  -filein <file>     File with particle data
  -filexml file.xml  Loads an xml file with information about the mk and the type
                     of the particles; this is needed for the filter -onlymk and
                     for the variable -vars:mk
  -first:<int>       Indicates the first file to be computed
  -last:<int>        Indicates the last file to be computed
  -jump:<int>        Indicates the number of files to be skipped
  -threads:<int>     Indicates the number of threads for the parallel execution of
                     the interpolation; by default (or with the value zero) it
                     takes the number of cores of the device

Define parameters for acceleration or vorticity calculation:
  -acekercubic       Use the Cubic spline kernel
  -acekerwendland    Use the Wendland kernel
  -aceviscoart:<float>  Artificial viscosity [0-1]
  -acegravity:<float:float:float>  Gravity value

Set the filters that are applied to particles:
  -onlymk:<values>   Indicates the mk of the particles selected for interpolation
  -onlyid:<values>   Indicates the id of the selected particles
  -onlytype:<values> Indicates the type of the selected particles
     +/-all:       To choose or reject all options
     +/-bound:     Boundary particles (fixed, moving and floating)
     +/-fixed:     Fixed boundary particles
     +/-moving:    Moving boundary particles
     +/-floating:  Floating body particles
     +/-fluid:     Fluid particles (not excluded)
     +/-fluidout:  Excluded fluid particles
     (preselected types: all)

Set the configuration of interpolation:
  -pointstemplate    Generates an example file to be used with -points
  -points <file>     Defines the points where the interpolated data will be
                     computed (each value separated by a space or a new line)
  -particlesmk:<values>  Indicates the mk of the particles selected to define the
                     points where the interpolated data will be computed
  -kercubic          Use the Cubic spline kernel for the interpolation
  -kerwendland       Use the Wendland kernel for the interpolation (by default)
  -kclimit:<float>   Defines the minimum value of sum_wab_vol needed to apply the
                     Kernel Correction (0.5 by default)
  -kcdummy:<float>   Defines the dummy value for the interpolated quantity if the
                     Kernel Correction is not applied (0 by default)
  -kcusedummy:<0/1>  Defines whether or not to use the dummy value (1 by default)
  -distinter_2h:<float>  Defines the maximum distance for the interaction among
                     particles as a multiple of 2h (1 by default)
  -distinter:<float> Defines the maximum distance for the interaction among
                     particles in an absolute way

Set the values to be calculated:
  -vars[:<values>]   Defines the variables or magnitudes that are computed as an
                     interpolation of the selected particles around a given position
     +/-all:   To choose or reject all options
     +/-vel:   Velocity
     +/-rhop:  Density
     +/-press: Pressure
     +/-mass:  Mass
     +/-id:    Id of the particles
     +/-type:  Type (fixed, moving, floating, fluid, fluidout)
     +/-mk:    Value of mk associated with the particles
     +/-ace:   Acceleration
     +/-vor:   Vorticity
     (preselected values: vel, rhop, id)
  -height[:<float>]  The height value is calculated starting from the mass values
                     for each point (x,y). The reference mass used to obtain the
                     height is calculated from the mass values of the selected
                     particles; 0.5 by default (half the mass)
  -heightlimit:<float>  Same as -height, but the reference mass is given in an
                     absolute way

Define output files format:
  -savevtk <file.vtk>    Generates vtk(PolyData) files with the given
                     interpolation points
  -savecsv <file.csv>    Generates one CSV file with the time history of the
                     obtained values
  -saveascii <file.asc>  Generates one ASCII file without headers with the time
                     history of the obtained values

Examples:
  MeasureTool -points fpoints.txt -onlytype:-all,fluid -savecsv dataf
  MeasureTool -points fpoints.txt -vars:press -savevtk visupress
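For instance, the maximum wave height can be computed at a set of (x,y) positions given in a points file; this combination is used in the CaseWavemaker2D testcase of Section 11.6:

  MeasureTool.exe -dirin CaseWavemaker2D_out -points PointsHeights.txt -onlytype:-all,+fluid -height -savevtk CaseWavemaker2D_out/PointsHeight -savecsv CaseWavemaker2D_out/PointsHeight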

10.4 Surface representation

When a large number of particles is used, the visualization of the simulation can be improved by representing surfaces instead of particles. To create the surfaces, the marching cubes algorithm is used [Lorensen and Cline, 1987]. This computer graphics technique extracts a polygonal mesh (a set of triangles) of an isosurface from a 3D scalar field. Figure 10-1 represents a 3D dam-break simulation using 300,000 particles. The first snapshot shows the particle representation. Values of mass are interpolated at the points of a 3D mesh that covers the entire domain. Thus, if a value of mass is chosen, a polygonal mesh is created where the vertices of the triangles have the same value of mass. The triangles of this isosurface of mass are represented in the second frame of the figure. The last snapshots correspond to the surface representation, where the colour corresponds to the interpolated velocity value of the triangles.

Figure 10-1. Conversion of points to surfaces.

The output binary files of DualSPHysics are the input files of the IsoSurface code, and the output files are VTK files (-saveiso[:<var>]) with the isosurfaces calculated using a variable (<var>), or structured points (-savegrid) with the data obtained after the interpolation. The interpolation of the values at the nodes of the scalar field can be performed using different distances. Thus, the distance between the nodes can be defined depending on dp (-distnode_dp:) or in an absolute way (-distnode:). On the other hand, the maximum distance for the interaction between particles used to interpolate values at the nodes can also be defined depending on 2h (-distinter_2h:) or in an absolute way (-distinter:). The limits of the mesh can be calculated starting from the particle data of all part files (-domain_static) or adjusted to the selected particles of each part file (-domain_dynamic).

IsoSurface
=============================
Information about parameters of execution:

  IsoSurface <options>

Basic options:
  -h                 Shows information about parameters
  -opt <file>        Loads a configuration file

Define input files:
  -dirin <dir>       Directory with particle data
  -filein <file>     File with particle data
  -filexml file.xml  Loads an xml file with information about the mk and the type
                     of the particles; this is needed for the filter -onlymk and
                     for the variable -vars:mk
  -dirgridin <filesname>  Directory and name of the VTK files with the data of the
                     nodes interpolated starting from the particles
                     (structured_points)
  -filegridin <file> VTK file with the data of the nodes interpolated starting
                     from the particles (structured_points)
  -first:<int>       Indicates the first file to be computed
  -last:<int>        Indicates the last file to be computed
  -jump:<int>        Indicates the number of files to be skipped

Define parameters for acceleration or vorticity calculation:
  -acekercubic       Use the Cubic spline kernel
  -acekerwendland    Use the Wendland kernel
  -aceviscoart:<float>  Artificial viscosity [0-1]
  -acegravity:<float:float:float>  Gravity value

Set the filters that are applied to particles:
  -onlymk:<values>   Indicates the mk of the particles selected for interpolation
  -onlyid:<values>   Indicates the id of the selected particles
  -onlytype:<values> Indicates the type of the selected particles
     +/-all:       To choose or reject all options
     +/-bound:     Boundary particles (fixed, moving and floating)
     +/-fixed:     Fixed boundary particles
     +/-moving:    Moving boundary particles
     +/-floating:  Floating body particles
     +/-fluid:     Fluid particles (not excluded)
     +/-fluidout:  Excluded fluid particles
     (preselected types: fluid)

Set the configuration of interpolation:
  -domain_static_all Mesh limits are calculated starting from the particle data
                     selected in all part files
  -domain_static_limits:xmin:ymin:zmin:xmax:ymax:zmax  Mesh limits are given
  -domain_dynamic    Mesh limits are adjusted to the selected particles of each
                     part file (option by default)
  -domain_dynamic_limits:xmin:ymin:zmin:xmax:ymax:zmax  Mesh limits are adjusted
                     to the selected particles within the given limits
  -kercubic          Use the Cubic spline kernel for the interpolation
  -kerwendland       Use the Wendland kernel for the interpolation (by default)
  -kclimit:<float>   Defines the minimum value of sum_wab_vol needed to apply the
                     Kernel Correction (deactivated by default)
  -kcdummy:<float>   Defines the dummy value for the interpolated quantity if the
                     Kernel Correction is not applied (0 by default)
  -kcusedummy:<0/1>  Defines whether or not to use the dummy value (1 by default)
  -distinter_2h:<float>  Defines the maximum distance for the interaction among
                     particles as a multiple of 2h (1 by default)
  -distinter:<float> Defines the maximum distance for the interaction among
                     particles in an absolute way
  -distnode_dp:<float>  Defines the distance between nodes by multiplying dp by
                     the given value (option by default)
  -distnode:<float>  The distance between nodes is given directly
  -threads:<int>     Indicates the number of threads for the parallel execution of
                     the interpolation; by default (or with the value zero) it
                     takes the number of cores of the device

Set the values to be calculated:
  -vars[:<values>]   Defines the variables or magnitudes that are computed as an
                     interpolation of the selected particles around a given position
     +/-all:   To choose or reject all options
     +/-vel:   Velocity
     +/-rhop:  Density
     +/-press: Pressure
     +/-mass:  Mass
     +/-id:    Id of the particles
     +/-type:  Type (fixed, moving, floating, fluid, fluidout)
     +/-mk:    Value of mk associated with the particles
     +/-ace:   Acceleration
     +/-vor:   Vorticity
     (preselected values: vel, rhop, mass, id)

Define output files format:
  -savegrid <file.vtk>  Generates VTK files (structured_points) with the data
                     obtained after the interpolation
  -saveiso[:<values>[:<var>]] <file.vtk>  Generates VTK files (PolyData) with the
                     isosurface calculated starting from a variable (mass, rhop or
                     press) and using several limit values. The variable by default
                     is mass. When the limit values are not given, the intermediate
                     value of the first data file is used

Examples:
  IsoSurface -onlytype:-all,fluid -domain_static_all -savegrid filegrid
  IsoSurface -saveiso:0.04,0.045,0.05 fileiso
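For instance, the free surface of the fluid can be extracted using the default variable (mass); this invocation is used in the CaseWavemaker testcase of Section 11.2:

  IsoSurface.exe -dirin CaseWavemaker_out -saveiso CaseWavemaker_out/Surface -onlytype:-all,+fluid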

11. Testcases

Some demonstration cases are included with the DualSPHysics package (Figure 11-1): a dam break interacting with an obstacle in a tank (a), a wavemaker in front of a beach (b), the real geometry of a human face interacting with impinging fluid (c) and a pump mechanism that moves water within a storage tank (d).

Figure 11-1. Test cases: dam break (a), wavemaker (b), human face (c) and pump (d).

The different test cases are described in the following sections. The BAT files are presented and the different command lines used to execute the different programs are described. The XML files are also explained in detail to address the execution parameters and the different labels used to plot boundaries, fluid volumes, movements, etc. For each Case.bat there are 4 options:
- Case_win_CPU.bat (Windows on CPU)
- Case_win_GPU.bat (Windows on GPU)
- Case_linux_CPU.bat (Linux on CPU)
- Case_linux_GPU.bat (Linux on GPU)

For some test cases two different versions are provided: a full version and a faster version with fewer particles. Since many 3-D applications can be reduced to a 2-D problem, the new executable of DualSPHysics can now also perform 2-D simulations; however, this option is not yet fully optimised. Thus, cases for 2-D simulations are also provided (CASEDAMBREAK2D and CASEWAVEMAKER2D). A 2-D simulation can be achieved easily by imposing the same values along the Y-direction (<pointmin> = <pointmax> in the input XML file). The initial configuration is then a 2-D system, and DualSPHysics will automatically detect that the simulation must be carried out using the 2-D formulation.
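As a minimal sketch of this 2-D mechanism (the element names <pointmin> and <pointmax> are those referenced above; the surrounding layout, attribute names and values are indicative placeholders and may differ slightly between versions), the domain definition of a 2-D case simply collapses the Y-extent:

  <definition dp="0.005">
    <pointmin x="0.0" y="0.5" z="0.0" />
    <pointmax x="3.0" y="0.5" z="2.0" />  <!-- same y value as pointmin: a 2-D case -->
  </definition>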

The following tables summarise the computational runtimes and the performance of DualSPHysics when carrying out the different testcases on a 64-bit LINUX system with an Intel Xeon E5620 CPU at 2.4 GHz and a GTX 480 GPU at 1.40 GHz. Better speedups are obtained when increasing the number of particles. The performance also depends on the percentage of fixed boundaries, moving boundaries and fluid particles.

Table 11-1 summarises the computational times using the Intel Xeon E5620 with one core and with 8 threads:

DualSPHysics v2.0    np          Runtime (min)      Runtime (min)      Speed-up
                                 CPU single-core    CPU multi-core
FastCaseDambreak     40,
FastCaseWavemaker    20,
FastCaseRealface     102,
CaseDambreak         342,
CaseWavemaker        202,
CaseRealface         1,010,
CasePump             1,009,607   >27 days                              >7.3
CaseDambreak2D       20,
CaseWavemaker2D      19,
CaseFloating         226,

Table 11-1. Total runtimes (mins) and CPU performance of DualSPHysics v2.0.

Table 11-2 summarises the computational times using the Intel Xeon E5620 and the GTX 480:

DualSPHysics v2.0    np          Runtime (min)      Runtime (min)      Speed-up
                                 CPU single-core    GPU
FastCaseDambreak     40,
FastCaseWavemaker    20,
FastCaseRealface     102,
CaseDambreak         342,
CaseWavemaker        202,
CaseRealface         1,010,
CasePump             1,009,607   >27 days                              >68
CaseDambreak2D       20,
CaseWavemaker2D      19,
CaseFloating         226,

Table 11-2. Total runtimes (mins) and CPU/GPU performance of DualSPHysics v2.0.

11.1 CASEDAMBREAK

The first test case consists of a dam-break flow impacting on a structure inside a numerical tank. No moving boundary particles are involved in this simulation and no movement is defined. The CaseDambreak.bat file summarises the tasks to be carried out using the different codes in a given order:

CaseDambreak.bat:
  GenCase.exe CaseDambreak_Def CaseDambreak_out/CaseDambreak
  DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -gpu
  PartVTK.exe -dirin CaseDambreak_out -savevtk CaseDambreak_out/PartFluid -onlytype:-all,+fluid
  MeasureTool.exe -dirin CaseDambreak_out -points PointsVelocity.txt -onlytype:-all,+fluid -vars:-all,+vel -savevtk CaseDambreak_out/PointsVelocity -savecsv CaseDambreak_out/PointsVelocity

Pre-processing:

GenCase.exe CaseDambreak_Def CaseDambreak_out/CaseDambreak

The CaseDambreak.xml is described in detail below. First, different constants are defined (see the XML sketch below):
- lattice: staggered grid (2) used to locate the particles
- gravity: gravitational acceleration
- cflnumber: coefficient in the Courant condition
- hswl: maximum still water level, calculated automatically when set to TRUE
- coefsound: coefficient needed to compute the speed of sound
- coefficient: coefficient needed to compute the smoothing length
(all these constants are also defined in SPHysics)
- dp: distance between particles; when changing this parameter, the total number of particles is modified
- the x, y and z values are used to define the limits of the domain

Volume of fluid:
- setmkfluid mk=0; "solid" plots particles throughout the defined volume
- drawbox plots a rectangular box, defined by a corner and its size in the three directions

Boundary tank:
- setmkbound mk=0; "face" plots particles only on the sides of the defined box, with the top removed (this creates CaseDambreak_Box_Dp.vtk)
- setmkvoid removes the particles inside the given box, creating the hole where the structure is then placed
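Tying the constants above together, a minimal sketch of the constants block of such a case definition follows (the element names match the constants listed above; the attribute layout and all values are indicative placeholders and may differ between versions):

  <constantsdef>
    <lattice bound="2" fluid="2" />    <!-- 2 = staggered grid -->
    <gravity x="0" y="0" z="-9.81" />  <!-- gravitational acceleration -->
    <cflnumber value="0.2" />          <!-- coefficient in the Courant condition -->
    <hswl value="0" auto="true" />     <!-- maximum still water level, computed automatically -->
    <coefsound value="10" />           <!-- coefficient for the speed of sound -->
    <coefficient value="0.866" />      <!-- coefficient for the smoothing length -->
  </constantsdef>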

Boundary building:
- setmkbound mk=1; "face" plots particles only on the sides of the defined box, with the bottom removed (this creates CaseDambreak_Building_Dp.vtk)

Parameters of configuration:
- kind of step algorithm, kernel, viscosity, density filter, boundary correction and simulation timings

Final number of particles:
- np = total number of particles
- nb = boundary particles
- nbf = fixed boundary particles
together with the final mk of the objects and a summary of the computed constants.

ParaView is used to visualise all the VTK files. In this case, two VTK files can be opened with it: CaseDambreak_Box_Dp.vtk and CaseDambreak_Building_Dp.vtk.

Figure 11-2. VTK files generated during the pre-processing for CaseDambreak.

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -cpu
DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -gpu

Table 11-3 summarises the different computational times after running the test case on CPU (single-core and 8 threads) and GPU:

                   np     CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
FastCaseDambreak   40,
CaseDambreak       342,

Table 11-3. Total runtimes (mins) and CPU/GPU performance for CaseDambreak.

Post-processing:

PartVTK.exe -dirin CaseDambreak_out -savevtk CaseDambreak_out/PartFluid -onlytype:-all,+fluid

Different instants of the CaseDambreak simulation are depicted in Figure 11-3, where the PartFluid_xxxx.vtk files of the fluid particles are represented.

Figure 11-3. Instants of the CaseDambreak simulation. Colour represents the velocity of the particles.

Post-processing:

MeasureTool.exe -dirin CaseDambreak_out -points PointsVelocity.txt -onlytype:-all,+fluid -vars:-all,+vel -savevtk CaseDambreak_out/PointsVelocity -savecsv CaseDambreak_out/PointsVelocity

The file PointsVelocity.txt contains the list of points where the numerical velocity is computed using the post-processing codes (a sketch of this file is shown below). Eight positions from (0.754, 0.31, 0.0) m to (0.754, 0.31, 0.16) m with dz = 0.02 m are used as control points. The generated PointsVelocity_xxxx.vtk files can be visualised to study the velocity behaviour along the column, as Figure 11-4 shows.

Figure 11-4. Numerical velocity computed at the given points.

On the other hand, the obtained PointsVelocity_Vel.csv collects all the numerical values of the velocity in the three directions. The x-component of the velocity at the different heights is depicted in Figure 11-5.

Figure 11-5. Numerical velocity at the different points: Vx (m/s) versus t (s) at z = 0, 0.02, 0.04 m, etc.
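A sketch of what PointsVelocity.txt may contain, with one x y z triplet per line as required by -points (the exact list of points must of course match the case set-up; note that MeasureTool can also generate a template with -pointstemplate):

  0.754 0.31 0.00
  0.754 0.31 0.02
  0.754 0.31 0.04
  0.754 0.31 0.06
  0.754 0.31 0.08
  0.754 0.31 0.10
  0.754 0.31 0.12
  0.754 0.31 0.14
  0.754 0.31 0.16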

11.2 CASEWAVEMAKER

CaseWavemaker simulates several waves breaking on a numerical beach. A wavemaker is used to generate and propagate the waves. In this testcase, a sinusoidal movement is imposed on the boundary particles of the wavemaker. The CaseWavemaker.bat file describes the execution tasks to be carried out in this directory:

CaseWavemaker.bat:
  GenCase.exe CaseWavemaker_Def CaseWavemaker_out/CaseWavemaker
  DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -gpu
  BoundaryVTK.exe -loadvtk CaseWavemaker_out/CaseWavemaker_Real.vtk -filexml CaseWavemaker_out/CaseWavemaker.xml -motiondatatime CaseWavemaker_out -savevtkdata CaseWavemaker_out/MotionPiston -onlymk:21 -savevtkdata CaseWavemaker_out/Box.vtk -onlymk:11
  PartVTK.exe -dirin CaseWavemaker_out -savevtk CaseWavemaker_out/PartFluid -onlytype:-all,+fluid
  IsoSurface.exe -dirin CaseWavemaker_out -saveiso CaseWavemaker_out/Surface -onlytype:-all,+fluid

Pre-processing:

GenCase.exe CaseWavemaker_Def CaseWavemaker_out/CaseWavemaker

The CaseWavemaker.xml is now described in detail. Different constants are defined:
- lattice: staggered grid (2) for the boundaries and cubic grid (1) for the fluid particles
- gravity: gravitational acceleration
- cflnumber: coefficient in the Courant condition
- hswl: maximum still water level, calculated automatically when set to TRUE
- coefsound: coefficient needed to compute the speed of sound
- coefficient: coefficient needed to compute the smoothing length
(all these constants are also defined in SPHysics)
- particles are created in order of increasing x-direction, then y-direction and finally z-direction; this can be changed when plotting the particle id
- dp: distance between particles; when changing this parameter, the total number of particles is modified
- the x, y and z values are used to define the limits of the domain

Volume of fluid:
- setmkfluid mk=0; "full" plots particles within the volume limits and on the faces
- drawprism plots a figure that mimics a beach
- setmkvoid removes particles to define the maximum water level at z = 0.75 m, since all particles above this level are removed

Piston wavemaker:
- setmkbound mk=10; "face" plots particles only on the sides of the defined box that forms the wavemaker

Boundary tank:
- setmkbound mk=0; drawprism plots the beach
- "face" plots only the sides, except the face with mask 96
- boundary particles replace the fluid ones on the sides of the beach
(this creates CaseWavemaker_Dp.vtk and CaseWavemaker_Real.vtk)

Piston movement:
- the particles associated with mk=10 follow a sinusoidal movement (a sketch of this motion definition is given below)
- the movement is the combination of three different movements with different frequencies, amplitudes and phases
- the duration and the order of the movements are also indicated

Parameters of configuration.

Final number of particles:
- np = total number of particles
- nb = boundary particles
- nbf = fixed boundary particles
- nb-nbf = moving boundary particles
together with the final mk of the objects, a summary of the computed constants and a summary of the movement.
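A minimal sketch of how such a sinusoidal piston movement may be declared in the XML follows (the element and attribute names follow the DualSPHysics motion syntax only indicatively; durations, frequencies, amplitudes and phases are placeholders, and only two of the three chained movements are shown):

  <motion>
    <objreal ref="10">
      <begin mov="1" start="0" />
      <!-- first sinusoidal movement; "next" chains to the second one -->
      <mvrectsinu id="1" duration="5" next="2">
        <freq x="0.7" />   <!-- frequency -->
        <ampl x="0.1" />   <!-- amplitude -->
        <phase x="0" />    <!-- phase -->
      </mvrectsinu>
      <mvrectsinu id="2" duration="5">
        <freq x="0.9" />
        <ampl x="0.05" />
        <phase x="1.57" />
      </mvrectsinu>
    </objreal>
  </motion>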

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -cpu
DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -gpu

Table 11-4 summarises the different computational times after running the test case on CPU (single-core and 8 threads) and GPU:

                    np     CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
FastCaseWavemaker   20,
CaseWavemaker       202,

Table 11-4. Total runtimes (mins) and CPU/GPU performance for CaseWavemaker.

Post-processing:

BoundaryVTK.exe -loadvtk CaseWavemaker_out/CaseWavemaker_Real.vtk -filexml CaseWavemaker_out/CaseWavemaker.xml -motiondatatime CaseWavemaker_out -savevtkdata CaseWavemaker_out/MotionPiston -onlymk:21 -savevtkdata CaseWavemaker_out/Box.vtk -onlymk:11
IsoSurface.exe -dirin CaseWavemaker_out -saveiso CaseWavemaker_out/Surface -onlytype:-all,+fluid

The files MotionPiston_xxxx.vtk, with the shape of the wavemaker at different instants, and Surface_xxxx.vtk, with the surface representation, are depicted in Figure 11-6:

Figure 11-6. Different instants of the CaseWavemaker simulation. Colour represents velocity.

11.3 CASEREALFACE

The third test case combines the feature of defining an initial movement for the fluid particles with importing a real geometry from a 3D Studio file (face.vtk).

CaseRealface.bat:
  GenCase.exe CaseRealface_Def CaseRealface_out/CaseRealface
  DualSPHysics.exe CaseRealface_out/CaseRealface -svres -gpu
  PartVTK.exe -dirin CaseRealface_out -savevtk CaseRealface_out/PartFluid -onlytype:-all,+fluid
  MeasureTool.exe -dirin CaseRealface_out -points PointsPressure.txt -onlytype:-all,+bound -vars:-all,+press -savevtk CaseRealface_out/PointsPressure -savecsv CaseRealface_out/PointsPressure

Pre-processing:

GenCase.exe CaseRealface_Def CaseRealface_out/CaseRealface

The CaseRealface.xml is described here in detail. Different constants are defined:
- lattice: cubic grid used to locate the particles
- gravity: gravitational acceleration
- cflnumber: coefficient in the Courant condition
- hswl: maximum still water level, calculated automatically when set to TRUE
- coefsound: coefficient needed to compute the speed of sound
- coefficient: coefficient needed to compute the smoothing length
(all these constants are also defined in SPHysics)
- particles are created in order of increasing x-direction, then y-direction and finally z-direction; this can be changed when plotting the particle id
- dp: distance between particles; when changing this parameter, the total number of particles is modified
- the x, y and z values are used to define the limits of the domain

Real geometry:
- face.vtk is loaded and converted to boundaries with mk=0
- the eyes of the face are created as boundary particles to prevent particle penetration through the eye holes: firstly, two spheres are created using boundary particles, and then two new spheres with a displaced centre are created using void to remove particles
- the 3D matrix used to locate particles is rotated here to create the next object with a different orientation

Volume of fluid:
- setmkfluid mk=0; "full" plots particles within the volume limits and on the faces
- drawellipsoid plots an ellipsoid of fluid particles
- two boundary points fix the limits of the domain
(this creates CaseRealface_Dp.vtk and CaseRealface_Real.vtk)

The fluid particles associated with mk=0 will move with an initial velocity (a sketch of this declaration is given below).
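A minimal sketch of how such an initial velocity may be assigned in the XML (indicative syntax; the velocity components are placeholders):

  <initials>
    <velocity mkfluid="0" x="0" y="0" z="-0.5" />  <!-- initial velocity of the fluid with mk=0 -->
  </initials>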

Parameters of configuration.

Final number of particles:
- np = total number of particles
- nb = boundary particles
- nbf = fixed boundary particles
together with the final mk of the objects and a summary of the computed constants.

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CaseRealface_out/CaseRealface -svres -cpu
DualSPHysics.exe CaseRealface_out/CaseRealface -svres -gpu

Table 11-5 summarises the different computational times after running the test case on CPU (single-core and 8 threads) and GPU:

                   np       CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
FastCaseRealface   102,
CaseRealface       1,010,

Table 11-5. Total runtimes (mins) and CPU/GPU performance for CaseRealface.

Post-processing:

PartVTK.exe -dirin CaseRealface_out -savevtk CaseRealface_out/PartFluid -onlytype:-all,+fluid

In Figure 11-7, face.vtk and the PartFluid_xxxx.vtk files of the fluid particles are represented at different instants of the simulation.

Figure 11-7. Different instants of the CaseRealface simulation. Colour represents the density of the particles.

Post-processing:

MeasureTool.exe -dirin CaseRealface_out -points PointsPressure.txt -onlytype:-all,+bound -vars:-all,+press -savevtk CaseRealface_out/PointsPressure -savecsv CaseRealface_out/PointsPressure

The position of the numerical point used to compute the numerical pressure, starting from the pressures of the boundary particles, is provided in the file PointsPressure.txt. The numerical position is represented in the left panel of Figure 11-8, and the time history of the numerical pressure values is depicted in the right panel of the figure.

Figure 11-8. Numerical pressure (Pa) versus time (s) stored in PointsPressure.csv.

11.4 CASEPUMP

The last 3D testcase loads an external model of a pump with a fixed part (pump_fixed.vtk) and a moving part (pump_moving.vtk). The moving part describes a rotational movement and the reservoir is pre-filled with fluid particles.

CasePump.bat:
  GenCase.exe CasePump_Def CasePump_out/CasePump
  DualSPHysics.exe CasePump_out/CasePump -svres -gpu
  BoundaryVTK.exe -loadvtk CasePump_out/CasePump_Real.vtk -filexml CasePump_out/CasePump.xml -motiondatatime CasePump_out -savevtkdata CasePump_out/MotionPump -onlymk:13 -savevtkdata CasePump_out/Pumpfixed.vtk -onlymk:11
  PartVTK.exe -dirin CasePump_out -savevtk CasePump_out/PartFluid -onlytype:-all,+fluid

Pre-processing:

GenCase.exe CasePump_Def CasePump_out/CasePump

The CasePump.xml is described here in detail. Different constants are defined:
- lattice: cubic grid used to locate the particles
- gravity: gravitational acceleration
- cflnumber: coefficient in the Courant condition
- hswl: maximum still water level, calculated automatically when set to TRUE
- coefsound: coefficient needed to compute the speed of sound
- coefficient: coefficient needed to compute the smoothing length
(all these constants are also defined in SPHysics)
- dp: distance between particles; when changing this parameter, the total number of particles is modified
- the x, y and z values are used to define the limits of the domain

Real geometry:
- pump_fixed.vtk is loaded and converted to boundaries with mk=0
- pump_moving.vtk is loaded and converted to boundaries with mk=2
- a region is filled with fluid: fluid particles are created wherever the points still have type void, starting from a seed point and within the limits of a box (see the sketch below)
(this creates CasePump_Dp.vtk and CasePump_Real.vtk)
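A minimal sketch of such a fill command in the XML (indicative syntax; the seed point and box limits are placeholders):

  <fillbox x="0.5" y="0.5" z="0.5">    <!-- seed point inside the region to fill -->
    <modefill>void</modefill>          <!-- create particles only where points are still void -->
    <point x="0.0" y="0.0" z="0.0" />  <!-- corner of the limiting box -->
    <size x="1.0" y="1.0" z="1.0" />   <!-- size of the limiting box -->
  </fillbox>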

Pump movement:
- the particles associated with mk=2 move with an accelerated rotational movement during 2 seconds, followed by a uniform rotation (a sketch of this motion definition is given at the end of this subsection)

Parameters of configuration, and a summary of the computed constants.

Final number of particles:
- np = total number of particles
- nb = boundary particles
- nbf = fixed boundary particles
- nb-nbf = moving boundary particles
together with the final mk of the objects and a summary of the movement.

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CasePump_out/CasePump -svres -cpu
DualSPHysics.exe CasePump_out/CasePump -svres -gpu

Table 11-6 summarises the different computational times after running the test case on CPU (single-core and 8 threads) and GPU:

           np          CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
CasePump   1,009,607   >27 days                                 >68

Table 11-6. Total runtimes (mins) and CPU/GPU performance for CasePump.
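Returning to the pump movement declared above, a minimal sketch of an accelerated rotation chained to a uniform one follows (indicative syntax only; the rotation axis, angular values and durations are placeholders):

  <motion>
    <objreal ref="2">
      <begin mov="1" start="0" />
      <!-- accelerated rotation during 2 s, then chained to a uniform rotation -->
      <mvrotace id="1" duration="2" next="2">
        <ace ang="180" />             <!-- angular acceleration -->
        <velini ang="0" />            <!-- initial angular velocity -->
        <axisp1 x="0" y="0" z="0" />  <!-- two points defining the rotation axis -->
        <axisp2 x="0" y="0" z="1" />
      </mvrotace>
      <mvrot id="2" duration="10">
        <vel ang="360" />             <!-- constant angular velocity -->
        <axisp1 x="0" y="0" z="0" />
        <axisp2 x="0" y="0" z="1" />
      </mvrot>
    </objreal>
  </motion>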

Post-processing:

BoundaryVTK.exe -loadvtk CasePump_out/CasePump_Real.vtk -filexml CasePump_out/CasePump.xml -motiondatatime CasePump_out -savevtkdata CasePump_out/MotionPump -onlymk:13 -savevtkdata CasePump_out/Pumpfixed.vtk -onlymk:11
PartVTK.exe -dirin CasePump_out -savevtk CasePump_out/PartFluid -onlytype:-all,+fluid

In Figure 11-9, the fixed Pumpfixed.vtk, the different MotionPump.vtk files and the PartFluid_xxxx.vtk files of the fluid particles are represented at different instants of the simulation.

Figure 11-9. Different instants of the CasePump simulation with 2.2M particles. Colour represents the velocity of the particles.

11.5 CASEDAMBREAK2D

CaseDambreak2D is a 2D dam-break flow evolving inside a numerical tank. The CaseDambreak2D.bat file summarises the tasks to be carried out using the different codes in a given order:

CaseDambreak2D.bat:
  GenCase.exe CaseDambreak2D_Def CaseDambreak2D_out/CaseDambreak2D
  DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -gpu
  PartVTK.exe -dirin CaseDambreak2D_out -savevtk CaseDambreak2D_out/PartFluid -onlytype:-all,+fluid -vars:+id,+vel,+rhop,+press,+vor

Pre-processing:

GenCase.exe CaseDambreak2D_Def CaseDambreak2D_out/CaseDambreak2D

The CaseDambreak2D.xml is described in detail:

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -cpu
DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -gpu

Table 11-7 summarises the different computational times:

                 np    CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
CaseDambreak2D   20,

Table 11-7. Total runtimes (mins) and CPU/GPU performance for CaseDambreak2D.

Post-processing:

PartVTK.exe -dirin CaseDambreak2D_out -savevtk CaseDambreak2D_out/PartFluid -onlytype:-all,+fluid -vars:+id,+vel,+rhop,+press,+vor

Different instants of the CaseDambreak2D simulation are depicted in Figure 11-10, where the PartFluid_xxxx.vtk files of the fluid particles are represented:

Figure 11-10. Different instants of the CaseDambreak2D simulation. For each time instant, the top panel represents velocity and the lower panel shows the particle vorticity.

11.6 CASEWAVEMAKER2D

CaseWavemaker2D simulates breaking waves on a numerical beach. The CaseWavemaker2D.bat file summarises the tasks to be carried out using the different codes in a given order:

CaseWavemaker2D.bat:
  GenCase.exe CaseWavemaker2D_Def CaseWavemaker2D_out/CaseWavemaker2D
  DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -gpu
  PartVTK.exe -dirin CaseWavemaker2D_out -savevtk CaseWavemaker2D_out/PartAll -onlytype:+all,-fluidout
  MeasureTool.exe -dirin CaseWavemaker2D_out -points PointsHeights.txt -onlytype:-all,+fluid -height -savevtk CaseWavemaker2D_out/PointsHeight -savecsv CaseWavemaker2D_out/PointsHeight

Pre-processing:

GenCase.exe CaseWavemaker2D_Def CaseWavemaker2D_out/CaseWavemaker2D

The CaseWavemaker2D.xml is described in detail:

Processing (the choice depends on the device, CPU or GPU):

DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -cpu
DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -gpu

Table 11-8 summarises the different computational times:

                  np    CPU single-core   CPU multi-core   GPU   Speed-up (CPU vs. GPU)
CaseWavemaker2D   19,

Table 11-8. Total runtimes (mins) and CPU/GPU performance for CaseWavemaker2D.

Post-processing:

PartVTK.exe -dirin CaseWavemaker2D_out -savevtk CaseWavemaker2D_out/PartAll -onlytype:+all,-fluidout
MeasureTool.exe -dirin CaseWavemaker2D_out -points PointsHeights.txt -onlytype:-all,+fluid -height -savevtk CaseWavemaker2D_out/PointsHeight -savecsv CaseWavemaker2D_out/PointsHeight

Different instants of the CaseWavemaker2D simulation are depicted in Figure 11-11, where the PartAll_xxxx.vtk and PointsHeights_h_xxxx.vtk files are represented:

Figure 11-11. Different instants of the CaseWavemaker2D simulation. For each time instant, the top panel represents the particle density and the lower panel displays the maximum wave height.

11.7 CASEFLOATING

Finally, a new case named CaseFloating is also provided in the directory RUN_DIRECTORY. In this case, a floating box moves in waves generated by a piston. This example has been added because many users study problems that involve fluid-structure interaction. Floating objects in DualSPHysics are treated as rigid bodies, and the evolution of the boundary particles of the fluid-driven objects is obtained from the equations described in the SPH literature (see [Monaghan et al. 2003]). However, this basic formulation does not reproduce the correct sliding movement of rigid bodies when interacting with other boundaries, and frictional forces must be introduced (see [Rogers et al. 2010]). A sketch of how a floating body may be declared in the case XML is given at the end of this subsection.

The next section explains how to modify the DualSPHysics code to implement your own applications by adding new features.

Figure 11-12 shows different instants of the simulation of this test case. The floating object is represented using VTK files generated with the provided BAT file, which can help users with their own post-processing.

Figure 11-12. Different instants of the CaseFloating simulation.
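A minimal sketch of such a floating-body declaration (indicative syntax; the mkbound value and the mass are placeholders):

  <floatings>
    <floating mkbound="1">      <!-- boundary object that becomes a rigid floating body -->
      <massbody value="2.0" />  <!-- total mass of the body -->
    </floating>
  </floatings>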


CHRONO::HPC DISTRIBUTED MEMORY FLUID-SOLID INTERACTION SIMULATIONS. Felipe Gutierrez, Arman Pazouki, and Dan Negrut University of Wisconsin Madison CHRONO::HPC DISTRIBUTED MEMORY FLUID-SOLID INTERACTION SIMULATIONS Felipe Gutierrez, Arman Pazouki, and Dan Negrut University of Wisconsin Madison Support: Rapid Innovation Fund, U.S. Army TARDEC ASME

More information

Simulation in Computer Graphics. Particles. Matthias Teschner. Computer Science Department University of Freiburg

Simulation in Computer Graphics. Particles. Matthias Teschner. Computer Science Department University of Freiburg Simulation in Computer Graphics Particles Matthias Teschner Computer Science Department University of Freiburg Outline introduction particle motion finite differences system of first order ODEs second

More information

SOLIDWORKS Flow Simulation Options

SOLIDWORKS Flow Simulation Options SOLIDWORKS Flow Simulation Options SOLIDWORKS Flow Simulation includes an options dialogue window that allows for defining default options to use for a new project. Some of the options included are unit

More information

FEMLAB Exercise 1 for ChE366

FEMLAB Exercise 1 for ChE366 FEMLAB Exercise 1 for ChE366 Problem statement Consider a spherical particle of radius r s moving with constant velocity U in an infinitely long cylinder of radius R that contains a Newtonian fluid. Let

More information

Post-processing utilities in Elmer

Post-processing utilities in Elmer Post-processing utilities in Elmer Peter Råback ElmerTeam CSC IT Center for Science PATC course on parallel workflows Stockholm, 4-6.12.2013 Alternative postprocessors for Elmer Open source ElmerPost Postprocessor

More information

Using the Eulerian Multiphase Model for Granular Flow

Using the Eulerian Multiphase Model for Granular Flow Tutorial 21. Using the Eulerian Multiphase Model for Granular Flow Introduction Mixing tanks are used to maintain solid particles or droplets of heavy fluids in suspension. Mixing may be required to enhance

More information

Introduction to CFX. Workshop 2. Transonic Flow Over a NACA 0012 Airfoil. WS2-1. ANSYS, Inc. Proprietary 2009 ANSYS, Inc. All rights reserved.

Introduction to CFX. Workshop 2. Transonic Flow Over a NACA 0012 Airfoil. WS2-1. ANSYS, Inc. Proprietary 2009 ANSYS, Inc. All rights reserved. Workshop 2 Transonic Flow Over a NACA 0012 Airfoil. Introduction to CFX WS2-1 Goals The purpose of this tutorial is to introduce the user to modelling flow in high speed external aerodynamic applications.

More information

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh

More information

GPU Tutorial 2: Integrating GPU Computing into a Project

GPU Tutorial 2: Integrating GPU Computing into a Project GPU Tutorial 2: Integrating GPU Computing into a Project Summary This tutorial extends the concept of GPU computation, introduced in the previous tutorial. We consider the specifics of integrating GPU

More information

LS-DYNA 980 : Recent Developments, Application Areas and Validation Process of the Incompressible fluid solver (ICFD) in LS-DYNA.

LS-DYNA 980 : Recent Developments, Application Areas and Validation Process of the Incompressible fluid solver (ICFD) in LS-DYNA. 12 th International LS-DYNA Users Conference FSI/ALE(1) LS-DYNA 980 : Recent Developments, Application Areas and Validation Process of the Incompressible fluid solver (ICFD) in LS-DYNA Part 1 Facundo Del

More information

Handling Parallelisation in OpenFOAM

Handling Parallelisation in OpenFOAM Handling Parallelisation in OpenFOAM Hrvoje Jasak hrvoje.jasak@fsb.hr Faculty of Mechanical Engineering and Naval Architecture University of Zagreb, Croatia Handling Parallelisation in OpenFOAM p. 1 Parallelisation

More information

Physically-Based Modeling and Animation. University of Missouri at Columbia

Physically-Based Modeling and Animation. University of Missouri at Columbia Overview of Geometric Modeling Overview 3D Shape Primitives: Points Vertices. Curves Lines, polylines, curves. Surfaces Triangle meshes, splines, subdivision surfaces, implicit surfaces, particles. Solids

More information

9. Three Dimensional Object Representations

9. Three Dimensional Object Representations 9. Three Dimensional Object Representations Methods: Polygon and Quadric surfaces: For simple Euclidean objects Spline surfaces and construction: For curved surfaces Procedural methods: Eg. Fractals, Particle

More information

ECE421: Electronics for Instrumentation

ECE421: Electronics for Instrumentation ECE421: Electronics for Instrumentation Lecture #8: Introduction to FEA & ANSYS Mostafa Soliman, Ph.D. March 23 rd 2015 Mostafa Soliman, Ph.D. 1 Outline Introduction to Finite Element Analysis Introduction

More information

Using GPUs. Visualization of Complex Functions. September 26, 2012 Khaldoon Ghanem German Research School for Simulation Sciences

Using GPUs. Visualization of Complex Functions. September 26, 2012 Khaldoon Ghanem German Research School for Simulation Sciences Visualization of Complex Functions Using GPUs September 26, 2012 Khaldoon Ghanem German Research School for Simulation Sciences Outline GPU in a Nutshell Fractals - A Simple Fragment Shader Domain Coloring

More information

Robust Simulation of Sparsely Sampled Thin Features in SPH-Based Free Surface Flows

Robust Simulation of Sparsely Sampled Thin Features in SPH-Based Free Surface Flows Copyright of figures and other materials in the paper belong to original authors. Robust Simulation of Sparsely Sampled Thin Features in SPH-Based Free Surface Flows Xiaowei He et al. ACM SIGGRAPH 2015

More information

Optical Flow Estimation with CUDA. Mikhail Smirnov

Optical Flow Estimation with CUDA. Mikhail Smirnov Optical Flow Estimation with CUDA Mikhail Smirnov msmirnov@nvidia.com Document Change History Version Date Responsible Reason for Change Mikhail Smirnov Initial release Abstract Optical flow is the apparent

More information

LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS

LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS 14 th European Conference on Mixing Warszawa, 10-13 September 2012 LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS Felix Muggli a, Laurent Chatagny a, Jonas Lätt b a Sulzer Markets & Technology

More information

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A Comparative Study on Exact Triangle Counting Algorithms on the GPU A Comparative Study on Exact Triangle Counting Algorithms on the GPU Leyuan Wang, Yangzihao Wang, Carl Yang, John D. Owens University of California, Davis, CA, USA 31 st May 2016 L. Wang, Y. Wang, C. Yang,

More information

CFD Modelling in the Cement Industry

CFD Modelling in the Cement Industry CFD Modelling in the Cement Industry Victor J. Turnell, P.E., Turnell Corp., USA, describes computational fluid dynamics (CFD) simulation and its benefits in applications in the cement industry. Introduction

More information

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0)

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0) Acceleration Techniques V1.2 Anthony Steed Based on slides from Celine Loscos (v1.0) Goals Although processor can now deal with many polygons (millions), the size of the models for application keeps on

More information

Users Guide for DualSPHysics code

Users Guide for DualSPHysics code Users Guide for DualSPHysics code DualSPHysics v4.0 April 2016 dualsphysics@gmail.com 1 2 Abstract This guide documents the DualSPHysics code based on the Smoothed Particle Hydrodynamics model named SPHysics.

More information

OzenCloud Case Studies

OzenCloud Case Studies OzenCloud Case Studies Case Studies, April 20, 2015 ANSYS in the Cloud Case Studies: Aerodynamics & fluttering study on an aircraft wing using fluid structure interaction 1 Powered by UberCloud http://www.theubercloud.com

More information

FOUR WHAT S NEW IN THIS VERSION? 4.1 FLOW-3D Usability CHAPTER

FOUR WHAT S NEW IN THIS VERSION? 4.1 FLOW-3D Usability CHAPTER CHAPTER FOUR WHAT S NEW IN THIS VERSION? FLOW-3D v11.2.0 continues to streamline engineers simulation workflows by enabling them to more quickly set up simulations, avoid common errors, identify and enter

More information

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123 2.7 Cloth Animation 320491: Advanced Graphics - Chapter 2 123 Example: Cloth draping Image Michael Kass 320491: Advanced Graphics - Chapter 2 124 Cloth using mass-spring model Network of masses and springs

More information

Code Saturne on POWER8 clusters: First Investigations

Code Saturne on POWER8 clusters: First Investigations Code Saturne on POWER8 clusters: First Investigations C. MOULINEC, V. SZEREMI, D.R. EMERSON (STFC Daresbury Lab., UK) Y. FOURNIER (EDF R&D, FR) P. VEZOLLE, L. ENAULT (IBM Montpellier, FR) B. ANLAUF, M.

More information

Transfer and pouring processes of casting by smoothed particle. hydrodynamic method

Transfer and pouring processes of casting by smoothed particle. hydrodynamic method Transfer and pouring processes of casting by smoothed particle hydrodynamic method M. Kazama¹, K. Ogasawara¹, *T. Suwa¹, H. Ito 2, and Y. Maeda 2 1 Application development div., Next generation technical

More information

Vector Image Polygon (VIP) limiters in ALE Hydrodynamics

Vector Image Polygon (VIP) limiters in ALE Hydrodynamics New Models and Hydrocodes, Paris, France,24-28 May 2010 Vector Image Polygon (VIP) limiters in ALE Hydrodynamics 6 4 8 3 5 2 79 1 2 1 1 1 Gabi Luttwak 1 and Joseph Falcovitz 2 1 Rafael, P.O. Box 2250,

More information

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation

More information

OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection. Anmol Paudel Satish Puri Marquette University Milwaukee, WI

OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection. Anmol Paudel Satish Puri Marquette University Milwaukee, WI OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection Anmol Paudel Satish Puri Marquette University Milwaukee, WI Introduction Scalable spatial computation on high performance

More information

Automated Finite Element Computations in the FEniCS Framework using GPUs

Automated Finite Element Computations in the FEniCS Framework using GPUs Automated Finite Element Computations in the FEniCS Framework using GPUs Florian Rathgeber (f.rathgeber10@imperial.ac.uk) Advanced Modelling and Computation Group (AMCG) Department of Earth Science & Engineering

More information

Particle-Based Fluid Simulation. CSE169: Computer Animation Steve Rotenberg UCSD, Spring 2016

Particle-Based Fluid Simulation. CSE169: Computer Animation Steve Rotenberg UCSD, Spring 2016 Particle-Based Fluid Simulation CSE169: Computer Animation Steve Rotenberg UCSD, Spring 2016 Del Operations Del: = x Gradient: s = s x y s y z s z Divergence: v = v x + v y + v z x y z Curl: v = v z v

More information

Tutorial 1: Welded Frame - Problem Description

Tutorial 1: Welded Frame - Problem Description Tutorial 1: Welded Frame - Problem Description Introduction In this first tutorial, we will analyse a simple frame: firstly as a welded frame, and secondly as a pin jointed truss. In each case, we will

More information

OpenACC programming for GPGPUs: Rotor wake simulation

OpenACC programming for GPGPUs: Rotor wake simulation DLR.de Chart 1 OpenACC programming for GPGPUs: Rotor wake simulation Melven Röhrig-Zöllner, Achim Basermann Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU) GPU computing

More information