
Helsinki University of Technology
Laboratory of Applied Thermodynamics

Parallelization of a Multi-block Flow Solver

Patrik Rautaheimo¹, Esa Salminen² and Timo Siikonen³
Helsinki University of Technology, Espoo, Finland

Report No. 98
January 30, 1997
Otaniemi

ISBN    ISSN

¹ Research Scientist, Laboratory of Applied Thermodynamics
² Research Scientist, Laboratory of Applied Thermodynamics
³ Associate Professor, Laboratory of Applied Thermodynamics

Abstract

The parallelization of a Navier-Stokes solver is presented. The flow solver is based on a multi-block structured grid. The parallelization is performed over the blocks, and the data on the block boundaries is exchanged using the MPI Standard. The parallelized code can also be run in a single-processor mode or on a shared-memory machine. In order to facilitate the pre- and post-processing, a separate program for the domain decomposition has been written. The first tests indicate good scalability of the parallelization approach.

Contents

Nomenclature
1 Introduction
2 Basic Features of the Flow Solver
  2.1 Numerical method
  2.2 Treatment of boundary conditions
  2.3 Computation and data arrangement
3 Parallelization
  3.1 Main principles
  3.2 Parallelization using MPI
  3.3 Communication during iteration
4 Automatic Grid Block Splitting
  4.1 Main principles
  4.2 Grid splitting
  4.3 Redefinition of the boundary conditions
  4.4 Sample case
5 Test Runs
  5.1 Scaling
  5.2 Blocking
  5.3 Single-processor performance
6 Conclusions

Nomenclature

T          total time of the calculation
F, G, H    flux vectors in the x-, y- and z-directions
\hat{F}    flux in a given direction in space
M          Mach number
N          number of processes
n          number of grid points on one edge
Q          source term
Re         Reynolds number
U          vector of the conservative variables
c          total communication per total calculation time (= T_c/T)
T_c        total communication time
t          time
x, y, z    Cartesian coordinates
α          angle of attack
c_0        constant
η          efficiency of parallelization

Subscripts

i          i-index; summation index
v          viscous
x, y, z    coordinate directions
∞          free-stream value

1 Introduction

For over a decade, parallelization has been used to enhance the efficiency of flow solvers. The simplest method of parallelization, which can be used with shared-memory machines, takes place on the DO-loop level. DO-loop level parallelization is ineffective for a large number of processors. Better performance from a large number of processors can be obtained by dividing the space into smaller sub-domains. With a shared-memory machine like the Cray C94, the parallelization over the sub-domains is a trivial task, but with a massively parallel system like the Cray T3E things get more complicated. A common approach, applied e.g. in [1] and [2], is to divide the computational domain into equally sized blocks and to apply message passing between the blocks.

In this paper, the parallelization of a multi-block Navier-Stokes software is described. The parallelization is based on the Message Passing Interface (MPI) Standard [3]. The computational domain is divided into blocks and the block boundaries are updated using MPI. In order to obtain a balance between the processes, the blocks should be of equal size. However, the code can handle several (smaller) blocks in one process. This property can be utilized especially with small cases and with a small number of processors, when a good load balance is not so critical.

In addition to the changes in the flow solver, a separate preprocessor has been written to make the domain decomposition. This is because during the pre- and post-processing the grid and the results can be more easily handled in a few larger blocks. In the domain decomposition the most difficult task is to handle the definition of the boundary conditions automatically from the original boundary condition file.

In the following, the flow solver and the changes required for the parallelization are briefly described. Next, the principles of the domain decomposition are given. Test runs have been performed with the T3E and T3D machines, and with a cluster of SGI Indigo workstations.

2 Basic Features of the Flow Solver

2.1 Numerical method

The flow simulation is based on the solution of the Reynolds-averaged Navier-Stokes equations

\frac{\partial U}{\partial t} + \frac{\partial (F - F_v)}{\partial x} + \frac{\partial (G - G_v)}{\partial y} + \frac{\partial (H - H_v)}{\partial z} = Q    (2.1)

where U is the vector of dependent variables, F, G, H and F_v, G_v, H_v represent the inviscid and viscous parts of the fluxes, and Q is a possible source term. The flow solver utilizes a structured multi-block grid. For the solution, Eq. (2.1) is written in a finite-volume form

\frac{d}{dt}(V_i U_i) = -\sum_{\mathrm{faces}} S (\hat{F} - \hat{F}_v) + V_i Q_i    (2.2)

where V_i is a cell volume, and \hat{F} and \hat{F}_v are the inviscid and viscous parts of the flux on the cell surface. The sum is taken over the faces of the computational cell. The solution proceeds blockwise, with explicitly defined boundary conditions. The boundary conditions between the blocks are defined only on the highest grid level. In each block an implicit LU-factored solution with a multigrid acceleration of convergence is performed [4]. The underlying solution method is based on either flux-difference [5] or flux-vector [6] splitting. The flux calculation utilizes MUSCL-type differencing with second- or third-order accuracy. The code has been applied to external [7] and internal [8] flows. Turbulence is modelled either by an algebraic model or by a two-equation model. A Reynolds-stress model is under development and has been applied to simple test cases [9].

2.2 Treatment of boundary conditions

The boundary conditions are handled using two layers of ghost cells on the block boundaries to preserve second-order accuracy, as seen in Fig. 2.1. The list of boundary conditions is given in Table 2.1. For the connectivity, cyclic and sliding-mesh boundary conditions, information is exchanged between the blocks. Since the blocks can be connected in an arbitrary way, the block boundaries are divided into patches. A patch is formed by the common boundary shared by two adjacent blocks.

Fig. 2.1. The blocks are surrounded by two layers of ghost cells.

Fig. 2.2. Different boundary conditions can be applied on the patches of the block surfaces.

Thus the block surface can contain several patches with different boundary conditions, as shown in Fig. 2.2. The definition of the patches and the corresponding boundary condition types are given as input data in a specific boundary condition file. With a shared-memory machine the patch data is written into a boundary array after an iteration cycle. The receiving block reads the information from this array at a given position. This procedure is performed block by block: first, every block writes the values of the dependent variables on all its patches into the array, and then the data is substituted from the array into the ghost cells of the appropriate blocks. Only the central memory of the machine is utilized in this approach.

Table 2.1. Boundary condition types.

Boundary condition    Exchange of data
Connectivity          yes
External              no
Inlet                 no
Mirror                no
Outlet                no
Cyclic                yes
Singularity           no
Solid                 no
Rotating solid        no
Moving solid          no
Sliding mesh          yes

2.3 Computation and data arrangement

The computer code is programmed using standard Fortran 77. The variables are stored in one-dimensional arrays. Starting addresses are utilized in the calling routines to separate the data for different blocks and for different multigrid levels. The computation is performed using three kinds of DO-loops. For some variables, e.g. for the calculation of the equation of state, there is a single loop over the entire block including the ghost cells. In order to exclude the ghost cells, a three-level loop over the i-, j- and k-directions is utilized. Most of the calculation, including the evaluation of the fluxes and the implicit sweeps, is done slabwise as shown in Fig. 2.3. In this approach the ghost cells at the beginning and at the end of the row are included in the calculation, which increases the amount of calculation typically by a few per cent depending on the grid size. This treatment was originally designed for a vector computer, where the increased amount of computation is more than compensated by the enhanced efficiency owing to a longer vector. The treatment has been retained in the parallelization in order to maintain the original structure of the code and to facilitate portability between vector and RISC machines.

Fig. 2.3. Computation proceeds slabwise, including two parts of the ghost cell layers.
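The use of a single one-dimensional work array with explicit starting addresses can be illustrated with a small sketch. The code below is not the solver's own routine; it is a minimal free-form Fortran example with invented names (WORK, ISTART, NGL) of how a slice of a one-dimensional array is passed to a routine that sees it as a three-dimensional block surrounded by two layers of ghost cells.

    ! A minimal sketch of 1-D storage with starting addresses, as described
    ! above. All names and sizes are illustrative; the actual solver is Fortran 77.
    program block_addressing
       implicit none
       integer, parameter :: ngl  = 2                  ! ghost-cell layers per side
       integer, parameter :: imax = 32, jmax = 32, kmax = 32
       integer, parameter :: itot = imax + 2*ngl
       integer, parameter :: jtot = jmax + 2*ngl
       integer, parameter :: ktot = kmax + 2*ngl
       real    :: work(2*itot*jtot*ktot)               ! storage for two equal blocks
       integer :: istart(2)

       istart(1) = 1                                   ! starting address of block 1
       istart(2) = istart(1) + itot*jtot*ktot          ! starting address of block 2

       call init_block(work(istart(1)), itot, jtot, ktot)
       call init_block(work(istart(2)), itot, jtot, ktot)

    contains

       subroutine init_block(u, it, jt, kt)
          ! The called routine sees its slice of WORK as a 3-D block
          ! that includes the ghost cells.
          integer, intent(in)    :: it, jt, kt
          real,    intent(inout) :: u(it, jt, kt)
          u = 0.0
       end subroutine init_block

    end program block_addressing

In the same spirit, each block and each multigrid level simply gets its own starting address, so the computational routines do not need to know how many blocks a process holds.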

3 Parallelization

3.1 Main principles

The code structure forms an ideal base for the parallelization. All the essential procedures are treated block by block, including the updating of the boundary conditions. The block sizes are chosen so that several multigrid levels can be used in each block. If all the blocks are of an equal size, the work between the processors is balanced. With the current RISC processors and such block sizes, the calculation takes of the order of 10 seconds per iteration cycle. Since the majority of the calculation takes place slabwise, it is impractical to use very small block sizes owing to the useless computation of the ghost cells. Even more important is to obtain a suitable balance between the times spent on the computation and the communication, which with current fast processors requires that the blocks have a sufficient size.

There are some general requirements for the parallelization. Firstly, we wish that the same software can be used in a single-processor mode or with a shared-memory multiprocessor machine like the C94. In practice, it is difficult to always use the specified block sizes. Because of this, the possibility to compute several blocks per processor has to be retained. This property is important especially in small cases, where a good load balance is not so critical and which can be calculated using a few processors. In large cases with complicated geometries, the division into equally sized blocks may also be difficult or impractical. Then small blocks can be situated in the same processor or, if the number of small blocks is small in comparison with the standard equally sized blocks, the idle time of the processors occupied by the small blocks does not decrease the overall efficiency significantly.

3.2 Parallelization using MPI

The parallelization is based on the Message Passing Interface (MPI) Standard [3]. The MPI routines are implemented in such a way that the program also runs in an environment where MPI is not available. The updating of the boundaries between different processes is done using the basic MPI_SSEND and MPI_RECV commands instead of using the array for the boundary data, as in the case of a shared-memory calculation. Also MPI_BCAST and MPI_GATHER are used to give the input parameters to the processes and to gather the convergence histories.

Fig. 3.1. A flow diagram of the parallelized code. Communication between the processors is depicted by horizontal dashed lines.

The computational cycle is described in Fig. 3.1. One processing element (PE 0) is the master and the others are slaves. The parallelization is done so that only the master process reads the input, but all the processes write the output files. The master process reads the input parameters, including the files where the boundary conditions are specified and the grid is defined, and sends the desired input parameters and the appropriate parts of the grid to the slaves. After every iteration cycle the slave processes send the convergence parameters (global residuals etc.) to the master. The convergence parameters are printed on the screen and stored in a convergence-monitoring file. This is accomplished using the MPI_GATHER command. Because the processes are highly independent of each other, the memory requirement per process comes from the size of the block(s) that the process simulates. Since the possibility to calculate a different number of differently sized blocks has been retained, a dynamic memory allocation is performed in each process separately. Inside a process the communication can be done by using the central memory of the machine (default), or the MPI subroutines can be utilized for possible debugging.
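As a rough illustration of the master/slave pattern described above, the sketch below broadcasts one input parameter from the master process and gathers one convergence value per process back to it. It is written in free-form Fortran for readability (the solver itself is Fortran 77), and the variable names are invented; only the MPI calls correspond to those mentioned in the text.

    ! Sketch of the master/slave input broadcast and residual gathering.
    ! Illustrative only; the variable names are not those of the actual code.
    program master_slave_sketch
       implicit none
       include 'mpif.h'
       integer :: ierr, rank, nprocs
       integer :: niter                       ! an input parameter known to the master
       real(8) :: resl                        ! local residual of this process
       real(8), allocatable :: resg(:)        ! residuals gathered on the master

       call mpi_init(ierr)
       call mpi_comm_rank(mpi_comm_world, rank, ierr)
       call mpi_comm_size(mpi_comm_world, nprocs, ierr)

       if (rank == 0) niter = 50              ! the master "reads" the input
       call mpi_bcast(niter, 1, mpi_integer, 0, mpi_comm_world, ierr)

       resl = 1.0d0/dble(rank + 1)            ! stand-in for a computed residual
       allocate(resg(nprocs))
       call mpi_gather(resl, 1, mpi_double_precision, resg, 1, &
                       mpi_double_precision, 0, mpi_comm_world, ierr)

       if (rank == 0) print *, 'iterations:', niter, '  residuals:', resg

       call mpi_finalize(ierr)
    end program master_slave_sketch

The boundary data itself is exchanged with MPI_SSEND and MPI_RECV, as described in the next section.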

3.3 Communication during iteration

At the beginning of the calculation the connective patches are gone through and the order of the communication is decided. The exchanged data is rearranged into vectors that are sent to the other processes, where the data is rearranged back into the desired order.

Fig. 3.2. An example of the parallel communication.

By deciding the order of the communication, the communication is also made parallel. This means that most of the processes can communicate at the same time. As an example, consider four processes that are connected to each other in the way presented in Fig. 3.2. If the order of the connections is from smaller to larger face numbers, where the faces are numbered from the left counter-clockwise, we need four different communication time levels: level 1, processes 0-1 (2 would also like to communicate with 0, and 3 with 2); level 2, 0-2 (1 would like to communicate with 3, and 3 with 2); level 3, 2-3 (0 is done, 1 would like to communicate with 3); and finally level 4, 1-3. In a more complicated case a deadlock situation could also happen, meaning that every process is waiting for some other one. Clearly a better way to do the communication is in two time levels: level 1, 0-1 and 2-3; level 2, 0-2 and 1-3. Then every process is doing work at the same time. When one process is sending the data, another one is receiving, and no buffer is needed. This is achieved by numbering the connections at the beginning of the computation in the master process. During the iteration all the processes communicate from the smallest number to the largest one. The numbering of the connections is done so that every process can have only one connection in one communication time level. Some future development could be made to the numbering, for example by giving high numbers to the processor with the highest work load, so that the others can complete their communication regardless of the situation in the most heavily loaded processor.

It was found that a synchronous send is faster than the standard one. Also a non-blocking receive was tested, but no advantages were found. The best MPI commands were found to be MPI_RECV and MPI_SSEND. For the performance testing, extra communication is subtracted from the calculation; this consists of an interruption subroutine and the collection of the global residuals as well as the global forces during the iteration. These do not have an effect on the calculation or on the final result. Due to the development history of the code, in the test runs with the T3D the communication between the boundaries is performed for each variable to be solved separately, instead of using a single pair of the MPI_SEND and MPI_RECV commands. These commands are performed in each block simultaneously using the standard communication mode of MPI. For the T3D the global residuals are also gathered. Thus the parallelization is not fully comparable with that of the T3E.
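The two-level schedule of the example above (0-1 and 2-3 first, then 0-2 and 1-3) can be sketched as follows. This is an illustrative free-form Fortran fragment for exactly four processes, not the solver's own routine: in each pair the lower-ranked partner sends first with MPI_SSEND and then receives, while the higher-ranked one does the opposite, so every synchronous send has a matching receive and no deadlock can occur.

    ! Sketch of the deadlock-free two-level pairwise exchange of Fig. 3.2.
    ! Run with exactly four processes; names and message sizes are illustrative.
    program pairwise_exchange
       implicit none
       include 'mpif.h'
       integer, parameter :: npts = 1000
       integer :: ierr, rank, nprocs, level, partner
       integer :: stat(mpi_status_size)
       ! sched(level, rank) = communication partner of 'rank' at 'level'
       integer :: sched(2, 0:3) = reshape([1, 2,  0, 3,  3, 0,  2, 1], [2, 4])
       real(8) :: sendbuf(npts), recvbuf(npts)

       call mpi_init(ierr)
       call mpi_comm_rank(mpi_comm_world, rank, ierr)
       call mpi_comm_size(mpi_comm_world, nprocs, ierr)
       if (nprocs /= 4) call mpi_abort(mpi_comm_world, 1, ierr)

       sendbuf = dble(rank)                          ! stand-in for patch data

       do level = 1, 2
          partner = sched(level, rank)
          if (rank < partner) then                   ! lower rank sends first
             call mpi_ssend(sendbuf, npts, mpi_double_precision, partner, &
                            level, mpi_comm_world, ierr)
             call mpi_recv(recvbuf, npts, mpi_double_precision, partner, &
                           level, mpi_comm_world, stat, ierr)
          else                                       ! higher rank receives first
             call mpi_recv(recvbuf, npts, mpi_double_precision, partner, &
                           level, mpi_comm_world, stat, ierr)
             call mpi_ssend(sendbuf, npts, mpi_double_precision, partner, &
                            level, mpi_comm_world, ierr)
          end if
       end do

       call mpi_finalize(ierr)
    end program pairwise_exchange

In the solver itself the schedule is of course built by the master process from the actual patch connectivity, but the pairing of one send with one receive at each level is the essential point.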

4 Automatic Grid Block Splitting

4.1 Main principles

In order to utilize the computing power of a massively parallel system, a program utilizing a simple domain decomposition algorithm has been developed for dividing large grid blocks into smaller ones. In addition to the grid splitting, the program also rewrites the boundary condition file and the computation control file.

A good balance between the processes is desirable. This can be achieved by dividing the space into equally sized sub-domains. However, from the point of view of grid generation, the above requirement increases the amount of manual work. This can be avoided by generating the original grid without considering too much the requirements of the parallel processing. With a separate tool, the grid can then be divided into sub-blocks suitable for parallel computing. The goal is that the user does not need to work at all with the small blocks during the pre- and post-processing stages. In order to simplify the splitting algorithm, it is assumed that the original blocks are directly divided into sub-blocks, i.e. a sub-block can occupy only a single original block, as shown in Fig. 4.1. Because of this, efficiency requires that the desired sub-block size is taken into account in the grid generation. However, since the number of computational blocks per processor is arbitrary, the user can also deviate from the requirement of equal sub-blocks. The grid-splitting software keeps a record of the grid division so that after the simulation the sub-blocks can be merged into the original form. This is done for the grid itself and for the solution files in order to facilitate post-processing.

Fig. 4.1. An example of the division of the original grid blocks into equally sized sub-blocks.

4.2 Grid splitting

The splitting program can be run in two different modes. In the first mode, the user explicitly defines the grid sub-block boundaries. Then the only task of the program is to rewrite the boundary condition file. In the second mode, the program splits the blocks automatically. The only information required from the user in the latter case is the desired block edge dimension. In the automatic mode, the splitting strategy is as follows. The block is always fully cut. If the block edge dimension is smaller than the desired one, no cutting will take place. If the block edge dimension is larger than the desired one, but smaller than twice the desired size, a cutting line at the middle of the block face will be chosen, as shown in Fig. 4.2a. In the future this could be improved by trying to leave a larger block on the possible solid wall side. This will make things more complicated, but improves the behaviour of the turbulence models, which require a wall correction. If the block edge dimension is larger than twice the desired size, but cannot be equally distributed, the smaller block will be cut from the middle of the original block, i.e. the block next to the possible solid wall is always as large as possible. The resulting division is shown in Fig. 4.2b.

Fig. 4.2. An example of the division in two different cases.
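One possible reading of the splitting rules above is sketched below as a free-form Fortran routine that returns the piece sizes along one block edge. The routine name and the exact placement of the remainder piece are assumptions for illustration, not the actual splitter.

    ! Sketch of the edge-splitting rules described above (one possible reading).
    module edge_split
       implicit none
    contains
       subroutine split_edge(n, ndes, pieces, npieces)
          integer, intent(in)  :: n, ndes       ! edge dimension and desired size
          integer, intent(out) :: pieces(:)     ! resulting piece sizes
          integer, intent(out) :: npieces
          integer :: nfull, nrem, mid

          if (n <= ndes) then                   ! small enough: no cut
             npieces   = 1
             pieces(1) = n
          else if (n < 2*ndes) then             ! one cut, in the middle
             npieces   = 2
             pieces(1) = n/2
             pieces(2) = n - n/2
          else                                  ! full pieces, remainder in the middle
             nfull = n/ndes
             nrem  = n - nfull*ndes
             npieces = nfull
             pieces(1:nfull) = ndes
             if (nrem > 0) then
                npieces = nfull + 1
                mid     = nfull/2 + 1
                pieces(mid+1:npieces) = pieces(mid:nfull)
                pieces(mid) = nrem              ! the small piece sits in the middle
             end if
          end if
       end subroutine split_edge
    end module edge_split

Under this reading, an edge of 70 cells with a desired size of 32 gives pieces of 32, 6 and 32, so that the outermost pieces, next to a possible solid wall, stay as large as possible.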

4.3 Redefinition of the boundary conditions

The boundary condition patch splitting is a complex task in comparison with the grid block splitting described above. This is especially true in cases where two connected blocks are not cut at the same location. That is why the algorithm divides the boundary patches using the cutting lines from both blocks. First the limits of the new boundary condition patches are computed. As a separate step, the connectivity information is updated. The program does not assume anything about the grid topology; that is, the connections in C-type or O-type blocks do not need any special treatment.

The most challenging task in the BC patch splitting process is the determination of the additional cutting lines coming from the connected patches. If the original blocks 1 and 2 are divided as shown in Fig. 4.3, the boundary patches in the neighbouring blocks will be cut correspondingly. When these lines are calculated, we must consider which block faces are connected and what the orientation is between the blocks. An additional difficulty comes from the relative position of the blocks. Since there are six faces on both blocks and four possible orientations, we have 6 x 6 x 4 = 144 combinations. The right case can be found by computing a magic number (Eq. 4.1) from the two face numbers, which can have values from one to six, and from the orientation, which can have values from zero to three.

Fig. 4.3. Additional BC patch cuttings caused by the connection between the blocks.
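The magic-number formula itself (Eq. 4.1) has not survived in this transcription, so the fragment below only illustrates, with an assumed encoding, how two face numbers (1 to 6) and an orientation (0 to 3) can be packed into one of 144 unique values.

    ! An assumed encoding of the face pair and orientation into a unique index;
    ! this is not the report's actual formula (4.1), only an illustration.
    integer function magic_number(iface1, iface2, iorient)
       implicit none
       integer, intent(in) :: iface1, iface2   ! face numbers, 1..6
       integer, intent(in) :: iorient          ! relative orientation, 0..3
       magic_number = (iface1 - 1)*24 + (iface2 - 1)*4 + iorient + 1
    end function magic_number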

4.4 Sample case

This example illustrates the division of the boundary patches in a complex connection between the blocks. The two-block grid is purely fictitious and does not represent any reasonable CFD geometry. The original grid and the split grid are shown in Fig. 4.4. Note that, in order to increase the complexity of the splitting task, the larger block is rotated about one coordinate axis so that one of its index directions coincides with a different index direction of the smaller block.

Fig. 4.4. The original two-block grid and the split four-block grid.

Both blocks are divided in one index direction at one pre-determined location. In the larger block this location is the 9th node and in the smaller block the 3rd node. However, due to the rotation of the larger block, these cuttings are not parallel but cross perpendicularly, similar to the case shown in Fig. 4.3. Further, the cutting at the 9th node of the larger block coincides with the 5th node of the smaller block, and the cutting at the 3rd node of the smaller block coincides with the 5th node of the larger block. This is why the connective patch on both blocks has to be divided into four pieces.

5 Test Runs

There are two commonly used test methods in the parallelization. One is to keep the size of the computation in one process constant, so-called scaling. This means that the total problem size rises as the number of processes increases. The other is to keep the total problem size constant and divide it between the processes. Hereafter this is called blocking. The theoretical speed-up is different for the two cases. In the first case, if the total time of the calculation is T and the total communication time is T_c, then the ideal time spent on the computation is

t_{ideal} = T/N    (5.1)

where N is the number of processes. When the communication is taken into account, we get the true computation time

t = (T/N)(1 + c)    (5.2)

and the speed-up

Speed-up = T/t = N/(1 + c)    (5.3)

where c = T_c/T is the total communication time per total calculation time. This is roughly equal to the communication per calculation time in one process. Assuming that all processes have their own bandwidth for the message passing, the amount of communication in scaling is the same for every process, and thus c is constant. This means that the speed-up should be more or less linear in this kind of test.

If the total size of the problem is kept constant (blocking), the factor c is not constant but a function of the number of processes used and the original problem size. The calculation time in one process is roughly

t_{calc} \propto n^3/N    (5.4)

where n is the length of one edge. For the communication time we get

t_{comm} \propto (n/N^{1/3})^2 = n^2/N^{2/3}    (5.5)

and the ratio is

c \propto N^{1/3}/n    (5.6)

Finally we obtain the speed-up for a constant-size problem as

Speed-up = N/(1 + c_0 N^{1/3})    (5.7)

Note that in both cases the commonly used Amdahl's law

Speed-up = 1/((1 - P) + P/N)    (5.8)

is not valid.
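The behaviour of the two estimates, together with Amdahl's law, can be tabulated with a few lines of Fortran. The sketch below uses the forms written out above; the communication-to-calculation ratio and the parallel fraction are illustrative values, not measured ones.

    ! Sketch evaluating the speed-up estimates of Eqs. (5.3), (5.7) and (5.8)
    ! for illustrative parameter values.
    program speedup_models
       implicit none
       integer :: n
       real(8) :: c0, p, s_scaled, s_fixed, s_amdahl

       c0 = 0.02d0          ! assumed communication/calculation ratio, one process
       p  = 0.98d0          ! assumed parallel fraction for Amdahl's law

       print '(a)', '   N    scaling   blocking     Amdahl'
       do n = 1, 64
          s_scaled = dble(n)/(1.0d0 + c0)                         ! Eq. (5.3)
          s_fixed  = dble(n)/(1.0d0 + c0*dble(n)**(1.0d0/3.0d0))  ! Eq. (5.7)
          s_amdahl = 1.0d0/((1.0d0 - p) + p/dble(n))              ! Eq. (5.8)
          if (n == 1 .or. mod(n, 16) == 0) then
             print '(i4,3f11.2)', n, s_scaled, s_fixed, s_amdahl
          end if
       end do
    end program speedup_models

With these illustrative values the scaled speed-up stays essentially linear, while the fixed-size speed-up flattens as the blocks shrink, which is the qualitative behaviour discussed below.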

Fig. 5.1. Delta wing. The symmetry plane of the grid is shown only partly.

The program was run on the Cray T3E and T3D machines and also on a cluster of SGI Indigo (MIPS R4400SC) workstations. In the latter case the communication between the workstations was made through a standard, low-speed Ethernet, and the calculations with the workstation cluster were performed while there was other traffic in the Ethernet. More information on the T3D and workstation cluster tests can be found in Rautaheimo et al. [10]. With the T3E, the test runs were made with a 3-dimensional torus-type topology. Every block is connected to six other blocks. This means that all processors have an equal amount of work and perfect load balancing is achieved. In a real case perfect load balancing cannot be achieved, because the boundary conditions are different: for example, the calculation of a wall boundary condition takes more time than that of a symmetry boundary condition. The test runs with the T3D and the cluster of workstations were performed with a delta wing; this case has also been calculated in [7].

5.1 Scaling

The grids were generated so that all the blocks had a size of 32 x 32 x 32 grid points. The computational domain was split into 1-64 blocks and each block was calculated in a different process. Thus the coarsest grid has 32,768 and the densest grid 2,097,152 grid points. A coarse level of the delta wing grid is shown in Fig. 5.1, and a calculated pressure distribution on the wing surface in Fig. 5.2.

Fig. 5.2. Pressure distribution on the delta wing.

Table 5.1. Performance of the parallelization in scaling.

NPS    N. of cells    T3E    T3D    SGI

The computation time was measured over 50 iteration cycles. The time spent on the initialization and termination of the run was not included. The efficiency is obtained directly from the absolute time spent on the calculation. The results are presented in Table 5.1. It can be seen that scaling is achieved in these test runs with the T3E up to 64 processes. With the T3D and the SGI cluster the global residuals have been collected during the iteration, and consequently the parallelization is not as good as in the case of the T3E. The speed-up obtained on the different platforms can be seen in Fig. 5.3. One must keep in mind that the results with the T3E and T3D are not directly comparable because of changes in the code, and also because the test case is better balanced for the T3E. The speed-up is linear for the T3E, as it should be according to Eq. (5.3).

Fig. 5.3. Speed-up of the parallelization in scaling.

Fig. 5.4. Time spent in communication in each process in scaling.

For every block face there are data points that must be exchanged during the iteration steps. Every point contains nine flow variables: the density, the three momentum components, the total energy, the turbulent viscosity, the turbulent viscosity coefficient, the pressure and the pressure difference. Every block has six faces, and from these figures the total amount of data to be sent to the other processes and received from the others can be computed. The time spent in the communication can be seen in Fig. 5.4; the largest communication time is obtained with the largest number of processes. From the measured communication times and the amount of transferred data, the bandwidth per processor can be estimated. Compared with what the authors have achieved with a very simple MPI communication, this is a relatively good performance for a complex communication.
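The amount of data exchanged per block can be estimated along the lines of the paragraph above. The sketch below assumes a 32 x 32 x 32 block, two ghost-cell layers per face, nine variables per point and 8-byte reals; the exact figures of the original runs have not survived in this transcription, so the result is only indicative.

    ! Rough estimate of the data exchanged per block, using assumed values
    ! (32^3 block, 2 ghost-cell layers, 9 variables per point, 8-byte reals).
    program message_volume
       implicit none
       integer, parameter :: nedge = 32        ! points per block edge (assumed)
       integer, parameter :: ngl   = 2         ! ghost-cell layers per face
       integer, parameter :: nvar  = 9         ! variables exchanged per point
       integer, parameter :: nface = 6         ! faces per block
       integer :: npoints
       real(8) :: mbytes

       npoints = nface*nedge*nedge*ngl         ! points updated on all faces
       mbytes  = dble(npoints*nvar)*8.0d0/1.0d6
       print '(a,i8,a,f8.3,a)', 'points per block:', npoints, &
             '   data per block:', mbytes, ' Mbytes'
    end program message_volume

Multiplying this per-block figure by the number of blocks gives the total traffic per iteration cycle.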

5.2 Blocking

Another way of testing the parallelization is to divide a big problem into smaller pieces. With the T3E, the grid size used in this test was limited by the processor memory size. The block sizes for the different cases can be seen in Table 5.2.

Table 5.2. Performance of the parallelization in blocking.

NPS    N. of cells/block    T3E

Fig. 5.5. Speed-up of the parallelization in blocking.

As can be seen from Table 5.2, the scaling is not as good in this case. This follows directly from Eq. (5.7): the smaller the block is, the larger is the ratio between the communication and the calculation per processor. With a very small block size, the fact that unnecessary ghost cells are calculated in each block also decreases the efficiency. In addition, the faces of a block are not of an equal size, and the communication times between different processors are not in balance. The speed-up is closer to Eq. (5.7) than to Amdahl's law, Eq. (5.8), as can be seen in Fig. 5.5, where one curve is the estimate of Eq. (5.7) and the other is Amdahl's law.

Since the boundary conditions are treated explicitly, splitting the computational domain into smaller parts will also reduce the performance of the implicit stage. This was tested with the delta wing by dividing the original grid into smaller and smaller pieces.

The iteration histories of the norm of a momentum residual, as calculated with different block sizes, can be seen in Fig. 5.6. It can be seen that the convergence is not much affected by the block size. However, it should be noted that in this case the smallest block size is still relatively large.

Fig. 5.6. Norm of the momentum residual.

5.3 Single-processor performance

Some effort has been put into getting the program to run efficiently in the single-process mode on the T3E. A summary of the tested compiler directives can be seen in Table 5.3. The compiler directives -Ounroll2 -Oscalar3 were found to be the best choice, although no big difference was found between the directives in Table 5.3.

Table 5.3. Effect of the compiler directives on the performance (combinations of -Osplit2, -O2, -O3, -Oaggress, -Ounroll2 and -Oscalar3).

By using the T3E memory streams the code runs clearly faster; the level (1, 2 or 3) of the streams did not seem to have any effect. Also some recoding was done. The recoding was done in a conservative manner, so that no vector loop was removed. Eventually the program also runs a bit faster on the C94.

In addition, one part of the program was inlined using SGI's compiler, because the compiler of the T3E does not include inlining. Because of these optimization actions the time of an iteration cycle decreased and the speed in Mflops increased correspondingly. The single-processor performance on different platforms can be seen in Table 5.4. From the single-processor performance of the T3E, the overall performance for the biggest scaling case can be estimated. Note also that the performance of the C94 will be better if a larger grid is used.

Table 5.4. Single-processor performance on different platforms.

Platform                                                             Speed (Mflops)
C94 (240 MHz vector processor)
R4000 (200 MHz IP22 with 1 Mbyte secondary cache)
R8000 (75 MHz IP21 with 4 Mbyte secondary cache)
R10000 (195 MHz IP28 with 1 Mbyte secondary cache)
Digital AlphaServer (440 MHz EV56 with 4 Mbyte secondary cache)
T3D (150 MHz Digital Alpha processor with 8 kbyte secondary cache)
T3E (300 MHz EV5 with 96 kbyte secondary cache), unmodified code

6 Conclusions

The parallelization of a multi-block flow solver has been presented. The parallelization takes place over the blocks, and the communication between the blocks utilizes the MPI Standard. The computer code is portable between different types of machines.

The first test runs were made on a cluster of SGI workstations. The performance curve obtained is almost linear up to 16 processes, in spite of the fact that the workstations were connected by a standard, low-speed Ethernet. With the Cray T3D machine, test runs have been made with up to 64 processors. The performance of the code is satisfactory, but not excellent: with a constant block size, the efficiency with 64 processors is about 91%. However, the obtained speed indicates that even this machine can be utilized effectively with a large number of processors at this efficiency. With the Cray T3E machine, the test runs indicate excellent parallelization. With scaling the parallelization is almost linear, and with blocking the efficiency is still reasonable with the largest number of processors used. The increased performance is due to the optimized communication, and also because the calculation with the T3E was done with a torus-type 3-dimensional grid that has a good load balance. Hence the T3D and T3E results are not directly comparable. It is also shown that Amdahl's law gives too poor performance estimates for both test cases.

If the size of the case is kept constant and the parallelization is performed by dividing the grid into smaller and smaller blocks, the efficiency of the code decreases as the number of processors is increased. This is caused by the larger ratio between the communication and the calculation as the blocks are getting smaller, and also by the extra time spent on the calculation of the ghost cells. Because of this property, and because of the requirement of the multigrid, the block size should not be made too small. This limits the effective use of the parallelization on the T3E to cases where the number of grid points is of the order of half a million or more. In practice this is not a limitation if one has access to a vector computer with a sufficient memory size. The vector computer will be faster than the T3E in medium-sized jobs. Although the benefits of a massively parallel computation can also be achieved with smaller cases, in CFD that is a waste of resources: smaller cases can be calculated with sufficiently small computational times on vector machines or even on workstations.

In this work, the computation and also the communication are made parallel. In the test cases the work balance is very good and also the performance is good. With a reasonable block size the ratio between the communication and the calculation is small, and thus no further development of the communication is needed at this point. With real applications, the problem is how to divide the work between the processors so that a good work load balance is obtained.

Acknowledgments

We would like to thank Sami Saarinen for his advice. Access to the parallel computer systems, the T3E at the Center for Scientific Computing (CSC) in Finland and the T3D of Cray Research Inc. in Eagan, is also gratefully acknowledged.

Bibliography

[1] Bärwolff, G., Ketelsen, K., and Thiele, F., Parallelization of a Finite-Volume Navier-Stokes Solver on a T3D Massively Parallel System, in Sixth International Symposium on Computational Fluid Dynamics, (Nevada).

[2] Sawley, M. and Tegner, J., A Comparison of Parallel Programming Models for Multiblock Flow Computations, Journal of Computational Physics, Vol. 122, 1995.

[3] Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, Computer Science Dept., University of Tennessee, Knoxville, TN.

[4] Siikonen, T., Hoffren, J., and Laine, S., A Multigrid Factorization Scheme for the Thin-Layer Navier-Stokes Equations, in Proceedings of the 17th ICAS Congress, (Stockholm), Sept. 1990. ICAS Paper.

[5] Roe, P., Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes, Journal of Computational Physics, Vol. 43, 1981, pp. 357-372.

[6] Van Leer, B., Flux-Vector Splitting for the Euler Equations, in Proceedings of the 8th International Conference on Numerical Methods in Fluid Dynamics, (Aachen), 1982 (also Lecture Notes in Physics, Vol. 170, 1982).

[7] Siikonen, T., Kaurinkoski, P., and Laine, S., Transonic Flow over a Delta Wing Using a Turbulence Model, in Proceedings of the 19th ICAS Congress, (Anaheim), Sept. 1994. ICAS Paper.

[8] Siikonen, T. and Pan, H., Application of Roe's Method for the Simulation of Viscous Flow in Turbomachinery, in Proceedings of the First European Computational Fluid Dynamics Conference (Hirsch, C. et al., eds.), (Brussels, Belgium), Elsevier Science Publishers B.V., Sept. 1992.

[9] Rautaheimo, P. and Siikonen, T., Implementation of the Reynolds-Stress Turbulence Model, in Proceedings of the ECCOMAS Congress, (Paris), Sept. 1996.

[10] Rautaheimo, P., Salminen, E., and Siikonen, T., Parallelization of a Multi-Block Navier-Stokes Solver, in Proceedings of the ECCOMAS Congress, (Paris), Sept. 1996.


More information

Optimizing Bio-Inspired Flow Channel Design on Bipolar Plates of PEM Fuel Cells

Optimizing Bio-Inspired Flow Channel Design on Bipolar Plates of PEM Fuel Cells Excerpt from the Proceedings of the COMSOL Conference 2010 Boston Optimizing Bio-Inspired Flow Channel Design on Bipolar Plates of PEM Fuel Cells James A. Peitzmeier *1, Steven Kapturowski 2 and Xia Wang

More information

Calculate a solution using the pressure-based coupled solver.

Calculate a solution using the pressure-based coupled solver. Tutorial 19. Modeling Cavitation Introduction This tutorial examines the pressure-driven cavitating flow of water through a sharpedged orifice. This is a typical configuration in fuel injectors, and brings

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics July 11, 2016 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

µ = Pa s m 3 The Reynolds number based on hydraulic diameter, D h = 2W h/(w + h) = 3.2 mm for the main inlet duct is = 359

µ = Pa s m 3 The Reynolds number based on hydraulic diameter, D h = 2W h/(w + h) = 3.2 mm for the main inlet duct is = 359 Laminar Mixer Tutorial for STAR-CCM+ ME 448/548 March 30, 2014 Gerald Recktenwald gerry@pdx.edu 1 Overview Imagine that you are part of a team developing a medical diagnostic device. The device has a millimeter

More information

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press, ISSN

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press,   ISSN The implementation of a general purpose FORTRAN harness for an arbitrary network of transputers for computational fluid dynamics J. Mushtaq, A.J. Davies D.J. Morgan ABSTRACT Many Computational Fluid Dynamics

More information

Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver

Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver Multigrid Algorithms for Three-Dimensional RANS Calculations - The SUmb Solver Juan J. Alonso Department of Aeronautics & Astronautics Stanford University CME342 Lecture 14 May 26, 2014 Outline Non-linear

More information

SHOCK WAVES IN A CHANNEL WITH A CENTRAL BODY

SHOCK WAVES IN A CHANNEL WITH A CENTRAL BODY SHOCK WAVES IN A CHANNEL WITH A CENTRAL BODY A. N. Ryabinin Department of Hydroaeromechanics, Faculty of Mathematics and Mechanics, Saint-Petersburg State University, St. Petersburg, Russia E-Mail: a.ryabinin@spbu.ru

More information

McNair Scholars Research Journal

McNair Scholars Research Journal McNair Scholars Research Journal Volume 2 Article 1 2015 Benchmarking of Computational Models against Experimental Data for Velocity Profile Effects on CFD Analysis of Adiabatic Film-Cooling Effectiveness

More information

Network Bandwidth & Minimum Efficient Problem Size

Network Bandwidth & Minimum Efficient Problem Size Network Bandwidth & Minimum Efficient Problem Size Paul R. Woodward Laboratory for Computational Science & Engineering (LCSE), University of Minnesota April 21, 2004 Build 3 virtual computers with Intel

More information

Investigation of cross flow over a circular cylinder at low Re using the Immersed Boundary Method (IBM)

Investigation of cross flow over a circular cylinder at low Re using the Immersed Boundary Method (IBM) Computational Methods and Experimental Measurements XVII 235 Investigation of cross flow over a circular cylinder at low Re using the Immersed Boundary Method (IBM) K. Rehman Department of Mechanical Engineering,

More information

Non-Newtonian Transitional Flow in an Eccentric Annulus

Non-Newtonian Transitional Flow in an Eccentric Annulus Tutorial 8. Non-Newtonian Transitional Flow in an Eccentric Annulus Introduction The purpose of this tutorial is to illustrate the setup and solution of a 3D, turbulent flow of a non-newtonian fluid. Turbulent

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics May 24, 2015 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

The Spalart Allmaras turbulence model

The Spalart Allmaras turbulence model The Spalart Allmaras turbulence model The main equation The Spallart Allmaras turbulence model is a one equation model designed especially for aerospace applications; it solves a modelled transport equation

More information

IMAGE ANALYSIS DEDICATED TO POLYMER INJECTION MOLDING

IMAGE ANALYSIS DEDICATED TO POLYMER INJECTION MOLDING Image Anal Stereol 2001;20:143-148 Original Research Paper IMAGE ANALYSIS DEDICATED TO POLYMER INJECTION MOLDING DAVID GARCIA 1, GUY COURBEBAISSE 2 AND MICHEL JOURLIN 3 1 European Polymer Institute (PEP),

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics December 0, 0 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

Available online at ScienceDirect. Procedia Engineering 99 (2015 )

Available online at   ScienceDirect. Procedia Engineering 99 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 99 (2015 ) 575 580 APISAT2014, 2014 Asia-Pacific International Symposium on Aerospace Technology, APISAT2014 A 3D Anisotropic

More information

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of

More information

Team 194: Aerodynamic Study of Airflow around an Airfoil in the EGI Cloud

Team 194: Aerodynamic Study of Airflow around an Airfoil in the EGI Cloud Team 194: Aerodynamic Study of Airflow around an Airfoil in the EGI Cloud CFD Support s OpenFOAM and UberCloud Containers enable efficient, effective, and easy access and use of MEET THE TEAM End-User/CFD

More information

CFD Modeling of a Radiator Axial Fan for Air Flow Distribution

CFD Modeling of a Radiator Axial Fan for Air Flow Distribution CFD Modeling of a Radiator Axial Fan for Air Flow Distribution S. Jain, and Y. Deshpande Abstract The fluid mechanics principle is used extensively in designing axial flow fans and their associated equipment.

More information

CFD modelling of thickened tailings Final project report

CFD modelling of thickened tailings Final project report 26.11.2018 RESEM Remote sensing supporting surveillance and operation of mines CFD modelling of thickened tailings Final project report Lic.Sc.(Tech.) Reeta Tolonen and Docent Esa Muurinen University of

More information

Express Introductory Training in ANSYS Fluent Workshop 04 Fluid Flow Around the NACA0012 Airfoil

Express Introductory Training in ANSYS Fluent Workshop 04 Fluid Flow Around the NACA0012 Airfoil Express Introductory Training in ANSYS Fluent Workshop 04 Fluid Flow Around the NACA0012 Airfoil Dimitrios Sofialidis Technical Manager, SimTec Ltd. Mechanical Engineer, PhD PRACE Autumn School 2013 -

More information

A NURBS-BASED APPROACH FOR SHAPE AND TOPOLOGY OPTIMIZATION OF FLOW DOMAINS

A NURBS-BASED APPROACH FOR SHAPE AND TOPOLOGY OPTIMIZATION OF FLOW DOMAINS 6th European Conference on Computational Mechanics (ECCM 6) 7th European Conference on Computational Fluid Dynamics (ECFD 7) 11 15 June 2018, Glasgow, UK A NURBS-BASED APPROACH FOR SHAPE AND TOPOLOGY OPTIMIZATION

More information

An Object-Oriented Serial and Parallel DSMC Simulation Package

An Object-Oriented Serial and Parallel DSMC Simulation Package An Object-Oriented Serial and Parallel DSMC Simulation Package Hongli Liu and Chunpei Cai Department of Mechanical and Aerospace Engineering, New Mexico State University, Las Cruces, New Mexico, 88, USA

More information

Parallel Mesh Multiplication for Code_Saturne

Parallel Mesh Multiplication for Code_Saturne Parallel Mesh Multiplication for Code_Saturne Pavla Kabelikova, Ales Ronovsky, Vit Vondrak a Dept. of Applied Mathematics, VSB-Technical University of Ostrava, Tr. 17. listopadu 15, 708 00 Ostrava, Czech

More information

PARALLEL SIMULATION OF A FLUID FLOW BY MEANS OF THE SPH METHOD: OPENMP VS. MPI COMPARISON. Pawe l Wróblewski, Krzysztof Boryczko

PARALLEL SIMULATION OF A FLUID FLOW BY MEANS OF THE SPH METHOD: OPENMP VS. MPI COMPARISON. Pawe l Wróblewski, Krzysztof Boryczko Computing and Informatics, Vol. 28, 2009, 139 150 PARALLEL SIMULATION OF A FLUID FLOW BY MEANS OF THE SPH METHOD: OPENMP VS. MPI COMPARISON Pawe l Wróblewski, Krzysztof Boryczko Department of Computer

More information

Free Convection Cookbook for StarCCM+

Free Convection Cookbook for StarCCM+ ME 448/548 February 28, 2012 Free Convection Cookbook for StarCCM+ Gerald Recktenwald gerry@me.pdx.edu 1 Overview Figure 1 depicts a two-dimensional fluid domain bounded by a cylinder of diameter D. Inside

More information

ELSA Performance Analysis

ELSA Performance Analysis ELSA Performance Analysis Xavier Saez and José María Cela Barcelona Supercomputing Center Technical Report TR/CASE-08-1 2008 1 ELSA Performance Analysis Xavier Saez 1 and José María Cela 2 1 Computer Application

More information

Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers

Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers R. Jendrny, H. Damanik, O. Mierka, S. Turek Institute of Applied Mathematics (LS III), TU

More information

On the high order FV schemes for compressible flows

On the high order FV schemes for compressible flows Applied and Computational Mechanics 1 (2007) 453-460 On the high order FV schemes for compressible flows J. Fürst a, a Faculty of Mechanical Engineering, CTU in Prague, Karlovo nám. 13, 121 35 Praha, Czech

More information

LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS

LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS 14 th European Conference on Mixing Warszawa, 10-13 September 2012 LATTICE-BOLTZMANN METHOD FOR THE SIMULATION OF LAMINAR MIXERS Felix Muggli a, Laurent Chatagny a, Jonas Lätt b a Sulzer Markets & Technology

More information

Keywords: CFD, aerofoil, URANS modeling, flapping, reciprocating movement

Keywords: CFD, aerofoil, URANS modeling, flapping, reciprocating movement L.I. Garipova *, A.N. Kusyumov *, G. Barakos ** * Kazan National Research Technical University n.a. A.N.Tupolev, ** School of Engineering - The University of Liverpool Keywords: CFD, aerofoil, URANS modeling,

More information

Workbench Tutorial Flow Over an Airfoil, Page 1 ANSYS Workbench Tutorial Flow Over an Airfoil

Workbench Tutorial Flow Over an Airfoil, Page 1 ANSYS Workbench Tutorial Flow Over an Airfoil Workbench Tutorial Flow Over an Airfoil, Page 1 ANSYS Workbench Tutorial Flow Over an Airfoil Authors: Scott Richards, Keith Martin, and John M. Cimbala, Penn State University Latest revision: 17 January

More information