A Fully Concurrent DSMC Implementation with Adaptive Domain Decomposition

C.D. Robinson and J.K. Harvey
Department of Aeronautics, Imperial College, London, SW7 2BY, U.K.

A concurrent implementation of the direct simulation Monte Carlo method (DSMC) for the solution of complex gas flows, coupled with an adaptive domain decomposition algorithm for unstructured meshes, is described. An example indicates that use of the dynamic domain decomposition technique significantly increases the parallel efficiency of the DSMC code. A clear direction for further work is indicated.

1. Introduction

A single solution method for gas flows ranging from the continuum to the rarefied regime would be of great use to engineers and scientists. The direct simulation Monte Carlo method (DSMC) has this capability, but it is computationally expensive. Until recently computer power was insufficient to apply the method to gases which were not rarefied. Multi-processor (MP) computers provide an answer to this problem.

DSMC is a particle-based gas simulation method in which a computer is used to track simulator molecules. The particles are phenomenological models of the molecules in the real gas being computed. A mesh is used to discretise the flowfield, and consequently flows around complex bodies can be simulated if the correct meshing technique is used. A popular method of parallelising DSMC is the use of a spatial mesh decomposition over the processor array of an MP machine [5], [3]. The method then conforms to the single program multiple data (SPMD) paradigm, and the only addition required to the original serial algorithm is the inclusion of message passing for simulators which cross the sub-domain boundaries.

The computational load exerted by DSMC on a processor is largely dependent on the number of particles simulated upon it. As the particles are free to move throughout the flowfield, the load across the processor array will be unbalanced during at least part of the computation, leading to an inefficient use of computational resources. Since there is no way to predict the distribution of particles a priori, an automatic adaptive scheme that balances the load during the runtime of the DSMC computation is clearly required. As domain decomposition is used, redistribution of load is equivalent to altering the mesh decomposition over the processors.

CDR gratefully acknowledges the support of DERA for this work.

2. DSMC Implementation

The key assumption behind the DSMC method, due largely to Bird [1], is the splitting of the movement and collisions of the simulators over a small timestep. The simulators are allowed to move freely over the timestep. Once the move phase is completed and boundary interactions have been computed, a number of collisions are calculated such that the collision rate in the simulated volume of gas is commensurate with that in the gas being modelled. The collision partners are chosen by taking pairs of particles at random which have close spatial proximity. The cell structure is usually used for the definition of this proximity, with only pairs of particles from the same cell being considered as potential collision partners. Pairs of simulators are accepted for collision based on appropriate probabilities and the collision is modelled using standard two-body mechanics. Splitting of collisions and movement is valid if the modelled gas is dilute, that is, the mean spacing between molecules is much greater than the size of the molecules.

DSMC has the advantage that it is applicable to flows which are not in thermodynamic or chemical equilibrium. Chemical reactions are naturally treated within the collision framework, and pose no numerical difficulties other than an increased computational load. A computation is usually started impulsively and advances in real time via time-stepping, eventually reaching a steady state if appropriate boundary conditions are specified.
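The move/collide splitting described above maps onto a simple per-timestep loop. The sketch below, in C, is purely illustrative; all of the types and helper routines it declares (move_particle, apply_boundaries, select_pair and so on) are hypothetical placeholders rather than the physical-modelling routines of the code described in this paper.

/* Minimal sketch of one DSMC timestep under the move/collide splitting.
 * Types and helpers are assumed placeholders, declared but not defined. */
typedef struct { double x[3], v[3]; int cell; } Particle;   /* illustrative fields */
typedef struct { double volume; int first, count; } Cell;   /* illustrative fields */

void move_particle(Particle *p, double dt);            /* free-flight move        */
void apply_boundaries(Particle *p);                     /* walls, inflow/outflow   */
void index_particles(Particle *p, int np, Cell *c, int nc);
int  trial_pairs(const Cell *c, double dt);             /* pairs to test this step */
void select_pair(const Cell *c, int *i, int *j);        /* random pair in the cell */
int  accept_collision(const Particle *a, const Particle *b);
void collide(Particle *a, Particle *b);                 /* two-body mechanics      */

void dsmc_timestep(Particle *p, int np, Cell *cells, int ncells, double dt)
{
    /* Move phase: every simulator moves freely for the whole timestep,
     * then boundary interactions are applied. */
    for (int i = 0; i < np; i++) {
        move_particle(&p[i], dt);
        apply_boundaries(&p[i]);
    }

    /* Sort particles into cells so collision partners are only taken
     * from the same cell (the spatial proximity criterion). */
    index_particles(p, np, cells, ncells);

    /* Collision phase: in each cell, test randomly chosen pairs and accept
     * them with a probability chosen so that the simulated collision rate
     * matches that of the real gas. */
    for (int c = 0; c < ncells; c++) {
        int ntrials = trial_pairs(&cells[c], dt);
        for (int t = 0; t < ntrials; t++) {
            int i, j;
            select_pair(&cells[c], &i, &j);
            if (accept_collision(&p[i], &p[j]))
                collide(&p[i], &p[j]);
        }
    }
}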
A parallel DSMC system has been developed using the decomposition-SPMD paradigm. The program consists of four classes of routines: physical modelling, geometric modelling, management and parallel. There are also several pre- and post-processing elements. Each of these classes is functionally independent of the others, with interfacing carried out by the management routines. This approach helps code maintainability and allows ease of scalability, with no modifications being required to run the physical routines in a parallel environment. The codes are able to run in a serial or parallel environment without source code modifications.

Unstructured meshes are used because they can represent complex geometries readily. Generation of these meshes requires minimal effort on the part of the user and they can also automatically adapt to flow features such as shock waves. The code has been used to simulate a number of flows ranging from hypersonic shock-boundary layer interaction [7] through to low speed flow instabilities [10] and modelling of the flow in a chemical vapour deposition reactor chamber used for epitaxial silicon growth.

The meshed computational domain is decomposed over the number of processors required for the calculation. Each processor possesses a sub-domain on which the DSMC computation is run. Each sub-domain consists of a computational domain and a one-deep halo layer of cells which surrounds the computational domain. The halo cells are in the computational domains of neighbouring processors. The interface between the computational domain and the halo layer represents an inter-processor boundary (IPB). The geometric data is localised for each sub-domain and each processor knows nothing of the domain outside of its halo.

Message passing for the codes is implemented using MPI primitives. The only message passing required for the DSMC calculation is related to the particles crossing IPBs. All particles crossing IPBs in one timestep are collected into a group for sending en masse rather than individually. This message passing at each timestep ensures the calculation progresses in a synchronous manner across the processor array. If the timestep is of the correct order, particles should not move more than one cell width in the interval. Consequently, if a particle crosses an IPB during a timestep it is generally possible to identify its final position in the halo. Sure knowledge of the simulator's destination significantly reduces the communication overhead. Outgoing particles are sorted according to the destination processor and sent using non-blocking, localised MPI routines. Use of this technique enhances scalability.
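The grouped particle exchange across the IPBs can be expressed with non-blocking point-to-point MPI calls. The sketch below is an assumed illustration of that pattern, not the authors' implementation: particle counts are exchanged first so that each receive size is known, and the payloads then travel as one message per neighbouring sub-domain. The buffer layout and all names are invented for the sketch, and a homogeneous machine is assumed so raw bytes can be sent.

#include <mpi.h>

#define MAXP 4096                    /* assumed per-neighbour buffer capacity */

/* Hypothetical flat particle record: position, velocity, destination cell. */
typedef struct { double x[3], v[3]; int cell; } PartMsg;

/* Exchange boundary-crossing simulators with the nneigh neighbouring
 * sub-domains.  sendbuf[n] holds the particles already sorted by
 * destination rank neigh[n]; nsend[n] is how many go to each. */
void exchange_particles(int nneigh, const int *neigh,
                        PartMsg sendbuf[][MAXP], int *nsend,
                        PartMsg recvbuf[][MAXP], int *nrecv,
                        MPI_Comm comm)
{
    MPI_Request req[64];             /* sketch assumes nneigh <= 16 */
    int nreq = 0;

    /* Exchange the particle counts so each receive size is known. */
    for (int n = 0; n < nneigh; n++) {
        MPI_Irecv(&nrecv[n], 1, MPI_INT, neigh[n], 0, comm, &req[nreq++]);
        MPI_Isend(&nsend[n], 1, MPI_INT, neigh[n], 0, comm, &req[nreq++]);
    }
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);

    /* Exchange the particle payloads en masse, one message per neighbour. */
    nreq = 0;
    for (int n = 0; n < nneigh; n++) {
        MPI_Irecv(recvbuf[n], nrecv[n] * (int)sizeof(PartMsg), MPI_BYTE,
                  neigh[n], 1, comm, &req[nreq++]);
        MPI_Isend(sendbuf[n], nsend[n] * (int)sizeof(PartMsg), MPI_BYTE,
                  neigh[n], 1, comm, &req[nreq++]);
    }
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
}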

3. Dynamic Mesh Partitioning

The mesh decomposition problem is usually approached as a graph partitioning exercise. A mesh is represented as an undirected graph G(V, E), where V represents the set of vertices v in the graph and the vertices are connected by the set of edges E. Each vertex can be given a weight w, which is an indication of the computational load associated with it. The vertices in the graph are assigned to different processors by the partition function P. This divides the domain into smaller sub-domains and introduces a contracted sub-domain graph G^(s)(P, ε), where P here denotes the set of sub-domains p induced by the partition and ε the edges connecting the sub-domains. An edge in G is termed cut, and denoted E_c, if it joins two vertices v and u in V such that P(v) = p and P(u) = q, where p and q are different sub-domains. The edge between two different sub-domains p and q is defined as ε_pq = ∪ E_c,pq, where E_c,pq is an edge connecting a vertex v belonging to p and a vertex u belonging to q. The aim is to partition the graph such that Σ E_c is minimised and the weight of each sub-domain is approximately equal. This problem, in the static case, is known to be NP-complete, and so approximate heuristic procedures are always employed to obtain a near-optimal solution [4], [12], [2]. In the case of a dynamic loading scenario the static problem must be solved iteratively in order to maintain efficient use of computing resources.

3.1. Issues Relevant to DSMC

DSMC computations are cell-centred. Hence the set of graph vertices is the set of cells, and E is equivalent to the cell connectivity. This type of graph is sometimes referred to as the dual graph of the mesh. The load imbalance during a parallel computation arises due to the particle movement. The spatial position of particles can change relatively quickly, and consequently the domain adaption may have to be carried out many times during a computation. Therefore the cost of re-mapping the domain must not be expensive in comparison to the DSMC calculation. The load variation between timesteps can be fairly significant within a parallel DSMC calculation, largely due to the fluctuations in the number of particles per processor. Even a serial calculation may have loading differences between timesteps due to the probabilistic nature of the collision algorithm, and for these reasons DSMC cannot be load balanced to within the very fine tolerances that are possible with conventional CFD codes. The number of particles crossing IPBs is dependent on the length (area) of the IPB. Hence, in common with other applications, the communication cost is dependent on E_c, but is more variable. Given all these factors it is clear that an automatic load balancing strategy is required that monitors the true load on a processor. Since the data structure on each sub-domain is localised, the remapping policy should be localised in order to minimise communication costs and re-use the existing partition. This means that vertices should only be exchanged over connected edges in G^(s).
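As a concrete illustration of these definitions, the short routine below evaluates a given partition of the dual graph, returning the total edge cut (the Σ E_c above) and accumulating the weight of each sub-domain. The CSR-style adjacency arrays and all names are assumptions made for this sketch; it is not part of the code described in the paper.

/* Evaluate a partition of the dual graph.  The graph is stored in an
 * assumed compressed-sparse-row form: vertex v has neighbours
 * adj[xadj[v] .. xadj[v+1]-1].  part[v] gives the sub-domain of vertex v
 * and w[v] its load weight.  Returns the number of cut edges; weight[p]
 * receives the total load of sub-domain p.  Illustrative only. */
int evaluate_partition(int nvert, const int *xadj, const int *adj,
                       const int *part, const double *w,
                       int nparts, double *weight)
{
    int cut = 0;
    for (int p = 0; p < nparts; p++)
        weight[p] = 0.0;

    for (int v = 0; v < nvert; v++) {
        weight[part[v]] += w[v];
        for (int e = xadj[v]; e < xadj[v + 1]; e++)
            if (part[adj[e]] != part[v])
                cut++;               /* each cut edge seen from both ends */
    }
    return cut / 2;                  /* undirected graph: halve the count */
}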

3.2. Current Implementation

The approach taken by the authors has concentrated on implementing a robust parallelised dynamic domain decomposition scheme in the first instance, with approximate load balancing and edge minimisation heuristics employed. The intention is that these heuristics will be improved once the robust scheme is proven. The method is similar to that of other researchers in the area [2]. An initial decomposition is performed cheaply and the DSMC computation is initiated on this. This decomposition is then adapted in a localised, diffusive fashion, by moving vertices lying along the IPBs, throughout the DSMC computation in order to approximately balance the load between processors and minimise surface area. The adaptive domain decomposition program is called adder.

A continuous load monitoring policy is employed by adder. At a fixed interval of timesteps, each processor broadcasts its current load state and this is fed into a decision scheme such as the stop-at-rise (SAR) formula of Nicol and Saltz [6], which compares the cost of re-mapping with the cost of not re-mapping. Use of this re-mapping decision process enables automation and the use of true timing data to gauge load, which would otherwise be very difficult to model.

Once the decision is made for load balancing there are several steps taken on each processor. Firstly, each processor computes a localised "load table" for its neighbourhood. It uses this to define the direction of load flow across an edge ε_pq. Each processor then identifies its vertices lying on an IPB by a search through the halo. The next step is the identification of vertices which should be shed in order to improve the geometric shape of the domain. This is based on a geometric criterion and only border vertices are allowed to be shed. Consequently a sub-domain can only receive vertices from within its halo. Each sub-domain carries out this operation in parallel. In order that conflicts do not occur in these "geometry sheds", an update of the neighbourhood is carried out so that each processor is aware of cells in its halo that will be transferred to it. The next step is the identification of vertices to be transferred for the load equalisation. Once again only border vertices are allowed to be transferred. Currently all vertices along the edge ε_pq are earmarked for movement. Allowed geometry sheds are also included in the sending lists at this stage. Once the vertices are earmarked, the quality of the resulting boundary is examined and further vertices are added to the shedding list if their inclusion reduces E_c. This candidate choosing strategy is admittedly crude. Once all the vertices are marked the actual inter-processor communication takes place. The DSMC data associated with the vertices must also be sent at this stage. Note that the geometric data associated with the vertices is not required since the received vertices are already in the halo. Once the transfer of computational cells is complete the haloes on the sub-domains need reconstructing, and this results in the final section of message passing.
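The stop-at-rise test can be stated compactly. The sketch below encodes one common reading of the SAR criterion of Nicol and Saltz [6]: accumulate the per-interval degradation (maximum minus average processor time) since the last re-map, form the running average including the estimated re-mapping cost, and re-map when that average first rises. The formulation and all names here are assumptions for illustration rather than a transcription of adder.

/* One reading of the stop-at-rise (SAR) re-mapping test: accumulate the
 * per-interval degradation d_i = T_max,i - T_avg,i measured since the last
 * re-map, form W(n) = (sum d_i + C) / n with C the estimated cost of a
 * re-map, and trigger a re-map when W first starts to rise. */
typedef struct {
    double remap_cost;    /* C: measured cost of one re-mapping           */
    double degr_sum;      /* running sum of d_i since the last re-map     */
    int    nsteps;        /* n: monitoring intervals since the last re-map */
    double w_prev;        /* W(n-1), used to detect the rise              */
} SarState;

static void sar_reset(SarState *s, double remap_cost)
{
    s->remap_cost = remap_cost;
    s->degr_sum   = 0.0;
    s->nsteps     = 0;
    s->w_prev     = 1.0e300;     /* effectively infinite, so W falls first */
}

/* Called once per monitoring interval with the measured maximum and
 * average processor times; returns 1 when a re-map should be performed. */
static int sar_should_remap(SarState *s, double t_max, double t_avg)
{
    s->degr_sum += (t_max - t_avg);
    s->nsteps   += 1;

    double w = (s->degr_sum + s->remap_cost) / (double)s->nsteps;
    int remap = (w > s->w_prev);     /* W has started to rise */
    s->w_prev = w;
    return remap;
}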

4. Results

The results in this paper are all from computations of rarefied driven cavity flows, computed on the AP1000 at Imperial College. A typical computational domain is shown in Figure 1(a). The upper, left and right hand walls remain stationary whilst the bottom wall can be moved from left to right. All boundaries are at the same temperature and are fully diffuse. When the wall is moving, the gas is compressed into the bottom right hand corner whilst a region of low density develops near the centre of the cavity due to the induced vortex. These densities translate into computational workload for the processors. Figure 1(a) shows the normalised density contours for the case where the wall is moving at approximately Mach 8. The flow and its loading effects have been described in detail elsewhere [8].

Figure 1. (a) Normalised density contours. (b) Comparison of speedup.

If the bottom wall is stationary, the only velocities present in the cavity are due to the thermal motions of the simulators and the particles will be uniformly distributed throughout the domain. Under these conditions, with a uniform number of cells per processor, the computation is as perfectly load balanced as possible. If the decomposition approaches minimal E_c, the results will represent the peak parallel performance of the DSMC code. Figure 1(b) shows the speedup curve for calculations on the mesh shown in Figure 3(a), which has 10,000 cells, with 135,000 simulators, in which the bottom wall is stationary. All decompositions are static and approximately satisfy the conditions indicated above. The partitions are effectively a square mesh overlay, and are of the type shown in [8]. It is seen that the code scales well and compares favourably with the ideal linear speedup curve. The drop-off observed at higher processor counts is due to load imbalance and communication costs, but the curve could be scaled by increasing the number of simulators and hence the total computational load.

Figure 2(a) shows the fractions of load imbalance and of simulators relative to the total number, versus the number of processors, for the calculations which gave the speedup curve in Figure 1(b). It is surprising to observe that even in this "perfectly" balanced case there are still load imbalances of over 20% for some numbers of processors. This relatively large level of load imbalance can be explained by fluctuations in the number of simulators per processor. As the number of processors increases the number of particles per processor falls, and the statistical scatter in these numbers has more effect on the load balance. This serves to illustrate the fact that it is pointless trying to load balance a DSMC computation to within a fine tolerance.
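The paper does not state how the load imbalance fraction plotted in Figure 2(a) is defined. One common choice, assumed here purely for illustration, is the excess of the busiest processor over the mean load:

% Assumed definition of the load imbalance fraction, for illustration only:
% T_p is the time (or particle count) on processor p over one interval.
\[
  I \;=\; \frac{\max_{p} T_{p} \;-\; \bar{T}}{\bar{T}},
  \qquad
  \bar{T} \;=\; \frac{1}{N_{\mathrm{proc}}} \sum_{p=1}^{N_{\mathrm{proc}}} T_{p} .
\]

Under this reading, an imbalance of 20% means that the most heavily loaded processor carries 20% more work than the average.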

When the bottom wall is allowed to move, chronic load imbalances occur with uniform static decompositions. Speedup curves are shown in Figure 2(b) for the case in which the bottom wall is moving at approximately Mach 8. The DSMC calculation with a uniform static decomposition, indicated by the circles, shows fairly poor performance as might be expected, reaching a maximum speedup of 20 at 121 processors. The "ideal" DSMC curve shown in Figure 1(b) is also shown in Figure 2(b) for comparison. When the DSMC code is run with adder, the parallel performance of the code pairing increases dramatically up to 36 processors but falls off thereafter. The code pairing is called DSMC+ (DSMC Parallel Load balanced Unstructured Solver). It can be seen that DSMC+ does not achieve the performance of the ideal case, although it is not too far off at the lower numbers of processors. Achieving the ideal performance is, however, an unrealistic goal since the cases with the moving wall possess substantially higher message passing overheads than those with the stationary wall as a result of the bulk gas velocities.

Figure 2. (a) Load imbalance and particles per node fractions. (b) Speedup comparisons.

An example of the final domain decomposition obtained for the 25 processor case is shown in Figure 3(a). It can be seen that the lines of partition, shown in bold, have changed substantially from the initial uniform square mesh overlay. Note the reduced size of the sub-domain in the bottom right hand corner and the increased size of the sub-domains in the centre of the cavity, reflecting the high and low gas densities respectively.

During the runs described, adder was only invoked when the load imbalance was greater than 20% and the SAR formula indicated that balancing should be done. The time interval for checking the timing statistics was ten timesteps. Up to 36 processors the load imbalance is kept at around 20% by adder; however, past this number of processors the load imbalance grows. The reason that adder is unable to keep a check on the imbalance is that the weight within the borders of the sub-domains becomes too large, and transferral of the entire border results in a large perturbation to the load on the processor.

This is illustrated by the fact that past 36 processors the domain is repartitioned at virtually every opportunity. Hence an instability develops in which borders are flipped back and forth between neighbouring sub-domains in an effort to balance the load.

A very encouraging feature of these results is that, although adder was called many times during the runs at higher processor numbers, it accounted for a small fraction of the overall computational cost. This is illustrated in Figure 3(b), which shows a comparison of the real times per step of the DSMC code and adder. The run time for each code is of the same order, and the cost of adder does not increase significantly with the number of processors, showing it to be highly scalable. A further point is that most of the time taken up within adder is due to message passing [7]. This indicates that there is scope for the inclusion of a more effective load balancing procedure without it having a deleterious effect on performance. A scheme suitable for DSMC load balancing is Song's iterative asynchronous procedure [11], and this will be implemented in the near future.

Figure 3. (a) Adapted decomposition. (b) Time per step comparison.

5. Conclusions

DSMC is a flexible computational technique capable of simulating a great variety of complex gas flows. A parallel DSMC tool has been described which utilises unstructured grids for geometric modelling flexibility and has a modular structure enabling ease of software maintenance and extensibility. A domain decomposition technique is used and the program runs under the SPMD paradigm. In the case of a perfectly load balanced calculation the DSMC implementation shows good scalability. This is not the case for a general flow, in which the load across the processor array becomes highly imbalanced due to the movement of simulators and hence of load.

A heuristic, diffusive, hybrid graph-geometric, localised, concurrent scheme has been outlined for the purpose of adaptive domain decomposition in order to balance the load between the processors during run time. Results indicate that the method holds significant promise for greatly increasing the parallel scalability of the DSMC implementation. However, the crude load balancing heuristics currently applied lead to an instability in the loading characteristic of the processor array. This can be alleviated by application of an exact load balancing scheme, which will be implemented in future versions of the code. DSMC+ shows near-optimal parallel scalability when the instability is not present.

REFERENCES

1. G.A. Bird. Molecular Gas Dynamics and the Direct Simulation of Gas Flows. Oxford University Press.
2. C. Walshaw, M. Cross and M.G. Everett. A Localised Algorithm for Optimising Unstructured Mesh Partitions. International Journal of Supercomputer Applications, 9(4):280–295.
3. S. Dietrich and I. Boyd. Scalar and Parallel Optimised Implementation of the Direct Simulation Monte Carlo Method. J. Comp. Phys., 126:328–342.
4. G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. Technical Report, Computer Science Dept., University of Minnesota, Minneapolis, MN 55455, U.S.A. Available from ~karypis.
5. M. Ivanov, G. Markelov, S. Taylor and J. Watts. Parallel DSMC Strategies for 3D Computations. In P. Schiano, A. Ecer, J. Periaux and N. Satofuka, editors, Parallel Computational Fluid Dynamics: Algorithms and Results using Advanced Computers, pages 485–492. Elsevier.
6. D.M. Nicol and J.H. Saltz. Dynamic Remapping of Parallel Computations with Varying Resource Demands. IEEE Trans. Comput., 37(9):1073–1087.
7. C.D. Robinson. Particle Simulations on Parallel Computers with Dynamic Load Balancing. PhD thesis, Imperial College, London. Under preparation.
8. C.D. Robinson and J.K. Harvey. Adaptive Domain Decomposition for Unstructured Meshes Applied to the Direct Simulation Monte Carlo Method. In P. Schiano, A. Ecer, J. Periaux and N. Satofuka, editors, Parallel Computational Fluid Dynamics: Algorithms and Results using Advanced Computers, pages 469–476. Elsevier.
9. C.D. Robinson and J.K. Harvey. A Parallel DSMC Implementation on Unstructured Meshes with Adaptive Domain Decomposition. In C. Shen, editor, Rarefied Gas Dynamics (Proceedings of the Twentieth International Symposium on Rarefied Gas Dynamics). Peking University Press. In press.
10. C.D. Robinson and J.K. Harvey. Two Dimensional DSMC Calculations of the Rayleigh-Benard Instability. In C. Shen, editor, Rarefied Gas Dynamics (Proceedings of the Twentieth International Symposium on Rarefied Gas Dynamics). Peking University Press. In press.
11. J. Song. A Partially Asynchronous and Iterative Algorithm for Distributed Load Balancing. Par. Comput., 4(2):15–25.
12. D. Vanderstraeten and R. Keunings. Optimized Partitioning of Unstructured Finite Element Meshes. Intl. J. Num. Meth. Engng., 38(3):433–450, 1995.
