A THREE-DIMENSIONAL CARTESIAN TREE-CODE AND APPLICATIONS TO VORTEX SHEET ROLL-UP


by Keith Lindsay

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mathematics) in The University of Michigan, 1997

Doctoral Committee:
Professor Robert Krasny, Chair
Assistant Professor Peter Smereka
Associate Professor Grétar Tryggvason
Professor Arthur Wasserman
Professor Michael Weinstein

This thesis is dedicated to the memory of Bruce Lindsay. I miss you and think of you often.

ACKNOWLEDGEMENTS

There are a few people I would like to thank for their support while I have worked on this thesis. I would first like to thank my advisor Robert Krasny. With his guidance, I have learned a great deal about fluid dynamics and numerical analysis. Without his assistance, this thesis would not have been possible. I am grateful for all that he has taught me and I look forward to working with him in the future. I would also like to thank the other members of my dissertation committee, Peter Smereka, Grétar Tryggvason, Arthur Wasserman, and Michael Weinstein, for their thoughtful comments and suggestions. I extend a special thank you to Judy Florian for all of the support that she has given me. I love you very much.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES

CHAPTER 1. INTRODUCTION
    Overview
    Contributions of the Thesis

CHAPTER 2. FLUID DYNAMICS
    Governing Equations
    Vortex Sheets
        Parametrization
        Desingularization
        Discretization
    Vortex Rings
        Formation
        Stability
        Interactions

CHAPTER 3. FAST METHODS FOR PARTICLE SIMULATIONS
    Mesh Codes
    Tree Codes
    Particle-Cluster Interactions
    Tree Construction
    Recurrences for Taylor Coefficients
    Error Analysis of Particle-Cluster Interactions
    Full Description of the Algorithm
    Complexity Analysis

CHAPTER 4. ALGORITHM VALIDATION AND PERFORMANCE
    Convergence of Vortex Method
    Selection of Runtime Parameters
    Algorithm Performance

CHAPTER 5. APPLICATIONS
    Vortex Ring with Azimuthal Perturbation
    Elliptical Vortex Ring
    Colliding Vortex Rings

CHAPTER 6. CONCLUSIONS
    Summary
    Directions for Future Work

APPENDICES
BIBLIOGRAPHY

LIST OF FIGURES

Figure
2.1 A vortex sheet modeling parallel shear flow
2.2 Vortex lines and circulation. $\lambda_1, \lambda_2$: Lagrangian parameters, $y_0$: reference point, $y$: point on surface, $C$: curve for circulation integral
2.3 Discretization of parameter space and a circular disk. $\lambda_1, \lambda_2$: Lagrangian parameters. $\lambda_1$ is a radial parameter and $\lambda_2$ is a parameter around the disk
2.4 Particle insertion along a vortex line. given data ( ), new particle ( )
2.5 Vortex line insertion. given data ( ), new particle ( )
2.6 Propagating vortex ring
2.7 Cylindrical coordinates and basis vectors
2.8 Dispersion relation. sign($\omega^2$) $\omega$ vs. $k$. $R = 1$, $\delta$ = 0.18, 0.15, 0.12, 0.09, Going left to right, the peaks correspond to decreasing $\delta$
2.9 Colliding vortex rings
3.1 Particle-cluster interaction. $x$: target particle, $y_j$: particle in cluster, $\tau$: cell, $\tilde{y}$: center of $\tau$
3.2 Subdivision of space for random points. (a) Nested subdivision of space. (b) Associated tree structure
3.3 Subdivision of space for points on a spiral. (a) Nested subdivision of space. (b) Associated tree structure
3.4 Computing Taylor coefficients for two-dimensional example. ( ): previous step, (x): current step, ( ): future step
4.1 Profile of rolling up vortex sheet. $t = 1$, $\delta$ =
4.2 Profile of rolling up vortex sheet. $t = 4$, $\delta = 0.10$, $\Delta t = 0.05$, $\epsilon_1$ = 0.15, 0.10, 0.05, $\epsilon_2$ =
4.3 Execution time (sec.) vs. $N_0$. $p_{max}$ = 6 ( ), 8 ( ), 10 ( )
4.4 Memory usage (MB) vs. $N_0$. $p_{max}$ = 6 ( ), 8 ( ), 10 ( )
4.5 Execution time (sec.) vs. $N$. $p_{max} = 8$. tol = $10^{-2}$ ( ), $10^{-3}$ ( ), $10^{-4}$ ( ). direct summation ( ). actual data (o), projected data (x). (a) Execution time, (b) Direct summation time / fast algorithm time
4.6 Memory usage (MB) vs. $N$. $p_{max} = 8$. fast algorithm ( ), direct summation ( ). actual data (o), projected data (x). (a) Memory usage, (b) Fast algorithm memory usage / direct summation memory usage
4.7 Actual error vs. specified tolerance. $p_{max} = 8$, $N_0 = 500$, $N$ = 6284, 12708, 25572, 38444, potential error bound ( ), velocity error bound ( )
4.8 Execution time (sec.) vs. actual error. $p_{max} = 8$, $N_0 = 500$, $N$ = 6284, 12708, 25572, 38444, Connected lines are tol = $10^{-2}$, $10^{-3}$, potential error bound ( ), velocity error bound ( )
5.1 Variance of perturbed vortex sheet. $\delta = 0.10$, $\rho$ = $k$: wavenumber of perturbation, $t$: time
5.2 Perturbed vortex sheet. $k = 5$. $\delta = 0.10$, $t$ = 0, 2, 4,
5.3 Perturbed vortex sheet. $k = 9$. $\delta = 0.10$, $t$ = 0, 2, 4,
5.4 Core of perturbed vortex sheet. $k$ = 5, 9. $\delta = 0.10$, $t$ = 0, 2, 4,
5.5 Elliptical vortex sheet. $a = 0.8$. $\delta = 0.10$, $t$ = 0, 2, 4,
5.6 Elliptical vortex sheet. $a = 0.6$. $\delta = 0.10$, $t$ = 0, 2, 4,
5.7 Elliptical vortex sheet. $a = 0.5$. $\delta = 0.10$, $t$ = 0, 2, 4,
5.8 Vortex sheets modeling colliding disks. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,
5.9 Cut-away of vortex sheets modeling colliding disks. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,
5.10 Vorticity isosurfaces of colliding vortex rings, perspective view. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,
5.11 Vorticity isosurfaces of colliding vortex rings, front view. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,
5.12 Vorticity isosurfaces of colliding vortex rings, side view. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,
5.13 Vorticity isosurfaces of colliding vortex rings, top view. $\delta = 0.10$, $t$ = 0, 1, 2, 3, 4,

LIST OF TABLES

Table
4.1 Machine characteristics
4.2 Maximum point position differences for circular sheet. $t = 1$, $\delta = 0.10$, $e(\Delta t) = \max_i |x_i(\Delta t) - x_i(\Delta t/2)|$

LIST OF APPENDICES

Appendix
A. Notation
B. Cylindrical Coordinate Identities
C. Details from Circular Filament Analysis
    C.1 Propagation Speed of Circular Filament
    C.2 Linearized Evolution Equations for Perturbation

CHAPTER 1

INTRODUCTION

1.1 Overview

This thesis presents an algorithm for the rapid computation of three-dimensional vortex sheet motion. A vortex sheet is a material surface in the fluid across which the tangential component of fluid velocity has a jump discontinuity. Vortex sheets are frequently used as an asymptotic model for parallel shear flow. In our study of vortex sheets, the governing equations are taken in a Lagrangian form. When these equations are discretized, a large system of ordinary differential equations results. Referring to the discretization elements as particles, this system of equations is an N-body problem, a collection of $N$ particles with pairwise interactions. In an N-body problem, it is necessary to evaluate sums of the form

$$\sum_{j=1}^{N} K_\delta(x_i, x_j) \times w_j, \qquad i = 1, \ldots, N, \tag{1.1}$$

where $x_i, x_j$ are particle positions, $w_j$ is a vector-valued weight associated with the $j$th particle, and $\delta$ is a smoothing parameter. Computing the sums in (1.1) directly, which is referred to as direct summation, requires $O(N^2)$ operations. In our simulations, $N$ takes on values up to $10^6$, so it is not practical to perform direct summation. The algorithm presented in this thesis evaluates the above sums to a specified tolerance with $O(N \log N)$ operations. It extends the work of Draghicescu and Draghicescu [21], who studied two-dimensional vortex sheet dynamics, to the three-dimensional case. There are three main ingredients for the efficiency of the algorithm: particle-cluster interactions, a tree-based nested subdivision of space to construct particle clusters, and adaptive strategies. In this thesis, the algorithm is used to study vortex ring dynamics with a vortex sheet model.

The layout of the thesis is as follows. Chapter 2 gives an overview of the fluid dynamics relevant to our work. Vortex sheets are discussed and an overview of vortex rings is presented. Chapter 3 presents the new algorithm. It is described in detail

and is related to previous work. Chapter 4 presents a validation of the algorithm, analyzing its convergence, accuracy, and speed-up. Results are presented for the test case of axisymmetric vortex ring roll-up. Chapter 5 presents simulations for perturbed vortex rings, elliptical vortex rings, and the collision and reconnection of two vortex rings, a configuration based on experiments performed by Schatzle [55]. Chapter 6 gives a summary and discusses possible extensions to the work. Appendix A contains a table of the notation, Appendix B lists identities related to cylindrical basis vectors, and Appendix C presents details from the circular filament analysis which is performed in Section 2.3.2.

1.2 Contributions of the Thesis

The thesis makes three main contributions. First, the algorithm generalizes previously developed particle simulation algorithms. The main differences between the kernel $K_\delta$ used here and the ones previously used are that $K_\delta$ is not harmonic and it is a function of three variables. Second, we introduce new forms of adaptivity into the tree-based subdivision of space. This ensures that the algorithm's execution time will be small compared to direct summation for a variety of particle distributions. Third, we apply the algorithm to a three-dimensional smoothed vortex sheet model to study the dynamics of vortex rings. We show that the model allows vorticity isosurfaces to reconnect, even though the material surfaces do not.
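To put the operation counts quoted in the Overview in perspective, the following short sketch compares $N^2$ with $N \log N$ at the largest particle count mentioned there, $N = 10^6$. It assumes unit constants and a base-2 logarithm, neither of which is specified by the $O(\cdot)$ estimates, so the ratio is only an order-of-magnitude indication. The function names are hypothetical.

```python
import math


def direct_sum_cost(n):
    """Pairwise interaction count for direct summation (constant factor taken as 1)."""
    return n * n


def tree_code_cost(n):
    """Idealized N log N count (base-2 logarithm and unit constant are assumptions)."""
    return n * math.log2(n)


N = 10**6  # largest particle count quoted in the Overview
ratio = direct_sum_cost(N) / tree_code_cost(N)
print(f"direct: {direct_sum_cost(N):.1e}, tree: {tree_code_cost(N):.1e}, ratio: {ratio:.0f}")
```

The actual speed-up of a tree code depends on the constants hidden in both estimates, which is why measured timings are needed in addition to this count.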

CHAPTER 2

FLUID DYNAMICS

In this chapter, we present an overview of the fluid dynamics relevant to our work. In Section 2.1 we introduce the basic equations of fluid motion. Section 2.2 contains a discussion of vortex sheets, including their applications, how we parametrize them, the behavior they exhibit, and our numerical method for studying them. In Section 2.3 we introduce vortex rings, as an application of vortex sheet roll-up, and describe some issues that we are interested in studying such as stability and interactions.

2.1 Governing Equations

The motion of incompressible homogeneous (i.e. constant density) fluid is governed by the Navier-Stokes equations

$$u_t + (u \cdot \nabla) u = -\nabla p + \nu \Delta u, \tag{2.1}$$

$$\nabla \cdot u = 0, \tag{2.2}$$

where $u(x, t)$ is the fluid velocity at position $x$ and time $t$, $p(x, t)$ is the fluid pressure and $\nu$ is the viscosity. Equation (2.1) is the momentum equation, a statement of Newton's second law that mass times acceleration is equal to force. Equation (2.2) is the continuity equation, representing conservation of mass and incompressibility.

As described by Batchelor [6], in many flows the effect of viscosity is significant only in a small region of the fluid, for example in boundary layers or thin shear layers. Away from these regions, the fluid behaves as if it were inviscid. Furthermore, as demonstrated experimentally by Brown and Roshko for a turbulent mixing layer [9], the large scale features of the flow do not change for large Reynolds numbers, which may be considered the inverse of viscosity for our present purposes. So to understand the dynamics in these portions of the flow, it is useful to study the inviscid limit $\nu \to 0$

of the Navier-Stokes equations. This yields the Euler equations

$$u_t + (u \cdot \nabla) u = -\nabla p, \tag{2.3}$$

$$\nabla \cdot u = 0. \tag{2.4}$$

In this thesis, we are considering vortex sheets, a particular type of weak solution to the Euler equations. As mentioned in Chapter 1, a vortex sheet is a surface in the fluid across which the tangential component of fluid velocity has a jump discontinuity. When analyzing weak solutions of differential equations, one difficulty that may arise is a lack of uniqueness of solutions. Thus, one must choose from among the possible solutions the one that is physically significant. We view the vortex sheet as the zero viscosity limit of smooth solutions to the Navier-Stokes equations. Delort [17] proved that the two-dimensional Euler equations with vortex sheet initial data possess global weak solutions if the vorticity is of one sign. Majda [41] extended the proof to show that in the inviscid limit, solutions of the Navier-Stokes equations, with vortex sheet initial data having vorticity of one sign, converge to weak solutions of the Euler equations. It is not known if these results extend to more general vortex sheet configurations, much less to three dimensions. Uniqueness of solutions is also not known. Discussions of these and other analytical aspects of vortex sheets are given by Majda [40] and Caflisch [10]. In the next section, we describe vortex sheets in more detail.

The above forms of the Navier-Stokes and Euler equations, in terms of velocity and pressure, are known as primitive variable formulations. An alternative form is in terms of the vorticity, $\omega = \nabla \times u$, which measures rotation within the fluid. Taking the curl of the Euler equation (2.3), we obtain

$$\omega_t + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u. \tag{2.5}$$

One advantage of this form is that the pressure has been removed. To close the system of equations, the velocity is recovered from the vorticity via the Biot-Savart integral

$$u(x, t) = -\frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{(x - y) \times \omega(y, t)}{|x - y|^3} \, dy. \tag{2.6}$$

Equation (2.5) describes how the vorticity evolves in time and can be used together with (2.6) to form a numerical method to solve the Euler equations. Another evolution equation for the vorticity can be obtained in terms of the flow map $\Phi(x, t)$, which denotes the position of the fluid particle at time $t$ that was initially at position $x$ at time $t = 0$. The equations defining $\Phi$ are

$$\frac{\partial \Phi}{\partial t}(x, t) = u(\Phi(x, t), t), \tag{2.7a}$$

$$\Phi(x, 0) = x. \tag{2.7b}$$

The evolution equation for $\omega$ in terms of $\Phi$ is

$$\omega(\Phi(x, t), t) = \nabla \Phi(x, t) \, \omega(x, 0). \tag{2.8}$$

Equations (2.6), (2.7) and (2.8) form a closed system which is the basis of the numerical method used in this thesis to study the Euler equations. In the remainder of this chapter we discuss vortex sheets and then present an overview of vortex rings.

2.2 Vortex Sheets

As mentioned above, a vortex sheet is a surface in the fluid across which the tangential component of the fluid velocity has a jump discontinuity. Away from the surface, the fluid is assumed to be irrotational, which means that the vorticity is zero. However, since the velocity has a jump discontinuity across the surface, the vorticity is a $\delta$-function there. One common application of vortex sheets is as a model for parallel shear flow in which the transition region between two streams of fluid is thin, as depicted in Figure 2.1. In this situation, the sheet evolves according to the velocity given by the Biot-Savart integral (2.6) and the sheet is called a free vortex sheet. Another application, described by Lamb [36], is to model the movement of a solid body through irrotational inviscid fluid by placing a vortex sheet on the body's boundary. In this case, the sheet is called a bound vortex sheet, since it is bound to the body's surface.

Figure 2.1: A vortex sheet modeling parallel shear flow.

Our application of vortex sheets, described in the next section, is a model of the formation process of a vortex ring. A method of generating a vortex ring is to place a solid circular disk in a fluid, give it an impulse along its axis and then dissolve the disk away. This process can be modeled by considering a bound vortex sheet on the solid disk. When the disk is dissolved away, a free vortex sheet remains in the fluid and rolls up into a vortex ring. It is this free sheet that is represented in our computations.
To compute the induced velocity of a vortex sheet, it is necessary to consider the Biot-Savart integral (2.6) in the case where the vorticity is a δ-function on a surface. Before proceeding, we introduce some additional notation. Away from the

vortex sheet, the fluid is irrotational, so a velocity potential $\phi$ exists. Thus, for $x$ not on the sheet, $u(x, t) = \nabla \phi(x, t)$. The limit of the fluid velocity exists as the sheet is approached from either side. Choosing an orientation for the sheet, let $u_+$ and $u_-$ denote the one-sided limits of $u$. Similarly, let $\phi_+$ and $\phi_-$ denote the one-sided limits of $\phi$. The jumps in $u$ and $\phi$ across the sheet are denoted $[u] = u_+ - u_-$ and $[\phi] = \phi_+ - \phi_-$ respectively. The jump in velocity is tangential to the sheet, so we have $n \cdot [u] = 0$, where $n$ denotes a unit vector normal to the sheet. One can show that the curl of a velocity field which has a tangential jump discontinuity across a surface and is otherwise irrotational is a surface $\delta$-function with vector-valued strength

$$\omega = n \times [u] = n \times [\nabla \phi]. \tag{2.9}$$

Although it is a slight abuse of notation and terminology, we will refer to this vector-valued strength as the vorticity itself. A consequence of this relationship is that the vorticity $\omega$ is parallel to the surface and perpendicular to $[\nabla \phi]$, a result we will use later. The Biot-Savart integral is interpreted with this singular vorticity, leading to the following surface integral for the induced velocity:

$$u(x, t) = \int_S K(x, y) \times \omega(y, t) \, dS_y, \tag{2.10}$$

where $S$ is the sheet, $x$ is a point not on the sheet,

$$K(x, y) = -\frac{1}{4\pi} \frac{x - y}{|x - y|^3} \tag{2.11}$$

is the Biot-Savart kernel, $\omega(y, t)$ is given by (2.9) and $dS_y$ is the area element of $S$ at $y$. For $x$ on the sheet, the integral in (2.10) is interpreted as a principal value integral, because it diverges otherwise.

In the next subsection, we describe the Lagrangian parametrization of the vortex sheet which is the basis of our numerical method. Then we discuss the singular behavior that vortex sheets exhibit, which leads us to desingularize their motion in order to obtain a tractable model. Finally, the discretization of the equations is described.

2.2.1 Parametrization

For computations, it is advantageous to use a Lagrangian parametrization of the vortex sheet. We do this by representing the sheet as a collection of vortex lines, parametrizing across them with circulation. The Lagrangian parametrization was presented by Caflisch [10] and Kaneda [28]. The sheet's position is denoted $y(\lambda_1, \lambda_2, t)$, where $\lambda_1$ and $\lambda_2$ are Lagrangian parameters. The induced velocity field at a point $x$

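on the sheet is

$$u(x, t) = \mathrm{PV} \iint K(x, y(\lambda_1, \lambda_2, t)) \times \omega(\lambda_1, \lambda_2, t) \left| \frac{\partial y}{\partial \lambda_1} \times \frac{\partial y}{\partial \lambda_2} \right| d\lambda_1 \, d\lambda_2, \tag{2.12}$$

where PV denotes the principal value integral. Caflisch [10] and Kaneda [28] showed that the jump $[\phi(y(\lambda_1, \lambda_2, t))]$ is independent of time. The demonstration was based on the fact that the fluid pressure is continuous across the vortex sheet, which follows from conservation of momentum. Thus, we may write

$$\phi_J(\lambda_1, \lambda_2) = [\phi(y(\lambda_1, \lambda_2, t))]. \tag{2.13}$$

Then, using (2.9) and some algebraic manipulations, they derived the identity

$$\omega(\lambda_1, \lambda_2, t) \left| \frac{\partial y}{\partial \lambda_1} \times \frac{\partial y}{\partial \lambda_2} \right| = \frac{\partial \phi_J}{\partial \lambda_1} \frac{\partial y}{\partial \lambda_2} - \frac{\partial \phi_J}{\partial \lambda_2} \frac{\partial y}{\partial \lambda_1}. \tag{2.14}$$

The specific choice of $\lambda_1$ and $\lambda_2$ is made to simplify the right-hand side of this equation. We choose $\lambda_1$ to be the circulation between a fixed reference point on the sheet and other points on the sheet and $\lambda_2$ to be a parameter along curves of constant circulation, as shown in Figure 2.2.

Figure 2.2: Vortex lines and circulation. $\lambda_1, \lambda_2$: Lagrangian parameters, $y_0$: reference point, $y$: point on surface, $C$: curve for circulation integral.

We describe $\lambda_2$ first, in terms of vorticity. Vortex lines are integral curves of the vorticity. Geometrically, they are curves which are parallel to the vorticity field $\omega$. We choose $\lambda_2$ so that at time $t = 0$, $\lambda_2$ is a parameter along vortex lines, ensuring that $\partial y / \partial \lambda_2$ is parallel to $\omega(\lambda_1, \lambda_2)$. It follows that

$$\frac{\partial \phi_J}{\partial \lambda_2} = \frac{\partial}{\partial \lambda_2} \left( \phi_+(y, 0) - \phi_-(y, 0) \right) \tag{2.15}$$

$$= \nabla \phi_+(y, 0) \cdot \frac{\partial y}{\partial \lambda_2} - \nabla \phi_-(y, 0) \cdot \frac{\partial y}{\partial \lambda_2} \tag{2.16}$$

$$= [\nabla \phi(y, 0)] \cdot \frac{\partial y}{\partial \lambda_2} = 0, \tag{2.17}$$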

where the last equality is due to the fact that $\partial y / \partial \lambda_2$ is parallel to $\omega$ and $[\nabla \phi]$ is perpendicular to $\omega$, as seen from (2.9). For such a choice of $\lambda_2$, the Biot-Savart integral (2.12) reduces to

$$u(x, t) = \mathrm{PV} \iint K(x, y(\lambda_1, \lambda_2, t)) \times \frac{\partial y}{\partial \lambda_2}(\lambda_1, \lambda_2, t) \, \frac{\partial \phi_J}{\partial \lambda_1} \, d\lambda_1 \, d\lambda_2. \tag{2.18}$$

For our vortex ring application, the vortex lines are closed curves. We choose $\lambda_2$ to range from $0$ to $2\pi$, so the vortex lines are $2\pi$-periodic functions of $\lambda_2$. In the computations, $\lambda_2$ is chosen at $t = 0$ to be a linear rescaling of arclength. Note that this linear relationship does not hold for $t > 0$, because the vortex lines stretch non-uniformly as the sheet evolves.

As mentioned above, $\lambda_1$ is chosen to be the circulation between a fixed reference point on the sheet and other points on the sheet. We fix a material point $y_0$ on the sheet. For any point $y$ on the sheet, $\lambda_1(y)$ is the circulation

$$\lambda_1(y) = \oint_C u \cdot ds, \tag{2.19}$$

where $C$ is a closed curve meeting the sheet at $y_0$ and $y$, and $ds$ is a line element of arclength, as shown in Figure 2.2. Kelvin's circulation theorem states that the circulation around a set of vortex lines moving with the flow does not change in time, ensuring that $\lambda_1$ is a Lagrangian parameter. It follows from the definition that $\lambda_1 = \phi_J + c$, where $c$ is a constant which depends only on the reference point $y_0$. Thus, $\partial \phi_J / \partial \lambda_1 = 1$ and the Biot-Savart integral (2.18) reduces to

$$u(x, t) = \mathrm{PV} \iint K(x, y(\lambda_1, \lambda_2, t)) \times \frac{\partial y}{\partial \lambda_2}(\lambda_1, \lambda_2, t) \, d\lambda_1 \, d\lambda_2. \tag{2.20}$$

This parametrization and the resulting form of the Biot-Savart integral is a generalization to three dimensions of the Birkhoff-Rott equation for the motion of a vortex sheet in two dimensions [8]. The circulation distribution for a vortex sheet depends on the initial condition of the specific problem being studied. We will describe it later when we discuss the application to vortex rings.

2.2.2 Desingularization

Vortex sheets exhibit behavior that smooth shear layers do not. For example, vortex sheet instabilities have arbitrarily large growth rates, the sheets form curvature singularities [46], and they roll up into infinite spirals [48]. These features make the study of vortex sheets difficult both theoretically and numerically. As an example of the numerical difficulties, consider the motion of a flat vortex sheet. When a small amplitude perturbation is introduced to the sheet, it is amplified at a rate proportional to the spatial wavenumber of the perturbation. This

is known as Kelvin-Helmholtz instability. In a numerical simulation, round-off error introduces a perturbation to the sheet whose wavenumber is inversely proportional to the spacing of the points representing the sheet. Thus, when the computational mesh is refined, the wavenumber of the round-off error perturbation increases, the perturbation is amplified more rapidly and the computations become inaccurate. One technique to overcome this, introduced by Krasny [34], is to filter the sheet's position at each time step. With this technique, it is possible to extend computations to longer times. However, the sheet still develops singularities in finite time. After the singularity forms, it is not possible to use the filter and round-off error grows, overwhelming the computations.

Another technique, first proposed by Chorin and Bernard [15], is to desingularize the Biot-Savart kernel $K(x, y)$. We follow this approach, using a desingularization analogous to the one used by Krasny [33] for two-dimensional vortex sheet roll-up. Our smoothed three-dimensional Biot-Savart kernel is

$$K_\delta(x, y) = -\frac{1}{4\pi} \frac{x - y}{(|x - y|^2 + \delta^2)^{3/2}}, \tag{2.21}$$

where $\delta > 0$ is the smoothing parameter. We replace the singular Biot-Savart integral (2.20) with

$$u(x, t) = \iint K_\delta(x, y(\lambda_1, \lambda_2, t)) \times \frac{\partial y}{\partial \lambda_2}(\lambda_1, \lambda_2, t) \, d\lambda_1 \, d\lambda_2. \tag{2.22}$$

This kernel was first introduced by Rosenhead [51] in the study of vortex dynamics in the wake behind a cylinder. It is related to the Plummer potential which is used in astrophysics to model the distribution of matter in a galaxy. Note that as $\delta \to 0$, $K_\delta \to K$. The introduction of $\delta$ smooths the kernel and removes its singularity at the origin. This makes it unnecessary to treat the integral in (2.22) as a principal value integral. A consequence of the smoothing is that the kernel is no longer harmonic, which is one of the main reasons for developing the new fast computational method to be described in Chapter 3.
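As an illustration of the smoothed kernel (2.21), the following is a minimal NumPy sketch that evaluates $K_\delta$ and forms the pairwise sums $\sum_j K_\delta(x_i, x_j) \times w_j$ by direct summation. The function names `smoothed_kernel` and `direct_sum_velocity` are hypothetical, and the vectorized layout is only one possible choice; it is not the implementation used in the thesis.

```python
import numpy as np


def smoothed_kernel(x, y, delta):
    """K_delta(x, y) = -(x - y) / (4*pi*(|x - y|^2 + delta^2)^(3/2)), as in (2.21)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return -d / (4.0 * np.pi * (d @ d + delta**2) ** 1.5)


def direct_sum_velocity(points, weights, delta):
    """Return sum_j K_delta(x_i, x_j) x w_j for every i, using O(N^2) work.

    points, weights: arrays of shape (N, 3).
    """
    diff = points[:, None, :] - points[None, :, :]      # x_i - x_j, shape (N, N, 3)
    r2 = np.sum(diff**2, axis=2) + delta**2             # |x_i - x_j|^2 + delta^2
    kern = -diff / (4.0 * np.pi * r2[..., None] ** 1.5)
    # Cross product of each kernel vector with the weight of particle j, summed over j.
    return np.cross(kern, weights[None, :, :]).sum(axis=1)


# For delta > 0 the kernel is bounded and vanishes when x == y,
# so the self-interaction term needs no special treatment.
print(smoothed_kernel([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.1))
```

With $\delta = 0$ and $x \ne y$ the same function reduces to the singular kernel (2.11), which mirrors the limit $K_\delta \to K$ noted in the text.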
Note that it is not possible to desingularize the kernel in such a way that the result is bounded and harmonic, which follows from the maximum principle. The strategy for computing vortex sheet roll-up is to solve the smoothed equation for fixed $\delta > 0$ and to investigate the behavior of these solutions as $\delta \to 0$. This is analogous to finding weak solutions of the Euler equations by taking the zero viscosity limit of smooth solutions of the Navier-Stokes equations. This analogy is supported by the work of Tryggvason, Dahm and Sbeih [59] who performed computations for the $\delta \to 0$ limit of a two-dimensional vortex sheet and the $\nu \to 0$ limit of a corresponding Navier-Stokes computation. They found that the large scale features of a $\delta > 0$

computation agree well with the features of a $\nu > 0$ computation, and that the $\delta \to 0$ limit and the $\nu \to 0$ limit coincide. Liu and Xin [39] have shown that the $\delta \to 0$ limit of solutions to the two-dimensional vortex-blob equations is a weak solution of the Euler equations when the vorticity is of one sign. It is not known if such results hold in three dimensions.

2.2.3 Discretization

In this subsection, we describe how the sheet's position $y(\lambda_1, \lambda_2, t)$ and velocity (2.22) are discretized. The assumptions made about the parametrization are that $\lambda_1$ measures circulation across the vortex lines and $0 \le \lambda_2 \le 2\pi$ parametrizes along the vortex lines. We discretize the parameter space as shown in Figure 2.3. We first discretize $\lambda_1$ with a uniform grid. Each $\lambda_1$ value corresponds to a vortex line which is then discretized in $\lambda_2$ with a grid that is uniform with respect to arclength in physical space (at $t = 0$). Note that there are more points on longer vortex lines, which leads to $\lambda_1$ and $\lambda_2$ being treated asymmetrically. This is done to ensure spatial resolution and accuracy of partial derivative computations along the vortex lines.

Figure 2.3: Discretization of parameter space and a circular disk. $\lambda_1, \lambda_2$: Lagrangian parameters. $\lambda_1$ is a radial parameter and $\lambda_2$ is a parameter around the disk.

With these points $x_i(t)$, we discretize the Biot-Savart integral (2.22) first in $\lambda_1$ with the trapezoid rule and then in $\lambda_2$, also with the trapezoid rule. The $\partial y / \partial \lambda_2$ term in the integrand is approximated with a second-order centered difference. This results in a system of ordinary differential equations

$$\frac{dx_i}{dt} = \sum_{j=1}^{N} K_\delta(x_i, x_j) \times w_j, \tag{2.23}$$

where $x_i(t)$, $x_j(t)$ are points on the sheet and

$$w_j = D_{\lambda_2}(x_j) \, \Delta\lambda_1 \, \Delta\lambda_2 \tag{2.24}$$

is the product of the finite difference $D_{\lambda_2}$ along a vortex line and the integration weights $\Delta\lambda_1$ and $\Delta\lambda_2$. The integration weights are adjusted appropriately at the $\lambda$ boundaries for the trapezoid rules. From here on, we refer to the $x_j$ as particles. The system of differential equations (2.23) is solved with a fourth-order Runge-Kutta method. Computing the right-hand side of (2.23) by direct summation requires $O(N^2)$ operations, where $N$ is the number of particles discretizing the vortex sheet. In Chapter 3, we present an algorithm which computes the sums in (2.23) more rapidly, to within a specified tolerance.

As the sheet evolves, the vortex lines can individually stretch and can also separate from each other. This causes a loss of resolution which is overcome by inserting new particles along the lines and by inserting new lines. The first case corresponds to refining in $\lambda_2$ for fixed $\lambda_1$ and the second case corresponds to refining in $\lambda_1$ globally in $\lambda_2$. The procedure for inserting a new point along a vortex line is depicted in Figure 2.4. The $\lambda_2$ coordinate of the new particle is set to be the average of the separated particles' coordinates. The position of the new particle is computed with a cubic polynomial in $\lambda_2$ which interpolates the positions of the four particles surrounding the new particle, two on each side.

Figure 2.4: Particle insertion along a vortex line. given data ( ), new particle ( ). [Labels in figure: particles for cubic interpolant; new particle.]

The procedure for adding a new vortex line when adjacent lines become separated is analogous, and is depicted in Figure 2.5. The $\lambda_1$ coordinate of the new line is the average of the separated lines' coordinates. The particle positions on the new line are generated as follows. The first step is to select the $\lambda_2$ values where the particles will be placed. We do this by simply choosing the $\lambda_2$ values of an adjacent line. The reasoning behind this is that once refinement along vortex lines takes place, the $\lambda_2$ values on the adjacent lines yield good spatial resolution along the new line. To compute a particle position for each of these $\lambda_2$ values, we first generate a corresponding $\lambda_2$ particle position on each of the surrounding four vortex lines, two

on each side. If one of these lines does not have a particle at that $\lambda_2$ value, then one is generated with a cubic interpolant as described above for particle insertion along a line. The $\lambda_2$ particle position on the new vortex line is then computed by interpolating these four particle positions with a cubic polynomial in $\lambda_1$.

Figure 2.5: Vortex line insertion. given data ( ), new particle ( ).

In our computations we make a change of variable $\lambda_1 = \lambda_1(\alpha)$. This improves accuracy by placing more vortex lines in regions where the circulation is varying rapidly. Since $\lambda_1$ is a Lagrangian parameter, $\alpha$ is one as well. With this change of variable, the smoothed velocity induced by the sheet is

$$u(x, t) = \iint K_\delta(x, y(\alpha, \lambda_2, t)) \times \frac{\partial y}{\partial \lambda_2}(\alpha, \lambda_2, t) \, \lambda_1'(\alpha) \, d\alpha \, d\lambda_2. \tag{2.25}$$

In the computations, $\lambda_1'(\alpha)$ is computed analytically at $t = 0$. When new vortex lines are inserted, the values of $\lambda_1'(\alpha)$ are obtained using a cubic interpolant of the values of $\lambda_1'(\alpha)$ at the surrounding lines.

2.3 Vortex Rings

A vortex ring is a flow in which vorticity is concentrated and directed around a torus, as depicted in Figure 2.6. The vorticity distribution causes the fluid to rotate around the torus and the ring propagates. In this thesis, we use a desingularized vortex sheet model to investigate vortex ring dynamics. In particular, we are interested in the formation process, stability properties, and interactions between rings. We review each of these topics in the following subsections.
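The particle-insertion step of Section 2.2 (a new particle at the average parameter value, positioned by a cubic through the four surrounding particles) can be sketched as follows. This is a minimal illustration using the Lagrange form of the cubic interpolant; the function name `insert_particle` and the choice of the Lagrange form are assumptions, since the text specifies only that a cubic polynomial is used.

```python
import numpy as np


def insert_particle(lams, pts):
    """Insert a particle between the two middle particles of a four-particle stencil.

    lams: four increasing parameter values (lambda_2 along a vortex line).
    pts:  array of shape (4, 3) with the corresponding particle positions.
    Returns the new parameter value and the interpolated position.
    """
    lams = np.asarray(lams, dtype=float)
    pts = np.asarray(pts, dtype=float)
    lam_new = 0.5 * (lams[1] + lams[2])   # average of the separated particles' coordinates
    pos = np.zeros(pts.shape[1])
    for i in range(4):
        # Lagrange basis polynomial for node i, evaluated at lam_new.
        basis = 1.0
        for j in range(4):
            if j != i:
                basis *= (lam_new - lams[j]) / (lams[i] - lams[j])
        pos += basis * pts[i]
    return lam_new, pos


# Four particles on a circular vortex line of radius 1; the new particle lands near the circle.
lam = np.array([0.0, 0.1, 0.2, 0.3])
ring = np.column_stack([np.cos(lam), np.sin(lam), np.zeros(4)])
print(insert_particle(lam, ring))
```

The same routine, applied with $\lambda_1$ in place of $\lambda_2$ and one particle taken from each of four neighboring lines, gives a sketch of the vortex line insertion step.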

Figure 2.6: Propagating vortex ring. [Labels in figure: vorticity, velocity, propagation.]

2.3.1 Formation

There are various methods of creating a vortex ring, each having advantages and disadvantages for the experimentalist or numerical analyst. One technique, described by Thomson and Newall [58], is to release a drop of colored liquid into a container of water. As the drop falls through the water, it rolls up around the edges and forms a descending vortex ring. This experiment can be performed with a simple apparatus, but it is difficult to simulate numerically due to the collision of the fluid boundaries and the subsequent change in topology.

Another method commonly used is to eject fluid from a circular nozzle. A shear layer separates at the opening and rolls up into a vortex ring. As described in Shariff and Leonard's review [56], this process can be modeled using slug flow or self-similar vortex sheet roll-up. Another model, presented by Nitsche and Krasny [47], involves the roll-up of an axisymmetric vortex sheet which is not assumed to be self-similar. In their numerical computations, the sheet was desingularized in a manner similar to the method described above and they modeled the shedding of circulation at the edge of the nozzle. Their results agreed well with experiments performed by Didden [20]. The vortex sheet model used in this thesis is an extension of their work to fully three-dimensional flow.

A simple model for vortex ring formation, described by Taylor [57], is to supply an impulse to a flat circular disk along its axis of symmetry and to then dissolve the disk. When the disk is given an impulse, the velocity field in the fluid is induced by a bound vortex sheet on the surface of the disk. When the disk is dissolved, the sheet remains in the fluid and rolls up into a vortex ring. This method is more of a thought exercise and is not practical for experiments, but it is the one on which

25 our computations are based. One reason for selecting this flow for our computations is the absence of solid boundaries. The circulation distribution on the initially flat sheet is given by λ 1 = 1 r 2, (2.26) where r is the distance from the center of the disk. The velocity field induced by this circulation distribution is balanced so that the disk propagates. Note that in Cartesian coordinates, the circulation has a square-root singularity at the boundary of the disk. This implies that the jump in velocity becomes infinite at the edge. When the smoothing effect of δ is introduced, the singularity is removed and the balance in velocity is lost, resulting in the disk rolling up into a ring. This effect occurs in physical flow, although the smoothing is due to viscosity. In terms of the vortex sheet parametrization, the disk is given by y(λ 1, λ 2 ) = ( 1 λ 2 1 cos λ 2, 1 λ 2 1 sin λ 2, 0), (2.27) where 0 λ 1 1 and 0 λ 2 2π. For our α reparametrization, we use λ 1 = cos α, which yields y(α, λ 2 ) = (sin α cos λ 2, sin α sin λ 2, 0), (2.28) where 0 α π/2 and 0 λ 2 2π. The λ 1(α) term which arises in (2.25) is given by λ 1 (α) = sin α Stability Since our model for vortex rings consists of a collection of circular vortex lines, we consider first the stability of a single circular vortex line, referred to as a vortex filament. So let y(λ, t) denote the position of a vortex filament in three dimensions, where λ is a Lagrangian parameter along the filament, 0 λ 2π and y(0, t) = y(2π, t). The filament evolves according to the equation y 1 (λ, t) = t 4π 2π 0 K δ (y(λ, t), y( λ, t)) y λ ( λ, t) d λ, (2.29) where K δ is given in (2.21). A propagating circular filament is a steady solution of (2.29) and we are interested in its stability properties. It is convenient to perform the analysis in cylindrical coordinates, so let (r, θ, z) be cylindrical coordinates, as shown in Figure 2.7. 
Also shown in the figure are the basis vectors $e_r(\theta)$, $e_\theta(\theta)$, and $e_z$ associated with the point $(r, \theta, z)$. Identities pertaining to this basis are listed in Appendix B. Suppose that the filament at time $t = 0$ is given by

$$y(\lambda, 0) = (R, \lambda, 0). \tag{2.30}$$

Figure 2.7: Cylindrical coordinates and basis vectors.

Substituting into (2.29), it can be shown that the filament propagates with velocity

$$U = \frac{e_z}{4\pi R}\int_0^{2\pi} \frac{1 - \cos\tilde\lambda}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{3/2}}\, d\tilde\lambda. \tag{2.31}$$

The derivation is presented in Appendix C. It is worthwhile to point out the effect of the smoothing parameter $\delta$. If $\delta$ were equal to zero, the filament velocity would be

$$U = \frac{e_z}{8\pi R}\int_0^{2\pi} \frac{1}{\big(2(1 - \cos\tilde\lambda)\big)^{1/2}}\, d\tilde\lambda, \tag{2.32}$$

which is a divergent integral. The integrand is positive, so considering the integral as a principal value integral will not result in a finite value. The interpretation of this equation is that a circular vortex line propagates with infinite velocity. The problem is that in an actual fluid, even when the vorticity is concentrated into a small region, the vorticity distribution does not have line delta functions. The desingularization that we use is one approach to overcome this difficulty, and was first introduced by Rosenhead [51] in the study of vortex dynamics in the wake behind a cylinder. Intuitively, the introduction of $\delta$ into the kernel spreads the vorticity associated with the vortex line over a region around the line with radius $\delta$. It is this effect that leads us to call the lines filaments. Another approach to overcoming this difficulty is to cut off the integral in a small neighborhood of the point $\tilde\lambda = 0$, thereby removing

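the singularity. This technique was used by Crow [16] and Moore [45] in their study of the stability properties of the vortex pair trailing from an airplane wing.

We now analyze the linear stability of the propagating circular vortex filament

$$y(\lambda, t) = (R, \lambda, Ut), \tag{2.33}$$

where

$$U = \frac{1}{4\pi R}\int_0^{2\pi} \frac{1 - \cos\tilde\lambda}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{3/2}}\, d\tilde\lambda. \tag{2.34}$$

We introduce a perturbation $p(\lambda, t)$ to the solution $y(\lambda, t)$, which we write in terms of the cylindrical basis at $(R, \lambda, Ut)$,

$$p(\lambda, t) = p_r(\lambda, t)\,e_r(\lambda) + p_\theta(\lambda, t)\,e_\theta(\lambda) + p_z(\lambda, t)\,e_z. \tag{2.35}$$

We substitute $y(\lambda, t) + p(\lambda, t)$ into (2.29), and obtain a system of integro-differential equations for the scalars $p_r(\lambda, t)$, $p_\theta(\lambda, t)$, and $p_z(\lambda, t)$. Then we linearize these equations about $p(\lambda, t) = 0$, which is reasonable under the assumption that $p(\lambda, t)$ is small in amplitude. The resulting linearized equations (C.18) appear in Appendix C for reference. If the equations are written in the abstract form $\partial p/\partial t = L(p)$, then it can be shown that using the Fourier basis for $p(\lambda)$ diagonalizes the operator $L$. So we may restrict attention to a single mode of the Fourier expansion for $p(\lambda)$. Thus, we fix an integer $k$ and substitute the expression

$$p(\lambda, t) = e^{ik\lambda + \omega t}\big(A_r\,e_r(\lambda) + A_\theta\,e_\theta(\lambda) + A_z\,e_z\big) \tag{2.36}$$

into (C.18) and after some simplifications obtain the system of linear equations

$$\begin{pmatrix} \omega & 0 & -I_1 \\ 0 & \omega & -iI_2 \\ -I_3 & 0 & \omega \end{pmatrix} \begin{pmatrix} A_r \\ A_\theta \\ A_z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \tag{2.37}$$

where the $I_j$ are the integrals

$$I_1 = \frac{1}{4\pi R^2}\int_0^{2\pi} \frac{\cos\tilde\lambda\,(1 - \cos k\tilde\lambda) - k\sin\tilde\lambda\,\sin k\tilde\lambda}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{3/2}}\, d\tilde\lambda, \tag{2.38}$$

$$I_2 = \frac{1}{4\pi R^2}\int_0^{2\pi} \frac{k(1 - \cos\tilde\lambda)\cos k\tilde\lambda - \sin\tilde\lambda\,\sin k\tilde\lambda}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{3/2}}\, d\tilde\lambda, \tag{2.39}$$

$$I_3 = \frac{1}{4\pi R^2}\int_0^{2\pi} \left[\frac{k\sin\tilde\lambda\,\sin k\tilde\lambda + 2\cos k\tilde\lambda - \cos\tilde\lambda\,(1 + \cos k\tilde\lambda)}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{3/2}} - \frac{3(1 - \cos\tilde\lambda)^2(1 + \cos k\tilde\lambda)}{\big(2(1 - \cos\tilde\lambda) + (\delta/R)^2\big)^{5/2}}\right] d\tilde\lambda. \tag{2.40}$$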
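The role of $\delta$ in the propagation speed (2.34) can be checked numerically. The following is a minimal sketch, not part of the thesis: the function name `ring_speed` and the number of quadrature points are assumptions made for illustration, and the periodic trapezoid rule is used because the integrand is smooth and $2\pi$-periodic when $\delta > 0$.

```python
import numpy as np

def ring_speed(R, delta, n=20000):
    """Approximate the filament speed U of (2.34) with the periodic trapezoid rule.

    R is the filament radius and delta the smoothing parameter; n is an
    illustrative number of quadrature points (an assumption, not from the text).
    """
    lam = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    one_minus_cos = 1.0 - np.cos(lam)
    integrand = one_minus_cos / (2.0 * one_minus_cos + (delta / R) ** 2) ** 1.5
    integral = integrand.sum() * (2.0 * np.pi / n)
    return integral / (4.0 * np.pi * R)

# Halve delta repeatedly for a filament of radius R = 1.
deltas = (0.2, 0.1, 0.05, 0.025)
speeds = [ring_speed(1.0, d) for d in deltas]
for d, u in zip(deltas, speeds):
    print(f"delta = {d:6.3f}   U = {u:.5f}")
```

Each halving of $\delta$ increases $U$ by a roughly constant amount, which is consistent with the logarithmic divergence of the unsmoothed integral (2.32) as $\delta \to 0$.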

The unbalanced form of (2.37) is due to the fact that the integral which would have appeared in the (3, 2) entry of the matrix is zero. There are non-zero solutions of the form (2.36) only if the matrix in (2.37) is singular, which is true only if the determinant of the matrix is zero,

$$0 = \omega(\omega^2 - I_1 I_3). \tag{2.41}$$

Thus, we have a solution only if

$$\omega = 0 \quad\text{or}\quad \omega^2 = I_1 I_3. \tag{2.42}$$

This relationship between $k$ and $\omega$, for fixed $R$ and $\delta$, is called a dispersion relation. For each of these values of $\omega$, there is a corresponding $(A_r, A_\theta, A_z)$ solution to (2.37). Note that the dispersion relation does not depend on $I_2$, though the solution $(A_r, A_\theta, A_z)$ does. The solution $y(\lambda, t)$ is linearly stable or unstable with respect to the perturbation $p(\lambda, t)$ according to whether the real part of $\omega$ is negative or positive respectively, and if the real part of $\omega$ is zero, then $y(\lambda, t)$ is linearly neutrally stable with respect to the perturbation $p(\lambda, t)$. From the definitions (2.38) and (2.40), we see that the product $I_1 I_3$ is real, so the stability of $y(\lambda, t)$ depends upon the sign of $I_1 I_3$. If $I_1 I_3$ is negative, then $y(\lambda, t)$ is linearly neutrally stable. However, if $I_1 I_3$ is positive, then there exist solutions $p(\lambda, t)$ which grow and solutions $p(\lambda, t)$ which decay, so $y(\lambda, t)$ is unstable in general.

An observation that can be made from the definitions of the integrals $I_1$ and $I_3$ is that for fixed $k$ and $\delta/R$, $\omega$ depends linearly on $R^{-2}$. In particular, the sign of $\omega^2$ will be independent of $R$. Thus, whether or not a filament is unstable with respect to a perturbation depends only on $k$ and $\delta/R$. Figure 2.8 contains a plot of $\mathrm{sign}(\omega^2)\,|\omega|$ as a function of $k$, for $R = 1$ and $\delta = 0.18, 0.15, 0.12, 0.09, \ldots$. The sign term multiplying $|\omega|$ is chosen so that positive and negative values correspond to unstable and neutrally stable modes respectively. The values of the integrals were computed numerically with Maple.
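The dispersion relation can also be evaluated with a few lines of NumPy instead of Maple. This is a hedged sketch rather than the computation used in the thesis: the names `integrals` and `dispersion`, the default parameter values, and the quadrature resolution are assumptions, and the integrands are the ones written in (2.38) and (2.40), with the argument of the denominator taken to be $2(1 - \cos\tilde\lambda) + (\delta/R)^2$.

```python
import numpy as np

def integrals(k, R=1.0, delta=0.12, n=20000):
    """Return (I1, I3) of (2.38) and (2.40) via the periodic trapezoid rule."""
    lam = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    c, s = np.cos(lam), np.sin(lam)
    ck, sk = np.cos(k * lam), np.sin(k * lam)
    D = 2.0 * (1.0 - c) + (delta / R) ** 2
    weight = (2.0 * np.pi / n) / (4.0 * np.pi * R**2)
    I1 = weight * np.sum((c * (1.0 - ck) - k * s * sk) / D**1.5)
    I3 = weight * np.sum(
        (k * s * sk + 2.0 * ck - c * (1.0 + ck)) / D**1.5
        - 3.0 * (1.0 - c) ** 2 * (1.0 + ck) / D**2.5
    )
    return I1, I3

def dispersion(k, R=1.0, delta=0.12):
    """Return sign(omega^2) |omega|, where omega^2 = I1 * I3 as in (2.42)."""
    I1, I3 = integrals(k, R, delta)
    omega2 = I1 * I3
    return np.sign(omega2) * np.sqrt(abs(omega2))

# Positive values mark unstable modes, negative values neutrally stable ones.
for k in range(0, 6):
    print(k, dispersion(float(k)))
```

For $k = 1$ the integrand of $I_3$ is an exact derivative of a periodic function, so $I_3$ vanishes and $\omega^2 = 0$, in agreement with the qualitative description of the curve.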
For values of $k$ larger than those depicted, $\mathrm{sign}(\omega^2)\,|\omega|$ continues to decrease, leveling off at a value which depends upon $\delta$ and $R$. For a given $R$ and $\delta$, $\omega^2$ depends on $k$ in the following qualitative manner. For $k = 0$ and $1$, $\omega^2 = 0$. As $k$ increases from 1, $\omega^2$ first decreases and then increases, following a parabolic shaped curve. After reaching a local maximum, $\omega^2$ then decreases, eventually leveling off. For some values of $\delta/R$, the value of $\omega^2$ at the local maximum is positive, and for others it is not. Recall that the filament is unstable when the peak is positive and is neutrally stable otherwise. More extensive computations than those depicted in the figure do not reveal an obvious pattern for when the mode at this peak is unstable. Also, for some values, such as $\delta/R = 0.18$ and $0.15$, there is more than one $k$ value for which $\omega^2$ is positive. As $\delta$ decreases, the

Figure 2.8: Dispersion relation. $\mathrm{sign}(\omega^2)\,|\omega|$ vs. $k$. $R = 1$, $\delta = 0.18, 0.15, 0.12, 0.09, \ldots$. Going left to right, the peaks correspond to decreasing $\delta$.

wavenumber where the peak is located increases, and more extensive computations suggest that the wavenumber grows like $O(\delta^{-1})$ as $\delta \to 0$.

The linear stability analysis assumes that the perturbation to the filament is small compared to $\delta$, the nominal size of the filament's core. However, the unstable modes, when they exist, have a wavenumber proportional to $\delta^{-1}$, which implies that these modes have spatial oscillations with wavelengths on the order of the core size. Thus, as these oscillations grow, they quickly leave the realm where the linear stability analysis is valid. So it is not clear how to interpret the results physically.

These results are qualitatively similar to those of Widnall and Sullivan's [62] study of vortex ring stability. They used a thin filament approximation as a model for the vortex ring and overcame the divergence of the Biot-Savart integral (2.29) by using an integral cut-off and an asymptotic matching procedure to choose the location of the cut-off. They found that for certain intervals of core sizes, there is a narrow band of modes which are unstable. Rings with core size between these intervals are neutrally stable. As the core size decreases, the band of unstable modes narrows. The wavenumber that the band is centered around grows like $a^{-1}$, where $a$ is the size of the core. They compared their theoretical predictions with experimental results and found fair agreement for the wavenumber of the unstable mode and good agreement for the amplification rate.

One obstacle to generalizing the vortex filament stability analysis to a vortex ring is that the core structure of the ring is not generally known. Thus, one needs to provide a model for the core structure. For instance, in their work mentioned above, Widnall and Sullivan [62] used a constant core radius model and a constant local volume model.
Widnall, Bliss, and Tsai [61] modeled the vorticity in the core both as being constant and as having a continuous quartic profile across the core, peaked at the center and zero at the boundary. These later models were better able to predict the wavenumber of the unstable mode than the model used in [62]. Saffman [52], using a vorticity distribution which includes viscous effects, was able to predict the unstable wavenumber found in the experiments of Krutzsch [35] and Maxworthy [42, 43]. Another model for the core vorticity distribution is a scaling of the third-order Gaussian $\exp(-r^3)$, which was used in simulations by Knio and Ghoniem [32]. In their study, they modeled the vortex ring as a collection of smooth vortex filaments, as we do. The principal differences between their model and ours are the initial placement of the filaments, the smooth kernel that is used, and the discretization. Their filaments are initialized to form a solid torus and the filament strengths are chosen to approximate the vorticity distribution. They smooth the Biot-Savart kernel by convolving it with a third-order Gaussian. Their computational results agree well with the analytical predictions of Widnall, Bliss, and Tsai [61].

Figure 2.9: Colliding vortex rings.

Another technique for generating a vorticity distribution in the ring's core is to numerically solve the differential equations for an exactly propagating ring. The technique was used by Lifschitz, Suters, and Beale [38] in their study of the stability of axisymmetric vortex rings with swirl. In their study, they compared growth rate predictions from short wavelength asymptotics with computations using a vortex filament model of the ring. In their computations, the Biot-Savart kernel was smoothed by convolving it with a sixth degree piecewise polynomial having compact support. Their computations agreed reasonably well with the analytical predictions, the computational growth rates being consistently 1/3 to 1/2 the predicted maximum growth rates.

Interactions

The type of vortex ring interaction that we are interested in is the collision depicted in Figure 2.9, a configuration studied experimentally by Schatzle [55]. The resulting collision exhibits vortex ring merger and has been studied experimentally, theoretically and numerically. Near the collision, oppositely oriented vortex filaments collide and merge. Our interest in the ring configuration is to find out if the vortex sheet model for vortex rings can capture such complex dynamics, despite the various simplifying assumptions built into the model.

The regions of the rings which approach closely contain oppositely oriented vorticity. Saffman [53] proposed a model to describe the dynamics of vortex reconnection. As the rings meet, viscosity causes the opposite vorticity to cancel. This decrease in vorticity causes that region of each ring to stretch away from the point of contact, due to a local increase in pressure. Thus, the fluid is pushed away from the region

of contact and it appears that the rings have connected. Saffman [53] modeled this process and the predictions for time scales and strain rates agreed reasonably well with Schatzle's experiments [55].

Various researchers have studied the vortex reconnection problem numerically. Anderson and Greengard [1] used a Lagrangian method and discretized the rings as a collection of vortex filaments. They smoothed the Biot-Savart kernel by convolving it with a characteristic function and used a constant core vorticity model for the rings. They were able to compute the early stages of the ring merger and their results agree qualitatively with Schatzle's experiments. Numerical simulations performed by Aref and Zawadzki [4] and Winckelmans [63] reproduced the vortex ring collision and reconnection well. Using an Eulerian-Lagrangian vortex-in-cell code, they reproduced the ring merger and subsequent reconnection into two new rings, which begin to pinch off. Kida, Takaoka and Hussain [30, 31], using an Eulerian spectral method to study the vortex ring merger problem, were able to compute to later times in the sequence. However, in their computations, the rings remain connected after the reconnection, which conflicts with experimental observations of ring separation. This disparity was attributed to the fact that the experimental results are visualized with passive scalar transport, which is different from vorticity transport. The experiments do not necessarily show where the vorticity is large, since it may be amplified by the vortex stretching term in the Navier-Stokes equations.

One reason for interest in this ring configuration is related to singularity formation in solutions to the Euler equations. As the rings begin to collide, the oppositely oriented vortex filaments that approach each other begin to stretch. In an inviscid flow, this stretching intensifies the vorticity, which is a process that plays an important part in singularity formation.
For instance, interacting vortex tube computations by Pumir and Kerr [49] show significant distortion in the core of the colliding rings, and vortex filament computations by Pumir and Siggia [50] for other configurations indicate the possibility of singularity formation.

CHAPTER 3

FAST METHODS FOR PARTICLE SIMULATIONS

As mentioned previously, evaluating the sums in (2.23) by direct summation, a technique also referred to as the particle-particle (PP) method, requires $O(N^2)$ operations. For large values of $N$, the time required to perform these operations is excessively large. Two approaches that have been developed in the past to overcome this difficulty are mesh and tree codes. They achieve their efficiency by computing approximations to the exact particle interactions. This is in contrast to algorithms such as the fast Fourier transform, which achieve efficiency by taking advantage of exact algebraic manipulations. Thus, performance is not the only issue to consider when examining mesh and tree codes, for the execution time typically depends on the desired accuracy. A brief description of mesh codes is given before proceeding to tree codes, the approach that this thesis follows.

3.1 Mesh Codes

Efficiency is gained in a mesh code by using the fact that elliptic equations can be solved rapidly on meshes. This is done either by using iterative methods such as successive overrelaxation, conjugate gradient, or multigrid, or direct methods based on cyclic reduction or the fast Fourier transform. A comprehensive reference for this material is the book by Hockney and Eastwood [27]. Though mesh codes apply to more general settings, I will describe them as applied to problems in astrophysics. In this setting, the quantities in a simulation are star positions $x_i(t)$ and masses $m_i$. The acceleration of the $i$th star due to the gravitational influence of the other stars is given by

$$a_i = G \sum_{j=1,\ j \ne i}^{N} m_j\, K(x_i, x_j), \tag{3.1}$$

where

$$K(x_i, x_j) = -\frac{x_i - x_j}{|x_i - x_j|^3}. \tag{3.2}$$

Define the mass density $\rho$ and gravitational potential $\Phi$ by

$$\rho(x) = \sum_{j=1}^{N} m_j\, \delta(x - x_j), \tag{3.3}$$

$$\Phi(x) = -G \sum_{j=1}^{N} m_j\, \phi(x - x_j), \tag{3.4}$$

where $\phi(z) = |z|^{-1}$. Then from the identities

$$K(x_i, x_j) = \nabla\phi(x_i - x_j), \tag{3.5}$$

$$\nabla^2 \phi(z) = -4\pi\delta(z), \tag{3.6}$$

it follows that

$$\nabla^2 \Phi(x) = 4\pi G \rho(x), \tag{3.7}$$

$$a_i = -\nabla\Phi(x_i). \tag{3.8}$$

The particle-mesh (PM) method superimposes a fixed mesh over the particles and uses the auxiliary functions $\rho$ and $\Phi$ to compute $a_i$ as follows:

1. Assign a mass function $\rho$ to the mesh from the $x_j$ and $m_j$.
2. Solve a discretized form of the Poisson equation (3.7) on the mesh.
3. Use (3.8) to compute accelerations on the mesh.
4. Interpolate accelerations from the mesh to the star positions $x_j$.

There are various techniques for implementing each step mentioned above. The main drawback of this method is that the accuracy is determined by the mesh size. When the grid is refined to improve the accuracy, the execution time increases. An alternative to the PM method is the particle-particle/particle-mesh (P$^3$M) method, which combines the PP and PM methods. Interactions between nearby particles are computed with the PP method, and the rest of the interactions are computed with the PM method. So the functions being approximated with the mesh are smoother, resulting in a smaller error than the PM method produces with the same mesh. A full discussion of these methods is beyond the scope of this thesis and the interested reader is directed to Hockney and Eastwood's book [27] for more details.
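The four PM steps can be sketched compactly. The code below is an illustrative assumption-laden example, not an implementation from the thesis or from [27]: it uses a periodic cubic box, nearest-grid-point mass assignment and interpolation, an FFT Poisson solve with the mean density removed, and centered differences for the gradient. The function name `pm_accelerations` and all parameter defaults are hypothetical.

```python
import numpy as np

def pm_accelerations(pos, mass, n=16, box=1.0, G=1.0):
    """Particle-mesh accelerations in a periodic cube (nearest-grid-point scheme)."""
    h = box / n
    cell = np.floor(pos / h).astype(int) % n            # mesh cell of each star

    # Step 1: assign the mass density rho to the mesh.
    rho = np.zeros((n, n, n))
    np.add.at(rho, (cell[:, 0], cell[:, 1], cell[:, 2]), mass / h**3)

    # Step 2: solve laplacian(Phi) = 4 pi G rho in Fourier space.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=h)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                   # placeholder to avoid 0/0
    phi_hat = -4.0 * np.pi * G * np.fft.fftn(rho) / k2
    phi_hat[0, 0, 0] = 0.0                              # drop the mean mode
    phi = np.real(np.fft.ifftn(phi_hat))

    # Step 3: a = -grad(Phi) on the mesh by centered differences.
    acc_mesh = np.stack(
        [-(np.roll(phi, -1, axis=d) - np.roll(phi, 1, axis=d)) / (2.0 * h)
         for d in range(3)],
        axis=-1,
    )

    # Step 4: interpolate (nearest grid point) back to the star positions.
    return acc_mesh[cell[:, 0], cell[:, 1], cell[:, 2]]

# Two equal-mass stars on a line parallel to the x axis.
pos = np.array([[0.28, 0.53, 0.53], [0.66, 0.53, 0.53]])
mass = np.array([1.0, 1.0])
acc = pm_accelerations(pos, mass)
print(acc)
```

Because the same nearest-grid-point rule is used for assignment and interpolation, the two stars receive equal and opposite accelerations, and each is pulled toward the other. The accuracy is limited by the mesh spacing `box / n`, which is the drawback noted above.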

3.2 Tree Codes

There are two main ingredients for achieving efficiency in a tree code: particle-cluster interactions and a nested subdivision of space which is used to construct the particle clusters. A particle-cluster interaction is used to rapidly compute the influence of a particle cluster on a single target particle. This is done by approximating the cumulative influence of the particles in the cluster on the target particle with a simplified expression. Once a preprocessing phase is performed, the expression can be evaluated for multiple target particles with an operation count independent of the number of particles in the cluster. We will see below that particle-cluster interactions are only performed for particles and clusters which are separated from each other. Thus, the approximation used is referred to as a far-field approximation. The nested subdivision of space, which is used to construct the particle clusters, has a natural tree structure. The objective behind the subdivision of space is to generate particle-cluster interactions in which the particle is far from the cluster, relative to the cluster's size. The combination of these two ingredients leads to an algorithm whose asymptotic operation count is $O(N \log N)$.

Two early examples of tree code algorithms are due to Appel [3] and Barnes and Hut [5], who used the algorithms for problems in astrophysics. In these algorithms, particle-cluster interactions were performed by approximating the cluster as a single particle located at the cluster's center of mass. A drawback of this approximation is that it has limited accuracy. The Fast Multipole Method of Greengard and Rokhlin [24, 25] overcame this obstacle by using a series expansion to approximate particle-cluster interactions to any specified tolerance. They also introduced cluster-cluster interactions by expanding the far-field approximation into a local near-field expansion for rapid evaluation at multiple target points.
The series expansions used in [24, 25] are Laurent series in two space dimensions and spherical harmonic expansions in three dimensions. Van Dommelen and Rundensteiner [60] employed a similar series approach to study two-dimensional fluid flow around a cylinder which was modeled with point vortices and a random walk simulation of diffusion effects. They used a Laurent series to approximate particle-cluster interactions, but they did not use cluster-cluster interactions. This simplifies the algorithm and results in smaller memory requirements, though for similar error tolerances, their ratio of improvement in execution time versus direct summation is less than Greengard and Rokhlin's two-dimensional results. Another tree code for two- and three-dimensional problems, due to Anderson [2], does not use series expansions. Instead, the approximations for particle-cluster and cluster-cluster interactions are based on the Poisson integral formula for the solution of Laplace's equation in the interior of a circle or

sphere. For two-dimensional problems, the ratio of improvement in execution time for Anderson's algorithm is between Van Dommelen and Rundensteiner's and Greengard and Rokhlin's.

All of these expansions and approximations are appropriate when the interaction kernel is harmonic, such as the Newtonian potential of electrostatic and gravitational interactions, but they are unsuitable for non-harmonic kernels, such as the kernel $K_\delta$ under consideration in this thesis. This is because they rely on the harmonicity of the kernel to ensure convergence. An expansion using Cartesian Taylor series, an idea first proposed by Zhao [65], can be used to overcome this constraint. Zhao used Taylor series for simulations with the Newtonian potential, a harmonic function, in three dimensions. The motivation was to generalize Greengard and Rokhlin's [24] two-dimensional complex Taylor series expansion to the three-dimensional setting. The first application of this expansion to particle simulations with a non-harmonic kernel was by Draghicescu and Draghicescu [21], who computed the evolution of a desingularized vortex sheet in two space dimensions. An important contribution of their work is the introduction of recurrences to rapidly compute the expansion coefficients. One contribution of this thesis is to generalize this approach to the three-dimensional vortex blob kernel.

Our algorithm and that of Draghicescu and Draghicescu are like van Dommelen and Rundensteiner's [60], in that they do not use cluster-cluster interactions. The reason for this is that converting a far-field expansion into a near-field expansion for Taylor series is a time consuming procedure, requiring $O(p^3)$ and $O(p^4)$ operations in two and three dimensions respectively, where $p$ is the order of the Taylor series being used. A recent development concerning this issue is a new version of the Fast Multipole Method for the Newtonian potential by Greengard and Rokhlin [26].
They speed up the far-field to near-field conversion by using an intermediate step of converting the expansion into plane wave expansions using Bessel functions. It is a matter for future work to see if this technique can be extended to non-harmonic kernels.

Salmon and Warren [54] discussed using Taylor series for the Newtonian potential and the non-harmonic Plummer potential, although their recurrences and expansions were used only for the Newtonian potential. Using low-order methods and error bounds, they were able to improve upon the performance of the previous low-order method of Barnes and Hut [5]. Winckelmans et al. [64], improving upon the error estimates of Salmon and Warren [54], were able to simulate the vortex wake behind an accelerated airfoil using a vortex method with a cut-off Gaussian smoothed Biot-Savart kernel.

Before going into more detail, we first briefly describe the overall structure of our tree code. The algorithm has two stages, the construction of the tree and the

computation of the particle velocities. The tree construction involves the recursive subdivision of space to form the nested particle clusters, and the computation of cluster parameters which are used for particle-cluster interactions. Particle velocities are computed using a combination of particle-cluster interactions and particle-particle interactions. The decision of which interaction to perform is based upon tolerance conditions and execution time considerations. The influence of a particle cluster on a target particle is computed with a particle-cluster interaction only when a tolerance condition is satisfied and when doing so takes less time than performing individual particle-particle interactions. Otherwise, either the cluster acts on the target particle with particle-particle interactions, or the computation descends another level into the tree and considers interactions between the cluster's subclusters and the target particle. This process continues until either particle-cluster interactions are performed or the leaves of the tree are reached, in which case particle-particle interactions are performed. Other algorithms with a structure similar to ours have been shown to require $O(N \log N)$ operations. We present numerical results in the next chapter which show that our algorithm's execution time also grows like $O(N \log N)$.

The layout for the rest of this chapter is as follows. Section 3 describes particle-cluster interactions where the approximation is based on Taylor series. The factors which determine the efficiency of such an approximation are discussed. This motivates the nested subdivision of space described in Section 4. Section 5 describes a method based on recurrences for computing the far-field expansion coefficients. Section 6 presents the error analysis upon which the adaptive order selection is based. Section 7 gives a full description of the algorithm.
Section 8 discusses the execution time and memory requirements of the algorithm.

3.3 Particle-Cluster Interactions

Consider a particle-cluster interaction between a target particle $x$ and a collection of particles $y_j$, where $j = 1, \ldots, N_\tau$. The $y_j$ are referred to as a particle cluster and the region of space containing them is referred to as a cell and is denoted $\tau$. This situation is depicted in Figure 3.1 in two space dimensions. When the particle-cluster interaction is performed, the cumulative influence of the $y_j$ on $x$ is replaced with a truncated series expansion. From (2.23), the influence of the $y_j$ on $x$ is

$$\sum_{j=1}^{N_\tau} K_\delta(x, y_j) \times w_j. \tag{3.9}$$

We expand $K_\delta(x, y_j)$ in a Taylor series in the second argument about a point $\tilde y$, the center of $\tau$. For the moment, we will not specify where $\tilde y$ is located except to say that

Figure 3.1: Particle-cluster interaction. $x$: target particle, $y_j$: particle in cluster, $\tau$: cell, $\tilde y$: center of $\tau$.

two natural possibilities are: (1) the center of mass of the $y_j$, (2) the geometrical center of $\tau$ when it is a rectangular box. Using multi-index notation, we have

$$\begin{aligned}
\sum_{j=1}^{N_\tau} K_\delta(x, y_j) \times w_j
&= \sum_{j=1}^{N_\tau} K_\delta\big(x, \tilde y + (y_j - \tilde y)\big) \times w_j \\
&= \sum_{j=1}^{N_\tau} \left( \sum_{k} \frac{1}{k!} D_y^k K_\delta(x, \tilde y)\,(y_j - \tilde y)^k \right) \times w_j \\
&= \sum_{k} \frac{1}{k!} D_y^k K_\delta(x, \tilde y) \times \left( \sum_{j=1}^{N_\tau} (y_j - \tilde y)^k\, w_j \right) \\
&= \sum_{k} a_k(x, \tilde y) \times b_k(\tau),
\end{aligned} \tag{3.10}$$

where

$$a_k(x, \tilde y) = \frac{1}{k!} D_y^k K_\delta(x, \tilde y), \qquad b_k(\tau) = \sum_{j=1}^{N_\tau} (y_j - \tilde y)^k\, w_j. \tag{3.11}$$

The $a_k(x, \tilde y)$ are the Taylor coefficients of $K_\delta(x, y)$ with respect to $y$ about $y = \tilde y$. The $b_k(\tau)$ describe the distribution of particles in $\tau$ and are referred to as particle moments. Note that the Taylor coefficients $a_k(x, \tilde y)$ are independent of the particles $y_j$ in the cell $\tau$, and the particle moments $b_k(\tau)$ are independent of the target particle $x$. So in a certain sense, the expansion is a separation of variables. Once the particle moments are computed for one particle-cluster interaction, they can be stored and used for subsequent particle-cluster interactions with different target particles. In practice, the influence of the $y_j$ on $x$ is approximated by truncating the infinite series in (3.10), yielding

$$\sum_{|k| < p} a_k(x, \tilde y) \times b_k(\tau), \tag{3.12}$$

where $|k| = k_1 + k_2 + k_3$ and $p$ is chosen to ensure that the error is less than a specified tolerance. The determination of $p$ is described in Section 3.6. Let

$$r_\tau = \max_j |y_j - \tilde y| \tag{3.13}$$

be the radius of the cluster about $\tilde y$, and

$$R = \big(|x - \tilde y|^2 + \delta^2\big)^{1/2} \tag{3.14}$$

be the regularized distance from $x$ to the center of the cell. It is shown in Section 3.6 that the error incurred by using the truncation (3.12) is $O(h^p)$, where $h = r_\tau / R$ is the convergence factor of the expansion. Thus, the truncation is referred to as a $p$th order expansion for a particle-cluster interaction.

A particle-cluster interaction is performed as follows:

1. Determine the minimum value of $p$ which ensures that the series truncation error is less than the specified tolerance.
2. If the particle moments $b_k(\tau)$ have not already been computed up to order $p$, then compute and store them.
3. Compute the Taylor coefficients $a_k(x, \tilde y)$ for $|k| < p$.
4. Compute the sum in (3.12).

Step 1 is based on error bounds described in Section 3.6 and can be performed with $O(p_{\max})$ operations, where $p_{\max}$ is the largest admissible value for $p$. Step 2 requires $O(N_\tau p^3)$ operations if the particle moments have not already been computed. However, this is a one-time cost whose relative effect on the overall execution time diminishes as $\tau$ is used in more particle-cluster interactions. The exponent on $p$ in this operation count is three because we are in three space dimensions. Step 3 can be performed with $O(p^3)$ operations using a method based on recurrences that is described in Section 3.5. This is the best that can be expected, since there are $O(p^3)$ coefficients to compute. The sum in step 4 has $O(p^3)$ terms and can be computed with $O(p^3)$ operations. Adding these operation counts, we see that a $p$th order particle-cluster interaction requires $O(p^3)$ operations, assuming that the particle moments have been computed.

Using a particle-cluster interaction is not always advantageous.
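The separation into Taylor coefficients and particle moments can be illustrated with the lowest-order case, keeping only the terms with $|k| < 2$. This is a sketch under stated assumptions, not the thesis code: the kernel is assumed to have the form $K_\delta(x, y) = -(x - y)/(|x - y|^2 + \delta^2)^{3/2}$ (definition (2.21) is not reproduced here), the first derivatives are written out by hand instead of using the recurrences of Section 3.5, and all function names are hypothetical.

```python
import numpy as np

def kernel(x, y, delta):
    """Assumed regularized kernel: -(x - y) / (|x - y|^2 + delta^2)^(3/2)."""
    d = x - y
    return -d / (d @ d + delta**2) ** 1.5

def kernel_gradient(x, y, delta):
    """Row i holds the derivative of the kernel with respect to y_i."""
    d = x - y
    rho2 = d @ d + delta**2
    return np.eye(3) / rho2**1.5 - 3.0 * np.outer(d, d) / rho2**2.5

def direct_sum(x, ys, ws, delta):
    """Direct summation (3.9): O(N_tau) work per target."""
    return sum(np.cross(kernel(x, y, delta), w) for y, w in zip(ys, ws))

def moments(ys, ws, center):
    """Particle moments b_k for |k| = 0 and |k| = 1; independent of the target."""
    b0 = ws.sum(axis=0)
    b1 = (ys - center).T @ ws          # row i: sum_j (y_j - center)_i w_j
    return b0, b1

def expansion(x, center, b0, b1, delta):
    """Truncated series (3.12) with |k| < 2; independent of N_tau."""
    a0 = kernel(x, center, delta)
    a1 = kernel_gradient(x, center, delta)
    return np.cross(a0, b0) + sum(np.cross(a1[i], b1[i]) for i in range(3))

rng = np.random.default_rng(0)
ys = rng.uniform(-0.5, 0.5, size=(50, 3))                    # cluster particles
ws = np.array([0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=(50, 3))
center = 0.5 * (ys.min(axis=0) + ys.max(axis=0))             # bounding box center
b0, b1 = moments(ys, ws, center)                             # computed once

direction = np.array([1.0, 0.5, 0.25]) / np.linalg.norm([1.0, 0.5, 0.25])

def relative_error(distance, delta=0.1):
    x = center + distance * direction
    exact = direct_sum(x, ys, ws, delta)
    approx = expansion(x, center, b0, b1, delta)
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

rel_near, rel_far = relative_error(5.0), relative_error(20.0)
print(rel_near, rel_far)
```

The moments are computed once and reused for both targets, and the error shrinks as the target moves away from the cluster, which is the behavior expected from a convergence factor $h = r_\tau / R$.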
For instance, computing the influence of the $y_j$ on $x$ by direct summation, i.e. (3.9), may require fewer operations than computing a $p$th order expansion, where $p$ has been determined by accuracy constraints. In this situation, we do not use the expansion, opting either to use direct summation or to subdivide $\tau$ and consider particle-cluster interactions between

$x$ and the resulting subcells. This decision is based on the following considerations. Direct summation requires $O(N_\tau)$ operations, where $N_\tau$ is the number of particles in $\tau$, and using the expansion requires $O(p^3)$ operations. So if direct summation requires fewer operations than the expansion, it may be loosely stated that either $N_\tau$ is small or $p$ is large. If $N_\tau$ is small, then there are no alternatives to direct summation that will reduce the operation count. This is quantified by introducing a parameter $N_0$ and using direct summation when $N_\tau < N_0$. If $N_\tau \ge N_0$ and $p$ is such that using the expansion requires more operations than direct summation, then we subdivide $\tau$ and consider particle-cluster interactions with the resulting subcells. The motivation for this is that the expansions for the particle-cluster interactions with the subcells will require lower orders (smaller values of $p$) to satisfy the specified tolerance. This is because the convergence factors for the new expansions are at most $0.79h$, where the 0.79 factor arises from the partial bisection algorithm described in the next section. So when the execution time required to perform direct summation is less than the time required to perform a particle-cluster interaction, direct summation is performed if $N_\tau < N_0$, and $\tau$ is subdivided if $N_\tau \ge N_0$. The iterative application of this leads to the nested subdivision of space which is described in detail in the next section.

In practice, the comparison between the time required to perform direct summation and the time required to perform a $p$th order particle-cluster expansion is done as follows. A stand-alone program was written which performs direct summation between a particle and a cluster. The program was run using clusters with varying numbers of particles $N_\tau$. The execution time was fit with a linear function of $N_\tau$.
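The decision rule described above can be written as a small function. This is a hedged sketch only: the function name, the timing coefficients, and the table of expansion times are hypothetical placeholders chosen for illustration and are not measurements from the thesis. The order `p` is assumed to have been chosen already so that the tolerance is met.

```python
def choose_interaction(n_tau, p, n0, direct_time, expansion_time):
    """Pick 'expansion', 'direct', or 'subdivide' for one target and one cell.

    n_tau: number of particles in the cell
    p: expansion order that satisfies the tolerance (assumed given)
    n0: threshold below which a cell is handled by direct summation
    direct_time, expansion_time: callables returning estimated times
    """
    if expansion_time(p) <= direct_time(n_tau):
        return "expansion"
    # Direct summation is cheaper than the expansion from here on.
    return "direct" if n_tau < n0 else "subdivide"

# Hypothetical cost models: a linear fit in N_tau and a lookup table in p.
def direct_time(n_tau):
    return 1.0e-7 * n_tau                      # made-up slope, zero intercept

expansion_table = {p: 2.0e-8 * p**3 for p in range(1, 21)}   # made-up values

def expansion_time(p):
    return expansion_table[p]

N0 = 50                                        # hypothetical threshold
print(choose_interaction(1000, 4, N0, direct_time, expansion_time))   # expansion
print(choose_interaction(10, 8, N0, direct_time, expansion_time))     # direct
print(choose_interaction(200, 12, N0, direct_time, expansion_time))   # subdivide
```

Keeping the cost models as callables means measured timings can be swapped in without touching the rule itself.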
This linear function is used to estimate how long it takes to perform direct summation with a cluster that has an arbitrary number of particles. A similar program was written which performs particle-cluster expansions and the execution time of this program as a function of $p$ was determined. These execution times are stored and a table lookup is used to determine how long an expansion takes. The comparison between execution times is made between the linear function of $N_\tau$ and the stored execution time for a $p$th order expansion.

3.4 Tree Construction

The strategy of subdividing cells when using an expansion for a particle-cluster interaction is used to compute the velocity of every particle. A consequence of this is that every cell containing particles will be recursively subdivided until the resulting subcells have fewer than $N_0$ particles. This operation of subdividing the cells can be done independently of the target particles, so it is advantageous to do it once, at the beginning of the velocity computations. The resulting collection of cells admits a

natural tree structure where nodes in the tree correspond to cells of the subdivision. A cell $\tau_2$ is a child of a cell $\tau_1$ if $\tau_2$ was obtained by subdividing $\tau_1$. The tree is constructed with the following recursive algorithm:

1. The collection of particles is enclosed with a rectangular box, which becomes the root cell of the tree and is denoted $\tau_0$. Set the current cell to $\tau_0$.
2. If the current cell $\tau$ contains fewer than $N_0$ particles then exit. The cell $\tau$ is a leaf of the tree.
3. Otherwise, subdivide $\tau$ into subcells and apply step 2 to each subcell. The resulting subcells become children of $\tau$ in the tree.

There are two aspects of this tree construction that need further explanation: the choice of $N_0$ and the method by which the cells are subdivided. The choice of $N_0$ affects the performance of the algorithm in two ways. If $N_0$ is too small, then the tree will have many levels, leading to a large memory requirement. However, if $N_0$ is too large, then the tree consists of cells having large spatial dimensions, and this increases the order $p$ needed in the expansion for particle-cluster interactions, thereby increasing execution time. Computational experiments were performed on a test case to determine an appropriate value of $N_0$. These tests are described in the next chapter.

The subdivision of a cell $\tau$ is based upon $\tau$'s bounding box, the smallest rectangular box containing $\tau$'s particles whose sides are parallel to the coordinate axes. Let $l$ be the longest edge of the bounding box. The bounding box is bisected in each direction in which its length is greater than $l/\sqrt{2}$, yielding either 2, 4, or 8 subcells, and the particles are partitioned according to which subcell they are contained in. The subcells which contain particles become children of $\tau$, and the subcells with no particles are discarded.
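The partial bisection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation (which was written in C): the function names `bounding_box` and `subdivide` are hypothetical, particles are plain 3-tuples, and a point lying exactly on a bisecting plane is assigned to the upper half, a tie-breaking choice the text does not specify.

```python
import math

def bounding_box(points):
    """Smallest axis-aligned box containing the points, as (lo, hi) corners."""
    lo = tuple(min(p[d] for p in points) for d in range(3))
    hi = tuple(max(p[d] for p in points) for d in range(3))
    return lo, hi

def subdivide(points):
    """Partial bisection of a cell (hypothetical helper, for illustration).

    Returns the list of bisected directions and the non-empty subcells.
    """
    lo, hi = bounding_box(points)
    lengths = [hi[d] - lo[d] for d in range(3)]
    longest = max(lengths)
    # Bisect only the directions whose edge is longer than longest / sqrt(2).
    split_dims = [d for d in range(3) if lengths[d] > longest / math.sqrt(2)]
    mid = [(lo[d] + hi[d]) / 2 for d in range(3)]
    children = {}
    for p in points:
        # One boolean per bisected direction: upper half (True) or lower half.
        key = tuple(p[d] >= mid[d] for d in split_dims)
        children.setdefault(key, []).append(p)
    # Empty subcells never appear in the dictionary, so they are discarded.
    return split_dims, list(children.values())
```

For a flat, square-ish set of points only the two long directions are bisected, so at most four children are produced; for points strung along a line only one direction is bisected. If all points coincide, no direction is bisected and a single subcell is returned, so a caller would need to guard against that case.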
The reason for bisecting the box only in the long directions is that bisecting in short directions does not significantly reduce the convergence factor of the particle-cluster expansion. This is because the convergence factor for the expansion is proportional to $r_\tau$, the radius of the cell. Note that when the subdivision process is applied recursively to the subcells, their bounding boxes depend only on their particles. Hence, the bounding boxes shrink to fit the particle distribution. The factor $1/\sqrt{2}$ was chosen to ensure that the child cell's aspect ratio, before shrinking, is closer to 1 than the parent cell's aspect ratio. Using a different factor was not found to improve the algorithm's performance significantly.

A byproduct of this algorithm for constructing the tree is that every cell in the tree has a bounding box computed for it. We select $\tilde{y}$, the base point of the Taylor series expansions, to be the center of the bounding box. Using this expansion point and shrinking the bounding boxes

yields an expansion point which is close to the particles. A consequence of this is that small values of $p$ can be used for the expansion, which reduces execution time.

Figures 3.2 and 3.3 depict the subdivisions and associated trees resulting from the application of this algorithm to a random collection of points and a sequence of points on a spiral. The rectangles shown are the bounding boxes for the points within them. The rectangle borders are thinner for cells deeper in the tree. For these figures, $N_0$ was set to 20 for illustrative purposes, so cells with more than 20 particles were subdivided as described above. The spiral example demonstrates how the bounding boxes shrink to fit the particle distribution. Distributions like this occur in our computations, since we are dealing with two-dimensional surfaces embedded in $\mathbb{R}^3$.

3.5 Recurrences for Taylor Coefficients

For the algorithm to be computationally efficient, it is necessary to rapidly compute the $a_k(x, \tilde{y})$, the Taylor coefficients of $K_\delta(x, y)$ defined in (3.11). A method for doing this is described here. Recall that the kernel $K_\delta$ is given by

\[ K_\delta(x, y) = -\frac{1}{4\pi} \frac{x - y}{(|x - y|^2 + \delta^2)^{3/2}}. \tag{3.15} \]

Our first observation is that the computation of the partial derivatives of $K_\delta$ can be sped up using the fact that $K_\delta(x, y) = \nabla\psi(x - y)$, where

\[ \psi(z) = \frac{1}{4\pi (|z|^2 + \delta^2)^{1/2}}. \tag{3.16} \]

Note that $\psi$ is a regularized form of the fundamental solution to Laplace's equation in three dimensions. The Taylor coefficients of $K_\delta$ are

\[ a_k(x, \tilde{y}) = \frac{1}{k!} D_y^k K_\delta(x, \tilde{y}) = \frac{(-1)^{|k|}}{k!} D^k (\nabla\psi)(x - \tilde{y}). \tag{3.17} \]

Defining the quantities

\[ c_k(x, \tilde{y}) = \frac{1}{k!} D^k \psi(x - \tilde{y}), \tag{3.18} \]

we have the relationship

\[ a_k = (-1)^{|k|} \begin{pmatrix} (k_1 + 1)\, c_{k_1+1,\,k_2,\,k_3} \\ (k_2 + 1)\, c_{k_1,\,k_2+1,\,k_3} \\ (k_3 + 1)\, c_{k_1,\,k_2,\,k_3+1} \end{pmatrix}. \tag{3.19} \]
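The $k = 0$ case of (3.19) says that $a_0 = K_\delta(x, \tilde{y})$ is the gradient of $\psi$ evaluated at $x - \tilde{y}$. The short check below compares the kernel with a centered finite-difference gradient of $\psi$. It is a sketch under explicit assumptions: the sign conventions $K_\delta(x,y) = -(x-y)/(4\pi(|x-y|^2+\delta^2)^{3/2})$ and $\psi(z) = 1/(4\pi(|z|^2+\delta^2)^{1/2})$ are the ones assumed here, the value of `DELTA` is arbitrary, and the function names are hypothetical.

```python
import math

DELTA = 0.1  # smoothing parameter; the value is chosen only for illustration

def psi(z, delta=DELTA):
    """Regularized potential psi(z) = 1 / (4 pi sqrt(|z|^2 + delta^2))."""
    r2 = sum(c * c for c in z)
    return 1.0 / (4.0 * math.pi * math.sqrt(r2 + delta * delta))

def kernel(x, y, delta=DELTA):
    """Assumed form of K_delta(x, y) = -(x - y) / (4 pi (|x-y|^2 + delta^2)^(3/2))."""
    z = [a - b for a, b in zip(x, y)]
    r2 = sum(c * c for c in z)
    scale = -1.0 / (4.0 * math.pi * (r2 + delta * delta) ** 1.5)
    return [scale * c for c in z]

def grad_psi_fd(z, delta=DELTA, h=1e-5):
    """Centered finite-difference approximation of the gradient of psi at z."""
    grad = []
    for d in range(3):
        z_plus, z_minus = list(z), list(z)
        z_plus[d] += h
        z_minus[d] -= h
        grad.append((psi(z_plus, delta) - psi(z_minus, delta)) / (2.0 * h))
    return grad
```

Evaluating `kernel(x, y)` and `grad_psi_fd([a - b for a, b in zip(x, y)])` at any pair of points should give vectors that agree to roughly the accuracy of the finite difference.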

Figure 3.2: Subdivision of space for random points. (a) Nested subdivision of space. (b) Associated tree structure.

Figure 3.3: Subdivision of space for points on a spiral. (a) Nested subdivision of space. (b) Associated tree structure.

We compute the $c_k(x, \tilde{y})$ with recurrences and then use (3.19) to compute the $a_k(x, \tilde{y})$. Another relationship between $a_k(x, \tilde{y})$ and $c_k(x, \tilde{y})$ arises by considering them as functions of $x$ with $\tilde{y}$ fixed,

\[ a_k(x, \tilde{y}) = (-1)^{|k|} \nabla_x c_k(x, \tilde{y}). \tag{3.20} \]

This equation will be useful for the error analysis in the next section.

To simplify the presentation of the recurrences for the $c_k(x, \tilde{y})$, we first present recurrences for the Taylor coefficients of a one-dimensional analogue of $\psi$, $\psi_1(x) = (x^2 + \delta^2)^{-1/2}$.

Proposition 1. Fix $x_0 \in \mathbb{R}$ and let $c_k = \psi_1^{(k)}(x_0)/k!$ be the $k$th order Taylor coefficient of $\psi_1(x)$ at $x = x_0$. Then the $c_k$ satisfy the recurrence

\[ (x_0^2 + \delta^2)\, c_k + 2x_0 \left(1 - \frac{1}{2k}\right) c_{k-1} + \left(1 - \frac{1}{k}\right) c_{k-2} = 0 \tag{3.21} \]

for $k > 0$, with the convention that $c_k = 0$ for $k < 0$.

Proof. First observe that $\psi_1$ satisfies the differential equation

\[ (x^2 + \delta^2)\, \psi_1'(x) + x\, \psi_1(x) = 0. \tag{3.22} \]

Let $k > 0$. Differentiating $(k - 1)$ times, using the Leibniz rule for differentiating a product, we obtain

\[ (x^2 + \delta^2)\, \psi_1^{(k)}(x) + (k-1)\, 2x\, \psi_1^{(k-1)}(x) + (k-1)(k-2)\, \psi_1^{(k-2)}(x) + x\, \psi_1^{(k-1)}(x) + (k-1)\, \psi_1^{(k-2)}(x) = 0. \tag{3.23} \]

Substituting $x = x_0$ and grouping similar terms yields

\[ (x_0^2 + \delta^2)\, \psi_1^{(k)}(x_0) + 2x_0 \left(k - \tfrac{1}{2}\right) \psi_1^{(k-1)}(x_0) + (k-1)^2\, \psi_1^{(k-2)}(x_0) = 0. \tag{3.24} \]

The result (3.21) is obtained on dividing by $k!$ and using the identities

\[ \frac{k - 1/2}{k!} = \frac{1 - 1/(2k)}{(k-1)!}, \qquad \frac{(k-1)^2}{k!} = \frac{1 - 1/k}{(k-2)!}. \tag{3.25} \]

The essential ingredient in the proof of Proposition 1 is the differential equation (3.22) that $\psi_1$ satisfies. The function $\psi(z)$ of (3.16), whose Taylor coefficients we require, satisfies three differential equations which are analogous to (3.22):

\[ (|z|^2 + \delta^2)\, \frac{\partial\psi}{\partial z_1}(z) + z_1\, \psi(z) = 0, \tag{3.26a} \]

\[ (|z|^2 + \delta^2)\, \frac{\partial\psi}{\partial z_2}(z) + z_2\, \psi(z) = 0, \tag{3.26b} \]

\[ (|z|^2 + \delta^2)\, \frac{\partial\psi}{\partial z_3}(z) + z_3\, \psi(z) = 0. \tag{3.26c} \]

Following the proof of Proposition 1, we obtain the following result.

Proposition 2. Let $z \in \mathbb{R}^3$, $R = (|z|^2 + \delta^2)^{1/2}$ and $c_k = \frac{1}{k!} D^k \psi(z)$. Then

\[ R^2 c_{k_1,k_2,k_3} + 2z_1 \left(1 - \frac{1}{2k_1}\right) c_{k_1-1,k_2,k_3} + \left(1 - \frac{1}{k_1}\right) c_{k_1-2,k_2,k_3} + 2z_2\, c_{k_1,k_2-1,k_3} + c_{k_1,k_2-2,k_3} + 2z_3\, c_{k_1,k_2,k_3-1} + c_{k_1,k_2,k_3-2} = 0, \tag{3.27a} \]

\[ R^2 c_{k_1,k_2,k_3} + 2z_2 \left(1 - \frac{1}{2k_2}\right) c_{k_1,k_2-1,k_3} + \left(1 - \frac{1}{k_2}\right) c_{k_1,k_2-2,k_3} + 2z_1\, c_{k_1-1,k_2,k_3} + c_{k_1-2,k_2,k_3} + 2z_3\, c_{k_1,k_2,k_3-1} + c_{k_1,k_2,k_3-2} = 0, \tag{3.27b} \]

\[ R^2 c_{k_1,k_2,k_3} + 2z_3 \left(1 - \frac{1}{2k_3}\right) c_{k_1,k_2,k_3-1} + \left(1 - \frac{1}{k_3}\right) c_{k_1,k_2,k_3-2} + 2z_1\, c_{k_1-1,k_2,k_3} + c_{k_1-2,k_2,k_3} + 2z_2\, c_{k_1,k_2-1,k_3} + c_{k_1,k_2-2,k_3} = 0, \tag{3.27c} \]

where $k_1 > 0$ in (3.27a), $k_2 > 0$ in (3.27b), $k_3 > 0$ in (3.27c), with the convention that $c_k = 0$ when any of the indices are negative.

These recurrences are used to compute the $c_k$ for $|k| \le p$ with $O(p^3)$ operations. To demonstrate how this is done and to simplify the presentation, we first explain the process for a two-dimensional analogue, obtained by omitting the $z_3$ and $k_3$ dependence. We describe the generalization to the three-dimensional case afterward. So consider the two recurrences

\[ R^2 c_{k_1,k_2} + 2z_1 \left(1 - \frac{1}{2k_1}\right) c_{k_1-1,k_2} + \left(1 - \frac{1}{k_1}\right) c_{k_1-2,k_2} + 2z_2\, c_{k_1,k_2-1} + c_{k_1,k_2-2} = 0, \tag{3.28a} \]

\[ R^2 c_{k_1,k_2} + 2z_1\, c_{k_1-1,k_2} + c_{k_1-2,k_2} + 2z_2 \left(1 - \frac{1}{2k_2}\right) c_{k_1,k_2-1} + \left(1 - \frac{1}{k_2}\right) c_{k_1,k_2-2} = 0. \tag{3.28b} \]

The coefficients are computed in the following 4 steps, depicted in Figure 3.4.

1. Compute $c_{0,0}$ from the definition.

Figure 3.4: Computing Taylor coefficients for two-dimensional example. Panels show Steps 1 through 4; distinct markers denote coefficients computed in a previous step, in the current step, and in a future step.

2. Compute $c_{k,0}$ and $c_{0,k}$ for $k = 1, \dots, p$.
3. Compute $c_{k,1}$ and $c_{1,k}$ for $k = 1, \dots, p - 1$.
4. Compute the remaining $c_{k_1,k_2}$ for $k_1 + k_2 \le p$.

The coefficients obtained in step 4 are computed row-by-row. The computation is ordered this way to ensure that the coefficients needed for the recurrences are available. It also breaks the code into blocks which correspond to the cases when different coefficient indices arising in the recurrences are negative, allowing for more understandable code. This is more of an issue in the three-dimensional case, where there are more index cases to consider.

The computations for the three-dimensional case are performed in the following steps:

1. Compute $c_0$ from the definition.
2. Compute $c_k$ when two indices are 0 and the other is $\ge 1$.
3. Compute $c_k$ when one index is 0, one index is 1 and the other is $\ge 1$.
4. Compute $c_k$ when one index is 0 and the other two are both $\ge 2$.
5. Compute $c_k$ when two indices are 1 and the other is $\ge 1$.
6. Compute $c_k$ when one index is 1 and the other two are $\ge 2$.
7. Compute $c_k$ when all of the indices are at least 2.

As in the two-dimensional case, this ordering of the steps ensures that coefficients needed for the recurrences are available. Once the coefficients are computed, (3.19) is used to compute the $a_k(x, \tilde{y})$ for $|k| < p$ with another $O(p^3)$ operations. Thus, the overall operation count for computing the $a_k(x, \tilde{y})$ for a $p$th order particle-cluster interaction is $O(p^3)$.

3.6 Error Analysis of Particle-Cluster Interactions

In this section, we obtain a bound for the error due to the series truncation in a particle-cluster interaction. This bound is used in the algorithm to compute an order $p$ to satisfy the specified tolerance. So consider a cell $\tau$ with particles $y_j$ acting on the target particle $x$. To simplify the analysis, we initially bound the error in the series truncation for the influence of a single particle $y_j$ in $\tau$ on $x$. The triangle inequality then ensures that the total error in the particle-cluster interaction is less than the sum of the individual errors. From (3.12), $y_j$'s computed influence on $x$ is

\[ \sum_{|k| < p} a_k(x, \tilde{y})\, (y_j - \tilde{y})^k \times w_j. \tag{3.29} \]

Because the vector weight $w_j$ is independent of $k$, it factors out of the equation, so we restrict our attention to the expression

\[ \sum_{|k| < p} a_k(x, \tilde{y})\, (y_j - \tilde{y})^k. \tag{3.30} \]

To analyze the rate of convergence of this series as $p$ increases, the quantities

\[ S_n = \sum_{|k| = n} a_k(x, \tilde{y})\, (y_j - \tilde{y})^k \tag{3.31} \]

are introduced, where the dependence of $S_n$ on $x$, $\tilde{y}$, and $y_j$ is not explicitly displayed for notational convenience. With the series truncation that we are using, $|k| < p$, the multi-dimensional series (3.30) has been reduced to a one-dimensional series,

\[ \sum_{|k| < p} a_k(x, \tilde{y})\, (y_j - \tilde{y})^k = \sum_{n < p} S_n. \tag{3.32} \]

We estimate the error due to the truncation by showing that the magnitudes of the $S_n$ decrease geometrically and using a bound on the first omitted term $S_p$. The differential equation relating $a_k(x, \tilde{y})$ and $c_k(x, \tilde{y})$, (3.20), leads us to introduce the quantity

\[ T_n = \sum_{|k| = n} c_k(x, \tilde{y})\, (y_j - \tilde{y})^k, \tag{3.33} \]

which is related to $S_n$ by

\[ S_n = (-1)^n \nabla_x T_n. \tag{3.34} \]

From the recurrences for $c_k(x, \tilde{y})$, (3.27), we derive a recurrence for $T_n$. From this, we derive an explicit expression for $T_n$ which involves Legendre polynomials. Then using (3.34), we derive an expression for $S_n$. This expression is used to estimate the error incurred by truncating the series (3.12).

Proposition 3. With $T_n$, $c_k(x, \tilde{y})$ and $R$ defined as above,

\[ R^2 T_n + 2\alpha \left(1 - \frac{1}{2n}\right) T_{n-1} + \beta^2 \left(1 - \frac{1}{n}\right) T_{n-2} = 0, \tag{3.35} \]

where

\[ \alpha = (x - \tilde{y}) \cdot (y_j - \tilde{y}), \qquad \beta = |y_j - \tilde{y}|. \tag{3.36} \]

Proof. Consider a term $R^2 c_k(x, \tilde{y})(y_j - \tilde{y})^k$ from the sum in (3.33) which makes up $R^2 T_n$. Letting $z = x - \tilde{y}$, we apply a linear combination of the identities (3.27a,b,c)

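with weights $k_1/n$, $k_2/n$ and $k_3/n$ respectively and solve for $R^2 c_k(x, \tilde{y})(y_j - \tilde{y})^k$. The result is a weighted sum of $c_{\tilde{k}}(x, \tilde{y})$ over $|\tilde{k}| = n - 1$ and $n - 2$. The weight on a particular $c_{\tilde{k}}(x, \tilde{y})$ is computed as follows; the overall factor of $-1$ that comes from solving for $R^2 c_k$ is restored in (3.46). If $|\tilde{k}| = n - 1$, then the $c_{\tilde{k}}(x, \tilde{y})$ arose from identities (3.27) being applied to $R^2 c_k(x, \tilde{y})$ terms with $k$ equal to $(\tilde{k}_1 + 1, \tilde{k}_2, \tilde{k}_3)$, $(\tilde{k}_1, \tilde{k}_2 + 1, \tilde{k}_3)$, or $(\tilde{k}_1, \tilde{k}_2, \tilde{k}_3 + 1)$. The resulting weight multiplying $c_{\tilde{k}}(x, \tilde{y})$ is then (using $\tilde{k}_1 + \tilde{k}_2 + \tilde{k}_3 = n - 1$)

\[ \begin{aligned}
& 2z_1 \left( \frac{\tilde{k}_1 + 1}{n} \Big(1 - \frac{1}{2(\tilde{k}_1 + 1)}\Big) + \frac{\tilde{k}_2}{n} + \frac{\tilde{k}_3}{n} \right) (y_j - \tilde{y})^{(\tilde{k}_1 + 1,\, \tilde{k}_2,\, \tilde{k}_3)} \\
&\quad + 2z_2 \left( \frac{\tilde{k}_1}{n} + \frac{\tilde{k}_2 + 1}{n} \Big(1 - \frac{1}{2(\tilde{k}_2 + 1)}\Big) + \frac{\tilde{k}_3}{n} \right) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2 + 1,\, \tilde{k}_3)} \\
&\quad + 2z_3 \left( \frac{\tilde{k}_1}{n} + \frac{\tilde{k}_2}{n} + \frac{\tilde{k}_3 + 1}{n} \Big(1 - \frac{1}{2(\tilde{k}_3 + 1)}\Big) \right) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2,\, \tilde{k}_3 + 1)}
\end{aligned} \tag{3.37} \]

\[ \begin{aligned}
&= 2z_1 \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1 + 1,\, \tilde{k}_2,\, \tilde{k}_3)} + 2z_2 \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2 + 1,\, \tilde{k}_3)} \\
&\quad + 2z_3 \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2,\, \tilde{k}_3 + 1)}
\end{aligned} \tag{3.38} \]

\[ = 2 \left\{ z_1 (y_{j,1} - \tilde{y}_1) + z_2 (y_{j,2} - \tilde{y}_2) + z_3 (y_{j,3} - \tilde{y}_3) \right\} \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{\tilde{k}} \tag{3.39} \]

\[ = 2\, z \cdot (y_j - \tilde{y}) \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{\tilde{k}} \tag{3.40} \]

\[ = 2\alpha \Big(1 - \frac{1}{2n}\Big) (y_j - \tilde{y})^{\tilde{k}}. \tag{3.41} \]

If $|\tilde{k}| = n - 2$, then the $c_{\tilde{k}}(x, \tilde{y})$ arose from identities (3.27) being applied to $R^2 c_k(x, \tilde{y})$ terms with $k$ equal to $(\tilde{k}_1 + 2, \tilde{k}_2, \tilde{k}_3)$, $(\tilde{k}_1, \tilde{k}_2 + 2, \tilde{k}_3)$, or $(\tilde{k}_1, \tilde{k}_2, \tilde{k}_3 + 2)$. The resulting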

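weight multiplying $c_{\tilde{k}}(x, \tilde{y})$ is then (using $\tilde{k}_1 + \tilde{k}_2 + \tilde{k}_3 = n - 2$)

\[ \begin{aligned}
& \left( \frac{\tilde{k}_1 + 2}{n} \Big(1 - \frac{1}{\tilde{k}_1 + 2}\Big) + \frac{\tilde{k}_2}{n} + \frac{\tilde{k}_3}{n} \right) (y_j - \tilde{y})^{(\tilde{k}_1 + 2,\, \tilde{k}_2,\, \tilde{k}_3)} \\
&\quad + \left( \frac{\tilde{k}_1}{n} + \frac{\tilde{k}_2 + 2}{n} \Big(1 - \frac{1}{\tilde{k}_2 + 2}\Big) + \frac{\tilde{k}_3}{n} \right) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2 + 2,\, \tilde{k}_3)} \\
&\quad + \left( \frac{\tilde{k}_1}{n} + \frac{\tilde{k}_2}{n} + \frac{\tilde{k}_3 + 2}{n} \Big(1 - \frac{1}{\tilde{k}_3 + 2}\Big) \right) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2,\, \tilde{k}_3 + 2)}
\end{aligned} \tag{3.42} \]

\[ \begin{aligned}
&= \Big(1 - \frac{1}{n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1 + 2,\, \tilde{k}_2,\, \tilde{k}_3)} + \Big(1 - \frac{1}{n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2 + 2,\, \tilde{k}_3)} \\
&\quad + \Big(1 - \frac{1}{n}\Big) (y_j - \tilde{y})^{(\tilde{k}_1,\, \tilde{k}_2,\, \tilde{k}_3 + 2)}
\end{aligned} \tag{3.43} \]

\[ = |y_j - \tilde{y}|^2 \Big(1 - \frac{1}{n}\Big) (y_j - \tilde{y})^{\tilde{k}} \tag{3.44} \]

\[ = \beta^2 \Big(1 - \frac{1}{n}\Big) (y_j - \tilde{y})^{\tilde{k}}. \tag{3.45} \]

Thus, when $R^2 T_n$ is expanded with identities (3.27), the result is

\[ R^2 T_n = -\sum_{|\tilde{k}| = n-1} 2\alpha \Big(1 - \frac{1}{2n}\Big) c_{\tilde{k}}(x, \tilde{y}) (y_j - \tilde{y})^{\tilde{k}} - \sum_{|\tilde{k}| = n-2} \beta^2 \Big(1 - \frac{1}{n}\Big) c_{\tilde{k}}(x, \tilde{y}) (y_j - \tilde{y})^{\tilde{k}} \tag{3.46} \]

\[ = -2\alpha \Big(1 - \frac{1}{2n}\Big) T_{n-1} - \beta^2 \Big(1 - \frac{1}{n}\Big) T_{n-2}. \tag{3.47} \]

The recurrence (3.35) is related to the one satisfied by the Legendre polynomials,

\[ P_n(x) - 2x \Big(1 - \frac{1}{2n}\Big) P_{n-1}(x) + \Big(1 - \frac{1}{n}\Big) P_{n-2}(x) = 0, \tag{3.48} \]

for $n \ge 2$ with $P_0(x) = 1$ and $P_1(x) = x$ [22, Chapter 10]. This observation leads to the following explicit formula for $T_n$.

Proposition 4. With $\alpha$, $\beta$, and $R$ defined as above,

\[ T_n = \frac{h^n}{4\pi R}\, P_n\!\left( -\frac{\alpha}{\beta R} \right), \tag{3.49} \]

where

\[ h = \frac{\beta}{R}. \tag{3.50} \]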

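Proof. The proof works by showing that $T_n$ and the right-hand side of (3.49), which we refer to as $\tilde{T}_n$, satisfy the same two-term recurrence and have the same values for $n = 0, 1$. The recurrence for $T_n$ is given in Proposition 3. From the recurrence for $P_n$ in (3.48), we have that $P_n(-\frac{\alpha}{\beta R})$ is a solution of the recurrence

\[ f_n + 2 \frac{\alpha}{\beta R} \Big(1 - \frac{1}{2n}\Big) f_{n-1} + \Big(1 - \frac{1}{n}\Big) f_{n-2} = 0. \tag{3.51} \]

Thus, $h^n P_n(-\frac{\alpha}{\beta R})$ is a solution of

\[ f_n + 2h \frac{\alpha}{\beta R} \Big(1 - \frac{1}{2n}\Big) f_{n-1} + h^2 \Big(1 - \frac{1}{n}\Big) f_{n-2} = 0. \tag{3.52} \]

Multiplying this equation by $R^2$ and using $h = \beta/R$ yields

\[ R^2 f_n + 2\alpha \Big(1 - \frac{1}{2n}\Big) f_{n-1} + \beta^2 \Big(1 - \frac{1}{n}\Big) f_{n-2} = 0, \tag{3.53} \]

which is the recurrence for $T_n$. Since $\tilde{T}_n$ is a multiple of $h^n P_n(-\frac{\alpha}{\beta R})$, it too satisfies the recurrence. So $T_n$ and $\tilde{T}_n$ satisfy the same two-term recurrence. From the definition of $T_n$, (3.33), and the recurrence that the $T_n$ satisfy, (3.35), the initial values for $T_n$ are

\[ T_0 = c_0(x, \tilde{y}) = \frac{1}{4\pi (|x - \tilde{y}|^2 + \delta^2)^{1/2}} = \frac{1}{4\pi R}, \tag{3.54} \]

\[ T_1 = -\frac{\alpha}{R^2}\, T_0 = -\frac{\alpha}{4\pi R^3}. \tag{3.55} \]

Using $h = \beta/R$, the initial values for $\tilde{T}_n$ are

\[ \tilde{T}_0 = \frac{h^0}{4\pi R}\, P_0\!\left( -\frac{\alpha}{\beta R} \right) = \frac{1}{4\pi R}, \tag{3.56} \]

\[ \tilde{T}_1 = \frac{h^1}{4\pi R}\, P_1\!\left( -\frac{\alpha}{\beta R} \right) = -\frac{\alpha}{4\pi R^3}. \tag{3.57} \]

Thus, $T_n = \tilde{T}_n$ for all $n \ge 0$.

To obtain an expression for $S_n$, we take the gradient of $T_n$ with respect to $x$. When written out in terms of $x$, $y_j$, and $\tilde{y}$, we have from (3.49)

\[ T_n = \frac{|y_j - \tilde{y}|^n}{4\pi} \left( |x - \tilde{y}|^2 + \delta^2 \right)^{-(n+1)/2} P_n\!\left( -\frac{(x - \tilde{y}) \cdot (y_j - \tilde{y})}{|y_j - \tilde{y}|\, (|x - \tilde{y}|^2 + \delta^2)^{1/2}} \right). \tag{3.58} \]

Defining

\[ \gamma = -\frac{(x - \tilde{y}) \cdot (y_j - \tilde{y})}{|y_j - \tilde{y}|\, (|x - \tilde{y}|^2 + \delta^2)^{1/2}}, \tag{3.59} \]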

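we have

\[ \begin{aligned}
S_n = \frac{(-1)^n |y_j - \tilde{y}|^n}{4\pi} \Bigg\{ & -(n + 1)(x - \tilde{y}) \left( |x - \tilde{y}|^2 + \delta^2 \right)^{-(n+3)/2} P_n(\gamma) \\
& + \left( |x - \tilde{y}|^2 + \delta^2 \right)^{-(n+1)/2} P_n'(\gamma) \left[ -\frac{y_j - \tilde{y}}{|y_j - \tilde{y}|\, (|x - \tilde{y}|^2 + \delta^2)^{1/2}} + (x - \tilde{y}) \frac{(x - \tilde{y}) \cdot (y_j - \tilde{y})}{|y_j - \tilde{y}|\, (|x - \tilde{y}|^2 + \delta^2)^{3/2}} \right] \Bigg\}.
\end{aligned} \tag{3.60} \]

Recalling $R = (|x - \tilde{y}|^2 + \delta^2)^{1/2}$ and making some rearrangements, we obtain

\[ S_n = \frac{(-1)^n}{4\pi R^2} \left( \frac{|y_j - \tilde{y}|}{R} \right)^{n} \left\{ -(n + 1) \frac{x - \tilde{y}}{R} P_n(\gamma) + P_n'(\gamma) \left[ -\frac{y_j - \tilde{y}}{|y_j - \tilde{y}|} + (x - \tilde{y}) \frac{(x - \tilde{y}) \cdot (y_j - \tilde{y})}{R^2\, |y_j - \tilde{y}|} \right] \right\}. \tag{3.61} \]

Each fraction inside the curly braces is less than 1 in magnitude, so we have

\[ |S_n| \le \frac{1}{4\pi R^2} \left( \frac{|y_j - \tilde{y}|}{R} \right)^{n} \left( (n + 1)\, |P_n(\gamma)| + 2\, |P_n'(\gamma)| \right). \tag{3.62} \]

Using the inequalities

\[ |P_n(\gamma)| \le 1, \qquad |P_n'(\gamma)| \le n(n + 1)/2, \tag{3.63} \]

which rely on $|\gamma| \le 1$, we have

\[ |S_n| \le \frac{(n + 1)^2}{4\pi R^2} \left( \frac{|y_j - \tilde{y}|}{R} \right)^{n}. \tag{3.64} \]

So the terms of the series in (3.32) decay geometrically, which implies that the truncation error is roughly the magnitude of the first omitted term. Recall that when a $p$th order particle-cell interaction is performed, the expansion for the influence of all of the particles is

\[ \sum_{k} a_k(x, \tilde{y}) \times b_k(\tau) = \sum_{p \ge 0} \sum_{|k| = p} a_k(x, \tilde{y}) \times b_k(\tau), \tag{3.65} \]

and this is truncated by retaining the terms for which $|k| < p$. The geometric decay of the terms ensures that the error incurred by using this truncation will be on the order of

\[ \sum_{|k| = p} a_k(x, \tilde{y}) \times b_k(\tau). \tag{3.66} \]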
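The recurrence (3.35) and the closed form (3.49) can be compared numerically. The sketch below computes $T_n$ both ways and also checks the identity $\psi(x - y_j) = \sum_n (-1)^n T_n$, which follows from Taylor expanding $\psi(x - y)$ in $y$ about $\tilde{y}$ (this check is an addition for illustration, not part of the thesis). The function names are hypothetical, the sign convention $\psi(z) = 1/(4\pi(|z|^2+\delta^2)^{1/2})$ is assumed, and $y_j \ne \tilde{y}$ is required so that $\beta > 0$.

```python
import math

def psi(z, delta):
    """Assumed regularized potential: 1 / (4 pi sqrt(|z|^2 + delta^2))."""
    return 1.0 / (4.0 * math.pi * math.sqrt(sum(c * c for c in z) + delta ** 2))

def legendre(n_max, t):
    """P_0(t), ..., P_{n_max}(t) from the three-term recurrence (3.48)."""
    P = [1.0, t]
    for n in range(2, n_max + 1):
        P.append(2 * t * (1 - 1 / (2 * n)) * P[n - 1] - (1 - 1 / n) * P[n - 2])
    return P[: n_max + 1]

def _geometry(x, y_tilde, y_j, delta):
    z = [a - b for a, b in zip(x, y_tilde)]      # x - y~
    d = [a - b for a, b in zip(y_j, y_tilde)]    # y_j - y~
    R = math.sqrt(sum(c * c for c in z) + delta ** 2)
    alpha = sum(a * b for a, b in zip(z, d))
    beta = math.sqrt(sum(c * c for c in d))
    return R, alpha, beta

def T_by_recurrence(x, y_tilde, y_j, delta, n_max):
    """T_0..T_{n_max} from (3.35), starting from T_0 = 1/(4 pi R)."""
    R, alpha, beta = _geometry(x, y_tilde, y_j, delta)
    T = [1.0 / (4.0 * math.pi * R)]
    for n in range(1, n_max + 1):
        t_nm2 = T[n - 2] if n >= 2 else 0.0      # convention: T_{-1} = 0
        T.append(-(2 * alpha * (1 - 1 / (2 * n)) * T[n - 1]
                   + beta ** 2 * (1 - 1 / n) * t_nm2) / R ** 2)
    return T

def T_closed_form(x, y_tilde, y_j, delta, n_max):
    """T_n = h^n / (4 pi R) * P_n(-alpha / (beta R)), with h = beta / R."""
    R, alpha, beta = _geometry(x, y_tilde, y_j, delta)
    h = beta / R
    P = legendre(n_max, -alpha / (beta * R))
    return [h ** n / (4.0 * math.pi * R) * P[n] for n in range(n_max + 1)]
```

Because $|P_n| \le 1$, each $|T_n|$ is bounded by $h^n/(4\pi R)$, so the alternating sum converges geometrically whenever $h < 1$.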

The bound (3.64) on $S_n$ for $n = p$ translates into the bound

\[ \Big| \sum_{|k| = p} a_k(x, \tilde{y}) \times b_k(\tau) \Big| \le \frac{(p + 1)^2}{4\pi R^{p+2}} \sum_{j=1}^{N_\tau} |w_j|\, |y_j - \tilde{y}|^p. \tag{3.67} \]

Define the quantities

\[ \sigma_p(\tau) = (p + 1)^2 \sum_{j=1}^{N_\tau} |w_j|\, |y_j - \tilde{y}|^p. \tag{3.68} \]

Then the error incurred by using a $p$th order expansion to approximate a particle-cluster interaction is bounded by

\[ \text{error} < \frac{\sigma_p(\tau)}{4\pi R^{p+2}}. \tag{3.69} \]

The $\sigma_p(\tau)$ are computed during the construction of the tree. The first step in performing a particle-cluster expansion is to find the smallest $p$ such that the expression in (3.69) is less than the specified tolerance. This can be done with $O(p_{\max})$ operations, as stated in Section 3.3, where $p_{\max}$ is the maximum admissible value of $p$.

When an algorithm using the error bound (3.69) to compute $p$ was implemented, it was found that the actual error incurred in computing the velocity was typically three orders of magnitude smaller than the specified tolerance. Presumably, this is due to the repeated use of the triangle inequality in the analysis, which leads to overestimates. An alternative to bounding the error in the velocity is to bound the error in the velocity potential, which is achieved by using the identity (3.49), leading to the bound

\[ |T_n| \le \frac{1}{4\pi R} \left( \frac{|y_j - \tilde{y}|}{R} \right)^{n}, \tag{3.70} \]

a bound analogous to (3.64). The cumulative error bound corresponding to (3.69) is

\[ \text{error} < \frac{\sigma_p(\tau)}{4\pi R^{p+1}}, \tag{3.71} \]

where $\sigma_p(\tau)$ is now defined as

\[ \sigma_p(\tau) = \sum_{j=1}^{N_\tau} |w_j|\, |y_j - \tilde{y}|^p. \tag{3.72} \]

When an algorithm using this error bound to compute $p$ was implemented, the error in computing the velocity was still smaller than the specified tolerance, but only by

one order of magnitude. This is the approach used for all of the runs described in Chapters 4 and 5.

For either of the error bounds, it is clear that the geometric decay rate of the truncation error depends linearly on $r_\tau$, the radius of the smallest sphere centered at $\tilde{y}$ which encloses all the particles in the cell $\tau$. Thus, it is appropriate to subdivide cells so that the resulting subcells are as close to spheres as possible. As mentioned in Section 3.4, this is the motivation for the bisection technique used when cells are subdivided.

3.7 Full Description of the Algorithm

The algorithm for computing all interactions has two stages: constructing the tree and computing the particle velocities with the aid of the tree. There are two parameters for the program: $p_{\max}$, the maximum admissible order for expansions, and $N_0$, the maximum number of particles in an undivided cell. The tree is created with the recursive function create_tree, written here in pseudo-code, which accepts for input an array of particles associated with a cell $\tau$, and an integer $N$, the length of the array. The purpose of the function is to create and initialize a tree node for the particles which are passed to it. This includes computing the particles' bounding box, the cell's moments, and the $\sigma$'s. If there are more than $N_0$ particles, then the cell is subdivided and the function is called recursively. The function returns a pointer to the created tree node.

    function create_tree(particles, N)
    begin
        allocate memory for tree node being created
        compute particles' bounding box
        compute center of bounding box
        compute σ_p for p = 0, ..., p_max
        if N > N_0 then
            compute the directions to subdivide the cell
            partition the particles, yielding subarrays of particles
            call create_tree for each subarray of particles
            make each returned tree node a child of τ
        return τ
    end

Once the tree is created, the recursive function compute_influence is called for each target particle to compute the influence of all particles on it. The function accepts for input a target position $x$, a cell $\tau$, and a tolerance tol. The function returns the

influence of the particles in the cell on the target position computed to the specified tolerance. It is initially called with the root cell of the tree $\tau_0$.

    function compute_influence(x, τ, tol)
    begin
        estimate t_0, the time for direct summation with linear model
        compute minimum p to satisfy tolerance
        if p > p_max or time for pth order expansion > t_0 then
            if τ has no children then
                compute and return influence using direct summation
            else
                call compute_influence for each child of τ
                return sum of returned influences
        else
            if τ's pth order particle moments have not been computed yet, then
                compute and store them
            compute the c_k
            compute the a_k from the c_k
            compute and return sum of expansion
    end

For each recursive call of compute_influence to itself, a local tolerance is required. We want the cumulative errors from the child computations to be less than or equal to the tolerance passed to compute_influence. We achieve this by multiplying the parent's tolerance by the ratio of the child's weights and the parent's weights. That is, if $\tau$ is the child cell, and $P(\tau)$ is $\tau$'s parent, then

\[ \mathrm{tol}(\tau) = \frac{\sum_{y_j \in \tau} |w_j|}{\sum_{y_j \in P(\tau)} |w_j|}\, \mathrm{tol}(P(\tau)), \tag{3.73} \]

where $\mathrm{tol}(\tau)$ denotes the local tolerance for the cell $\tau$. Then it follows that the sum of the errors from the child computations is bounded by the tolerance specified for $P(\tau)$. Note that the sums in the ratio are the values of $\sigma_0(\tau)$ and $\sigma_0(P(\tau))$, which were computed in create_tree.

3.8 Complexity Analysis

In this section, we describe the memory and time requirements of the algorithm described above. We first show that the memory required for the algorithm is $O(N)$. The algorithm consists of two stages, tree construction and velocity computation, and we analyze them separately for their time requirements. The number of operations

required for constructing the tree is $O(N \log N)$. We break the velocity computations into two parts, particle-cluster interactions and particle-particle interactions. We present heuristic reasons for why these take $O(N \log N)$ and $O(N)$ operations respectively.

The bounds obtained in this section should be considered as rough guides to how the algorithm performs in practice, as opposed to sharp estimates. We expect that the asymptotic behavior of the bounds matches the algorithm's asymptotic performance, but that the constants involved may be considerably off. In the next chapter, we present data from runs on test cases which demonstrate the algorithm's performance benefit over an algorithm which uses only direct summation. It is our position that these data are more significant than the asymptotic bounds obtained here, since we desire an actual execution time improvement, not just an asymptotically fast algorithm. With that in mind, we proceed with the analysis.

We now discuss the memory requirements for the algorithm, in terms of the parameters $N$, $N_0$, and $p_{\max}$. The memory can be broken down into 2 categories: that required for the particles, and that required for the tree. It is clear that the particles require $O(N)$ words of memory. The data for a single cell that require more than $O(1)$ words are the cell moments and the $\sigma_p(\tau)$, which require $O(p_{\max}^3)$ and $O(p_{\max})$ words respectively. So a single cell uses $O(p_{\max}^3)$ words of memory. We bound the number of cells in the tree by first bounding the number of cells which are parents of leaf cells, and then using that to bound the size of the entire tree. A cell which is the parent of a leaf cell has at least $N_0$ particles. Since every particle is in exactly one such cell, there are at most $N/N_0$ parents of leaf cells. Going down the tree, we see that there are at most $8N/N_0$ leaf cells, since each cell has at most 8 children. Going up the tree, we see that there are at most $N/(2N_0)$ parents of parents of leaf cells, since each cell has at least 2 children, if it has children. Continuing up the tree, looking at parents of parents and so on, yields collections of cells with at most $N/(4N_0)$, $N/(8N_0)$, ... cells respectively. Thus, there are at most

\[ (8 + 1 + 1/2 + 1/4 + \cdots)\, N/N_0 = 10N/N_0 \tag{3.74} \]

cells in the tree. Thus, the memory required for the tree is $O(p_{\max}^3 N/N_0)$. Note that since each leaf cell has fewer than $N_0$ particles in it, there must be more than $N/N_0$ of them. Thus, our bound has the correct asymptotic order.

The tree code algorithm presented here requires more memory than a direct summation program. In the next chapter, when the algorithm is validated, the amount of memory actually used is compared to the memory used by a direct summation program. It is found that the memory required by the tree code algorithm is 1.3 to 1.6 times that required by direct summation.

For certain particle distributions and tolerance values, the algorithm will perform poorly, taking more time than an algorithm which uses only direct summation. However, this behavior has not been observed in our tests. In order for the analysis to reflect the actual performance characteristics of the algorithm, we make a simplifying assumption about the particle distribution. The assumption is that when a cell is subdivided, there is an upper bound on the percentage of particles contained in a subcell. Mathematically, this is stated as

\[ \frac{N_\tau}{N_{P(\tau)}} \le C < 1, \tag{3.75} \]

where $\tau$ is an arbitrary cell, $P(\tau)$ is the parent of $\tau$, and $C$ is a constant independent of $\tau$. Intuitively, this assumption is bounding how inhomogeneous the distribution of particles can be. In our computations, the maximum value of $N_\tau / N_{P(\tau)}$ was computed at each time step and found to be less than 0.5, so the assumption is justified.

For the operation count of the tree construction, we first obtain an upper bound on the number of levels in the tree. The root cell of the tree, $\tau_0$, has $N$ particles. Inequality (3.75) gives an upper bound on the number of particles in a cell in terms of its parent. Applying it iteratively, we find that cells at the $l$th level have fewer than $C^l N$ particles. Now if a cell has fewer than $N_0$ particles, it is not subdivided. This will be guaranteed if $C^l N < N_0$, which is true if $l > \log(N_0/N)/\log C = \log(N/N_0)/\log C^{-1}$. Thus, there are at most $O(\log(N/N_0))$ levels in the tree.

Consider a cell $\tau$ and let $T(\tau)$ be the total number of operations required to construct the tree starting with $\tau$ and including all of $\tau$'s children. In the function create_tree, the steps that require more than $O(1)$ operations are computing the particles' bounding box and computing the $\sigma_p(\tau)$ for $p = 0, \dots, p_{\max}$, which require $O(N_\tau)$ and $O(p_{\max} N_\tau)$ operations respectively. If $N_\tau \le N_0$, then no more operations are performed, so $T(\tau) = O(p_{\max} N_\tau)$. If $N_\tau > N_0$, then the particles are partitioned and create_tree is called for each subcell. The partitioning consists of grouping the particles according to which octant they are located in with respect to the cell's center, a procedure that can be done in $O(N_\tau)$ operations. Then create_tree is called for each subcell, implying

\[ T(\tau) = O(p_{\max} N_\tau) + \sum_{P(\tilde{\tau}) = \tau} T(\tilde{\tau}). \tag{3.76} \]

This equation, derived for $N_\tau > N_0$, is also true for $N_\tau \le N_0$, since the cell has no children and the sum is empty.

Suppose $N_\tau > N_0$ and consider applying (3.76) to itself, expanding each $T(\tilde{\tau})$. The first term of the expansion of $T(\tilde{\tau})$ is $O(p_{\max} N_{\tilde{\tau}})$. Since the $\tilde{\tau}$ are the children

of $\tau$, we have

\[ \sum_{P(\tilde{\tau}) = \tau} N_{\tilde{\tau}} = N_\tau, \tag{3.77} \]

an equality which requires $N_\tau > N_0$. Thus, the first terms of the expansions of the $T(\tilde{\tau})$ sum to $O(p_{\max} N_\tau)$. So if $N_\tau > N_0$, then

\[ T(\tau) = 2\, O(p_{\max} N_\tau) + \sum_{P(P(\tilde{\tau})) = \tau} T(\tilde{\tau}), \tag{3.78} \]

with the last sum potentially being empty. We repeat this process and apply (3.76) recursively to the $T(\tilde{\tau})$. However, it may be the case that not all children of $\tau$ have children, because some children of $\tau$ may have fewer than $N_0$ particles and thus are not subdivided. Thus, the inequality

\[ \sum_{P(P(\tilde{\tau})) = \tau} N_{\tilde{\tau}} \le N_\tau, \tag{3.79} \]

analogous to (3.77), may be strict. So we obtain in general

\[ T(\tau) \le O(l\, p_{\max} N_\tau) + \sum_{P^{(l)}(\tilde{\tau}) = \tau} T(\tilde{\tau}), \qquad l = 1, 2, \dots, \tag{3.80} \]

where $P^{(l)}$ is the parent function $P$ composed with itself $l$ times. The recursion stops when $l$ is larger than the number of levels in the tree below $\tau$, since the sum in (3.80) is empty then. Thus, substituting $\tau = \tau_0$, and using the fact shown above that there are $O(\log(N/N_0))$ levels in the tree, we have

\[ T(\tau_0) \le O(p_{\max} N \log(N/N_0)). \tag{3.81} \]

For the computation of the particle velocities, we do not have an upper bound on the number of operations that are used. The main difficulties arise from the adaptive nature of the algorithm. There is no a priori bound on the ratio of a parent's cell size and a child's cell size. Thus, particle-cluster interactions become significantly more advantageous when a parent cell is subdivided and shrunk, as opposed to the gradual improvement that occurs when only subdividing is performed. Also, the cells on a given level of the tree may have very different sizes. This makes it difficult to consider them together, which would be the natural technique. Tree codes in the past that are not as adaptive as the current one have been shown to take $O(N \log N)$ operations. We believe that our algorithm does as well, based on heuristic considerations and actual execution times.

The heuristic argument is as follows. We would show that $O(N \log N)$ operations are required by showing that each particle takes part in $O(\log N)$ particle-cluster interactions and $O(1)$ particle-particle interactions. Each particle takes part in $O(\log N)$ particle-cluster interactions because it takes part in $O(1)$ of them on each level of the tree and there are $O(\log N)$ levels in the tree. The reason that a particle only takes part in $O(1)$ particle-cluster interactions on a given level is that it does not interact with cells which are sufficiently far away relative to the cell's size, because the particle would have interacted with such cells' parents. One obtains an upper bound on the relative distance to a cell with which a particle-cluster interaction is performed, and the upper bound is independent of the level. If the cells are of the same size on the level, this implies an upper bound on the number of cells satisfying the condition. This is one point where the analysis is heuristic and not rigorous for our algorithm. Thus, there is an upper bound on the number of particle-cluster interactions on a level, which implies an $O(N \log N)$ operation count.

To bound the number of particle-particle interactions, consider how many particles interact with a leaf cell on a particle-particle basis. If one assumes that the number of particles which do so is proportional to the number of particles in the leaf cell, then it follows that

\[ \sum_{\text{leaf cells } \tau} \kappa N_\tau^2 < \kappa N_0 \sum_{\text{leaf cells } \tau} N_\tau < \kappa N_0 N, \tag{3.82} \]

where $\kappa$ is the constant of proportionality. One can show that the particles which interact with a leaf cell on a particle-particle basis are contained in a sphere around the cell whose radius is proportional to the size of the cell. So if the particle density in the sphere is not too different from the density in the cell, then the assumption on the particle count is justified, and the operation count bound is achieved.

So with these heuristic considerations and the rigorous bounds above, we have that the overall operation count for the algorithm is $O(N \log N)$ and the memory usage is $O(N)$.
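To close the chapter, the order selection of Section 3.6 and the tolerance splitting rule (3.73) can be sketched as follows. This is a minimal illustration under assumptions, not the thesis code: it uses the potential-based bound (3.71) with $\sigma_p$ as in (3.72), it takes the candidate orders to start at $p = 0$ (the text does not state the starting index), and the function names are hypothetical.

```python
import math

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def sigma(p, weights, offsets):
    """sigma_p(tau) = sum_j |w_j| |y_j - y~|^p, following (3.72).

    weights: list of vector weights w_j; offsets: list of y_j - y~.
    """
    return sum(_norm(w) * _norm(d) ** p for w, d in zip(weights, offsets))

def minimum_order(sigmas, R, tol):
    """Smallest p with sigmas[p] / (4 pi R^(p+1)) < tol, following (3.71).

    sigmas holds sigma_0 .. sigma_{p_max}; returns None when no admissible
    order meets the tolerance (the caller would then fall back to direct
    summation or recurse into the children).
    """
    for p, s in enumerate(sigmas):
        if s / (4.0 * math.pi * R ** (p + 1)) < tol:
            return p
    return None

def child_tolerance(parent_tol, sigma0_child, sigma0_parent):
    """Local tolerance for a child cell, following (3.73)."""
    return parent_tol * sigma0_child / sigma0_parent
```

Since the children partition the parent's particles, the $\sigma_0$ values of the children add up to that of the parent, so the child tolerances add up to the parent tolerance.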

CHAPTER 4 ALGORITHM VALIDATION AND PERFORMANCE

In this chapter, we present a validation of the algorithm. The algorithm has two different aspects: the vortex method, i.e. the discretization of the vortex sheet model, and the tree-code, which is used to evaluate particle velocities. The topics in the first section are related to the vortex method. We demonstrate the 4th order convergence of the Runge-Kutta method and present results showing convergence as the vortex sheet is refined. The other sections deal with the tree-code. We discuss the selection of the runtime parameters N_0 and p_max and demonstrate the algorithm's accuracy and execution time improvement over direct summation.

The algorithm was implemented in C [29], using double precision arithmetic, and runs were performed on a Silicon Graphics Power Challenge L, a Sun UltraSPARC 2, and a Sun SPARCstation 20. Relevant information about the machines is presented in Table 4.1. The computations which involved timing comparisons were performed on the Silicon Graphics machine.

Table 4.1: Machine characteristics. Columns: Machine, RAM (MB), CPU clock rate (MHz). Rows: Power Challenge L, UltraSPARC, SPARCstation. Entries: [...]

4.1 Convergence of Vortex Method

The purpose of the first test case is to verify the 4th order convergence of the Runge-Kutta method for the solution of the differential equations (2.23). These runs were performed with a program using direct summation. The initial condition was a flat circular vortex sheet of radius 1 with circulation distribution λ_1 = (1 − r^2)^(1/2). Such a sheet rolls up into a vortex ring as described in Chapter 2. The α change of variable employed was λ_1 = cos α, 0 ≤ α ≤ π/2, yielding r = sin α. The smoothing parameter δ was set to 0.1. The sheet was discretized with 64 circular vortex lines, uniformly spaced in α. Each vortex line was discretized with 128(1 + r) particles, where r is the radius of the vortex line, rounding the number of particles up to the nearest multiple of 8. The total number of particles discretizing the sheet was [...]. A profile of the sheet at time t = 1 is depicted in Figure 4.1.

Figure 4.1: Profile of rolling up vortex sheet. t = 1, δ = 0.10.

The computations were performed with different time steps and the results from different runs were compared by computing the maximum distance between particle positions. This comparison could be used because, in these particular runs, no particles or lines were inserted. The values of Δt used were 0.2, 0.1, 0.05, and [...]. The position differences were computed for consecutive values of Δt and are displayed in Table 4.2. The results are consistent with 4th order accuracy.

Table 4.2: Maximum point position differences for circular sheet. t = 1, δ = 0.10, e(Δt) = max_i |x_i(Δt) − x_i(Δt/2)|. Columns: Δt, e(Δt), [...]. Entries: [...]

Recall from Section [...] that points and lines are inserted during a computation to maintain resolution as the vortex sheet is stretched. There are two parameters governing this process, denoted ɛ_1 and ɛ_2. When the distance between two adjacent vortex lines is greater than ɛ_1, a new vortex line is inserted, and if two particles on a vortex line are separated by more than ɛ_2, a new particle is inserted. If either of these parameters is too large, resolution is lost and the computations become inaccurate. Figure 4.2 depicts cross sections of an axisymmetric vortex sheet rolling up into a disk for different values of ɛ_1. As above, the smoothing parameter δ is set to 0.1.
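The time-step comparison summarized in Table 4.2 can be illustrated on a toy problem. The sketch below applies the classical 4th order Runge-Kutta method to the scalar equation y' = −y (a stand-in chosen for illustration, not the vortex sheet equations (2.23)), forms e(Δt) = |y(Δt) − y(Δt/2)| from runs with successively halved steps, and estimates the observed order from the ratio of consecutive differences. The halving sequence of step sizes and all names are assumptions.

```python
import math

def rk4_step(f, t, y, dt):
    """One step of the classical 4th order Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, y0, t_end, dt):
    """Integrate y' = f(t, y) from t = 0 to t_end with fixed step dt."""
    t, y = 0.0, y0
    for _ in range(round(t_end / dt)):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y

def decay(t, y):
    """Toy right-hand side y' = -y (assumed test problem)."""
    return -y

step_sizes = [0.2, 0.1, 0.05, 0.025]          # assumed halving sequence
solutions = [integrate(decay, 1.0, 1.0, dt) for dt in step_sizes]

# e(dt) = |y(dt) - y(dt/2)|, the analogue of the position differences
diffs = [abs(solutions[i] - solutions[i + 1]) for i in range(len(solutions) - 1)]

# For a 4th order method, halving dt should reduce e by a factor near 16.
orders = [math.log2(diffs[i] / diffs[i + 1]) for i in range(len(diffs) - 1)]
for dt, e in zip(step_sizes, diffs):
    print(f"dt = {dt:<6} e(dt) = {e:.3e}")
print("observed orders:", [round(p, 2) for p in orders])
```

An observed order close to 4 is what "consistent with 4th order accuracy" means in practice: each halving of the step reduces the difference between consecutive runs by roughly a factor of 16.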

Figure 4.2: Profile of rolling up vortex sheet. t = 4, δ = 0.10, Δt = 0.05, ɛ_1 = 0.15, 0.10, 0.05, ɛ_2 = 0.05.

The cross sections are shown for t = 4. The time step size Δt was set to 0.05, based on the results of the previous section. The ɛ_1 = 0.05 curve is smooth and no corners are discernible. The ɛ_1 = 0.10 curve is not as well resolved, but the point positions are in good agreement with the better resolved ɛ_1 = 0.05 curve. The same can be said of the ɛ_1 = 0.15 curve, but the loss of resolution in the core of the ring is considerable.

4.2 Selection of Runtime Parameters

In this section, we present the results of runs which were performed to select the runtime parameters N_0, the upper bound on the number of particles in an undivided cell, and p_max, the maximum admissible value of p. As explained previously, if N_0 is small, then memory usage is large because the tree will have many levels. If N_0 is large, then fewer particle-cluster interactions will be possible since there will be fewer cells, resulting in a large execution time. Similarly, if p_max is large, then there will be a large memory requirement because cell moments require O(p_max^3) words of memory. If p_max is small, then fewer particle-cluster interactions will be possible since the tolerance conditions will be satisfied less often, resulting in an increase in execution time.

To find values for these parameters which ensure good performance, runs were performed with N_0 and p_max taking on a range of values, N_0 = [...] in increments of 50, and p_max = 6, 8, 10. Runs were performed with different N values and tolerances to ensure consistent results. The execution times of these runs are presented in Figure 4.3 as a function of N_0. Going up the page, N increases, and going right across the page, the requested tolerance tol decreases. The different line patterns correspond to different values of p_max, as described in the caption.

A few observations can be made from this data. First, though there are trends in the execution time as a function of N_0, the timings never vary by more than a few percent for fixed N and tolerance. The dependence on p_max is similar, although for the smaller tolerances the difference in the execution time is larger. In particular, for the smallest tolerance, tol = 10^-4, the p_max = 6 times are 15 to 20 percent larger than the p_max = 8 and 10 times, which are nearly identical. In the roll-up simulations in Chapter 5, we use tol = 10^-3 for accuracy. For this tolerance, p_max = 8 consistently has smaller execution times, so that is our choice of p_max. We postpone selection of N_0 until after we discuss memory usage.

The memory used for these runs, in megabytes, is presented in Figure 4.4 as a function of N_0. As in Figure 4.3, N increases going up the page. The amount of memory used by the algorithm is independent of the requested tolerance, so there is only one column. As N_0 increases, the memory usage decreases and levels out. As expected, for larger values of p_max, the memory usage is larger.
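The O(p_max^3) growth of the memory needed for cell moments can be made concrete by counting Taylor coefficients. In three dimensions the number of multi-indices (k1, k2, k3) with k1 + k2 + k3 ≤ p is (p + 1)(p + 2)(p + 3)/6. The sketch below evaluates this count for the three values of p_max tested and converts it to bytes per cell, assuming one double precision word per coefficient for each of three vector components; that storage layout is an assumption made for illustration, not a description of the actual data structure.

```python
def num_taylor_coeffs(p):
    """Number of multi-indices (k1, k2, k3) with k1 + k2 + k3 <= p."""
    return (p + 1) * (p + 2) * (p + 3) // 6

def count_by_enumeration(p):
    """Brute-force count of the same multi-indices, as a cross-check."""
    return sum(1
               for k1 in range(p + 1)
               for k2 in range(p + 1 - k1)
               for k3 in range(p + 1 - k1 - k2))

BYTES_PER_WORD = 8   # double precision
COMPONENTS = 3       # assumed: one set of moments per vector component

for p_max in (6, 8, 10):
    n = num_taylor_coeffs(p_max)
    assert n == count_by_enumeration(p_max)
    print(p_max, n, n * COMPONENTS * BYTES_PER_WORD, "bytes per cell")
```

Under these assumptions the counts are 84, 165, and 286 coefficients for p_max = 6, 8, and 10, so going from p_max = 6 to p_max = 10 more than triples the moment storage per cell.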
From execution time and memory considerations, we use the value 512 for N_0. This means that we are potentially performing particle-particle interactions with cells that contain about 500 particles. Though this value may seem intuitively large, it is justified by the numerical data in Figures 4.3 and 4.4. If much smaller values of N_0 are used, then both execution time and memory usage are larger.

4.3 Algorithm Performance

In this section, we compare the tree code's performance to direct summation. Execution time and memory usage as functions of N are compared for different tolerances. These comparisons are based on evaluating the velocity at points on a surface which approximates a rolled up vortex sheet; no time evolution is performed. In Figures 4.5 and 4.6, the independent variable N, the total number of particles, was varied by changing the refinement in λ_1 (i.e. α).
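For reference, direct summation means evaluating every pairwise contribution, which costs O(N^2) operations. The sketch below shows the idea for a regularized Biot-Savart sum of the form u(x_i) = (1/4π) Σ_j w_j × (x_i − x_j) / (|x_i − x_j|^2 + δ^2)^(3/2). This kernel form, the vector weights w_j, and the function names are assumptions made for illustration; the kernel actually used is the smoothed kernel defined in Chapter 2.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def direct_sum_velocity(targets, sources, weights, delta):
    """O(N^2) evaluation of an assumed regularized Biot-Savart sum."""
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for y, w in zip(sources, weights):
            r = (x[0] - y[0], x[1] - y[1], x[2] - y[2])
            r2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
            denom = 4.0 * math.pi * (r2 + delta ** 2) ** 1.5
            c = cross(w, r)
            for d in range(3):
                u[d] += c[d] / denom
        velocities.append(tuple(u))
    return velocities

# One vortex element at the origin pointing along z, evaluated at (1, 0, 0).
u = direct_sum_velocity([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)],
                        [(0.0, 0.0, 1.0)], delta=0.1)[0]
print(u)
```

With δ > 0 the self-interaction term is finite and vanishes, since the numerator is zero when x_i = x_j. The double loop is what a tree-code replaces with particle-cluster interactions for well-separated cells.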

Figure 4.3: Execution time (sec.) vs. N_0, for p_max = 6, 8, 10. Panels are arranged by N (rows, including N = 51276) and by requested tolerance tol (columns).

Figure 4.4: Memory usage (MB) vs. N_0, for p_max = 6, 8, 10. Panels are arranged by N (including N = 51276).

Figure 4.5: Execution time (sec.) vs. N. p_max = 8. Curves for tol = 10^-2, 10^-3, 10^-4 and for direct summation; actual data (o), projected data (x). (a) Execution time, (b) Direct summation time / fast algorithm time.

Figure 4.5 displays the execution time. The different line patterns represent different requested tolerances, as described in the caption. Figure 4.5a presents the execution times in seconds and Figure 4.5b shows the ratio between the direct summation time and the tree code's time. In our roll-up computations in Chapter 5, we use tol = 10^-3, which corresponds to the dashed line. With this tolerance, the new algorithm is faster than direct summation by a factor of 10 when there are 100,000 particles, and this factor increases with N. The factor of improvement appears to be increasing at a rate which is slightly less than linear. This is the expected behavior for an algorithm which requires O(N log N) operations, since N^2/(N log N) = N/log N.

Figure 4.6 displays the memory used by the programs. Figure 4.6a presents the usage in megabytes and Figure 4.6b shows the factor of increase, i.e. the ratio between the new algorithm's usage and the direct summation usage. As noted in Section 3.8, the factor of increase over the direct summation algorithm is between 1.3 and 1.6.

Figure 4.6: Memory usage (MB) vs. N. p_max = 8. Curves for the fast algorithm and for direct summation; actual data (o), projected data (x). (a) Memory usage, (b) Fast algorithm memory usage / direct summation memory usage.

The actual error in the computed value of the particle velocities, which is due to series truncation, is less than the specified tolerance. The disparity is due to the application of the triangle inequality in the error estimates in Section 3.6. Figure 4.7 displays the actual error as a function of the specified tolerance. Recall from Section 3.6 that there are two different error bounds, one on the velocity potential (3.71) and one on the velocity (3.69). The figure contains data for programs which determine p using these bounds for different values of N, plotted with solid and dashed lines as described in the caption. It was stated in Section 3.6 that if the choice of p is based on the velocity error bound, then the actual error in the velocity is several orders of magnitude smaller than the requested tolerance, which is clearly demonstrated by the figure. The actual error is also smaller when the potential error bound is used, but by a smaller margin. Note that the actual error is not sensitive to changes in N.

Figure 4.8 depicts the execution time as a function of the actual error, using the two different error bounds. The plotted lines correspond to the requested tolerances tol = 10^-2, 10^-3, 10^-4 for a fixed value of N, going up the plot as N increases. A conclusion that can be drawn from the figure is that, for the same number of points, the potential error bound requires less time than the velocity error bound to obtain a given actual error. This observation, and the closer match between requested tolerance and actual error, are the reasons that we use the potential bound.
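The slightly sublinear growth of the speedup noted for Figure 4.5b can be sketched by scaling a measured reference value with N / log N. The reference point used below, a factor of 10 at N = 100,000 for tol = 10^-3, is the one quoted in the text; the extrapolation itself assumes that direct summation scales exactly as N^2 and the tree-code exactly as N log N with fixed constants, which is only a rough model.

```python
import math

def projected_speedup(n, n_ref, speedup_ref):
    """Scale a reference speedup assuming direct ~ N^2 and tree-code ~ N log N."""
    return speedup_ref * (n / math.log(n)) / (n_ref / math.log(n_ref))

N_REF = 100_000       # particle count at which the speedup was reported
SPEEDUP_REF = 10.0    # reported speedup for tol = 10^-3

for n in (100_000, 200_000, 400_000):
    print(n, round(projected_speedup(n, N_REF, SPEEDUP_REF), 1))
```

Doubling N raises the projected factor by somewhat less than 2, because of the logarithm in the denominator.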

Figure 4.7: Actual velocity error vs. specified tolerance. p_max = 8, N_0 = 500, N = 6284, 12708, 25572, 38444, [...]. Line patterns distinguish the potential error bound and the velocity error bound.

Figure 4.8: Execution time (sec.) vs. actual velocity error. p_max = 8, N_0 = 500, N = 6284, 12708, 25572, 38444, [...]. Connected lines are tol = 10^-2, 10^-3, 10^-4. Line patterns distinguish the potential error bound and the velocity error bound.

CHAPTER 5 APPLICATIONS

In this chapter, the results of simulations performed using our algorithm are presented. In all of the computations here, unless mentioned otherwise, the requested tolerance was tol = 10^-3, and the runtime parameters for the algorithm were N_0 = 512 and p_max = 8. The smoothing factor was δ = 0.10.

5.1 Vortex Ring with Azimuthal Perturbation

This section presents the results of simulations of a perturbed rolling-up vortex sheet. An azimuthal instability was introduced to the sheet by perturbing a flat circular disk. In polar coordinates, the perturbation is of the form

$$p(r, \theta) = \rho\, r^2 \cos(k\theta)\, e_z, \tag{5.1}$$

where k is the perturbation wavenumber and ρ is the magnitude of the perturbation. The r^2 factor is present to smooth the perturbation at the origin. The perturbation may also be considered as a function of α and θ, its initial magnitude being proportional to sin^2 α, since r = sin α.

After the sheet rolls up, the radius of the ring, i.e. the position of the core, is approximately 0.8, as seen in Figure 4.2. Recall from the linear stability analysis of Section 2.3.2 that the stability of a vortex filament with respect to a perturbation with wavenumber k depends only on k and δ/R. For δ = 0.10 and R = 0.8, δ/R = 0.125. From Figure 2.8, a vortex filament with δ/R = 0.12 has an unstable mode for k = 9. However, the presence of the rolls, which are larger than δ, presumably has the effect of spreading the vorticity out farther from the core. This is analogous to increasing δ, which lowers the wavenumber of the unstable mode. With this in mind, simulations were performed with wavenumbers k ranging from 4 to 11. The time step used was Δt = 0.10, and the point insertion parameters ɛ_1 and ɛ_2 were [...] and 0.05 respectively. The value of ρ, the magnitude of the perturbation at the edge of the disk, was [...].

Figure 5.1 shows a measure of the variance of the rings as a function of time. The quantity plotted was obtained as follows. Each value of α corresponds to a filament, which in our computations is perturbed from being circular. The average radius and z position of the filament are computed. For each value of α, we compute the L2 distance from the filament to the circle whose radius and z position are the averages just computed. The quantity plotted in Figure 5.1 is the L2 norm of this distance as a function of λ_1. The figure shows that the perturbation for the k = 4 and 5 modes does not grow much. For the larger wavenumbers, the disturbance has more growth, peaking with the k = 10 perturbation.

To visualize the sheets, we plot the sheet positions for the k = 5 and 9 simulations. These two values of k are representative of the behavior observed for other k values. The position of the sheet for the wavenumber k = 5 at times t = 0, 2, 4, 6 is shown in Figure 5.2. One can see from these images that the sheet is rolling up smoothly, the perturbation having only a marginal effect on the evolution. This is in contrast to the images in Figure 5.3, which shows the vortex sheet for the k = 9 simulation. In this simulation, and in the other high wavenumber simulations, the outer turns of the sheet are smooth, but the core is becoming highly distorted.

A depiction of the ring's core, for k = 5 and 9, is presented in Figure 5.4. The curves plotted are the filaments that correspond to α > 0.8. Initially, these filaments were near the outer portion of the disk. The distortion in the core for the k = 9 simulation, as compared to the k = 5 simulation, is clearly evident here. The bulging behavior of the sheet around the waves is consistent with the simulations of Knio and Ghoniem [32] and the experiments of Didden [19].
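The per-filament deviation behind Figure 5.1 can be sketched as follows. The code builds one filament of the initially perturbed disk from (5.1), computes its average radius and average z position, and then takes a discrete L2 distance from the filament to the circle with those averages. The ring axis is assumed to be the z-axis, the L2 norm is normalized as a root mean square over the points (a normalization chosen here, not stated in the text), and the value of ρ and the function names are hypothetical.

```python
import math

def perturbed_filament(r, rho, k, n_points):
    """Points on a circle of radius r displaced in z by rho * r^2 * cos(k * theta)."""
    points = []
    for j in range(n_points):
        theta = 2.0 * math.pi * j / n_points
        z = rho * r * r * math.cos(k * theta)
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def filament_deviation(points):
    """Discrete L2 (root mean square) distance to the average circle."""
    radii = [math.hypot(x, y) for x, y, _ in points]
    r_mean = sum(radii) / len(points)
    z_mean = sum(z for _, _, z in points) / len(points)
    squares = [(r - r_mean) ** 2 + (p[2] - z_mean) ** 2
               for r, p in zip(radii, points)]
    return math.sqrt(sum(squares) / len(points))

RHO = 0.01   # hypothetical perturbation magnitude
filament = perturbed_filament(r=0.8, rho=RHO, k=5, n_points=256)
print(filament_deviation(filament))
```

For this initial filament the deviation equals ρ r^2 / √2, since the mean of cos^2(kθ) over the sampled points is 1/2. Combining the per-filament values into a single number, as done for Figure 5.1, would require a second norm over the filaments.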
The bulges are also similar to the deformations found by Meiburg, Lasheras, and Martin [44] in their study of azimuthal perturbations to a jet, which was based upon experiments and numerical simulations.

It should be noted that the surfaces plotted in Figures 5.2 and 5.3, and the surface plots which appear later in this chapter, are the surfaces formed by the material curves which coincided with the vortex lines of the sheets at t = 0. However, since we are using a smoothed Biot-Savart kernel, they are not the actual vortex lines for t > 0.

Figure 5.1: Variance of perturbed vortex sheet. δ = 0.10, ρ = [...]. k: wavenumber of perturbation, t: time.

Figure 5.2: Perturbed vortex sheet. k = 5. δ = 0.10, t = 0, 2, 4, 6.

Figure 5.3: Perturbed vortex sheet. k = 9. δ = 0.10, t = 0, 2, 4, 6.

Figure 5.4: Core of perturbed vortex sheet. k = 5, 9. δ = 0.10, t = 0, 2, 4, 6.

5.2 Elliptical Vortex Ring

In this section, results from simulations of an elliptical vortex ring are presented. The computations are similar to those of Dhanak and de Bernardinis [18] and Fernandez et al. [23]. The model used for the formation of an elliptical vortex ring is to give an impulse to an elliptical disk and then to dissolve the disk away. As with a circular disk, a free vortex sheet remains and rolls up into a vortex ring. Following Dhanak and de Bernardinis [18], the circulation distribution for an elliptical disk is taken to be

$$\lambda_1 = \sqrt{1 - \frac{x^2}{a^2} - \frac{y^2}{b^2}}, \tag{5.2}$$

where the disk is the region

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1. \tag{5.3}$$

The vortex filaments are ellipses with the same eccentricity as the elliptical disk. As before, the α change of variable used is λ_1 = cos α, leading to λ_1'(α) = −sin α. Simulations were performed for disks with different eccentricities, which were controlled by setting b = 1 and allowing a < 1 to vary. The ratio of the minor axis length to the major axis length is a and the eccentricity is (1 − a^2)^(1/2). We present results for a = 0.8, 0.6, 0.5. The insertion parameters ɛ_1 and ɛ_2 were both set to [...]. The time step used was Δt = [...].

For values of a close to 1, an elliptical ring may be considered as a small perturbation of a circular ring with wavenumber 2. From the linear stability analysis in Section 2.3.2, we expect the perturbation to oscillate with constant magnitude. This behavior is exhibited by the a = 0.8 computation, which is presented in Figure 5.5. Initially, the disk is narrower in the direction coming out of and to the right of the page. Thus, the filaments running along the front-right edge are stretched in comparison to the rest of the disk. This intensifies the vorticity, which is why the outer turns have wrapped up and around more along this edge and its opposite edge. However, the difference is not enough to disturb the core, which is rolling up smoothly.

The a = 0.6 and 0.5 computations are presented in Figures 5.6 and 5.7 respectively. The orientation of these disks is the same as for the a = 0.8 disk. In the regions where the fluid is moving most rapidly around the edge of the disk, the front-right and back-left, the fluid is forced up over the disk towards the center. As the fluid from either side approaches the center, it is forced up and away from the disk. This is the cause of the protruding spikes on the sheets. The presence of these structures makes it difficult to study the sheet's motion.
This is because, as the sheet stretches to form the peaks, additional filaments are inserted, which increases the execution time. The a = 0.5 computation was started with under 7500 particles, and at time t = 6 it has 84,000 particles.

Figure 5.5: Elliptical vortex sheet. a = 0.8. δ = 0.10, t = 0, 2, 4, 6.

Figure 5.6: Elliptical vortex sheet. a = 0.6. δ = 0.10, t = 0, 2, 4, 6.

Figure 5.7: Elliptical vortex sheet. a = 0.5. δ = 0.10, t = 0, 2, 4, 6.

5.3 Colliding Vortex Rings

In this section, results of a simulation of obliquely colliding vortex rings are presented. The configuration of vortex rings is based on experiments performed by Schatzle [55]. In our computations, the rings are inclined from horizontal by 30 degrees. The centers of the initial circular vortex sheets are located at (±1, 0, 0). An adaptive time-step procedure was used, with an initial Δt = 0.10, although the time steps never went below [...]. The point insertion parameters ɛ_1 and ɛ_2 were [...] and 0.05 respectively.

Figure 5.8 shows the vortex sheets which represent the colliding vortex rings. Figure 5.9 shows a cut-away of the same view, enabling one to see the rolling up structure which is present. In the region where the rings have merged, the windings of the sheet are flattened up against each other and are being pushed down. Because of this stretching, a large number of filaments and particles are inserted into this region, even though the vorticity amplitude is relatively low, as shown by the vorticity isosurfaces in the next figures. At time t = 0, there were [...] particles representing the disks, and at time t = 4.5, the latest time in our runs, there were [...] particles. For this number of particles, we estimate that our fast algorithm is performing the computations 60 times faster than direct summation. Even with the fast algorithm, the computation took 32 hours to go from t = 4 to t = 4.5, so a direct summation algorithm would take months.

Figures 5.10 through 5.13 show isosurfaces of the vorticity field, computed by differentiating the integral (2.25) and evaluating it for positions x on a regular grid. The values chosen for the isosurfaces are one-third and two-thirds of the maximum initial computed vorticity. Each figure shows the rings from a different viewpoint for the time sequence t = 0, 1, 2, 3, 4, 4.5. The first view is a perspective view with shading on the surfaces, and the others are orthogonal projections. As the rings approach, they initially pinch, and then they merge and this region flattens out. The connection region then begins to stretch out.
This is in qualitative agreement with Schatzle's experiment and the computations of Anderson and Greengard [1]. In Schatzle's experiments, the connection region disconnects, and there is another connection and subsequent disconnection which occurs at the bottom of the rings. Because of the stretching and reconnection of vorticity, it is an open question whether or not a vortex filament model can capture these later stages of the evolution. Our simulations appear to have effectively captured the merger of the rings. However, due to the large computational time, we were not able to explore the parameter space. For instance, it would be of interest to know how the ring merger depends upon the angle of inclination. We are also interested in knowing what happens when δ → 0.
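The remark in Section 5.3 that direct summation "would take months" for the last interval of the colliding-rings run follows from simple arithmetic on the two figures quoted there. The sketch below assumes that the estimated factor of 60 applies uniformly over the interval from t = 4 to t = 4.5; the variable names are illustrative.

```python
FAST_HOURS = 32.0   # reported tree-code time to advance from t = 4 to t = 4.5
SPEEDUP = 60.0      # estimated speedup over direct summation at this N

direct_hours = FAST_HOURS * SPEEDUP   # projected direct summation time in hours
direct_days = direct_hours / 24.0     # the same time expressed in days
print(direct_hours, direct_days)
```

Under this assumption the projected direct summation time for that half unit of simulated time is 1920 hours, i.e. 80 days.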

Figure 5.8: Vortex sheets modeling colliding disks. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

Figure 5.9: Cut-away of vortex sheets modeling colliding disks. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

Figure 5.10: Vorticity isosurfaces of colliding vortex rings, perspective view. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

Figure 5.11: Vorticity isosurfaces of colliding vortex rings, front view. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

Figure 5.12: Vorticity isosurfaces of colliding vortex rings, side view. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

Figure 5.13: Vorticity isosurfaces of colliding vortex rings, top view. δ = 0.10, t = 0, 1, 2, 3, 4, 4.5.

LONG-TIME SIMULATIONS OF VORTEX SHEET DYNAMICS BY DANIEL SOLANO 1, LING XU 2, AND SILAS ALBEN 2 1 Rutgers University, New Brunswick, NJ 891, USA. 2 University of Michigan, Ann Arbor, MI 4819, USA. Abstract.