COMPUTATIONAL EFFICIENCY IMPROVEMENT ON THE IMPLICIT FINITE DIFFERENCE TIME DOMAIN METHOD



A thesis submitted to the University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

2009

By Philip Clapham
School of Computer Science

Contents

Abstract
Declaration
Copyright
1 Introduction to Computation in Electromagnetics
  1.1 High Performance Computation
  1.2 Maxwell's Equations
2 Introduction to FD-FDTD
  2.1 FDTD
  2.2 FD-FDTD
3 Introduction to Locally One Dimensional Techniques in FDTD
  3.1 Implicit FDTD
  3.2 Crank-Nicolson based schemes
    3.2.1 Crank-Nicolson in general
    3.2.2 ADI-FDTD
    3.2.3 LOD-FDTD
4 Statement of Aims and Objectives
  4.1 Objectives
5 Literature Review
  5.1 Improving the Efficiency of CN
    5.1.1 Pipelined SOR
    5.1.2 Domain Decomposition
  5.2 Improving the Efficiency of LOD
6 Methodology
  6.1 Programming Methodology
7 Planning
Bibliography
A Derivation of FD-LOD-FDTD Equations

List of Tables

5.1 Execution Timeline for Pipelined SOR
7.1 Gantt Chart for Project Completion

List of Figures

2.1 Yee Algorithm Field Components
5.1 Domain Decomposition communication

Abstract

This is the feasibility report for the MSc dissertation to be undertaken in the coming months. The aim of the project is to parallelise the Frequency Dependent Locally One Dimensional Finite Difference Time Domain (FD-LOD-FDTD) algorithm, for which serial code has already been produced, so as to reduce memory usage and execution time on the intended platform, the EUGrid.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns any copyright in it (the "Copyright") and s/he has given The University of Manchester the right to use such Copyright for any administrative, promotional, educational and/or teaching purposes.

ii. Copies of this thesis, either in full or in extracts, may be made only in accordance with the regulations of the John Rylands University Library of Manchester. Details of these regulations may be obtained from the Librarian. This page must form part of any such copies made.

iii. The ownership of any patents, designs, trade marks and any and all other intellectual property rights except for the Copyright (the "Intellectual Property Rights") and any reproductions of copyright works, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property Rights and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property Rights and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and exploitation of this thesis, the Copyright and any Intellectual Property Rights and/or Reproductions described in it may take place is available from the Head of the School of Computer Science (or the Vice-President).

Chapter 1

Introduction to Computation in Electromagnetics

1.1 High Performance Computation

In the current scientific climate, computation is an enabler for a vast number of applications. As theories become more complex, more computation is required to express them as simulated experiments. As the performance of computers increases, so do the expectations of the scientific community that those experiments will complete in ever shorter times. The field of high performance computing evolved for these reasons, and has explored various avenues of advancement in order to satisfy the demands placed upon it.

One direction seeing rapid development is massively parallel computation, in which multiple threads or processes execute simultaneously. These computations may take place in a single computer, on the now prevalent multi-core processors, or across many such computers connected together to provide a unified execution space.

There are different models for how this computation is carried out. In the shared memory paradigm, all the threads of execution address the same memory to perform their part of the computation; this paradigm is supported by APIs such as OpenMP. In the message passing paradigm, each process runs separately, has its own memory, and exchanges information with other processes only by sending messages. The two paradigms have different uses, and each is better suited to particular kinds of high performance computation.
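The difference between the two paradigms can be sketched in a few lines. The following is only an illustration: Python threads inside one process stand in for both models, the function names are hypothetical, and a real message passing code would run separate processes that communicate through a message passing library rather than an in-process queue.

```python
import queue
import threading


def shared_memory_sum(values, n_workers):
    """Shared memory style: every worker writes into one shared list."""
    partial = [0] * n_workers                      # memory visible to all threads
    chunk = (len(values) + n_workers - 1) // n_workers

    def work(rank):
        partial[rank] = sum(values[rank * chunk:(rank + 1) * chunk])

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(values, n_workers):
    """Message passing style: private data, results sent as messages."""
    inbox = queue.Queue()                          # stands in for the interconnect
    chunk = (len(values) + n_workers - 1) // n_workers

    def work(rank, private_data):
        inbox.put((rank, sum(private_data)))       # explicit send

    threads = []
    for r in range(n_workers):
        private = list(values[r * chunk:(r + 1) * chunk])   # private copy
        threads.append(threading.Thread(target=work, args=(r, private)))
    for t in threads:
        t.start()
    total = 0
    for _ in range(n_workers):
        _, part = inbox.get()                      # explicit receive
        total += part
    for t in threads:
        t.join()
    return total
```

In the first function the workers never communicate explicitly; in the second, nothing is shared except the channel, which is the property that lets a message passing code run on machines with little memory per core.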

1.2 Maxwell's Equations

The set of four partial differential equations known collectively as Maxwell's Equations fully describes the characteristics and interactions of electric and magnetic fields. These equations can be used to simulate the propagation of electromagnetic waves, including light, through continuous time and space, through different media, or in the frequency domain.

The Finite Difference Time Domain method (FDTD) is the most widely used family of computational electromagnetics algorithms. It directly solves Maxwell's curl equations with a minimal set of assumptions, which makes it a robust and straightforward method. It is also capable of simulating a broadband signal in a single run, which is a major attraction in comparison to other methods. This is possible because FDTD works in the time domain, whereas techniques such as the Method of Moments or Finite Element Analysis work in the frequency domain and solve each frequency separately.

The solution proceeds by solving for the electric and then the magnetic fields, one after the other in a leapfrog fashion, allowing the fields to evolve as they would in a real experiment. The scientist can therefore watch the fields progress as the experiment continues, which gives insight into the experimental conditions.

Chapter 2

Introduction to FD-FDTD

2.1 FDTD

Finite Difference Time Domain (FDTD) methods discretise the partial differential Maxwell Equations to make them suitable for solution on a computer. The abstraction used is the Yee algorithm, which solves for both the electric and magnetic fields in time and space. Fig. 2.1, taken from [3], shows how the x, y and z components of the electric field (E) and magnetic field (H) are arranged around a particular point in space. Calculations for any of these points use values from the neighbouring points, as well as constants set throughout the simulation for boundary conditions and other factors. The simulation space is represented as a collection of these cells.

Another important feature of the Yee algorithm is the order in which the calculations are carried out. In the leapfrog arrangement, as it is known, all the E components are calculated for a particular time $t$, then the H components are calculated for the time $t + \Delta t/2$, where $\Delta t$ is the time increment being used. In each case the latest available values are used, so the calculation of H at $t + \Delta t/2$ uses the values of E at time $t$.

FDTD is an attractive method for simulating Ultra-Wideband (UWB) environments because it works in the time domain, which means that it can simulate a wide range of frequencies in a single simulation for a constant medium. The disadvantages of FDTD grow with the accuracy required, because a more accurate simulation requires a finer grid, which means a large increase in memory and computational resources.
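The leapfrog ordering can be illustrated with a one-dimensional sketch. This is not the project code: it assumes free space, normalised field units, a single E component and a single H component, fixed (reflecting) ends, and a hypothetical Gaussian source; `courant` is the ratio of the distance a wave travels in one time step to the cell size.

```python
import math


def run_fdtd_1d(n_cells=200, n_steps=150, courant=0.5):
    """1D Yee leapfrog in normalised units; returns the final E and H arrays."""
    ex = [0.0] * n_cells      # E samples at integer grid points, time t
    hy = [0.0] * n_cells      # H samples half a cell along, time t + dt/2
    src = n_cells // 2
    for n in range(n_steps):
        # E update: uses the most recent H values
        for k in range(1, n_cells):
            ex[k] += courant * (hy[k - 1] - hy[k])
        # additive (soft) Gaussian source, peaking at step 30
        ex[src] += math.exp(-0.5 * ((n - 30) / 8.0) ** 2)
        # H update half a step later: uses the E values just computed
        for k in range(n_cells - 1):
            hy[k] += courant * (ex[k] - ex[k + 1])
    return ex, hy
```

Each pass through the outer loop is one time step; E is never updated from stale H values or vice versa, which is the property the leapfrog arrangement guarantees. With `courant` above 1 this explicit scheme becomes unstable in one dimension.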

Figure 2.1: Yee Algorithm Field Components

2.2 FD-FDTD

FDTD does have more serious disadvantages, however, relating to its treatment of permittivity (ε) and permeability (µ). Permittivity is the capacity of a material to transmit an electric field, and permeability is the degree of magnetisation of a material in response to a magnetic field, so both are very important characteristics for this simulation. In the normal FDTD calculations, ε and µ are treated as constants for the material. This is an unacceptable abstraction because of the very nature of ε and µ. For instance, a transmission in the visible light band cannot penetrate a brick, whereas a signal at a different frequency, for instance radio, can penetrate it with little difficulty. In other words, ε and µ change with the frequency of the signal they interact with.

FD-FDTD is an extension to FDTD which incorporates a Frequency Dependent component into the equations. Derivations of both the normal FDTD and FD-FDTD can be found in [5], and the application of Frequency Dependent components to Maxwell's equations can be found in Appendix A. As the range of frequencies in a signal increases, so does the difficulty of simulating it with plain FDTD, and FD-FDTD becomes a much more reasonable choice.

Another consideration is that for FD-FDTD to simulate complex environments such as the human body, a very small grid size is required, and with it a small time step, so that the explicit computation remains stable and the numerical results are reasonably accurate [5]. The finer grid and the shorter time step vastly increase the amount of computation necessary, and this is where a parallel, high performance implementation of the algorithm begins to hold particular interest.
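To make the frequency dependence concrete, the sketch below evaluates a two-pole Debye relative permittivity with a conductivity term, which is the form used for Eq. (A.3) in Appendix A as read here. The function name and the parameter values in the usage note are illustrative assumptions, not measured material data.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m


def debye_relative_permittivity(freq_hz, eps_inf, eps_s, eps_m,
                                tau_d, tau_2, sigma=0.0):
    """Complex relative permittivity of a two-pole Debye medium.

    freq_hz must be positive when sigma is non-zero, because the
    conductivity term divides by the angular frequency.
    """
    jw = 1j * 2.0 * math.pi * freq_hz
    eps_r = (eps_inf
             + (eps_s - eps_m) / (1.0 + jw * tau_d)     # first pole
             + (eps_m - eps_inf) / (1.0 + jw * tau_2))  # second pole
    if sigma:
        eps_r += sigma / (jw * EPS0)                    # conduction loss
    return eps_r
```

With hypothetical values `eps_inf=4`, `eps_s=50`, `eps_m=10`, `tau_d=1e-10` and `tau_2=1e-12`, the real part falls from the static value `eps_s` at low frequency towards `eps_inf` at very high frequency, which is exactly the variation a constant-ε FDTD model cannot represent.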

Chapter 3

Introduction to Locally One Dimensional Techniques in FDTD

3.1 Implicit FDTD

The FD-FDTD method is a good way to simulate UWB signal propagation through materials where the frequency of the signal affects the conditions it experiences. Problems arise when simulating high frequencies or fine geometries, which require a fine grid and therefore a small time step to keep the simulation stable, at the cost of vastly increased computation time. This limit on the time step is known as the Courant-Friedrichs-Lewy (CFL) condition.

Implicit methods were introduced to address this shortfall. They take the explicit FD-FDTD equations and decouple the spatial step from the time step by discretising the time step implicitly. One such method, the Alternating Direction Implicit method (ADI-FDTD) [6], a member of the Crank-Nicolson family, removes the need to satisfy the CFL condition that causes the large amount of computation in FD-FDTD, but at the expense of additional truncation error that cannot easily be controlled, resulting in larger numerical errors. Implicit schemes such as ADI-FDTD and Crank-Nicolson FDTD can take larger time steps because they do more work at each step: instead of explicitly evaluating each variable in terms of previous values, a system of simultaneous equations is solved to obtain the solution.

Appendix A contains the application of Maxwell's Equations that produces the set of equations worked with in this project. This specific set of equations could have been produced differently, for instance if the Crank-Nicolson algorithm were used instead of LOD. The following sections therefore establish where the different methods branch from the detailed derivation in Appendix A.

3.2 Crank-Nicolson based schemes

3.2.1 Crank-Nicolson in general

The Crank-Nicolson scheme is the basis of the implicit finite difference methods for solving partial differential equations that are discussed here. The method was proposed by John Crank and Phyllis Nicolson [7], and it takes the average of the forward Euler and backward Euler finite difference approximations to produce the following method:

$$\frac{\phi_i^{n+1} - \phi_i^{n}}{\Delta t} = \frac{1}{2}\left( \frac{\phi_{i+1}^{n+1} - 2\phi_i^{n+1} + \phi_{i-1}^{n+1}}{(\Delta x)^2} + \frac{\phi_{i+1}^{n} - 2\phi_i^{n} + \phi_{i-1}^{n}}{(\Delta x)^2} \right) \qquad (3.1)$$

The Crank-Nicolson scheme calculates Eq. (A.15) directly. This involves computation with a very large sparse matrix, and handling this matrix takes up most of the computational time and memory. For this reason, an alternative method is being used for this simulation.

Research applying CN to FDTD has been carried out by multiple research groups [8] [9], showing that CN-FDTD has better accuracy than ADI-FDTD. The method has been shown to be unconditionally stable in 3 dimensions even when the time step is 20 times larger than the CFL limit would allow for the original explicit FDTD equations [8].

3.2.2 ADI-FDTD

The Alternating Direction Implicit method makes use of tridiagonal matrices instead of the large sparse matrices of the Crank-Nicolson method. In this method Eq. (A.15) is approximately factorised as

$$\phi^{n+1} = \frac{(1 + A)(1 + B)}{(1 - A)(1 - B)}\,\phi^{n} \qquad (3.2)$$

where $(1 + A)(1 + B) \approx 1 + \frac{\Delta t\,Q}{2P}$.

This is quite different from the CN method shown above. The computation is then performed using the following 2 sub-steps:

$$\phi^{n+\frac{1}{2}} = \frac{1 + A}{1 - B}\,\phi^{n} \qquad (3.3)$$

$$\phi^{n+1} = \frac{1 + B}{1 - A}\,\phi^{n+\frac{1}{2}} \qquad (3.4)$$

Each sub-step requires only tridiagonal systems to be solved, but parallelising this scheme is nontrivial because of the interdependencies between the field values of the different directions.

3.2.3 LOD-FDTD

The derivation of FD-LOD-FDTD is presented in Appendix A. The general idea behind this scheme is to split the large implicit equations into a simple 3 step algorithm, which is shown in Eq. (A.17), Eq. (A.18) and Eq. (A.19). This has great benefits in terms of parallelisation, because each direction (x, y and z) can be calculated independently in each step. This can be seen in Eq. (A.20): the first 3 equations calculate the x, y and z electric field components and make no reference to each other, the next 3 calculate the magnetic field components and also make no reference to each other, and so on. This makes the scheme very attractive computationally, while it retains the same independence from the CFL limit.
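As an illustration of why tridiagonal systems matter to these schemes, the sketch below applies Eq. (3.1) to a one-dimensional problem with fixed end values and solves the resulting tridiagonal system with the Thomas algorithm. It is a generic textbook example under those assumptions, not the FD-LOD-FDTD update itself; the function names and the ratio `r` (the time step divided by the square of the grid spacing) are introduced here for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson_step(phi, r):
    """Advance phi by one step of Eq. (3.1); phi[0] and phi[-1] stay fixed."""
    m = len(phi) - 2                           # number of interior unknowns
    lower = [-r / 2.0] * m
    diag = [1.0 + r] * m
    upper = [-r / 2.0] * m
    rhs = [phi[i] + (r / 2.0) * (phi[i + 1] - 2.0 * phi[i] + phi[i - 1])
           for i in range(1, len(phi) - 1)]
    # known end values at the new time level move to the right-hand side
    rhs[0] += (r / 2.0) * phi[0]
    rhs[-1] += (r / 2.0) * phi[-1]
    return [phi[0]] + solve_tridiagonal(lower, diag, upper, rhs) + [phi[-1]]
```

The cost of the solve grows linearly with the number of unknowns, and the step stays bounded even for values of `r` far beyond what an explicit update would tolerate. The forward and backward sweeps are inherently sequential along a line, which is one reason the independence between lines and directions in LOD is attractive for parallelisation.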

Chapter 4

Statement of Aims and Objectives

The primary aim of this project is to improve the computational efficiency of the implicit FDTD algorithm, so that the computation requires only a small amount of memory and computational time. A secondary aim is for the implementation to show good efficiency and low execution time in a particular computational environment, the EUGrid. Two EUGrid sites can be used, based in France and Ireland, where 2 GB and 0.5 GB of memory are available per core, respectively. To be able to run the code at both sites the message passing paradigm has been selected, given the small amount of memory per core at the site in Ireland.

4.1 Objectives

Objectives have been set to allow efficient use of time towards these aims. The first objective is to understand CN-FDTD mathematically, as well as the LOD-FDTD algorithm, because the latter seems to offer a higher possibility of efficient parallelisation. Understanding how and why each method calculates the equations the way it does will greatly assist in finding effective ways to parallelise at the algorithmic level.

The second objective is to understand the current serial code. This code has been implemented in Fortran to perform the correct calculations, but it has not been optimised for memory use or execution time, and so performs well on neither. Most of the time on this objective will be spent studying the code itself, perhaps with some sample runs to produce output so that intermediate values can be seen.

The third objective is to analyse the current code to find the sections in which the most work is done, and to look at ways in which those sections could be parallelised or otherwise optimised. This will mostly be done by analysing the code itself, followed by repeated execution with timing instructions in place to see how the execution behaves.

The final objective is to implement parallelising modifications to the code which reduce memory use and improve execution time. This step will be heavily influenced by the previous objective, but will for the most part run as an iterative process. Each possibility for optimisation will be investigated separately, possibly implemented and tested, and the results will then dictate whether the modification is kept.
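The per-core memory limits quoted in the aims translate directly into a limit on grid size. The sketch below is a rough estimate only: it assumes double precision storage, counts just the 12 field components of Eq. (A.14), takes 1 GB as 2^30 bytes, and ignores coefficient arrays, temporary storage and the program itself, so the real limit will be lower. The names are hypothetical.

```python
BYTES_PER_VALUE = 8       # assumption: double precision reals
FIELD_COMPONENTS = 12     # H, E, G1 and G2, three components each (Eq. A.14)
GIB = 2 ** 30             # assumption: 1 GB taken as 2**30 bytes


def field_memory_bytes(nx, ny, nz):
    """Memory needed to hold every field component on an nx * ny * nz grid."""
    return nx * ny * nz * FIELD_COMPONENTS * BYTES_PER_VALUE


def largest_cubic_grid(memory_bytes):
    """Largest n such that an n * n * n grid of fields fits in memory_bytes."""
    n = 0
    while field_memory_bytes(n + 1, n + 1, n + 1) <= memory_bytes:
        n += 1
    return n
```

Under these assumptions `largest_cubic_grid(GIB // 2)` gives 177 cells per side for a 0.5 GB core and `largest_cubic_grid(2 * GIB)` gives 281 for a 2 GB core, which shows why distributing the grid across several cores is needed for fine grids.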

Chapter 5

Literature Review

5.1 Improving the Efficiency of CN

There does not appear to be a great deal of previous research on parallelising CN based equations; however, a great deal of work has been done on parallelising the solution of Partial Differential Equations in general. Because the CN-FD-FDTD equations fall into that category, these algorithms could be used, albeit with modification.

5.1.1 Pipelined SOR

The paper [10] proposes an innovative approach to successive overrelaxation (SOR), distinct from red-black synchronous methods. The idea is to use multiple processing units to parallelise an otherwise unmodified SOR equation that is not optimised for parallelism.

This is most easily explained with an example. Take a matrix of 5 by 5 elements. At a time t, each element can calculate its value at t + 1 using a function of its 4 direct neighbours and itself. So if element 3,3 were to calculate its value for time t + 1 it would use elements 2,3, 3,2, 3,4 and 4,3, and its own value at 3,3, all from time t. The algorithm starts one iteration on Processor 1 at time t, waits until that iteration has completed the first set of values needed to calculate time t + 1, and then starts a new processor on the next iteration using the newly calculated values. Synchronisation must be used to ensure the processes do not get ahead of each other.

Table 5.1 shows the stage each process would be in at any particular time. Where 2 processes show sync, those processes are synchronising to ensure that the next process is ready to begin calculating the next iteration. More synchronisation points would be needed to ensure a process does not get ahead of the values that are ready for it. It should be noted that this example is not optimised.

t  P1             t  P2             t  P3
1  1,1 ... 1,5
1  2,1 ... 2,5
1  sync           2  sync
1  3,1 ... 3,5    2  1,1 ... 1,5
1  4,1 ... 4,5    2  2,1 ... 2,5
                  2  sync           3  sync
1  5,1 ... 5,5    2  3,1 ... 3,5    3  1,1 ... 1,5
                  2  4,1 ... 4,5    3  2,1 ... 2,5
4  sync                             3  sync
4  1,1 ... 1,5    2  5,1 ... 5,5    3  3,1 ... 3,5

Table 5.1: Execution Timeline for Pipelined SOR

The disadvantages of this method relate to the large amount of synchronisation necessary to ensure that one processing element does not get ahead of where it should be. This will slow the entire program, because of the additional overheads of synchronisation, load imbalance and additional code. Another problem could arise from the complexity of the algorithm itself, and the difficulty of implementing it within the program.
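For reference, the serial sweep that a pipelined scheme would distribute across processors can be sketched as follows. This is plain SOR for Laplace's equation on a small grid, not the pipelined algorithm of [10]; the five-point update, the relaxation factor `omega` and the function names are assumptions chosen for illustration. Note that a serial sweep uses the newest available neighbour values as it proceeds through the grid.

```python
def sor_sweep(grid, omega):
    """One in-place SOR sweep over the interior; returns the largest change."""
    max_change = 0.0
    rows, cols = len(grid), len(grid[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            average = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                              + grid[i][j - 1] + grid[i][j + 1])
            new_value = (1.0 - omega) * grid[i][j] + omega * average
            max_change = max(max_change, abs(new_value - grid[i][j]))
            grid[i][j] = new_value
    return max_change


def solve_laplace_sor(grid, omega=1.5, tol=1e-10, max_sweeps=10000):
    """Repeat sweeps until the largest change drops below tol."""
    for sweep in range(1, max_sweeps + 1):
        if sor_sweep(grid, omega) < tol:
            return sweep
    return max_sweeps
```

Row i of one sweep depends only on rows i - 1 and i + 1, so the next sweep can begin on its first rows once the current sweep has moved a couple of rows ahead. That dependency pattern is what the staggered start times in the timeline exploit.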

5.1.2 Domain Decomposition

A regularly explored option for exploiting parallelism in otherwise non-parallel code is to split the domain into subdomains that can be worked on in parallel by multiple processors. The advantage of this approach is that if the subdomains need to share data, they can do so with short messages, and if the division is done well there will be few of these messages. Figure 5.1 shows a 1 dimensional domain divided over 3 processors (albeit unevenly) at three timesteps, t-1, t and t+1. Processor q already has all the information necessary to calculate the grey shaded elements at t+1, but in order to calculate the black shaded elements it requires additional information from processes q-1 and q+1. These processes share information in order to calculate the new values.

Figure 5.1: Domain Decomposition communication

The problems with this algorithm arise from data communication. Exchanging data items incurs a large amount of latency compared with the time it takes to load an element from memory or even from hard disk. The data exchange also introduces synchronisation, which can cause load imbalance if the data is not partitioned effectively: one processing element may end up waiting for another to reach the same point before it can continue. The algorithm also suffers overheads from additional code, but it allows each process to run in less memory, which is a necessary consideration given the execution environment. It is also important to note that the boundaries of the environment are subject to slightly modified equations, which could cause further load imbalance.

5.2 Improving the Efficiency of LOD

The LOD calculation lends itself naturally to parallelism. As noted in section 3.2.3, basic analysis of the equations shows some inherent parallelism: each of the sets of x, y and z components can be calculated in parallel. Because of the step by step nature of the equations, the H components require information from the freshly calculated E components, as do the G1 and G2 components from the new H components. Researchers have exploited this structure in applications other than FDTD, either by carrying out multiple operations in parallel [13] or by partitioning the data so that multiple processing elements work on the same matrix at the same time [14].
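The boundary exchange described for domain decomposition can be sketched with a one-dimensional example. Here the subdomains are plain lists inside one program, the "messages" are simple copies into one extra halo cell at each end of a block, and the three-point averaging update is an arbitrary stand-in for the real field equations; all names are hypothetical.

```python
def serial_step(u):
    """Reference update on the whole domain; the two end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
    return new


def split(u, n_parts):
    """Contiguous blocks, each padded with one halo cell on either side."""
    size = len(u) // n_parts
    blocks = []
    for q in range(n_parts):
        start = q * size
        stop = len(u) if q == n_parts - 1 else start + size
        blocks.append([0.0] + u[start:stop] + [0.0])
    return blocks


def exchange_halos(blocks):
    """Copy each block's edge values into its neighbours' halo cells."""
    for q in range(len(blocks)):
        if q > 0:
            blocks[q][0] = blocks[q - 1][-2]       # from process q - 1
        if q < len(blocks) - 1:
            blocks[q][-1] = blocks[q + 1][1]       # from process q + 1


def decomposed_step(blocks):
    """One update of every block after a single halo exchange."""
    exchange_halos(blocks)
    last = len(blocks) - 1
    for q, block in enumerate(blocks):
        old = block[:]
        first_i = 2 if q == 0 else 1               # global left end is fixed
        last_i = len(block) - 3 if q == last else len(block) - 2
        for i in range(first_i, last_i + 1):
            block[i] = 0.25 * old[i - 1] + 0.5 * old[i] + 0.25 * old[i + 1]


def gather(blocks):
    """Reassemble the global array, dropping the halo cells."""
    return [v for block in blocks for v in block[1:-1]]
```

Only two values per block cross a subdomain boundary at each step, while every block stores just its own share of the domain plus the halo. The first and last blocks also do slightly different work at the global boundary, which is the source of the extra load imbalance mentioned above.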

Chapter 6

Methodology

6.1 Programming Methodology

The primary methodology for this project is iterative, with some initial set up and finalising actions. Before the main iterative cycle can begin, it is important to understand the original code, so that the mathematics remains intact and the properties of the algorithm, such as its stability, remain unchanged. The next stage is to evaluate the current code for weaknesses in programming techniques and algorithms. This evaluation will be led by background literature and previous studies into programming in Fortran, so the natural next step is to implement changes to the serial code to make it more efficient. These will include changes to improve stride based access, make good use of the cache, improve pipelining and apply other techniques to obtain better performance [12].

The iterative part of the development begins at the next stage. Each iteration begins with an analysis of the current state of the code, informed by the knowledge gained from the literature review on techniques and algorithms which could allow better memory use or performance. Once an area of the code to target and a method to use have been established, an attempt will be made to estimate the improvement the modification could bring, along with any overheads it may incur and how they might best be dealt with. With this done, a sub-methodology begins, namely the bottom-up methodology, whereby the new algorithm is implemented in stages of increasing complexity, each stage building on the previous stage's tested achievements. This is an effective way to program complicated algorithms, as it ensures each part of the code performs as expected before further code is built that relies upon it. Once the implementation is complete the changes can be tested in full, and memory use and performance benchmarks can be obtained. The next stage is to compare the results obtained against the expected results, and to analyse any unduly large discrepancies. This analysis may lead to the changes being kept in the program, being discarded, or perhaps even to a new algorithm being proposed based on the new understanding gleaned from the results. At this point a new iteration of the process can begin.

When the development cycle is coming to an end, the iterative cycle stops and attention turns to the final results. The memory use and performance extracted from the program will be analysed against the initial memory use and performance of the original code. Other measurements can also be taken to establish the efficiency, speedup and temporal performance of the new code, and otherwise to establish how successful the modifications were. These final results will form a core part of the final report.
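The measures named above can be computed from wall-clock timings. The sketch below uses common definitions, which are an assumption here since the report does not define them: speedup as serial time over parallel time, efficiency as speedup per processor, and temporal performance as the inverse of the execution time. The function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup per processor; 1.0 would be ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


def temporal_performance(t_parallel):
    """Solutions per unit time, taken here as the inverse of execution time."""
    return 1.0 / t_parallel
```

For example, a run that takes 120 s serially and 20 s on 8 processors has a speedup of 6 and an efficiency of 0.75.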

Chapter 7

Planning

This chapter describes the tasks involved in this project.

Literature Survey
Directed and undirected research into the current state of the field. Research topics include the application of Crank-Nicolson to FDTD, the derivation of CN-FDTD, and attempts to parallelise the Crank-Nicolson scheme and the Locally One Dimensional scheme in general.

Feasibility Report
This task covers the creation and submission of this feasibility report, including the transfer of knowledge from the literature survey into it. The completion of this report is milestone 1, because it marks the transition from background work and reports to the true content of the project.

Understanding Code
This task entails thorough examination and understanding of the existing serial code that performs the Crank-Nicolson FDTD calculations. This step is very important, as the understanding gleaned from the initial code will give insight into how the code could be improved.

Optimisation Points
This task requires additional examination of the code, complete with timing tests, to discover where the majority of the work is done in the program and where the largest amount of memory is required. This will show which parts of the code require the greatest attention. While this task could conceivably be part of the previous one, it is considered important enough to be separate. Its completion is milestone 2, because it concludes the analysis of the current code and of how it could be improved.

Modification of Code
The longest of the individual tasks, this contains the actual focus of the project: the parallelisation of the CN-FDTD code. This task, as well as others, is described in more detail in chapter 6.

Verification of Code
This task, which will run alongside and beyond the previous one, is the testing and verification of the modified code to ensure that it still produces the correct result. It is one of the most important tasks, because without it the code could have a low execution time and yet be worthless if it does not produce a valid simulation.

Performance Evaluation
This task is the evaluation of the work done thus far. Since the aim of the project is to reduce the memory required whilst also reducing execution time, the performance of the implementation can be evaluated against these criteria, and the overall success of the project can be considered. The completion of this task is milestone 3, because it marks the transition from dealing with code to the report construction stage. By this milestone, some concrete measures of how successful the project has been are expected to be available.

Report Construction
This final task is the writing of the final report, complete with information from the literature survey, details of the implementation options explored and the justification for them, and an evaluation of the end result, as well as any partial results that are deemed important. The completion of this report is, naturally, the final milestone, as it entails the project's completion.

Table 7.1: Gantt Chart for Project Completion
Columns: Task Name, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep
Rows: Literature Survey, Feasibility Report, Understanding Code, Optimisation Points, Modification of Code, Verification of Code, Performance Evaluation, Report Construction

Bibliography

[1] F , Revision of Part 15 of the Commission's Rules Regarding Ultra-Wideband Transmission Systems, First Report and Order, Washington DC, Adopted 14 Feb 2002, Released 22 April
[2] D. Porcino and W. Hirt, Ultra-wideband radio technology: Potential and challenges ahead, IEEE Communications Magazine, July
[3] K. Yee, Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media, Antennas and Propagation, IEEE Transactions, May
[4] P. Debye, Polar Molecules. Boston, MA: Dover
[5] A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd ed. New York: Artech House Publishers
[6] Z. C. F. Zheng and J. Zhang, A finite-difference time-domain method without the Courant stability condition, IEEE Microwave Guided Wave Lett 9:441
[7] J. Crank and P. Nicolson, A practical method for numerical evaluation of solutions of partial differential equations of the heat conduction type, Advances in Computational Mathematics
[8] Y. H. Q. Chen R.S. Yang, The three-dimensional unconditionally stable FDTD algorithm based on Crank-Nicolson method, IEEE Antennas and Propagation Society International Symposium
[9] G. Sun and C. W. Trueman, Unconditionally stable Crank-Nicolson scheme for solving the two-dimensional Maxwell's equations, IEE Electron. Lett., vol. 39, p
[10] W. D. JP Bonomo, Pipelined successive overrelaxation, in Parallel Supercomputing: Method, Algorithms and Applications. New York: John Wiley and Sons
[11] Parallel algorithm for the solutions of PDEs in Linux clustered workstations, Applied Mathematics and Computation, vol. 200, no. 1, pp
[12] A. H. Stefan Goedecker, Performance optimization of numerically intensive codes. SIAM
[13] A. K. D.A Voss, Parallel LOD methods for second order time dependent PDEs, Computers and Mathematics with Applications, vol. 30, no. 10, pp
[14] R. Ciegis, Parallel LOD scheme for 3D parabolic problem with nonlocal boundary condition, in Lectures in Computer Science. Springer-Verlag

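Appendix A

Derivation of FD-LOD-FDTD Equations

Maxwell's equations can be written as

$$\frac{\partial H}{\partial t} = -\frac{1}{\mu}\,\nabla \times E^{n} \qquad (A.1)$$

$$\nabla \times H^{n+\frac{1}{2}} = \frac{\partial D}{\partial t} \qquad (A.2)$$

The frequency dependent complex relative permittivity $\epsilon_r$ is

$$\epsilon_r = \frac{\sigma}{j\omega\epsilon_0} + \epsilon_\infty + \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D} + \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2} \qquad (A.3)$$

$D = \epsilon_0 \epsilon_r E$ is properly handled using $\epsilon_r$:

$$D = \epsilon_0 \left( \frac{\sigma}{j\omega\epsilon_0} + \epsilon_\infty + \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D} + \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2} \right) E \qquad (A.4)$$

Eq. (A.2) can be modified using Eq. (A.4) as

$$\nabla \times H^{n+\frac{1}{2}} = \frac{\partial D}{\partial t} = \sigma E + \epsilon_0 \epsilon_\infty \frac{\partial E}{\partial t} + G_1^{n} + G_2^{n}$$

where

$$G_1 = \frac{\partial}{\partial t}\left\{ \epsilon_0 \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D}\, E \right\} = \epsilon_0 (\epsilon_S - \epsilon_m) \frac{j\omega}{1 + j\omega\tau_D}\, E \qquad (A.5)$$

$$G_2 = \frac{\partial}{\partial t}\left\{ \epsilon_0 \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2}\, E \right\} = \epsilon_0 (\epsilon_m - \epsilon_\infty) \frac{j\omega}{1 + j\omega\tau_2}\, E \qquad (A.6)$$

Eq. (A.5) is modified as

$$(1 + j\omega\tau_D)\, G_1 = \epsilon_0 (\epsilon_S - \epsilon_m)\, j\omega E \qquad (A.7)$$

$$G_1 + \tau_D \frac{\partial G_1}{\partial t} = \epsilon_0 (\epsilon_S - \epsilon_m) \frac{\partial E}{\partial t} \qquad (A.8)$$

The second pole of the Debye model is treated in the same way as the first:

$$(1 + j\omega\tau_2)\, G_2 = \epsilon_0 (\epsilon_m - \epsilon_\infty)\, j\omega E \qquad (A.9)$$

$$G_2 + \tau_2 \frac{\partial G_2}{\partial t} = \epsilon_0 (\epsilon_m - \epsilon_\infty) \frac{\partial E}{\partial t} \qquad (A.10)$$

Eq. (A.5) without the n notation, together with Eq. (A.1), Eq. (A.8) and Eq. (A.10), produces Eq. (A.11):

$$P \frac{\partial \phi}{\partial t} = Q \phi \qquad (A.11)$$

where P is Eq. (A.12), Q is Eq. (A.13) and φ is Eq. (A.14). P is a 12 × 12 matrix, written here in 3 × 3 blocks with $I$ the 3 × 3 identity matrix; rows 1 to 3 correspond to H, rows 4 to 6 to E, rows 7 to 9 to $G_1$ and rows 10 to 12 to $G_2$:

$$P = \begin{pmatrix} \mu_0 I & 0 & 0 & 0 \\ 0 & \epsilon_0 \epsilon_\infty I & 0 & 0 \\ 0 & -\epsilon_0 (\epsilon_S - \epsilon_m) I & \tau_D I & 0 \\ 0 & -\epsilon_0 (\epsilon_m - \epsilon_\infty) I & 0 & \tau_2 I \end{pmatrix} \qquad (A.12)$$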

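Q is written in the same 3 × 3 block form, with $I$ the 3 × 3 identity matrix and $C$ the matrix of the curl operator:

$$Q = \begin{pmatrix} 0 & -C & 0 & 0 \\ C & -\sigma I & -I & -I \\ 0 & 0 & -I & 0 \\ 0 & 0 & 0 & -I \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & -\partial_z & \partial_y \\ \partial_z & 0 & -\partial_x \\ -\partial_y & \partial_x & 0 \end{pmatrix} \qquad (A.13)$$

$$\phi = (H_x, H_y, H_z, E_x, E_y, E_z, G_{1x}, G_{1y}, G_{1z}, G_{2x}, G_{2y}, G_{2z})^{T} \qquad (A.14)$$

When the Crank-Nicolson scheme is introduced to Eq. (A.11),

$$\phi^{n+1} = \frac{1 + \frac{\Delta t\,Q}{2P}}{1 - \frac{\Delta t\,Q}{2P}}\,\phi^{n} \approx \frac{(1 + X_R)(1 + Y_R)(1 + Z_R)}{(1 - X_R)(1 - Y_R)(1 - Z_R)}\,\phi^{n} \qquad (A.15)$$

where $\frac{\Delta t\,Q}{2P}$ is split into three parts as in Eq. (A.16)

$$\frac{\Delta t\,Q}{2P} = X_R + Y_R + Z_R \qquad (A.16)$$

and $X_R$, $Y_R$ and $Z_R$ are matrices whose elements are composed of the Debye media parameters as well as $\Delta x$, $\Delta y$, $\Delta z$ and $\Delta t$. The calculation of Eq. (A.15) is performed in three steps as in Eq. (A.17), Eq. (A.18) and Eq. (A.19).

$$\phi^{n+\frac{1}{3}} = \frac{1 + X_R}{1 - X_R}\,\phi^{n} \qquad (A.17)$$

$$\phi^{n+\frac{2}{3}} = \frac{1 + Y_R}{1 - Y_R}\,\phi^{n+\frac{1}{3}} \qquad (A.18)$$

$$\phi^{n+1} = \frac{1 + Z_R}{1 - Z_R}\,\phi^{n+\frac{2}{3}} \qquad (A.19)$$

Eq. (A.17) is identical to Eq. (A.20), where $\Upsilon_1$ to $\Upsilon_{13}$ and $\Upsilon_{15}$ are composed of the location dependent media parameters. The equations for the y and z directions are easily obtained by permutation of the equations for the x direction.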

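$$
\begin{aligned}
E_x^{n+\frac{1}{3}} &= \frac{\Upsilon_6 G_{1x}^{n} + \Upsilon_7 G_{2x}^{n} + \Upsilon_8 E_x^{n}}{\Upsilon_5} \\
E_y^{n+\frac{1}{3}} - \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_y^{n+\frac{1}{3}}}{\partial x^2} &= E_y^{n} - 2\Upsilon_1 \frac{\partial H_z^{n}}{\partial x} + \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_y^{n}}{\partial x^2} \\
E_z^{n+\frac{1}{3}} - \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_z^{n+\frac{1}{3}}}{\partial x^2} &= E_z^{n} + 2\Upsilon_1 \frac{\partial H_y^{n}}{\partial x} + \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_z^{n}}{\partial x^2} \\
H_x^{n+\frac{1}{3}} &= H_x^{n} \\
H_y^{n+\frac{1}{3}} &= H_y^{n} + \Upsilon_4 \frac{\partial E_z^{n}}{\partial x} + \Upsilon_4 \frac{\partial E_z^{n+\frac{1}{3}}}{\partial x} \\
H_z^{n+\frac{1}{3}} &= H_z^{n} - \Upsilon_4 \frac{\partial E_y^{n}}{\partial x} - \Upsilon_4 \frac{\partial E_y^{n+\frac{1}{3}}}{\partial x} \\
G_{1x}^{n+\frac{1}{3}} &= \frac{\Upsilon_9 G_{1x}^{n} + \Upsilon_{10} G_{2x}^{n} + \Upsilon_{11} E_x^{n}}{\Upsilon_5} \\
G_{1y}^{n+\frac{1}{3}} &= G_{1y}^{n} - \Delta t\,\Upsilon_2 \frac{\partial H_z^{n}}{\partial x} - \Delta t\,\Upsilon_2 \frac{\partial H_z^{n+\frac{1}{3}}}{\partial x} \\
G_{1z}^{n+\frac{1}{3}} &= G_{1z}^{n} + \Delta t\,\Upsilon_2 \frac{\partial H_y^{n}}{\partial x} + \Delta t\,\Upsilon_2 \frac{\partial H_y^{n+\frac{1}{3}}}{\partial x} \\
G_{2x}^{n+\frac{1}{3}} &= \frac{\Upsilon_{12} G_{1x}^{n} + \Upsilon_{13} G_{2x}^{n} + \Upsilon_{15} E_x^{n}}{\Upsilon_5} \\
G_{2y}^{n+\frac{1}{3}} &= G_{2y}^{n} - \Delta t\,\Upsilon_3 \frac{\partial H_z^{n}}{\partial x} - \Delta t\,\Upsilon_3 \frac{\partial H_z^{n+\frac{1}{3}}}{\partial x} \\
G_{2z}^{n+\frac{1}{3}} &= G_{2z}^{n} + \Delta t\,\Upsilon_3 \frac{\partial H_y^{n}}{\partial x} + \Delta t\,\Upsilon_3 \frac{\partial H_y^{n+\frac{1}{3}}}{\partial x}
\end{aligned}
\qquad (A.20)
$$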
