COMPUTATIONAL EFFICIENCY IMPROVEMENT ON THE IMPLICIT FINITE DIFFERENCE TIME DOMAIN METHOD

A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER FOR THE DEGREE OF MASTER OF SCIENCE IN THE FACULTY OF ENGINEERING AND PHYSICAL SCIENCES

2009

By Philip Clapham
School of Computer Science

Contents

Abstract
Declaration
Copyright
1 Introduction to Computation in Electromagnetics
  1.1 High Performance Computation
  1.2 Maxwell's Equations
2 Introduction to FD-FDTD
  2.1 FDTD
  2.2 FD-FDTD
3 Introduction to Locally One Dimensional Techniques in FDTD
  3.1 Implicit FDTD
  3.2 Crank-Nicolson based schemes
    3.2.1 Crank-Nicolson in general
    3.2.2 ADI-FDTD
    3.2.3 LOD-FDTD
4 Statement of Aims and Objectives
  4.1 Objectives
5 Literature Review
  5.1 Improving the Efficiency of CN
    5.1.1 Pipelined SOR
    5.1.2 Domain Decomposition
  5.2 Improving the Efficiency of LOD
6 Methodology
  6.1 Programming Methodology
7 Planning
Bibliography
A Derivation of FD-LOD-FDTD Equations

List of Tables

5.1 Execution Timeline for Pipelined SOR
7.1 Gantt Chart for Project Completion

List of Figures

2.1 Yee Algorithm Field Components
5.1 Domain Decomposition communication

Abstract

This is the feasibility report for the MSc-level dissertation to be undertaken in the coming months. The aim of the project is to parallelise the Frequency Dependent Locally One Dimensional Finite Difference Time Domain (FD-LOD-FDTD) algorithm, for which initial serial code has already been produced, in such a manner as to reduce memory usage and execution time on the intended platform, the EUGrid.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns any copyright in it (the "Copyright") and s/he has given The University of Manchester the right to use such Copyright for any administrative, promotional, educational and/or teaching purposes.

ii. Copies of this thesis, either in full or in extracts, may be made only in accordance with the regulations of the John Rylands University Library of Manchester. Details of these regulations may be obtained from the Librarian. This page must form part of any such copies made.

iii. The ownership of any patents, designs, trade marks and any and all other intellectual property rights except for the Copyright (the "Intellectual Property Rights") and any reproductions of copyright works, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property Rights and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property Rights and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and exploitation of this thesis, the Copyright and any Intellectual Property Rights and/or Reproductions described in it may take place is available from the Head of the School of Computer Science (or the Vice-President).

Chapter 1
Introduction to Computation in Electromagnetics

1.1 High Performance Computation

In the current scientific climate, computation is seen as an enabler for a vast number of applications. As theories become more complex, more computation is required to express them in simulated experiments. As the performance of computers increases, so do the expectations of the scientific community that those experiments will complete in ever shorter times. It is for these reasons that the field of high performance computing evolved and explored various avenues of advancement to satisfy the demands placed upon it.

One direction that is seeing rapid development is massively parallel computation, whereby multiple threads or processes of execution run simultaneously. These computations may take place in a single computer, on the now prevalent multi-core processors, or across many such computers connected together to provide a unified execution space.

There are different models for the way in which this computation is carried out. In the shared memory paradigm, all the threads of execution address the same memory to perform their part of the computation. This paradigm can be used through libraries such as OpenMP. In the message passing paradigm, each process runs separately and has its own memory, and whenever it needs to exchange information with another process it does so using messages. The two paradigms have different uses, and each is better suited to particular kinds of high performance computation.
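As a rough illustration of the message passing style described above, the sketch below lets two workers each sum their own private data and then exchange the partial sums purely through messages. It is only an analogy under stated assumptions: Python threads and `queue.Queue` objects stand in for separate processes and a message passing library, and the names `worker` and `run_pair` are hypothetical.

```python
import queue
import threading


def worker(rank, local_data, inbox, outbox, done):
    """Sum private data, then swap partial sums with the neighbour by message."""
    partial = sum(local_data)      # work on data only this worker holds
    outbox.put(partial)            # "send" the partial sum to the neighbour
    total = partial + inbox.get()  # "receive": blocks until the message arrives
    done.put((rank, total))        # report the result, again as a message


def run_pair():
    """Run two workers that communicate only through queues."""
    to0, to1, done = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(0, [1, 2, 3], to0, to1, done)),
        threading.Thread(target=worker, args=(1, [4, 5, 6], to1, to0, done)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(done.get() for _ in threads)


totals = run_pair()
```

Neither worker ever reads the other's list directly; in a shared memory version both would simply index into one common array, which is simpler but requires that memory to be visible to every thread.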

1.2 Maxwell's Equations

The set of four partial differential equations known collectively as Maxwell's Equations fully describes the characteristics and interactions of electric and magnetic fields. These equations can be used to simulate the propagation of electromagnetic waves, including light, through continuous time and space, through different media, or in the frequency domain.

For broadband signals, the most widely used family of computational electromagnetics algorithms is the Finite Difference Time Domain (FDTD) method, because it solves Maxwell's curl equations directly with a minimal set of assumptions, which makes it robust and straightforward. It can also simulate a broadband signal in a single run, which is a major attraction in comparison to other methods. This is possible because FDTD works in the time domain, whereas techniques based in the frequency domain, such as the Method of Moments or Finite Element Analysis, solve different frequencies separately.

The solution proceeds by solving for the electric and then the magnetic fields, one after the other in a leapfrog fashion, allowing the fields to evolve as they would in a real experiment. The scientist can therefore watch the fields' progression as the experiment continues, which gives insight into the experimental conditions.
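For reference, the two curl equations that FDTD discretises can be written as follows. This is the standard textbook form, given here under the assumption of a linear, isotropic, non-dispersive medium with permittivity $\epsilon$, permeability $\mu$ and electric conductivity $\sigma$; the frequency dependent form actually used in this project is derived in Appendix A.

```latex
\begin{align}
  \frac{\partial \mathbf{H}}{\partial t} &= -\frac{1}{\mu}\, \nabla \times \mathbf{E}, \\
  \frac{\partial \mathbf{E}}{\partial t} &= \frac{1}{\epsilon} \left( \nabla \times \mathbf{H} - \sigma \mathbf{E} \right).
\end{align}
```

Each equation advances one field in time using only the spatial variation of the other, which is what makes the alternating (leapfrog) update possible.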

Chapter 2
Introduction to FD-FDTD

2.1 FDTD

Finite Difference Time Domain (FDTD) methods discretise the partial differential Maxwell Equations to make them suitable for solution on a computer. The discretisation used is that of the Yee algorithm, which solves for both the electric and magnetic fields in time and space. Fig. 2.1, taken from [3], shows how the x, y and z components of the electric field (E) and magnetic field (H) are arranged around a particular point in space. Calculations for any of these points use values from the neighbouring points, as well as constants set throughout the simulation for boundary conditions and other factors. The simulation space is represented as a collection of these grid cells.

Another important feature of the Yee algorithm is the order in which the calculations are carried out. In the leapfrog arrangement, as it is known, the calculations for all the E components take place for a particular time $t$, then the H calculations are carried out for the time $t + \Delta t/2$, where $\Delta t$ is the time increment being used. For each calculation the latest available values are used, so the calculation of H at $t + \Delta t/2$ uses the values of E at time $t$.

FDTD is an attractive method for simulating ultra-wideband (UWB) environments because it works in the time domain, which means that it can simulate a wide range of frequencies in a single simulation for a constant medium. The disadvantages of FDTD grow with the accuracy required, because a more accurate simulation requires a finer grid, which means a large increase in memory and computational resources.
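The leapfrog ordering can be seen in a one-dimensional reduction of the Yee scheme. The sketch below is illustrative only and is not the project code: it assumes free space, normalised units in which the Courant number `courant` is the only parameter, a hypothetical Gaussian source, and fixed (perfectly conducting) ends.

```python
import math


def run_fdtd_1d(n_cells=200, n_steps=150, courant=0.5):
    """1D Yee leapfrog in normalised units; returns the final (ez, hy) samples."""
    ez = [0.0] * n_cells        # E samples on integer grid points
    hy = [0.0] * (n_cells - 1)  # H samples half a cell to the right of each E
    source = n_cells // 2
    for step in range(n_steps):
        # E update at time t, using the most recent H values
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # soft Gaussian source added to E at one point (assumed excitation)
        ez[source] += math.exp(-((step - 30) / 10.0) ** 2)
        # H update at time t + dt/2, using the E values just computed
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
    return ez, hy


ez, hy = run_fdtd_1d()
```

The two inner loops never read a value that the same loop writes, which is why an explicit update of this kind is easy to parallelise; its drawback is the stability limit on the time step.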

Figure 2.1: Yee Algorithm Field Components

2.2 FD-FDTD

FDTD does have more serious disadvantages, however, relating to its treatment of permittivity (ε) and permeability (µ). Permittivity is the capacity of a material to transmit an electric field, and permeability is the degree of magnetisation of a material in response to a magnetic field, so both are very important characteristics for this simulation. In the normal FDTD calculations, ε and µ are treated as constants for the material. This is an unacceptable simplification because of the very nature of ε and µ. For instance, a transmission in the visible light band is unable to penetrate a brick, whereas a signal at a different frequency, for instance radio, can penetrate it with little difficulty. In other words, ε and µ change with the frequency of the signal they are interacting with.

FD-FDTD is an extension to FDTD which incorporates a Frequency Dependent component into the equations. Derivations of both the normal FDTD and FD-FDTD can be found in [5], and the application of Frequency Dependent components to Maxwell's equations can be found in Appendix A. As the range of frequencies in a signal increases, so does the difficulty of simulating it with plain FDTD, and FD-FDTD becomes a much more reasonable choice.

Another consideration is that for FD-FDTD to simulate complex environments such as the human body, a very small grid size is required, and thus a small time step, for the explicit computation to remain stable and the numerical results to be reasonably accurate [5]. Naturally, the finer grid and the reduced time step vastly increase the amount of computation necessary, and this is where a parallel, high performance implementation of the algorithm begins to hold particular interest.
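To give a feel for how the grid size drives the time step, the sketch below evaluates the usual stability bound of the explicit three-dimensional Yee scheme, $\Delta t \le 1 / \left(c \sqrt{1/\Delta x^2 + 1/\Delta y^2 + 1/\Delta z^2}\right)$. The formula is the standard textbook limit rather than something stated in this report, and the cell sizes used are arbitrary assumptions.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cfl_time_step(dx, dy, dz, c=C0):
    """Largest stable time step of the explicit 3D Yee scheme for the given cell size."""
    return 1.0 / (c * math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2))


dt_coarse = cfl_time_step(1e-2, 1e-2, 1e-2)  # 1 cm cells (assumed)
dt_fine = cfl_time_step(1e-3, 1e-3, 1e-3)    # 1 mm cells (assumed)
```

Refining the grid by a factor of ten in each direction gives a thousand times as many cells and a time step ten times smaller, so simulating the same physical duration costs roughly ten thousand times as much work.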

Chapter 3
Introduction to Locally One Dimensional Techniques in FDTD

3.1 Implicit FDTD

The FD-FDTD method is a good way to simulate UWB signal propagation through materials where the frequency of the signal affects the conditions it experiences. Problems arise when simulating high frequencies or fine geometries, which require a fine grid and therefore a small time step to retain the stability of the simulation, at the cost of vastly increased computation time. This limitation on the time step is known as the Courant-Friedrichs-Lewy (CFL) condition.

Implicit methods were introduced to overcome this limitation. They take the explicit FD-FDTD equations and decouple the spatial and time steps through an implicit discretisation in time. One such method, the Alternating Direction Implicit method (ADI-FDTD) [6], a member of the Crank-Nicolson family, removes the need to satisfy the CFL condition that causes the large amount of computation in FD-FDTD, but at the expense of additional truncation error, which results in larger numerical errors. Implicit schemes such as ADI-FDTD and Crank-Nicolson FDTD are able to take larger time steps because they do more work at each step: instead of solving explicitly for each variable in terms of previous values, a coupled set of equations is solved to obtain the solution.

Appendix A contains the application of Maxwell's Equations that produces the set of equations worked with in this project. This specific set of equations could have been produced differently, for instance if the Crank-Nicolson algorithm were used instead of LOD. The following sections therefore establish where the different methods branch from the derivation in Appendix A.

3.2 Crank-Nicolson based schemes

3.2.1 Crank-Nicolson in general

The Crank-Nicolson scheme is the basis of the implicit finite difference methods for solving partial differential equations considered here. The method was proposed by John Crank and Phyllis Nicolson [7], and it takes the average of the forward Euler and backward Euler finite difference approximations to produce the following scheme:

\[
\frac{\phi_i^{n+1} - \phi_i^{n}}{\Delta t}
= \frac{1}{2} \left(
\frac{\phi_{i+1}^{n+1} - 2\phi_i^{n+1} + \phi_{i-1}^{n+1}}{(\Delta x)^2}
+ \frac{\phi_{i+1}^{n} - 2\phi_i^{n} + \phi_{i-1}^{n}}{(\Delta x)^2}
\right)
\tag{3.1}
\]

The Crank-Nicolson scheme calculates Eq. (A.15) directly. This involves computation with a very large sparse matrix, and handling this matrix takes up most of the computational time and memory. For this reason, an alternative method is being used for this simulation. Research applying CN to FDTD has been carried out by multiple groups [8] [9], showing that CN-FDTD is more accurate than ADI-FDTD and is unconditionally stable in 3 dimensions, even when the time step is 20 times larger than the CFL limit allows for the original explicit FDTD equations [8].

3.2.2 ADI-FDTD

The Alternating Direction Implicit method makes use of tridiagonal matrices instead of the large sparse matrices of the Crank-Nicolson method. In this method Eq. (A.15) is approximately factorized as

\[
\phi^{n+1}
= \frac{1 + \frac{\Delta t Q}{2P}}{1 - \frac{\Delta t Q}{2P}}\, \phi^{n}
\approx \frac{(1 + A)(1 + B)}{(1 - A)(1 - B)}\, \phi^{n}
\tag{3.2}
\]

where A and B denote the two parts into which $\frac{\Delta t Q}{2P}$ is split.

This is quite different from the CN method shown above. The computation is then performed using the following two sub-steps:

\[
\phi^{n+\frac{1}{2}} = \frac{1 + A}{1 - B}\, \phi^{n}
\tag{3.3}
\]

\[
\phi^{n+1} = \frac{1 + B}{1 - A}\, \phi^{n+\frac{1}{2}}
\tag{3.4}
\]

Each sub-step only requires the solution of tridiagonal systems, but the parallelisation of this scheme is nontrivial because of the interdependencies between the field values of the different directions.

3.2.3 LOD-FDTD

The derivation of FD-LOD-FDTD is presented in Appendix A. The general idea behind this scheme is to split the large implicit equations into a simple three-step algorithm, which is shown in Eq. (A.17), Eq. (A.18) and Eq. (A.19). This has great benefits in terms of parallelisation, because each direction (x, y and z) can be calculated independently in each step. This can be seen in Eq. (A.20), where the first three equations calculate the x, y and z electric field components and make no reference to each other, the next three calculate the magnetic field components and also make no reference to each other, and so on. This makes the scheme very attractive computationally, and like the other implicit schemes it is not bound by the CFL limit.
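The tridiagonal systems mentioned in this chapter can be made concrete with the scalar model problem of Eq. (3.1). The sketch below is not the FD-LOD-FDTD code: it assumes a one-dimensional diffusion-type problem with zero boundary values, uses the standard Thomas algorithm for the solve, and the names `thomas_solve` and `crank_nicolson_step` are hypothetical.

```python
import math


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson_step(phi, r):
    """One step of Eq. (3.1) with r = dt / dx**2 and phi held at 0 on both ends."""
    m = len(phi) - 2  # number of interior unknowns
    # explicit half of the average goes on the right-hand side
    rhs = [phi[i] + 0.5 * r * (phi[i + 1] - 2.0 * phi[i] + phi[i - 1])
           for i in range(1, m + 1)]
    # implicit half gives a constant tridiagonal matrix
    interior = thomas_solve([-0.5 * r] * m, [1.0 + r] * m, [-0.5 * r] * m, rhs)
    return [0.0] + interior + [0.0]


n_cells = 20
phi0 = [math.sin(math.pi * i / n_cells) for i in range(n_cells + 1)]
phi1 = crank_nicolson_step(phi0, r=5.0)  # ten times the explicit limit r = 0.5
```

The forward elimination and back substitution are inherently sequential along one grid line, but in a locally one dimensional scheme every grid line is its own independent system, so different lines can be handed to different processors.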

Chapter 4
Statement of Aims and Objectives

The primary aim of this project is to improve the computational efficiency of the implicit FDTD algorithm. The computation should require a small amount of memory and a short computational time. A secondary aim is for the implementation to have good efficiency characteristics and a low execution time in a particular computational environment, the EUGrid. Two EUGrid sites can be made use of, based in France and Ireland, where 2 GB and 0.5 GB of memory are available per core, respectively. To be able to run the code at both sites, the message passing paradigm has been selected, given the small amount of memory per core at the site in Ireland.

4.1 Objectives

Objectives have been set to allow efficient use of time towards these aims. The first objective is to understand CN-FDTD mathematically, as well as the LOD-FDTD algorithm, which appears to offer better prospects for efficient parallelisation. Understanding how and why each method calculates the equations the way it does will greatly assist in finding effective ways to parallelise at an algorithmic level.

The second objective is to understand the current serial code. This code has been implemented in Fortran and performs the correct calculations, but it has not been optimised for memory use or execution time and therefore performs well in neither respect. Most of the time spent on this objective will go into studying the code itself, perhaps with some sample runs to produce output so that intermediate values can be seen.

The third objective is to analyse the current code to find the sections in which the most work is done, and to look at ways in which those sections could be parallelised or otherwise optimised. This objective will mostly be met by analysing the code itself, followed by repeated execution with timing instructions in place to see how the execution behaves.

The final objective is to implement parallelising modifications to the code which reduce memory use and improve execution time. This step will be heavily influenced by the previous objective, but will for the most part be run as an iterative process. Each possibility for optimisation will be investigated separately and possibly implemented and tested, and the results will then dictate whether the modification is kept.

Chapter 5
Literature Review

5.1 Improving the Efficiency of CN

There did not appear to be a great deal of previous research on the problem of parallelising CN based equations; however, a great deal of work has been done on parallelising Partial Differential Equations in general. Because the CN-FD-FDTD equations fall into that category, these algorithms could be used, albeit in modified form.

5.1.1 Pipelined SOR

The paper [10] proposes an innovative approach to successive overrelaxation (SOR), distinct from red-black synchronous methods. The idea is to use multiple processing units to parallelise an otherwise unmodified SOR equation that is not optimised for parallelism. This is most easily explained with an example.

For this example, a matrix of 5 by 5 elements is given. At a time t, each element can calculate its value at t + 1 using a function of its 4 direct neighbours and itself. So if element 3,3 were to calculate its value for time t + 1, it would use elements 2,3, 3,2, 3,4 and 4,3, and its own value at 3,3, all from time t. The algorithm starts one iteration on Processor 1 at time t, waits until that iteration has completed the first set of values needed to begin calculating for time t + 1, then starts a new processor on the next iteration, using the newly calculated values. Synchronisation must be used to ensure the processes do not get ahead of each other.

Table 5.1 shows the stage each process is in at any particular time. Where two processes show sync, those processes are synchronising and ensuring that the next process is ready to begin calculating the next iteration. More synchronisation points would be needed to ensure a process does not get ahead of the values that are ready for it. It should be noted that this example is not optimised.

Step | P1: t, elements   | P2: t, elements   | P3: t, elements
1    | 1, 1,1 to 1,5     |                   |
2    | 1, 2,1 to 2,5     |                   |
3    | 1, sync           | 2, sync           |
4    | 1, 3,1 to 3,5     | 2, 1,1 to 1,5     |
5    | 1, 4,1 to 4,5     | 2, 2,1 to 2,5     |
6    |                   | 2, sync           | 3, sync
7    | 1, 5,1 to 5,5     | 2, 3,1 to 3,5     | 3, 1,1 to 1,5
8    |                   | 2, 4,1 to 4,5     | 3, 2,1 to 2,5
9    | 4, sync           |                   | 3, sync
10   | 4, 1,1 to 1,5     | 2, 5,1 to 5,5     | 3, 3,1 to 3,5

Table 5.1: Execution Timeline for Pipelined SOR

The disadvantages of this method relate to the large amount of synchronisation necessary to ensure that one processing element does not get ahead of where it should be. This will slow the entire program down, due to the additional overheads of synchronisation, load imbalance and additional code. Another problem could arise from the complexity of the algorithm itself and the difficulty of implementing it within the program.
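For orientation, a plain serial SOR sweep is sketched below; it is the kind of unmodified update that the pipelined scheme distributes over processors. The model problem (Laplace's equation on a small square grid with one edge held at 1), the relaxation factor and the function names are assumptions made for illustration and are not taken from [10].

```python
def sor_sweep(u, omega):
    """One in-place SOR sweep over the interior; returns the largest change made."""
    max_change = 0.0
    for i in range(1, len(u) - 1):          # rows are visited in order
        for j in range(1, len(u[0]) - 1):
            average = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
            new_value = (1.0 - omega) * u[i][j] + omega * average
            max_change = max(max_change, abs(new_value - u[i][j]))
            u[i][j] = new_value
    return max_change


def solve_laplace(n=7, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Iterate sweeps until the update falls below tol; returns (grid, sweeps used)."""
    u = [[0.0] * n for _ in range(n)]
    u[0] = [1.0] * n  # top edge held at 1, the other edges at 0
    for sweep in range(1, max_sweeps + 1):
        if sor_sweep(u, omega) < tol:
            return u, sweep
    return u, max_sweeps


grid, sweeps_used = solve_laplace()
```

Because the sweep works in place, updating row i in one sweep needs row i - 1 from the same sweep and row i + 1 from the previous one. A second processor can therefore begin the next sweep as soon as the first has finished its second row, which is the staggered start visible in Table 5.1.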

5.1.2 Domain Decomposition

A regularly explored option for exploiting parallelism in otherwise non-parallel code is to split the domain into subdomains that can be worked on in parallel by multiple processors. The advantage of this approach is that if the subdomains need to share data, they can do so with short messages, and if the division is done well, there will be fewer of these messages. Figure 5.1 shows a one-dimensional domain divided over 3 processors (albeit unevenly) at three timesteps, t-1, t and t+1. Processor q already has all the information necessary to calculate the grey shaded elements at t+1, but in order to calculate the black shaded elements it requires additional information from processes q-1 and q+1. These processes share information in order to calculate the new values.

Figure 5.1: Domain Decomposition communication

The problems with this algorithm arise from data communication. The exchange of data items can incur a large latency compared with the time it takes to load a particular element from memory or even from hard disk. The data exchange also introduces synchronisation, which can cause load imbalance if the data is not partitioned effectively, because one processing element may end up waiting for another to reach the same point before it is able to continue. This algorithm also suffers from the overhead of additional code, but it allows each process to run in less memory, which is a necessary consideration given the execution environment. It is also important to note that the boundaries of the environment are subject to slightly modified equations, which could cause further load imbalance.

5.2 Improving the Efficiency of LOD

The LOD calculation lends itself naturally to parallelism. As noted in section 3.2.3, some inherent parallelism is apparent from basic analysis of the equations: each of the sets of x, y and z components can be calculated in parallel. Due to the step by step nature of the equations, the H components require information from the freshly calculated E components, as do the G1 and G2 components from the new H components. Researchers have exploited this structure in applications other than FDTD, either by allowing multiple operations to be carried out in parallel [13], or by using data partitioning to allow multiple processing elements to work on the same matrix at the same time [14].
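The edge exchange described in Section 5.1.2 can be imitated serially to check that a decomposed update reproduces the undivided one. The sketch below makes several assumptions: a simple three-point averaging stencil stands in for the field update, the "messages" are plain list reads rather than real communication, and the function names are hypothetical.

```python
def stencil_step(values, left_ghost, right_ghost):
    """Three-point average; the ghosts supply the neighbours outside `values`."""
    padded = [left_ghost] + values + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def decomposed_step(subdomains, boundary=0.0):
    """Advance every subdomain one step after fetching its neighbours' edge values."""
    updated = []
    for q, local in enumerate(subdomains):
        # stand-ins for the messages from process q-1 and process q+1
        left = subdomains[q - 1][-1] if q > 0 else boundary
        right = subdomains[q + 1][0] if q < len(subdomains) - 1 else boundary
        updated.append(stencil_step(local, left, right))
    return updated


global_field = [float(i % 4) for i in range(10)]
parts = [global_field[0:3], global_field[3:8], global_field[8:10]]  # uneven split
reference = stencil_step(global_field, 0.0, 0.0)
result = [v for part in decomposed_step(parts) for v in part]
```

Only one value crosses each subdomain boundary per step, however large the subdomains are, so larger subdomains improve the ratio of computation to communication; the uneven split used here is the kind of partition that would leave the processor with the smallest share waiting for the others.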

Chapter 6
Methodology

6.1 Programming Methodology

The primary methodology used for this project is an iterative one, with some initial set up and finalising actions. Before the main iterative cycle can begin, it is important to understand the original code, so that the mathematics remains intact and the properties of the algorithm, such as its stability, remain unchanged. The next stage is to evaluate the current code for weaknesses in programming techniques and algorithms. This evaluation will be led by background literature and previous studies on programming in Fortran, and the natural next step is to implement changes to the serial code to make it more efficient. These will include changes to improve stride based access, make good use of the cache, improve pipelining and apply other techniques to obtain better performance [12].

The next stage is where the iterative nature of the development begins. Each iteration begins with an analysis of the current state of the code, informed by the knowledge gained from the literature review on techniques and algorithms which could allow better memory use or performance. Once an area of the code to target and a method to use have been established, an attempt will be made to estimate the improvement that the modification could bring, along with any overheads it may incur and how they might best be dealt with. With this done, a sub-methodology begins, namely the bottom-up methodology, whereby the new algorithm is implemented in stages of increasing complexity, each stage building on the previous stage's tested achievements. This is an effective way to program complicated algorithms, as it ensures each part of the code performs as expected before further code is built that relies upon it.

Once the implementation is complete, the changes can be tested in full, and memory use and performance benchmarks can be obtained. The next stage is to compare the results obtained against the expected results and to analyse any unduly large discrepancies. This analysis may lead to the changes being kept in the program, being discarded, or perhaps even to a new algorithm being proposed based on the new understanding gleaned from the results. At this point a new iteration of the process can begin.

When the development cycle is coming to an end, the iterative cycle stops and attention turns to the final results. The memory use and performance achieved by the program will be compared with those obtained from the original code. Other measurements can also be taken to establish the efficiency, speedup and temporal performance of the new code, and otherwise to establish how successful the modifications were. These final results will form a core part of the final report.
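The measures named above can be computed directly from wall-clock timings. The sketch below uses the usual definitions, which are assumed here rather than quoted from this report: speedup is the serial time divided by the parallel time, efficiency is speedup per processor, and temporal performance is taken as the reciprocal of the execution time. The timings in the example are invented.

```python
def parallel_metrics(serial_time, parallel_time, n_procs):
    """Return speedup, efficiency and temporal performance for one benchmark run."""
    speedup = serial_time / parallel_time
    return {
        "speedup": speedup,
        "efficiency": speedup / n_procs,
        "temporal_performance": 1.0 / parallel_time,  # runs per second
    }


# hypothetical timings: 120 s serially, 20 s on 8 processors
metrics = parallel_metrics(120.0, 20.0, 8)
```

An efficiency well below 1 indicates that communication, synchronisation or load imbalance is consuming a significant share of the parallel run.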

Chapter 7
Planning

This chapter describes the tasks involved in this project.

Literature Survey
The undertaking of directed and undirected research into the current state of the field. Research topics include the application of Crank-Nicolson to FDTD, the derivation of CN-FDTD, and attempts to parallelise the Crank-Nicolson scheme and the Locally One Dimensional scheme in general.

Feasibility Report
This task includes the creation and submission of this feasibility report, including the transfer of knowledge from the literature survey into it. The completion of this report is milestone 1, because it signifies the transition from background work and reports into the true content of the project.

Understanding Code
The work in this task entails thorough examination and understanding of the existing serial code that performs the Crank-Nicolson FDTD calculations. This step is very important, as the understanding gleaned from the initial code will allow insights into how the code could be improved.

Optimisation Points
This task requires additional examination of the code, complete with timing tests, to discover where the majority of the work is being done in the program and where the largest amount of memory is required. This will show which parts of the code require the greatest attention. While this task could conceivably be part of the previous one, it is considered important enough to be separate. The completion of this task is milestone 2, because it concludes the analysis of the current code and how it could be improved.

Modification of Code
The longest of all the individual tasks, this contains the actual focus of the project: the parallelisation of the CN-FDTD code. This task, as well as others, is described in more detail in Chapter 6.

Verification of Code
This task, which will run alongside and beyond the previous one, is the testing and verification of the modified code to ensure that it still produces the correct result. It is one of the most important tasks, because without it the code could indeed have a low execution time, but if it does not produce a valid simulation the end result is worthless.

Performance Evaluation
This task contains the evaluation of the work done thus far. Since the aim of this project is to reduce the memory required whilst also reducing execution time, the performance of the implementation can be evaluated against these criteria, and the overall success of the project can be considered. The completion of this task is milestone 3, because it marks the transition from dealing with code to the report construction stage. By this milestone, some concrete measures of how successful the project has been are expected to be available.

Report Construction
This final task is the writing of the final report, complete with information from the literature survey, details of the implementation options explored and the justification for them, and an evaluation of the end result, as well as any partial results that are deemed important. The completion of this report is, naturally, the final milestone, as it entails the project's completion.

Table 7.1: Gantt Chart for Project Completion. Columns: Feb, Mar, Apr, May, Jun, Jul, Aug, Sep. Rows: Literature Survey, Feasibility Report, Understanding Code, Optimisation Points, Modification of Code, Verification of Code, Performance Evaluation, Report Construction.

Bibliography

[1] F , Revision of Part 15 of the Commission's Rules Regarding Ultra-Wideband Transmission Systems, First Report and Order, Washington DC, Adopted 14 Feb 2002, Released 22 April

[2] D. Porcino and W. Hirt, Ultra-wideband radio technology: Potential and challenges ahead, IEEE Communications Magazine, July

[3] K. Yee, Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media, Antennas and Propagation, IEEE Transactions, May

[4] P. Debye, Polar Molecules. Boston, MA: Dover,

[5] A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd ed. New York: Artech House Publishers,

[6] Z. C. F. Zheng and J. Zhang, A finite-difference time-domain method without the Courant stability condition, IEEE Microwave Guided Wave Lett 9:441,

[7] J. Crank and P. Nicolson, A practical method for numerical evaluation of solutions of partial differential equations of the heat conduction type, Advances in Computational Mathematics,

[8] Y. H. Q. Chen R.S. Yang, The three-dimensional unconditionally stable FDTD algorithm based on Crank-Nicolson method, IEEE Antennas and Propagation Society International Symposium,

[9] G. Sun and C. W. Trueman, Unconditionally stable Crank-Nicolson scheme for solving the two-dimensional Maxwell's equations, IEE Electron. Lett., vol. 39, p

[10] W. D. JP Bonomo, Pipelined successive overrelaxation, in Parallel Supercomputing: Method, Algorithms and Applications. New York: John Wiley and Sons,

[11] Parallel algorithm for the solutions of PDEs in Linux clustered workstations, Applied Mathematics and Computation, vol. 200, no. 1, pp ,

[12] A. H. Stefan Goedecker, Performance optimization of numerically intensive codes. SIAM,

[13] A. K. D.A Voss, Parallel LOD methods for second order time dependent PDEs, Computers and Mathematics with Applications, vol. 30, no. 10, pp ,

[14] R. Ciegis, Parallel LOD scheme for 3D parabolic problem with nonlocal boundary condition, in Lectures in Computer Science. Springer-Verlag,

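Appendix A
Derivation of FD-LOD-FDTD Equations

Maxwell equations can be written as

\[
\frac{\partial H}{\partial t} = -\frac{1}{\mu}\, \nabla \times E^{n}
\tag{A.1}
\]

\[
\nabla \times H^{n+\frac{1}{2}} = \frac{\partial D}{\partial t}
\tag{A.2}
\]

The frequency dependent complex relative permittivity $\epsilon_r$ is

\[
\epsilon_r = \frac{\sigma}{j\omega\epsilon_0} + \epsilon_\infty
+ \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D}
+ \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2}
\tag{A.3}
\]

$D = \epsilon_0 \epsilon_r E$ is properly handled using $\epsilon_r$:

\[
D = \epsilon_0 \left( \frac{\sigma}{j\omega\epsilon_0} + \epsilon_\infty
+ \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D}
+ \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2} \right) E
\tag{A.4}
\]

Eq. (A.2) can be modified using Eq. (A.4) as

\[
\nabla \times H^{n+\frac{1}{2}} = \frac{\partial D}{\partial t}
= \sigma E + \epsilon_0 \epsilon_\infty \frac{\partial E}{\partial t} + G_1 + G_2
\]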

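where

\[
G_1 = \frac{\partial}{\partial t} \left\{ \epsilon_0 \frac{\epsilon_S - \epsilon_m}{1 + j\omega\tau_D} E \right\}
= \epsilon_0 (\epsilon_S - \epsilon_m) \frac{j\omega}{1 + j\omega\tau_D} E
\tag{A.5}
\]

\[
G_2 = \frac{\partial}{\partial t} \left\{ \epsilon_0 \frac{\epsilon_m - \epsilon_\infty}{1 + j\omega\tau_2} E \right\}
= \epsilon_0 (\epsilon_m - \epsilon_\infty) \frac{j\omega}{1 + j\omega\tau_2} E
\tag{A.6}
\]

Eq. (A.5) is modified as

\[
(1 + j\omega\tau_D)\, G_1 = \epsilon_0 (\epsilon_S - \epsilon_m)\, j\omega E
\tag{A.7}
\]

\[
G_1 + \tau_D \frac{\partial G_1}{\partial t} = \epsilon_0 (\epsilon_S - \epsilon_m) \frac{\partial E}{\partial t}
\tag{A.8}
\]

The second pole of the Debye model is treated in the same way as the first:

\[
(1 + j\omega\tau_2)\, G_2 = \epsilon_0 (\epsilon_m - \epsilon_\infty)\, j\omega E
\tag{A.9}
\]

\[
G_2 + \tau_2 \frac{\partial G_2}{\partial t} = \epsilon_0 (\epsilon_m - \epsilon_\infty) \frac{\partial E}{\partial t}
\tag{A.10}
\]

Eq. (A.5) without the n notation, together with Eq. (A.1), Eq. (A.8) and Eq. (A.10), produces Eq. (A.11):

\[
P \frac{\partial \phi}{\partial t} = Q \phi
\tag{A.11}
\]

where P is Eq. (A.12), Q is Eq. (A.13) and $\phi$ is Eq. (A.14).

(A.12): P is a 12 × 12 matrix whose entries are $\mu_0$ in rows 1 to 3, $\epsilon_0 \epsilon_\infty$ in rows 4 to 6, a ratio of $\epsilon_0(\epsilon_S - \epsilon_m)$ and $\tau_D$ in rows 7 to 9, and a ratio of $\epsilon_0(\epsilon_m - \epsilon_\infty)$ and $\tau_2$ in rows 10 to 12.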

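(A.13): Q is the 12 × 12 operator matrix. Its first three rows hold the spatial derivatives $\partial/\partial x$, $\partial/\partial y$ and $\partial/\partial z$ that make up the curl of E, and rows 4 to 6 hold the derivatives that make up the curl of H together with the entries $\sigma$, 1 and 1 that couple E, $G_1$ and $G_2$.

\[
\phi = \left( H_x, H_y, H_z, E_x, E_y, E_z, G_{1x}, G_{1y}, G_{1z}, G_{2x}, G_{2y}, G_{2z} \right)^{T}
\tag{A.14}
\]

When the Crank-Nicolson scheme is introduced to Eq. (A.11),

\[
\phi^{n+1}
= \frac{1 + \frac{\Delta t Q}{2P}}{1 - \frac{\Delta t Q}{2P}}\, \phi^{n}
\approx \frac{(1 + X_R)(1 + Y_R)(1 + Z_R)}{(1 - X_R)(1 - Y_R)(1 - Z_R)}\, \phi^{n}
\tag{A.15}
\]

where $\frac{\Delta t Q}{2P}$ is split into three parts as in Eq. (A.16)

\[
\frac{\Delta t Q}{2P} = X_R + Y_R + Z_R
\tag{A.16}
\]

and $X_R$, $Y_R$ and $Z_R$ are dimensional matrices whose elements are composed of the Debye media parameters as well as $\Delta x$, $\Delta y$, $\Delta z$ and $\Delta t$. The calculation of Eq. (A.15) is performed in three steps as in Eq. (A.17), Eq. (A.18) and Eq. (A.19).

\[
\phi^{n+\frac{1}{3}} = \frac{1 + X_R}{1 - X_R}\, \phi^{n}
\tag{A.17}
\]

\[
\phi^{n+\frac{2}{3}} = \frac{1 + Y_R}{1 - Y_R}\, \phi^{n+\frac{1}{3}}
\tag{A.18}
\]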

32 φ n+1 = 1 + Z R φ n+ 2 3 (A.19) 1 Z R Eq.(A.17) is identical to Eq.(A.20) where Υ 6,Υ 7,Υ 8, Υ 1,Υ 4,Υ 5,Υ 3,Υ 2, Υ 9,Υ 10,Υ 11, Υ 12,Υ 13,Υ 15 are composed of the location dependent media parameters. Equations of y, z direction are easily obtained by the permutation of the equations of x direction. 32
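The splitting in Eq.(A.15) to Eq.(A.19) can be checked numerically on a toy problem. The sketch below is only illustrative: the matrices `xh`, `yh` and `zh` are random stand-ins for the time-step-free parts of $X_R$, $Y_R$ and $Z_R$ (an assumption, not the actual Debye operators), and `cayley` and `splitting_error` are hypothetical helper names. It compares the unsplit Crank-Nicolson propagator with the product of the three one-dimensional factors applied in the order (A.17), (A.18), (A.19).

```python
import numpy as np

def cayley(m):
    """Return (I - m)^{-1} (I + m), the form of each factor in Eq.(A.15)."""
    eye = np.eye(m.shape[0])
    return np.linalg.solve(eye - m, eye + m)

def splitting_error(xh, yh, zh, dt):
    """Norm of the difference between the unsplit and the split propagator."""
    x_r, y_r, z_r = dt * xh, dt * yh, dt * zh
    unsplit = cayley(x_r + y_r + z_r)                 # unsplit Crank-Nicolson step
    split = cayley(z_r) @ cayley(y_r) @ cayley(x_r)   # (A.17), then (A.18), then (A.19)
    return np.linalg.norm(unsplit - split)

# Hypothetical stand-ins: random matrices, not real Debye media operators.
rng = np.random.default_rng(0)
xh, yh, zh = (rng.standard_normal((6, 6)) for _ in range(3))

err_coarse = splitting_error(xh, yh, zh, 1e-3)
err_fine = splitting_error(xh, yh, zh, 5e-4)
ratio = err_coarse / err_fine
print(f"error ratio when the time step is halved: {ratio:.3f}")
```

For matrices that do not commute, the leading difference between the two propagators is built from commutators such as $Y_R X_R - X_R Y_R$ and scales with $\Delta t^2$, so one would expect the printed ratio to be close to 4.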

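$$\begin{aligned}
E_x^{n+\frac{1}{3}} &= \frac{\Upsilon_6 G_{1x}^{n} + \Upsilon_7 G_{2x}^{n} + \Upsilon_8 E_x^{n}}{\Upsilon_5} \\
E_y^{n+\frac{1}{3}} - \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_y^{n+\frac{1}{3}}}{\partial x^2} &= E_y^{n} - 2\Upsilon_1 \frac{\partial H_z^{n}}{\partial x} + \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_y^{n}}{\partial x^2} \\
E_z^{n+\frac{1}{3}} - \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_z^{n+\frac{1}{3}}}{\partial x^2} &= E_z^{n} + 2\Upsilon_1 \frac{\partial H_y^{n}}{\partial x} + \Upsilon_1 \Upsilon_4 \frac{\partial^2 E_z^{n}}{\partial x^2} \\
H_x^{n+\frac{1}{3}} &= H_x^{n} \\
H_y^{n+\frac{1}{3}} &= H_y^{n} + \Upsilon_4 \frac{\partial E_z^{n}}{\partial x} + \Upsilon_4 \frac{\partial E_z^{n+\frac{1}{3}}}{\partial x} \\
H_z^{n+\frac{1}{3}} &= H_z^{n} - \Upsilon_4 \frac{\partial E_y^{n}}{\partial x} - \Upsilon_4 \frac{\partial E_y^{n+\frac{1}{3}}}{\partial x} \\
G_{1x}^{n+\frac{1}{3}} &= \frac{\Upsilon_9 G_{1x}^{n} + \Upsilon_{10} G_{2x}^{n} + \Upsilon_{11} E_x^{n}}{\Upsilon_5} \\
G_{1y}^{n+\frac{1}{3}} &= G_{1y}^{n} - \Delta t\, \Upsilon_2 \frac{\partial H_z^{n}}{\partial x} - \Delta t\, \Upsilon_2 \frac{\partial H_z^{n+\frac{1}{3}}}{\partial x} \\
G_{1z}^{n+\frac{1}{3}} &= G_{1z}^{n} + \Delta t\, \Upsilon_2 \frac{\partial H_y^{n}}{\partial x} + \Delta t\, \Upsilon_2 \frac{\partial H_y^{n+\frac{1}{3}}}{\partial x} \\
G_{2x}^{n+\frac{1}{3}} &= \frac{\Upsilon_{12} G_{1x}^{n} + \Upsilon_{13} G_{2x}^{n} + \Upsilon_{15} E_x^{n}}{\Upsilon_5} \\
G_{2y}^{n+\frac{1}{3}} &= G_{2y}^{n} - \Delta t\, \Upsilon_3 \frac{\partial H_z^{n}}{\partial x} - \Delta t\, \Upsilon_3 \frac{\partial H_z^{n+\frac{1}{3}}}{\partial x} \\
G_{2z}^{n+\frac{1}{3}} &= G_{2z}^{n} + \Delta t\, \Upsilon_3 \frac{\partial H_y^{n}}{\partial x} + \Delta t\, \Upsilon_3 \frac{\partial H_y^{n+\frac{1}{3}}}{\partial x}
\end{aligned} \tag{A.20}$$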
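The implicit $E_y$ update in Eq.(A.20) becomes a tridiagonal linear system once the second derivative in $x$ is replaced by a central difference, which is what makes each one-dimensional step cheap. The following is a minimal sketch under stated assumptions rather than the thesis code: it uses a single 1-D line of cells, spatially uniform $\Upsilon_1$ and $\Upsilon_4$ (the thesis treats them as location dependent), $E_y = 0$ at both ends, and $H_z$ stored at half-cell positions. The names `thomas`, `implicit_ey_step`, `ups1`, `ups4` and all numerical values are hypothetical.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def implicit_ey_step(ey, hz, ups1, ups4, dx):
    """One x-direction E_y update of (A.20) on a 1-D line (assumed layout)."""
    ey = np.asarray(ey, dtype=float)
    hz = np.asarray(hz, dtype=float)           # length len(ey) - 1, at half cells
    a = ups1 * ups4 / dx**2
    n_int = len(ey) - 2                        # interior unknowns only
    d2ey = ey[2:] - 2.0 * ey[1:-1] + ey[:-2]   # second difference times dx^2
    dhz = (hz[1:] - hz[:-1]) / dx              # dHz/dx at interior E_y nodes
    rhs = ey[1:-1] - 2.0 * ups1 * dhz + a * d2ey
    lower = np.full(n_int, -a)
    diag = np.full(n_int, 1.0 + 2.0 * a)
    upper = np.full(n_int, -a)
    ey_next = np.zeros_like(ey)                # boundary values stay zero
    ey_next[1:-1] = thomas(lower, diag, upper, rhs)
    return ey_next

# Illustrative values only.
x = np.linspace(0.0, 1.0, 50)
dx = x[1] - x[0]
ey0 = np.sin(np.pi * x)
hz0 = np.zeros(len(x) - 1)
ey_new = implicit_ey_step(ey0, hz0, ups1=0.3, ups4=0.2, dx=dx)
```

The Thomas algorithm solves each line in a number of operations proportional to the number of cells, and lines along $x$ are independent of one another, which is one reason the locally one-dimensional form is attractive for parallelisation.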


Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Jeffrey T. Muehring and John K. Antonio Deptartment of Computer Science, P.O. Box 43104, Texas Tech University, Lubbock, TX

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Modeling and Analysis of Crosstalk between Differential Lines in High-speed Interconnects

Modeling and Analysis of Crosstalk between Differential Lines in High-speed Interconnects 1293 Modeling and Analysis of Crosstalk between Differential Lines in High-speed Interconnects F. Xiao and Y. Kami University of Electro-Communications, Japan Abstract The crosstalk between a single-ended

More information

Dual Polarized Phased Array Antenna Simulation Using Optimized FDTD Method With PBC.

Dual Polarized Phased Array Antenna Simulation Using Optimized FDTD Method With PBC. Dual Polarized Phased Array Antenna Simulation Using Optimized FDTD Method With PBC. Sudantha Perera Advanced Radar Research Center School of Electrical and Computer Engineering The University of Oklahoma,

More information

Enabling Loop Parallelization with Decoupled Software Pipelining in LLVM: Final Report

Enabling Loop Parallelization with Decoupled Software Pipelining in LLVM: Final Report Enabling Loop Parallelization with Decoupled Software Pipelining in LLVM: Final Report Ameya Velingker and Dougal J. Sutherland {avelingk, dsutherl}@cs.cmu.edu http://www.cs.cmu.edu/~avelingk/compilers/

More information

Massive Data Analysis

Massive Data Analysis Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that

More information

Modeling with Uncertainty Interval Computations Using Fuzzy Sets

Modeling with Uncertainty Interval Computations Using Fuzzy Sets Modeling with Uncertainty Interval Computations Using Fuzzy Sets J. Honda, R. Tankelevich Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO, U.S.A. Abstract A new method

More information

Chapter 4: Implicit Error Detection

Chapter 4: Implicit Error Detection 4. Chpter 5 Chapter 4: Implicit Error Detection Contents 4.1 Introduction... 4-2 4.2 Network error correction... 4-2 4.3 Implicit error detection... 4-3 4.4 Mathematical model... 4-6 4.5 Simulation setup

More information

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co-

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Shaun Lindsay CS425 A Comparison of Unified Parallel C, Titanium and Co-Array Fortran The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Array Fortran s methods of parallelism

More information

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of

More information