VILNIUS GEDIMINAS TECHNICAL UNIVERSITY

Aleksandr JAKUŠEV

DEVELOPMENT, ANALYSIS AND APPLICATIONS OF THE TECHNOLOGY FOR PARALLELIZATION OF NUMERICAL ALGORITHMS FOR SOLUTION OF PDE AND SYSTEMS OF PDES

Summary of Doctoral Dissertation
Technological Sciences, Informatics Engineering (07T)

Vilnius 2007

The doctoral dissertation was prepared at Vilnius Gediminas Technical University in 2003-2007.
Scientific supervisor: Prof Dr Habil Raimondas ČIEGIS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
The dissertation is being defended at the Council of the Scientific Field of Informatics Engineering at Vilnius Gediminas Technical University:
Chairman: Prof Dr Habil Romualdas BAUŠYS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
Members: Prof Dr Habil Rimantas BARAUSKAS (Kaunas University of Technology, Technological Sciences, Informatics Engineering 07T), Prof Dr Habil Gintautas DZEMYDA (Institute of Mathematics and Informatics, Technological Sciences, Informatics Engineering 07T), Prof Dr Habil Feliksas IVANAUSKAS (Vilnius University, Physical Sciences, Informatics 09P), Assoc Prof Dr Arnas KAČENIAUSKAS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
Opponents: Prof Dr Habil Henrikas PRANEVIČIUS (Kaunas University of Technology, Physical Sciences, Informatics 09P), Dr Julius ŽILINSKAS (Institute of Mathematics and Informatics, Technological Sciences, Informatics Engineering 07T).
The dissertation will be defended at the public meeting of the Council of the Scientific Field of Informatics Engineering in the Senate Hall of Vilnius Gediminas Technical University at 1 p.m. on 25 January 2008.
Address: Saulėtekio al. 11, LT-10223 Vilnius, Lithuania. Tel.: +370 5 274 4952, +370 5 274 4956; fax +370 5 270 0112; e-mail: doktor@adm.vgtu.lt
The summary of the doctoral dissertation was distributed on 22 December 2007.
A copy of the doctoral dissertation is available for review at the Library of Vilnius Gediminas Technical University (Saulėtekio al. 14, LT-10223 Vilnius, Lithuania) and at the Library of the Institute of Mathematics and Informatics (Akademijos g. 4, LT-08663 Vilnius, Lithuania).
Aleksandr Jakušev, 2007

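VILNIUS GEDIMINAS TECHNICAL UNIVERSITY

Aleksandr JAKUŠEV

DIFERENCIALINIŲ LYGČIŲ IR JŲ SISTEMŲ SKAITINIO SPRENDIMO ALGORITMŲ LYGIAGRETINIMO TECHNOLOGIJOS KŪRIMAS, ANALIZĖ IR TAIKYMAI [DEVELOPMENT, ANALYSIS AND APPLICATIONS OF THE TECHNOLOGY FOR PARALLELIZATION OF ALGORITHMS FOR NUMERICAL SOLUTION OF DIFFERENTIAL EQUATIONS AND THEIR SYSTEMS]

Summary of Doctoral Dissertation
Technological Sciences, Informatics Engineering (07T)

Vilnius 2007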

The dissertation was prepared at Vilnius Gediminas Technical University in 2003-2007.
Scientific supervisor: Prof Dr Habil Raimondas ČIEGIS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
The dissertation is being defended at the Council of the Scientific Field of Informatics Engineering at Vilnius Gediminas Technical University:
Chairman: Prof Dr Habil Romualdas BAUŠYS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
Members: Prof Dr Habil Rimantas BARAUSKAS (Kaunas University of Technology, Technological Sciences, Informatics Engineering 07T), Prof Dr Habil Gintautas DZEMYDA (Institute of Mathematics and Informatics, Technological Sciences, Informatics Engineering 07T), Prof Dr Habil Feliksas IVANAUSKAS (Vilnius University, Physical Sciences, Informatics 09P), Assoc Prof Dr Arnas KAČENIAUSKAS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering 07T).
Opponents: Prof Dr Habil Henrikas PRANEVIČIUS (Kaunas University of Technology, Physical Sciences, Informatics 09P), Dr Julius ŽILINSKAS (Institute of Mathematics and Informatics, Technological Sciences, Informatics Engineering 07T).
The dissertation will be defended at the public meeting of the Council of the Scientific Field of Informatics Engineering on 25 January 2008 at 13:00 in the Senate Hall of Vilnius Gediminas Technical University.
Address: Saulėtekio al. 11, LT-10223 Vilnius, Lithuania. Tel.: (8 5) 274 4952, (8 5) 274 4956; fax (8 5) 270 0112; e-mail: doktor@adm.vgtu.lt
The summary of the dissertation was distributed on 22 December 2007.
The dissertation is available for review at the Library of Vilnius Gediminas Technical University (Saulėtekio al. 14, LT-10223 Vilnius, Lithuania) and at the Library of the Institute of Mathematics and Informatics (Akademijos g. 4, LT-08663 Vilnius, Lithuania).
VGTU Press Technika scientific book No. 1412.
Aleksandr Jakušev, 2007

General Characteristic of the Dissertation

Topicality of the problem. Producing parallel versions of algorithms and software is very important, because parallel computers make it possible to solve bigger problems and/or to solve them faster. This matters today, when the size of problems quickly surpasses the resources of a single personal computer (PC). Parallel computing is also economically feasible, since it is much cheaper to combine N standard PCs than to build one PC that is N times more powerful. Sometimes new hardware is not even necessary, and existing computers suffice. Moreover, in the future even ordinary computers will be parallel. Multicore computers with up to 4 central processing units (CPUs) are common today, and some specialized computers have up to 32 CPUs. Manycore technology, which is being actively developed, should allow thousands of CPUs to be combined on one chip. However, it is not easy to create a parallel program that takes advantage of parallel resources: the programmer needs a lot of special knowledge and skills, and this hinders the spread of parallel software. It is therefore important to produce tools that simplify the development of parallel software for the average user.

The C++ programming language originated as an extension of C. Its many new features simplify numerous programming tasks and yield more flexible code. However, these features require more training to apply effectively; if they are applied incorrectly, the result may be slower, more resource-hungry code. This is one reason why C++ has been adopted slowly for implementing mathematical modelling tasks. As with parallel programming, special libraries of numerical algorithms that retain high performance while keeping the code clean and flexible could be of great help.
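To illustrate how a library can keep operator syntax such as A = B + C without the temporary arrays that naive operator overloading creates, here is a minimal expression-template sketch (expression templates are the subject of one of the author's publications listed below). The names Vec, VecExpr and VecSum are hypothetical and do not reproduce the ParSol interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base class of all vector expressions (CRTP): gives access to the derived type.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node representing l + r; operator[] computes one element on demand,
// so no temporary vector is allocated.
template <typename L, typename R>
struct VecSum : VecExpr<VecSum<L, R>> {
    const L& l;
    const R& r;
    VecSum(const L& a, const R& b) : l(a), r(b) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Hypothetical vector class that owns its data.
class Vec : public VecExpr<Vec> {
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning any expression evaluates it in one fused loop.
    template <typename E>
    Vec& operator=(const VecExpr<E>& expr) {
        const E& e = expr.self();
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// operator+ only builds the expression node; evaluation happens on assignment.
template <typename L, typename R>
VecSum<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) {
    return VecSum<L, R>(a.self(), b.self());
}
```

With this design, `a = b + c + d;` compiles into a single loop computing `a[i] = b[i] + c[i] + d[i]`, whereas an `operator+` returning a new vector would allocate and traverse two temporaries.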
Multiphase flow in porous media receives a lot of attention today, because such processes occur in many applications, for example wood drying and impregnation, soil contamination and cleaning, oil recovery, radioactive waste disposal, paper production, biology and filter construction. Image smoothing also has numerous applications, such as noise reduction and edge detection; it is important in areas such as computed tomography and machine vision. In many problems where modelling is done by solving a PDE (partial differential equation) or a system of PDEs, the solution of the resulting systems of linear equations plays an important part. For this class of problems, solving the systems of linear equations may consume up to 80 % of all

computational time. Thus effective methods (and their parallel versions) for solving systems of linear equations are of high importance for modelling problems.

Aim of the work. To create, implement and prepare for general use a technology for parallelizing numerical models that are derived from hyperbolic and parabolic equations and their systems by finite difference, finite volume or finite element approximations, and to use this technology to parallelize real-life problems.

Tasks of the work.
1. To analyze the parallelization methods of existing solvers of differential equations in order to determine the requirements for parallelizing such problems.
2. To analyze existing parallelization models, standards and tools.
3. Based on this analysis, to create the principles of a new technology for efficient parallelization of PDE solution algorithms.
4. To create a library of mathematical objects based on the proposed technology.
5. To apply the technology to parallelize real-life problems (modelling of processes in porous media, image smoothing using non-linear diffusion filters, variational iterative solvers of systems of linear equations).

Scientific novelty.
1. A new parallelization technology is created and implemented. It may be used for semi-automatic parallelization of numerical models derived by discretization of differential equations. The new model combines elements of the data parallel and global memory parallel programming models. Compared to the data parallel model, a broader range of problems can be parallelized; compared to the global memory model, data exchange can be implemented more efficiently without losing the possibility of semi-automatic parallelization.
2. The implementation of the new technology allows clear, efficient and semi-automatically parallelized algorithms to be written in C++.
3. Parallel algorithms are implemented for several sophisticated applications (modelling of processes in porous media, image smoothing using non-linear diffusion filters, variational iterative solvers of systems of linear equations).

Methodology of research. The research methods include numerical methods, analysis of parallel algorithm efficiency and complexity, comparative analysis of various parallelization tools, and experimental research. The author used the C++ programming language, object-oriented programming, template metaprogramming, and implementations of the Message Passing Interface (MPI) standard.

Practical value. The proposed technology allows creating tools for parallelizing numerical algorithms that are both more convenient and more efficient than common parallel programming models such as the data parallel or global memory models. The technology is implemented in a library of mathematical objects, which allows algorithms to be implemented in C++ conveniently and efficiently; such algorithms may later be parallelized semi-automatically. Only widespread standards (C++, MPI) are used, which makes the library highly portable. Parallel implementations of several algorithms are created: a tool for modelling fluid flow in porous media is parallelized and extended; a parallel version of image filtering using a non-linear diffusion algorithm is created and used to detect the ischemic stroke area in CT images of the human brain; and variational iterative linear solvers are created. These parallel algorithms make it possible to solve bigger problems and to solve them faster.

Defended propositions.
1. A new technology for semi-automatic parallelization of PDE solvers, combining elements of both the data parallel and global memory parallelization models.
2. The implementation of the technology as a library of mathematical objects, written in C++ and MPI.
3. Parallel versions of several problems (multiphase fluid flow in porous media, image smoothing using non-linear diffusion filters, variational iterative linear solvers), implemented using the proposed parallelization technology.

The scope of the scientific work.
The dissertation consists of the general characteristic, 3 chapters, conclusions, a list of literature and publications, and appendices on electronic media. Its total scope is 147 pages, 20 figures and 10 tables.

1. Parallelization of Linear Algebra Algorithms

Linear algebra algorithms are the building blocks of many other numerical algorithms. PDE solving algorithms are among them: in such cases up to 80 % of all computation time is spent solving linear algebra problems. Numerical solution of PDEs is widely used to model various processes. As bigger problems with finer grids arise and faster solution times are required, parallelization of PDE solution algorithms becomes very important. Since the aim of this work is to create a technology for parallelizing PDE solvers, it is natural to analyze how existing popular PDE solvers are parallelized. The parallelization methods of the following PDE solving tools were analyzed: Diffpack, OpenFOAM, UG, TOUGH2 and Clawpack. The following similarities were noticed: parallelization is implemented at the level of linear algebra; the data parallel method of parallelization is used; most tools hide parallelization details so that, for the end user, parallelization seems semi-automatic and only the data distribution among processors needs to be set; the parallelization code is tightly bound to the rest of the tool, so it is difficult (if possible at all) to reuse it for parallelizing other similar algorithms; MPI is most often used (the main targets are PC clusters and supercomputers). On closer examination, all of these features are essential for such parallelization tools except the tight coupling between the parallelization code and the rest of the tool, which gives no advantage when another solver has to be parallelized. Thus a technology is sought specifically for parallelizing such problems.
It should meet the following requirements: it should be a high-level technology whose use does not require writing a lot of tool-specific low-level code (as is the case with MPI); parallelization should be done at the level of linear algebra and the discretization of differential equations; the technology should allow easy creation of semi-automatically parallelized code; the parallel version should run efficiently on computer clusters. With the requirements defined, the next step is to analyze existing parallelization models, standards and libraries in order to determine whether they meet these requirements. The most popular parallel programming models for

such problems are message passing, data parallel and global memory. Other standards analyzed in the dissertation are PVM, MPI, OpenMP, HPF, UPC and multithreading libraries (such as Pthreads), as well as libraries such as FreePOOMA, Global Arrays, TBB and PETSc. None of them fully meets the requirements. The message passing model and solutions based on it produce low-level, complicated code; solutions based on the data parallel model are limited in features; applications of the global memory model result in code that is not semi-automatically parallelized. Moreover, neither the global memory nor the data parallel model guarantees efficient data exchange among processors.

2. Technology for Parallelization of Numerical Algorithms

Based on the analysis performed in Chapter 1, a new technology for parallelization of numerical algorithms is proposed. The technology is based on the data parallel model, so it defines parallelizable data structures (arrays, vectors, matrices, etc.) and operations. However, the pure data parallel model is limited in features, because the program can be parallelized easily only if the user restricts himself to the operations the model provides; this was the case with HPF. The global memory model is more flexible here, so elements of this model are also used in the proposed technology: array elements have global addresses. Special care is given to the user interface, so that array index bounds adjust automatically in parallel versions to point to local elements, which reduces the amount of code changed during parallelization. Strict restrictions are imposed on which elements are exchanged among processors. In PDE solving algorithms, it is often necessary to know the neighbours of the element being processed. The positions of the neighbours, known as the stencil, are usually fixed for a problem and depend on the discretization scheme. The stencil information is sufficient to determine the data exchange among processors.
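The statement that the stencil alone determines the data exchange can be made concrete: from the neighbour offsets one can compute how many shadow layers each subdomain needs on each side. The sketch below assumes a 2D stencil given as a list of integer offsets; the types Offset and ShadowWidths are hypothetical and are not ParSol's stencil class.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stencil representation: offsets (di, dj) of the neighbours
// read when updating an element.
using Offset = std::array<int, 2>;

// For each dimension d: lower[d] = layers needed from the neighbour below,
// upper[d] = layers needed from the neighbour above.
struct ShadowWidths {
    std::array<int, 2> lower{{0, 0}};
    std::array<int, 2> upper{{0, 0}};
};

ShadowWidths shadowWidths(const std::vector<Offset>& stencil) {
    ShadowWidths w;
    for (const Offset& o : stencil) {
        for (int d = 0; d < 2; ++d) {
            w.lower[d] = std::max(w.lower[d], -o[d]);  // reach towards lower indices
            w.upper[d] = std::max(w.upper[d], o[d]);   // reach towards higher indices
        }
    }
    return w;
}
```

For the five-point Laplacian stencil one layer is needed on every side, while an upwind stencil {0, -1, -2} in the first direction needs two layers from the lower neighbour and none from the upper one, so half of the messages can be skipped.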
In many parallelization tools, the software tries to determine the stencil automatically. This results in cleaner code at the cost of possibly wrong optimization decisions. It was therefore decided to provide a convenient stencil interface, but to require the user to set the stencil explicitly. Another area of concern is the method of data exchange. It was decided that the user must specify when data exchange should start and, possibly, when it should end. Compared to implicit data exchange, this requires several additional function calls (without any complicated parameters), but allows the user to optimize the overlap between computations and data exchange.
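The benefit of explicit start/end calls can be sketched with a serial model of a 1D domain decomposition. Each hypothetical Block stands for the part of the array owned by one process, with one shadow cell on each side for the stencil {-1, 0, +1}; exchange() copies neighbour values into the shadow cells, which are then only read. In a real MPI program exchange() would be split into a non-blocking start and a wait, and the loop over interior points, which never touches shadow cells, could run in between. This is an illustration under those assumptions, not the ParSol implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical block: locally owned elements plus one shadow cell per side.
struct Block {
    std::vector<double> u;    // u[0] and u.back() are shadow cells
    bool firstBlock = false;  // owns the left end of the global array
    bool lastBlock = false;   // owns the right end of the global array
};

// Distribute the global array as evenly as possible among `parts` blocks.
std::vector<Block> split(const std::vector<double>& global, std::size_t parts) {
    std::vector<Block> blocks(parts);
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = global.size() / parts + (p < global.size() % parts ? 1 : 0);
        blocks[p].u.assign(len + 2, 0.0);
        for (std::size_t i = 0; i < len; ++i) blocks[p].u[i + 1] = global[begin + i];
        blocks[p].firstBlock = (p == 0);
        blocks[p].lastBlock = (p + 1 == parts);
        begin += len;
    }
    return blocks;
}

// Copy edge values of neighbouring blocks into the shadow cells. In an MPI
// version this would be a start (non-blocking send/receive) and an end (wait).
void exchange(std::vector<Block>& blocks) {
    for (std::size_t p = 0; p + 1 < blocks.size(); ++p) {
        std::vector<double>& left = blocks[p].u;
        std::vector<double>& right = blocks[p + 1].u;
        left.back() = right[1];            // first owned value of the right block
        right[0] = left[left.size() - 2];  // last owned value of the left block
    }
}

// One explicit step of u_t = u_xx on a block; global end values stay fixed.
void step(Block& b, double r) {
    const std::size_t last = b.u.size() - 2;  // index of the last owned element
    std::vector<double> next = b.u;
    auto update = [&](std::size_t i) {
        next[i] = b.u[i] + r * (b.u[i - 1] - 2.0 * b.u[i] + b.u[i + 1]);
    };
    auto fixed = [&](std::size_t i) {
        return (b.firstBlock && i == 1) || (b.lastBlock && i == last);
    };
    // 1) Interior points read no shadow cells: with split start/end calls
    //    they can be computed while the exchange is still in progress.
    for (std::size_t i = 2; i < last; ++i) update(i);
    // 2) Points next to the shadow cells, computed after the exchange ends.
    if (!fixed(1)) update(1);
    if (!fixed(last)) update(last);
    b.u.swap(next);
}

// Reassemble the owned elements of all blocks into one global array.
std::vector<double> gather(const std::vector<Block>& blocks) {
    std::vector<double> g;
    for (const Block& b : blocks) g.insert(g.end(), b.u.begin() + 1, b.u.end() - 1);
    return g;
}

// Reference: the same step applied to the undivided array.
std::vector<double> globalStep(const std::vector<double>& u, double r) {
    std::vector<double> next = u;
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        next[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    return next;
}
```

Running exchange() followed by step() on every block reproduces globalStep() on the undivided array, which is the property the parallel version must preserve.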

Initially, the technology should provide the user with the following classes:
Arrays: the basic data structure, which may be parallelized automatically.
Vectors: arrays with mathematical functionality. Vectors are more convenient for numerical algorithms; arrays are useful for parallelizing arrays of data types that do not support mathematical operations.
Stencils: objects that describe which neighbour elements are needed during computations.
Matrices: used for matrix-vector multiplication.
Array elements are stored internally in a 1D array, while the user works with multidimensional element indices; the index transformations are optimized. Arrays also implement dynamically calculated bounds, which are adjusted automatically in parallel versions. Cyclic arrays are implemented in a specific way, using additional shadow elements into which data are copied from the opposite side of the array. This slightly increases the array footprint and requires the user to issue data exchange commands even in the sequential version; however, element access does not suffer, as no additional index calculations are necessary. Vectors provide additional functionality on top of arrays, such as global operations, multiplication by a constant, and calculation of various norms; these operations are necessary for solving PDEs efficiently. The technology provides both dense and sparse matrices. Dense matrices are stored in a 1D array, sparse matrices in the CSR (compressed sparse row) format, as shown in Figure 2.1. For sparse matrices, the number of preallocated elements can be estimated if the stencil is known. The matrix dimensions are defined at creation, using the vectors with which the matrix will be used.
Fig 2.1. Internal structure of sparse matrices
The parallelization scheme is shown in Figure 2.2.
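A minimal CSR layout and matrix-vector product, together with the kind of preallocation estimate mentioned above (at most one non-zero per stencil point in each row), might look as follows. The struct CsrMatrix and the helper functions are hypothetical illustrations, not the library's matrix classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) storage: the non-zeros of row i are
// values[rowStart[i] .. rowStart[i + 1]) with column indices in col[].
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> rowStart;  // size rows + 1
    std::vector<std::size_t> col;
    std::vector<double> values;
};

// y = A x
std::vector<double> multiply(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = a.rowStart[i]; k < a.rowStart[i + 1]; ++k)
            y[i] += a.values[k] * x[a.col[k]];
    return y;
}

// Upper bound on the non-zeros when each row couples an element with its
// stencil neighbours (stencilSize counts the centre point as well).
std::size_t estimateNonZeros(std::size_t rows, std::size_t stencilSize) {
    return rows * stencilSize;
}

// Hypothetical example: tridiag(-1, 2, -1) from the 3-point stencil {-1, 0, +1}.
CsrMatrix tridiagonal(std::size_t n) {
    CsrMatrix a;
    a.rows = n;
    a.rowStart.push_back(0);
    a.col.reserve(estimateNonZeros(n, 3));
    a.values.reserve(estimateNonZeros(n, 3));
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) { a.col.push_back(i - 1); a.values.push_back(-1.0); }
        a.col.push_back(i); a.values.push_back(2.0);
        if (i + 1 < n) { a.col.push_back(i + 1); a.values.push_back(-1.0); }
        a.rowStart.push_back(a.col.size());
    }
    return a;
}
```

The estimate is exact for interior rows and slightly too large for boundary rows, which is acceptable for preallocation.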
The arrays (and thus vectors) are divided among processors depending on the topology (as with stencils, the technology provides an interface for setting it) and the stencil. When data exchange starts, neighbour elements are copied to the shadow area, where

they must be treated as read-only. There are several methods of data exchange: all-at-once, pair-by-pair, and pair-by-pair-ordered.
Fig 2.2. Parallelization scheme: a) sequential, b) parallel
The C++ programming language and MPI were chosen to implement the technology. Advanced C++ features, such as OOP and template metaprogramming, allow a convenient user interface, and attention was paid to achieving high computational performance without sacrificing the object-oriented approach. MPI is a standard that ensures high efficiency and portability.
Fig 2.3. Library class diagram
The technology is implemented as a library of mathematical objects (ParSol). The library is available on the Internet at http://techmat.vgtu.lt/~alexj/parsol/. It consists of about 20000 lines of code, plus about 10000 more lines of automatic tests (the CppUnit test framework is used). Only standard C++ and MPI features are

used, ensuring high portability. The library is known to run on various *nix systems with the gcc compiler and various MPI implementations, as well as on MS Windows (MSVC + MPICH). A partial class diagram is shown in Figure 2.3 (the library contains more than 70 classes). The parallel versions of the classes are implemented as descendants of the corresponding sequential classes, so the code does not have to change much during parallelization.

The efficiency of the library classes was compared experimentally with standard C/C++ arrays. One experiment tests the speed of array element access; the other compares the speed of array expressions written either the ParSol way (A = B+C) or the standard C way (a for-loop). The results show that the efficiency of the library arrays is similar to that of standard C arrays, which indicates that the library classes are well optimized.

Finally, the main directions for future development of the proposed technology are discussed. One of the most important is optimizing the library for multicore processors. This may be done by combining MPI (for separate processes on distributed memory systems) with multithreading (to use the multiple cores of a single machine). Memory usage optimization techniques are also discussed, as RAM usage is one of the main bottlenecks of multicore parallelization performance. Another important improvement is support for unstructured grids; the additional classes and their functionality are discussed, together with the possible use of third-party tools such as the METIS package. Load balancing could also be an important feature of the technology, as it would allow more effective use of cluster resources, especially heterogeneous ones.
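Static load balancing on a 1D processor topology can be as simple as giving each processor a number of grid layers proportional to its relative speed. The function below is a hypothetical sketch of that idea, assuming the speeds are known in advance and positive; it is not the interface proposed in the dissertation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split n grid layers among processors in proportion to their relative
// speeds. Returns the number of layers per processor; the shares sum to n.
std::vector<std::size_t> balance(std::size_t n, const std::vector<double>& speed) {
    const double total = std::accumulate(speed.begin(), speed.end(), 0.0);
    std::vector<std::size_t> share(speed.size());
    std::size_t assigned = 0;
    double cumulative = 0.0;
    for (std::size_t p = 0; p < speed.size(); ++p) {
        cumulative += speed[p];
        // Round the cumulative boundary rather than each share, so that
        // rounding errors do not accumulate; the last processor takes the rest.
        const std::size_t end = (p + 1 == speed.size())
            ? n
            : static_cast<std::size_t>(n * cumulative / total + 0.5);
        share[p] = end - assigned;
        assigned = end;
    }
    return share;
}
```

For example, 100 layers on processors with relative speeds 1, 1 and 2 are split as 25, 25 and 50.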
1D and multidimensional topologies are considered, and a user-friendly way to set static load balancing is proposed.

3. Applications of the Parallelization Technology

This chapter describes how the ParSol library was applied to produce parallel versions of various algorithms. First, the parallelization of a multiphase flow solver is described; the solver was not only parallelized but also extended with new functionality. The initial version of the flow solver employed the multiphase flow model (MFM), in which the mass conservation law for every phase k is used together with Darcy's law.
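For orientation, a commonly used form of these equations, written with the symbols defined below, is given here. It additionally assumes the phase saturations S_k, no source terms, and a two-phase capillary pressure between a wetting (w) and a non-wetting (n) phase; the exact form used in the dissertation may differ.

```latex
\frac{\partial (\varepsilon \rho_k S_k)}{\partial t} + \nabla \cdot (\rho_k \mathbf{u}_k) = 0,
\qquad
\mathbf{u}_k = -\frac{k_{rk}}{\mu_k} K \left( \nabla p_k - \rho_k \mathbf{g} \right),
\qquad
\sum_k S_k = 1,
\qquad
p_c = p_n - p_w .
```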

Here, ε is the porosity of the porous medium, u_k is the phase velocity, K is the absolute permeability tensor of the porous medium, ρ_k is the phase density, p_k the phase pressure, k_rk the relative permeability, µ_k the phase dynamic viscosity and g the acceleration vector due to gravity. The difference between the phase pressures is the capillary pressure, which is also used to obtain a complete set of equations. The global pressure model used in the flow solver is derived from the MFM by introducing new artificial global variables and viewing all phases as one mixture with global coefficients, which are combinations of the corresponding coefficients of the individual phases. The global pressure formulation is easier to implement numerically; however, an assumption made when introducing the global pressure restricts the model to homogeneous problems only. Finally, for a two-phase system, the final set of equations (3.1)-(3.3) is obtained, where (3.2) is the pressure equation and (3.3) the saturation equation. In the literature, λ_k are often called the fractional flow functions. The structure of the flow solver is shown in Figure 3.1.
Fig 3.1. Scheme of the MFSolver tool
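For a two-phase system, the fractional flow functions can be evaluated as each phase's mobility k_rk/µ_k divided by the total mobility. The sketch below assumes quadratic (Corey-type) relative permeabilities, k_rw = S^2 and k_rn = (1 - S)^2 in terms of the wetting-phase saturation S; these closures and all function names are illustrative assumptions, not the MFSolver relations.

```cpp
#include <cassert>
#include <cmath>

// Assumed quadratic (Corey-type) relative permeabilities of the wetting (w)
// and non-wetting (n) phase as functions of the wetting saturation s.
double krWetting(double s) { return s * s; }
double krNonWetting(double s) { return (1.0 - s) * (1.0 - s); }

struct FractionalFlow {
    double wetting;
    double nonWetting;
};

// Fraction of the total flow carried by each phase: the phase mobility
// k_rk / mu_k divided by the total mobility (the two values sum to 1).
FractionalFlow fractionalFlow(double s, double muWetting, double muNonWetting) {
    const double lw = krWetting(s) / muWetting;
    const double ln = krNonWetting(s) / muNonWetting;
    const double total = lw + ln;  // positive for every s in [0, 1]
    return {lw / total, ln / total};
}
```

With equal viscosities and S = 0.5 both phases carry half of the flow; doubling the non-wetting viscosity shifts two thirds of the flow to the wetting phase.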

The Pressure class and its descendants solve the pressure equation (3.2), and the Saturation class solves the saturation equation (3.3). The CommonRel class and its descendants contain the various system properties, which differ for every problem. Finally, the manager class is responsible for the overall organization of the process: it manages all the solver classes and is also responsible for time and space discretization and the solution of linear equations. In the new version of the solver, a new model is implemented and the existing model is extended. In the global pressure model, phase components as well as thermal transfer are now taken into account. The new pressure-saturation model has no artificial additional requirements and thus allows heterogeneous problems to be modelled. A new second-order central upwind scheme was also investigated and implemented. The new version is built upon the ParSol library. Due to this transition, the part of the MFSolver code common to problems with different numbers of dimensions was changed from taking the number of dimensions as a parameter to a template metaprogramming version. The transition increases the size of the executable program, but reduces the computation time. Much parallelization effort was saved by using the MPIversion condition.

Second, the application of the ParSol library to the image smoothing problem is discussed. An image may be represented as a 2D array of values, each representing the grayscale value of the corresponding pixel. Smoothing may be done by applying a finite difference scheme to the linear diffusion equation (3.4). However, to preserve edges, a non-linear scheme should be used, for example (3.5). Where the image gradient is large (near edges), the diffusion process is slowed down, while where it is small, the process is sped up. These algorithms were implemented using the ParSol library and tested on the SP4 supercomputer and on the VGTU PC cluster Vilkas. The results showed good parallelization efficiency.
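One explicit time step of such an edge-preserving non-linear diffusion filter can be sketched as follows. The sketch assumes the widely used Perona-Malik diffusivity g(s) = 1/(1 + s^2/κ^2), unit grid spacing, and fixed border pixels; the diffusivity used in (3.5) may differ, and the function names are hypothetical. With g ≡ 1 the same loop is the explicit scheme for the linear heat equation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Image stored row-major: pixel (i, j) is u[i * nx + j].
using Image = std::vector<double>;

// Assumed Perona-Malik diffusivity: close to 1 where the image is smooth,
// close to 0 across strong edges, so edges are diffused much less.
double diffusivity(double gradient, double kappa) {
    return 1.0 / (1.0 + (gradient * gradient) / (kappa * kappa));
}

// One explicit step of u_t = div(g(|grad u|) grad u) on an ny x nx image
// (grid step 1, time step tau <= 0.25 for stability; border pixels fixed).
Image nonlinearDiffusionStep(const Image& u, std::size_t ny, std::size_t nx,
                             double tau, double kappa) {
    Image next = u;
    auto at = [&](std::size_t i, std::size_t j) { return u[i * nx + j]; };
    for (std::size_t i = 1; i + 1 < ny; ++i) {
        for (std::size_t j = 1; j + 1 < nx; ++j) {
            const double c = at(i, j);
            // Differences towards the four neighbours (5-point stencil);
            // each flux is weighted by g of the difference across that face.
            const double dn = at(i - 1, j) - c, ds = at(i + 1, j) - c;
            const double dw = at(i, j - 1) - c, de = at(i, j + 1) - c;
            next[i * nx + j] = c + tau * (diffusivity(dn, kappa) * dn + diffusivity(ds, kappa) * ds +
                                          diffusivity(dw, kappa) * dw + diffusivity(de, kappa) * de);
        }
    }
    return next;
}
```

Because each pixel reads only its four neighbours, the filter fits the stencil-based data exchange described in Chapter 2: one shadow layer per side is enough.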
Generally, parallelization efficiency was better on the SP4 supercomputer, because its communication costs were much lower. However, for non-linear diffusion, where computation takes

more time relative to data exchange, the parallelization efficiency was close to 1 even on the Vilkas cluster. Another application was the parallelization of iterative solvers for systems of linear equations: the CG (conjugate gradient) and MSD(30,10) (Modified Steepest Descent) algorithms. The experiments showed parallelization efficiency of 0.8 and higher for both methods.

General Conclusions

After developing and applying the new technology for parallelization of PDE solution algorithms, the following scientific and practical conclusions were formulated:
1. The parallelization methods of popular existing PDE solvers are effective; however, their implementations are tightly coupled to the tools and cannot be used for other purposes. Existing parallelization technologies and tools cannot be used to achieve the desired parallelization properties automatically.
2. The principles of the proposed parallelization technology overcome the shortcomings of the widely used data parallel and global memory parallel programming models. Because of such shortcomings as limited functionality or low parallelization efficiency, these models do not meet the desired requirements.

3. The new technology is designed for implementation and parallelization of the discretization and linear algebra steps arising in the numerical solution of PDEs. It may be used to create semi-automatically parallelized, scalable PDE solvers.
4. The transition to the new technology is clear and formalized: 6 steps are required to obtain the sequential version, and parallelization takes 5 additional steps. The program structure remains intact. The technology has been tested on MS Windows, Linux and AIX operating systems, using the MSVC++ and g++ compilers and the MPICH, LAM/MPI and IBM MPI implementations.
5. The library allows efficient implementation of linear algebra and PDE discretization algorithms in C++ without abandoning the language's advanced features: using the library slows some operations down by at most 2.5 times, and for some operations the slowdown is only 1-8 %. The library makes efficient use of modern compiler optimization algorithms: experiments show that compiler optimization yields a ~2.5 times speed increase for the library, while for standard C/C++ methods the increase is only ~1.5 times.
6. The analysis of the presented applications shows that the technology achieves the desired goals. The selected PDE and linear algebra problems were implemented and parallelized using the new technology, and the programs compiled and ran on different platforms without any modifications. The efficiency and scalability of the developed parallel algorithms were on par with theoretical best-case predictions.
7. The implementation of the proposed technology can be widely used for programming and parallelizing PDE solvers, because the tasks it targets, effective use of C++ and algorithm parallelization, are very important for modern numerical software development.
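For reference, the conjugate gradient method parallelized in Chapter 3 can be sketched in sequential form. The matrix is passed as a function computing y = Ax, and the comments mark the two places where a parallel version needs communication: the dot products (a global reduction) and the matrix-vector product (a shadow-element exchange). All names are hypothetical, and MSD(30,10) is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vector = std::vector<double>;
// Matrix given as an operator y = A x (dense, sparse or matrix-free).
using Operator = std::function<Vector(const Vector&)>;

// In a parallel version: local partial sums followed by a global reduction.
double dot(const Vector& a, const Vector& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient method for a symmetric positive definite A.
// Stops when ||r|| <= tol * ||b||; returns the number of iterations.
int conjugateGradient(const Operator& apply, const Vector& b, Vector& x,
                      double tol, int maxIter) {
    Vector r = b, ap = apply(x);  // apply(): needs a shadow exchange in parallel
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= ap[i];
    Vector p = r;
    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);
    int k = 0;
    while (k < maxIter && rr > stop) {
        ap = apply(p);
        const double alpha = rr / dot(p, ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++k;
    }
    return k;
}
```

All remaining operations are element-wise vector updates, which need no communication; this is why CG maps well onto the data parallel vectors of the proposed technology.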
List of Published Works on the Topic of the Dissertation

In the reviewed scientific periodical publications:
1. ČIEGIS, R.; JAKUŠEV, A.; STARIKOVIČIUS, V. Parallel tool for solution of multiphase flow problems. In Lecture Notes in Computer Science, 6th International Conference, PPAM-2005, Poznan, Poland, September 11-14, 2005. Revised Selected Papers, 2006, Vol. 3911, p. 312-319. ISSN 0302-9743 (ISI Master Journal List).
2. STARIKOVIČIUS, V.; ČIEGIS, R.; JAKUŠEV, A. Analysis of upwind and high-resolution schemes for solving convection dominated problems in

porous media. Mathematical Modelling and Analysis, 2006, 11(4), p. 451-474. ISSN 1392-6292.
3. ČIEGIS, R.; JAKUŠEV, A.; KRYLOVAS, A.; SUBOČ, O. Parallel algorithms for solution of nonlinear diffusion problems in image smoothing. Mathematical Modelling and Analysis, 2005, 10(2), p. 155-172. ISSN 1392-6292.
4. ČIEGIS, Raim.; ČIEGIS, Rem.; JAKUŠEV, A.; ŠALTENIENĖ, G. Parallel Variational Iterative Linear Solvers. Mathematical Modelling and Analysis, 2007, 12(1), p. 1-16. ISSN 1392-6292.
5. JAKUŠEV, A. Application of Template Metaprogramming Technologies to Improve the Efficiency of Parallel Arrays. Mathematical Modelling and Analysis, 2007, 12(1), p. 71-79. ISSN 1392-6292.
6. ČIEGIS, R.; JAKUŠEV, A. Lygiagretieji algoritmai vaizdų filtravime [Parallel algorithms in image filtering]. Lietuvos matematikos rinkinys, 2005, 45, special issue, p. 411-416. ISSN 0132-2818.
7. JAKUŠEV, A.; STARIKOVIČIUS, V. Daugiafazio tekėjimo uždavinių sprendimo įrankis ir jo taikymas daugiamačiams uždaviniams [Multiphase fluid flow solver and its application to multidimensional problems]. Lietuvos matematikos rinkinys, 2004, 44, special issue, p. 634-638. ISSN 0132-2818.

In other editions:
8. JAKUŠEV, A.; STARIKOVIČIUS, V. Application of Parallel Arrays for Parallelisation of Data Parallel Algorithms. Computer Aided Methods in Optimal Design and Operations, Series on Computers and Operations research, 2006, Vol. 7, p. 109-118. ISBN 981-256-909-X (ISI Proceedings).
9. JAKUŠEV, A. Išraiškų šablonų naudojimas C++ masyvų efektyvumo didinimui [Improvement of C++ arrays efficiency using expression templates]. In: Mathematics (2 April 2006). Informatics (12-13 April 2006). Proceedings of the 9th Conference of Lithuanian Young Scientists "Mokslas - Lietuvos ateitis" [Science - Future of Lithuania]. Vilnius: Technika, 2006, p. 94-101. ISBN 9986-05-997-6.
10. JAKUŠEV, A.; STARIKOVIČIUS, V.; ČIEGIS, R. Application of parallel arrays for semiautomatic parallelization of flow in porous media problem solver.
In Proceedings of the 10th International Conference MMA2005 & CMAM2, Trakai, Lithuania, 2005. Vilnius: Technika, 2005, p. 171-177. ISBN 9986-05-924-0.
11. ČIEGIS, R.; JAKUŠEV, A.; SUBOČ, O. Nonlinear diffusion problems in image smoothing. In Proceedings of the 10th International Conference

MMA2005 & CMAM2, Trakai, Lithuania, 2005. Vilnius: Technika, 2005, p. 381-388. ISBN 9986-05-924-0.
12. JAKUŠEV, A.; STARIKOVIČIUS, V. Daugiafazio tekėjimo uždavinių sprendimo įrankis ir jo testavimas [Multiphase fluid flow problem solver and its benchmarking]. In: Mathematics (7-8 April 2004). Informatics (13-14 April 2004). Collection of papers of the 7th Conference of Lithuanian Young Scientists "Lietuva be mokslo - Lietuva be ateities" [Lithuania without Science - Lithuania without Future]. Vilnius: Technika, 2004, p. 58-65. ISBN 996-05-724-8.

About the author

Aleksandr Jakušev was born in Klaipėda on 30 April 1977. First degree in physics, Faculty of Physics, Vilnius University, 1999. Master of Science in Informatics Engineering, Faculty of Fundamental Sciences, Vilnius Gediminas Technical University, 2003. 2003-2007: PhD student at Vilnius Gediminas Technical University. In 2006 he did an internship at the Technical University of Kaiserslautern, Germany. 2003-2007: Assistant at the Department of Mathematical Modelling, Vilnius Gediminas Technical University.

DIFERENCIALINIŲ LYGČIŲ IR JŲ SISTEMŲ SKAITINIO SPRENDIMO ALGORITMŲ LYGIAGRETINIMO TECHNOLOGIJOS KŪRIMAS, ANALIZĖ IR TAIKYMAI [DEVELOPMENT, ANALYSIS AND APPLICATIONS OF THE TECHNOLOGY FOR PARALLELIZATION OF ALGORITHMS FOR NUMERICAL SOLUTION OF DIFFERENTIAL EQUATIONS AND THEIR SYSTEMS]

Topicality of the problem. The benefit of parallelizing programs and algorithms is obvious: with parallel computers, bigger problems can be solved, and solved faster. This is very important nowadays, when the size of the problems being formulated exceeds the capacity of the most modern computers. Solving problems in parallel is also economically beneficial, because connecting N computers is cheaper than producing one computer that is N times more powerful. Sometimes it is not even necessary to acquire new computers, since existing resources can be used. Moreover, trends in computer architecture show that in the future even ordinary (non-specialized) personal computers may become parallel. Multicore computers with up to 4 processors are already common.
Kai kurie specializuoti kompiuteriai turi iki 32 CPU, naudojančių tą pačią atmintį. Šiuo metu vystoma nauja manycore architektūra, kuri leistų sujungti šimtus ir tūkstančius CPU vienoje mikroschemoje. 18

However, developing programs for parallel computing is difficult and requires a great deal of specific knowledge and skill from the user. This hinders the spread of parallelization. Creating tools that make parallel programming easier for an ordinary user is therefore a very important task.

C++ evolved from the C language. Its many features allow numerous algorithms to be implemented more clearly and simply than in C, and they yield more flexible code. Unfortunately, the breadth of C++ demands more effort from the programmer. The programmer must assess the overhead of each feature and learn to apply all of them properly. One consequence of improper use is code that runs slower and/or uses more memory. This hinders the adoption of the language for problems in numerical mathematics and mathematical modelling. As with parallel program development, special libraries are a great help in applying C++ to numerical problems. They let the user exploit all the advantages of C++ without losing efficiency.

Multiphase flow in porous media has recently received a great deal of attention, because these phenomena occur in many problems. Examples include drying or saturation of porous materials (e.g., wood), modelling of soil contamination and remediation, oil extraction, radioactive waste storage, paper production, biology (the function of various tissues) and filter design.

Image smoothing has many applications, for example noise removal or edge detection. It is important in fields such as computed tomography and machine vision.

Solving systems of linear equations is an important component of many modelling problems in which a differential equation or a system of equations is solved. In such problems, solving the linear systems can take up to 80 % of the total computation time. Efficient methods for solving linear systems, and their parallel versions, are therefore very important for modelling many problems.

Objective of the work. To develop, implement and prepare for use a technology for parallelizing the discrete models obtained when parabolic and hyperbolic equations and their systems are approximated by finite difference, finite volume or finite element methods. To test the technology by parallelizing real problems.
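The 80 % figure can be related to the attainable speedup through Amdahl's law. This is a standard bound and is not a result from the dissertation. Assume that a fraction f of the sequential run time is parallelized perfectly over p processors and the rest stays sequential. Taking f = 0.8, as for a program in which only the linear solver is parallelized, gives:

```latex
S(p) = \frac{T_1}{T_p} = \frac{1}{(1-f) + f/p},
\qquad f = 0.8:\quad
S(4) = \frac{1}{0.2 + 0.2} = 2.5,
\qquad
\lim_{p\to\infty} S(p) = \frac{1}{0.2} = 5.
```

Under this simplified model, parallelizing the solver alone cannot give more than a fivefold speedup. This illustrates why the remaining stages (for example, the discretization) benefit from parallelization too.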

Tasks of the work
1. To analyse how existing tools for solving differential equations are parallelized, in order to determine the requirements for parallelizing problems of this type.
2. To analyse existing parallelization models and the parallelization standards and tools based on them.
3. On the basis of this analysis, to propose the principles of a new parallelization technology that allows algorithms for solving differential equations to be parallelized more efficiently.
4. To implement the proposed technology as a library of numerical mathematics objects.
5. To apply the technology to parallelize real problems: modelling of processes in porous media, image smoothing with nonlinear diffusion filters, and iterative variational algorithms for solving systems of linear equations.

Scientific novelty
1. A new technology for parallelizing algorithms has been proposed and implemented. It can be applied to the semi-automatic parallelization of the discrete models obtained by approximating differential equations and their systems. It uses elements of the existing data-parallel and global-memory parallelization models. Compared with the data-parallel model, it can handle a wider class of problems. Compared with the global-memory model, its restrictions allow more efficient data exchange without losing the possibility of semi-automatic parallelization.
2. The implementation of the technology makes it possible to write C++ algorithms that are efficient, clear and semi-automatically parallelizable.
3. The developed technology has been used to implement parallel algorithms for several complex problems: modelling of processes in porous media, image smoothing with nonlinear diffusion filters, and iterative variational algorithms for solving systems of linear equations.

Research methodology. The research uses numerical methods, analysis of the efficiency and complexity of parallel algorithms, comparative analysis of different parallelization tools, and experimental investigation. The tools used were the C++ programming language, object-oriented programming and template metaprogramming techniques, and libraries implementing the MPI message-passing standard.
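The methodology mentions template metaprogramming. One widely used C++ technique of this kind is expression templates. They let a vector update such as `z = 2*x + y` be written in mathematical notation and still be evaluated in a single loop, with no temporary vectors. This addresses the overhead of C++ abstractions discussed in the relevance section. The summary does not say whether the library uses exactly this technique. The sketch below is a minimal illustration under that assumption, and the names `Expr`, `Vec`, `Sum` and `Scaled` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: marks a type as a vector expression (hypothetical design).
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete vector that owns its data.
class Vec : public Expr<Vec> {
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Assigning an expression evaluates it element by element in one loop,
    // so no temporary vectors are created for the sub-expressions.
    template <class E>
    Vec& operator=(const Expr<E>& expr) {
        const E& e = expr.self();
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Lazy element-wise sum. Operands are held by reference, so an expression
// must be assigned in the same statement that builds it.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& a, const R& b) : l(a), r(b) {}
    std::size_t size() const { return l.size(); }
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Lazy product of a scalar and an expression.
template <class E>
struct Scaled : Expr<Scaled<E>> {
    double a;
    const E& e;
    Scaled(double s, const E& x) : a(s), e(x) {}
    std::size_t size() const { return e.size(); }
    double operator[](std::size_t i) const { return a * e[i]; }
};

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
    return Sum<L, R>(a.self(), b.self());
}

template <class E>
Scaled<E> operator*(double s, const Expr<E>& x) {
    return Scaled<E>(s, x.self());
}
```

For example, with `Vec x(4, 1.0), y(4, 2.0), z(4);`, the statement `z = 2.0 * x + y;` builds a `Sum<Scaled<Vec>, Vec>` object. That object is evaluated inside `Vec::operator=` in one pass, and every element of `z` becomes 4.0.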

Practical value. The proposed technology makes it possible to build tools for parallelizing numerical algorithms that are both more convenient and more efficient than tools based on the currently widespread data-parallel and global-memory parallelization models. The technology has been implemented as a library of numerical mathematics objects. The library lets the user implement algorithms in C++ conveniently and efficiently, and an algorithm implemented this way can be parallelized semi-automatically. Only widespread standards (C++, MPI) are used, so programs developed with the library are easy to port to other platforms. The library has been used to develop parallel implementations of algorithms for relevant problems. A tool for modelling fluid flow in porous media has been parallelized and its functionality extended. Parallel versions have also been developed for image filtering algorithms (applied to identify stroke regions in the human brain) and for iterative algorithms that solve systems of linear equations. These problems can now be solved at a larger scale and faster.

Statements to be defended
1. A new technology for the semi-automatic parallelization of tools for solving differential equations, combining elements of the data-parallel and global-memory parallel programming models.
2. An implementation of the technology as a library of mathematical objects, using C++ and MPI.
3. Parallel versions of algorithms for applied problems (multiphase fluid flow in porous media, image smoothing with nonlinear diffusion filters, variational iterative methods for solving linear systems), obtained by applying the proposed parallelization technology.

Scope of the work. The dissertation consists of a general description of the work, 3 chapters, conclusions, a list of references, a list of publications, and appendices on electronic media. It comprises 147 pages, 20 illustrations and 10 tables.
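Statement 3 and the practical-value paragraph refer to parallel versions obtained at the discretization level, for example for image smoothing with nonlinear diffusion filters. The following is a minimal sketch of what such a parallelization involves. It rests on explicit assumptions: a 1D explicit scheme, a Perona-Malik-type diffusivity g(s) = 1/(1 + (s/K)^2), zero-flux boundaries, and a block partition of the grid among P "processes" with one ghost value per side. Reading the ghost values from the global array stands in for an MPI message exchange (for instance `MPI_Sendrecv`). The function names are hypothetical, and this is neither the dissertation's filter nor its library API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Edge-stopping diffusivity of Perona-Malik type (assumed form):
// close to 1 in flat regions, small across strong edges.
double diffusivity(double s, double K) { return 1.0 / (1.0 + (s / K) * (s / K)); }

// One explicit step of u_t = (g(|u_x|) u_x)_x on the values u[0..n-1].
// `left` and `right` are ghost values just outside the block.
// Since g <= 1, tau <= h*h/2 keeps this explicit scheme stable.
std::vector<double> diffusion_step(const std::vector<double>& u, double left, double right,
                                   double tau, double h, double K) {
    const std::size_t n = u.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double um = (i == 0) ? left : u[i - 1];
        const double up = (i + 1 == n) ? right : u[i + 1];
        const double flux_r = diffusivity(std::fabs(up - u[i]) / h, K) * (up - u[i]) / h;
        const double flux_l = diffusivity(std::fabs(u[i] - um) / h, K) * (u[i] - um) / h;
        out[i] = u[i] + tau / h * (flux_r - flux_l);
    }
    return out;
}

// Sequential reference: zero-flux boundaries via ghost = own boundary value.
std::vector<double> sequential_step(const std::vector<double>& u, double tau, double h, double K) {
    return diffusion_step(u, u.front(), u.back(), tau, h, K);
}

// The same step on P contiguous blocks. Each block needs one neighbour value
// per side; in an MPI code these would arrive by message instead of being
// read from the global array as done here.
std::vector<double> partitioned_step(const std::vector<double>& u, std::size_t P,
                                     double tau, double h, double K) {
    const std::size_t n = u.size();
    assert(P >= 1 && P <= n);  // every block owns at least one point
    std::vector<double> result;
    result.reserve(n);
    for (std::size_t p = 0; p < P; ++p) {
        const std::size_t lo = p * n / P, hi = (p + 1) * n / P;  // owned range [lo, hi)
        const std::vector<double> local(u.begin() + static_cast<std::ptrdiff_t>(lo),
                                        u.begin() + static_cast<std::ptrdiff_t>(hi));
        // Simulated halo exchange; at the global ends the ghost mirrors the boundary.
        const double left = (lo == 0) ? u[lo] : u[lo - 1];
        const double right = (hi == n) ? u[hi - 1] : u[hi];
        const std::vector<double> out = diffusion_step(local, left, right, tau, h, K);
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}
```

Each block performs the same arithmetic on the same values as the sequential loop, so `partitioned_step` reproduces `sequential_step` for any admissible P. The per-step communication is one value per block boundary, while the computation grows with the block size. This is the usual reason such stencil computations scale well.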
Chapter 1 analyses how existing tools for solving parabolic and hyperbolic differential equations are parallelized, in order to determine the requirements for a parallelization technology for problems of this type. It then examines existing parallel programming models and the standards and libraries based on them, and investigates whether they meet these requirements. Chapter 2 describes the new parallelization technology and its implementation. It presents and analyses the main principles and details of the technology. It also examines the features and possible uses of the numerical mathematics object library that implements the technology. The chapter concludes with the results of library efficiency tests and a discussion of how the technology could be developed further. Chapter 3 describes applications of the technology to the parallelization of applied problems: the tool for modelling fluid flow in porous media, image filtering with nonlinear diffusion filters, and variational iterative algorithms for solving systems of linear equations.

General conclusions
A new technology for parallelizing algorithms that solve differential equations was developed and applied to the parallelization of real problems. The following scientific and practical conclusions have been drawn:
1. The parallelization methods of the most popular tools for solving differential equations are efficient. However, they are implemented at a low level and therefore cannot be reused automatically to parallelize other tools. The existing parallelization technologies and tools that could be used to parallelize differential equation solvers do not provide the desired properties.
2. The principles of the new parallelization technology proposed in this work overcome shortcomings of the widely used data-parallel and global-memory models, such as limited functionality or low parallelization efficiency. Because of these shortcomings, neither model directly meets all of the stated requirements: operation on computer clusters, semi-automatic parallelization, and parallelization at the level of linear algebra and of differential equation discretization.
3. The new technology is intended for implementing and parallelizing the discretization and linear algebra problems that arise when differential equations and their systems are solved numerically. It can be used to develop semi-automatically parallelizable tools for solving differential equations.
4. The transition from standard C/C++ tools to the new technology follows a clear, formalized scheme. The sequential version is obtained in 6 steps, and parallelization requires only 5 steps from the user. The familiar program structure is preserved. So far the technology has been tested on the MS-Windows, Linux and AIX operating systems, with the MSVC++ and g++ compilers and the MPICH, LAM/MPI and IBM MPI implementations.
5. The library allows linear algebra and differential equation discretization algorithms to be implemented efficiently without giving up the advantages of C++. This is supported by measurements. Using the library reduces the efficiency of certain operations by at most a factor of 2.5, and for some operations the efficiency loss is only 1-8 %. The library makes effective use of the optimization capabilities of modern compilers. Experiments showed a ~2.5-fold speedup from compiler optimization, whereas a program written with standard C/C++ tools became only ~1.5 times faster after optimization.
6. Analysis of the applications of the proposed technology showed that the stated goals can be achieved with it. The technology's capabilities were sufficient to implement and parallelize the chosen differential equation and linear algebra problems. Algorithms implemented with the technology were compiled and run on different platforms without modification. The parallelization efficiency and scalability matched the theoretical "best case" predictions.
7. The implementation of the proposed technology is relevant today and has wide potential applications in implementing algorithms for solving differential equations. This follows from the importance of the problems the library addresses, such as efficient use of C++ and algorithm parallelization.

Brief information about the author
Aleksandr Jakušev was born on 30 April 1977 in Klaipėda. In 1999 he received a Bachelor's degree in physics from the Faculty of Physics, Vilnius University. In 2003 he defended his Master's thesis "Development of a visual modelling environment and its application to the virtual investigation of optical processes" and received a Master's degree from the Faculty of Fundamental Sciences, VGTU. In 2003-2007 he was a doctoral student in Informatics Engineering at the Department of Mathematical Modelling, VGTU. In 2006 he completed an internship at the Technical University of Kaiserslautern, Germany. In 2003-2007 he worked as an assistant at the Department of Mathematical Modelling, Faculty of Fundamental Sciences, VGTU.

Aleksandr Jakušev. DEVELOPMENT, ANALYSIS AND APPLICATIONS OF THE TECHNOLOGY FOR PARALLELIZATION OF NUMERICAL ALGORITHMS FOR SOLUTION OF PDE AND SYSTEMS OF PDES. Summary of Doctoral Dissertation. Technological Sciences, Informatics Engineering (07T).
Aleksandr Jakušev. DIFERENCIALINIŲ LYGČIŲ IR JŲ SISTEMŲ SKAITINIO SPRENDIMO ALGORITMŲ LYGIAGRETINIMO TECHNOLOGIJOS KŪRIMAS, ANALIZĖ IR TAIKYMAI. Summary of Doctoral Dissertation (in Lithuanian). Technological Sciences, Informatics Engineering (07T).
14 December 2007. 1.5 printer's sheets. Print run: 100 copies.
Published by Vilnius Gediminas Technical University Press "Technika", Saulėtekio al. 11, 10223 Vilnius, http://leidykla.vgtu.lt
Printed by UAB "Baltijos kopija", Kareivių g. 13B, 09109 Vilnius, http://www.kopija.lt