A Parallel Algorithm based on Monte Carlo for Computing the Inverse and other Functions of a Large Sparse Matrix


A Parallel Algorithm based on Monte Carlo for Computing the Inverse and other Functions of a Large Sparse Matrix

Patrícia Isabel Duarte Santos

Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering

Supervisors: Prof. José Carlos Alves Pereira Monteiro
             Prof. Juan António Acebron de Torres

Examination Committee
Chairperson: Prof. Alberto Manuel Rodrigues da Silva
Supervisor: Prof. José Carlos Alves Pereira Monteiro
Member of the Committee: Prof. Luís Manuel Silveira Russo

November 2016


To my parents: Alda and Fernando; To my brother: Pedro.


Resumo

Nowadays, matrix inversion plays an important role in several areas of knowledge, for example, when we analyze specific characteristics of a complex network such as node centrality or communicability. In order to avoid the explicit computation of the inverse matrix, or other computationally heavy matrix operations, there are several efficient methods for solving systems of linear algebraic equations whose result is the inverse matrix or another matrix function. However, these methods, whether direct or iterative, have a high cost when the dimension of the matrix increases. In this context, we present an algorithm based on Monte Carlo methods as an alternative for obtaining the inverse and other functions of a large sparse matrix. The main advantage of this algorithm is that it allows computing only one row of the result matrix, avoiding the instantiation of the entire matrix. This solution was parallelized using OpenMP. Among the parallelized versions developed, a version that is scalable for the tested matrices was obtained using the omp declare reduction directive.

Keywords: Monte Carlo methods, OpenMP, parallel algorithm, matrix operations, complex networks


Abstract

Nowadays, matrix inversion plays an important role in several areas, for instance, when we analyze specific characteristics of a complex network such as node centrality and communicability. In order to avoid the explicit computation of the inverse matrix, or other matrix functions, which is costly, there are several efficient methods for solving linear systems of algebraic equations that yield the inverse matrix or other matrix functions. However, these methods, whether direct or iterative, have a high computational cost when the size of the matrix increases. In this context, we present an algorithm based on Monte Carlo methods as an alternative to obtain the inverse matrix and other functions of a large-scale sparse matrix. The main advantage of this algorithm is the possibility of obtaining the matrix function for only one row of the result matrix, avoiding the instantiation of the entire result matrix. Additionally, this solution is parallelized using OpenMP. Among the developed parallelized versions, a version that is scalable for the tested matrices was obtained using the directive omp declare reduction.

Keywords: Monte Carlo methods, OpenMP, parallel algorithm, matrix functions, complex networks


Contents

Resumo
Abstract
List of Figures
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Contributions
  1.4 Thesis Outline
2 Background and Related Work
  2.1 Application Areas
  2.2 Matrix Inversion with Classical Methods
    2.2.1 Direct Methods
    2.2.2 Iterative Methods
  2.3 The Monte Carlo Methods
    2.3.1 The Monte Carlo Methods and Parallel Computing
    2.3.2 Sequential Random Number Generators
    2.3.3 Parallel Random Number Generators
  2.4 The Monte Carlo Methods Applied to Matrix Inversion
  2.5 Language Support for Parallelization
    2.5.1 OpenMP
    2.5.2 MPI
    2.5.3 GPUs
  2.6 Evaluation Metrics
3 Algorithm Implementation
  3.1 General Approach
  3.2 Implementation of the Different Matrix Functions
  3.3 Matrix Format Representation
  3.4 Algorithm Parallelization using OpenMP
    3.4.1 Calculating the Matrix Function Over the Entire Matrix
    3.4.2 Calculating the Matrix Function for Only One Row of the Matrix
4 Results
  4.1 Instances
    4.1.1 Matlab Matrix Gallery Package
    4.1.2 CONTEST toolbox in Matlab
    4.1.3 The University of Florida Sparse Matrix Collection
  4.2 Inverse Matrix Function Metrics
  4.3 Complex Networks Metrics
    4.3.1 Node Centrality
    4.3.2 Node Communicability
  4.4 Computational Metrics
5 Conclusions
  5.1 Main Contributions
  5.2 Future Work
Bibliography


List of Figures

2.1 Centralized methods to generate random numbers - Master-Slave approach
2.2 Process 2 (out of a total of 7 processes) generating random numbers using the Leapfrog technique
2.3 Example of a matrix B = I − A and A, and the theoretical result B^{-1} = (I − A)^{-1} of the application of this Monte Carlo method
2.4 Matrix with value factors v_ij for the given example
2.5 Example of stop probabilities calculation (bold column)
2.6 First random play of the method
2.7 Situating all elements of the first row given its probabilities
2.8 Second random play of the method
2.9 Third random play of the method
3.1 Algorithm implementation - Example of a matrix B = I − A and A, and the theoretical result B^{-1} = (I − A)^{-1} of the application of this Monte Carlo method
3.2 Initial matrix A and respective normalization
3.3 Vector with value factors v_i for the given example
3.4 Code excerpt in C with the main loops of the proposed algorithm
3.5 Example of one play with one iteration
3.6 Example of the first iteration of one play with two iterations
3.7 Example of the second iteration of one play with two iterations
3.8 Code excerpt in C with the sum of all the gains for each position of the inverse matrix
3.9 Code excerpt in C with the necessary operations to obtain the inverse matrix of one single row
3.10 Code excerpt in C with the necessary operations to obtain the matrix exponential of one single row
Code excerpt in C with the parallel algorithm when calculating the matrix function over the entire matrix
Code excerpt in C with the function that generates a random number between 0 and 1
Code excerpt in C with the parallel algorithm when calculating the matrix function for only one row of the matrix, using omp atomic
Code excerpt in C with the parallel algorithm when calculating the matrix function for only one row of the matrix, using omp declare reduction
Code excerpt in C with omp declare reduction declaration and combiner
Code excerpt in Matlab with the transformation needed for the algorithm convergence
Minnesota sparse matrix format
inverse matrix function - Relative Error (%) for row 17 of matrix
inverse matrix function - Relative Error (%) for row 33 of matrix
inverse matrix function - Relative Error (%) for row 26 of matrix
inverse matrix function - Relative Error (%) for row 51 of matrix
inverse matrix function - Relative Error (%) for row 33 of matrix and row 51 of matrix
node centrality - Relative Error (%) for row 71 of pref matrix
node centrality - Relative Error (%) for row 71 of pref matrix
node centrality - Relative Error (%) for row 71 of and pref matrices
node centrality - Relative Error (%) for row 71 of smallw matrix
node centrality - Relative Error (%) for row 71 of smallw matrix
node centrality - Relative Error (%) for row 71 of and smallw matrices
node centrality - Relative Error (%) for row 71 of minnesota matrix
node communicability - Relative Error (%) for row 71 of pref matrix
node communicability - Relative Error (%) for row 71 of pref matrix
node communicability - Relative Error (%) for row 71 of and pref matrix
node communicability - Relative Error (%) for row 71 of smallw matrix
node communicability - Relative Error (%) for row 71 of smallw matrix
node communicability - Relative Error (%) for row 71 of and smallw matrix
node communicability - Relative Error (%) for row 71 of minnesota matrix
omp atomic version - Efficiency (%) for row 71 of pref matrix
omp atomic version - Efficiency (%) for row 71 of pref matrix
omp atomic version - Efficiency (%) for row 71 of smallw matrix
omp atomic version - Efficiency (%) for row 71 of smallw matrix
omp declare reduction version - Efficiency (%) for row 71 of pref matrix
omp declare reduction version - Efficiency (%) for row 71 of pref matrix
omp declare reduction version - Efficiency (%) for row 71 of smallw matrix
omp declare reduction version - Efficiency (%) for row 71 of smallw matrix
omp atomic and omp declare reduction versions - Speedup relative to the number of threads for row 71 of pref matrix


Chapter 1

Introduction

The present document describes an algorithm to obtain the inverse and other functions of a large-scale sparse matrix, in the context of a master's thesis. We start by presenting the motivation behind this algorithm, the objectives we intend to achieve, the main contributions of our work, and the outline of the remaining chapters of the document.

1.1 Motivation

Matrix inversion is an important matrix operation that is widely used in several areas such as financial calculation, electrical simulation, cryptography, and complex networks.

One area of application of this work is complex networks. These can be represented by a graph (e.g., the Internet, social networks, transport networks, neural networks, etc.), and a graph is usually represented by a matrix. In complex networks there are many features that can be studied, such as the importance of a node in a given network (node centrality) and the communicability between a pair of nodes, which measures how well two nodes can exchange information with each other. These metrics are important when we want to study the topology of a complex network.

There are several algorithms over matrices that allow us to extract important features of these systems. However, some properties require the use of the inverse matrix, or other matrix functions, which is impractical to calculate for large matrices. Existing methods, whether direct or iterative, are costly in terms of computational effort and memory for such problems. Therefore, Monte Carlo methods represent a viable alternative approach to this problem, since they can be easily parallelized in order to obtain good performance.

1.2 Objectives

The main goal of this work, considering what was stated in the previous section, is to develop a parallel algorithm based on Monte Carlo for computing the inverse and other matrix functions of large sparse matrices in an efficient way, i.e., with good performance. With this in mind, our objectives are:

To implement an algorithm proposed by J. von Neumann and S. M. Ulam [1] that makes it possible to obtain the inverse matrix and other matrix functions based on Monte Carlo methods;
To develop and implement a modified algorithm, based on the item above, that has its foundation in the Monte Carlo methods;
To demonstrate that this new approach improves the performance of matrix inversion when compared to existing algorithms;
To implement a parallel version of the new algorithm using OpenMP.

1.3 Contributions

The main contributions of our work include:

The implementation of a modified algorithm based on the Monte Carlo methods to obtain the inverse matrix and other matrix functions;
The parallelization of the modified algorithm, using OpenMP, when we want to obtain the matrix function over the entire matrix;
Two parallelized versions of the algorithm when we want to obtain the matrix function for only one row of the matrix: one using omp atomic, and another using omp declare reduction;
A parallelized version of the algorithm, using omp declare reduction, that is scalable for the tested matrices.

All the implementations stated above were successfully executed, with special attention to the version that calculates the matrix function for a single row of the matrix using omp declare reduction, which is scalable and capable of reducing the computational effort compared with other existing methods, at least for the synthetic matrices tested. This is due to the fact that, instead of requiring the calculation of the matrix function over the entire matrix, it calculates the matrix function for only one row of the matrix. It has a direct application, for example, when a study of the topology of a complex network is required, being able to effectively retrieve the node centrality (the importance of a node in a given network) and the communicability between a pair of nodes.
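As a preview of the two OpenMP strategies named in the contributions above, the following minimal sketch (our own illustration, not the thesis code) contrasts accumulating a shared array of gains with omp atomic against accumulating it with a user-defined omp declare reduction; the type and function names (gains_t, gains_add, gains_zero) are hypothetical.

    #include <stdio.h>

    #define N 8   /* illustrative number of accumulated positions */

    typedef struct { double g[N]; } gains_t;

    /* Combiner and initializer used by the user-defined reduction. */
    static void gains_add(gains_t *out, const gains_t *in) {
        for (int i = 0; i < N; i++) out->g[i] += in->g[i];
    }
    static void gains_zero(gains_t *v) {
        for (int i = 0; i < N; i++) v->g[i] = 0.0;
    }

    #pragma omp declare reduction(gain_add : gains_t : \
            gains_add(&omp_out, &omp_in)) initializer(gains_zero(&omp_priv))

    int main(void) {
        /* Strategy 1: each thread keeps a private copy, merged at the end. */
        gains_t total;
        gains_zero(&total);
        #pragma omp parallel for reduction(gain_add : total)
        for (int play = 0; play < 1000; play++)
            total.g[play % N] += 1.0;   /* stand-in for the gain of one play */

        /* Strategy 2: a shared array updated with omp atomic on every play. */
        double shared[N] = { 0.0 };
        #pragma omp parallel for
        for (int play = 0; play < 1000; play++) {
            #pragma omp atomic
            shared[play % N] += 1.0;
        }

        for (int j = 0; j < N; j++)
            printf("%g %g\n", total.g[j], shared[j]);
        return 0;
    }

Both variants produce the same totals; the reduction avoids contention on the shared array, which is one plausible reason for the scalability difference reported later in the thesis.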

1.4 Thesis Outline

The rest of this document is structured as follows. In Chapter 2, we present existing application areas, some background knowledge regarding classical matrix inversion methods, the Monte Carlo methods and some parallelization techniques, as well as previous work on algorithms that aim to increase the performance of matrix inversion using the Monte Carlo methods and parallel programming. In Chapter 3, we describe our solution: an algorithm to perform matrix inversion and other matrix functions, as well as the underlying methods and techniques used in its implementation. In Chapter 4, we present the results and specify the procedures and measures that were used to evaluate the performance of our work. Finally, in Chapter 5, we summarize the highlights of our work and present some possibilities for future work.


Chapter 2

Background and Related Work

In this chapter we cover many aspects related to the computation of matrix inversion. Such aspects are important to situate our work, to understand the state of the art, and to see what we can learn and improve upon in order to accomplish our goals.

2.1 Application Areas

Nowadays, there are many areas where efficient matrix functions, such as matrix inversion, are required. For example, in image reconstruction applied to computed tomography [2] and astrophysics [3], and in bioinformatics to solve the problem of protein structure prediction [4]. This work mainly focuses on complex networks, but it can easily be applied to other application areas.

A complex network [5] is a graph (network) of very large dimension with non-trivial topological features that models a real system. These real systems can be, for example: the Internet and the World Wide Web; biological systems; chemical systems; neural networks.

A graph G = (V, E) is composed of a set of nodes (vertices) V and edges (links) E represented by unordered pairs of vertices. Every network is naturally associated with a graph G = (V, E), where V is the set of nodes in the network and E is the collection of connections between nodes, that is, E = {(i, j) : there is an edge between node i and node j in G}.
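As a small, hedged illustration of how such a network is turned into the matrix the rest of this chapter works with, the following C sketch (our own example, not from the thesis; the names build_adjacency, edges and n_nodes are illustrative) builds a dense adjacency matrix from an undirected edge list.

    #include <stdio.h>
    #include <stdlib.h>

    /* Build a dense n x n adjacency matrix A for an undirected graph given
     * as an edge list. Node indices are 0-based; A[i][j] = 1 if (i, j) in E. */
    double *build_adjacency(int n_nodes, const int (*edges)[2], int n_edges) {
        double *A = calloc((size_t)n_nodes * n_nodes, sizeof(double));
        if (A == NULL) return NULL;
        for (int e = 0; e < n_edges; e++) {
            int i = edges[e][0], j = edges[e][1];
            A[i * n_nodes + j] = 1.0;   /* edge i -> j */
            A[j * n_nodes + i] = 1.0;   /* undirected: also j -> i */
        }
        return A;
    }

    int main(void) {
        int edges[][2] = { {0, 1}, {1, 2}, {2, 0}, {2, 3} };
        double *A = build_adjacency(4, edges, 4);
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) printf("%.0f ", A[i * 4 + j]);
            printf("\n");
        }
        free(A);
        return 0;
    }

For a large sparse network, a dense array like this would of course be replaced by a sparse format, as discussed later in Section 3.3 of the thesis.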

One of the hardest and most important tasks in the study of the topology of such complex networks is to determine the importance of a node in a given network, a concept that may change from application to application. This measure is normally referred to as node centrality [5]. Regarding node centrality and the use of matrix functions, Klymko et al. [5] show that the matrix resolvent plays an important role. The resolvent of an n × n matrix A is defined as

R(α) = (I − αA)^{-1}    (2.1)

where I is the identity matrix and α ∈ C, excluding the values that satisfy det(I − αA) = 0, with 0 < α < 1/λ_1, where λ_1 is the maximum eigenvalue of A. The entries of the matrix resolvent count the number of walks in the network, penalizing longer walks. This can be seen by considering the power series expansion of (I − αA)^{-1}:

(I − αA)^{-1} = I + αA + α^2 A^2 + ... + α^k A^k + ... = Σ_{k=0}^{∞} α^k A^k    (2.2)

Here, [(I − αA)^{-1}]_{ij} counts the total number of walks from node i to node j, weighting walks of length k by α^k. The bounds on α (0 < α < 1/λ_1) ensure that the matrix I − αA is invertible and that the power series in (2.2) converges to its inverse.

Another property that is important when we are studying a complex network is the communicability between a pair of nodes i and j. This measures how well two nodes can exchange information with each other. According to Klymko et al. [5], it can be obtained using the matrix exponential function [6] of a matrix A, defined by the following infinite series:

e^A = I + A + A^2/2! + A^3/3! + ... = Σ_{k=0}^{∞} A^k/k!    (2.3)

with I being the identity matrix and with the convention that A^0 = I. In other words, the entries of the matrix, [e^A]_{ij}, count the total number of walks from node i to node j, penalizing longer walks by scaling walks of length k by the factor 1/k!.

As a result, the development and implementation of efficient matrix functions is an area of great interest, since complex networks are becoming more and more relevant.

2.2 Matrix Inversion with Classical Methods

The inverse of a square matrix A is the matrix A^{-1} that satisfies the following condition:

A A^{-1} = I    (2.4)

where I is the identity matrix. Matrix A only has an inverse if the determinant of A is not equal to zero, det(A) ≠ 0. If a matrix has an inverse, it is also called non-singular or invertible.

To calculate the inverse of an n × n matrix A, the following expression is used

A^{-1} = (1/det(A)) C    (2.5)

where C is the transpose of the matrix formed by all of the cofactors of matrix A. For example, to calculate the inverse of a 2 × 2 matrix

A = [a b; c d]

the following expression is used

A^{-1} = (1/det(A)) [d −b; −c a] = (1/(ad − bc)) [d −b; −c a]    (2.6)

and to calculate the inverse of a 3 × 3 matrix

A = [a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33]

we use the following expression, in which each entry |p q; r s| denotes the 2 × 2 determinant ps − qr:

A^{-1} = (1/det(A)) ·
  [ |a_22 a_23; a_32 a_33|   |a_13 a_12; a_33 a_32|   |a_12 a_13; a_22 a_23| ]
  [ |a_23 a_21; a_33 a_31|   |a_11 a_13; a_31 a_33|   |a_13 a_11; a_23 a_21| ]
  [ |a_21 a_22; a_31 a_32|   |a_12 a_11; a_32 a_31|   |a_11 a_12; a_21 a_22| ]    (2.7)

The computational effort needed increases with the size of the matrix, as we can see in the previous examples with 2 × 2 and 3 × 3 matrices. So, instead of computing the explicit inverse matrix, which is costly, we can obtain the inverse of an n × n matrix by solving a linear system of algebraic equations of the form

Ax = b  =>  x = A^{-1} b    (2.8)

where A is an n × n matrix, b is a given n-vector, and x is the unknown n-vector solution to be determined. The methods to solve such linear systems can be either direct or iterative [6, 7], and they are presented in the next subsections.
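For concreteness, the short C sketch below (our own illustration, not part of the thesis) applies the 2 × 2 formula of Equation (2.6); the function name inverse_2x2 and the sample matrix are assumptions made for the example.

    #include <stdio.h>

    /* Invert a 2x2 matrix using Equation (2.6).
     * Returns 0 on success, -1 if the matrix is singular (det(A) = 0). */
    int inverse_2x2(const double a[2][2], double inv[2][2]) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        if (det == 0.0) return -1;
        inv[0][0] =  a[1][1] / det;
        inv[0][1] = -a[0][1] / det;
        inv[1][0] = -a[1][0] / det;
        inv[1][1] =  a[0][0] / det;
        return 0;
    }

    int main(void) {
        double a[2][2] = { {4.0, 7.0}, {2.0, 6.0} }, inv[2][2];
        if (inverse_2x2(a, inv) == 0)
            printf("A^-1 = [%g %g; %g %g]\n",
                   inv[0][0], inv[0][1], inv[1][0], inv[1][1]);
        return 0;
    }

The same cofactor approach quickly becomes impractical as n grows, which is exactly why the linear-system formulation (2.8) and, later, the Monte Carlo approach are preferred.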

2.2.1 Direct Methods

Direct methods for solving linear systems provide an exact solution (assuming exact arithmetic) in a finite number of steps. However, many operations need to be executed, which takes a significant amount of computational power and memory. For dense matrices, even sophisticated algorithms have a complexity close to

T_direct = O(n^3).    (2.9)

Regarding direct methods, there are many ways of solving linear systems, such as Gauss-Jordan elimination and Gaussian elimination, also known as LU factorization or LU decomposition (see Algorithm 1) [6, 7].

Algorithm 1 LU Factorization
1: Initialize: U = A, L = I
2: for k = 1 : n−1 do
3:   for i = k+1 : n do
4:     L(i, k) = U(i, k) / U(k, k)
5:     for j = k+1 : n do
6:       U(i, j) = U(i, j) − L(i, k) U(k, j)
7:     end for
8:   end for
9: end for

2.2.2 Iterative Methods

Iterative methods for solving linear systems consist of successive approximations that converge to the desired solution x_k. An iterative method is considered good depending on how quickly x_k converges. Theoretically, an infinite number of iterations is needed to obtain the exact solution; in practice, the iteration stops when some norm of the residual error, ||b − Ax||, is as small as desired. Considering Equation (2.8), for dense matrices these methods have a complexity of

T_iter = O(n^2 k)    (2.10)

where k is the number of iterations.

The Jacobi method (see Algorithm 2) and the Gauss-Seidel method [6, 7] are well-known iterative methods, but they do not always converge, because the matrix needs to satisfy certain conditions for that to happen (e.g., being diagonally dominant by rows for the Jacobi method, or symmetric and positive definite for the Gauss-Seidel method). The Jacobi method has an unacceptably slow convergence rate, and the Gauss-Seidel method,

despite being capable of converging more quickly than the Jacobi method, is often still too slow to be practical.

Algorithm 2 Jacobi method
Input: A = (a_ij), b, x^(0), TOL (tolerance), N (maximum number of iterations)
1: Set k = 1
2: while k ≤ N do
3:   for i = 1, 2, ..., n do
4:     x_i = (1 / a_ii) [ −Σ_{j=1, j≠i}^{n} a_ij x_j^(0) + b_i ]
5:   end for
6:   if ||x − x^(0)|| < TOL then
7:     OUTPUT(x_1, x_2, ..., x_n); STOP
8:   end if
9:   Set k = k + 1
10:  for i = 1, 2, ..., n do
11:    x_i^(0) = x_i
12:  end for
13: end while
14: OUTPUT(x_1, x_2, ..., x_n); STOP

2.3 The Monte Carlo Methods

The Monte Carlo methods [8] are a wide class of computational algorithms that use statistical sampling and estimation techniques, applied to synthetically constructed random populations with appropriate parameters, in order to evaluate the solutions to mathematical problems (whether or not they have a probabilistic background). These methods have many advantages, especially when the problems are very large and computationally hard to deal with, i.e., hard to solve analytically.

There are many applications of the Monte Carlo methods in a variety of problems in optimization, operations research, and systems analysis, such as: integrals of arbitrary functions; predicting future values of stocks; solving partial differential equations; sharpening satellite images;

modeling cell populations; and finding approximate solutions to NP-hard problems.

The underlying mathematical concept is related to the mean value theorem, which states that

I = ∫_a^b f(x) dx = (b − a) f̄    (2.11)

where f̄ represents the mean (average) value of f(x) in the interval [a, b]. Because of this, the Monte Carlo methods estimate the value of I by evaluating f(x) at n points selected from a uniform random distribution over [a, b]. The Monte Carlo methods obtain an estimate for f̄ that is given by

f̄ ≈ (1/n) Σ_{i=0}^{n−1} f(x_i)    (2.12)

The error of the Monte Carlo estimate decreases by a factor of 1/√n, i.e., the accuracy increases at the same rate.

2.3.1 The Monte Carlo Methods and Parallel Computing

Another advantage of choosing the Monte Carlo methods is that they are usually easy to migrate onto parallel systems. In this case, with p processors, we can obtain an estimate p times faster and decrease the error by a factor of √p compared to the sequential approach. However, this enhancement depends on the random numbers being statistically independent, so that each sample can be processed independently. Thus, it is essential to develop or use good parallel random number generators and to know which characteristics they should have.

2.3.2 Sequential Random Number Generators

The Monte Carlo methods rely on efficient random number generators. The random number generators available today are, in fact, pseudo-random number generators, because their operation is deterministic and the produced sequences are predictable. Consequently, when we refer to random number generators, we are referring, in fact, to pseudo-random number generators.

Ideally, random number generators are characterized by the following properties:

1. uniformly distributed, i.e., each possible number is equally probable;
2. the numbers are uncorrelated;

3. it never cycles, i.e., the numbers do not repeat themselves;
4. it satisfies any statistical test for randomness;
5. it is reproducible;
6. it is machine-independent, i.e., the generator produces the same sequence of numbers on any computer;
7. if the seed value is changed, the sequence changes too;
8. it is easily split into independent sub-sequences;
9. it is fast;
10. it has limited memory requirements.

Observing the properties stated above, we can conclude that no random number generator adheres to all these requirements. For example, since a random number generator may take only a finite number of states, there will be a time when the numbers it produces begin to repeat themselves.

There are two important classes of random number generators [8]:

Linear congruential: produces a sequence X of random integers using the formula

X_i = (a X_{i−1} + c) mod M    (2.13)

where a is the multiplier, c is the additive constant, and M is the modulus. The sequence X depends on the seed X_0, and its period is at most M. This method may also be used to generate floating-point numbers x_i in [0, 1] by dividing X_i by M.

Lagged Fibonacci: produces a sequence X in which each element is defined as

X_i = X_{i−p} ⊙ X_{i−q}    (2.14)

where p and q are the lags, p > q, and ⊙ is any binary arithmetic operation, such as exclusive-or or addition modulo M. The sequence X can consist of either integer or floating-point numbers. When using this method it is important to choose the seed values, M, p and q well, in order to obtain sequences with very long periods and good randomness.

2.3.3 Parallel Random Number Generators

Regarding parallel random number generators, they should ideally have the following properties:

1. no correlations among the numbers in different sequences;
2. scalability;
3. locality, i.e., a process should be able to spawn a new sequence of random numbers without interprocess communication.

The techniques used to transform a sequential random number generator into a parallel one are the following [8]:

Centralized methods. In the master-slave approach, as Fig. 2.1 shows, there is a master process whose task is to generate random numbers and distribute them among the slave processes that consume them. This approach is not scalable and is communication-intensive, so other methods are considered next.

Figure 2.1: Centralized methods to generate random numbers - Master-Slave approach.

Decentralized methods. The leapfrog method is comparable in certain respects to a cyclic allocation of data to tasks. Assuming that this method is running on p processes, the random samples interleave every p-th element of the sequence, beginning with X_i, as shown in Fig. 2.2.

Figure 2.2: Process 2 (out of a total of 7 processes) generating random numbers using the Leapfrog technique.

This method has disadvantages: despite having low correlation overall, the elements of the leapfrog subsequence may be correlated for certain values of p; moreover, this method does not support the dynamic creation of new random number streams.
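To make the leapfrog idea concrete, the following hedged C sketch (our own example, not the thesis code) combines the linear congruential formula of Equation (2.13) with leapfrog partitioning: process `rank` out of `nprocs` produces X_rank, X_{rank+nprocs}, X_{rank+2·nprocs}, ... The constants and names (LCG_A, leapfrog_init, etc.) are assumptions made for illustration.

    #include <stdio.h>
    #include <stdint.h>

    #define LCG_A 1103515245ULL
    #define LCG_C 12345ULL
    #define LCG_M (1ULL << 31)

    typedef struct {
        uint64_t state;   /* current X_i                        */
        uint64_t a_p;     /* a^p mod M (p-step multiplier)      */
        uint64_t c_p;     /* c (a^{p-1} + ... + a + 1) mod M    */
    } leapfrog_lcg;

    /* Advance the seed `rank` single steps, then precompute the coefficients
     * that jump p = nprocs steps of the LCG at a time. */
    static void leapfrog_init(leapfrog_lcg *g, uint64_t seed, int rank, int nprocs) {
        uint64_t a_p = 1, c_p = 0, x = seed % LCG_M;
        for (int i = 0; i < rank; i++)
            x = (LCG_A * x + LCG_C) % LCG_M;        /* move to X_rank */
        for (int i = 0; i < nprocs; i++) {          /* compose p single steps */
            c_p = (LCG_A * c_p + LCG_C) % LCG_M;
            a_p = (a_p * LCG_A) % LCG_M;
        }
        g->state = x; g->a_p = a_p; g->c_p = c_p;
    }

    /* Return the next number of this process's subsequence, scaled to [0, 1). */
    static double leapfrog_next(leapfrog_lcg *g) {
        double u = (double)g->state / (double)LCG_M;
        g->state = (g->a_p * g->state + g->c_p) % LCG_M;
        return u;
    }

    int main(void) {
        leapfrog_lcg g;
        leapfrog_init(&g, 42, 2, 7);   /* process 2 out of 7, as in Fig. 2.2 */
        for (int i = 0; i < 5; i++) printf("%f\n", leapfrog_next(&g));
        return 0;
    }

Each process generates its own disjoint subsequence with no communication, which is exactly the locality property listed above.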

Sequence splitting is similar to a block allocation of data to tasks. Considering that the random number generator has period P, the first P numbers generated are divided into equal, non-overlapping parts, one per process.

Independent sequences consist in having each process run a separate sequential random number generator. This tends to work well as long as each task uses a different seed.

Random number generators, especially for parallel computers, should not be trusted blindly. Therefore, the best approach is to run simulations with two or more different generators and compare the results, in order to check whether the random number generator is introducing a bias, i.e., a tendency.

2.4 The Monte Carlo Methods Applied to Matrix Inversion

The result of the application of these statistical sampling methods depends on how an infinite sum of finite sums is handled. An example of such methods is the random walk, a Markov chain Monte Carlo algorithm, which consists of a series of random samples that represents a random walk through the possible configurations. This fact leads to a variety of Monte Carlo estimators.

The algorithm implemented in this thesis is based on a classic paper that describes a Monte Carlo method for inverting a class of matrices, devised by J. von Neumann and S. M. Ulam [1]. This method can be used to invert a class of n-th order matrices, and it is capable of obtaining a single element of the inverse matrix without determining the rest of the matrix. To better understand how this method works, we present a concrete example and all the necessary steps involved.

Figure 2.3: Example of a matrix B = I − A and A, and the theoretical result B^{-1} = (I − A)^{-1} of the application of this Monte Carlo method.

Firstly, there are some restrictions that, if satisfied, guarantee that the method produces a correct solution. Let us consider as an example the n × n matrices A and B in Fig. 2.3. The restrictions are:

Let B be a matrix of order n whose inverse is desired, and let A = I − B, where I is the identity matrix. For any matrix M, let λ_r(M) denote the r-th eigenvalue of M, and let m_ij denote the element of

M in the i-th row and j-th column. The method requires that

max_r |1 − λ_r(B)| = max_r |λ_r(A)| < 1.    (2.15)

When (2.15) holds, it is known that

(B^{-1})_{ij} = ([I − A]^{-1})_{ij} = Σ_{k=0}^{∞} (A^k)_{ij}.    (2.16)

All elements of matrix A (1 ≤ i, j ≤ n) have to be non-negative, a_ij ≥ 0. Let us define p_ij ≥ 0 and v_ij, the corresponding value factors, which satisfy the following:

p_ij v_ij = a_ij;    (2.17)

Σ_{j=1}^{n} p_ij < 1.    (2.18)

In the example considered, we can see in Fig. 2.4 and Fig. 2.5 that all of this is verified, except that the sum of the second row of matrix A is not inferior to 1, i.e., a_21 + a_22 + a_23 ≥ 1 (see Fig. 2.3). In order to guarantee that the sum of the second row is inferior to 1, we divide all the elements of the second row by the total sum of that row plus some normalization constant (let us assume 0.8), so the value factor will be 2, and therefore the second row of V will be filled with 2 (Fig. 2.4).

Figure 2.4: Matrix with value factors v_ij for the given example.

Figure 2.5: Example of stop probabilities calculation (bold column).

In order to define a probability matrix given by p_ij, an extra column is added to the initial matrix A. This column corresponds to the stop probabilities, which are defined by the relation (see Fig. 2.5):

p_i = 1 − Σ_{j=1}^{n} p_ij    (2.19)

Secondly, once all the restrictions are met, the method proceeds in the same way to calculate each element of the inverse matrix. So, we are only going to explain how it works for one element of the inverse matrix, namely the element (B^{-1})_11. As stated in [1], the Monte Carlo method to compute (B^{-1})_{ij} is to play a solitaire game whose expected payment is (B^{-1})_{ij}; according to a result by Kolmogoroff [9] on the strong law of large numbers, if one plays such a game repeatedly, the average

payment for N successive plays will converge to (B^{-1})_{ij} as N → ∞, for almost all sequences of plays. Taking all this into account, to calculate one element of the inverse matrix we will need N plays, with N sufficiently large for an accurate solution. Each play has its own gain, i.e., its contribution to the final result, and the gain of one play is given by

GainOfPlay = v_{i_0 i_1} v_{i_1 i_2} ... v_{i_{k−1} j}    (2.20)

considering a route i = i_0 → i_1 → i_2 → ... → i_{k−1} → j. Finally, assuming N plays, the total gain from all the plays is given by the expression

TotalGain = ( Σ_{k=1}^{N} (GainOfPlay)_k ) / (N p_j)    (2.21)

which coincides with the expectation value in the limit N → ∞, and is therefore (B^{-1})_{ij}.

To calculate (B^{-1})_11, one play of the game is explained with an example in the following steps, knowing that the initial gain is equal to 1:

1. Since the position we want to calculate is in the first row, the algorithm starts in the first row of matrix A (see Fig. 2.6). Then it is necessary to generate a random number uniformly between 0 and 1. Once we have the random number, let us consider 0.28, we need to know to which drawn position of matrix A it corresponds. To see which position we have drawn, we start with the value of the first position of the current row, a_11, and compare it with the random number. The search only stops when the random number is inferior to the accumulated value. In this case 0.28 > 0.2, so we have to continue accumulating the values of the visited positions in the current row. Now, we are at position a_12 and we see that 0.28 < a_11 + a_12 = 0.4, so the position a_12 has been drawn (see Fig. 2.7), and we have to jump to the second row and execute the same operation. Finally, the gain of this random play is the initial gain multiplied by the value of the matrix with value factors corresponding to the position of a_12, which in this case is 1, as we can see in Fig. 2.4.

Figure 2.6: First random play of the method.

Figure 2.7: Situating all elements of the first row given its probabilities.

2. In the second random play, we are in the second row and a new random number is generated, let us assume 0.1, which corresponds to the drawn position a_21 (see Fig. 2.8). By the same reasoning, we have to jump to the first row. The gain at this point is equal to multiplying the existing

value of the gain by the value of the matrix with value factors corresponding to the position of a_21, which in this case is 2, as we can see in Fig. 2.4.

Figure 2.8: Second random play of the method.

3. In the third random play, we are in the first row and we generate a new random number, let us assume 0.6, which corresponds to the stop probability (see Fig. 2.9). The drawing of the stop probability has two particular properties concerning the gain of the play:

If the stop probability is drawn in the first random play, the gain is 1;
In the remaining random plays, the stop probability gain is 0 (if i ≠ j) or p_j^{-1} (if i = j), i.e., the inverse of the stop probability value from the row containing the position we want to calculate.

Thus, in this example, we see that the stop probability is not drawn in the first random play, but it is situated in the same row as the position whose inverse matrix value we want to calculate, so the gain of this play is GainOfPlay = v_12 v_21 = 1 × 2 = 2. To obtain an accurate result, N plays are needed, with N sufficiently large, and the TotalGain is given by Equation (2.21).

Figure 2.9: Third random play of the method.

Although the method explained in the previous paragraphs is expected to converge rapidly, it can be inefficient due to having many plays where the gain is 0. Our solution takes this into consideration in order to reduce waste.

There are other Monte Carlo algorithms that aim to enhance the performance of solving linear algebra problems [10, 11, 12]. These algorithms are similar to the one explained above, and it has been shown that, when some parallelization techniques are applied, the obtained results have great potential. One of these methods [11] is used as a pre-conditioner, as a consequence of the costly approach of direct and iterative methods, and it has been shown that the Monte Carlo methods

present better results than the former classic methods. Consequently, our solution will exploit these parallelization techniques, explained in the next subsections, to improve our method.

2.5 Language Support for Parallelization

Parallel computing [8] is the use of a parallel computer (i.e., a multiple-processor computer system supporting parallel programming) to reduce the time needed to solve a single computational problem. It is a standard way to solve problems like the one presented in this work. In order to use these parallelization techniques, we need a programming language that allows us to explicitly indicate how different portions of the computation may be executed concurrently by different processors. In the following subsections we present several kinds of parallel programming languages.

2.5.1 OpenMP

OpenMP [13] is an extension of programming languages tailored for a shared-memory environment. It is an Application Program Interface (API) that consists of a set of compiler directives and a library of support functions. OpenMP was developed for Fortran, C, and C++.

OpenMP is simple, portable and appropriate for programming on multiprocessors. However, it has the limitation of not being suitable for generic multicomputers, since it can only be used on shared-memory systems. On the other hand, OpenMP allows programs to be incrementally parallelized, i.e., the parallelization of an existing program is introduced as a sequence of incremental changes, parallelizing one loop at a time. Following each transformation, the program is tested to ensure that its behavior does not change compared to the original program. Programs are usually not much longer than the modified sequential code.

2.5.2 MPI

Message Passing Interface (MPI) [14] is a standard specification for message-passing libraries (i.e., a form of communication used in parallel programming, in which communications are completed by the sending of messages - functions, signals and data packets - to recipients). MPI is virtually supported on every commercial parallel computer, and free libraries meeting the MPI standard are available for home-made commodity clusters.

MPI allows the portability of programs to different parallel computers, although the performance of a particular program may vary widely from one machine to another. It is suitable for programming on multicomputers. However, it requires extensive rewriting of the sequential programs.

2.5.3 GPUs

The Graphics Processing Unit (GPU) [15] is a dedicated processor for graphics rendering. It is specialized for compute-intensive, parallel computation, and is therefore designed in such a way that more transistors are devoted to data processing rather than to data caching and flow control. In order to use the power of a GPU, a parallel computing platform and programming model that leverages the parallel compute engine in NVIDIA GPUs can be used: CUDA (Compute Unified Device Architecture). This platform is designed to work with programming languages such as C, C++ and Fortran.

2.6 Evaluation Metrics

To determine the performance of a parallel algorithm, evaluation is important, since it helps us to understand the barriers to higher performance and to estimate how much improvement our program will obtain when the number of processors increases.

When we aim to analyse our parallel program, we can use the following metrics [8]:

Speedup is used when we want to know how much faster the execution time of a parallel program is when compared with the execution time of a sequential program. The general formula is the following:

Speedup = (Sequential execution time) / (Parallel execution time)    (2.22)

However, parallel program operations can be put into three categories: computations that must be performed sequentially; computations that can be performed in parallel; and parallel overhead (communication operations and redundant computations). With these categories in mind, the speedup is denoted as ψ(n, p), where n is the problem size and p is the number of tasks. Taking into account the three aspects of parallel programs, we have: σ(n) as the inherently sequential portion of the computation; ϕ(n) as the portion of the computation that can be executed in parallel; and κ(n, p) as the time required for parallel overhead.

The previous formula for speedup has the optimistic assumption that the parallel portion of the computation can be divided perfectly among the processors. If this is not the case, the parallel execution time will be larger, and the speedup will be smaller. Hence the actual speedup will be less

than or equal to the ratio between sequential execution time and parallel execution time, as we have defined previously. Then, the complete expression for speedup is given by:

ψ(n, p) ≤ (σ(n) + ϕ(n)) / (σ(n) + ϕ(n)/p + κ(n, p))    (2.23)

The efficiency is a measure of processor utilization, represented by the following general formula:

Efficiency = (Sequential execution time) / (Processors used × Parallel execution time) = Speedup / Processors used    (2.24)

Using the same notation as for the speedup, the efficiency is denoted as ε(n, p) and has the following definition:

ε(n, p) ≤ (σ(n) + ϕ(n)) / (p σ(n) + ϕ(n) + p κ(n, p))    (2.25)

where 0 ≤ ε(n, p) ≤ 1.

Amdahl's Law can help us understand the global impact of local optimization, and it is given by:

ψ(n, p) ≤ 1 / (f + (1 − f)/p)    (2.26)

where f is the fraction of sequential computation in the original sequential program.

Gustafson-Barsis's Law is a way to evaluate the performance of a parallel program as it scales in size, and it is given by:

ψ(n, p) ≤ p + (1 − p)s    (2.27)

where s is the fraction of sequential computation in the parallel program.

The Karp-Flatt metric, e, can help decide whether the principal barrier to speedup is the amount of inherently sequential code or the parallel overhead, and it is given by the following formula:

e = (1/ψ(n, p) − 1/p) / (1 − 1/p)    (2.28)

The isoefficiency metric is a way to evaluate the scalability of a parallel algorithm executing on a parallel computer, and it can help us choose the design that will achieve higher performance when the number of processors increases. The metric says that if we wish to maintain a constant level of efficiency as p increases, the fraction

ε(n, p) / (1 − ε(n, p))    (2.29)

is a constant C, and the simplified formula is

T(n, 1) ≥ C T_0(n, p)    (2.30)

where T_0(n, p) is the total amount of time spent by all processes doing work not done by the sequential algorithm, and T(n, 1) represents the sequential execution time.
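To make these formulas concrete, the short C sketch below (our own illustration, not from the thesis) evaluates the measured speedup, efficiency and Karp-Flatt metric, Equations (2.22), (2.24) and (2.28), from a sequential time and a set of parallel timings; the timing values are invented for the example.

    #include <stdio.h>

    /* Evaluate speedup (2.22), efficiency (2.24) and the Karp-Flatt
     * metric (2.28) from measured execution times, for p >= 2. */
    static void report(double t_seq, double t_par, int p) {
        double speedup    = t_seq / t_par;
        double efficiency = speedup / p;
        double karp_flatt = (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p);
        printf("p=%d  speedup=%.2f  efficiency=%.2f  e=%.3f\n",
               p, speedup, efficiency, karp_flatt);
    }

    int main(void) {
        double t_seq   = 120.0;                 /* hypothetical timings (s) */
        int    procs[] = { 2, 4, 8 };
        double t_par[] = { 63.0, 34.0, 20.0 };
        for (int i = 0; i < 3; i++)
            report(t_seq, t_par[i], procs[i]);
        return 0;
    }

A rising Karp-Flatt value e as p grows would indicate that parallel overhead, rather than inherently sequential code, is the main barrier to further speedup.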

Chapter 3

Algorithm Implementation

In this chapter we present the implementation of our proposed algorithm to obtain the matrix function, all the tools needed, the issues found, and the solutions to overcome them.

3.1 General Approach

The algorithm we propose is based on the algorithm presented in Section 2.4. For this reason all the assumptions are the same, except that our algorithm does not have the extra column corresponding to the stop probabilities, and the matrix with value factors v_ij is in this case a vector v_i, where all values are the same for the same row. This new approach aims to reuse every single play, i.e., the gain of each play is never zero, and it also makes it possible to control the number of plays. It can be used as well to compute more general functions of a matrix.

Coming back to the example of Section 2.4, the algorithm starts by ensuring that the sum of all the elements of each row is equal to 1. So, if the sum of a row is different from 1, each element of that row is divided by the sum of all elements of that row, and the vector v_i will contain the values, the value factors, used to normalize the rows of the matrix. This process is illustrated in Fig. 3.1, Fig. 3.2 and Fig. 3.3.

Figure 3.1: Algorithm implementation - Example of a matrix B = I − A and A, and the theoretical result B^{-1} = (I − A)^{-1} of the application of this Monte Carlo method.

Figure 3.2: Initial matrix A and respective normalization.

Figure 3.3: Vector with value factors v_i for the given example.

Then, once we have the matrix written in the required form, the algorithm can be applied. The algorithm, as we can see in Fig. 3.4, has four main loops. The first loop defines the row that is being computed. The second loop defines the number of iterations, i.e., random jumps inside the probability matrix, and this relates to the power of the matrix in the corresponding series expansion. Then, for each number of iterations, N plays, i.e., the sample size of the Monte Carlo method, are executed for a given row. Finally, the remaining loop generates each random play with the number of random jumps given by the number of iterations.

for (i = 0; i < rowsize; i++) {
    for (q = 0; q < NUM_ITERATIONS; q++) {
        for (k = 0; k < NUM_PLAYS; k++) {
            currentrow = i;
            vp = 1;
            for (p = 0; p < q; p++) {
                /* ... one random jump per iteration ... */
            }
        }
    }
}

Figure 3.4: Code excerpt in C with the main loops of the proposed algorithm.

In order to better understand the algorithm's behavior, two examples will be given:

1. In the case where we have one iteration, one possible play is the example of Fig. 3.5. It follows the same reasoning as the algorithm presented in Section 2.4, except for the matrix element where the gain is stored, i.e., the position of the inverse matrix in which the gain is accumulated. This depends on the column where the last iteration stops and on the row where the play starts (first loop). The gain is accumulated in the position corresponding to the row in which the play started and the column in which it finished. Let us assume that it started in row 3 and ended in column 1; the element to which the gain is added would be (B^{-1})_31. In this particular instance, it stops in the second column while it started in the first row, so the gain will be added to the element (B^{-1})_12.

2. When we have two iterations, one possible play is the example of Fig. 3.6 for the first

iteration, and Fig. 3.7 for the second iteration. In this case, it stops in the third column and it started in the first row, so the gain will count for the position (B^{-1})_13 of the inverse matrix.

Figure 3.5: Example of one play with one iteration.

Figure 3.6: Example of the first iteration of one play with two iterations.

Figure 3.7: Example of the second iteration of one play with two iterations.

Finally, after the algorithm computes all the plays for each number of iterations, if we want to obtain the inverse matrix, we must retrieve the total gain for each position. This process consists in the sum of all the gains for each number of iterations, divided by the N plays, as we can see in Fig. 3.8.

for (i = 0; i < rowsize; i++) {
    for (j = 0; j < columnsize; j++) {
        for (q = 0; q < NUM_ITERATIONS; q++) {
            inverse[i][j] += aux[q][i][j];
        }
    }
}
for (i = 0; i < rowsize; i++) {
    for (j = 0; j < columnsize; j++) {
        inverse[i][j] = inverse[i][j] / (NUM_PLAYS);
    }
}

Figure 3.8: Code excerpt in C with the sum of all the gains for each position of the inverse matrix.

The proposed algorithm was implemented in C, since it is a good programming language for controlling memory usage, and it provides language constructs that efficiently map to machine in-

structions as well. Another reason is the fact that it is compatible and adaptable with all the parallelization techniques presented in Section 2.5. Concerning the parallelization technique, we used OpenMP, since it is the simplest and easiest way to transform a serial program into a parallel program.

3.2 Implementation of the Different Matrix Functions

The algorithm we propose, depending on how we aggregate the output results, is capable of obtaining different matrix functions as a result. In this thesis, we are interested in obtaining the inverse matrix and the matrix exponential, since these functions give us important complex network metrics: node centrality and node communicability, respectively (see Section 2.1).

In Fig. 3.9, we can see how we obtain the inverse matrix of one single row, according to Equation (2.2). In Fig. 3.10 we can observe how we obtain the matrix exponential, taking into account Equation (2.3). If we iterate this process a number of times equal to the number of rows (the first dimension of the matrix), we get the results for the full matrix.

for (j = 0; j < columnsize; j++) {
    for (q = 0; q < NUM_ITERATIONS; q++) {
        inverse[j] += aux[q][j];
    }
}
for (j = 0; j < columnsize; j++) {
    inverse[j] = inverse[j] / (NUM_PLAYS);
}

Figure 3.9: Code excerpt in C with the necessary operations to obtain the inverse matrix of one single row.

for (j = 0; j < columnsize; j++) {
    for (q = 0; q < NUM_ITERATIONS; q++) {
        exponential[j] += aux[q][j] / factorial(q);
    }
}
for (j = 0; j < columnsize; j++) {
    exponential[j] = exponential[j] / (NUM_PLAYS);
}

Figure 3.10: Code excerpt in C with the necessary operations to obtain the matrix exponential of one single row.
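The excerpts above assume that each play has already stored its per-iteration gains in aux. The step the text describes only in prose - choosing the next column of a random jump by accumulating the normalized row values until they exceed a uniform random number - can be sketched as follows. This is our own hedged illustration, not the thesis code; the function name select_next_column and the sample row are assumptions, and column indices are 0-based.

    #include <stdio.h>

    /* Given one row of the normalized matrix (its entries sum to 1) and a
     * uniform random number u in [0, 1), return the drawn column: the first
     * index at which the running sum of the row exceeds u, as in the worked
     * example of Section 2.4. */
    static int select_next_column(const double *row, int columnsize, double u) {
        double acc = 0.0;
        for (int j = 0; j < columnsize; j++) {
            acc += row[j];
            if (u < acc)
                return j;              /* column j has been drawn */
        }
        return columnsize - 1;         /* guard against rounding error */
    }

    int main(void) {
        /* A normalized row consistent with the worked example of Section 2.4. */
        double row[3] = { 0.2, 0.2, 0.6 };
        double u = 0.28;               /* the random number used in that example */
        printf("drawn column: %d\n", select_next_column(row, 3, u));
        return 0;
    }

With u = 0.28 the sketch selects column 1 (the second column), matching the example in which a_12 is drawn; inside the algorithm, the gain of the play would then be multiplied by the value factor v_i of the row that was left before jumping to the drawn row.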


More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 Advance Encryption Standard (AES) Rijndael algorithm is symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2013/2014 1 st Semester 2 nd Exam January 29, 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc. - Give your answers

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

Scalability of Processing on GPUs

Scalability of Processing on GPUs Scalability of Processing on GPUs Keith Kelley, CS6260 Final Project Report April 7, 2009 Research description: I wanted to figure out how useful General Purpose GPU computing (GPGPU) is for speeding up

More information

Analytical Modeling of Parallel Programs

Analytical Modeling of Parallel Programs 2014 IJEDR Volume 2, Issue 1 ISSN: 2321-9939 Analytical Modeling of Parallel Programs Hardik K. Molia Master of Computer Engineering, Department of Computer Engineering Atmiya Institute of Technology &

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

Monte Carlo Integration and Random Numbers

Monte Carlo Integration and Random Numbers Monte Carlo Integration and Random Numbers Higher dimensional integration u Simpson rule with M evaluations in u one dimension the error is order M -4! u d dimensions the error is order M -4/d u In general

More information

High Performance Computing

High Performance Computing The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical

More information

Hashing. Hashing Procedures

Hashing. Hashing Procedures Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

CS321 Introduction To Numerical Methods

CS321 Introduction To Numerical Methods CS3 Introduction To Numerical Methods Fuhua (Frank) Cheng Department of Computer Science University of Kentucky Lexington KY 456-46 - - Table of Contents Errors and Number Representations 3 Error Types

More information

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation November 2010 Nelson Shaw njd50@uclive.ac.nz Department of Computer Science and Software Engineering University of Canterbury,

More information

BİL 542 Parallel Computing

BİL 542 Parallel Computing BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

Linear Equation Systems Iterative Methods

Linear Equation Systems Iterative Methods Linear Equation Systems Iterative Methods Content Iterative Methods Jacobi Iterative Method Gauss Seidel Iterative Method Iterative Methods Iterative methods are those that produce a sequence of successive

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

TABLES AND HASHING. Chapter 13

TABLES AND HASHING. Chapter 13 Data Structures Dr Ahmed Rafat Abas Computer Science Dept, Faculty of Computer and Information, Zagazig University arabas@zu.edu.eg http://www.arsaliem.faculty.zu.edu.eg/ TABLES AND HASHING Chapter 13

More information

A METHOD TO MODELIZE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL

A METHOD TO MODELIZE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL A METHOD TO MODELIE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL Marc LEBELLE 1 SUMMARY The aseismic design of a building using the spectral analysis of a stick model presents

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert

More information

Outline of High-Speed Quad-Precision Arithmetic Package ASLQUAD

Outline of High-Speed Quad-Precision Arithmetic Package ASLQUAD Outline of High-Speed Quad-Precision Arithmetic Package ASLQUAD OGATA Ryusei, KUBO Yoshiyuki, TAKEI Toshifumi Abstract The ASLQUAD high-speed quad-precision arithmetic package reduces numerical errors

More information

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

Fast Fuzzy Clustering of Infrared Images. 2. brfcm Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.

More information

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3 6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require

More information

High Performance Computing. Introduction to Parallel Computing

High Performance Computing. Introduction to Parallel Computing High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials

More information

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications D.A. Karras 1 and V. Zorkadis 2 1 University of Piraeus, Dept. of Business Administration,

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

Loopy Belief Propagation

Loopy Belief Propagation Loopy Belief Propagation Research Exam Kristin Branson September 29, 2003 Loopy Belief Propagation p.1/73 Problem Formalization Reasoning about any real-world problem requires assumptions about the structure

More information

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND Student Submission for the 5 th OpenFOAM User Conference 2017, Wiesbaden - Germany: SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND TESSA UROIĆ Faculty of Mechanical Engineering and Naval Architecture, Ivana

More information

Superdiffusion and Lévy Flights. A Particle Transport Monte Carlo Simulation Code

Superdiffusion and Lévy Flights. A Particle Transport Monte Carlo Simulation Code Superdiffusion and Lévy Flights A Particle Transport Monte Carlo Simulation Code Eduardo J. Nunes-Pereira Centro de Física Escola de Ciências Universidade do Minho Page 1 of 49 ANOMALOUS TRANSPORT Definitions

More information

Review of previous examinations TMA4280 Introduction to Supercomputing

Review of previous examinations TMA4280 Introduction to Supercomputing Review of previous examinations TMA4280 Introduction to Supercomputing NTNU, IMF April 24. 2017 1 Examination The examination is usually comprised of: one problem related to linear algebra operations with

More information

Parallelization Principles. Sathish Vadhiyar

Parallelization Principles. Sathish Vadhiyar Parallelization Principles Sathish Vadhiyar Parallel Programming and Challenges Recall the advantages and motivation of parallelism But parallel programs incur overheads not seen in sequential programs

More information

Structured Parallel Programming

Structured Parallel Programming Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Dense Matrix Algorithms

Dense Matrix Algorithms Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication

More information

Structured Parallel Programming Patterns for Efficient Computation

Structured Parallel Programming Patterns for Efficient Computation Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Introduction to Multithreaded Algorithms

Introduction to Multithreaded Algorithms Introduction to Multithreaded Algorithms CCOM5050: Design and Analysis of Algorithms Chapter VII Selected Topics T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introduction to algorithms, 3 rd

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear

More information

CS 475: Parallel Programming Introduction

CS 475: Parallel Programming Introduction CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.

More information

Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD

Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Goals. The goal of the first part of this lab is to demonstrate how the SVD can be used to remove redundancies in data; in this example

More information

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about

More information

An Investigation into Iterative Methods for Solving Elliptic PDE s Andrew M Brown Computer Science/Maths Session (2000/2001)

An Investigation into Iterative Methods for Solving Elliptic PDE s Andrew M Brown Computer Science/Maths Session (2000/2001) An Investigation into Iterative Methods for Solving Elliptic PDE s Andrew M Brown Computer Science/Maths Session (000/001) Summary The objectives of this project were as follows: 1) Investigate iterative

More information

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT Daniel Schlifske ab and Henry Medeiros a a Marquette University, 1250 W Wisconsin Ave, Milwaukee,

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA Chapter 1 : BioMath: Transformation of Graphs Use the results in part (a) to identify the vertex of the parabola. c. Find a vertical line on your graph paper so that when you fold the paper, the left portion

More information

Contents. I The Basic Framework for Stationary Problems 1

Contents. I The Basic Framework for Stationary Problems 1 page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other

More information

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3 UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank Page rank computation HPC course project a.y. 2012-13 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet

More information

Figure 6.1: Truss topology optimization diagram.

Figure 6.1: Truss topology optimization diagram. 6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.

More information

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari Laboratory for Advanced Brain Signal Processing Laboratory for Mathematical

More information

Efficient Solution Techniques

Efficient Solution Techniques Chapter 4 The secret to walking on water is knowing where the rocks are. Herb Cohen Vail Symposium 14 poster Efficient Solution Techniques In the previous chapter, we introduced methods for implementing

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Curriculum Map: Mathematics

Curriculum Map: Mathematics Curriculum Map: Mathematics Course: Honors Advanced Precalculus and Trigonometry Grade(s): 11-12 Unit 1: Functions and Their Graphs This chapter will develop a more complete, thorough understanding of

More information

Parallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads)

Parallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads) Parallel Programming Models Parallel Programming Models Shared Memory (without threads) Threads Distributed Memory / Message Passing Data Parallel Hybrid Single Program Multiple Data (SPMD) Multiple Program

More information

Writing Parallel Programs; Cost Model.

Writing Parallel Programs; Cost Model. CSE341T 08/30/2017 Lecture 2 Writing Parallel Programs; Cost Model. Due to physical and economical constraints, a typical machine we can buy now has 4 to 8 computing cores, and soon this number will be

More information

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.

OpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico. OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 15, 2010 José Monteiro (DEI / IST) Parallel and Distributed Computing

More information

Stochastic Simulation: Algorithms and Analysis

Stochastic Simulation: Algorithms and Analysis Soren Asmussen Peter W. Glynn Stochastic Simulation: Algorithms and Analysis et Springer Contents Preface Notation v xii I What This Book Is About 1 1 An Illustrative Example: The Single-Server Queue 1

More information

x = 12 x = 12 1x = 16

x = 12 x = 12 1x = 16 2.2 - The Inverse of a Matrix We've seen how to add matrices, multiply them by scalars, subtract them, and multiply one matrix by another. The question naturally arises: Can we divide one matrix by another?

More information

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 1 Today Characteristics of Tasks and Interactions (3.3). Mapping Techniques for Load Balancing (3.4). Methods for Containing Interaction

More information

Parallel Computing for Process Mining

Parallel Computing for Process Mining Parallel Computing for Process Mining Rui Miguel Monte Pegado dos Santos Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering Supervisor: Prof. Diogo Manuel Ribeiro

More information

Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem

Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem Guan Wang and Matthias K. Gobbert Department of Mathematics and Statistics, University of

More information

Modelling and Quantitative Methods in Fisheries

Modelling and Quantitative Methods in Fisheries SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of

More information