Mathematical Methods and Computational Algorithms for Complex Networks
Benard Abola
Division of Applied Mathematics, Mälardalen University
Department of Mathematics, Makerere University
Second Network Meeting for Sida- and ISP-funded PhD Students in Mathematics
Stockholm, 26-27 February 2018
My Advisors
Sergei Silvestrov (Mälardalen University)
Anatoliy Malyarenko (Mälardalen University)
John Mango (Makerere University)
Christopher Engström (Mälardalen University)
Godwin Kakuba (Makerere University)
Research Topic
The research will focus on the analysis of complex networks.
1. Designing algorithms for the analysis of large graphs, in particular centrality measures for directed graphs such as PageRank.
2. Investigating applications of the algorithms in (1) to real data.
Example
Figure: A network of arts and sciences based on citation patterns.
Evaluation of Stopping Criteria for Ranks in Solving Linear Systems
Linear systems of algebraic equations arising from the mathematical formulation of natural phenomena or technological processes are common. Bioinformatics, internet search engines (web pages), and financial and social networks are some examples that lead to large, highly sparse matrices. With growing technology, the size of the data from these fields is reaching billions, and numerical computation of the solutions is becoming more demanding (Boldi, Santini, and Vigna, 2005). For some of these systems, only the ranks of the entries of the solution vector are of interest, rather than the solution vector itself.
One of the motivating studies of complex networks was the work of Brin and Page (1998a) on the Google search engine. The authors used a random-walk-based centrality measure called PageRank. PageRank is a method for ranking pages in a link structure (Internet web pages) in order of importance. A page is important if it is pointed to by other important pages.
Definition of PageRank from Brin and Page (1998b)
Definition 1: The PageRank of a page $p_i$, denoted $r(p_i)$, is the sum of the PageRanks of all pages pointing into $p_i$, each divided by its number of outlinks (Brin and Page, 1998b):
$$r(p_i) = \sum_{p_j \in B_{p_i}} \frac{r(p_j)}{|p_j|},$$
where $B_{p_i}$ is the set of pages pointing into $p_i$ and $|p_j|$ is the number of outlinks from $p_j$.
Definition 2: For a system $S$, the PageRank $R^{(1)}$ is defined as the stationary distribution of a Markov chain whose state space is the set of vertices $V$ and whose transition matrix is
$$M = c\,(A + g u^T)^T + (1 - c)\, u e^T.$$
Definition 3: The PageRank $R^{(2)}$ for the system $S$ is defined as
$$R^{(2)} = (I - cA^T)^{-1} n u,$$
where $c \in (0, 1)$ is the probability of following a link (hyperlink) of the graph.
Linear formulation of the PageRank problem
Rewriting Definition 3, we get
$$(I - cA^T) R = v, \tag{1}$$
where $v$ is the uniform distribution vector (Langville and Meyer, 2011). The PageRank vector is $R = [R_1, R_2, \dots, R_{n_s}]$, where $R_i$ is the PageRank score of vertex $v_i$. Ranking $R$ means comparing its elements: for instance, if $R_i > R_j$ for $i \neq j$, then vertex $i$ is more important than vertex $j$.
When a linear system of equations is large, we approximate the solution using iterative methods, e.g. the Jacobi, SOR and power methods. The approximate solution vector is the PageRank vector. When $A$ is large and sparse, numerical computation is demanding (Boldi et al., 2005).
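As a rough illustration of the linear formulation, the sketch below applies the Jacobi method to $(I - cA^T)R = v$ and then ranks the entries of the result. The 3-vertex link matrix `A`, the value `c = 0.85`, the tolerance, and the helper names `build_system` and `jacobi` are assumptions made for this example; none of them come from the slides.

```python
# Minimal sketch: Jacobi iteration for (I - c*A^T) R = v, then ranking R.
# The graph, c = 0.85 and the tolerance are illustrative assumptions.

def build_system(A, c):
    """Return M = I - c*A^T for a row-normalised link matrix A."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - c * A[j][i] for j in range(n)]
            for i in range(n)]

def jacobi(M, b, tol=1e-10, max_iter=1000):
    """Solve M x = b by Jacobi iteration; returns (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    for m in range(1, max_iter + 1):
        x_new = [(b[i] - sum(M[i][j] * x[j] for j in range(n) if j != i)) / M[i][i]
                 for i in range(n)]
        # Stop when successive iterates differ by less than tol in every entry.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, m
        x = x_new
    return x, max_iter

# Hypothetical 3-vertex graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0.
A = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
c = 0.85
v = [1.0 / 3] * 3  # uniform distribution vector

M = build_system(A, c)
R, iterations = jacobi(M, v)

# Ranking R means ordering the vertices by decreasing score.
order = sorted(range(len(R)), key=lambda i: -R[i])
residual = max(abs(sum(M[i][j] * R[j] for j in range(3)) - v[i]) for i in range(3))
print(order, iterations, residual)
```

Here the stopping test is on the change between successive iterates, so the loop keeps running long after the ordering in `order` has stopped changing. That gap is what the rank-based stopping criteria in this talk are meant to address.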
Motivation to the Problem
$$A = \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

PR (rk)     N.1          N.2        N.3        N.4        N.5
1st iter.   0.2 (3)      0.2 (3)    0.2 (3)    0.2 (3)    0.2 (3)
2nd iter.   0.098 (4.5)  0.268 (2)  0.183 (3)  0.353 (1)  0.098 (4.5)
3rd iter.   0.135 (4.5)  0.255 (2)  0.177 (3)  0.296 (1)  0.135 (4.5)
...
7th iter.   0.126 (4.5)  0.256 (2)  0.180 (3)  0.310 (1)  0.126 (4.5)

Table: Convergence of the PageRank vector versus the rank vector using the power method. Each entry is a PageRank value with its rank in parentheses.
The column sums of A give some hint of the rank.
Question: How many iterations should be performed to
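The table can be approximately reproduced with a few lines of power iteration. The sketch below is a reading of the example rather than the author's code: the 5-by-5 link matrix, the damping value `c = 0.85`, and the uniform redistribution of the score of vertices without outlinks are all assumptions, chosen because they give the tabulated second iterate to three decimals and the third and seventh iterates to within about 0.001. The helper names are hypothetical.

```python
# Sketch of the power method on a 5-vertex example, with average ranks for ties.
# Assumptions: the link matrix A, c = 0.85, and uniform handling of dangling
# vertices (rows of zeros) are inferred from the tabulated values.

C = 0.85
A = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

def power_step(A, x, c=C):
    """One power-method step with uniform teleportation."""
    n = len(x)
    # Score held by vertices without outlinks is spread evenly over all vertices.
    dangling = sum(x[i] for i in range(n) if sum(A[i]) == 0.0)
    return [
        c * (sum(A[i][j] * x[i] for i in range(n)) + dangling / n) + (1.0 - c) / n
        for j in range(n)
    ]

def average_ranks(x, digits=9):
    """Rank 1 = largest score; tied scores share the average of their ranks."""
    keys = [round(value, digits) for value in x]
    ranks = []
    for key in keys:
        higher = sum(1 for other in keys if other > key)
        tied = sum(1 for other in keys if other == key)
        ranks.append(higher + (tied + 1) / 2.0)
    return ranks

x = [0.2] * 5            # 1st iterate: uniform start
iterates = [x]
for _ in range(6):       # 2nd to 7th iterates
    x = power_step(A, x)
    iterates.append(x)

for m, it in enumerate(iterates, start=1):
    print(m, [round(value, 3) for value in it], average_ranks(it))
```

In this run the rank vector is already fixed from the second iterate onward, while the PageRank values keep changing in the third decimal for several more steps.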
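Stopping Criteria
Criterion I: Norm of the residual, $\|r^{(m)}\|_1 = \max_i \left| \left( x^{(m)} - A x^{(m-1)} \right)_i \right|$.
Criterion II: Componentwise backward error, $\max_i \dfrac{|r^{(m)}|_i}{\left( |A| \cdot |x^{(m)}| + |b| \right)_i}$.
Criterion III: Normwise backward error, $\dfrac{\|r^{(m)}\|}{\|A\| \cdot \|x^{(m)}\|_1 + \|b\|}$.
Criterion IV: The ratio of residual, $\kappa(A) \dfrac{\|r^{(m)}\|}{\|b\|}$.
Criterion V: $\dfrac{c}{1 - c} \|x^{(m)} - x^{(m-1)}\|$.
Criterion VI: Kendall's $\tau$ correlation coefficient,
$$\tau = \begin{cases} \dfrac{n_c - n_d}{n(n-1)/2}, & \text{if no tie}, \\[2ex] \dfrac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, & \text{otherwise.} \end{cases} \tag{2}$$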
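To make Criterion VI concrete, here is a small sketch of Kendall's $\tau$ in its usual tie-corrected form (often called tau-b). It assumes the standard meanings of the symbols: $n_c$ and $n_d$ count concordant and discordant pairs, $n_0 = n(n-1)/2$, and $n_1$, $n_2$ count the tied pairs within each of the two vectors. The function name `kendall_tau` is hypothetical, and the quadratic-time pair loop is chosen for clarity rather than speed.

```python
from collections import Counter
from math import sqrt

def sign(a, b):
    """Return +1, 0 or -1 according to whether a > b, a == b or a < b."""
    return (a > b) - (a < b)

def kendall_tau(x, y):
    """Tie-corrected Kendall tau between two equal-length score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n_c = n_d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = sign(x[i], x[j]) * sign(y[i], y[j])
            if s > 0:
                n_c += 1      # pair ordered the same way in x and y
            elif s < 0:
                n_d += 1      # pair ordered oppositely
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in Counter(x).values())  # tied pairs in x
    n2 = sum(t * (t - 1) / 2 for t in Counter(y).values())  # tied pairs in y
    denom = sqrt((n0 - n1) * (n0 - n2))
    if denom == 0:
        raise ValueError("tau is undefined when a vector is constant")
    return (n_c - n_d) / denom

# Two vectors with the same ordering and the same tie give tau = 1.
second = [0.098, 0.268, 0.183, 0.353, 0.098]
seventh = [0.126, 0.256, 0.180, 0.310, 0.126]
tau_tied = kendall_tau(second, seventh)

# Without ties the denominator reduces to n(n-1)/2.
tau_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau_tied, tau_swap)
```

With no ties, $n_1 = n_2 = 0$ and the second branch of the formula collapses to the first, so a single code path covers both cases.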
Experiments on a real data set on rank convergence (size n ≈ 340,000)
Figure: Index of first 100 top ranks and error (criterion I) against
Convergence of ranks using the SOR iterative method
Figure: Kendall's τ correlation coefficient: -o- Top-300; -o- Top-100
Conclusion
1. The method based on the correlation coefficient between successive iterates of the solution vector, together with two Top-k lists, seems preferable to the other criteria.
2. Further, criteria I and IV were found to complement the Kendall's τ Top-k list method.
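One possible reading of the Top-k stopping rule is sketched below: after each SOR sweep, Kendall's τ is computed between the previous and current iterate restricted to the current Top-k vertices, for two values of k, and the iteration stops once both correlations reach a threshold. This is an interpretation, not the author's implementation. The small link matrix, `c = 0.85`, the list sizes `ks = (3, 5)`, the threshold, and `omega = 1.0` (which makes the sweep a Gauss-Seidel sweep) are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def kendall_tau(x, y):
    """Tie-corrected Kendall tau; None when undefined (a list is constant)."""
    n = len(x)
    n_c = n_d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = ((x[i] > x[j]) - (x[i] < x[j])) * ((y[i] > y[j]) - (y[i] < y[j]))
            n_c += s > 0
            n_d += s < 0
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in Counter(x).values())
    n2 = sum(t * (t - 1) / 2 for t in Counter(y).values())
    denom = sqrt((n0 - n1) * (n0 - n2))
    return (n_c - n_d) / denom if denom > 0 else None

def top_k(x, k):
    """Indices of the k largest entries of x (ties broken by index)."""
    return sorted(range(len(x)), key=lambda i: -x[i])[:k]

def sor_sweep(M, b, x, omega=1.0):
    """One in-place SOR sweep for M x = b (omega = 1 is Gauss-Seidel)."""
    n = len(b)
    for i in range(n):
        sigma = sum(M[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / M[i][i]

def solve_until_ranks_settle(M, b, ks=(3, 5), tau_min=1.0 - 1e-9,
                             omega=1.0, max_iter=200):
    """Sweep until tau between successive iterates is >= tau_min on every Top-k list."""
    x = list(b)
    for m in range(1, max_iter + 1):
        prev = list(x)
        sor_sweep(M, b, x, omega)
        taus = []
        for k in ks:
            idx = top_k(x, k)
            taus.append(kendall_tau([prev[i] for i in idx], [x[i] for i in idx]))
        if all(t is not None and t >= tau_min for t in taus):
            return x, m
    return x, max_iter

# Assumed 5-vertex link matrix (rows of zeros are vertices without outlinks).
A = [[0.0, 0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
c, n = 0.85, 5
M = [[(1.0 if i == j else 0.0) - c * A[j][i] for j in range(n)] for i in range(n)]
v = [1.0 / n] * n

x, sweeps = solve_until_ranks_settle(M, v)
print(sweeps, top_k(x, 3), [round(value, 4) for value in x])
```

A rule of this kind only says that the ordering has stopped changing between two sweeps, not that the values are accurate, which is consistent with the slides' remark that criteria I and IV complement it.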
Boldi, P., Santini, M., & Vigna, S. (2005). Paradoxical effects in PageRank incremental computations. Internet Mathematics, 2(3), 387-404.
Brin, S., & Page, L. (1998a). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.
Brin, S., & Page, L. (1998b). The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998). Retrieved from http://ilpubs.stanford.edu:8090/361/
Langville, A. N., & Meyer, C. D. (2011). Google's PageRank and beyond: The science of search engine rankings. Princeton University Press.
Acknowledgements
This research was supported by the Swedish International Development Cooperation Agency (Sida), the International Science Programme (ISP) in Mathematical Sciences (IPMS), and the Sida Bilateral Research Program (Makerere University and University of Dar-es-Salaam). We are also grateful to the research environment Mathematics and Applied Mathematics (MAM), Division of Applied Mathematics, Mälardalen University, for providing an excellent and inspiring environment for research education and research.
Thank you!