Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola


Mathematical Methods and Computational Algorithms for Complex Networks
Benard Abola
Division of Applied Mathematics, Mälardalen University
Department of Mathematics, Makerere University
Second Network Meeting for Sida- and ISP-funded PhD Students in Mathematics
Stockholm, 26–27 February 2018

My Advisors
Sergei Silvestrov, Mälardalen University
Anatoliy Malyarenko, Mälardalen University
John Mango, Makerere University
Christopher Engström, Mälardalen University
Godwin Kakuba, Makerere University

Research Topic
The research focuses on the analysis of complex networks.
1. Designing algorithms for the analysis of large graphs, in particular centrality measures for directed graphs such as PageRank.
2. Investigating applications of the algorithms in (1) to real data.

Example
Figure: A network of arts and sciences based on citation patterns.

Evaluation of Stopping Criteria for Ranks in Solving Linear Systems
Linear systems of algebraic equations arising from the mathematical formulation of natural phenomena or technological processes are common. Bioinformatics, internet search engines (web pages), and financial and social networks are examples that give rise to large, highly sparse matrices. As technology grows, the size of the data from these fields is reaching the billions, and numerical computation of the solutions is becoming more demanding (Boldi, Santini, and Vigna (2005)). For some of these systems, only the ranks of the entries of the solution vector are of interest, rather than the solution vector itself.

One motivating study of complex networks was that of Brin and Page (1998a) on the Google search engine. The authors used a random-walk-based centrality measure called PageRank. PageRank is a method for ranking pages in a link structure (Internet web pages) in order of importance. A page is important if it is pointed to by other important pages.

Definition of PageRank from Brin and Page (1998b)
Definition 1: The PageRank of a page $p_i$, denoted $r(p_i)$, is the sum of the PageRanks of all pages pointing into $p_i$, each divided by its number of out-links (Brin and Page (1998b)):
\[ r(p_i) = \sum_{p_j \in B_{p_i}} \frac{r(p_j)}{|p_j|}, \]
where $B_{p_i}$ is the set of pages pointing into $p_i$ and $|p_j|$ is the number of out-links from $p_j$.
Definition 2: For a system $S$, the PageRank $R^{(1)}$ is defined as the stationary distribution of the Markov chain whose state space is $V$ (the set of vertices) and whose transition matrix is
\[ M = c\,(A + g u^T)^T + (1 - c)\, u e^T. \]
Definition 3: The PageRank $R^{(2)}$ for the system $S$ is defined as
\[ R^{(2)} = (I - cA^T)^{-1} n u, \]
where $c \in (0, 1)$ is the probability of following an edge of the graph (a hyperlink).
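As a hedged illustration of Definition 2, the sketch below assembles $M$ for a small 5-node graph chosen for illustration. The slide does not define $g$, $u$, or $e$; the code assumes that $g$ marks vertices without out-links, that $u$ is the uniform vector, that $e$ is the all-ones vector, and that $c = 0.85$. The function name `transition_matrix` is hypothetical.

```python
# Illustrative 5-node graph: A[i][j] is the weight of the link i -> j,
# and every row with out-links sums to 1.
A = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],  # no out-links (dangling vertex)
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],  # no out-links (dangling vertex)
    [0.0, 0.0, 0.0, 1.0, 0.0],
]


def transition_matrix(A, c=0.85):
    """Return M = c (A + g u^T)^T + (1 - c) u e^T as a list of rows."""
    n = len(A)
    u = [1.0 / n] * n  # assumption: uniform weight vector
    # Assumption: g[j] = 1 for dangling vertices, 0 otherwise.
    g = [1.0 if sum(row) == 0 else 0.0 for row in A]
    # Entry (i, j) of the transpose is entry (j, i) of A + g u^T;
    # e is all ones, so entry (i, j) of u e^T is simply u[i].
    return [
        [c * (A[j][i] + g[j] * u[i]) + (1.0 - c) * u[i] for j in range(n)]
        for i in range(n)
    ]


M = transition_matrix(A)
column_sums = [sum(M[i][j] for i in range(len(M))) for j in range(len(M))]
print(column_sums)
```

Under these assumptions each column of `M` sums to 1, so `M` maps probability vectors (as columns) to probability vectors, which is what the stationary-distribution definition needs.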

Linear formulation of the PageRank problem
Rewriting Definition 3, we get
\[ (I - cA^T) R = v, \tag{1} \]
where $v$ is a uniform distribution vector (Langville and Meyer (2011)).
The PageRank vector is $R = [R_1\ R_2\ \ldots\ R_{n_s}]$, where $R_i$ is the PageRank score of vertex $v_i$. Ranking $R$ means comparing its elements: for instance, if $R_i > R_j$ for $i \neq j$, then vertex $i$ has higher importance than vertex $j$.
When a linear system of equations is large, we approximate the solution using iterative methods, e.g. the Jacobi, SOR, and power methods. The approximate solution vector is the PageRank vector.
If $A$ is sparse and large, the numerical computation is demanding (Boldi et al. (2005)).
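A minimal sketch of how the linear system (1) might be solved with the Jacobi method, one of the iterative methods named above. The 5-node matrix, the value $c = 0.85$, the uniform right-hand side $v_i = 1/n$, and the helper name `jacobi` are assumptions made for illustration, not the author's implementation.

```python
def jacobi(B, b, tol=1e-10, max_iter=1000):
    """Solve B x = b by Jacobi iteration; return (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Each component is updated from the previous iterate only.
        x_new = [
            (b[i] - sum(B[i][j] * x[j] for j in range(n) if j != i)) / B[i][i]
            for i in range(n)
        ]
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, iteration
    return x, max_iter


c = 0.85  # assumed probability of following a link
# Illustrative graph: A[i][j] is the weight of the link i -> j.
A = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
n = len(A)
# System matrix B = I - c A^T and uniform right-hand side v.
B = [[(1.0 if i == j else 0.0) - c * A[j][i] for j in range(n)] for i in range(n)]
v = [1.0 / n] * n

R, iterations = jacobi(B, v)
total = sum(R)
R_normalised = [r / total for r in R]  # rescale so the scores sum to 1
print(iterations, [round(r, 4) for r in R_normalised])
```

Only the ordering of the entries of `R` matters for ranking, so the final rescaling is a convenience. For this example the normalised scores agree, up to the third decimal, with the 7th-iteration row of the power-method table in the slides.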


Motivation to the Problem
\[ A = \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \]

PR (rank)    N.1          N.2        N.3        N.4        N.5
1st iter.    0.2 (3)      0.2 (3)    0.2 (3)    0.2 (3)    0.2 (3)
2nd iter.    0.098 (4.5)  0.268 (2)  0.183 (3)  0.353 (1)  0.098 (4.5)
3rd iter.    0.135 (4.5)  0.255 (2)  0.177 (3)  0.296 (1)  0.135 (4.5)
...          ...          ...        ...        ...        ...
7th iter.    0.126 (4.5)  0.256 (2)  0.180 (3)  0.310 (1)  0.126 (4.5)
Table: Convergence of the PageRank vector versus the rank vector using the power method.

The column sums of $A$ give some hint on the rank.
Question: How many iterations should be performed to ...
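The table can be approximately reproduced with a few lines of plain Python. This is a sketch under stated assumptions: the matrix entries are read as weights of links $i \to j$, $c = 0.85$, and the score held by vertices without out-links is spread uniformly over all vertices. None of these is stated on the slide, and the function names are illustrative.

```python
def power_method(A, c=0.85, steps=6):
    """Return the list of iterates, starting from the uniform vector."""
    n = len(A)
    x = [1.0 / n] * n
    history = [x]
    dangling = [i for i in range(n) if sum(A[i]) == 0]
    for _ in range(steps):
        # Assumption: dangling mass and teleportation are spread uniformly.
        dangling_mass = sum(x[i] for i in dangling)
        base = c * dangling_mass / n + (1.0 - c) / n
        x = [base + c * sum(A[j][i] * x[j] for j in range(n)) for i in range(n)]
        history.append(x)
    return history


def average_ranks(x):
    """Rank 1 is the largest score; tied scores share their mean position."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    ranks = [0.0] * len(x)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and x[order[end + 1]] == x[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


A = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

history = power_method(A)
for k, x in enumerate(history, start=1):
    print(k, [round(s, 3) for s in x], average_ranks(x))
```

In this run the rank vector is already stable from the second iterate onward, while the scores keep changing in the third decimal. That gap between score convergence and rank convergence is what motivates the question on the slide.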

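Stopping Criteria
Criterion I: Norm of residual, $\|r^{(m)}\|_1 = \max_i |x^{(m)} - A x^{(m-1)}|_i$.
Criterion II: Componentwise backward error, $\max_i \dfrac{|r^{(m)}|_i}{(|A| \cdot |x^{(m)}| + |b|)_i}$.
Criterion III: Normwise backward error, $\dfrac{\|r^{(m)}\|}{\|A\| \cdot \|x^{(m)}\|_1 + \|b\|}$.
Criterion IV: Ratio of residual, $\kappa(A)\, \dfrac{\|r^{(m)}\|}{\|b\|}$.
Criterion V: $\dfrac{c}{1 - c}\, \|x^{(m)} - x^{(m-1)}\|$.
Criterion VI: Kendall's $\tau$ correlation coefficient,
\[ \tau = \begin{cases} \dfrac{n_c - n_d}{n(n-1)/2}, & \text{if no tie}, \\[2ex] \dfrac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, & \text{otherwise}. \end{cases} \tag{2} \]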
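Criterion VI can be sketched in plain Python as follows. The code assumes the usual reading of the symbols in (2): $n_c$ and $n_d$ count concordant and discordant pairs, $n_0 = n(n-1)/2$, and $n_1$, $n_2$ count the pairs tied in the first and second vector. Restricting the comparison to the top-$k$ entries of the current iterate is one plausible way to build a top-$k$ list; the helper names `kendall_tau` and `top_k_tau` are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau with tie correction; NaN if one vector is all ties."""
    n = len(x)
    n_c = n_d = n_1 = n_2 = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                n_1 += 1  # pair tied in x
            if dy == 0:
                n_2 += 1  # pair tied in y
            if dx * dy > 0:
                n_c += 1  # concordant pair
            elif dx * dy < 0:
                n_d += 1  # discordant pair
    n_0 = n * (n - 1) // 2
    # Without ties the denominator reduces to n(n-1)/2, the first case of (2).
    denominator = math.sqrt((n_0 - n_1) * (n_0 - n_2))
    if denominator == 0:
        return float("nan")
    return (n_c - n_d) / denominator


def top_k_tau(previous, current, k):
    """Tau between two successive iterates on the top-k entries of `current`."""
    top = sorted(range(len(current)), key=lambda i: -current[i])[:k]
    return kendall_tau([previous[i] for i in top], [current[i] for i in top])


# Two iterates of a 5-vertex example (vertices 1 and 5 are tied in both).
previous = [0.098, 0.268, 0.183, 0.353, 0.098]
current = [0.126, 0.256, 0.180, 0.310, 0.126]
print(kendall_tau(previous, current), top_k_tau(previous, current, 3))
```

A stopping rule built on this would halt once $\tau$ between successive iterates reaches 1 (or stays above a chosen threshold) for the top-$k$ lists of interest. The quadratic pair loop is fine for small $k$; a large top-$k$ list would call for an $O(n \log n)$ variant.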

Experiments on a real data set on rank convergence (size $n \approx 340{,}000$)
Figure: Index of first 100 top ranks and error (criterion I) against ...

Convergence of ranks using the SOR iterative method
Figure: Kendall's τ correlation coefficient: -o- Top-300; -o- Top-100

Conclusion
1. The method based on the correlation coefficient between successive iterates of the solution vector, together with two top-k lists, seems preferable to the others.
2. Further, criteria I and IV were found to complement the Kendall's τ top-k list method.

References
Boldi, P., Santini, M., & Vigna, S. (2005). Paradoxical effects in PageRank incremental computations. Internet Mathematics, 2(3), 387–404.
Brin, S., & Page, L. (1998a). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.
Brin, S., & Page, L. (1998b). The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998). Retrieved from http://ilpubs.stanford.edu:8090/361/
Langville, A. N., & Meyer, C. D. (2011). Google's PageRank and beyond: The science of search engine rankings. Princeton University Press.

Acknowledgements
This research was supported by the Swedish International Development Cooperation Agency (Sida), the International Science Programme (ISP) in Mathematical Sciences (IPMS), and the Sida Bilateral Research Program (Makerere University and University of Dar-es-Salaam). We are also grateful to the research environment Mathematics and Applied Mathematics (MAM), Division of Applied Mathematics, Mälardalen University, for providing an excellent and inspiring environment for research education and research.

Thank you!