COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, Sep. 15, www.ijcea.com, ISSN 2321-3469

Atul Kumar Srivastava 1, Mitali Srivastava 2, Rakhi Garg 3, P. K. Mishra 4
1, 2, 4 Department of Computer Science, Faculty of Science, Banaras Hindu University, Varanasi, India
3 Computer Science Section, Mahila Maha Vidyalaya, Banaras Hindu University, Varanasi, India

ABSTRACT: Web search engines use several ranking algorithms to determine the ordering of web pages. The PageRank method became one of the most popular and successful of these after it was adopted by the Google search engine to rank web pages. Because PageRank is computed iteratively, the Power method consumes considerable computation time and resources. To reduce this time, many researchers have focused on efficient methods for computing PageRank scores on very large web graphs. Several researchers have used the algebraic Gauss-Seidel method to compute PageRank scores and observed that it needs fewer iterations than the Power method. In this paper, we experimentally analyse the Power method and the Gauss-Seidel method, both implemented with a Hash-map data structure, for computing PageRank scores, and observe that the Gauss-Seidel method takes 40%-45% fewer iterations than the Power method.

Keywords: Hash-map, PageRank method, Power method, Gauss-Seidel method, experimental analysis of the PageRank Power method and Gauss-Seidel method.

[1] INTRODUCTION

Today, the Web is one of the most popular media through which users access information. Because of the huge amount of data on the Web, it is crucial for users to find relevant information in reasonable time [3]. Web page ranking methods order web pages by relevance so that users get the pages they need.
There are two important web page ranking algorithms: PageRank and HITS, proposed by Brin & Page and by Jon Kleinberg respectively [1, 3]. Both algorithms compute the rank of web pages iteratively. PageRank ranks web pages by a single prestige score, while HITS takes two prestige scores into account, i.e. the hub score and the authority score [7, 13]. Brin & Page computed the rank of web pages with the Power method [1]. PageRank computation is compute-intensive and resource-hungry: it can take several days to compute the ranks of a billion web pages. Since the pages of many web sites are updated regularly, their ranks must be recomputed to keep search results relevant [5, 13]. For these reasons, effective and efficient PageRank computation is in demand.

Many researchers have tried to make PageRank computation efficient by exploiting the system architecture. Boldi and Vigna proposed a method to compute PageRank in main memory by compressing the large web graph [2]. Haveliwala [8] and Chen et al. [6] computed PageRank efficiently in external memory by minimizing the overhead of I/O operations. In addition, several researchers have used algebraic techniques to compute PageRank efficiently. For example, Kamvar et al. treated the large web graph as a collection of local blocks connected mostly by intra-domain hyperlinks, computed the PageRank of these local blocks, and combined the results to obtain the global ranking [11]. They also avoided recomputing PageRank values that had already converged, and sped up the computation by periodically removing approximations of the non-principal eigenvector components from the current iterate [10]. Kamvar and Haveliwala [9] studied the second eigenvalue of the Google matrix, which determines the convergence rate of the PageRank method. Arasu et al. [4] used the Gauss-Seidel method to compute PageRank scores because it converges faster than the Power method on large datasets.

In this paper, we compute PageRank using both the Power method and the Gauss-Seidel method. We compare the two methods on the number of iterations and the time taken to converge, and find that the Gauss-Seidel method is more effective for computing PageRank on large datasets, as it takes approximately 40%-45% fewer iterations than the Power method.

The rest of the paper is organized as follows. Section 2 describes basic graph terminology and the data structure used to store the hyperlink matrix. Section 3 discusses the computation of PageRank by the Power method and the Gauss-Seidel method. Section 4 compares these algorithms on the number of iterations and the time taken to converge. Section 5 concludes the paper.
[2] DATASET AND SOME BASIC TERMINOLOGY

To compute the PageRank scores, we store the web graph in a Hash-map data structure, because storing the hyperlink matrix as a full n×n array wastes space on its many zero entries. For example, consider the small graph of six nodes shown in [Figure 1]. The corresponding hyperlink matrix contains only 0 and 1 entries and is of order 6×6, i.e. n×n.

Figure 1: Web graph and corresponding hyperlink matrix of six nodes
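Figure 1 is not reproduced in this transcription. Purely as an illustrative assumption, the matrix below is one six-node hyperlink matrix H consistent with the facts stated for Figure 1 in this section (page 1 links to pages 2 and 3, page 2 is dangling, page 1 is linked only from page 3, and pages 1, 3, 4, 5 and 6 have out-degrees 2, 3, 2, 2 and 1); the actual figure may differ. Here $H_{ij} = 1$ when page i links to page j.

```latex
H =
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix},
\qquad n^2 = 36 \text{ entries, of which only } 10 \text{ are non-zero}
```

Row 2 is all zeros because page 2 is dangling; only 10 of the 36 entries carry information, which is the saving a Hash-map representation exploits.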

Since PageRank computation needs only the non-zero entries of the hyperlink matrix, we store only those entries in the Hash-map data structure, which both reduces storage and speeds up data access [13]. We need the following data structures during PageRank computation.

Hash-map (Key: Values): the web page in Key points to the web pages in Values. In the six-node graph of [Figure 1], the key for web page 1 maps to the values 2 and 3; the values for keys 2, 3, 4, 5 and 6 are obtained similarly and are shown in [Figure 2]. The value -1 in the Hash-map denotes that key 2 is a dangling node.

Reverse Hash-map (Key: Values): the web page in Key is pointed to by the web pages in Values. From [Figure 1], web page 1 is pointed to by web page 3. The values for keys 2, 3, 4, 5 and 6 are obtained similarly, as shown in [Figure 2].

Out-degree array: a single-column array holding the out-degree of every web page in the graph. If a node is dangling, its out-degree is set to the number of nodes in the web graph, i.e. n. From Figure 1 we obtain the following array for web pages 1, 2, 3, 4, 5 and 6:
Out-degree: [2 6 3 2 2 1]

Dangling-node array: a single-column array containing the dangling nodes. In Figure 1, node 2 is a dangling node: [2]

Figure 2: Data structure to store the web graph

With the Hash-map data structure we store only the non-zero entries of each row. In the example above, the hyperlink matrix takes n×n = 36 storage elements, while the Hash-map takes only 10. For large datasets, the Hash-map data structure is therefore better than the hyperlink matrix in terms of both storage and element access [3, 13].
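The four structures above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses java.util.HashMap and ArrayList instead of the Guava collections used in the paper, and the class name WebGraphStore and the edge list EXAMPLE_EDGES are hypothetical, chosen only to be consistent with the out-degrees, the dangling node and the link into page 1 described for Figure 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the four structures from Section 2 (pages numbered 1..n).
 * Uses java.util collections in place of the Guava collections used in the paper.
 */
class WebGraphStore {
    // Hypothetical edge list {from, to}, consistent with the facts stated for Figure 1.
    static final int[][] EXAMPLE_EDGES = {
        {1, 2}, {1, 3}, {3, 1}, {3, 2}, {3, 5}, {4, 5}, {4, 6}, {5, 4}, {5, 6}, {6, 4}
    };

    final Map<Integer, List<Integer>> hashMap = new HashMap<>();        // page -> pages it links to
    final Map<Integer, List<Integer>> reverseHashMap = new HashMap<>(); // page -> pages linking to it
    final int[] outDegree;                                              // dangling page gets n
    final List<Integer> dangling = new ArrayList<>();                   // list of dangling pages

    WebGraphStore(int n, int[][] edges) {
        for (int p = 1; p <= n; p++) {
            hashMap.put(p, new ArrayList<>());
            reverseHashMap.put(p, new ArrayList<>());
        }
        for (int[] e : edges) {
            hashMap.get(e[0]).add(e[1]);
            reverseHashMap.get(e[1]).add(e[0]);
        }
        outDegree = new int[n];
        for (int p = 1; p <= n; p++) {
            List<Integer> targets = hashMap.get(p);
            if (targets.isEmpty()) {
                targets.add(-1);      // -1 marks a dangling key, as in Figure 2
                dangling.add(p);
                outDegree[p - 1] = n; // dangling page treated as linking to all n pages
            } else {
                outDegree[p - 1] = targets.size();
            }
        }
    }

    /** Number of stored links (non-zero hyperlink-matrix entries); -1 markers are excluded. */
    int storedLinks() {
        int count = 0;
        for (List<Integer> sources : reverseHashMap.values()) count += sources.size();
        return count;
    }

    public static void main(String[] args) {
        WebGraphStore g = new WebGraphStore(6, EXAMPLE_EDGES);
        System.out.println("Hash-map:         " + g.hashMap);
        System.out.println("Reverse Hash-map: " + g.reverseHashMap);
        System.out.println("Out-degree:       " + Arrays.toString(g.outDegree));
        System.out.println("Dangling:         " + g.dangling);
        System.out.println("Stored links:     " + g.storedLinks() + " (dense matrix: " + 6 * 6 + ")");
    }
}
```

Running main prints the out-degree array [2, 6, 3, 2, 2, 1], the dangling list [2], and 10 stored links against 36 cells for the dense 6×6 matrix. Recording a dangling page with out-degree n lets a solver treat it as linking to every page without storing n extra entries for it.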
We have implemented the Hash-map data structure in Java using the Guava library provided by Google [14].

[3] COMPUTATION OF PAGERANK METHOD

The PageRank method was proposed in the late 1990s by Brin & Page, the founders of the Google search engine, and has been applied in the Google search engine [1]. Specifically, the rank of a web page is computed from the number of incoming links to the page as well as the ranks of the pages from which those links originate. PageRank is computed offline and is not influenced by the user's search query. The PageRank idea has since been applied to rank many other kinds of objects by significance, e.g. scientific articles or manuscripts linked by citations, authors linked by co-authorship, and proteins in biological systems [3, 5, 13].

To formalize these ideas, we treat the Web as a directed graph $G = (V, E)$ in which web pages are nodes and hyperlinks are edges. The total number of web pages in the web graph is denoted by $n = |V|$. Brin & Page define the PageRank score of web page i as [13]:

$$P(i) = \sum_{j \in B(i)} \frac{P(j)}{O_j}$$

where $B(i)$ is the set of pages that link to page i and $O_j$ is the out-degree of web page j. Mathematically, this gives n linear equations in n unknowns. Let A be the n×n matrix of the web graph defined by

$$A_{ij} = \begin{cases} \dfrac{1}{O_i} & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The system of n linear equations can then be written as

$$P = A^{T} P$$

where P is the PageRank vector. This is the characteristic equation of an eigensystem: the solution P is an eigenvector of $A^{T}$ with corresponding eigenvalue 1. Because the definition is circular, the equation is solved with an iterative method. This iterative procedure has two problems on a real web graph: the rank sink problem and the cycle problem [7, 13]. After both problems are resolved, the PageRank score of web page i is computed as

$$P(i) = \frac{1-\alpha}{n} + \alpha \sum_{j \in B(i)} \frac{P(j)}{O_j}$$

Here α denotes the damping factor, which takes values between 0 and 1 (normally α = 0.85), and a dangling page j is treated as linking to all n pages, i.e. $O_j = n$, as in Section 2.

[3.1] PAGERANK POWER METHOD

In this section we discuss the Power method, the basic method used by Brin & Page to compute the PageRank vector. The Power method is one of the simplest and oldest iterative methods for finding the dominant eigenvalue and eigenvector of a matrix, here the sparse hyperlink matrix [13]. The following equation is used to compute the PageRank of web pages by the Power method:

$$P^{(k+1)}(i) = \frac{1-\alpha}{n} + \alpha \sum_{j \in B(i)} \frac{P^{(k)}(j)}{O_j}$$

In the Power method, we initially assign a rank of 1/n to every web page in the PageRank vector. The iteration starts from this initial assignment and stops when the PageRank values no longer change much between successive iterations, at which point it has converged to a PageRank vector. The convergence threshold is $\varepsilon = 10^{-7}$. The Power method algorithm of Brin & Page for computing PageRank is given below.

Figure 3: PageRank Power method algorithm

The PageRank Power method was run on the following datasets, collected from the Stanford Large Network Dataset Collection website, which contains various types of datasets crawled from social networks, road networks, autonomous system graphs, etc. [12]:

Table 1: Description of datasets

Dataset      Number of nodes   Dangling nodes
Dataset 1    8846              4996
Dataset 2    22687             16466
Dataset 3    36692             0

The results obtained by running the above algorithm on these datasets are analysed on the basis of two factors, i.e. the number of iterations and the time taken to converge. Figures 4 and 5 show that as the damping factor increases, the number of iterations and the time taken to converge also increase. For α ≤ 0.5 the number of iterations and the convergence time change only slightly, while for 0.5 ≤ α ≤ 0.9 they increase sharply.

Figure 4: Number of iterations taken by the Power method to converge
Figure 5: Time taken by the Power method to converge

[3.2] GAUSS-SEIDEL PAGERANK ALGORITHM

The Gauss-Seidel method is also an iterative method for solving a linear system of equations. It solves the equations one at a time in sequence and uses each newly computed result within the current iteration as soon as it is available. The only difference between the two methods is this: in the Power method, the ranks obtained in the k-th iteration are used only in the (k+1)-th iteration, whereas in the Gauss-Seidel method, the rank of a page computed in the (k+1)-th iteration is used immediately, within the same iteration, for the pages computed after it. The following equation is used to compute PageRank with the Gauss-Seidel method [4]:

$$P^{(k+1)}(i) = \frac{1-\alpha}{n} + \alpha \left( \sum_{j \in B(i),\; j < i} \frac{P^{(k+1)}(j)}{O_j} + \sum_{j \in B(i),\; j \ge i} \frac{P^{(k)}(j)}{O_j} \right)$$
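To make the difference between the two update rules concrete, consider a hypothetical six-page graph assumed here for illustration only (not necessarily Figure 1): page 4 is linked from page 5 (out-degree 2) and page 6 (out-degree 1), and page 2 is dangling, so it is treated as linking to all n = 6 pages with $O_2 = 6$. With pages updated in the order 1, 2, ..., 6, the two methods update page 4 as follows.

```latex
\begin{aligned}
\text{Power method:}\quad P^{(k+1)}(4) &= \frac{1-\alpha}{6}
  + \alpha\left(\frac{P^{(k)}(2)}{6} + \frac{P^{(k)}(5)}{2} + \frac{P^{(k)}(6)}{1}\right) \\
\text{Gauss-Seidel:}\quad P^{(k+1)}(4) &= \frac{1-\alpha}{6}
  + \alpha\left(\frac{P^{(k+1)}(2)}{6} + \frac{P^{(k)}(5)}{2} + \frac{P^{(k)}(6)}{1}\right)
\end{aligned}
```

Only the term for page 2 differs: since 2 < 4, Gauss-Seidel already holds the iteration-(k+1) value of page 2 when it reaches page 4, while pages 5 and 6 have not yet been updated in this sweep.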

We initialize the rank score of every web page in the PageRank vector to 1/n. The Gauss-Seidel method starts from this initial PageRank vector, computes the rank values of the web pages iteratively with the above formula, and uses previously computed results as soon as they become available within an iteration. [Figure 6] presents the algorithm for computing the PageRank vector with the Gauss-Seidel method proposed by Arasu et al. [4], using the Hash-map data structure.

Figure 6: PageRank computation using the Gauss-Seidel algorithm

The following results were observed after computing PageRank with the Gauss-Seidel method on the datasets. [Figure 7] and [Figure 8] show that the number of iterations and the time taken by the Gauss-Seidel method to converge increase as the damping factor increases.

Figure 7: Number of iterations taken by the Gauss-Seidel method to converge
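Both update rules can be sketched side by side in Java. This is an illustrative sketch, not the code of Figures 3 and 6, and the class name PageRankSolvers is hypothetical. Assumptions not stated in the paper: pages are 0-based and updated in index order, the stopping test uses the L1 norm of the change between successive iterates, and the six-page edge list in main is made up for demonstration. Dangling pages follow Section 2: they get out-degree n and share their rank equally with all pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: Power method vs. Gauss-Seidel PageRank (0-based page ids). */
class PageRankSolvers {

    /** PageRank vector plus the number of iterations needed to converge. */
    static final class Result {
        final double[] rank;
        final int iterations;

        Result(double[] rank, int iterations) {
            this.rank = rank;
            this.iterations = iterations;
        }
    }

    /** Reverse links, out-degrees (n for dangling pages) and dangling flags. */
    private static final class Graph {
        final List<List<Integer>> inLinks = new ArrayList<>();
        final int[] outDeg;
        final boolean[] dangling;

        Graph(int n, int[][] edges) {
            outDeg = new int[n];
            dangling = new boolean[n];
            for (int i = 0; i < n; i++) inLinks.add(new ArrayList<>());
            for (int[] e : edges) {          // e = {from, to}
                inLinks.get(e[1]).add(e[0]);
                outDeg[e[0]]++;
            }
            for (int i = 0; i < n; i++) {
                if (outDeg[i] == 0) {
                    dangling[i] = true;
                    outDeg[i] = n;           // dangling page shares its rank with all n pages
                }
            }
        }

        /** Sum of P(d)/O_d over dangling pages d: what every page receives from them. */
        double danglingShare(double[] p) {
            double s = 0;
            for (int d = 0; d < p.length; d++) if (dangling[d]) s += p[d] / outDeg[d];
            return s;
        }
    }

    static Result power(int n, int[][] edges, double alpha, double eps, int maxIter) {
        Graph g = new Graph(n, edges);
        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);                                    // initial rank 1/n
        for (int k = 1; k <= maxIter; k++) {
            double share = g.danglingShare(p);
            double[] next = new double[n];
            double diff = 0;
            for (int i = 0; i < n; i++) {
                double s = share;
                for (int j : g.inLinks.get(i)) s += p[j] / g.outDeg[j]; // iteration-k values only
                next[i] = (1 - alpha) / n + alpha * s;
                diff += Math.abs(next[i] - p[i]);
            }
            p = next;
            if (diff < eps) return new Result(p, k);                // L1 change below tolerance
        }
        return new Result(p, maxIter);
    }

    static Result gaussSeidel(int n, int[][] edges, double alpha, double eps, int maxIter) {
        Graph g = new Graph(n, edges);
        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);
        double share = g.danglingShare(p);
        for (int k = 1; k <= maxIter; k++) {
            double diff = 0;
            for (int i = 0; i < n; i++) {
                double s = share;
                for (int j : g.inLinks.get(i)) s += p[j] / g.outDeg[j]; // p[j] is already new if j < i
                double updated = (1 - alpha) / n + alpha * s;
                diff += Math.abs(updated - p[i]);
                if (g.dangling[i]) share += (updated - p[i]) / g.outDeg[i]; // keep share current
                p[i] = updated;                                     // overwrite in place
            }
            if (diff < eps) return new Result(p, k);
        }
        return new Result(p, maxIter);
    }

    public static void main(String[] args) {
        // Hypothetical six-page graph, edges {from, to}; page 1 is dangling.
        int[][] edges = {{0, 1}, {0, 2}, {2, 0}, {2, 1}, {2, 4}, {3, 4}, {3, 5}, {4, 3}, {4, 5}, {5, 3}};
        Result pm = power(6, edges, 0.85, 1e-7, 1000);
        Result gs = gaussSeidel(6, edges, 0.85, 1e-7, 1000);
        System.out.println("Power method: " + pm.iterations + " iterations " + Arrays.toString(pm.rank));
        System.out.println("Gauss-Seidel: " + gs.iterations + " iterations " + Arrays.toString(gs.rank));
    }
}
```

On a graph this small the printed iteration counts say nothing about the 40%-45% saving reported for the SNAP datasets. The point of the sketch is the one-line difference between the methods: power writes into a fresh array, whereas gaussSeidel overwrites p[i] in place, so pages later in the same sweep see the new value. Both converge to the same fixed point.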

Figure 8: Time taken by the Gauss-Seidel method to converge

[4] OBSERVATIONS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

We implemented both methods in Java and used the Guava library to implement the Hash-map and ImmutableMultimap data structures. All experiments were run on a single Linux machine (Ubuntu 14.04 LTS) with an Intel Core i5 3.2 GHz CPU. [Figure 9] shows that for datasets 1, 2 and 3 there is only a minute difference between the number of iterations taken by the Gauss-Seidel method and by the Power method for damping factor α in the range [0.1, 0.6], but a large gap appears for α in [0.6, 0.9]. [Figure 9] also shows that for α = 0.85 the Gauss-Seidel method takes about 40% to 45% fewer iterations than the Power method to converge.
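A rough estimate, offered as an illustration rather than a result of this paper, suggests why iteration counts rise so steeply as α approaches 0.9. Assuming the Power method's L1 error is multiplied by at most α per iteration (the same α that bounds the second eigenvalue of the Google matrix studied in [9]), reaching a tolerance ε takes on the order of ln ε / ln α iterations. With ε = 10^-7:

```latex
k \approx \frac{\ln \varepsilon}{\ln \alpha}, \qquad \varepsilon = 10^{-7}:
\quad \alpha = 0.5 \Rightarrow k \approx 23,\quad
\alpha = 0.6 \Rightarrow k \approx 32,\quad
\alpha = 0.85 \Rightarrow k \approx 99,\quad
\alpha = 0.9 \Rightarrow k \approx 153
```

These are rule-of-thumb values under the stated assumption, not the measured counts of Figures 4 and 9; they only show the same qualitative trend of slow growth for small α and rapid growth near 0.9.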

Figure 9 (a, b, c): Comparison of the Gauss-Seidel method and the Power method on the three datasets, with tolerance 10^-7

[5] CONCLUSION

Web search engines use several ranking algorithms to determine the ordering of web pages, and the PageRank method is one of the most widely used. Computing PageRank scores calls for a data structure that takes little storage and allows fast access. We observed that for large web graphs the hyperlink matrix takes more storage and access time than the Hash-map data structure. Our experiments on different datasets show that as the web graph grows, the Power method takes more iterations to compute PageRank scores than the Gauss-Seidel method. We conclude that for large web graphs the Gauss-Seidel method is preferable to the Power method for computing PageRank.

REFERENCES

[1] S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Proceedings of the Seventh International World Wide Web Conference, 1998, pp. 107-117.
[2] P. Boldi and S. Vigna, "The WebGraph Framework I: Compression Techniques," Proceedings of the 13th International Conference on World Wide Web, ACM, 2004.
[3] P. Berkhin, "A Survey on PageRank Computing," Internet Mathematics, vol. 2, no. 1, 2005, pp. 73-120.
[4] A. Arasu et al., "PageRank Computation and the Structure of the Web: Experiments and Algorithms," Proceedings of the Eleventh International World Wide Web Conference, Poster Track, 2002.
[5] L. Pretto, "A Theoretical Analysis of Google's PageRank," in A. H. F. Laender and A. L. Oliveira (eds.), SPIRE 2002, LNCS vol. 2476, Springer, Heidelberg, 2002, pp. 131-144.
[6] Y.-Y. Chen, Q. Gan, and T. Suel, "I/O-Efficient Techniques for Computing PageRank," Proceedings of the Eleventh International Conference on Information and Knowledge Management, ACM, 2002.

[7] A. K. Srivastava et al., International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), www.iasir.net, algorithms 3.7: 14.
[8] T. Haveliwala, "Efficient Computation of PageRank," 1999.
[9] T. Haveliwala and S. Kamvar, "The Second Eigenvalue of the Google Matrix," Stanford University Technical Report, 2003.
[10] S. Kamvar, T. Haveliwala, and G. Golub, "Adaptive Methods for the Computation of PageRank," Linear Algebra and its Applications, vol. 386, 2004, pp. 51-65.
[11] S. Kamvar et al., "Exploiting the Block Structure of the Web for Computing PageRank," Stanford University Technical Report, 2003.
[12] J. Leskovec and A. Krevl, Stanford Large Network Dataset Collection, http://snap.stanford.edu/data, June 2014.
[13] A. N. Langville and C. D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, Princeton, 2006.
[14] http://code.google.com/p/guava-libraries/