On Finding Power Method in Spreading Activation Search

Similar documents
Caching Spreading Activation Search

Link Analysis and Web Search

An Approach to Interactive Social Network Geo-Mapping

PageRank and related algorithms

Lecture 17 November 7

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

COMP 4601 Hubs and Authorities

Ranking web pages using machine learning approaches

Information Networks: PageRank

Information Retrieval. Lecture 11 - Link analysis

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

PageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking

Part 1: Link Analysis & Page Rank

Link Analysis. Hongning Wang

COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

Personalized Navigation in the Semantic Web

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

Introduction to Data Mining

Lecture 8: Linkage algorithms and web search

Web Structure Mining using Link Analysis Algorithms

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Full Text Search Engine as Scalable k-nearest Neighbor Recommendation System

Topology Generation for Web Communities Modeling

Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity

Survey on Web Structure Mining

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

COMP5331: Knowledge Discovery and Data Mining

Collaborative filtering based on a random walk model on a graph

An Improved Computation of the PageRank Algorithm 1

Big Data Analytics CSCI 4030

Recent Researches on Web Page Ranking

Personalized Faceted Navigation in the Semantic Web

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub

Searching the Web [Arasu 01]

Slides based on those in:

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS6200 Information Retreival. The WebGraph. July 13, 2015

Word Disambiguation in Web Search

Modeling Systems Using Design Patterns

CS425: Algorithms for Web Scale Data

Einführung in Web und Data Science Community Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme

Improving Adaptive Hypermedia by Adding Semantics

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

How to organize the Web?

Generalized Social Networks. Social Networks and Ranking. How use links to improve information search? Hypertext

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

A ew Algorithm for Community Identification in Linked Data

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)

CS224W Final Report Emergence of Global Status Hierarchy in Social Networks

Bruno Martins. 1 st Semester 2012/2013

Big Data Analytics CSCI 4030

The application of Randomized HITS algorithm in the fund trading network

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

COMP Page Rank

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

Ranking of nodes of networks taking into account the power function of its weight of connections

An Adaptive Approach in Web Search Algorithm

Weighted Page Rank Algorithm based on In-Out Weight of Webpages

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)

Adaptive methods for the computation of PageRank

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods

A P2P-based Incremental Web Ranking Algorithm

Experimental Node Failure Analysis in WSNs

Similarities in Source Codes

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Deep Web Crawling and Mining for Building Advanced Search Application

PageRank. CS16: Introduction to Data Structures & Algorithms Spring 2018

The PageRank Citation Ranking

Information Dissemination in Socially Aware Networks Under the Linear Threshold Model

Link Analysis SEEM5680. Taken from Introduction to Information Retrieval by C. Manning, P. Raghavan, and H. Schutze, Cambridge University Press.

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 21 Link analysis

Popularity Weighted Ranking for Academic Digital Libraries

Personalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Advanced Computer Architecture: A Google Search Engine

Large-Scale Networks. PageRank. Dr Vincent Gramoli Lecturer School of Information Technologies

Fast Iterative Solvers for Markov Chains, with Application to Google's PageRank. Hans De Sterck

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

ECS 289 / MAE 298, Lecture 9 April 29, Web search and decentralized search on small-worlds

A brief history of Google

Social Network Analysis

Network Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017

Mining Web Data. Lijun Zhang

Introduction to Information Retrieval

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

A New Technique for Ranking Web Pages and Adwords

Transcription:

On Finding Power Method in Spreading Activation Search Ján Suchal Slovak University of Technology Faculty of Informatics and Information Technologies Institute of Informatics and Software Engineering Ilkovičova 3, 842 16 Bratislava, Slovakia suchal@fiit.stuba.sk Abstract. Graph ranking algorithms such as PageRank and HITS share the common idea of calculating eigenvectors of graph adjacency matrices. This paper shows that the power method usually used to calculate eigenvectors is also present in a spreading activation search. In addition, we empirically show that spreading activation algorithm is able to converge on periodic graphs, where power method fails. Furthermore, an extension to graph ranking calculation scheme is proposed unifying calculation of PageRank, HITS, NodeRanking and spreading activation search. 1 Introduction Rapidly growing information volume in world-wide web is forcing search engines to find new ways of computing better results. Recent works show that the usage of additional metadata, such as the link structure and usage data of the web [1], or even topics [2] can be used to improve search and recommendation results. While some works focus on exploring the colorful spectra of possible metadata features the core idea of the most popular and well known PageRank and HITS algorithms usually used as a basis still remains the same in calculating eigenvectors of graph adjacency matrices. These computed eigenvectors usually represent an additional relevance measure of vertices in graph and are then used to alter (and hopefully improve) ordering in search and/or recommendation results. Such recommendations have shown usefulness in many domains, including web search [3], job offers [4], publications, or even e-learning. After a short description of an iterative process called power method that can be used to calculate matrix eigenvectors in Section 2, we concentrate on an slightly different approach to graph ranking, which is based on a theoretical model of human semantic memory known as spreading activation search (Section 3). In Section 4 we show that recursive procedure of spreading activation search can be reformulated using matrix algebra resulting into a minor variation of classic power method. Section 5 empirically shows that the power method and spreading activation search can converge to identical results. Section 6 revisits PageRank, HITS and NodeRanking algorithms unifying them in a proposed scheme extension for iterative graph ranking calculation.

2 The Power Method Power method is an iterative procedure wich can be used to calculate matrix eigenvectors. If A is a matrix 1, eigenvector r can be calculated simply by applying matrix multiplication in an infinite loop 2 starting from a randomly chosen vector r 0. r k = r k 1 A (1) 3 Spreading Activation Search lim r k = r = r A (2) k Spreading activation search [5] is based on a simple recursive procedure that distributes energy on vertices through graph edges. Similarity of graph vertices is then expressed by overall energy amounts accumulated on vertices that resulted from this energy distribution process. This energy distribution process usually starts by energizing a starting vertex of graph with a given amount of energy and can be formulated as follows. Whenever vertex v is activated by energy e, energy e is added to overall activation vector c, where component c v = c v + e. Subsequently all vertices directly connected to vertex v are activated by energy e = e/ρ(v), where ρ(v) is the degree of vertex v. For convergence reasons vertex activation energy e must also satisfy e > θ, where θ is a given small energy threshold value. Algorithm 1 describes this process using a recursive function. Algorithm 1 Spread-activation(v, e, c 0) Require: Starting vertex v. Require: Activation energy e > 0. Require: Energy c accumulated on graph vertices. 1: c v c v + e 2: e e/vertex-degree(v) 3: if e > θ then 4: for all vertices t such as, there exists an edge from v to t do 5: c Spread-activation(t, e, c) 6: end for 7: end if 8: return c 1 This matrix must satisfy some additional constraints such as being stochastic, irreducible and aperiodic. We explain and discuss these properties later. 2 In an actual implementation an infinite loops would of course take infinite time to calculate, thus a small convergence threshold is usually used to stop computation resulting into an acceptable eigenvector approximation.

4 Rewriting Spreading Activation Search Let us start with a simple adjacency matrix B of some graph G defined as { 1 if G contains an edge from vertex i to j B ij = 0 otherwise (3) Stochastic matrix A can be created from B by dividing each matrix element A ij by corresponding vertex degree ρ(i). In matrix algebra this is equivalent to matrix row normalization, where k is the total number of vertices. A ij = B ij ρ(i) = B ij (4) B ik Such stochastic matrix can be now easily exploited to calculate one energy distribution step of spreading activation search. Simple matrix multiplication of a starting vector r 0 by matrix A results into a new vector r 1. This vector represents energy on each vertex of graph which has been distributed by neighboring edges, while r 0 consists of starting activation energies for each vertex of graph G. By applying these matrix multiplications in an iterative fashion, successive energy distributions steps can be calculated. k r k = r k 1 A (5) In spreading activation search, final vertex ranks are obtained by accumulating energies, which were propagated through neighboring edges. Since we have shown that r 0, r 1,..., r correspond to these energy propagation steps, final ranks can be easily calculated by summing up all propagated energies. After k steps final ranks c of graph vertices are computed as follows. c k = k r i (6) With a silent assumption of discarding threshold a comparison of equations 1 and 5 reveals that power method (in its purest form) is also present in spreading activation search. Thresholding can be included back into this calculation by adding a thresholding function τ θ with threshold θ at each energy propagation step. Where τ θ is defined as i=0 r k = τ θ (r k 1 A) (7) (τ θ (r)) i = { ri if r i > θ 0 otherwise (8)

5 Comparison of Rank Computation The main difference between power method and spreading activation search lies in final rank calculation. While ranks computed by power method use only the last step (an eigenvector approximation), spreading activation search ranks sum up the history of the whole process of finding an eigenvector by power method. Figure 1 shows an example of eigenvector approximation using the power method (a) and a calculation of ranks by spreading activation search (b) on a small graph (4 vertices, 6 edges). Normalized values of ranks for both methods clearly converge to the same result of normalized eigenvector approximation. 5 10 15 20 (a) Power method. 5 10 15 20 (b) Spreading activation search. Fig. 1: Convergence of ranks computation on a small graph (4 vertices, 6 edges). Figure 2 shows another example of rank calculation on a small periodic graph (5 vertices, 12 edges). By definition power method fails to converge on periodic graphs and starts to oscillate (a). However, spreading activation search successfully converges even on such periodic graph (b). 0 5 10 15 20 25 (a) Power method. 0 20 40 60 80 100 (b) Spreading activation search. Fig. 2: Comparison of ranks computation on a periodic graph (5 vertices, 12 edges). A large adjacency matrix constructed from CiteSeer citation dataset consisting of approximately 3300 vertices has been used for further experimentation. This matrix has been transformed into PageRank form with damping factor

0.85 to ensure convergence of power method. Figure 5 shows the convergence of mean vertex rank difference of spreading activation calculation and power method calculation. Mean vertex rank difference is calculated as mean vertex rank difference = N i=1 a i b i N (9) where N is number of vertices. Mean rank difference 0.000 0.010 0.020 0.030 0 10 20 30 40 50 Fig. 3: Convergence of mean vertex rank difference of spreading activation search and power method calculation on CiteSeer dataset. 6 Revisiting PageRank, HITS and NodeRanking The process of calculating ranks using spreading activation search starts by initializing a starting vector r 0 which represents starting activation energies on vertices, then the power method is used to calculate energy distribution steps which are finally summed up resulting into final ranks. Table 1 shows the power method, PageRank [3], HITS [6], NodeRanking [7] and spreading activation search ranking algorithms rewritten into this initialization, iteration and ranking scheme.

Table 1: Graph ranking algorithms comparison. Algorithm Initialization Ranking Power method r 0 r k = r k 1 A r k PageRank 3 M {}}{ (1 d) E n + da r 0 r k = r k 1M r k HITS 4 a 0, h 0 a k = a k 1 A T A h k = h k 1 AA T NodeRanking 5 Spreading activation with threshold θ M {}}{ J E + (E J)A n J ii = 1 σ(i)+1 r 0 r k = r k 1 M r 0 r k = τ θ (r k 1 A) a k, h k r k k i=0 r i 7 Future Work We have shown that the power method used to calculate eigenvectors is also present in spreading activation search. Since fast eigenvector calculation is still a demanding and open problem we focus our further research to exploitation of various spreading activation search speedup techniques such as constrained spreading activation [8] or caching spreading activation [9] which can be used in distributed enviroment. By reusing these techniques we believe to find an alternative method for eigenvector calculation, however convergence characteristics for large graphs are still being explored. 8 Conclusion By rewriting spreading activation search using matrix algebra we have shown that this graph ranking approach contains a minor variation of power method which is used to calculate eigenvectors of matrices. We have unified the notation of PageRank, HITS, NodeRanking and spreading activation search algorithms in a simple scheme consisting of three steps (initialization, iteration and ranking) demonstrating a significant similarity between them. We have empirically shown 3 d is damping factor (probability of jumping to a random vertex), E is a matrix of all ones, n is number of all vertices in graph. 4 a is authority rank, h is hubness rank. 5 J is a probability matrix of a random jump from a vertex.

that normalized ranks calculated by power method and spreading activation search can converge to the same result, thus hopefully opening a new possibility to calculate eigenvectors by exploiting speedup techniques originally designed for spreading activation search. Acknowledgment. This work was partially supported by the Slovak Research and Development Agency under the contract No. APVT-20-007104, the State programme of research and development Establishing of Information Society under the contract No. 1025/04 and the Cultural and Educational Grant Agency of the Slovak Republic, grant No. KEGA 3/5187/07. References 1. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data- Centric Systems and Applications). Springer-Verlag New York, Inc., Secaucus, NJ, USA (2006) 2. Wu, B., Goel, V., Davison, B.D.: Topical trustrank: using topicality to combat web spam. In: WWW 06: Proceedings of the 15th international conference on World Wide Web, New York, NY, USA, ACM Press (2006) 63 72 3. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. In: Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia (1998) 161 172 4. Návrat, P., Bieliková, M., Rozinajová, V.: Acquiring, organising and presenting information and knowledge from the web. Communication and Cognition 40(1 2) (2007) 37 44 5. Maciej Ceglowski, A.C., Cuadrado, J.: Semantic search of unstructured data using contextual network graphs (2003) 6. Kleinberg, J.M.: Authoritative Sources in a Hyperlinked Environment. Journal of the ACM 46(5) (1999) 604 632 7. Pujol, J.M., S.R., Delgado, J.: A Ranking Algorithm Based on Graph Topology to Generate Reputation or Relevance. In: Web Intelligence. Springer Verlag (2003) 8. Crestani, F., Lee, P.L.: WebSCSA: Web search by constrained spreading activation. In: ADL 99: Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, IEEE Computer Society (1999) 163 9. Suchal, J.: Caching Spreading Activation Search. In Bieliková, M., ed.: IIT.SRC 2007: Student Research Conference, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava (April 2007) 151 155