An application of Markov Chains: PageRank. Finding relevant information on the Web


1 An application of Markov Chains: PageRank Finding relevant information on the Web

2 Please Participate http://

3

4 How much do you know about PageRank? 1) Nothing. 2) I heard about it, but don't know what it is. 3) I learned about it but I forgot. 4) I know it well.

5 Key Question: find relevant information in this large-scale distributed system. Best Answer (thus far): PageRank. Best Part: you know the tools to appreciate it!

6 Map of Internet 2003 http:// web: 1 trillion pages - 1 trillion links brain: 0.1 trillion neurons trillion links

7 TU Delft: 8 ES group: 5

8 Question 1) Which page is the most relevant? This is a good university This is a great university This is a university Answers: 1) page A 2) page B 3) page C 4) All

9 Question 2) Which page is the most relevant, now? This is a good university This is a great university This is a university Answers: 1) page A 2) page B 3) page C 4) All

10 The Story. Problem: how to find relevant information? PageRank Algorithm: Two CS PhD students. [Chart: global search share (%), Google vs. Yahoo!] 1) Markov insight 2) Before PageRank 3) PageRank 4) After PageRank 5) Bracelet experiment

11 1) The Markov Insight: on probability, free will and poetry. 1713: Independent events [Bernoulli]. Gambling: coin flipping and dice rolling; steady state: weak law of large numbers. 1902: Free will [Nekrasov]: theory used to prove free will (over predestination). 1906: Dependent events [Markov]. Markov didn't like this abuse of mathematics; steady state under some conditions; used a poem as test case. p(vowel) = 0.43. Independent: p(vowel, vowel) = 0.43^2 = 0.19. Dependent: p(vowel, vowel) = 0.06. Nice article: http:// links-in-the-markov-chain/

12 Example: Localization. Your position at t+1 depends on where you were at t. You don't teleport! There are many, many applications: stock market, social sciences, biology, genetics, voice recognition, etc.

13

14 So what? What does the Markov insight have to do with PageRank? Everything. Your Turn!

15 2) Life Before PageRank

16 Imperfect Solution 1: Word Frequency - problem. This is a great University. Our University provides the best education. No other University matches our strength. Our University has a beautiful campus.

17 Imperfect Solution 1: Word Frequency - outcome. President of Stanford did his own search

18 But the WWW is a graph. Wouldn't it make sense to look at the graph structure? But which characteristics of the graph are the ones that matter?

19 Question 3) A graph consists of nodes connected via links: incoming and outgoing links. We will see that both play a major role in PageRank, but for now use your intuition to identify the most relevant source. [Figure: graph with nodes A, B, C, D, E, F] Answers: 1) page A 2) page B 3) page D 4) page F

20 Imperfect Solution 2: Incoming Links - metric. Which orange node is more relevant? Example: research papers. centrality = degree(v)/|E|. Problem of citations.

21 Imperfect Solution 2: Incoming Links - problem. Reputation of the source matters. centrality = degree(v)/|E|; reputation = Σ r(j), where j are neighbors

22 Outgoing Links. Incoming links matter. Reputation of sources matters. What about outgoing links?

23 Outgoing Links: set up

24 Outgoing Links: Problem. Not only does the reputation of the source matter, but also whether we are special to that source. centrality = degree(v)/|E|; reputation = Σ r(j); r(i) = Σ ( r(j) / N(j) )
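
A minimal sketch of one pass of the update r(i) = Σ ( r(j) / N(j) ), assuming N(j) is the number of outgoing links of page j (the slide does not spell this out). The graph and the names `update_reputation` and `out_links` are hypothetical, chosen only for illustration.

```python
def update_reputation(rank, out_links):
    """Apply r(i) = sum over in-neighbors j of r(j) / N(j) once."""
    new_rank = {node: 0.0 for node in rank}
    for j, targets in out_links.items():
        share = rank[j] / len(targets)  # page j splits its reputation evenly
        for i in targets:
            new_rank[i] += share
    return new_rank


# Hypothetical three-page graph (not the one drawn on the slide).
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {node: 1 / 3 for node in out_links}  # everyone starts equal

rank_after_one_step = update_reputation(rank, out_links)
print(rank_after_one_step)
```

In this toy graph page C ends the step with the largest share, because it receives all of B's reputation plus half of A's.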

25 Question 4) Recall: incoming links matter, outgoing links matter, reputation matters. Assume that initially all nodes have the same reputation. Which node is the most relevant? [Figure: graph with nodes A, B, C, D, E, F] Answers: 1) page A 2) page B 3) page D 4) page F

26 Markov Chains to the rescue! Incoming links matter. Reputation matters. Outgoing links matter. r(i) = Σ ( r(j) / N(j) ). Still, how do we get reputations?

27 Reputations look like a Markov Chain. Remember the water filling slide?

28 Question 5) What is the node with the highest page rank? And what is its rank? Answers: 1) page A 2) page B 3) page C 4) pages B and C Answer format: (page X, rank Y);(page Z, rank R); Got stuck? State where! Calculator: http://wims.unice.fr/wims/en_tool~linear~linsolver.en.html

29 Solution 5) Linear System: a = c/2, b = a + c/2, c = b, a + b + c = 1. Solution: a=0.2, b=0.4, c=0.4
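
The same answer can be reached by repeatedly applying the reputation update instead of solving the linear system by hand. The sketch below assumes the link structure implied by the equations (A links to B, B links to C, C links to A and B), since the slide's figure is not part of this transcript; the function name `steady_state` is hypothetical.

```python
def steady_state(out_links, iterations=200):
    """Iterate r(i) = sum_j r(j) / N(j) from a uniform start."""
    rank = {node: 1.0 / len(out_links) for node in out_links}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in out_links}
        for j, targets in out_links.items():
            for i in targets:
                new_rank[i] += rank[j] / len(targets)
        rank = new_rank
    return rank


# Links inferred from a = c/2, b = a + c/2, c = b (an assumption).
out_links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
rank = steady_state(out_links)
print({node: round(value, 3) for node, value in rank.items()})
```

Under these assumptions the iteration settles close to the slide's a=0.2, b=0.4, c=0.4.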

30 What are the problems with this Naïve Page Rank? You are given the web graph and are asked to run the Markov Chain you know. What problems do you foresee occurring? Your turn!

31 Irreducibility. Absorption states. Graph partitions. Steady state (Rank) depends on inputs. How do we solve it? [Figure: graph with nodes A to H]
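
A small sketch of the "steady state depends on inputs" problem, assuming a hypothetical partitioned graph: two disconnected copies of the three-page graph implied by slide 29's equations. It is not the A to H graph drawn on the slide; the names `run`, `start_left` and `start_right` are illustrative only.

```python
def run(out_links, rank, iterations=200):
    """Iterate r(i) = sum_j r(j) / N(j) from the given starting ranks."""
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in out_links}
        for j, targets in out_links.items():
            for i in targets:
                new_rank[i] += rank[j] / len(targets)
        rank = new_rank
    return rank


# Two partitions with no links between them (hypothetical example).
out_links = {
    "A": ["B"], "B": ["C"], "C": ["A", "B"],
    "D": ["E"], "E": ["F"], "F": ["D", "E"],
}

# Same chain, two different starting distributions.
start_left = {n: (1 / 3 if n in "ABC" else 0.0) for n in out_links}
start_right = {n: (1 / 3 if n in "DEF" else 0.0) for n in out_links}

left = run(out_links, start_left)
right = run(out_links, start_right)
print(left)
print(right)
```

Because no link crosses between the partitions, reputation that starts on one side never reaches the other, so the two runs end in different rankings.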

32 The Story. Theoretically there were other seminal studies: Authoritative Sources in a Hyperlinked Environment, Jon M. Kleinberg. [Chart: global search share (%), Google vs. Yahoo!] 1) Markov insight 2) Before PageRank 3) PageRank 4) After PageRank

33 Damping factor. Create virtual connections to all other nodes. Weight of new connections (red) is lower than original ones (blue). [Figure: graph with nodes A to H]

34 Question 6) What is the effect of adding links among all nodes on the final steady state (rank)? 1) The reputations will be more spread out (capitalism). 2) The reputations will be more similar (socialism). 3) The reputations will remain the same (no effect). [Figure: graph with nodes A to H]

35 Recap. Incoming links matter. Reputation matters. Outgoing links matter. These graph properties can be combined with Markov Chains to obtain the ranks of pages (steady state). But blindly applying Markov Chains has problems. Is the WWW graph irreducible and aperiodic? The damping factor overcomes the irreducibility problem. Let's do an example

36 Damping factor [Figure: graph with nodes A to H]

37 Iterative Solution. The PageRank citation ranking: Bringing order to the web. L Page, S Brin, R Motwani, T Winograd, 1999

38 The PageRank or Google Matrix

39 Question 7) Assume a damping factor of p = 0.3. Find the nodes' ranks. Answer format: (page X, rank Y);(page Z, rank R); Matrix calculator: http://matrix.reshish.com/ http://wims.unice.fr/wims/en_tool~linear~linsolver.en.html

40 Solution matrix and rank: p = 0.3, n = 3, hence B = [0.3/3 = 0.1]; A = [matrix omitted], M = [matrix omitted]. Rank ≈ {a=0.23, b=0.39, c=0.37}; without damping factor: a=0.2, b=0.4, c=0.4. Why is this happening?
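
A hedged sketch of the damped computation for this three-page example. It assumes every entry of the damped matrix is M[i][j] = p/n + (1 - p) * A[i][j], which matches B = p/n on this slide, and it assumes the column-stochastic matrix A implied by slide 29's equations (order a, b, c). The names `google_matrix` and `power_iteration` are hypothetical.

```python
def google_matrix(A, p):
    """Blend the link matrix A with uniform jumps: p/n + (1 - p) * A."""
    n = len(A)
    return [[p / n + (1 - p) * A[i][j] for j in range(n)] for i in range(n)]


def power_iteration(M, iterations=200):
    """Repeatedly multiply a uniform start vector by M."""
    n = len(M)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [sum(M[i][j] * rank[j] for j in range(n)) for i in range(n)]
    return rank


# Column j holds the outgoing links of page j (assumed from slide 29):
# a links to b; b links to c; c links to a and b.
A = [
    [0.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

M = google_matrix(A, p=0.3)
rank = power_iteration(M)
print([round(value, 4) for value in rank])
```

Under these assumptions the result is close to the ranks quoted on the slide. Each rank now contains a fixed p/n term, which pulls the values toward the uniform 1/3 compared with the undamped 0.2, 0.4, 0.4.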

41 First a simple test: samplematrixmultiplication.m. Setup = 1, df = 0 & df = 0.15; Setup = 2, df = 0 & df = 0.15; Setup = 3, df = 0 & df = 0.15. What's the problem? (discussion board) [Panels: setup = 1, setup = 2, setup = 3]

42

43 Question 8) Matrix calculator: http://matrix.reshish.com/

44 p=0.4

45 Periodicity. Problem is that there is no steady state. How do you solve it? is guaranteed in practice for the Web. Topic-Sensitive PageRank, Taher H. Haveliwala, WWW Page 2 column 2
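
A minimal illustration of the periodicity problem, assuming a hypothetical two-page cycle (A links only to B, B links only to A) rather than the graph from the slides. Without damping the ranks keep swapping; with the damping value 0.15 used in the slide 41 test, every entry of the matrix becomes positive and the iteration settles. The helper name `iterate` is an assumption.

```python
def iterate(M, rank, steps):
    """Return the list of rank vectors produced by repeated multiplication."""
    n = len(M)
    history = [rank]
    for _ in range(steps):
        rank = [sum(M[i][j] * rank[j] for j in range(n)) for i in range(n)]
        history.append(rank)
    return history


# Hypothetical periodic chain: all reputation hops back and forth.
A = [
    [0.0, 1.0],
    [1.0, 0.0],
]
undamped = iterate(A, [1.0, 0.0], 4)
print(undamped)  # alternates between [1.0, 0.0] and [0.0, 1.0]

p = 0.15  # damping value taken from the slide 41 test
M = [[p / 2 + (1 - p) * A[i][j] for j in range(2)] for i in range(2)]
damped = iterate(M, [1.0, 0.0], 200)
print(damped[-1])
```

The undamped history never converges, while the damped run approaches equal ranks for both pages.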

46

47 Experimental data: matrices from experiments. 1) Check if matrix is column stochastic and irreducible 2) (If needed) Make matrix irreducible. Add self links 3) (If needed) Normalize matrix: column stochastic. 4) Calculate B matrix (all ones). Damping factor = 5) Obtain M matrix, M = p*B + (1-p)*A 6) Obtain the steady state 7) Enumerate the top-5 social butterflies.
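
The steps above can be sketched roughly as follows. Everything concrete here is an assumption: the contact matrix `W` is made up (W[i][j] is taken to be the number of contact periods participant j reports with participant i), the damping value p = 0.15 is a placeholder because the slide's value is missing, self links are added only to empty columns, the damped matrix uses p/n for every entry as on slides 40 and 52, and the irreducibility check in step 1 is not implemented.

```python
def is_column_stochastic(W, tol=1e-9):
    """Step 1 (partial): does every column sum to 1?"""
    n = len(W)
    return all(abs(sum(W[i][j] for i in range(n)) - 1.0) < tol for j in range(n))


def normalize_columns(W):
    """Steps 2 and 3: add a self link to empty columns, then normalize."""
    n = len(W)
    A = [[float(value) for value in row] for row in W]  # work on a copy
    for j in range(n):
        col_sum = sum(A[i][j] for i in range(n))
        if col_sum == 0:
            A[j][j] = 1.0  # self link for a participant with no contacts
            col_sum = 1.0
        for i in range(n):
            A[i][j] /= col_sum
    return A


def page_rank(W, p, iterations=200):
    """Steps 4 to 6: build M = p/n + (1 - p) * A and iterate."""
    n = len(W)
    A = W if is_column_stochastic(W) else normalize_columns(W)
    M = [[p / n + (1 - p) * A[i][j] for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [sum(M[i][j] * rank[j] for j in range(n)) for i in range(n)]
    return rank


def top_k(rank, k=5):
    """Step 7: indices of the k highest-ranked participants."""
    return sorted(range(len(rank)), key=lambda i: rank[i], reverse=True)[:k]


# Hypothetical contact counts for four anonymous participants.
W = [
    [0, 2, 1, 0],
    [1, 0, 1, 0],
    [3, 1, 0, 0],
    [0, 0, 2, 0],
]
rank = page_rank(W, p=0.15)  # p is a placeholder value
top = top_k(rank, k=3)
print(rank, top)
```

With the real experiment matrices, `W` would be replaced by the measured connectivity matrix and `k` set to 5.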

48 Data description. Proximity: RSS. Time of contact: Number of periods. Directed graphs (friendship is not mutual). Data depiction: graph_-50_0_lunch, graph_-50_50_lunch

49 Data is anonymous. lunch: 60 participants; drinks: 30 participants. Active IDs on each experiment: drinks, lunch. Connectivity Matrix

50 Recap. Markov insight, 1906. WWW, 1990s. First search engines based on word frequency. Properties of graph are important. PageRank, 1998: A Markov Chain with a damping factor. Personalized PageRank

51 Question 10) Compute the PageRank; assume a damping constant p = 0.2

52 Solution Matrix & Rank: p=0.2, n=8, B = [0.2/8 = 0.025], M. Rank = {0.725, 0.065, 0.085, 0.025, 0.025, 0.025, 0.025, 0.025}

53 Links. The PageRank citation ranking: Bringing order to the web. L Page, S Brin, R Motwani, T Winograd, 1999, Stanford University http://ilpubs.stanford.edu:8090/422/1/ pdf; Networked Markets. Ashish Goel, Stanford University http://web.stanford.edu/class/msande233/handouts/lectures6-7.pdf; The Mathematics of Web Search. Raluca Tanase and Remus Radu, Cornell University. http:// lecture3.html; PageRank Examples. Dell Zhang, University of London http://; Topic-Sensitive PageRank. Taher H. Haveliwala, Stanford University http://research.taherh.org/pubs/topic-sensitive-pagerank.pdf

54 or Subject QEES practicum


Informa/on Retrieval. Text Search. CISC437/637, Lecture #23 Ben CartereAe. Consider a database consis/ng of long textual informa/on fields Informa/on Retrieval CISC437/637, Lecture #23 Ben CartereAe Copyright Ben CartereAe 1 Text Search Consider a database consis/ng of long textual informa/on fields News ar/cles, patents, web pages, books,

More information

ques4ons? Midterm Projects, etc. Path- Based Sta4c Analysis Sta4c analysis we know Example 11/20/12

ques4ons? Midterm Projects, etc. Path- Based Sta4c Analysis Sta4c analysis we know Example 11/20/12 Midterm Grades and solu4ons are (and have been) on Moodle The midterm was hard[er than I thought] grades will be scaled I gave everyone a 10 bonus point (already included in your total) max: 98 mean: 71

More information

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 21 Link analysis

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 21 Link analysis Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 21 Link analysis Content Anchor text Link analysis for ranking Pagerank and variants HITS The Web as a Directed Graph Page A Anchor

More information

Annual Reviews: A Nonprofit Scien.fic Publisher. Bringing the Best Review Literature to the Worldwide Scien9fic Community for over 75 Years

Annual Reviews: A Nonprofit Scien.fic Publisher. Bringing the Best Review Literature to the Worldwide Scien9fic Community for over 75 Years Annual Reviews: A Nonprofit Scien.fic Publisher Bringing the Best Review Literature to the Worldwide Scien9fic Community for over 75 Years In this brief presenta9on, you will learn how to: 1) Navigate

More information

Lecture 8: Linkage algorithms and web search

Lecture 8: Linkage algorithms and web search Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk Lent

More information

Informa(on Retrieval

Informa(on Retrieval Introduc*on to Informa(on Retrieval Clustering Chris Manning, Pandu Nayak, and Prabhakar Raghavan Today s Topic: Clustering Document clustering Mo*va*ons Document representa*ons Success criteria Clustering

More information

COMP Page Rank

COMP Page Rank COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper

More information

F. Aiolli - Sistemi Informativi 2007/2008. Web Search before Google

F. Aiolli - Sistemi Informativi 2007/2008. Web Search before Google Web Search Engines 1 Web Search before Google Web Search Engines (WSEs) of the first generation (up to 1998) Identified relevance with topic-relateness Based on keywords inserted by web page creators (META

More information

B490 Mining the Big Data. 5. Models for Big Data

B490 Mining the Big Data. 5. Models for Big Data B490 Mining the Big Data 5. Models for Big Data Qin Zhang 1-1 2-1 MapReduce MapReduce The MapReduce model (Dean & Ghemawat 2004) Input Output Goal Map Shuffle Reduce Standard model in industry for massive

More information

Brief (non-technical) history

Brief (non-technical) history Web Data Management Part 2 Advanced Topics in Database Management (INFSCI 2711) Textbooks: Database System Concepts - 2010 Introduction to Information Retrieval - 2008 Vladimir Zadorozhny, DINS, SCI, University

More information

Ar#ficial Intelligence

Ar#ficial Intelligence Ar#ficial Intelligence Advanced Searching Prof Alexiei Dingli Gene#c Algorithms Charles Darwin Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for

More information

A two-stage strategy for solving the connection subgraph problem

A two-stage strategy for solving the connection subgraph problem Graduate Theses and Dissertations Graduate College 2012 A two-stage strategy for solving the connection subgraph problem Heyong Wang Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Pagerank Scoring. Imagine a browser doing a random walk on web pages:

Pagerank Scoring. Imagine a browser doing a random walk on web pages: Ranking Sec. 21.2 Pagerank Scoring Imagine a browser doing a random walk on web pages: Start at a random page At each step, go out of the current page along one of the links on that page, equiprobably

More information

An Introduction to Search Engines and Web Navigation

An Introduction to Search Engines and Web Navigation An Introduction to Search Engines and Web Navigation MARK LEVENE ADDISON-WESLEY Ал imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong

More information

An Improved PageRank Method based on Genetic Algorithm for Web Search

An Improved PageRank Method based on Genetic Algorithm for Web Search Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2983 2987 Advanced in Control Engineeringand Information Science An Improved PageRank Method based on Genetic Algorithm for Web

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,

More information

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey

More information

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Course Goals To help you to understand search engines, evaluate and compare them, and

More information

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Data mining --- mining graphs

Data mining --- mining graphs Data mining --- mining graphs University of South Florida Xiaoning Qian Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank

More information

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization

More information

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola Mathematical Methods and Computational Algorithms for Complex Networks Benard Abola Division of Applied Mathematics, Mälardalen University Department of Mathematics, Makerere University Second Network

More information

A project from the Social Media Research Founda8on: h:p://

A project from the Social Media Research Founda8on: h:p:// A project from the Social Media Research Founda8on: h:p://www.smrfounda8on.org About Me Introduc8ons Marc A. Smith Chief Social Scien8st Connected Ac8on Consul8ng Group Marc@connectedac8on.net h:p://www.connectedac8on.net

More information

CIP- OSN Online Social Networks as Graphs. Dr. Thanassis Tiropanis -

CIP- OSN Online Social Networks as Graphs. Dr. Thanassis Tiropanis - CIP- OSN Online Social Networks as Graphs Dr. Thanassis Tiropanis - tt2@ecs http://vimeo.com/58729247 http://socialnetworks.soton.ac.uk The narra9ve Web Evolu9on and Online Social Networks Web Evolu9on

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

CS 6140: Machine Learning Spring Final Exams. What we learned Final Exams 2/26/16

CS 6140: Machine Learning Spring Final Exams. What we learned Final Exams 2/26/16 Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Assignment

More information

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India Web Crawling Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India - 382 481. Abstract- A web crawler is a relatively simple automated program

More information