The Necessity of Mathematics
1 The Necessity of Mathematics: from Google to Counterterrorism to Sudoku
Amy Langville, Mathematics Department, College of Charleston, Charleston, SC
Work supported by NSF-CAREER-0566, NSA, DOEd, SAS, Semandex
AMS Congressional Meeting /6/006
2 The Message
- Mathematics is useful.
- Mathematical models don't care about the scale or size of a problem.
- Mathematical models are broadly applicable.
- Mathematical research is an inventive process.
3 Outline
- Sudoku
- Military Applications: planning flight paths; disabling and herding communication in networks
- Ranking Applications: ranking on the World Wide Web
- Clustering and Data Mining Applications: clustering the Enron dataset; clustering on terrorist networks
4 Overriding Mathematical Techniques
- Optimization: min/max an objective subject to constraints
- Matrix Analysis
- Graph Theory
5 Outline (with techniques used)
- Sudoku: optimization, matrices
- Military Applications: optimization, graphs (planning flight paths; disabling and herding communication in networks)
- Ranking Applications: matrices, graphs (ranking on the World Wide Web)
- Clustering and Data Mining Applications: optimization, matrices, graphs (clustering the Enron dataset; clustering on terrorist networks)
8 Sudoku: Sudoku puzzle and Sudoku matrix
Definition. An n × n matrix is called a Sudoku matrix if:
1. n is a perfect square (e.g., 4, 9, 16, 25),
2. every row uses the integers 1 through n exactly once,
3. every column uses the integers 1 through n exactly once,
4. every √n × √n block submatrix uses the integers 1 through n exactly once.
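The definition can be checked mechanically. The following is a minimal sketch, assuming the matrix is given as a Python list of lists and that "submatrix" means each √n × √n block; the function name `is_sudoku_matrix` and the 4 × 4 example are illustrative and do not come from the slides.

```python
import math

def is_sudoku_matrix(m):
    """Return True if the list-of-lists m meets the four conditions of the definition."""
    n = len(m)
    b = math.isqrt(n)  # block size, sqrt(n) when n is a perfect square
    if n == 0 or b * b != n:
        return False  # condition 1: n must be a perfect square
    if any(len(row) != n for row in m):
        return False  # the matrix must be n x n
    full = set(range(1, n + 1))
    # conditions 2 and 3: every row and every column is a permutation of 1..n
    rows_ok = all(set(row) == full for row in m)
    cols_ok = all({m[r][c] for r in range(n)} == full for c in range(n))
    # condition 4: every sqrt(n) x sqrt(n) block is a permutation of 1..n
    blocks_ok = all(
        {m[r][c] for r in range(br, br + b) for c in range(bc, bc + b)} == full
        for br in range(0, n, b)
        for bc in range(0, n, b)
    )
    return rows_ok and cols_ok and blocks_ok

# Hypothetical 4 x 4 example (not taken from the slides).
example = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(is_sudoku_matrix(example))
```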
10 Mathematical Model of Sudoku: Value of the Model
With a computer algorithm, we can solve any Sudoku puzzle, regardless of:
- size n
- number of givens
- level of difficulty
A 9 × 9 puzzle takes 6.7 seconds to solve on a desktop machine.
12 Unique Solution?
Most puzzle creators do not check whether their puzzle has one unique solution.
[Slide image: one puzzle shown with two different solutions]
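Uniqueness can be tested by counting completions. The sketch below uses plain backtracking, which is a swapped-in technique chosen for brevity: the talk describes an optimization model, and its formulation is not reproduced in this transcription. The name `count_solutions`, the `limit` parameter, and both 4 × 4 puzzles are assumptions made for illustration; empty cells are written as 0.

```python
import math

def count_solutions(grid, limit=2):
    """Count completions of grid (0 = empty), stopping early once `limit` are found."""
    n = len(grid)
    b = math.isqrt(n)
    g = [row[:] for row in grid]  # work on a copy

    def allowed(r, c, v):
        # v must not already appear in the row, the column, or the block
        if any(g[r][j] == v for j in range(n)):
            return False
        if any(g[i][c] == v for i in range(n)):
            return False
        br, bc = r - r % b, c - c % b
        return all(g[i][j] != v for i in range(br, br + b) for j in range(bc, bc + b))

    def search():
        for r in range(n):
            for c in range(n):
                if g[r][c] == 0:
                    total = 0
                    for v in range(1, n + 1):
                        if allowed(r, c, v):
                            g[r][c] = v
                            total += search()
                            g[r][c] = 0
                            if total >= limit:
                                break
                    return total
        return 1  # no empty cell left: one complete solution

    return min(search(), limit)

# Hypothetical puzzles: the first has one completion, the second has two.
unique_puzzle = [
    [0, 2, 3, 4],
    [3, 4, 0, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
ambiguous_puzzle = [
    [0, 2, 0, 4],
    [0, 4, 0, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(count_solutions(unique_puzzle), count_solutions(ambiguous_puzzle))
```

A puzzle is well posed exactly when the count is 1; the `limit` argument only avoids enumerating every completion once a second one has been found.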
13 Some Interesting 9 × 9 Sudoku Facts
- How many 9 × 9 matrices deserve the title of Sudoku matrix? 6,670,903,752,021,072,936,960.
- What is the fewest number of givens that must be provided to create a 9 × 9 puzzle with a unique solution? 17. Many distinct puzzles with 17 givens and a unique solution have been found (the slide's exact count survives only as "5,96" in this transcription). No unique-solution puzzle with 16 givens has been found yet.
- Given one Sudoku matrix, could I make my own Daily Sudoku Calendar? By using mathematical operations, 362,879 (994 years' worth of) Sudoku matrices can be created from one 9 × 9 Sudoku matrix.
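The slide does not say which mathematical operations are meant. One operation that always maps a Sudoku matrix to another Sudoku matrix is relabeling the digits with a permutation, and for a 9 × 9 matrix there are 9! − 1 = 362,879 non-identity relabelings. The sketch below assumes that interpretation and demonstrates it on a hypothetical 4 × 4 matrix, where 4! − 1 = 23 new matrices result; `relabel` is an illustrative name.

```python
import math
from itertools import permutations

def relabel(matrix, perm):
    """Replace every entry v by perm[v - 1]; rows, columns and blocks stay valid."""
    return [[perm[v - 1] for v in row] for row in matrix]

# Hypothetical 4 x 4 Sudoku matrix used as the starting point.
base = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

identity = (1, 2, 3, 4)
variants = [relabel(base, p) for p in permutations(identity) if p != identity]

# Number of non-identity relabelings for a 9 x 9 matrix.
count_9x9 = math.factorial(9) - 1
print(len(variants), count_9x9)
```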
14 Military Applications
16 Flight Path Planning (Lincoln Labs)
Map legend: no-fly zone, target, radar, enemy territory.
Objective: create a path that minimizes time over radars.
Constraints:
- plane must fly over the target
- plane must avoid no-fly zones
- plane cannot make unrealistic turns
- plane has a fixed amount of fuel
- etc.
19 Flight Path Planning: Discretization
20 Flight Path Planning: Connect the Dots
- plane must fly over the target
- plane must avoid no-fly zones
- plane has a fixed amount of fuel (total number of path segments ≤ D)
- plane cannot make unrealistic turns
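The connect-the-dots formulation can be sketched as a small dynamic program over grid points. This is an illustration under stated assumptions, not the model used in the talk: the grid, the radar costs, the four-neighbor moves, and the name `plan_path` are hypothetical, the fuel limit is read as "at most D segments", points may be revisited, and the turn constraint is left out to keep the sketch short.

```python
def plan_path(rows, cols, start, target, goal, no_fly, radar, max_segments):
    """Return (radar_cost, path) for the cheapest path that visits target, or None."""
    # State: (cell, target_seen). layer maps each state reachable with exactly
    # k segments to the cheapest (cost, path) found for it.
    layer = {(start, start == target): (radar.get(start, 0), [start])}
    best = None
    for k in range(max_segments + 1):
        if (goal, True) in layer:
            candidate = layer[(goal, True)]
            if best is None or candidate[0] < best[0]:
                best = candidate
        if k == max_segments:
            break  # fuel exhausted: no further segments allowed
        nxt = {}
        for ((r, c), seen), (cost, path) in layer.items():
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (r + dr, c + dc)
                if not (0 <= cell[0] < rows and 0 <= cell[1] < cols):
                    continue  # off the map
                if cell in no_fly:
                    continue  # no-fly zones are forbidden
                state = (cell, seen or cell == target)
                new_cost = cost + radar.get(cell, 0)
                if state not in nxt or new_cost < nxt[state][0]:
                    nxt[state] = (new_cost, path + [cell])
        layer = nxt
    return best

# Hypothetical 3 x 3 scenario: one radar cell with cost 5 and one no-fly cell.
scenario = dict(rows=3, cols=3, start=(0, 0), target=(0, 2), goal=(2, 2),
                no_fly={(1, 1)}, radar={(0, 1): 5})
for d in (3, 4, 8):
    print(d, plan_path(max_segments=d, **scenario))
```

In this toy scenario a tight segment limit makes the problem infeasible, a moderate limit forces the path over the radar, and a generous limit allows a longer path that avoids the radar entirely.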
22 Flight Path Results Distance limit: 50; Path Distance=.988; Cost=0; Total time (sec): 6.5
23 Flight Path Results Distance limit: 00; Path Distance=98.975; Cost=0; Total time (sec): 56.8
24 Flight Path Results Sorry, no feasible path for D=70; Total time (sec): 5.7
26 NSA Enemy Communication Networks
Legend: enable pairs, disable pairs.
Objective: minimize the cost associated with cutting links and nodes.
Constraints:
- enable communication between all enable pairs
- disable communication between all disable pairs
29 NSA Communication Networks: cutset
Legend: enable pairs, disable pairs, cutset.
Objective: minimize the cost associated with cutting links and nodes.
Constraints:
- enable communication between all enable pairs
- disable communication between all disable pairs
33 Multiple enable-disable pairs
Legend: enable pairs, disable pairs.
Objective: minimize the cost associated with cutting links and nodes.
Constraints:
- enable communication between all enable pairs
- disable communication between all disable pairs
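For a very small network the enable/disable problem can be solved by exhaustive search, which makes the objective and both constraint families concrete. The sketch below is an illustration only: it cuts links but not nodes, it tries every subset of links (exponential, so unsuitable for realistic networks), and the names `connected` and `cheapest_cut` and the four-node network are hypothetical. The talk's actual optimization formulation is not shown in this transcription.

```python
from itertools import combinations

def connected(nodes, edges, a, b):
    """Depth-first search: is b reachable from a using the given undirected edges?"""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return b in seen

def cheapest_cut(nodes, cost, enable, disable):
    """Return (total_cost, cut_links) of the cheapest feasible cut, or None."""
    edges = list(cost)
    best = None
    for k in range(len(edges) + 1):
        for cut in combinations(edges, k):
            kept = [e for e in edges if e not in cut]
            enable_ok = all(connected(nodes, kept, a, b) for a, b in enable)
            disable_ok = not any(connected(nodes, kept, a, b) for a, b in disable)
            if enable_ok and disable_ok:
                total = sum(cost[e] for e in cut)
                if best is None or total < best[0]:
                    best = (total, set(cut))
    return best

# Hypothetical network: a four-node ring with a cutting cost on each link.
nodes = ["A", "B", "C", "D"]
cost = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 1, ("A", "D"): 5}
print(cheapest_cut(nodes, cost, enable=[("A", "B")], disable=[("A", "C")]))
```

Without the enable pair, the cheapest way to separate A from C costs 2; requiring A and B to stay connected raises the cost to 3, which shows how the enable constraints change the optimal cutset.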
34 Herding Problem
Legend: enable pairs, disable pairs, monitoring set.
Objective: minimize the cost associated with cutting links and nodes.
Constraints:
- enable communication between all enable pairs
- disable communication between all disable pairs
- herd all communication over the monitored set
35 Ranking Applications
37 The pre-1998 Web
- Yahoo: hierarchies of sites organized by humans
- Best search techniques: word of mouth, expert advice
- Overall feeling of users: Jorge Luis Borges' 1941 short story, The Library of Babel

"When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon.... As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable."
38 Enter Link Analysis: Change in User Attitudes about Web Search Today

"It's not my homepage, but it might as well be. I use it to ego-surf. I use it to read the news. Anytime I want to find out anything, I use it." - Matt Groening, creator and executive producer, The Simpsons

"I can't imagine life without Google News. Thousands of sources from around the world ensure anyone with an Internet connection can stay informed. The diversity of viewpoints available is staggering." - Michael Powell, chair, Federal Communications Commission

"Google is my rapid-response research assistant. On the run-up to a deadline, I may use it to check the spelling of a foreign name, to acquire an image of a particular piece of military hardware, to find the exact quote of a public figure, check a stat, translate a phrase, or research the background of a particular corporation. It's the Swiss Army knife of information retrieval." - Garry Trudeau, cartoonist and creator, Doonesbury
40 [Slide image: a page from a book index (PuPstyleBook, March 2006), with entries running from "k-step transition matrix" to "connected components"]
42 Ranking on the Web (the pre-1998 Web)
- border patrol: ; 567; 809; 0; ... (8,700,000 in total)
- hezbollah: 9; ; 9; 9; 558; ... (5,00,000 in total)
- global warming: 78; 980; 55; ... (,00,000 in total)
Too many results per search term.
43 Ranking with a Random Surfer
Rank each page corresponding to a search term by the number and quality of votes cast for that page.
- Hyperlink as vote
- Markov chain
53 Ranking with a Random Surfer: a page with no outlinks is a dangling node.
54 Ranking with a Random Surfer: at a dangling node, the surfer teleports.
56 Ranking with a Random Surfer
If a page is important, it gets lots of votes from other important pages, which means the random surfer visits it often. Simply count the number of times, or the proportion of time, the surfer spends on each page to create a ranking of webpages.
[Slide table: "Proportion of Time" for Pages 1 through 6 and the resulting "Ranked List of Pages"; most digits did not survive transcription. Surviving fragments: Page =.0, Page =.05, Page =.0, Page =.8, Page 5 =.0; ranked list: Page, Page 6, Page 5, Page, Page, Page.]
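The random-surfer idea can be sketched as a power iteration on a small link graph. Everything specific here is an assumption made for illustration: the six-page link structure is hypothetical (the slide's graph is not recoverable from the transcription), the teleportation weight `alpha = 0.85` is a commonly quoted value rather than one stated in the talk, and `pagerank` is an illustrative name. Dangling pages are handled by letting the surfer jump to a uniformly random page.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Proportion of time a random surfer spends on each page (power iteration)."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # probability mass sitting on dangling pages is spread uniformly
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - alpha) / n + alpha * dangling / n for p in pages}
        # each page splits its vote equally among the pages it links to
        for p in pages:
            for q in links[p]:
                new[q] += alpha * rank[p] / len(links[p])
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank

# Hypothetical six-page web; page 2 has no outlinks (a dangling node).
links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Sorting the pages by their long-run visit proportion gives the ranked list; the proportions always sum to 1 because every step redistributes the full probability mass.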
57 Clustering and Data Mining Applications
59 The Enron Dataset (SAS)
- PRIVATE collection of 50 Enron employees during 00
- 9,000 terms and 65,000 messages
- Term-by-Message Matrix: one row per term (fastow, skilling, subpoena, ..., dynegy), one column per message
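A term-by-message matrix can be built by counting how often each term occurs in each message. The sketch below assumes whitespace tokenization and raw counts; the two toy messages and the name `term_by_message` are hypothetical, and the preprocessing actually applied to the Enron collection is not described in this transcription.

```python
from collections import Counter

def term_by_message(messages):
    """Return (terms, matrix) where matrix[i][j] counts term i in message j."""
    counts = [Counter(msg.lower().split()) for msg in messages]
    terms = sorted(set().union(*counts))  # vocabulary, in alphabetical order
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

# Toy messages (invented for illustration; not from the dataset).
messages = ["fastow skilling subpoena", "dynegy fastow fastow"]
terms, matrix = term_by_message(messages)
for term, row in zip(terms, matrix):
    print(term, row)
```

Clustering methods then operate on the columns (messages) or rows (terms) of this matrix.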
62 Clustering the Enron Dataset
63 Tracking Enron clusters over time
64 Visualizing Clusters in the Enron Dataset
67 Data Mining on Terrorist Networks
- locating the most important terrorists
- clustering terrorists
- identifying central nodes in the terrorist network
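One simple way to identify central nodes is degree centrality: the fraction of the other nodes to which a node is directly linked. The slides do not say which centrality measure was used, so this choice is an assumption; the small network with generic labels and the name `degree_centrality` are hypothetical.

```python
def degree_centrality(edges):
    """Map each node to (number of neighbors) / (number of other nodes)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {}
    return {v: len(nb) / (n - 1) for v, nb in neighbors.items()}

# Hypothetical network: node "a" is linked to every other node.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```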
68 Terrorist Network
72 Conclusions
- Mathematics is useful.
  "To isolate mathematics from the practical demands of the sciences is to invite the sterility of a cow shut away from the bulls." - P. L. Chebychev
  "Mathematics is a more powerful instrument of knowledge than any other that has been bequeathed to us by human agency." - Descartes
- Mathematical models scale well. A few radars vs. hundreds of radars: the mathematical model doesn't care.
- Mathematical models are broadly applicable. The same mathematical techniques solve Sudoku, flight route, and clustering problems.
  "There is no branch of mathematics, however abstract, which may not someday be applied to the phenomena of the real world." - N. Lobachevsky
- Mathematical research is an inventive process, which takes time, and Time = Money.
1.6 Case Study: Random Surfer
Memex 1.6 Case Study: Random Surfer Memex. [Vannevar Bush, 1936] Theoretical hypertext computer system; pioneering concept for world wide web. Follow links from book or film to another. Tool for establishing
More informationWeb search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)
' Sta306b May 11, 2012 $ PageRank: 1 Web search before Google (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) & % Sta306b May 11, 2012 PageRank: 2 Web search
More informationLecture 8: Linkage algorithms and web search
Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Ronan Cummins 1 Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk 2017
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationCOMP Page Rank
COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper
More informationWeb consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page
Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information
More informationInformation Retrieval. Lecture 11 - Link analysis
Information Retrieval Lecture 11 - Link analysis Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 35 Introduction Link analysis: using hyperlinks
More informationAgenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page
Agenda Math 104 1 Google PageRank algorithm 2 Developing a formula for ranking web pages 3 Interpretation 4 Computing the score of each page Google: background Mid nineties: many search engines often times
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationHow to organize the Web?
How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper
More informationHow Google Finds Your Needle in the Web's
of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll
More informationInformation Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group
Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 21: Link Analysis Hinrich Schütze Center for Information and Language Processing, University of Munich 2014-06-18 1/80 Overview
More informationLecture #3: PageRank Algorithm The Mathematics of Google Search
Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,
More informationSimilarity Ranking in Large- Scale Bipartite Graphs
Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads
More informationCOMP5331: Knowledge Discovery and Data Mining
COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationA brief history of Google
the math behind Sat 25 March 2006 A brief history of Google 1995-7 The Stanford days (aka Backrub(!?)) 1998 Yahoo! wouldn't buy (but they might invest...) 1999 Finally out of beta! Sergey Brin Larry Page
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationUniversity of Maryland. Tuesday, March 2, 2010
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationPagerank Scoring. Imagine a browser doing a random walk on web pages:
Ranking Sec. 21.2 Pagerank Scoring Imagine a browser doing a random walk on web pages: Start at a random page At each step, go out of the current page along one of the links on that page, equiprobably
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 12: Link Analysis January 28 th, 2016 Wolf-Tilo Balke and Younes Ghammad Institut für Informationssysteme Technische Universität Braunschweig An Overview
More informationCOMP 4601 Hubs and Authorities
COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford
More informationLink Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.
Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material. 1 Contents Introduction Network properties Social network analysis Co-citation
More informationJordan Boyd-Graber University of Maryland. Thursday, March 3, 2011
Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationPageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 19, Issue 1, Ver. III (Jan.-Feb. 2017), PP 01-07 www.iosrjournals.org PageRank Algorithm Albi Dode 1, Silvester
More informationCalculating Web Page Authority Using the PageRank Algorithm. Math 45, Fall 2005 Levi Gill and Jacob Miles Prystowsky
Calculating Web Page Authority Using the PageRank Algorithm Math 45, Fall 2005 Levi Gill and Jacob Miles Prystowsky Introduction In 1998 a phenomenon hit the World Wide Web: Google opened its doors. Larry
More informationLecture 27: Learning from relational data
Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationInformation Networks: PageRank
Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the
More informationEinführung in Web und Data Science Community Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme
Einführung in Web und Data Science Community Analysis Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Today s lecture Anchor text Link analysis for ranking Pagerank and variants
More information1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a
!"#$ %#& ' Introduction ' Social network analysis ' Co-citation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,-/*,) Early search engines mainly compare content similarity of the query and
More informationSocial Network Analysis
Social Network Analysis Giri Iyengar Cornell University gi43@cornell.edu March 14, 2018 Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 1 / 24 Overview 1 Social Networks 2 HITS 3 Page
More informationCS6200 Information Retreival. The WebGraph. July 13, 2015
CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects
More informationGraph Algorithms. Revised based on the slides by Ruoming Kent State
Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #11: Link Analysis 3 Seoul National University 1 In This Lecture WebSpam: definition and method of attacks TrustRank: how to combat WebSpam HITS algorithm: another algorithm
More informationINTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)
INTRODUCTION TO DATA SCIENCE Link Analysis (MMDS5) Introduction Motivation: accurate web search Spammers: want you to land on their pages Google s PageRank and variants TrustRank Hubs and Authorities (HITS)
More informationA Reordering for the PageRank problem
A Reordering for the PageRank problem Amy N. Langville and Carl D. Meyer March 24 Abstract We describe a reordering particularly suited to the PageRank problem, which reduces the computation of the PageRank
More informationPage rank computation HPC course project a.y Compute efficient and scalable Pagerank
Page rank computation HPC course project a.y. 2012-13 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet
More informationBrief (non-technical) history
Web Data Management Part 2 Advanced Topics in Database Management (INFSCI 2711) Textbooks: Database System Concepts - 2010 Introduction to Information Retrieval - 2008 Vladimir Zadorozhny, DINS, SCI, University
More informationLink Analysis in the Cloud
Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)
More informationFast Iterative Solvers for Markov Chains, with Application to Google's PageRank. Hans De Sterck
Fast Iterative Solvers for Markov Chains, with Application to Google's PageRank Hans De Sterck Department of Applied Mathematics University of Waterloo, Ontario, Canada joint work with Steve McCormick,
More informationMathematical Analysis of Google PageRank
INRIA Sophia Antipolis, France Ranking Answers to User Query Ranking Answers to User Query How a search engine should sort the retrieved answers? Possible solutions: (a) use the frequency of the searched
More informationc 2006 Society for Industrial and Applied Mathematics
SIAM J. SCI. COMPUT. Vol. 27, No. 6, pp. 2112 212 c 26 Society for Industrial and Applied Mathematics A REORDERING FOR THE PAGERANK PROBLEM AMY N. LANGVILLE AND CARL D. MEYER Abstract. We describe a reordering