The Necessity of Mathematics

The Necessity of Mathematics: from Google to Counterterrorism to Sudoku
Amy Langville (langvillea@cofc.edu)
Mathematics Department, College of Charleston, Charleston, SC
Work supported by NSF-CAREER-0566, NSA, DOEd, SAS, Semandex
AMS Congressional Meeting, /6/006

The Message
  Mathematics is useful.
  Mathematical models don't care about scale or size of problem.
  Mathematical models are broadly applicable.
  Mathematical research is an inventive process.

Outline
Sudoku
Military Applications
  planning flight paths
  disabling and herding communication in networks
Ranking Applications
  ranking on the World Wide Web
Clustering and Data Mining Applications
  clustering the Enron email dataset
  clustering on terrorist networks

Overriding Mathematical Techniques
Optimization: min/max Objective subject to Constraint 1, Constraint 2, ...
Matrix Analysis [Figure: example matrix]
Graph Theory [Figure: example graph with numbered nodes]

Outline (techniques used by each application)
Sudoku: optimization, matrices
Military Applications: optimization, graphs
  planning flight paths
  disabling and herding communication in networks
Ranking Applications: matrices, graphs
  ranking on the World Wide Web
Clustering and Data Mining Applications: optimization, matrices, graphs
  clustering the Enron email dataset
  clustering on terrorist networks

Sudoku
[Figure: a Sudoku puzzle and the corresponding Sudoku matrix]
Definition: An n × n matrix is called a Sudoku matrix if:
  1. n is a perfect square (e.g., 4, 9, 16, 25),
  2. every row uses the integers 1 through n exactly once,
  3. every column uses the integers 1 through n exactly once,
  4. every √n × √n block submatrix uses the integers 1 through n exactly once.
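
The four conditions of the definition can be checked mechanically. The following is a minimal Python sketch, not code from the talk; the function name `is_sudoku_matrix` and the 4 × 4 `example` matrix are made up here for illustration.

```python
import math

def is_sudoku_matrix(m):
    """Return True if the square list-of-lists m meets all four conditions."""
    n = len(m)
    r = math.isqrt(n)
    # Condition 1: n is a perfect square (and m is really n x n).
    if n == 0 or r * r != n or any(len(row) != n for row in m):
        return False
    want = set(range(1, n + 1))
    # Condition 2: every row uses 1..n exactly once.
    rows_ok = all(set(row) == want for row in m)
    # Condition 3: every column uses 1..n exactly once.
    cols_ok = all({m[i][j] for i in range(n)} == want for j in range(n))
    # Condition 4: every sqrt(n) x sqrt(n) block uses 1..n exactly once.
    blocks_ok = all(
        {m[bi + i][bj + j] for i in range(r) for j in range(r)} == want
        for bi in range(0, n, r)
        for bj in range(0, n, r)
    )
    return rows_ok and cols_ok and blocks_ok

# Hypothetical 4 x 4 example (n = 4, blocks are 2 x 2).
example = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
```

Calling `is_sudoku_matrix(example)` returns `True`; swapping the first two entries of the first row keeps that row valid but breaks the first column, so the check then fails.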

Mathematical Model of Sudoku: Value of the Model
With a computer algorithm, we can solve any Sudoku puzzle, regardless of:
  size n
  number of givens
  level of difficulty
A 9 × 9 puzzle takes 6.7 seconds to solve on a desktop machine.
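
The model itself does not survive in this transcript. One common way to pose Sudoku as an optimization problem is a binary integer program, sketched below as an assumption rather than as the talk's own formulation. The symbols $x_{ijk}$, $B_b$ (the cells of block $b$) and $\mathcal{G}$ (the set of givens) are introduced here for illustration.

```latex
\begin{align*}
\text{variables:}\quad & x_{ijk} \in \{0,1\}, \quad x_{ijk} = 1 \iff \text{cell } (i,j) \text{ holds value } k,
  \qquad i,j,k \in \{1,\dots,n\} \\
\text{minimize}\quad & 0 \qquad \text{(any feasible point is a solution)} \\
\text{subject to}\quad
 & \sum_{k=1}^{n} x_{ijk} = 1 \quad \forall\, i,j  && \text{(one value per cell)} \\
 & \sum_{j=1}^{n} x_{ijk} = 1 \quad \forall\, i,k  && \text{(each value once per row)} \\
 & \sum_{i=1}^{n} x_{ijk} = 1 \quad \forall\, j,k  && \text{(each value once per column)} \\
 & \sum_{(i,j) \in B_b} x_{ijk} = 1 \quad \forall\, b,k && \text{(each value once per block)} \\
 & x_{ijk} = 1 \quad \forall\, (i,j,k) \in \mathcal{G} && \text{(givens)}
\end{align*}
```

Nothing in this formulation depends on n being 9, which is the sense in which such a model does not care about puzzle size.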

Unique Solution?
Most puzzle creators do not check whether their puzzle has exactly one solution.
[Figure: a puzzle shown with two different solutions]
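
Uniqueness can be tested by counting completions and stopping as soon as a second one turns up. The sketch below uses plain backtracking, which is a swapped-in technique and not the optimization model from the talk. The name `count_solutions`, the convention that 0 marks an empty cell, and the 4 × 4 puzzles are assumptions for illustration; the givens are assumed to be mutually consistent.

```python
import math

def count_solutions(grid, limit=2):
    """Count completions of an n x n grid (0 = empty), capped at `limit`."""
    n = len(grid)
    r = math.isqrt(n)
    g = [row[:] for row in grid]  # work on a copy

    def allowed(i, j, v):
        # v must not already appear in row i, column j, or the enclosing block.
        if any(g[i][c] == v for c in range(n)):
            return False
        if any(g[k][j] == v for k in range(n)):
            return False
        bi, bj = r * (i // r), r * (j // r)
        return all(g[bi + a][bj + b] != v for a in range(r) for b in range(r))

    def search():
        for i in range(n):
            for j in range(n):
                if g[i][j] == 0:
                    total = 0
                    for v in range(1, n + 1):
                        if allowed(i, j, v):
                            g[i][j] = v
                            total += search()
                            g[i][j] = 0
                            if total >= limit:
                                break  # enough solutions found, stop early
                    return total
        return 1  # no empty cell left: one complete solution

    return min(search(), limit)

# Hypothetical 4 x 4 puzzles.
unique_puzzle = [
    [1, 0, 3, 4],
    [3, 4, 0, 2],
    [0, 1, 4, 3],
    [4, 3, 2, 0],
]
empty_puzzle = [[0] * 4 for _ in range(4)]
```

A puzzle has a unique solution exactly when `count_solutions(puzzle) == 1`; a result of 2 means at least two solutions exist.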

Some Interesting 9 × 9 Sudoku Facts
How many 9 × 9 matrices deserve the title of Sudoku matrices?
  6,670,903,752,021,072,936,960 ≈ 6.67 × 10^21
What is the fewest number of givens that must be provided to create a 9 × 9 puzzle with a unique solution?
  17; 5,96 distinct puzzles with 17 givens and a unique solution have been found.
  No unique-solution puzzle with 16 givens has been found yet.
Given one Sudoku matrix, could I make my own Daily Sudoku Calendar?
  [Figure: puzzles paired with their unique solutions]
  By using mathematical operations, 362,879 (≈ 994 years' worth of) Sudoku matrices can be created from one 9 × 9 Sudoku matrix.
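
Relabeling the digits is one operation that turns a Sudoku matrix into another Sudoku matrix. Whether this is the operation the talk had in mind is an assumption, but the count matches: the 9! = 362,880 relabelings of a 9 × 9 matrix give 9! − 1 = 362,879 further matrices. The sketch below enumerates the relabelings of a hypothetical 4 × 4 matrix and repeats the counting argument for n = 9; `relabel` and `base` are illustrative names.

```python
import math
from itertools import permutations

def relabel(matrix, perm):
    """Replace every entry v by perm[v - 1]; perm is a permutation of 1..n."""
    return [[perm[v - 1] for v in row] for row in matrix]

# Small 4 x 4 Sudoku matrix so that all relabelings can be enumerated quickly.
base = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
variants = {
    tuple(map(tuple, relabel(base, perm)))
    for perm in permutations(range(1, 5))
}

# The same counting argument for n = 9, without enumerating.
new_9x9_matrices = math.factorial(9) - 1          # all relabelings except the identity
years_of_daily_puzzles = new_9x9_matrices / 365   # one puzzle per day
```

The set `variants` has 4! = 24 distinct matrices, since the first row alone determines the relabeling.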

Military Applications

Flight Path Planning (Lincoln Labs)
[Figure: map of enemy territory showing no-fly zones, the target, and radars]
Objective: create a path that minimizes time over radars.
Constraints:
  plane must fly over target
  plane must avoid no-fly zones
  plane cannot make unrealistic turns
  plane has fixed amount of fuel
  etc.

Flight Path Planning: Discretization

Flight Path Planning: Connect the Dots
  plane must fly over target
  plane must avoid no-fly zones
  plane has fixed amount of fuel (total # path segments ≤ D)
  plane cannot make unrealistic turns
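
Once the region is discretized, the planning problem becomes a search over grid points. The sketch below is a simplified illustration under stated assumptions, not the Lincoln Labs model: moves are to the four neighboring grid points, `radar[r][c]` is a made-up exposure cost for entering a cell, the turn constraint is left out, and a path may revisit cells. It uses dynamic programming over (cell, segments used) to respect the fuel limit D.

```python
import math

def step_costs(radar, no_fly, src, max_steps):
    """table[k][cell] = least radar exposure to reach cell from src in exactly k segments."""
    rows, cols = len(radar), len(radar[0])
    table = [{src: 0}]
    for _ in range(max_steps):
        nxt = {}
        for (r, c), cost in table[-1].items():
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (r + dr, c + dc)
                if not (0 <= cell[0] < rows and 0 <= cell[1] < cols):
                    continue  # off the map
                if cell in no_fly:
                    continue  # no-fly zones are never entered
                new = cost + radar[cell[0]][cell[1]]
                if new < nxt.get(cell, math.inf):
                    nxt[cell] = new
        table.append(nxt)
    return table

def plan(radar, no_fly, start, target, exit_cell, D):
    """Return (exposure, segments) of the best start -> target -> exit route, or None."""
    to_target = step_costs(radar, no_fly, start, D)
    to_exit = step_costs(radar, no_fly, target, D)
    best = None
    for k1 in range(D + 1):
        if target not in to_target[k1]:
            continue
        for k2 in range(D - k1 + 1):  # fuel: k1 + k2 <= D
            if exit_cell not in to_exit[k2]:
                continue
            cand = (to_target[k1][target] + to_exit[k2][exit_cell], k1 + k2)
            if best is None or cand < best:
                best = cand
    return best

# Hypothetical 3 x 3 map: a radar covers the middle column of the top two rows.
radar = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
```

On this toy map, with start (0, 0), target (0, 2) and exit (2, 2), a limit of D = 4 forces the plane across the radar, D = 8 leaves enough fuel to fly around it, and D = 3 is infeasible, so `plan` returns `None`.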

Flight Path Results
[Figures: computed flight paths plotted on the discretized map, one plot per distance limit]
  Distance limit: 50; Path Distance=.988; Cost=0; Total time (sec): 6.5
  Distance limit: 00; Path Distance=98.975; Cost=0; Total time (sec): 56.8
  Sorry, no feasible path for D=70; Total time (sec): 5.7

NSA Enemy Communication Networks
[Figure: network with numbered nodes; enable pairs and disable pairs are marked]
Objective: minimize cost associated with cutting links and nodes
Constraints:
  enable communication between all enable pairs
  disable communication between all disable pairs

NSA Communication Networks
[Figure sequence: the network with costs on its links and enable and disable pairs marked; successive slides mark different cutsets]
Objective: minimize cost associated with cutting links and nodes
Constraints:
  enable communication between all enable pairs
  disable communication between all disable pairs
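
For a single disable pair, and with only links (not nodes) being cut, the cheapest cutset is a minimum cut, which can be found by a maximum-flow computation. The sketch below uses the Edmonds-Karp method on a made-up network. It is a simplification of the slide's problem: the enable constraint is only checked afterwards, not enforced during the optimization. The names `min_cut`, `still_connected` and `network` are illustrative.

```python
from collections import deque

def min_cut(edges, s, t):
    """edges: {(u, v): cost} for undirected links. Returns (total_cost, links_to_cut)."""
    cap = {}
    for (u, v), c in edges.items():
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {})[u] = cap.get(v, {}).get(u, 0) + c

    def bfs_parents():
        # Breadth-first search over links with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = bfs_parents()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push
    reachable = set(parent)  # nodes still reachable from s
    cut = sorted((u, v) for (u, v) in edges if (u in reachable) != (v in reachable))
    return total, cut

def still_connected(edges, removed, a, b):
    """True if a can still reach b after the links in `removed` are cut."""
    adj = {}
    for u, v in edges:
        if (u, v) not in removed:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return b in seen

# Hypothetical network: disable pair (1, 5), enable pair (1, 6).
network = {(1, 2): 3, (1, 3): 2, (2, 4): 1, (3, 4): 1, (4, 5): 5, (2, 6): 2}
total_cost, cutset = min_cut(network, 1, 5)
```

Here the cheapest cutset is the two links into node 4, at total cost 2, and `still_connected(network, set(cutset), 1, 6)` confirms that the enable pair can still communicate.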

Multiple enable-disable pairs
[Figure: network with several enable pairs and disable pairs marked]
Objective: minimize cost associated with cutting links and nodes
Constraints:
  enable communication between all enable pairs
  disable communication between all disable pairs

Herding Problem
[Figure: network with enable pairs, disable pairs, and a monitoring set marked]
Objective: minimize cost associated with cutting links and nodes
Constraints:
  enable communication between all enable pairs
  disable communication between all disable pairs
  herd all communication over monitored set

Ranking Applications

The pre-1998 Web
Yahoo: hierarchies of sites organized by humans
Best Search Techniques: word of mouth, expert advice
Overall Feeling of Users: Jorge Luis Borges' 1941 short story, The Library of Babel
  "When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon.... As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable."

1998 ... enter Link Analysis
Change in User Attitudes about Web Search Today
  "It's not my homepage, but it might as well be. I use it to ego-surf. I use it to read the news. Anytime I want to find out anything, I use it."
  - Matt Groening, creator and executive producer, The Simpsons
  "I can't imagine life without Google News. Thousands of sources from around the world ensure anyone with an Internet connection can stay informed. The diversity of viewpoints available is staggering."
  - Michael Powell, chair, Federal Communications Commission
  "Google is my rapid-response research assistant. On the run-up to a deadline, I may use it to check the spelling of a foreign name, to acquire an image of a particular piece of military hardware, to find the exact quote of a public figure, check a stat, translate a phrase, or research the background of a particular corporation. It's the Swiss Army knife of information retrieval."
  - Garry Trudeau, cartoonist and creator, Doonesbury

The pre-1998 Web: Ranking on the Web
  border patrol: ; 567; 809; 0; ... (8,700,000 in total)
  hezbollah: 9; ; 9; 9; 558; ... (5,00,000 in total)
  global warming: 78; 980; 55; ... (,00,000 in total)
Too many results per search term.

Ranking with a Random Surfer
Rank each page corresponding to a search term by number and quality of votes cast for that page.
Hyperlink as vote; the surfer's walk over the links is a Markov chain.
[Figure sequence: a six-page web graph; the random surfer follows hyperlinks from page to page. When the surfer lands on a page that is a dangling node, the surfer teleports.]
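
The random surfer can be written as a Markov chain in the usual PageRank form. The notation below is a sketch under assumptions not stated on the slide: $n$ pages, a damping parameter $\alpha \in (0,1)$, and uniform teleportation both from dangling nodes and at random restarts.

```latex
\begin{align*}
H_{ij} &= \begin{cases} 1/|\mathrm{out}(i)| & \text{if page } i \text{ links to page } j,\\ 0 & \text{otherwise,} \end{cases}
\qquad
a_i = \begin{cases} 1 & \text{if page } i \text{ is a dangling node,}\\ 0 & \text{otherwise,} \end{cases} \\[4pt]
G &= \alpha \left( H + \tfrac{1}{n}\, a\, e^{T} \right) + (1-\alpha)\, \tfrac{1}{n}\, e\, e^{T},
\qquad e = (1,\dots,1)^{T}, \\[4pt]
\pi^{T} &= \pi^{T} G, \qquad \pi^{T} e = 1 .
\end{align*}
```

Entry $\pi_i$ is the long-run proportion of time the surfer spends on page $i$, which is the quantity used to rank the pages.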

Ranking with a Random Surfer
If a page is important, it gets lots of votes from other important pages, which means the random surfer visits it often.
Simply count the number of times, or proportion of time, the surfer spends on each page to create a ranking of webpages.
Proportion of Time
  Page 1 = .04
  Page 2 = .05
  Page 3 = .04
  Page 4 = .38
  Page 5 = .20
  Page 6 = .29
Ranked List of Pages
  Page 4, Page 6, Page 5, Page 2, Page 3, Page 1
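
The long-run proportions can be computed without simulating the surfer by repeatedly applying the transition rule until the distribution stops changing (power iteration). The sketch below is an illustration under assumptions: the link structure in `links` is a made-up six-page graph (the slide's figure does not survive in the transcript), the damping value `alpha = 0.9` is assumed, and a surfer on a dangling page teleports uniformly at random.

```python
def pagerank(links, alpha=0.9, tol=1e-10, max_iter=1000):
    """links: {page: [pages it links to]}. Returns {page: long-run proportion of time}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Probability mass sitting on dangling pages is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - alpha) / n + alpha * dangling / n for p in pages}
        # Every other page splits its mass evenly among its outlinks.
        for p in pages:
            for q in links[p]:
                new[q] += alpha * rank[p] / len(links[p])
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank

# Hypothetical six-page web; page 2 has no outlinks (a dangling node).
links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}
rank = pagerank(links)
ordering = sorted(rank, key=rank.get, reverse=True)
```

For this assumed graph the proportions sum to 1 and `ordering` comes out as pages 4, 6, 5, 2, 3, 1.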

Clustering and Data Mining Applications

The Enron Email Dataset (SAS)
PRIVATE email collection of 50 Enron employees during 00
9,000 terms and 65,000 messages
Term-by-Message Matrix
[Matrix: rows are terms such as fastow, skilling, subpoena, dynegy; columns are messages]
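
A term-by-message matrix has one row per term and one column per message. The sketch below builds one from three made-up messages using raw counts; the messages, the whitespace tokenization, and the choice of raw counts instead of a weighting scheme are all assumptions for illustration and are not taken from the Enron data.

```python
from collections import Counter

def term_by_message(messages):
    """Return (terms, matrix) where matrix[i][j] counts term i in message j."""
    counts = [Counter(m.lower().split()) for m in messages]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]  # Counter gives 0 for absent terms
    return terms, matrix

# Hypothetical messages.
messages = [
    "fastow skilling meeting",
    "subpoena fastow fastow",
    "dynegy merger",
]
terms, matrix = term_by_message(messages)
```

In this toy matrix the row for `fastow` is `[1, 2, 0]`; clustering methods then work on the rows or columns of such a matrix.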

Clustering the Enron Email Dataset

Tracking Enron clusters over time

Visualizing Clusters in the Enron Dataset

Data Mining on Terrorist Networks
  locating most important terrorists
  clustering terrorists
  identifying central nodes in terrorist network

Terrorist Network

Conclusions
Mathematics is useful.
  "To isolate mathematics from the practical demands of the sciences is to invite the sterility of a cow shut away from the bulls." - P. L. Chebychev
  "Mathematics is a more powerful instrument of knowledge than any other that has been bequeathed to us by human agency." - Descartes
Mathematical models scale well.
  radars vs. 00 radars: the mathematical model doesn't care.
Mathematical models are broadly applicable.
  Same mathematical techniques solve Sudoku, flight route, clustering problems.
  "There is no branch of mathematics, however abstract, which may not someday be applied to the phenomena of the real world." - N. Lobachevsky
Mathematical research is an inventive process, which takes time, and
  Time = Money