The Necessity of Mathematics: from Google to Counterterrorism to Sudoku
Amy Langville, langvillea@cofc.edu (work supported by NSF-CAREER-0566, NSA, DOEd, SAS, Semandex)
Mathematics Department, College of Charleston, Charleston, SC
AMS Congressional Meeting, /6/006
The Message
Mathematics is useful.
Mathematical models don't care about the scale or size of the problem.
Mathematical models are broadly applicable.
Mathematical research is an inventive process.
Outline
Sudoku
Military Applications: planning flight paths; disabling and herding communication in networks
Ranking Applications: ranking on the World Wide Web
Clustering and Data Mining Applications: clustering the Enron email dataset; clustering on terrorist networks
Overriding Mathematical Techniques
Optimization: min/max Objective subject to Constraints
Matrix Analysis
Graph Theory
Outline, with the techniques each topic uses
Sudoku (optimization, matrices)
Military Applications (optimization, graphs): planning flight paths; disabling and herding communication in networks
Ranking Applications (matrices, graphs): ranking on the World Wide Web
Clustering and Data Mining Applications (optimization, matrices, graphs): clustering the Enron email dataset; clustering on terrorist networks
Sudoku
[Figure: a Sudoku puzzle and the corresponding Sudoku matrix]
Definition: An n × n matrix is called a Sudoku matrix if:
1. n is a perfect square (e.g., 4, 9, 16, 25),
2. every row uses the integers 1 through n exactly once,
3. every column uses the integers 1 through n exactly once,
4. every √n × √n block submatrix uses the integers 1 through n exactly once.
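The definition above can be checked mechanically. The following is a minimal sketch, not taken from the talk; the function name `is_sudoku_matrix` and the 4 × 4 example are illustrative assumptions.

```python
from math import isqrt

def is_sudoku_matrix(m):
    """Return True if the list-of-lists m satisfies the Sudoku matrix definition."""
    n = len(m)
    b = isqrt(n)  # side length of each block submatrix
    # n must be a perfect square and m must be n x n
    if n == 0 or b * b != n or any(len(row) != n for row in m):
        return False
    target = set(range(1, n + 1))
    rows_ok = all(set(row) == target for row in m)
    cols_ok = all({m[r][c] for r in range(n)} == target for c in range(n))
    blocks_ok = all(
        {m[r][c] for r in range(br, br + b) for c in range(bc, bc + b)} == target
        for br in range(0, n, b)
        for bc in range(0, n, b)
    )
    return rows_ok and cols_ok and blocks_ok

# Hypothetical 4 x 4 example (n = 4, blocks are 2 x 2)
example = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

# A Latin square that fails the block condition
latin_only = [
    [1, 2, 3, 4],
    [2, 3, 4, 1],
    [3, 4, 1, 2],
    [4, 1, 2, 3],
]
```

Here `is_sudoku_matrix(example)` holds, while `latin_only` satisfies the row and column conditions but not the block condition.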
Mathematical Model of Sudoku
Value of the Model: With a computer algorithm, we can solve any Sudoku puzzle, regardless of size n, number of givens, or level of difficulty. A 9 × 9 puzzle takes 6.7 seconds to solve on a desktop machine.
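The model itself did not survive in this transcript. As a hedged illustration only, one commonly used way to write Sudoku as an optimization problem is a binary feasibility program. The symbols x_{ijk} and B_{pq} below are assumptions introduced here, not necessarily the talk's notation: x_{ijk} = 1 means cell (i, j) holds the integer k, and B_{pq} is the set of cells in block (p, q).

```latex
\begin{align*}
\text{find } & x_{ijk} \in \{0,1\}, \quad i,j,k = 1,\dots,n \\
\text{subject to } & \sum_{k=1}^{n} x_{ijk} = 1 \quad \forall\, i,j && \text{(each cell holds one integer)} \\
& \sum_{j=1}^{n} x_{ijk} = 1 \quad \forall\, i,k && \text{(each row uses } k \text{ once)} \\
& \sum_{i=1}^{n} x_{ijk} = 1 \quad \forall\, j,k && \text{(each column uses } k \text{ once)} \\
& \sum_{(i,j) \in B_{pq}} x_{ijk} = 1 \quad \forall\, p,q,k && \text{(each submatrix uses } k \text{ once)} \\
& x_{ijk} = 1 \quad \text{for every given } (i,j,k)
\end{align*}
```

Nothing in this formulation depends on the value of n or on how many givens there are, which is consistent with the slide's point that the model does not care about puzzle size.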
Unique Solution?
Most puzzle creators do not check whether their puzzle has a unique solution.
[Figure: one puzzle shown with two different solutions]
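A uniqueness check can be sketched by counting completions and stopping as soon as a second one is found. This uses plain backtracking rather than the optimization model of the talk, and it assumes the givens do not already conflict with each other; `count_solutions` and both 4 × 4 puzzles are hypothetical.

```python
from math import isqrt

def count_solutions(grid, limit=2):
    """Count completions of a puzzle (0 = empty cell), stopping once `limit` is reached."""
    n = len(grid)
    b = isqrt(n)
    grid = [row[:] for row in grid]  # work on a copy

    def allowed(r, c, v):
        if any(grid[r][j] == v for j in range(n)):
            return False
        if any(grid[i][c] == v for i in range(n)):
            return False
        br, bc = r - r % b, c - c % b  # top-left corner of the block
        return all(grid[i][j] != v
                   for i in range(br, br + b) for j in range(bc, bc + b))

    def search():
        for r in range(n):
            for c in range(n):
                if grid[r][c] == 0:
                    total = 0
                    for v in range(1, n + 1):
                        if allowed(r, c, v):
                            grid[r][c] = v
                            total += search()
                            grid[r][c] = 0
                            if total >= limit:
                                break
                    return total
        return 1  # no empty cell left: one complete solution

    return search()

# One empty cell: exactly one completion
unique_puzzle = [
    [0, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

# Four empty cells in which 1 and 2 can be swapped: two completions
ambiguous_puzzle = [
    [0, 0, 3, 4],
    [3, 4, 1, 2],
    [0, 0, 4, 3],
    [4, 3, 2, 1],
]
```

A puzzle is well posed when `count_solutions(puzzle) == 1`; the second example returns 2, the situation the slide warns about.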
Some Interesting 9 × 9 Sudoku Facts
How many 9 × 9 matrices deserve the title of Sudoku matrix? 6,670,903,752,021,072,936,960 ≈ 6.67 × 10^21.
What is the fewest number of givens that must be provided to create a 9 × 9 puzzle with a unique solution? 17; 5,96 distinct puzzles with 17 givens and a unique solution have been found. No unique-solution puzzle with 16 givens has been found yet.
Given one Sudoku matrix, could I make my own Daily Sudoku Calendar? [Figure: two puzzles, each with its unique solution] By using mathematical operations, 6,879 ( 99 years' worth of) Sudoku matrices can be created from one 9 × 9 Sudoku matrix.
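The slide does not list which mathematical operations are meant. As an assumption, the sketch below uses three operations that are known to preserve the Sudoku property: relabeling the integers, transposing, and swapping two rows inside the same band of blocks. All function names and the 4 × 4 base matrix are illustrative.

```python
from math import isqrt

def is_valid(m):
    """Check the row, column, and block conditions of a Sudoku matrix."""
    n, b = len(m), isqrt(len(m))
    want = set(range(1, n + 1))
    rows = all(set(r) == want for r in m)
    cols = all(set(c) == want for c in zip(*m))
    blocks = all(
        {m[i][j] for i in range(p, p + b) for j in range(q, q + b)} == want
        for p in range(0, n, b) for q in range(0, n, b)
    )
    return rows and cols and blocks

def relabel(m, perm):
    """Rename integers: d becomes perm[d - 1] (perm is a permutation of 1..n)."""
    return [[perm[v - 1] for v in row] for row in m]

def transpose(m):
    """Swap the roles of rows and columns."""
    return [list(col) for col in zip(*m)]

def swap_rows_in_band(m, band, i, j):
    """Swap rows i and j (0-based offsets) inside one horizontal band of blocks."""
    b = isqrt(len(m))
    out = [row[:] for row in m]
    r1, r2 = band * b + i, band * b + j
    out[r1], out[r2] = out[r2], out[r1]
    return out

base = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

variants = [
    relabel(base, [2, 1, 4, 3]),
    transpose(base),
    swap_rows_in_band(base, band=0, i=0, j=1),
]
```

Each variant is a different valid Sudoku matrix, and the operations can be composed to produce many more from a single starting matrix.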
Military Applications
Flight Path Planning (Lincoln Labs)
[Figure: map of enemy territory showing no-fly zones, the target, and radars]
Objective: create a path that minimizes time over radars.
Constraints: the plane must fly over the target; the plane must avoid no-fly zones; the plane cannot make unrealistic turns; the plane has a fixed amount of fuel; etc.
Flight Path Planning: Discretization
Flight Path Planning: Connect the Dots
The plane must fly over the target; the plane must avoid no-fly zones; the plane has a fixed amount of fuel (total number of path segments ≤ D); the plane cannot make unrealistic turns.
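A heavily simplified sketch of the connect-the-dots idea follows. It is not the Lincoln Labs model: it assumes a rectangular grid with four-neighbor moves, counts one unit of cost for each radar cell entered, and ignores the turning constraint. The grid encoding ("R" radar, "X" no-fly, "." free) and the names `min_cost_within` and `plan` are assumptions made for illustration.

```python
INF = float("inf")

def min_cost_within(grid, source, max_steps):
    """table[k][cell] = least radar exposure to reach cell from source in at most k moves."""
    rows, cols = len(grid), len(grid[0])
    best = {source: 0}
    table = [dict(best)]
    for _ in range(max_steps):
        nxt = dict(best)
        for (r, c), cost in best.items():
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                # stay on the grid and out of no-fly zones
                if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != "X":
                    new = cost + (1 if grid[rr][cc] == "R" else 0)
                    if new < nxt.get((rr, cc), INF):
                        nxt[(rr, cc)] = new
        best = nxt
        table.append(dict(best))
    return table

def plan(grid, start, target, end, max_segments):
    """Least radar exposure from start via target to end using at most max_segments moves."""
    to_target = min_cost_within(grid, start, max_segments)
    from_target = min_cost_within(grid, target, max_segments)
    best = min(
        to_target[k].get(target, INF) + from_target[max_segments - k].get(end, INF)
        for k in range(max_segments + 1)
    )
    return None if best == INF else best

# Hypothetical 4 x 4 map
grid = [
    "..R.",
    ".XR.",
    ".X..",
    "....",
]
start, target, end = (0, 0), (2, 2), (0, 3)
```

With this map, a budget of 9 segments allows a radar-free route, a budget of 7 forces the plane over two radar cells, and a budget of 6 returns `None`, mirroring the "no feasible path" outcome when the distance limit is too tight.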
Flight Path Results
[Figure: computed flight paths plotted on the discretized grid]
Distance limit: 50; Path Distance=.988; Cost=0; Total time (sec): 6.5
Distance limit: 00; Path Distance=98.975; Cost=0; Total time (sec): 56.8
Sorry, no feasible path for D=70; Total time (sec): 5.7
NSA Enemy Communication Networks
[Figure: communication network with marked enable pairs and disable pairs]
Objective: minimize the cost associated with cutting links and nodes.
Constraints: enable communication between all enable pairs; disable communication between all disable pairs.
NSA Communication Networks
[Figure: example network with link costs, shown with several candidate cutsets]
Objective: minimize the cost associated with cutting links and nodes.
Constraints: enable communication between all enable pairs; disable communication between all disable pairs.
Multiple enable-disable pairs
[Figure: network with several enable pairs and disable pairs]
Objective: minimize the cost associated with cutting links and nodes.
Constraints: enable communication between all enable pairs; disable communication between all disable pairs.
Herding Problem
[Figure: the same network with a monitoring set]
Objective: minimize the cost associated with cutting links and nodes.
Constraints: enable communication between all enable pairs; disable communication between all disable pairs; herd all communication over the monitored set.
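For a very small network, the enable/disable problem can be illustrated by brute force: try every subset of links and keep the cheapest one that satisfies both constraints. This sketch cuts links only (not nodes), does not model the herding constraint, and is exponential in the number of links, so it is only an illustration of the objective and constraints. The network, costs, and function names are hypothetical.

```python
from itertools import combinations

def connected(nodes, links, a, b):
    """True if a and b are joined by a path that uses only the given links."""
    adj = {v: set() for v in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return b in seen

def cheapest_cut(nodes, cost, enable, disable):
    """Return (total cost, set of cut links) for the cheapest valid cut, or None."""
    links = list(cost)
    best = None
    for size in range(len(links) + 1):
        for cut in combinations(links, size):
            kept = [l for l in links if l not in cut]
            enabled = all(connected(nodes, kept, a, b) for a, b in enable)
            disabled = not any(connected(nodes, kept, a, b) for a, b in disable)
            if enabled and disabled:
                total = sum(cost[l] for l in cut)
                if best is None or total < best[0]:
                    best = (total, set(cut))
    return best

# Hypothetical network: link -> cost of cutting it
nodes = [1, 2, 3, 4]
cost = {(1, 2): 5, (2, 3): 4, (1, 3): 3, (1, 4): 1, (4, 3): 2}
result = cheapest_cut(nodes, cost, enable=[(1, 2)], disable=[(1, 3)])
```

In this example the cheapest valid cut removes links (1, 3), (2, 3), and (1, 4) at a total cost of 8, keeping nodes 1 and 2 in contact while isolating node 3 from node 1.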
Ranking Applications
The pre-1998 Web
Yahoo: hierarchies of sites organized by humans.
Best search techniques: word of mouth; expert advice.
Overall feeling of users: Jorge Luis Borges' 1941 short story, "The Library of Babel":
"When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon.... As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable."
1998 ... enter Link Analysis
Change in User Attitudes about Web Search Today
"It's not my homepage, but it might as well be. I use it to ego-surf. I use it to read the news. Anytime I want to find out anything, I use it." - Matt Groening, creator and executive producer, The Simpsons
"I can't imagine life without Google News. Thousands of sources from around the world ensure anyone with an Internet connection can stay informed. The diversity of viewpoints available is staggering." - Michael Powell, chair, Federal Communications Commission
"Google is my rapid-response research assistant. On the run-up to a deadline, I may use it to check the spelling of a foreign name, to acquire an image of a particular piece of military hardware, to find the exact quote of a public figure, check a stat, translate a phrase, or research the background of a particular corporation. It's the Swiss Army knife of information retrieval." - Garry Trudeau, cartoonist and creator, Doonesbury
[Figure: a page from a book index, with entries running from "k-step transition matrix" through "connected components"]
Ranking on the Web
border patrol: ; 567; 809; 0; ... (8,700,000 in total)
hezbollah: 9; ; 9; 9; 558; ... (5,00,000 in total)
global warming: 78; 980; 55; ... (,00,000 in total)
Too many results per search term.
Ranking with a Random Surfer
Rank each page corresponding to a search term by the number and quality of votes cast for that page. A hyperlink is a vote.
[Figure: a random surfer follows hyperlinks through a six-page web graph, modeled as a Markov chain. When the surfer reaches a dangling node (a page with no outlinks), the surfer teleports to another page.]
Ranking with a Random Surfer
If a page is important, it gets lots of votes from other important pages, which means the random surfer visits it often. Simply count the number of times, or the proportion of time, the surfer spends on each page to create a ranking of webpages.
Proportion of Time: Page =.0 Page =.05 Page =.0 Page =.8 Page 5 =.0 Page 6 =.9
Ranked List of Pages: Page Page 6 Page 5 Page Page Page
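The proportion of time the surfer spends on each page can be approximated by repeatedly applying the surfer's transition rule, a power-method style iteration. The sketch below is an illustration under assumptions: the six-page link structure is hypothetical (the slide's graph did not survive extraction), and the teleportation parameter `alpha=0.85` is a commonly quoted value, not one stated in the talk.

```python
def pagerank(links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """links[i] lists the pages that page i links to (0-based). Returns visit proportions."""
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - alpha) / n] * n          # share from random teleportation
        for i in range(n):
            out = links.get(i, [])
            if out:
                share = alpha * pi[i] / len(out)
                for j in out:                  # each hyperlink is a vote
                    new[j] += share
            else:                              # dangling node: jump to any page
                for j in range(n):
                    new[j] += alpha * pi[i] / n
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical six-page web; page 1 (0-based) has no outlinks
links = {0: [1, 2], 1: [], 2: [0, 1, 4], 3: [4, 5], 4: [3, 5], 5: [3]}
scores = pagerank(links, 6)
ranking = sorted(range(6), key=lambda i: -scores[i])
```

`scores` sums to 1 and plays the role of the "proportion of time" list, while `ranking` is the corresponding ranked list of pages, most visited first.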
Clustering and Data Mining Applications
The Enron Email Dataset (SAS)
PRIVATE email collection of 50 Enron employees during 00; 9,000 terms and 65,000 messages
Term-by-Message Matrix
[Matrix: rows are terms (e.g., subpoena, dynegy); columns are messages (e.g., from fastow and skilling)]
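A term-by-message matrix can be built by counting how often each term occurs in each message. The sketch below assumes raw counts and whitespace tokenization, which may differ from the weighting and preprocessing used for the actual Enron data; the three messages are invented for illustration.

```python
from collections import Counter

def term_by_message(messages):
    """Return (terms, matrix) where matrix[i][j] counts term i in message j."""
    counts = [Counter(text.lower().split()) for text in messages]
    terms = sorted(set().union(*counts))       # one row per distinct term
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

# Hypothetical messages, not drawn from the Enron collection
messages = [
    "subpoena served",
    "dynegy merger talks",
    "merger subpoena subpoena",
]
terms, matrix = term_by_message(messages)
subpoena_row = matrix[terms.index("subpoena")]
dynegy_row = matrix[terms.index("dynegy")]
```

Each row describes how one term is spread across messages and each column describes one message, which is the form of input that clustering methods based on matrix factorizations work with.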
Clustering the Enron Email Dataset
Tracking Enron clusters over time
Visualizing Clusters in the Enron Dataset
Data Mining on Terrorist Networks: locating the most important terrorists; clustering terrorists; identifying central nodes in the terrorist network
Terrorist Network
Conclusions
Mathematics is useful.
"To isolate mathematics from the practical demands of the sciences is to invite the sterility of a cow shut away from the bulls." - P. L. Chebychev
"Mathematics is a more powerful instrument of knowledge than any other that has been bequeathed to us by human agency." - Descartes
Mathematical models scale well. radars vs. 00 radars: the mathematical model doesn't care.
Mathematical models are broadly applicable. The same mathematical techniques solve Sudoku, flight route, and clustering problems.
"There is no branch of mathematics, however abstract, which may not someday be applied to the phenomena of the real world." - N. Lobachevsky
Mathematical research is an inventive process, which takes time, and Time = Money.