Lecture 27: Learning from relational data

Size: px
Start display at page:

Download "Lecture 27: Learning from relational data"

Transcription

1 Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, / 12

2 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission to earn credit. 2 / 12

3 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission to earn credit. In order to earn credit for the Kaggle competition, submit Homework 8 by this Thursday. 2 / 12

4 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last names A-L: Skilling Auditorium (here) Last names M-S: Gates B03 Last names T-Z: Thornton / 12

5 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last names A-L: Skilling Auditorium (here) Last names M-S: Gates B03 Last names T-Z: Thornton 201. SCPD students: The exam will be sent out early Tuesday (12/12) morning, and your proctor must return it by Wednesday December 13 at 2pm PST. If you wish to take the exam at Stanford with the rest of the class, you must tell SCPD (not me) ahead of time. Then pick one of the three class rooms above to take the exam. 3 / 12

6 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last names A-L: Skilling Auditorium (here) Last names M-S: Gates B03 Last names T-Z: Thornton 201. SCPD students: The exam will be sent out early Tuesday (12/12) morning, and your proctor must return it by Wednesday December 13 at 2pm PST. If you wish to take the exam at Stanford with the rest of the class, you must tell SCPD (not me) ahead of time. Then pick one of the three class rooms above to take the exam. Closed book, closed notes. No calculators or computers. 3 / 12

7 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last names A-L: Skilling Auditorium (here) Last names M-S: Gates B03 Last names T-Z: Thornton 201. SCPD students: The exam will be sent out early Tuesday (12/12) morning, and your proctor must return it by Wednesday December 13 at 2pm PST. If you wish to take the exam at Stanford with the rest of the class, you must tell SCPD (not me) ahead of time. Then pick one of the three class rooms above to take the exam. Closed book, closed notes. No calculators or computers. Expect a final that is similar to the midterm. Practice problems on the website. 3 / 12

8 Announcements The class this Wednesday will be a review. There is no class Friday. Instructor and TAs will have their usual office hours. Instructor is out of town Friday Monday. 4 / 12

9 Relational data The observations have the form of a graph. Examples. 5 / 12

10 Relational data The observations have the form of a graph. Examples. Links between websites. 5 / 12

11 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. 5 / 12

12 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. Transmission networks for contagious diseases. 5 / 12

13 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. Transmission networks for contagious diseases. The links can be directed or undirected. 5 / 12

14 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. Transmission networks for contagious diseases. The links can be directed or undirected. There can be different types of link (friend, follower, followed). 5 / 12

15 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. Transmission networks for contagious diseases. The links can be directed or undirected. There can be different types of link (friend, follower, followed). We can observe the graph in time (how do social networks grow?). 5 / 12

16 Relational data The observations have the form of a graph. Examples. Links between websites. Relationships between accounts in social networks. Transmission networks for contagious diseases. The links can be directed or undirected. There can be different types of link (friend, follower, followed). We can observe the graph in time (how do social networks grow?). Each vertex can have additional features or metadata. 5 / 12

17 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. 6 / 12

18 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. Uses a graph of links between websites to rank websites by importance. 6 / 12

19 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. Uses a graph of links between websites to rank websites by importance. Motivation: Consider the problem of searching the web using the query "birth control". 6 / 12

20 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. Uses a graph of links between websites to rank websites by importance. Motivation: Consider the problem of searching the web using the query "birth control". There are millions of pages containing the term. 6 / 12

21 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. Uses a graph of links between websites to rank websites by importance. Motivation: Consider the problem of searching the web using the query "birth control". There are millions of pages containing the term. Analyzing the content of each website semantically to infer which one is more likely to interest the user is very expensive. 6 / 12

22 PageRank algorithm Invented by Sergei Brin and Larry Page of Google. Uses a graph of links between websites to rank websites by importance. Motivation: Consider the problem of searching the web using the query "birth control". There are millions of pages containing the term. Analyzing the content of each website semantically to infer which one is more likely to interest the user is very expensive. We need a way to rank websites, to filter out all those that are rarely visited. This information is given by links. 6 / 12

23 PageRank algorithm Consider a hypothetical surfer who is jumping from website to website by clicking on random links. 7 / 12

24 PageRank algorithm Consider a hypothetical surfer who is jumping from website to website by clicking on random links. Intuitively, the websites that are visited more frequently can be considered more important in the network of links. 7 / 12

25 PageRank algorithm Consider a hypothetical surfer who is jumping from website to website by clicking on random links. Intuitively, the websites that are visited more frequently can be considered more important in the network of links. Will the surfer visit every website eventually? 7 / 12

26 PageRank algorithm Consider a hypothetical surfer who is jumping from website to website by clicking on random links. Intuitively, the websites that are visited more frequently can be considered more important in the network of links. Will the surfer visit every website eventually? No. It is possible to get stuck in a website with no outgoing links, or to be stuck in a loop between two websites, for example. 7 / 12

27 PageRank algorithm Consider a hypothetical surfer who is jumping from website to website by clicking on random links. Intuitively, the websites that are visited more frequently can be considered more important in the network of links. Will the surfer visit every website eventually? No. It is possible to get stuck in a website with no outgoing links, or to be stuck in a loop between two websites, for example. To avoid this problem, we modify the random walk, such that at every step, with probability 1 q, we pick a website at random, and with probability q we go through one of the links in the current website at random. 7 / 12

28 PageRank algorithm The surfer s random walk is a Markov chain on the set of websites. 8 / 12

29 PageRank algorithm The surfer s random walk is a Markov chain on the set of websites. It is a fact that the frequency with which the surfer visits any website converges to some limit. 8 / 12

30 PageRank algorithm The surfer s random walk is a Markov chain on the set of websites. It is a fact that the frequency with which the surfer visits any website converges to some limit. The PageRank of a website is this limiting frequency. 8 / 12

31 PageRank algorithm Let P ij be the probability of jumping from website i to website j, then P ij = (1 q) 1 [ ] # of links from i to j n + q # of links out of i 9 / 12

32 PageRank algorithm Let P ij be the probability of jumping from website i to website j, then P ij = (1 q) 1 [ ] # of links from i to j n + q # of links out of i The limiting frequency of website j, π j, must satisfy π j = n π i P ij i=1 or in matrix notation π = πp. 9 / 12

33 PageRank algorithm Let P ij be the probability of jumping from website i to website j, then P ij = (1 q) 1 [ ] # of links from i to j n + q # of links out of i The limiting frequency of website j, π j, must satisfy π j = n π i P ij i=1 or in matrix notation π = πp. That is, π is an eigenvector of the transition probability matrix P with eigenvalue 1. 9 / 12

34 Finding the limiting frequencies π In principle, finding the limiting frequencies could require solving the eigendecomposition of a matrix P which is n n, and this has a complexity which grows as n / 12

35 Finding the limiting frequencies π In principle, finding the limiting frequencies could require solving the eigendecomposition of a matrix P which is n n, and this has a complexity which grows as n 3. However, it is possible to compute π by starting with the approximation π (0) = (1/n,..., 1/n), and iterating: π (t) = π (t 1) P. The number of iterations necessary for convergence is typically small. 10 / 12

36 Finding the limiting frequencies π In principle, finding the limiting frequencies could require solving the eigendecomposition of a matrix P which is n n, and this has a complexity which grows as n 3. However, it is possible to compute π by starting with the approximation π (0) = (1/n,..., 1/n), and iterating: π (t) = π (t 1) P. The number of iterations necessary for convergence is typically small. The matrix-vector multiplication in each iteration can be sped up using sparse matrix techniques. 10 / 12

37 How can PageRank be used in web search? One idea: 1. Find all websites that contain all query terms. 2. Display them in order of their PageRank. 11 / 12

38 How can PageRank be used in web search? One idea: 1. Find all websites that contain all query terms. 2. Display them in order of their PageRank. A more likely approach: 1. Use PageRank to select the 10,000 most important pages which contain the query terms. 2. Rank these 10,000 pages by analyzing their content, integrating information about the user, etc. 11 / 12

39 Working with graphs in R and Python The package igraph implements a lot of utilities for analyzing graphs in R and Python. It has a function page.rank, among many others. 12 / 12

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

Synonyms. Hostile. Chilly. Direct. Sharp

Synonyms. Hostile. Chilly. Direct. Sharp Graphs and Networks A Social Network Synonyms Hostile Slick Icy Direct Nifty Cool Abrupt Sharp Composed Chilly Chemical Bonds http://4.bp.blogspot.com/-xctbj8lkhqa/tjm0bonwbri/aaaaaaaaak4/-mhrbauohhg/s600/ethanol.gif

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Synonyms. Hostile. Chilly. Direct. Sharp

Synonyms. Hostile. Chilly. Direct. Sharp Graphs and Networks A Social Network Synonyms Hostile Slick Icy Direct Nifty Cool Abrupt Sharp Composed Chilly Chemical Bonds http://4.bp.blogspot.com/-xctbj8lkhqa/tjm0bonwbri/aaaaaaaaak4/-mhrbauohhg/s600/ethanol.gif

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

Motivation. Motivation

Motivation. Motivation COMS11 Motivation PageRank Department of Computer Science, University of Bristol Bristol, UK 1 November 1 The World-Wide Web was invented by Tim Berners-Lee circa 1991. By the late 199s, the amount of

More information

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)

More information

A brief history of Google

A brief history of Google the math behind Sat 25 March 2006 A brief history of Google 1995-7 The Stanford days (aka Backrub(!?)) 1998 Yahoo! wouldn't buy (but they might invest...) 1999 Finally out of beta! Sergey Brin Larry Page

More information

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) ' Sta306b May 11, 2012 $ PageRank: 1 Web search before Google (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) & % Sta306b May 11, 2012 PageRank: 2 Web search

More information

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey

More information

Lecture 8: Linkage algorithms and web search

Lecture 8: Linkage algorithms and web search Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Ronan Cummins 1 Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk 2017

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page Agenda Math 104 1 Google PageRank algorithm 2 Developing a formula for ranking web pages 3 Interpretation 4 Computing the score of each page Google: background Mid nineties: many search engines often times

More information

Announcements. Assignment 6 due right now. Assignment 7 (FacePamphlet) out, due next Friday, March 16 at 3:15PM

Announcements. Assignment 6 due right now. Assignment 7 (FacePamphlet) out, due next Friday, March 16 at 3:15PM Graphs and Networks Announcements Assignment 6 due right now. Assignment 7 (FacePamphlet) out, due next Friday, March 6 at 3:5PM Build your own social network! No late days may be used. YEAH hours tonight

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016 CPSC 340: Machine Learning and Data Mining Ranking Fall 2016 Assignment 5: Admin 2 late days to hand in Wednesday, 3 for Friday. Assignment 6: Due Friday, 1 late day to hand in next Monday, etc. Final:

More information

COMP Page Rank

COMP Page Rank COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper

More information

Einführung in Web und Data Science Community Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme

Einführung in Web und Data Science Community Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Einführung in Web und Data Science Community Analysis Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Today s lecture Anchor text Link analysis for ranking Pagerank and variants

More information

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank Page rank computation HPC course project a.y. 2012-13 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 21: Link Analysis Hinrich Schütze Center for Information and Language Processing, University of Munich 2014-06-18 1/80 Overview

More information

Information Networks: PageRank

Information Networks: PageRank Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

Course and Contact Information. Course Description. Course Objectives

Course and Contact Information. Course Description. Course Objectives San Jose State University College of Science Department of Computer Science CS157A, Introduction to Database Management Systems, Sections 1 and 2, Fall2016 Course and Contact Information Instructor: Dr.

More information

STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Introduction

STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Introduction STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Introduction You = 2 General Reasoning Scheme Data Conclusion 1010..00 0101..00 1010..11... Observations 3 General Reasoning Scheme

More information

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a !"#$ %#& ' Introduction ' Social network analysis ' Co-citation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,-/*,) Early search engines mainly compare content similarity of the query and

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Link Structure Analysis

Link Structure Analysis Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score

More information

CMPS 182: Introduction to Database Management Systems. Instructor: David Martin TA: Avi Kaushik. Syllabus

CMPS 182: Introduction to Database Management Systems. Instructor: David Martin TA: Avi Kaushik. Syllabus CMPS 182: Introduction to Database Management Systems Instructor: David Martin TA: Avi Kaushik Syllabus Course Content Relational database features & operations Data definition, queries and update Indexes,

More information

College Algebra. Cartesian Coordinates and Graphs. Dr. Nguyen August 22, Department of Mathematics UK

College Algebra. Cartesian Coordinates and Graphs. Dr. Nguyen August 22, Department of Mathematics UK College Algebra Cartesian Coordinates and Graphs Dr. Nguyen nicholas.nguyen@uky.edu Department of Mathematics UK August 22, 2018 Agenda Welcome x and y-coordinates in the Cartesian plane Graphs and solutions

More information

Operating Systems, Spring 2015 Course Syllabus

Operating Systems, Spring 2015 Course Syllabus Operating Systems, Spring 2015 Course Syllabus Instructor: Dr. Rafael Ubal Email: ubal@ece.neu.edu Office: 140 The Fenway, 3rd floor (see detailed directions below) Phone: 617-373-3895 Office hours: Wednesday

More information

CSE 114, Computer Science 1 Course Information. Spring 2017 Stony Brook University Instructor: Dr. Paul Fodor

CSE 114, Computer Science 1 Course Information. Spring 2017 Stony Brook University Instructor: Dr. Paul Fodor CSE 114, Computer Science 1 Course Information Spring 2017 Stony Brook University Instructor: Dr. Paul Fodor http://www.cs.stonybrook.edu/~cse114 Course Description Procedural and object-oriented programming

More information

COMP 401 COURSE OVERVIEW

COMP 401 COURSE OVERVIEW COMP 401 COURSE OVERVIEW Instructor: Prasun Dewan (FB 150, help401@cs.unc.edu) Course page: http://www.cs.unc.edu/~dewan/comp401/current/ COURSE PAGE Linked from my home page (google my name to find it)

More information

Information Retrieval. Lecture 11 - Link analysis

Information Retrieval. Lecture 11 - Link analysis Information Retrieval Lecture 11 - Link analysis Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 35 Introduction Link analysis: using hyperlinks

More information

Introduction to Information Technology ITP 101x (4 Units)

Introduction to Information Technology ITP 101x (4 Units) Objective Concepts Introduction to Information Technology ITP 101x (4 Units) Upon completing this course, students will: - Understand the fundamentals of information technology - Learn core concepts of

More information

Information Retrieval CS6200. Jesse Anderton College of Computer and Information Science Northeastern University

Information Retrieval CS6200. Jesse Anderton College of Computer and Information Science Northeastern University Information Retrieval CS6200 Jesse Anderton College of Computer and Information Science Northeastern University What is Information Retrieval? You have a collection of documents Books, web pages, journal

More information

Disability Resources and Educational Services (DRES)

Disability Resources and Educational Services (DRES) Disability Resources and Educational Services (DRES) Figure 1- Image of the instructor Log In - Find User Screen Student Access and Accommodation System (SAAS 2.0) Faculty Step-By-Step Procedures INTRODUCTION

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data: Part I Instructor: Yizhou Sun yzsun@ccs.neu.edu November 12, 2013 Announcement Homework 4 will be out tonight Due on 12/2 Next class will be canceled

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Pagerank Scoring. Imagine a browser doing a random walk on web pages:

Pagerank Scoring. Imagine a browser doing a random walk on web pages: Ranking Sec. 21.2 Pagerank Scoring Imagine a browser doing a random walk on web pages: Start at a random page At each step, go out of the current page along one of the links on that page, equiprobably

More information

CSE 494: Information Retrieval, Mining and Integration on the Internet

CSE 494: Information Retrieval, Mining and Integration on the Internet CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18 th Oct 2011 (Instructor: Subbarao Kambhampati) In-class Duration: Duration of the class 1hr 15min (75min) Total points:

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

CS-C Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web. Jaakko Hollmén, Department of Computer Science

CS-C Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web. Jaakko Hollmén, Department of Computer Science CS-C3160 - Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web Jaakko Hollmén, Department of Computer Science 30.10.2017-18.12.2017 1 Contents of this chapter Story

More information

PageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking

PageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 19, Issue 1, Ver. III (Jan.-Feb. 2017), PP 01-07 www.iosrjournals.org PageRank Algorithm Albi Dode 1, Silvester

More information

Social Network Analysis

Social Network Analysis Social Network Analysis Giri Iyengar Cornell University gi43@cornell.edu March 14, 2018 Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 1 / 24 Overview 1 Social Networks 2 HITS 3 Page

More information

CSci 4211: Introduction to Computer Networks. Time: Monday and Wednesday 2:30 to 3:45 pm Location: Smith Hall 231 Fall 2018, 3 Credits

CSci 4211: Introduction to Computer Networks. Time: Monday and Wednesday 2:30 to 3:45 pm Location: Smith Hall 231 Fall 2018, 3 Credits CSci 4211: Introduction to Computer Networks Time: Monday and Wednesday 2:30 to 3:45 pm Location: Smith Hall 231 Fall 2018, 3 Credits 1 Instructor David Hung-Chang Du Email: du@cs.umn.edu Office: Keller

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Parameters and Objects

Parameters and Objects Parameters and Objects CS + ENGLISH Enrich your computer science skills with the understanding of human experiences, critical thinking, and creativity taught in English. More info: english.stanford.edu/csenglish

More information

Brief (non-technical) history

Brief (non-technical) history Web Data Management Part 2 Advanced Topics in Database Management (INFSCI 2711) Textbooks: Database System Concepts - 2010 Introduction to Information Retrieval - 2008 Vladimir Zadorozhny, DINS, SCI, University

More information

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1,2 and 3, Spring 2017

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1,2 and 3, Spring 2017 San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1,2 and 3, Spring 2017 Course and Contact Information Instructor: Dr. Kim Office Location:

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Lecture 9 Decompositions of Graphs Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS Overview of Networks Instructor: Yizhou Sun yzsun@cs.ucla.edu January 10, 2017 Overview of Information Network Analysis Network Representation Network

More information

CLOVIS WEST DIRECTIVE STUDIES P.E INFORMATION SHEET

CLOVIS WEST DIRECTIVE STUDIES P.E INFORMATION SHEET CLOVIS WEST DIRECTIVE STUDIES P.E. 2018-19 INFORMATION SHEET INSTRUCTORS: Peggy Rigby peggyrigby@cusd.com 327-2104. Vance Walberg vancewalberg@cusd.com 327-2098 PURPOSE: Clovis West High School offers

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Introduction to CS 4604

Introduction to CS 4604 Introduction to CS 4604 T. M. Murali August 23, 2010 Course Information Instructor T. M. Murali, 2160B Torgerson, 231-8534, murali@cs.vt.edu Office Hours: 9:30am 11:30am Mondays and Wednesdays Teaching

More information

PageRank and related algorithms

PageRank and related algorithms PageRank and related algorithms PageRank and HITS Jacob Kogan Department of Mathematics and Statistics University of Maryland, Baltimore County Baltimore, Maryland 21250 kogan@umbc.edu May 15, 2006 Basic

More information

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1, 2, and 3, Spring 2018

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1, 2, and 3, Spring 2018 San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1, 2, and 3, Spring 2018 Course and Contact Information Instructor: Suneuy Kim Office

More information

ENGR 105: Introduction to Scientific Computing. Dr. Graham. E. Wabiszewski

ENGR 105: Introduction to Scientific Computing. Dr. Graham. E. Wabiszewski ENGR 105: Introduction to Scientific Computing Machine Model, Matlab Interface, Built-in Functions, and Arrays Dr. Graham. E. Wabiszewski ENGR 105 Lecture 02 Answers to questions from last lecture Office

More information

CS 245: Database System Principles

CS 245: Database System Principles CS 245: Database System Principles Notes 01: Introduction Peter Bailis CS 245 Notes 1 1 This course pioneered by Hector Garcia-Molina All credit due to Hector All mistakes due to Peter CS 245 Notes 1 2

More information

Database Systems: Concepts, design, and implementation ISE 382 (3 Units)

Database Systems: Concepts, design, and implementation ISE 382 (3 Units) Database Systems: Concepts, design, and implementation ISE 382 (3 Units) Spring 2013 Description Obectives Instructor Contact Information Office Hours Concepts in modeling data for industry applications.

More information

INF 315E Introduction to Databases School of Information Fall 2015

INF 315E Introduction to Databases School of Information Fall 2015 INF 315E Introduction to Databases School of Information Fall 2015 Class Hours: Tuesday & Thursday10:30 am-12:00 pm Instructor: Eunyoung Moon Email: eymoon@utexas.edu Course Description Almost every website

More information

Lecture 17: Hash Tables, Maps, Finish Linked Lists

Lecture 17: Hash Tables, Maps, Finish Linked Lists ....... \ \ \ / / / / \ \ \ \ / \ / \ \ \ V /,----' / ^ \ \.--..--. / ^ \ `--- ----` / ^ \. ` > < / /_\ \. ` / /_\ \ / /_\ \ `--' \ /. \ `----. / \ \ '--' '--' / \ / \ \ / \ / / \ \ (_ ) \ (_ ) / / \ \

More information

Announcements. 1. Forms to return today after class:

Announcements. 1. Forms to return today after class: Announcements Handouts (3) to pick up 1. Forms to return today after class: Pretest (take during class later) Laptop information form (fill out during class later) Academic honesty form (must sign) 2.

More information

61A Lecture 7. Monday, September 15

61A Lecture 7. Monday, September 15 61A Lecture 7 Monday, September 15 Announcements Homework 2 due Monday 9/15 at 11:59pm Project 1 deadline extended, due Thursday 9/18 at 11:59pm! Extra credit point if you submit by Wednesday 9/17 at 11:59pm

More information

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 12, 2015 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5) INTRODUCTION TO DATA SCIENCE Link Analysis (MMDS5) Introduction Motivation: accurate web search Spammers: want you to land on their pages Google s PageRank and variants TrustRank Hubs and Authorities (HITS)

More information

Wastewater Treatment Level 1 Operator Training and Certification Exam

Wastewater Treatment Level 1 Operator Training and Certification Exam Wastewater Treatment Level 1 Operator Training and Certification Exam July 10 14, 2017 Hosted by: Shakopee Mdewakanton Sioux Community Mystic Lake Casino Hotel Includes: FREE Training for Tribal Wastewater

More information

Advanced Computer Architecture: A Google Search Engine

Advanced Computer Architecture: A Google Search Engine Advanced Computer Architecture: A Google Search Engine Jeremy Bradley Room 372. Office hour - Thursdays at 3pm. Email: jb@doc.ic.ac.uk Course notes: http://www.doc.ic.ac.uk/ jb/ Department of Computing,

More information

Extegrity Exam4 Take Home Exam Guide

Extegrity Exam4 Take Home Exam Guide Extegrity Exam4 Take Home Exam Guide IMPORTANT NOTE: While students may complete a take-home exam any time during the designated time as set by each faculty member, technology support is only available

More information

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization

More information

Linear Algebra Math 203 section 003 Fall 2018

Linear Algebra Math 203 section 003 Fall 2018 Linear Algebra Math 203 section 003 Fall 2018 Mondays and Wednesdays from 7:20 pm to 8:35 pm, in Planetary Hall room 131. Instructor: Dr. Keith Fox Email: kfox@gmu.edu Office: Exploratory Hall Room 4405.

More information

Collaborative filtering based on a random walk model on a graph

Collaborative filtering based on a random walk model on a graph Collaborative filtering based on a random walk model on a graph Marco Saerens, Francois Fouss, Alain Pirotte, Luh Yen, Pierre Dupont (UCL) Jean-Michel Renders (Xerox Research Europe) Some recent methods:

More information

Single link clustering: 11/7: Lecture 18. Clustering Heuristics 1

Single link clustering: 11/7: Lecture 18. Clustering Heuristics 1 Graphs and Networks Page /7: Lecture 8. Clustering Heuristics Wednesday, November 8, 26 8:49 AM Today we will talk about clustering and partitioning in graphs, and sometimes in data sets. Partitioning

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Course and Contact Information. Course Description. Course Objectives

Course and Contact Information. Course Description. Course Objectives San Jose State University College of Science Department of Computer Science CS157A, Introduction to Database Management Systems, Sections 1 and 2, Fall2017 Course and Contact Information Instructor: Dr.

More information

Introduction to Data Structures

Introduction to Data Structures 15-121 Introduction to Data Structures Lecture #1 Introduction 28 August 2019 Margaret Reid-Miller Today Course Administration Overview of Course A (very basic) Java introduction Course website: www.cs.cmu.edu/~mrmiller/15-121

More information

Lab Preparing for the ICND1 Exam

Lab Preparing for the ICND1 Exam Lab 9.6.6.1 Preparing for the ICND1 Exam Objectives: Determine what activities you can give up or cut back to make time to prepare for the exam. Create a schedule to guide your exam preparation. Schedule

More information

ESET 349 Microcontroller Architecture, Fall 2018

ESET 349 Microcontroller Architecture, Fall 2018 ESET 349 Microcontroller Architecture, Fall 2018 Syllabus Contact Information: Professor: Dr. Byul Hur Office: 008 Fermier Telephone: (979) 845-5195 FAX: E-mail: byulmail@tamu.edu Web: rftestgroup.tamu.edu

More information

In this course, you need to use Pearson etext. Go to "Pearson etext and Video Notes".

In this course, you need to use Pearson etext. Go to Pearson etext and Video Notes. **Disclaimer** This syllabus is to be used as a guideline only. The information provided is a summary of topics to be covered in the class. Information contained in this document such as assignments, grading

More information

COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang

COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era Spring 2016 Instructor: Grace Hui Yang The Web provides abundant information which allows us to live more conveniently and

More information

Graph and Link Mining

Graph and Link Mining Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }

More information

Math 152: Applicable Mathematics and Computing

Math 152: Applicable Mathematics and Computing Math 152: Applicable Mathematics and Computing April 10, 2017 April 10, 2017 1 / 12 Announcements Don t forget, first homework is due on Wednesday. Each TA has their own drop-box. Please provide justification

More information

Change in Schedule. I'm changing the schedule around a bit...

Change in Schedule. I'm changing the schedule around a bit... Graphs Announcements() { Change in Schedule I'm changing the schedule around a bit... Today: Graphs Tuesday: Shortest Path Algorithms Wednesday: Minimum Spanning Trees Tursday: Review Session for Midterm

More information

DSCI 575: Advanced Machine Learning. PageRank Winter 2018

DSCI 575: Advanced Machine Learning. PageRank Winter 2018 DSCI 575: Advanced Machine Learning PageRank Winter 2018 http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf Web Search before Google Unsupervised Graph-Based Ranking We want to rank importance based on

More information

ECE573 Introduction to Compilers & Translators

ECE573 Introduction to Compilers & Translators ECE573 Introduction to Compilers & Translators Tentative Syllabus Fall 2005 Tu/Th 9:00-10:15 AM, EE 115 Instructor Prof. R. Eigenmann Tel 49-41741 Email eigenman@ecn Office EE334C Office Hours Tu 10:15-11:30

More information

Course and Contact Information. Catalog Description. Course Objectives

Course and Contact Information. Catalog Description. Course Objectives San Jose State University College of Science Department of Computer Science CS157A, Introduction to Database Management Systems, Sections 1 and 2, Fall2015 Course and Contact Information Instructor: Dr.

More information

CSCI 4250/6250 Fall 2013 Computer and Network Security. Instructor: Prof. Roberto Perdisci

CSCI 4250/6250 Fall 2013 Computer and Network Security. Instructor: Prof. Roberto Perdisci CSCI 4250/6250 Fall 2013 Computer and Network Security Instructor: Prof. Roberto Perdisci perdisci@cs.uga.edu CSCI 4250/6250 What is the purpose of this course? Combined Undergrad/Graduate Intro to Computer

More information

AIMS FREQUENTLY ASKED QUESTIONS: STUDENTS

AIMS FREQUENTLY ASKED QUESTIONS: STUDENTS AIMS FREQUENTLY ASKED QUESTIONS: STUDENTS CONTENTS Login Difficulties- Timed Out... 2 What it Looks Like... 2 What s Actually Happening... 2 Solution... 2 Login Difficulties - Expired... 3 What It Looks

More information

CprE 281: Digital Logic

CprE 281: Digital Logic CprE 281: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Intro to Verilog CprE 281: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev

More information

Practice Midterm Examination #1

Practice Midterm Examination #1 Eric Roberts Handout #35 CS106A May 2, 2012 Practice Midterm Examination #1 Review session: Sunday, May 6, 7:00 9:00 P.M., Hewlett 200 Midterm exams: Tuesday, May 8, 9:00 11:00 A.M., CEMEX Auditorium Tuesday,

More information

Pearson Edexcel Award

Pearson Edexcel Award Pearson Edexcel Award May June 2018 Examination Timetable FINAL For more information on Edexcel qualifications please visit http://qualifications.pearson.com v3 Pearson Edexcel Award 2018 Examination View

More information

CSE 504: Compiler Design

CSE 504: Compiler Design http://xkcd.com/303/ Compiler Design Course Organization CSE 504 1 / 20 CSE 504: Compiler Design http://www.cs.stonybrook.edu/~cse504/ Mon., Wed. 2:30pm 3:50pm Harriman Hall 116 C. R. Ramakrishnan e-mail:

More information

Intermediate Programming Section 03 Introduction. Department of Computer Science Johns Hopkins University. Course Overview.

Intermediate Programming Section 03 Introduction. Department of Computer Science Johns Hopkins University. Course Overview. Intermediate Programming 601.220 Section 03 Introduction Department of Computer Science Johns Hopkins University 1 Course Overview Week 1 http://www.dsn.jhu.edu/courses/cs220/ cs220-help@dsn.jhu.edu 2

More information

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola Mathematical Methods and Computational Algorithms for Complex Networks Benard Abola Division of Applied Mathematics, Mälardalen University Department of Mathematics, Makerere University Second Network

More information

CS60092: Informa0on Retrieval

CS60092: Informa0on Retrieval Introduc)on to CS60092: Informa0on Retrieval Sourangshu Bha1acharya Today s lecture hypertext and links We look beyond the content of documents We begin to look at the hyperlinks between them Address ques)ons

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

EECS 395/495: Web Information Retrieval and Extraction. Fall 2012

EECS 395/495: Web Information Retrieval and Extraction. Fall 2012 EECS 395/495: Web Information Retrieval and Extraction Fall 2012 Outline Introductions Course goals and logistics Why take this class? What is Web Search? What will it be in five years? Ten years? Introductions

More information

Course Syllabus. Course Information

Course Syllabus. Course Information Course Syllabus Course Information Course: MIS 6326 Data Management Term: Fall 2015 Section: 002 Meets: Monday and Wednesday 2:30 pm to 3:45 pm JSOM 11.210 Professor Contact Information Instructor: Email:

More information