Social Network Analysis

Similar documents
Information Retrieval. Lecture 11 - Link analysis

PageRank and related algorithms

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Social Networks 2015 Lecture 10: The structure of the web and link analysis

Link Analysis and Web Search

COMP 4601 Hubs and Authorities

CS-C Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web. Jaakko Hollmén, Department of Computer Science

COMP5331: Knowledge Discovery and Data Mining

Social Network Analysis

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

CSI 445/660 Part 10 (Link Analysis and Web Search)

Einführung in Web und Data Science Community Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme

Information Retrieval and Web Search

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

Web Structure Mining using Link Analysis Algorithms

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Pagerank Scoring. Imagine a browser doing a random walk on web pages:

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)

A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE

Extracting Information from Complex Networks

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

CS6200 Information Retreival. The WebGraph. July 13, 2015

Web Search Ranking. (COSC 488) Nazli Goharian Evaluation of Web Search Engines: High Precision Search

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

Information Retrieval and Web Search Engines

Lecture 17 November 7

Mining Web Data. Lijun Zhang

Degree Distribution: The case of Citation Networks

Generalized Social Networks. Social Networks and Ranking. How use links to improve information search? Hypertext

Graph Theory Problem Ideas

Authoritative Sources in a Hyperlinked Environment

Lecture 8: Linkage algorithms and web search

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

PageRank. CS16: Introduction to Data Structures & Algorithms Spring 2018

Experimental study of Web Page Ranking Algorithms

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

Information Networks: Hubs and Authorities

Algebraic Graph Theory- Adjacency Matrix and Spectrum

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Data mining --- mining graphs

The application of Randomized HITS algorithm in the fund trading network

Lecture 3: Descriptive Statistics

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)!

Lecture 27: Learning from relational data

PageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking

TODAY S LECTURE HYPERTEXT AND

How to organize the Web?

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Ranking of nodes of networks taking into account the power function of its weight of connections

Motivation. Motivation

COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

Introduction aux Systèmes Collaboratifs Multi-Agents

Mining Social Network Graphs

Complex-Network Modelling and Inference

Recent Researches on Web Page Ranking

A Reordering for the PageRank problem

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Link-based Object Classification (LOC) Link-based Object Ranking (LOR)... 79

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

Path Analysis References: Ch.10, Data Mining Techniques By M.Berry, andg.linoff Dr Ahmed Rafea

A Review Paper on Page Ranking Algorithms

Large-Scale Networks. PageRank. Dr Vincent Gramoli Lecturer School of Information Technologies

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Unit VIII. Chapter 9. Link Analysis

Hyperlink-Induced Topic Search (HITS) over Wikipedia Articles using Apache Spark

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

16 - Networks and PageRank

Inf 496/596 Topics in Informatics: Analysis of Social Network Data

5/30/2014. Acknowledgement. In this segment: Search Engine Architecture. Collecting Text. System Architecture. Web Information Retrieval

Information Networks: PageRank

Collaborative filtering based on a random walk model on a graph

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

Graph and Web Mining - Motivation, Applications and Algorithms PROF. EHUD GUDES DEPARTMENT OF COMPUTER SCIENCE BEN-GURION UNIVERSITY, ISRAEL

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

Centralities (4) Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California Excellence Through Knowledge

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

SGN (4 cr) Chapter 11

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

Social-Network Graphs

Part 1: Link Analysis & Page Rank

Bibliometrics: Citation Analysis

Network Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Social Network Analysis

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

F. Aiolli - Sistemi Informativi 2007/2008. Web Search before Google

Special-topic lecture bioinformatics: Mathematics of Biological Networks

Algorithms, Games, and Networks February 21, Lecture 12

Transcription:

Social Network Analysis Giri Iyengar Cornell University gi43@cornell.edu March 14, 2018 Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 1 / 24

Overview 1 Social Networks 2 HITS 3 Page Rank Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 2 / 24

Overview 1 Social Networks 2 HITS 3 Page Rank Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 3 / 24

Social Networks Facebook LinkedIn Twitter Instagram Snapchat WhatsApp Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 4 / 24

Social Networks: Some common questions asked Bridge nodes Node Centrality Between-ness Closeness Hierarchy Density Network open-ness Figure: By Claudio Rocchini Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 5 / 24

Other types of networks Co-Authorship (undirected) Citation (directed) Disorder - Gene associations (directed) Protein - Protein interaction networks (undirected) Disease networks (directed) Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 6 / 24

Representing Networks by Matrices Adjacency Matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. If the graph is undirected, the adjacency matrix is symmetric Figure: from Wikipedia Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 7 / 24

Representing Networks by Matrices Degree Matrix In the mathematical field of graph theory the degree matrix is a diagonal matrix which contains information about the degree of each vertex. That is, the number of edges attached to each vertex. It is used together with the adjacency matrix to construct the Laplacian matrix of a graph. Figure: from Wikipedia Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 8 / 24

Closeness Centrality Closeness of a Vertex Closeness of a vertex is defined as the inverse of the distance between that 1 vertex and all other vertices in a Graph: C(x) = d(x,y). In directed y graphs, a more central node has more links pointing to it. Extension to weakly connected graphs C(x) = y x 1 d(x,y) We can also replace the Arithmetic mean by Harmonic mean above to get Harmonic Centrality Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 9 / 24

Degree Centrality Degree Centrality of a Vertex C(x) = deg(x) where deg(x) is either incoming (popularity) or outgoing (gregariousness) degree Degree Centrality of a whole graph Find the most central node, (v ) For any graph of same number of vertices, what is the maximum possible H = max y C(y ) C(y) This happens when you have a star-connected graph. One central node and all other nodes connected to it Compute C(G) = v C(v ) C(v) H Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 10 / 24

Overview 1 Social Networks 2 HITS 3 Page Rank Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 11 / 24

HITS: A Measure of node centrality For each node, give it two scores Hub score Authority score Invented by Jon Kleinberg, Cornell University Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 12 / 24

HITS Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that they held, but were used as compilations of a broad catalog of information that led users direct to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs. The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. Source: Wikipedia Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 13 / 24

HITS Intuition Journals such as Science and Nature receive a lot of citations and are therefore considered high impact. Consider two relatively obscure journals that receive roughly the same number of citations. The one that has more citations from Science or Nature should be more authoritative. Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 14 / 24

HITS Intuition In the Internet, the number of incoming links represents roughly how important a site is. However, even if it has only a few links coming in, if they are coming from Google, then it probably is more important. Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 15 / 24

HITS: Algorithm Sketch Perform a query and get a root set Expand the root set into a base set by collecting all pages pointed by this set and some of the links that point to this set Figure: https://commons.wikimedia.org/wiki/file:setsen.jpg Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 16 / 24

HITS: Algorithm Sketch On this focused sub-graph, the HITS computation is performed. Initialize all scores to 1. Authority Update: Update each node s Authority score to be equal to the sum of the Hub Scores of each node that points to it. That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information. Hub Update: Update each node s Hub Score to be equal to the sum of the Authority Scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. Normalize the values and repeat as necessary Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 17 / 24

Overview 1 Social Networks 2 HITS 3 Page Rank Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 18 / 24

Page Rank Page Rank PageRank is an algorithm used by Google Search to rank websites in their search engine results. PageRank was named after Larry Page, one of the founders of Google. It is a way of measuring the importance of website pages. PageRank works by counting the number and quality of links to a webpage to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites [1]. Rank of a URL r(p i ) = r(p j ) P P j B j Pi Where B Pi is the set of pages pointing into P i and P i is the number of outlinks from P i. (1) Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 19 / 24

Page Rank Figure: Toy Example with 6 URLs. Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 20 / 24

Page Rank for Toy Universe Table 1 shows the PageRank results after first two iterations of the calculation. Table: First few iterations of PageRank calculations on a tiny web with 6 Urls Iteration 0 Iteration 1 Iteration 2 Rank at Iteration 2 r 0 (P 1 ) = 1 6 r 1 (P 1 ) = 1 24 r 2 (P 1 ) = 1 12 6 r 0 (P 2 ) = 1 6 r 1 (P 2 ) = 1 8 r 2 (P 2 ) = 5 48 5 r 0 (P 3 ) = 1 6 r 1 (P 3 ) = 1 3 r 2 (P 3 ) = 1 4 1 r 0 (P 4 ) = 1 6 r 1 (P 4 ) = 1 6 r 2 (P 4 ) = 1 6 3 r 0 (P 5 ) = 1 6 r 1 (P 5 ) = 1 8 r 2 (P 5 ) = 1 6 4 r 0 (P 6 ) = 1 6 r 1 (P 6 ) = 5 24 r 2 (P 6 ) = 11 48 2 Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 21 / 24

Page Rank Iterative calculation of Page Rank If we define a matrix A such that each element A ji = 1 P i if there is a link from page i to page j and 0 otherwise. Then, we get r i+1 = A r i. Let r be the final probability vector. We want r such that r = A r Random browser extension - to make things work What happens if the graph is disconnected? Or there are some one-way paths that prevent you from coming back? Let s assume that most of the time the user will follow one of the links in the page but sometimes, they get bored and start at a new URL somewhere. This makes the matrix A well-behaved. Removes all the zeros. Updates the matrix to B = 0.85 A + 0.15 n Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 22 / 24

Page Rank: Perron Frobenius Theorem Perron Frobenius Theorem states Any square matrix A with positive entries has a unique eigenvector with positive entries (up to a multiplication by a positive scalar), and the corresponding eigenvalue has multiplicity one and is strictly greater than the absolute value of any other eigenvalue. Markov matrices If all entries of a Markov matrix A are positive then A has a unique equilibrium: there is only one eigenvalue 1. All other eigenvalues are smaller than 1. Page Rank: The eigenvector corresponding to the eigenvalue of 1. Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 23 / 24

Giri Iyengar (Cornell Tech) Social Network Analysis March 14, 2018 24 / 24