Social Networks. Slides by: I. Koutsopoulos (AUEB). Source: L. Adamic, SN Analysis, Coursera course

Introduction

Political blogs

Organizations

Facebook networks

Ingredient networks

SN representation

What are networks? Networks are sets of nodes connected by edges ("network" is used interchangeably with "graph"). Terminology by field:
points | lines | field
vertices | edges, arcs | math
nodes | links | computer science
sites | bonds | physics
actors | ties, relations | sociology

Network elements: edges. Directed (also called arcs, links): A -> B, e.g. A likes B, A gave a gift to B, A is B's child. Undirected: A <-> B or A - B, e.g. A and B like each other, A and B are siblings, A and B are co-authors.

Edge attributes. Examples: weight (e.g. frequency of communication); ranking (best friend, second best friend, ...); type (friend, relative, co-worker); properties depending on the structure of the rest of the graph, e.g. betweenness.

Directed networks: girls' school dormitory dining-table partners, 1st and 2nd choices (Moreno, The sociometry reader, 1960). Nodes in the figure: Ada, Louise, Lena, Marion, Adele, Jane, Cora, Eva, Frances, Maxine, Mary, Robin, Martha, Anna, Edna, Betty, Ruth, Jean, Alice, Laura, Helen, Ellen, Hazel, Hilda, Ella, Irene.

Data representation: adjacency matrix, edge list, adjacency list

Adjacency matrices: representing edges (who is adjacent to whom) as a matrix. $A_{ij} = 1$ if node $i$ has an edge to node $j$, and $A_{ij} = 0$ if node $i$ does not have an edge to $j$. $A_{ii} = 0$ unless the network has self-loops. $A_{ij} = A_{ji}$ if the network is undirected, or if $i$ and $j$ share a reciprocated edge.

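Example adjacency matrix (directed graph on nodes 1 to 5; row $i$, column $j$):
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$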

Edge list (same graph on nodes 1 to 5): (2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)

Adjacency lists: an adjacency list is easier to work with if the network is large and sparse, and it lets you quickly retrieve all neighbors of a node. For the same graph on nodes 1 to 5:
1: (no outgoing edges)
2: 3, 4
3: 2, 4
4: 5
5: 1, 2
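The three representations carry the same information, so one can be built from another. A minimal sketch in Python, assuming the directed 5-node example graph from these slides; the function names `to_adjacency_matrix` and `to_adjacency_list` are illustrative and not part of the original deck:

```python
# Edge list of the directed example graph: (source, target) pairs.
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
nodes = [1, 2, 3, 4, 5]


def to_adjacency_matrix(nodes, edges):
    """Return A as a list of rows, with A[i][j] = 1 if node i has an edge to node j."""
    index = {v: k for k, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def to_adjacency_list(nodes, edges):
    """Return a dict mapping each node to the list of nodes it points to."""
    adj = {v: [] for v in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    return adj


A = to_adjacency_matrix(nodes, edges)
adj = to_adjacency_list(nodes, edges)
for row in A:
    print(row)
print(adj)
```

The matrix needs N*N entries regardless of how many edges exist, while the adjacency list only stores the edges that are present, which is why it suits large sparse networks.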

Computing metrics degree & degree distribution connected components

Degree: which node has the most edges?

Node network properties. From immediate connections: indegree (how many directed edges (arcs) are incident on a node); outdegree (how many directed edges (arcs) originate at a node); degree, in or out (number of edges incident on a node). Example node in the figure: indegree = 3, outdegree = 2, degree = 5. From the entire graph: centrality (betweenness, closeness).
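Degrees can be read directly off an edge list. The sketch below assumes the directed 5-node example graph used for the data representations and counts total degree as indegree plus outdegree, as in the slide's example; the variable names are illustrative:

```python
from collections import Counter

# Directed example graph: (source, target) pairs.
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
nodes = [1, 2, 3, 4, 5]

# Counter returns 0 for nodes that never appear as a source or target.
outdegree = Counter(src for src, _ in edges)
indegree = Counter(dst for _, dst in edges)
degree = {v: indegree[v] + outdegree[v] for v in nodes}

# Degree distribution: how many nodes have each total degree.
degree_distribution = Counter(degree.values())

for v in nodes:
    print(v, "in:", indegree[v], "out:", outdegree[v], "total:", degree[v])
print(dict(degree_distribution))
```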

Is everything connected?

Connected components. Strongly connected components: each node within the component can be reached from every other node in the component by following directed links. Weakly connected components: every node can be reached from every other node by following links in either direction. In undirected networks one talks simply about connected components. [Figure: example directed graph on nodes A to H, shown partitioned into its strongly and its weakly connected components.]
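Both kinds of components can be found with breadth-first search. The sketch below is one simple way to do it, not the fastest: the strongly connected component of a node is taken as the intersection of what it can reach and what can reach it. The example graph is the directed 5-node graph from the data-representation slides, and all function names are illustrative:

```python
from collections import deque


def reachable(start, adj):
    """Set of nodes reachable from start by following the links in adj (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def components(nodes, edges, strong):
    """Strongly (strong=True) or weakly (strong=False) connected components."""
    forward = {v: set() for v in nodes}
    backward = {v: set() for v in nodes}
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    either = {v: forward[v] | backward[v] for v in nodes}

    result, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        if strong:
            # Nodes v can reach AND that can reach v.
            comp = reachable(v, forward) & reachable(v, backward)
        else:
            # Ignore edge direction.
            comp = reachable(v, either)
        result.append(comp)
        assigned |= comp
    return result


edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
nodes = [1, 2, 3, 4, 5]
strong_components = components(nodes, edges, strong=True)
weak_components = components(nodes, edges, strong=False)
print(strong_components)  # node 1 has no outgoing edges, so it is on its own
print(weak_components)
```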

Giant component: if the largest component encompasses a significant fraction of the graph, it is called the giant component

Erdős and Rényi

Erdős-Rényi: simplest network model. Assumptions: nodes connect at random; the network is undirected. Key parameter (besides the number of nodes N): p or M, where p = probability that any two nodes share an edge, and M = total number of edges in the graph.
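Both variants of the model are short to write down. A minimal sketch using only the Python standard library; the function names `gnp_graph` and `gnm_graph` and the `seed` parameter are illustrative choices, not taken from the slides:

```python
import random
from itertools import combinations


def gnp_graph(n, p, seed=None):
    """Undirected random graph: each of the n*(n-1)/2 node pairs gets an edge with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def gnm_graph(n, m, seed=None):
    """Undirected random graph with exactly m edges, chosen uniformly among all node pairs."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)


print(len(gnp_graph(100, 0.05, seed=1)))  # number of edges varies with the seed
print(len(gnm_graph(100, 250, seed=1)))   # always exactly 250 edges
```

With p the number of edges is random, with an expected value of p * N * (N - 1) / 2; with M it is fixed in advance.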

what they look like after spring layout

Degree distribution. (N,p)-model: for each potential edge we flip a biased coin; with probability p we add the edge, with probability (1-p) we don't.
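Since each of a node's N - 1 potential edges is an independent coin flip with success probability p, the degree of a node follows a binomial distribution with mean (N - 1) p. A small sketch of that calculation; the function names are illustrative:

```python
from math import comb


def degree_pmf(n, p, k):
    """Probability that a given node has degree k in the (N,p)-model with n nodes."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def average_degree(n, p):
    """Expected degree z of a node: (n - 1) * p."""
    return (n - 1) * p


n, p = 50, 0.1
for k in range(10):
    print(k, round(degree_pmf(n, p, k), 4))
print("average degree:", average_degree(n, p))
```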

Emergence of the giant component http://ccl.northwestern.edu/netlogo/models/giantcomponent

Percolation on a 2D lattice http://www.ladamic.com/netlearn/netlogo501/latticepercolation.html

Percolation threshold: how many edges need to be added before the giant component appears? As the average degree increases to z = 1, a giant component suddenly appears. [Figure: size of giant component vs. average degree, with example networks at av deg = 0.99, av deg = 1.18, av deg = 3.96.]
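The threshold can be explored numerically. The sketch below is an illustrative simulation, not the NetLogo model linked above: it builds an (N,p) random graph with a chosen average degree, tracks components with a union-find structure, and reports the fraction of nodes in the largest one. The function name, the network size, and the seed are assumptions made for the example:

```python
import random
from itertools import combinations


def largest_component_fraction(n, avg_degree, seed=0):
    """Fraction of nodes in the largest component of an (N,p) graph with p = avg_degree / (n - 1)."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(i)] = find(j)  # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


# Average degrees taken from the figure on the slide.
for z in (0.99, 1.18, 3.96):
    print(z, largest_component_fraction(500, z))
```

Well below z = 1 the largest component stays a small fraction of the network; well above it, most nodes belong to it. Close to z = 1 the result fluctuates noticeably from run to run.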

Giant component, another angle: how many other friends besides you does each of your friends have? By a property of the degree distribution, the average degree of your friends, you excluded, is z. So at z = 1, each of your friends is expected to have another friend, who in turn has another friend, etc., and the giant component emerges.

Average shortest path: how many hops on average between each pair of nodes? Again, each of your friends has z = avg. degree friends besides you. Ignoring loops, the number of people you have at distance $l$ is $z^l$.

Friends at distance $l$: $N_l = z^l$. Scaling of the average shortest path $l_{av}$: the whole network is reached once $z^{l_{av}} \approx N$, so $l_{av} \sim \log N / \log z$.
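The scaling relation is easy to evaluate for concrete numbers. A tiny sketch, where the function name is illustrative and the result is only the rough estimate $\log N / \log z$, not an exact average path length:

```python
import math


def estimated_average_path_length(n, z):
    """Rough estimate log(n) / log(z) for a random graph with n nodes and average degree z > 1."""
    if z <= 1:
        raise ValueError("the estimate only makes sense for average degree z > 1")
    return math.log(n) / math.log(z)


# With 1000 nodes and average degree 10, about 3 hops separate a typical pair.
print(estimated_average_path_length(1000, 10))
```

Because the estimate grows only logarithmically with N, multiplying the network size by 1000 adds just three more hops at z = 10.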

Between-ness

is counting the edges enough?

Stanford Social Web (ca. 1999) network of personal homepages at Stanford

different notions of centrality: in each of the following networks, X has higher centrality than Y according to a particular measure. [Figure: four networks with nodes X and Y, one each for indegree, outdegree, betweenness, closeness.]

what does degree not capture? In what ways does degree fail to capture centrality in the following graphs?

Brokerage not captured by degree [Figure: network with nodes X and Y.]

betweenness: capturing brokerage. Intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops? [Figure: network with nodes X and Y.]

betweenness: definition
$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
where $g_{jk}$ = the number of shortest paths connecting $j$ and $k$, and $g_{jk}(i)$ = the number of those paths that actor $i$ is on. Usually normalized by the number of pairs of vertices excluding the vertex itself:
$$C'_B(i) = \frac{C_B(i)}{(n-1)(n-2)/2}$$

betweenness on toy networks, non-normalized version, for the line network A - B - C - D - E: A lies between no two other vertices. B lies between A and 3 other vertices: C, D, and E. C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E). Note that there are no alternate paths for these pairs to take, so C gets full credit.
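The counts on this slide can be checked with a short program. The sketch below is a straightforward (not the most efficient) way to compute non-normalized betweenness on an undirected graph: it counts shortest paths with breadth-first search and then sums, over all pairs (j, k), the share of shortest paths that pass through each node. It assumes the toy network is the line A - B - C - D - E, which matches the counts stated above; the function names are illustrative:

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(source, adj):
    """BFS from source: distance to every reachable node and number of shortest paths to it."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def betweenness(adj):
    """Non-normalized betweenness C_B(i) = sum over pairs j<k of g_jk(i) / g_jk (undirected graph)."""
    info = {s: shortest_path_counts(s, adj) for s in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, count_j = info[j]
        dist_k, count_k = info[k]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            # i is on a shortest j-k path exactly when the distances add up.
            if dist_j[i] + dist_k[i] == dist_j[k]:
                score[i] += count_j[i] * count_k[i] / count_j[k]
    return score


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(line)
print(scores)  # {'A': 0.0, 'B': 3.0, 'C': 4.0, 'D': 3.0, 'E': 0.0}

n = len(line)
normalized = {v: s / ((n - 1) * (n - 2) / 2) for v, s in scores.items()}
print(normalized)
```

When several shortest paths connect a pair, the division by the path count makes the intermediate nodes share the credit.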

betweenness on toy networks non-normalized version:

betweenness on toy networks, non-normalized version [Figure: network with nodes A, B, C, D, E.]: why do C and D each have betweenness 1? They are both on shortest paths for pairs (A,E) and (B,E), and so must share credit: ½ + ½ = 1.

closeness: what if it's not so important to have many direct friends, or to be between others, but one still wants to be in the middle of things, not too far from the center?

need not be in a brokerage position [Figure: example networks with nodes X and Y.]

closeness: definition. Closeness is based on the length of the average shortest path between a node and all other nodes in the network. Closeness centrality:
$$C_C(i) = \left[\sum_{j=1}^{n} d(i,j)\right]^{-1}$$
Normalized closeness centrality:
$$C'_C(i) = (n-1)\,C_C(i)$$
that is, the inverse of the average distance from $i$ to the other $n-1$ nodes.
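Closeness follows directly from breadth-first-search distances. The sketch below assumes a connected undirected graph and normalizes by multiplying with n - 1, so that the normalized value is the inverse of the average distance to the other nodes. The line network A - B - C - D - E and the function names are illustrative choices:

```python
from collections import deque


def distances_from(source, adj):
    """Hop distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Inverse of the sum of distances from node to all other nodes (connected graph assumed)."""
    return 1 / sum(distances_from(node, adj).values())


def normalized_closeness(adj, node):
    """Closeness multiplied by (n - 1): the inverse of the average distance."""
    return (len(adj) - 1) * closeness(adj, node)


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
for v in line:
    print(v, round(normalized_closeness(line, v), 3))
```

For the end node A the distances are 1, 2, 3 and 4, so its normalized closeness is 4 / 10 = 0.4, while the middle node C scores 4 / 6.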

Eigenvector centrality How central you are depends on how central your neighbors are

Eigenvector centrality in directed networks. PageRank brings order to the Web: it's not just the pages that point to you, but how many pages point to those pages, etc. With a recursive definition it is more difficult to artificially inflate centrality. [Figure: many webpages scattered across the web, and an important page, e.g. slashdot; if a web page is slashdotted, it gains attention.]

Ranking pages by tracking a drunk A random walker following edges in a network for a very long time will spend a proportion of time at each node which can be used as a measure of importance

Trapping a drunk Problem with pure random walk metric: Drunk can be trapped and end up going in circles

Ingenuity of the PageRank algorithm Allow drunk to teleport with some probability e.g. random websurfer follows links for a while, but with some probability teleports to a random page (bookmarked page or uses a search engine to start anew)

PageRank example: probable location of random walker after 1 step, with 20% teleportation probability. [Figure: 8-node network with bar charts of the walker's location probability over nodes 1 to 8 at t=0 and at t=1.] Slide adapted from: Dragomir Radev
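The random walk with teleportation can be simulated by repeatedly updating the probability of finding the walker at each node. A minimal sketch, assuming a 20% teleportation probability as in the example; the 4-page graph `web` is hypothetical (the 8-node network of the figure is not recoverable from the transcript), and sending the walker to a uniformly random page when it sits on a page with no outgoing links is an assumption of this sketch:

```python
def pagerank(out_links, teleport=0.2, steps=100):
    """Probability of finding the random walker at each node after `steps` steps."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # walker starts at a uniformly random node
    for _ in range(steps):
        # With probability `teleport` the walker jumps to a uniformly random node.
        new_rank = {v: teleport / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # Otherwise it follows one of the outgoing links, chosen uniformly.
                share = (1 - teleport) * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: from a page without links, jump to a random page.
                for t in nodes:
                    new_rank[t] += (1 - teleport) * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical example: pages 1, 2 and 3 all link to page 4, which links back to 1.
web = {1: [4], 2: [4], 3: [4], 4: [1]}
ranks = pagerank(web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

Teleportation keeps the walker from being trapped: every node retains at least probability teleport / n at each step, so no cycle can absorb all of it.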

Coordination game and clustering coefficient

networked coordination game: choice between two things, A and B (e.g. basketball and soccer). If friends choose A, they get payoff a; if friends choose B, they get payoff b; if one chooses A while the other chooses B, their payoff is 0.

coordinating with one's friends. Let A = basketball, B = soccer. Which one should you learn to play? Fraction p = 3/5 play basketball; fraction (1-p) = 2/5 play soccer.

which choice has higher payoff? With d neighbors, a fraction p play basketball (A) and a fraction (1-p) play soccer (B). If you choose A, you get payoff $p \cdot d \cdot a$; if you choose B, you get payoff $(1-p) \cdot d \cdot b$. So you should choose A if $p\,d\,a \ge (1-p)\,d\,b$, or $p \ge b/(a+b)$.

two equilibria everyone adopts A everyone adopts B

what happens in between? What if two nodes switch at random? Will a cascade occur? Example: a = 3, b = 2. The payoff for nodes interacting using behavior A is 3/2 as large as what they get if they both choose B. Nodes will switch from B to A if at least q = 2/(3+2) = 2/5 of their neighbors are using A.
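The decision rule is a single comparison against the threshold q = b / (a + b). A small sketch using exact fractions so that the boundary case p = q is handled without rounding error; the function names are illustrative, and integer payoffs are assumed:

```python
from fractions import Fraction


def adoption_threshold(a, b):
    """Smallest fraction q of neighbors playing A for which A pays at least as much as B."""
    return Fraction(b, a + b)


def best_response(p, a, b):
    """Choice of a node when a fraction p of its neighbors plays A (ties go to A)."""
    return "A" if p >= adoption_threshold(a, b) else "B"


print(adoption_threshold(3, 2))              # 2/5
print(best_response(Fraction(3, 5), 3, 2))   # A: 3/5 of the friends play A
print(best_response(Fraction(1, 3), 3, 2))   # B: only 1/3 of the friends play A
```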

how does a cascade occur suppose 2 nodes start playing basketball due to external factors (e.g. they are bribed with a free pair of shoes by some devious corporation)

you pick the initial 2 nodes. A larger example (Easley/Kleinberg Ch. 19): does the cascade spread throughout the network? http://www.ladamic.com/netlearn/netlogo412/cascademodel.html
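A cascade can be simulated by applying the threshold rule repeatedly until nobody else switches. The sketch below is illustrative and is not the NetLogo model linked above: the 6-node line network is hypothetical, the function name is made up, and the initial adopters are assumed never to switch back to B:

```python
from fractions import Fraction


def run_cascade(adj, seeds, a, b):
    """Return the set of nodes playing A once no further node wants to switch from B to A."""
    q = Fraction(b, a + b)  # switching threshold
    adopters = set(seeds)   # assumption: seeds keep playing A
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in adopters or not adj[v]:
                continue
            using_a = sum(1 for u in adj[v] if u in adopters)
            if Fraction(using_a, len(adj[v])) >= q:
                adopters.add(v)
                changed = True
    return adopters


# Hypothetical network: six nodes in a line, 0 - 1 - 2 - 3 - 4 - 5.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# a = 3, b = 2 gives q = 2/5: each next node sees 1/2 of its neighbors on A and switches.
print(sorted(run_cascade(line, seeds={0, 1}, a=3, b=2)))

# a = 1, b = 3 gives q = 3/4: node 2 sees only 1/2 of its neighbors on A, so nothing spreads.
print(sorted(run_cascade(line, seeds={0, 1}, a=1, b=3)))
```

Whether the cascade completes depends on both the threshold and where the seeds sit, which is the point of the viral marketing question.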

implications for viral marketing if you could pay a small number of individuals to use your product, which individuals would you pick?

Clustering. Global clustering coefficient:
$$C = \frac{3 \times \text{number of triangles in the graph}}{\text{number of connected triples of vertices}}$$
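One way to evaluate this ratio is to visit every connected triple (a center node with two of its neighbors) and check whether the two outer nodes are linked; each triangle is then found three times, once per center, which supplies the factor 3. A minimal sketch for an undirected graph stored as a dict of neighbor sets; the function name and the example graph are illustrative:

```python
from itertools import combinations


def global_clustering(adj):
    """3 x number of triangles divided by the number of connected triples."""
    closed = 0   # closed triples, equal to 3 x number of triangles
    triples = 0  # all connected triples
    for center, neighbors in adj.items():
        for u, w in combinations(neighbors, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical example: triangle a-b-c with an extra node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(global_clustering(graph))  # 3 closed triples out of 5, i.e. 0.6
```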

Local clustering coefficient (Watts & Strogatz 1998). For a vertex $i$: the fraction of pairs of neighbors of the node that are themselves connected. Let $n_i$ be the number of neighbors of vertex $i$.
$$C_i = \frac{\#\text{ of connections between } i\text{'s neighbors}}{\text{max } \#\text{ of possible connections between } i\text{'s neighbors}}$$
Directed: $$C_i = \frac{\#\text{ directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$$
Undirected: $$C_i = \frac{\#\text{ undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$$

Local clustering coefficient (Watts & Strogatz 1998). Average over all $n$ vertices:
$$C = \frac{1}{n} \sum_{i} C_i$$
Example [Figure: vertex $i$ and its neighbors; legend: link present, link absent.]: $n_i = 4$, max number of connections: 4*3/2 = 6, 3 connections present, so $C_i = 3/6 = 0.5$.

Explanation: $n_i = 3$; there are 2 connections present out of a max of 3 possible, so $C_i = 2/3$.
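The undirected local coefficient and its average can be computed as follows. The example graph is hypothetical, built so that vertex i has 4 neighbors with 3 connections among them (the 0.5 case above) and vertex b has 3 neighbors with 2 connections among them (the 2/3 case). Treating vertices with fewer than two neighbors as having coefficient 0 is an assumption of this sketch, and the function names are illustrative:

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected (undirected graph)."""
    neighbors = adj[i]
    n_i = len(neighbors)
    if n_i < 2:
        return 0.0  # assumption: no pairs of neighbors means coefficient 0
    present = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return present / (n_i * (n_i - 1) / 2)


def average_clustering(adj):
    """Average of the local clustering coefficient over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


graph = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a", "c"},
    "c": {"i", "b", "d"},
    "d": {"i", "c"},
}
print(local_clustering(graph, "i"))  # 3 of 6 possible connections: 0.5
print(local_clustering(graph, "b"))  # 2 of 3 possible connections
print(average_clustering(graph))
```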