
Online Social Networks 2: Formation, Contagion, Information Diffusion
Some slides adapted from Michael Kearns (UPenn) and Eytan Adar (Michigan)

Today's Plan
- Information diffusion, contagion
- Formation of (online) social networks
- Nitty-gritty: Facebook Graph Search API

TIPPING AND INFO DIFFUSION

Gladwell's Tipping Examples
- Hush Puppies: almost dead in 1994; more than 10x sales increase by 1996
  - no advertising or marketing budget
  - claim: viral fashion spread from NY teens to designers
  - requires particular connectivity and particular individuals
- NYC crime: more than 2K murders in 1992; fewer than 770 five years later
  - standard socio-economic explanations: police performance, decline of crack, improved economy, aging
  - but these all changed incrementally
  - alternative: small forces provoked an anti-crime "virus"
- Technology tipping: fax machines, email, cell phones
- Tipping origins: 1970s "white flight"

Key Characteristics of Tipping (according to Gladwell)
- Contagion: viral spread of disease, ideas, knowledge, etc.
  - spread is determined by network structure
  - network structure will influence outcomes: who gets infected, infection rate, number infected
- Amplification of the incremental: small changes can have large, dramatic effects
  - network topology, infectiousness, individual behavior
- Sudden, not gradual, change: phase transitions and non-linear phenomena
How can we formalize some of these ideas?

[Figure: Rates of Growth and Decay. Four panels plot crime rate against size of police force: linear, linear, nonlinear with gradual decay, and nonlinear with tipping.]

Gladwell's Three Sources of Tipping
- The Law of the Few (messengers): Connectors, Mavens and Salesmen
  - hubs and authorities
- The Stickiness Factor (message): the infectiousness of the message itself
  - still largely treated as a crude property of transmission
- The Power of Context: global influences affecting messenger behavior

Epidemos
- Forest fire simulation: http://www.cis.upenn.edu/~mkearns/teaching/networkedlife/demos/forestfire.html
  - grid of forest and vacant cells
  - fire always spreads to the four adjacent cells: perfect stickiness or infectiousness
  - connectivity parameter: probability of forest
  - fire will spread to all of the connected component of the source
  - tips when the probability of forest is ~0.6
  - clean mathematical formalization (e.g. fraction burned)
- Viral spread simulation: http://www.cis.upenn.edu/~mkearns/teaching/networkedlife/demos/epidemic.html
  - population on a grid network, each with four neighbors
  - stickiness parameter: probability of passing the disease
  - connectivity parameter: probability of rewiring local connections to random long-distance ones
  - no long-distance connections: tips at stickiness ~0.3
  - at rewiring = 0.5: often tips at stickiness ~0.2

Mathematizing the Forest Fire
- Start with a regular 2-dimensional grid network
  - this represents a complete forest
- Delete each vertex (and its edges) independently with probability p
  - this represents random clear-cutting or natural fire breaks
- Choose a random remaining vertex v
  - this is my campsite
- Q: What is the expected size of v's connected component?
  - this is how much of the forest is going to burn
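The forest-fire question is easy to estimate by simulation. The following is a minimal Python sketch, not the code behind the Kearns demo: it assumes a finite n x n grid without wraparound, keeps each cell as forest with probability 1 - p (the `forest_density` parameter), and burns the campsite's connected component with breadth-first search. All names are illustrative.

```python
import random
from collections import deque


def forest_fire_fraction(n, forest_density, rng):
    """Fraction of an n x n grid that burns from a random campsite.

    forest_density = 1 - p: each cell is kept as forest independently with this
    probability (p is the deletion / clear-cut probability on the slide).
    """
    forest = [[rng.random() < forest_density for _ in range(n)] for _ in range(n)]
    cells = [(r, c) for r in range(n) for c in range(n) if forest[r][c]]
    if not cells:
        return 0.0  # everything was cleared, so nothing can burn
    start = rng.choice(cells)  # the campsite v
    burned = {start}
    queue = deque([start])
    # Breadth-first search: fire always spreads to the four adjacent forest cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and forest[nr][nc] and (nr, nc) not in burned:
                burned.add((nr, nc))
                queue.append((nr, nc))
    return len(burned) / (n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    for density in (0.4, 0.5, 0.6, 0.7, 0.8):
        runs = [forest_fire_fraction(60, density, rng) for _ in range(10)]
        print(f"forest density {density:.1f}: mean fraction burned {sum(runs) / len(runs):.3f}")
```

Sweeping the forest density this way should show the fraction burned rising sharply near the ~0.6 tipping point mentioned above; on small grids the transition is blurred.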

Mathematizing the Epidemic
- Start with a regular 2-dimensional grid network
  - this represents a dense population with local connections (neighbors)
- Rewire each edge with probability p to a random destination
  - this represents long-distance connections (chance meetings)
- Choose a random vertex v
  - this is an infection; it spreads probabilistically to each of v's neighbors
- The fraction killed is more complex: it depends on both the size and the structure of v's connected component
- Important theme: mixing regular, local structure with random, long-distance connections
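A minimal Python sketch of the epidemic model, under several assumptions that are not in the slides: the grid wraps around (a torus, so every vertex really has four neighbors; needs n >= 3), rewiring keeps one endpoint of an edge and moves the other to a uniformly random vertex (skipping self-loops and duplicates), and "spreads probabilistically" means each newly infected vertex gets one chance, with probability `stickiness`, to infect each neighbor. The names are illustrative.

```python
import random
from collections import deque


def torus_grid_edges(n):
    """Edges of an n x n grid that wraps around, so every vertex has four neighbors."""
    edges = set()
    for r in range(n):
        for c in range(n):
            v = r * n + c
            edges.add(frozenset((v, r * n + (c + 1) % n)))    # right neighbor
            edges.add(frozenset((v, ((r + 1) % n) * n + c)))  # neighbor below
    return edges


def rewire(edges, num_vertices, p, rng):
    """With probability p, move one endpoint of each edge to a random vertex.

    A rewiring that would create a self-loop or a duplicate edge is skipped.
    """
    result = set(edges)
    for edge in sorted(edges, key=sorted):
        if rng.random() < p:
            u = min(edge)
            w = rng.randrange(num_vertices)
            new_edge = frozenset((u, w))
            if w != u and new_edge not in result:
                result.discard(edge)
                result.add(new_edge)
    return result


def epidemic_fraction(n, rewire_p, stickiness, rng):
    """Fraction of the n*n population infected, starting from one random vertex."""
    num_vertices = n * n
    neighbors = [[] for _ in range(num_vertices)]
    for edge in rewire(torus_grid_edges(n), num_vertices, rewire_p, rng):
        u, v = tuple(edge)
        neighbors[u].append(v)
        neighbors[v].append(u)
    start = rng.randrange(num_vertices)
    infected = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            # Each newly infected vertex gets one chance to pass the disease to each neighbor.
            if v not in infected and rng.random() < stickiness:
                infected.add(v)
                queue.append(v)
    return len(infected) / num_vertices


if __name__ == "__main__":
    rng = random.Random(0)
    for rewire_p in (0.0, 0.5):
        for stickiness in (0.1, 0.2, 0.3, 0.4):
            runs = [epidemic_fraction(30, rewire_p, stickiness, rng) for _ in range(10)]
            print(f"rewire={rewire_p} stickiness={stickiness}: mean fraction infected "
                  f"{sum(runs) / len(runs):.3f}")
```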

Small Worlds and the Law of the Few
- Gladwell's Law of the Few: a small number of highly connected vertices (heavy tails) have inordinate importance for global connectivity (small diameter)
- Travers & Milgram 1969: classic early social network study
  - destination: a Boston stockbroker who lived in Sharon, MA
  - sources: Nebraska stockowners; Nebraska and Boston "randoms"
  - forward the letter to a first-name acquaintance "closer" to the target
  - target information provided: name, address, occupation, firm, college, wife's name and hometown
  - navigational value?
- Basic findings:
  - 64 of 296 chains reached the target
  - average length of completed chains: 5.2
  - interaction of chain length and navigational difficulties
  - main approach routes: home (6.1) and work (4.6)
  - Boston sources (4.4) faster than Nebraska (5.5)
  - no advantage for Nebraska stockowners

The Connectors to the Target
- T & M found that many of the completed chains passed through a very small number of penultimate individuals
  - Mr. G, Sharon merchant: 16/64 chains
  - Mr. D and Mr. P: 10 and 5 chains
- Connectors are individuals with extremely high degree
  - why should connectors exist? how common are they? how do they get that way? (see Gladwell for anecdotes)
- Connectors can be viewed as the hubs of social traffic
- Note: there is no reason the target must be a connector for small worlds
- Two ways of getting small worlds (low diameter):
  - a truly random connection pattern (dense network)
  - a small number of well-placed connectors in a sparse network

Small Worlds: A Modern Experiment
- The Columbia Small Worlds Project: considerably larger subject pool, uses email
  - subject of the assigned Dodds et al. paper
- Basic methodology:
  - 18 targets from 13 countries
  - online registration of initial participants; all tracking electronic
  - 99K registered, 24K initiated chains, 384 reached targets
- Some findings:
  - < 5% of messages passed through any single penultimate individual
  - large friend degree rarely (< 10%) cited as the reason for choosing the next recipient
  - Dodds et al.: no evidence of connectors! (but it could be that connectors are simply not cited for this reason)
  - interesting analysis of reasons for forwarding
  - interesting analysis of navigation method vs. chain length

Tie Strength
- Strength of Weak Ties (Granovetter)
- Granovetter: How often did you see the contact that helped you find the job, prior to the job search?
  - 16.7% often (at least once a week)
  - 55.6% occasionally (more than once a year but less than twice a week)
  - 27.8% rarely (once a year or less)
- Weak ties will tend to have different information than we and our close contacts do
  - weak ties will tend to have high betweenness and low transitivity

The Strength of Weak Ties
- Not all links are of equal importance
- Granovetter 1974: study of job searches
  - 56% found their current job via a personal connection
  - of these, 16.7% saw their contact often
  - the rest saw their contact occasionally or rarely
- Your closest contacts might not be the most useful
  - similar backgrounds and experience
  - they may not know much more than you do
- Connectors derive power from a large fraction of weak ties
- Further evidence in the Dodds et al. paper
- T&M, Granovetter, Gladwell: multiple spaces and distances
  - geographic, professional, social, recreational, political, ...
  - we can reason about general principles without precise measurement

The Magic Number 150
- Social "channel capacity"
- Correlation between neocortex size and group size
- Dunbar's equation: neocortex ratio → group size
- Clear implications for many kinds of social networks
- Again, a topological constraint on typical degree
- From primates to military units to Gore-Tex

Link Prediction?

Link Prediction in Social Net Data
- We know things about structure
  - Homophily = "like likes like", or "birds of a feather flock together", or similar people group together
  - Mutuality
  - Triad closure
- Various measures try to use this

Link Prediction
- Simple metrics: only take into account graph properties
- Adamic/Adar: $\mathrm{score}(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$, where Γ(x) = neighbors of x
- Originally: 1 / log(frequency(z))
- Liben-Nowell, Kleinberg (CIKM 03)
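The score rewards common neighbors, and rewards rare (low-degree) common neighbors more. A minimal Python sketch, assuming an undirected graph stored as a dict of neighbor sets and the natural logarithm (the base only rescales all scores); the small friendship graph is hypothetical.

```python
import math


def adamic_adar(graph, x, y):
    """Sum of 1/log|Gamma(z)| over the common neighbors z of x and y.

    graph maps each vertex to the set of its neighbors, i.e. graph[x] is Gamma(x).
    Any common neighbor of two distinct vertices has degree >= 2, so log > 0.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])


def rank_candidate_links(graph):
    """Score every non-adjacent pair and return (score, x, y), highest first."""
    vertices = sorted(graph)
    scored = []
    for i, x in enumerate(vertices):
        for y in vertices[i + 1:]:
            if y not in graph[x]:
                scored.append((adamic_adar(graph, x, y), x, y))
    return sorted(scored, reverse=True)


# A small hypothetical friendship graph.
friends = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
print(rank_candidate_links(friends)[:3])
```

Here c and d rank first: both of their common neighbors (a and b) have degree 2, whereas a and b share d, which has degree 3 and so counts for less.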

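Link Prediction
- Simple metrics: only take into account graph properties
- Katz: $\mathrm{score}(x, y) = \sum_{\ell=1}^{\infty} \beta^{\ell} \, |\mathrm{paths}^{\langle \ell \rangle}_{x,y}|$, where $\mathrm{paths}^{\langle \ell \rangle}_{x,y}$ = paths of length ℓ from x to y, damped by $0 < \beta < 1$
- Weighted variant: $|\mathrm{paths}^{\langle 1 \rangle}_{x,y}|$ is the number of times the two collaborated
- Liben-Nowell, Kleinberg (CIKM 03)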
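In practice the infinite sum is either truncated or computed in closed form as (I - βA)^{-1} - I, which converges when β is below 1/λ_max (λ_max being the largest eigenvalue of the adjacency matrix A). The Python sketch below takes the truncation route, with a cutoff `max_len` chosen for illustration. It counts paths via matrix powers, so walks that revisit vertices are included, as in the closed form. Matrix entries may be 0/1 (unweighted) or collaboration counts (weighted variant).

```python
def katz_scores(adj, beta, max_len):
    """Truncated Katz scores: sum over l = 1..max_len of beta**l * (A**l)[x][y].

    adj is a square matrix (nested lists): 0/1 entries for the unweighted
    variant, or collaboration counts for the weighted one. Entry (x, y) of
    A**l counts the walks of length l from x to y.
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # A**1
    for length in range(1, max_len + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        if length < max_len:
            # power <- power x adj, i.e. A**(length + 1)
            power = [[sum(power[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
                     for i in range(n)]
    return scores


# Path graph 0 - 1 - 2: vertices 0 and 2 are linked only through 1.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = katz_scores(path, beta=0.5, max_len=2)
print(scores[0][1], scores[0][2])  # 0.5 0.25
```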

Link Prediction in Relational Data
- We know things about structure
  - Homophily = "like likes like", or "birds of a feather flock together", or similar people group together
  - Mutuality
  - Triad closure
- A slightly more interesting problem if we have relational data on actors and ties
- Move beyond structure

Relationship & Link Prediction
[Figure: predicting an "advisor of?" relationship from attributes such as employee/contractor status, salary, and time at company.]

Describing Real Networks: Heavy-Tailed Degree Distributions

What Do We Mean By Not Heavy-Tailed?
- Mathematical model of a typical "bell-shaped" distribution: the Normal or Gaussian distribution over some quantity x
- Good for modeling many real-world quantities, but not degree distributions
- If the mean/average is μ, then the probability of value x is $\mathrm{probability}(x) \sim e^{-(x-\mu)^2}$
  - main point: exponentially fast decay as x moves away from μ
- If we take the logarithm: $\log(\mathrm{probability}(x)) \sim -(x-\mu)^2$
- Claim: if we plot log(x) vs. log(probability(x)), we will get strong curvature
- Let's look at some (artificial) sample data
  - (Poisson is better than Normal for degrees, but the same story holds)

[Figure: artificial bell-shaped sample data, plotted as frequency(x) vs. x and as log(frequency(x)) vs. log(x).]

What Do We Mean By Heavy-Tailed?
- One mathematical model of a typical heavy-tailed distribution: the Power Law distribution with exponent α
  - $\mathrm{probability}(x) \sim 1/x^{\alpha}$
  - main point: inverse polynomial decay as x increases
- If we take the logarithm: $\log(\mathrm{probability}(x)) \sim -\alpha \log(x)$
- Claim: if we plot log(x) vs. log(probability(x)), we will get a straight line!
- Let's look at some (artificial) sample data

[Figure: artificial power-law sample data, plotted as frequency(x) vs. x and as log(frequency(x)) vs. log(x).]
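The straight-line vs. curvature claim can be checked numerically without plotting. On a log-log scale a power law has the same slope (-α) everywhere, while a Poisson distribution's slope keeps getting steeper. A minimal Python sketch, with α = 2, a Poisson mean of 5 and log-spaced sample points chosen purely for illustration:

```python
import math


def loglog_slopes(xs, probs):
    """Slope of log(probability) vs. log(x) between consecutive sample points."""
    pairs = list(zip(xs, probs))
    return [
        (math.log(p2) - math.log(p1)) / (math.log(x2) - math.log(x1))
        for (x1, p1), (x2, p2) in zip(pairs, pairs[1:])
    ]


def power_law(x, alpha):
    return x ** -alpha  # unnormalized: a constant factor only shifts the log-log line


def poisson(x, mean):
    # Computed in log space to avoid overflow in mean**x and x!.
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))


xs = [1, 2, 4, 8, 16, 32]
print("power law:", [round(s, 2) for s in loglog_slopes(xs, [power_law(x, 2.0) for x in xs])])
print("Poisson:  ", [round(s, 2) for s in loglog_slopes(xs, [poisson(x, 5.0) for x in xs])])
```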

Oracle of Bacon Revisited

Degree Distribution of the Web Graph [Broder et al.]

Actor Collaborations; Web; Power Grid [Barabasi and Albert]

Zipf's Law
- Look at the frequency of English words:
  - "the" is the most common, followed by "of", "to", etc.
  - claim: the frequency of the n-th most common word ~ 1/n (power law, α ~ 1)
- General theme:
  - rank events by their frequency of occurrence
  - the resulting distribution often is a power law!
- Other examples:
  - North America city sizes
  - personal income
  - file sizes
  - genus sizes (number of species)
  - the "long tail of search" (on which more later...)
  - let's look at log-log plots of these
- People seem to dither over the exact form of these distributions (e.g. the value of α), but not over heavy tails
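Ranking events by frequency takes only a few lines. A minimal Python sketch for word counts, assuming a crude tokenizer (lowercase runs of letters and apostrophes). If frequency ~ 1/rank (α ≈ 1), then rank × count should stay roughly constant down the list for a large corpus. A toy sentence is far too small to show this.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first.

    Words are lowercase runs of letters and apostrophes (a simplifying assumption).
    """
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_products(text):
    """rank * count for each word; roughly constant if frequency ~ 1/rank."""
    return [rank * count for rank, _, count in rank_frequency(text)]
```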

iPhone App Popularity

Summary
- Power law distribution is a good mathematical model for heavy tails; Normal/bell-shaped is not
- Statistical signature of power law and heavy tails: linear on a log-log scale
- Many social and other networks exhibit this signature
- Next "universal": small diameter

How Do Real Networks Look? II. Small Diameter

What Do We Mean By Small Diameter?
- Definition of diameter:
  - assumes the network has a single connected component (or examine the giant component)
  - for every pair of vertices u and v, compute the shortest-path distance d(u,v)
  - then the (average-case) diameter of a network or graph G with N vertices is $\mathrm{diameter}(G) = \frac{2}{N(N-1)} \sum_{u,v} d(u,v)$, summing over unordered pairs
  - equivalently: pick a random pair of vertices (u,v); what do we expect d(u,v) to be?
- What's the smallest/largest diameter(G) could be?
  - smallest: 1 (complete network, all N(N-1)/2 edges present); independent of N
  - largest: linear in N (chain or line network)
- Small diameter: no precise definition, but certainly << N
  - Travers and Milgram: ~5; any fixed network has fixed diameter
  - may want to allow diameter to grow slowly with N (?)
  - e.g. log(N) or log(log(N))
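The average-case diameter can be computed with one breadth-first search per vertex. The Python sketch below also shows the earlier point about connectors: adding a single hypothetical hub to a chain (the worst case above) pulls the average distance from (N + 1)/3 down below 2. The graph layout (a dict of neighbor sets) and all names are illustrative assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path (hop) distances from source; graph maps vertex -> set of neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_diameter(graph):
    """2/(N(N-1)) times the sum of d(u, v) over unordered pairs of a connected graph."""
    n = len(graph)
    total = 0
    for u in graph:
        dist = bfs_distances(graph, u)
        if len(dist) != n:
            raise ValueError("graph is not connected; restrict to the giant component")
        total += sum(dist.values())
    # total counts every unordered pair twice (once from each endpoint).
    return total / (n * (n - 1))


def chain_graph(n):
    """Chain 0 - 1 - ... - (n-1): the worst case, diameter linear in N."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def add_connector(graph):
    """Add one hypothetical hub vertex adjacent to every existing vertex."""
    hub = max(graph) + 1
    with_hub = {u: set(nbrs) | {hub} for u, nbrs in graph.items()}
    with_hub[hub] = set(graph)
    return with_hub


print(average_diameter(chain_graph(100)))                 # (N + 1)/3 for a chain
print(average_diameter(add_connector(chain_graph(100))))  # below 2 with the hub
```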

Empirical Support
- Travers and Milgram, 1969: diameter ~5-6, N ~ 200M
- Columbia Small Worlds, 2003: diameter ~4-7, N ~ web population?
- Leskovec and Horvitz, 2008: Microsoft Messenger network, diameter ~6.5, N ~ 180M
- Backstrom et al., 2012: Facebook social graph, diameter ~5, N ~ 721M

Summary
- So far: naturally occurring, large-scale networks exhibit:
  - heavy-tailed degree distributions
  - small diameter
- Next up: clustering of connectivity

How Do Real Networks Look? III. Clustering of Connectivity

The Clustering Coefficient of a Network
- Intuition: a measure of how "bunched up" edges are
- The clustering coefficient of vertex u:
  - let k = degree of u = number of neighbors of u
  - k(k-1)/2 = max possible # of edges between neighbors of u
  - c(u) = (actual # of edges between neighbors of u)/[k(k-1)/2]
  - fraction of pairs of friends that are also friends
  - 0 <= c(u) <= 1; a measure of the cliquishness of u's neighborhood
- Clustering coefficient of a graph G: CC(G) = average of c(u) over all vertices u in G
- Example (figure): k = 4, so k(k-1)/2 = 6; 4 of those pairs are connected, so c(u) = 4/6 ≈ 0.67

What Do We Mean By High Clustering?
- CC(G) measures how likely vertices with a common neighbor are to be neighbors themselves
- Should be compared to how likely random pairs of vertices are to be neighbors
- Let p be the edge density of network/graph G: $p = \frac{E}{N(N-1)/2}$
  - here E = total number of edges in G
- If we picked a pair of vertices at random in G, the probability they are connected is exactly p
- So we will say clustering is high if CC(G) >> p

Clustering Coefficient Example 1
- Vertex clustering coefficients (figure): 1/(2 x 1/2) = 1; 2/(3 x 2/2) = 2/3; 3/(4 x 3/2) = 1/2; 2/(3 x 2/2) = 2/3; 1/(2 x 1/2) = 1
- C.C. = (1 + 2/3 + 1/2 + 2/3 + 1)/5 ≈ 0.767
- p = 7/(5 x 4/2) = 0.7
- Not highly clustered
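The definitions translate directly into code. A minimal Python sketch, assuming an undirected graph stored as a dict of neighbor sets and the convention c(u) = 0 for vertices of degree below 2 (the slides do not cover that case). The original figure did not survive, so the edge list is one 5-vertex, 7-edge graph consistent with the numbers in Example 1, with assumed vertex names. It reproduces CC ≈ 0.767 and p = 0.7.

```python
from itertools import combinations


def vertex_clustering(graph, u):
    """c(u): fraction of pairs of u's neighbors that are themselves neighbors."""
    k = len(graph[u])
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, so count c(u) as 0
    links = sum(1 for a, b in combinations(graph[u], 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def clustering_coefficient(graph):
    """CC(G): average of c(u) over all vertices."""
    return sum(vertex_clustering(graph, u) for u in graph) / len(graph)


def edge_density(graph):
    """p = E / (N(N-1)/2)."""
    n = len(graph)
    num_edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return num_edges / (n * (n - 1) / 2)


def from_edges(edge_list):
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


# One 5-vertex, 7-edge graph consistent with Example 1 (vertex names assumed).
example1 = from_edges([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
                       ("c", "d"), ("c", "e"), ("d", "e")])
print(round(clustering_coefficient(example1), 4), edge_density(example1))  # 0.7667 0.7
```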

Clustering Coefficient Example 2
- Network: simple cycle + edges to vertices 2 hops away on the cycle
- By symmetry, all vertices have the same clustering coefficient
- Clustering coefficient of a vertex v:
  - degree of v is 4, so the number of possible edges between pairs of neighbors of v is 4 x 3/2 = 6
  - how many pairs of v's neighbors actually are connected? 3: the two clockwise neighbors, the two counterclockwise neighbors, and the immediate cycle neighbors
  - so the c.c. of v is 3/6 = ½
- Compare to overall edge density:
  - total number of edges = 2N
  - edge density p = 2N/(N(N-1)/2) = 4/(N-1) ~ 4/N
  - as N becomes large, ½ >> 4/N
- So this cyclical network is highly clustered

Clustering Coefficient Example 3
- Divide N vertices into sqrt(N) groups of size sqrt(N) (here N = 25)
- Add all connections within each group (cliques); connect the group "leaders" in a cycle
- The N - sqrt(N) non-leaders have C.C. = 1, so network C.C. → 1 as N becomes large
- Edge density is p ~ 1/sqrt(N)
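Both constructions can be built and measured directly. A minimal Python sketch, assuming vertex g·s acts as the leader of group g in Example 3 and using c(u) = 0 for degree below 2. For N = 20 the ring gives CC = 0.5 against p = 4/19 ≈ 0.21. The clique construction with N = 25 gives CC = 0.88 against p = 11/60 ≈ 0.18, close to the slide's 1/sqrt(N) = 0.2. Its leaders only reach c = 0.4, which is why CC is below 1 at this small size.

```python
from itertools import combinations


def avg_clustering(graph):
    """CC(G), taking c(u) = 0 when u has fewer than two neighbors (assumed convention)."""
    total = 0.0
    for u, nbrs in graph.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += links / (k * (k - 1) / 2)
    return total / len(graph)


def density(graph):
    """p = E/(N(N-1)/2) = (sum of degrees)/(N(N-1))."""
    n = len(graph)
    return sum(len(nbrs) for nbrs in graph.values()) / (n * (n - 1))


def ring_with_two_hop_edges(n):
    """Example 2: a cycle plus edges to vertices two hops away (needs n >= 5)."""
    return {v: {(v + d) % n for d in (-2, -1, 1, 2)} for v in range(n)}


def cliques_with_leader_cycle(s):
    """Example 3: s cliques of s vertices (N = s*s); vertex g*s leads group g (needs s >= 3)."""
    graph = {v: set() for v in range(s * s)}
    for g in range(s):
        for a, b in combinations(range(g * s, (g + 1) * s), 2):
            graph[a].add(b)
            graph[b].add(a)
        leader, next_leader = g * s, ((g + 1) % s) * s
        graph[leader].add(next_leader)
        graph[next_leader].add(leader)
    return graph


for name, graph in (("Example 2, N = 20", ring_with_two_hop_edges(20)),
                    ("Example 3, N = 25", cliques_with_leader_cycle(5))):
    print(f"{name}: CC = {avg_clustering(graph):.3f}, p = {density(graph):.3f}")
```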

NETWORK FORMATION OVERVIEW

The Erdös-Renyi (Random Graph) Model
- Really a randomized algorithm for generating networks
- Begin with N isolated vertices, no edges
- Add edges gradually, one at a time
  - randomly select two vertices that are not already neighbors; add an edge between them
  - so edges are added in a random, unbiased fashion
- About the simplest (dumbest?) formation model possible
- But what can it already explain?

The Erdös-Renyi (Random Graph) Model
- After adding E edges, the edge density is $p = \frac{E}{N(N-1)/2}$
- As E increases, p goes from 0 to 1
- Q: What are the likely structural properties at density p?
  - e.g. as p goes from 0 to 1, at some point small diameter occurs, along with a single connected component
  - at what values of p do natural structures emerge?
- We will see:
  - many natural and interesting properties arise at rather small p
  - furthermore, they arise very suddenly (tipping/threshold)
- Let's revisit the Erdös-Renyi simulator

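Why Can't There Be Two Large Components?
[Figure: two halves of N/2 vertices, each densely connected internally, with N²/4 missing edges between them.]
- Random edges are placed without regard to structure, and the N²/4 cross pairs are a large share of all N(N-1)/2 vertex pairs, so two large components are almost certain to be joined soon.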

Threshold Phenomena in Erdös-Renyi
- Theorem: In Erdös-Renyi, as N becomes large:
  - if p < 1/N, the probability of a giant component (e.g. 50% of vertices) goes to 0
  - if p > 1/N, the probability of a giant component goes to 1, and all other components will have size at most log(N)
- Note: at edge density p, the expected/average degree is p(N-1) ~ pN
  - so at p ~ 1/N, the average degree is ~1: incredibly sparse
  - so the model explains giant components in real networks
- General tipping point at edge density q (may depend on N):
  - if p < q, the probability of the property goes to 0 as N becomes large
  - if p > q, the probability of the property goes to 1 as N becomes large
  - for example, could examine the property "diameter 6 or less"
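The giant-component threshold can be watched directly. A minimal Python sketch, assuming the edge-by-edge version of Erdös-Renyi described earlier (a random pair is redrawn if it is already an edge) and union-find to measure the largest component. Since average degree ≈ pN, sweeping average degree through 1 corresponds to sweeping p through 1/N. The value of N and the function names are illustrative.

```python
import random


def erdos_renyi_edges(n, num_edges, rng):
    """Add edges one at a time, each between a random pair that are not yet neighbors."""
    if num_edges > n * (n - 1) // 2:
        raise ValueError("more edges requested than vertex pairs")
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))  # an existing pair changes nothing, so we draw again
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


if __name__ == "__main__":
    n = 2000
    rng = random.Random(0)
    for avg_degree in (0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0):
        # p = avg_degree / N, so E = p * N(N-1)/2 ~ avg_degree * N / 2.
        edges = erdos_renyi_edges(n, int(avg_degree * n / 2), rng)
        print(f"average degree {avg_degree}: largest component holds "
              f"{largest_component_size(n, edges) / n:.1%} of vertices")
```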

Threshold Phenomena in Erdös-Renyi
- Theorem: In Erdös-Renyi, as N becomes large, the threshold for "diameter 6 or less" is at $p \sim \log(N)/N^{5/6}$
- Note: degrees growing (slightly) with N
  - if N = 300M (U.S. population), then the average degree pN ~ 500
  - if N = 7B (world population), then the average degree pN ~ 1000
  - not unreasonable figures
- At p not too far from 1/N, we get strong connectivity
  - very efficient use of edges
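A quick check of the slide's arithmetic, as a Python sketch assuming the threshold formula uses the natural logarithm (the slide does not say). With that choice the average degree pN = log(N)·N^{1/6} comes out at about 505 for N = 300M and about 991 for N = 7B, in line with the ~500 and ~1000 quoted above.

```python
import math


def threshold_average_degree(n):
    """Average degree pN at p = log(N) / N**(5/6), assuming the natural log."""
    p = math.log(n) / n ** (5 / 6)
    return p * n


for label, n in (("U.S.", 300e6), ("world", 7e9)):
    print(f"{label}: N = {n:.0e}, average degree ~ {threshold_average_degree(n):.0f}")
```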

Threshold Phenomena in Erdös-Renyi
- In fact: any monotone property of networks exhibits a threshold phenomenon in Erdös-Renyi
  - monotone: the property continues to hold if you add edges to the network
  - e.g. the network has a group of K vertices in which at least 71% of pairs are neighbors
  - e.g. the network has a cycle of at least K vertices
- Tipping is the rule, not the exception

What Doesn't the Model Explain?
- Erdös-Renyi explains the giant component and small diameter
- But:
  - the degree distribution is not heavy-tailed: exponential decay from the mean (Poisson)
  - the clustering coefficient is *exactly* p
- To explain these, we'll need richer models with greater realism

MODELS OF NETWORK FORMATION II. CLUSTERING MODELS

Programming Clustering
- Erdös-Renyi: global/background edge density p
  - all edges appear independently with probability p
  - no bias towards connecting friends of friends (distance 2)
  - no high clustering
- But in real networks, such biases often exist:
  - people introduce their friends to each other
  - people with common friends may share interests (homophily)
- So it is natural to consider a model in which:
  - the more common neighbors two vertices share, the more likely they are to connect
  - there is still some background probability of connecting
  - edges are still selected randomly, but now with a bias towards friends of friends

Making it More Precise: the α-model
[Figure: y = probability of connecting u & v, rising from the default probability p toward 1.0, plotted against x = number of current common neighbors of u & v, up to the network size N; curves shown for a smaller and a larger α; y ~ p + (x/N)^α.]
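The curve y ~ p + (x/N)^α was reconstructed from a damaged figure, so treat the exact formula as an assumption. Under that assumption, one way to program clustering is a rejection loop: draw a random non-adjacent pair, count their common neighbors x, and add the edge with probability y. A minimal Python sketch; the cap at 1, the requirement p > 0 (so the loop cannot stall) and all names are illustrative choices.

```python
import random


def connect_probability(common, n, p, alpha):
    """Assumed alpha-model curve: p + (x/N)**alpha, capped at 1 (alpha > 0)."""
    return min(1.0, p + (common / n) ** alpha)


def alpha_model_graph(n, num_edges, p, alpha, rng):
    """Grow a network edge by edge, biased towards pairs with many common neighbors."""
    if p <= 0:
        raise ValueError("a positive background probability p keeps the loop from stalling")
    if num_edges > n * (n - 1) // 2:
        raise ValueError("more edges requested than vertex pairs")
    graph = {v: set() for v in range(n)}
    added = 0
    while added < num_edges:
        u, v = rng.sample(range(n), 2)  # a random pair ...
        if v in graph[u]:
            continue  # ... that are not already neighbors
        x = len(graph[u] & graph[v])  # current number of common neighbors
        if rng.random() < connect_probability(x, n, p, alpha):
            graph[u].add(v)
            graph[v].add(u)
            added += 1
    return graph
```

Smaller α makes even a few common neighbors push y well above p, which is what biases the process towards triangles.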

From D. Watts, Small Worlds

An Alternative Model
- A different model:
  - start with all vertices arranged on a ring or cycle (or a grid)
  - connect each vertex to all others that are within k cycle steps
  - with probability q, rewire each local connection to a random vertex
- Initial cyclical structure models local or geographic connectivity
- Long-distance rewiring models long-distance connectivity
- q = 0: high clustering, high diameter
- q = 1: low clustering, low diameter (~ Erdös-Renyi)
- Again, there is a "magic" range of q where we get both high clustering and low diameter
- Let's look at this demo: http://ccl.northwestern.edu/netlogo/models/run.cgi?smallworlds.839.533
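This is essentially the Watts-Strogatz construction. A minimal Python sketch, assuming a plain ring rather than a grid. The rewiring rule (keep the lower-numbered endpoint, move the other, skip self-loops and duplicate edges) is an illustrative choice, not taken from the NetLogo demo. The demo block counts "shortcuts", edges spanning more than k ring steps. Measuring clustering and average distance of the result while sweeping q is how one would look for the magic range.

```python
import random


def ring_lattice(n, k):
    """Each vertex linked to all others within k steps around the cycle (needs n > 2k)."""
    return {frozenset((v, (v + d) % n)) for v in range(n) for d in range(1, k + 1)}


def rewired_ring(n, k, q, rng):
    """With probability q, move one endpoint of each local edge to a random vertex.

    Rewirings that would create a self-loop or a duplicate edge are skipped,
    so the number of edges stays n * k.
    """
    edges = ring_lattice(n, k)
    result = set(edges)
    for edge in sorted(edges, key=sorted):
        if rng.random() < q:
            u = min(edge)  # keep this endpoint, replace the other
            w = rng.randrange(n)
            new_edge = frozenset((u, w))
            if w != u and new_edge not in result:
                result.discard(edge)
                result.add(new_edge)
    return result


def ring_distance(a, b, n):
    d = abs(a - b)
    return min(d, n - d)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 1000, 5
    for q in (0.0, 0.01, 0.1, 1.0):
        edges = rewired_ring(n, k, q, rng)
        shortcuts = sum(1 for e in edges if ring_distance(*tuple(e), n) > k)
        print(f"q = {q}: {shortcuts} of {len(edges)} edges are long-distance shortcuts")
```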

Summary
- Two rather different ways of getting high clustering and low diameter:
  - bias connectivity towards shared friendships
  - mix local and long-distance connectivity
- Both models require proper tuning to achieve both simultaneously
- Both are a bit more realistic than Erdös-Renyi
- Neither model exhibits heavy-tailed degree distributions

Rich-Get-Richer Processes
- Processes in which the more someone has of something, the more likely they are to get more of it
- Examples:
  - the more friends you have, the easier it is to make more
  - the more business a firm has, the easier it is to win more
  - the more people there are at a nightclub, the more who want to go
- Such processes will amplify inequality
- One simple and general model:
  - if you have amount x of something, the probability you get more is proportional to x
  - so if you have twice as much as me, you're twice as likely to get more
- Generally leads to heavy-tailed distributions (power laws)
- Let's look at a simple nightclub demo
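The nightclub demo itself is not reproduced here. As a stand-in, here is a minimal Python sketch in the style of Simon's classic rich-get-richer model. Each arrival opens a new club with a small probability; otherwise it joins the club of a uniformly random earlier arrival, which selects a club with probability proportional to its current size x. The steady trickle of new clubs is an added assumption: with a fixed set of clubs the same rule still amplifies early leads, but does not by itself yield a power law. Names and parameters are illustrative.

```python
import random
from collections import Counter


def simon_process(arrivals, new_club_prob, rng):
    """Rich-get-richer growth in the style of Simon's model.

    Each arrival founds a new club with probability new_club_prob; otherwise
    it copies the club of a uniformly random earlier arrival, which picks a
    club with probability proportional to its current size x.
    """
    club_of = [0]  # one founding member of club 0
    num_clubs = 1
    for _ in range(arrivals):
        if rng.random() < new_club_prob:
            club_of.append(num_clubs)
            num_clubs += 1
        else:
            club_of.append(rng.choice(club_of))
    return Counter(club_of)


if __name__ == "__main__":
    sizes = simon_process(100_000, 0.05, random.Random(0))
    print("clubs:", len(sizes))
    print("five largest:", [size for _, size in sizes.most_common(5)])
```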

Preferential Attachment
- Start with two vertices connected by an edge
- At each step, add one new vertex v with one edge back to previous vertices
- The probability a previously added vertex u receives the new edge from v is proportional to the (current) degree of u
  - more precisely, probability u gets the edge = (current degree of u)/(sum of all current degrees)
- Vertices with high degree are likely to get even more links!
  - just like the crowded nightclub
- Generates a power law distribution of degrees
- Variation: each new vertex initially gets k edges
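Degree-proportional sampling does not require computing probabilities explicitly. If every edge endpoint is stored in a list, a vertex of degree d appears d times, so a uniform pick from that list selects u with probability (current degree of u)/(sum of all current degrees). A minimal Python sketch of the single-edge version above; the k-edge variation is not implemented, and the names are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(n, rng):
    """Grow a network to n vertices, one vertex and one edge per step.

    The new vertex v links to an earlier vertex u chosen with probability
    (current degree of u) / (sum of all current degrees).
    """
    if n < 2:
        raise ValueError("the process starts from two connected vertices")
    edges = [(0, 1)]
    endpoints = [0, 1]  # each vertex appears here once per unit of degree
    for v in range(2, n):
        u = rng.choice(endpoints)  # uniform over endpoints = proportional to degree
        edges.append((v, u))
        endpoints.extend((v, u))
    return edges


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    deg = degrees(preferential_attachment(100_000, random.Random(0)))
    histogram = Counter(deg.values())  # degree -> number of vertices with that degree
    for d in (1, 2, 4, 8, 16, 32, 64):
        print(d, histogram.get(d, 0))
```

Plotting that histogram on log-log axes is the straight-line test from the heavy-tail slides.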