ECS 289 / MAE 298, Lecture 15 Mar 2, Diffusion, Cascades and Influence, Part II

Similar documents
Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Maximizing the Spread of Influence through a Social Network

Cascades. Rik Sarkar. Social and Technological Networks. University of Edinburgh, 2018.

Impact of Clustering on Epidemics in Random Networks

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network. Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu

Scalable Influence Maximization in Social Networks under the Linear Threshold Model

Example 3: Viral Marketing and the vaccination policy problem

Extracting Influential Nodes for Information Diffusion on a Social Network

Part I Part II Part III Part IV Part V. Influence Maximization

Jure Leskovec Machine Learning Department Carnegie Mellon University

The Coral Project: Defending against Large-scale Attacks on the Internet. Chenxi Wang

Viral Marketing and Outbreak Detection. Fang Jin Yao Zhang

Influence Maximization in the Independent Cascade Model

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Midterm Review NETS 112

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report

Data mining --- mining graphs

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Jure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Plan of the lecture I. INTRODUCTION II. DYNAMICAL PROCESSES. I. Networks: definitions, statistical characterization, examples II. Modeling frameworks

Failure in Complex Social Networks

MIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns

M.E.J. Newman: Models of the Small World

CSCI5070 Advanced Topics in Social Computing

CS224W: Analysis of Networks Jure Leskovec, Stanford University

IRIE: Scalable and Robust Influence Maximization in Social Networks

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann

Models and Algorithms for Network Immunization

Simulating the Diffusion of Information: An Agent-based Modeling Approach

Combining intensification and diversification to maximize the propagation of social influence

Information Dissemination in Socially Aware Networks Under the Linear Threshold Model

Agent Based Modeling Tutorial

Cascading Node Failure with Continuous States in Random Geometric Networks

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

1 More configuration model

1 Introduction. Russia

Recovering Social Networks from Contagion Information

Comparing the diversity of information by word-of-mouth vs. web spread

Lecture Note: Computation problems in social. network analysis

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies

Social Networks. Slides by : I. Koutsopoulos (AUEB), Source:L. Adamic, SN Analysis, Coursera course

THE preceding chapters were all devoted to the analysis of images and signals which

Algorithms for Data Processing Lecture I: Introduction to Computational Efficiency

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

A Generating Function Approach to Analyze Random Graphs

ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

Graph Theory and Network Measurment

Extracting Information from Complex Networks

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Jure Leskovec Computer Science Department Cornell University / Stanford University

Positive and Negative Links

Combinatorial Model and Bounds for Target Set Selection

Economics of Information Networks

arxiv: v3 [cs.ni] 3 May 2017

A Class of Submodular Functions for Document Summarization

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Navigation in Networks. Networked Life NETS 112 Fall 2017 Prof. Michael Kearns

Math 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search

A Course in Machine Learning

First-improvement vs. Best-improvement Local Optima Networks of NK Landscapes

Topic mash II: assortativity, resilience, link prediction CS224W

Complex Networks. Structure and Dynamics

CSE 258 Lecture 12. Web Mining and Recommender Systems. Social networks

arxiv: v1 [cs.si] 12 Jan 2019

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

David Easley and Jon Kleinberg January 24, 2007

Derivatives and Graphs of Functions

Viral Marketing for Product Cross-Sell through Social Networks

On the Resilience of Bipartite Networks

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

Graphical Models. David M. Blei Columbia University. September 17, 2014

Preliminary results from an agent-based adaptation of friendship games

Recap! CMSC 498J: Social Media Computing. Department of Computer Science University of Maryland Spring Hadi Amiri

Online Stochastic Matching CMSC 858F: Algorithmic Game Theory Fall 2010

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

Game Theoretic Models for Social Network Analysis

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Homework 1 Networked Life (NETS 112) Fall 2016 Prof Michael Kearns

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Understanding Disconnection and Stabilization of Chord

Social Network Analysis

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Diffusion and Clustering on Large Graphs

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

CS 534: Computer Vision Segmentation and Perceptual Grouping

Clustering: Classic Methods and Modern Views

Numerical Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania

CS-E5740. Complex Networks. Scale-free networks

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Exploring Watts Cascade Boundary

Big Mathematical Ideas and Understandings

arxiv: v1 [cs.si] 7 Aug 2017

Computer Vision I. Announcements. Fourier Tansform. Efficient Implementation. Edge and Corner Detection. CSE252A Lecture 13.

Mathematics of Networks II

Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior

EC422 Mathematical Economics 2

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Absorbing Random walks Coverage

Transcription:

ECS 289 / MAE 298, Lecture 15 Mar 2, 2011 Diffusion, Cascades and Influence, Part II

Diffusion and cascades in networks (Nodes in one of two states) Viruses (human and computer) contact processes epidemic thresholds Technologies Winner take all Benefit of first to market Benefit of second to market Political or social beliefs and societal norms A long history of study, now trying to add impact of underlying network structure.

Diffusion, Cascade behaviors, and influential nodes Part I: Ensemble models (last time) Generating functions / Master equations / giant components Generating function approach: Watts PNAS (2002) Fractional threshold model (Φ i ). Node activated once a fraction of it s neighbors Φ i are active. A vulnerable node is one that needs only a single neighbor to be active before it flips (i.e., Φ i 1/k). Use generating functions to calculate the expected size of clusters of vulnerable nodes. Results Heterogeneity in thresholds (Φ i ) enhances global cascades. Heterogeneity of degree (P k ) suppresses global cascades.

Diffusion, Cascade behaviors, and influential nodes Part I: Ensemble models (last time) Generating functions / Master equations / giant components Master equation approach: Pastor-Satorras and Vespignani Contact process, epidemic spreading Probability of becoming activated is proportional to the number of active neighbors. Results Heterogeneity of degree (P k ) enhances global spreading. For PLRG with 2 < γ < 3 the epidemic threshold λ c 0.

Diffusion, Cascade behaviors, and influential nodes Part II: Contact processes with individual node preferences Long history of empirical / qualitative study in the social sciences (Peyton Young, Granovetter, Martin Nowak...; diffusion of innovation; societal norms) Recent theorems: network coordination games (bigger payout if connected nodes in the same state) (Kleinberg, Kempe, Tardos, Dodds, Watts, Domingos) Finding the influential set of nodes, or the k most influential Often NP-hard and not amenable to approximation algorithms Key distinction: Thresholds of activation (leads to unpredictable behaviors) Diminishing returns (submodular functions nicer)

Diffusion, Cascade behaviors, and influential nodes Part III: Markov chains and mixing times New game-theoretic approaches (general coordination games) Results in an Ising model. Montanari and Saberi PNAS 2010. Techniques from Parts I and II suggest: Innovations spread quickly in highly connected networks. Long-range links benefit spreading. High-degree nodes quite influential (enhance spreading). Techniques from Part III suggest: Innovations spread quickly in locally connected networks. Local spatial coordination enhances spreading (having a spatial metric; graph embeddable in small dimension). High-degree nodes slow down spreading.

Part II. Network Coordination Games The most basic model: Reviewed in Kleinberg Cascading Behavior in Networks: Algorithmic and Economic Issues, Chap 24 of Algorithmic Game Theory, (Cambridge University Press, 2007). Again each node in one of two states, say { 1, +1}. Play a game with each connected neighbor independently. Total payout is sum over all games. Assume neighbor(s) of j in fixed state while j updates. Positive payout if connected nodes i and j adopt the same state. No payout if they differ. And -1 can have different payout that +1 coordinated behavior. Payout matrix: q 0 0 (1-q)

How each node operates Again assume all other nodes fixed while node j updates. It has k A j nodes in state 1, and kb j nodes in state +1. If it chooses state 1, payout of qk A j. If it chooses state +1, payout of (1 q)k B j. Chooses 1 if qk A j > (1 q)kb j. Substitute in k j = k A j + kb j and rearrange: Criteria: choose 1 if k B j < qk j and +1 if k B j > qk j. A threshold model! Adopt +1 if a fraction q of your neighbors have state +1.

Contagion threshold and cascades Start all nodes in 1. And all nodes update synchronously at discrete time steps. Key question: When is there a small set of nodes S, that when set to +1 convert all (or almost all) of the population? A set S is contagious if every other node is converted by S. Easier for S to be contagious if the threshold q is small. Define the contagion threshold of a graph G to be the maximum q for which there exists a finite contagous set. (Like with generating functions, here no notion of how long it takes for the full network to be activated. Just a final steadystate answer.)

Progressive vs. non-progressive processes The model thus far is non-progressive: nodes can flip from 1 to +1 and back to 1. This makes the situation less stable. Consider a line of all 1 at the start with a single +1 in the center, and q = 1/2. At next time steps neighbors of the +1 flip, but the +1 switches back to 1! And the whole system ends up blinking. Progressive: Once you flip, always stay in that state. The line above now all flips to +1 in a wavefront moving right and left-wards.

Theorem: The Contagion Threshold for any Graph is at most 1/2. (Recall the contagion threshold is the maximum value of q for which a finite contagious set exists.) Independent of progressive vs non-progressive. A behavior cant spread very far if it requires a strict majority of your friends to convince you to adopt it. This means if q > 1/2 on any graph, it cannot support a cascade and the full graph will not be activated. This is for any graph: uniform degree, power law, etc.

Extending this simple model So far all nodes have same fractional threshold q, and all neighbors contribute equally in calculation of fraction. The General Linear Threshold Model Directed graphs (not reciprocal influence necessarily). Each node has a threshold chosen uniformly at random between [0, 1]. Each neighbor exerts a non-negative weight. The only constraint is that sum over all the weights is less than or equal to 1. Note we now have diversity of influence (e.g., spouse/relative can exert stronger weight than coworker/friend).

Finding the influential nodes Motivation Viral marketing use word-of-mouth effects to sell product with minimal advertising cost. Design of search tools to track news, blogs, and other forms of on-line discussion about current events Finding the influential nodes: formally The minimum set S V that will lead to the whole network being activated. The optimal set of a specified size k = S that will lead to largest portion of the network being activated.

Due to thresholds/ critical mass In general NP-hard to find optimal set S. NP-hard to even find a approximate optimal set (optimal to within factor η 1 ɛ where n is network size and ɛ > 0.) ( inapproximability ) Due to thresholds (esp if each node can have its own) might have a tiny activated final set of nodes but it jumps abruptly if just a few more nodes or, moreover, the right nodes activated. Kleinberg calls this abrupt response the Knife edge property

Diminishing returns (No longer a threshold, but a concave function) Each additional friend who adopts the new behavior enhances your chance of adopting the new behevaior, but with less influence for each additional friend (from Leskovec talk)

Diminishing returns (Submodular / concave function) The benefit of adding elements decreases as the set to which they are being added grows. So no longer get to have more influence from family or other special nodes. (Instead its the first nodes exert more influence.) Since no longer have special nodes easy to build up optimal set S of k nodes. Hill climbing add one at the time nodes to the set S that cause maximum impact.

Hill climbing (from Leskovec talk)

Submodular and hill climbing more formally: (from Leskovec talk)

Empirical observations (from Leskovec talk)

(from Leskovec talk)

Joining Livejournal: on online bulletin board network 0.025 Probability of joining a community when k friends are already members 0.02 0.015 probability 0.01 0.005 0 0 5 10 15 20 25 30 35 40 45 50 k Diminishing returns only sets in once k > 3. Network effect not illustrated by curve: If the k friends are highly clustered, the new user is more likely to join.

(from Leskovec talk)

(from Leskovec talk)

(from Leskovec talk)

(from Leskovec talk)

(from Leskovec talk)

(from Leskovec talk)

(from Leskovec talk)

For a wealth of additional information see Leskovec talk: http://videolectures.net/mmdss07 leskovec dcbn/ The role product category plays (books, dvds, videos, anime dvds) Predicting recommendation success with linear models. How do people actually get recommendations Amazon recommendation of similar purchases Personalized rec based on previous purchases/likes 68% of people consult friends and family before purchasing home electronics [Burke 2003]. (i.e. More influenced by friends than strangers.) 94% of users make recommendations w/o having received one (they are the seed nodes)

Summary Important distinctions for cascade processes Contagion (e.g. a virus) versus social behaviors. Threshold models / critical mass (abrupt changes as set S increased) Diminishing returns (submodular / concave) KKT03, KKT05: If individual function for all nodes exhibit diminishing returns, the resulting influence function for the graph will be submodular ( local to global ). Can approximate such sets (hill climbing)

Other interesting, related models on networks Voter models Synchronization (related to the spectral properties of A the adjacency matrix). Finally: Part III: Markov chains and mixing times Montanari and Saberi, The Spread of Innovations in Social Networks, PNAS 2010 Unlike any of the above models (which tell us about equilibrium sizes of activated populations), Markov Chain and mixing times tell us about the time it takes for innovations to be adopted! sluggish rapid fire spread