Randomized Rumor Spreading in Social Networks

Benjamin Doerr (MPI Informatics / Saarland U)

Summary: We study how fast rumors spread in social networks. For the preferential attachment network model and the classic push-pull randomized rumor spreading process, we show that all nodes learn the rumor within a logarithmic number of rounds. This is the first such bound for a real-world network model. Surprisingly, rumors spread significantly faster (i) when nodes avoid calling the same person twice in a row or (ii) in the asynchronous rumor spreading process.

[Joint work with Mahmoud Fouz (Saarland U) and Tobias Friedrich (MPI-INF, now U Jena)]

We do THEORY
- Make assumptions (mathematically precise):
  - Social network = preferential attachment graph on n nodes
  - Rumor spreading = the randomized rumor spreading process studied in theoretical computer science
- We do THEORY = rigorously prove results by mathematical methods
- Example of a rigorously proven result: for all n, the expected first time at which all nodes have heard the rumor is at most K log(n)
- Why do we do this?
  - Gives results as true as possible
  - Gives results for arbitrarily large networks
  - A proof also reveals why the statement is true
- Price to pay: difficult, time-consuming, less information for concrete problems

Overview of What Follows
- Rumor spreading: why is it a computer science topic?
- Define the push-pull rumor spreading process
- Social network: preferential attachment (PA) graph [Barabási, Albert (1999)]
- Result: rumor spreading in PA graphs is fast, and faster still if you don't call the same neighbor twice in a row
- Some proof ideas
  - Why faster without double-contacts
  - Why faster than in other graphs
- Some more results: asynchronous rumor spreading is even faster

Randomized Rumor Spreading
- Randomized rumor spreading: any random process in a network where nodes call random neighbors and send/retrieve information
- Question: how long does it take until a piece of information ("rumor") is known to all nodes?
- Example: complete graph (edges not drawn), push process [Frieze, Grimmett]
  - Round 0: the starting vertex is informed
  - In each round: each informed vertex calls a random vertex
  - Θ(log n) rounds suffice with high probability

Why Study Rumor Spreading?
- Can be used as a simple distributed algorithm
  - Maintaining replicated databases: name servers in the Xerox corporate internet [Demers et al. (1987)]
  - Communication protocol for unreliable/unknown/dynamic... networks (wireless sensor networks, mobile ad-hoc networks)
  - Buzzwords: epidemic algorithms, gossip-based algorithms
- Model for existing processes
  - Rumors, computer viruses, diseases, influence processes, ...
- An early motivation: technical tool in the mathematical analysis of an all-pairs shortest path algorithm [Frieze, Grimmett (1985)]

The Rumor Spreading Process
- Set-up: network (undirected graph); nodes can communicate with their neighbors
- Initially, one node has a piece of information ("rumor")
- Synchronized push-pull rumor spreading:
  - Synchronized process ("rounds")
  - In each round, each node contacts a random neighbor; if one of the two knows the rumor, it forwards it to the other
  - Push operation: the caller sends the rumor to a neighbor
  - Pull operation: the caller learns the rumor from a neighbor
- [Push protocol: only informed nodes call random neighbors.]
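
The synchronized push-pull process above can be simulated in a few lines. This is a minimal sketch, not the code used for the experiments in this talk: the function names `push_pull_rounds` and `complete_graph`, the adjacency-list representation, and the `max_rounds` safety cap are assumptions made for illustration. All calls of a round are evaluated against the set of nodes informed at the start of that round.

```python
import random

def push_pull_rounds(adj, start, rng, max_rounds=10_000):
    """Rounds until every node is informed (hypothetical helper).

    adj: adjacency lists of an undirected graph, nodes 0..n-1.
    """
    n = len(adj)
    informed = {start}
    rounds = 0
    while len(informed) < n and rounds < max_rounds:
        newly = set()
        for u in range(n):
            v = rng.choice(adj[u])          # u calls a random neighbor
            if u in informed and v not in informed:
                newly.add(v)                # push: caller sends the rumor
            elif v in informed and u not in informed:
                newly.add(u)                # pull: caller learns the rumor
        informed |= newly                   # updates take effect next round
        rounds += 1
    return rounds

def complete_graph(n):
    return [[v for v in range(n) if v != u] for u in range(n)]

# Demo on a complete graph; the result depends on the random seed.
print(push_pull_rounds(complete_graph(256), start=0, rng=random.Random(0)))
```

On a star whose center starts informed, every leaf pulls the rumor in the first round, which is a simple way to see why pull operations matter around high-degree nodes.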

Two Results (both push and push-pull)
- Rumor spreading is fast: after O(log n) rounds, with high probability the rumor is known by all n vertices of
  - complete graphs [Frieze, Grimmett (1985); Pittel (1987); Karp, Shenker, Schindelhauer, Vöcking (2000)]
  - hypercubes [Feige, Peleg, Raghavan, Upfal (1990)]
  - random graphs G(n,p), p ≥ (1+ε) ln(n)/n [FPRU 90]
  - O(log n) = less than K log(n) for some constant K
- Rumor spreading is robust against transmission failures:
  - In complete graphs: if each call fails with constant probability, the time until all nodes are informed increases only by a constant factor [D, Huber, Levavi (2009)]
  - Push model only: if the message-loss probability is 50%, the time increases by a factor of only 1.82

Social Networks, Real-World Graphs
- "Real-world graph":
  - airports connected by direct flights
  - scientific authors connected by a joint publication
  - Facebook users being friends
- Observation: real-world graphs look different.
  - small diameter
  - non-uniform degree distribution: few nodes of high degree ("hubs"), many nodes of small (constant) degree
  - power law: the number of nodes of degree d is proportional to d^{-β} [β a constant, often between 2 and 3]

Preferential Attachment (PA) Graphs
- Barabási, Albert (Science 1999):
  - explain why many real-world networks look like this
  - suggest a model for real-world graphs: preferential attachment (PA)
- Preferential attachment paradigm:
  - the network evolves over time
  - when a new node enters the network, it chooses a constant number of neighbors at random
  - the random choice is not uniform, but gives preference to popular nodes: the probability of attaching to node x is proportional to the degree of x
- The PA paradigm defines a random graph model ("PA graphs")
- Today: one of the most used models for real-world networks

Dirty Details: Definition of PA Graphs
- Density parameter: integer m
- PA graph on n vertices: G_n; vertex set {1, ..., n}
- G_1: 1 is the single vertex and has m self-loops
- G_n: obtained by adding the new vertex n to G_{n-1}
  - One after the other, the new vertex n chooses m neighbors
  - The probability that vertex x is chosen is
    - proportional to the current degree of x, if x ≠ n
    - proportional to 1 + the current degree of x, if x = n (the self-loop probability takes into account the current edge starting in n)
- Properties:
  - diameter Θ(log n / log log n) [Bollobás, Riordan (2004)]
    - Θ(log n) = O(log n) and more than K log(n) for some constant K
  - power law degree distribution: for d ≤ n^{1/5}, the expected number of vertices having degree d is proportional to d^{-3} [Bollobás, Riordan, Spencer, Tusnády (2003)]
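
The sequential definition above translates directly into a generator. The following is a sketch under stated assumptions: vertices are numbered 0..n-1 instead of 1..n, and the name `pa_graph` and the `endpoints` list are illustrative choices, not taken from the talk. Every vertex appears in `endpoints` once per unit of degree, so drawing a uniform entry is a degree-proportional choice; one extra slot stands for the "1 +" term of the new vertex.

```python
import random

def pa_graph(n, m, rng):
    """Edge list of a preferential attachment multigraph (hypothetical helper)."""
    edges = []
    endpoints = []                 # vertex x appears deg(x) times
    for v in range(n):
        for _ in range(m):         # v chooses its m neighbors one after the other
            # Slot len(endpoints) is the extra weight 1 of the new vertex v,
            # accounting for the edge currently starting in v.
            k = rng.randrange(len(endpoints) + 1)
            target = v if k == len(endpoints) else endpoints[k]
            edges.append((v, target))
            endpoints.extend((v, target))   # a self-loop adds 2 to deg(v)
    return edges

edges = pa_graph(n=1000, m=2, rng=random.Random(0))
degree = {}
for u, w in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[w] = degree.get(w, 0) + 1
print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

Because the first vertex has no one else to attach to, its m edges are all self-loops, matching the definition of G_1.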

Rumor Spreading in PA Graphs
- Chierichetti, Lattanzi, Panconesi (2009): the push-pull protocol informs a PA graph with m ≥ 2 in O((log n)^2) rounds, with high probability
- Our results (STOC '11, Comm. ACM 2012):
  - Θ(log n) rounds are necessary and sufficient
  - Θ(log n / log log n) rounds, if contacts are chosen excluding the neighbor contacted in the previous round (no "double-contacts")
- Note: avoiding double-contacts does not improve the O(log n) times for complete graphs, random graphs, hypercubes, ...
- Challenge in proving such a result: analyze a random process on a complicated random graph!

Experiments: Time vs. Graph Size
Figure: time to inform all vertices for different graph sizes (no double-contacts).
Observation: hidden constants don't matter; PA is truly faster.

Experiments: Progress over Time
Figure: number of nodes informed after t rounds. All graphs: n = 3,072,441; density m = 38 (except complete). Orkut: Google's alternative to Facebook (100m users in India and Brazil).

Graphs Used in the Previous Experiments
- Orkut: 2006 crawl of around 11% of the Orkut social network (Google's alternative to Facebook, today very popular in India and Brazil, ~100,000,000 users, Alexa traffic rank 81st): n = 3,072,441 nodes, ~117 million edges (approx. 38n edges)
- Preferential attachment (PA) graph: n nodes, each chooses m = 38 neighbors, giving higher preference to already popular nodes
- Random-attachment graph (m-out random graph): n nodes, each chooses m neighbors uniformly at random
- Complete graph on n vertices

Experiments: Same with Twitter
n = 51,161,011 nodes, 1,613,927,450 edges, density m = 32.

Proof Ideas
- Theorem: randomized rumor spreading in the push-pull model informs the PA graph G_n (with m ≥ 2) with high probability in
  - Θ(log n) rounds when choosing neighbors uniformly at random
  - Θ(log n / log log n) rounds without double-contacts
- Two questions:
  - Why do double-contacts matter?
  - What makes PA graphs spread rumors faster than other graphs? G(n,p) random graphs also have diameter O(log n / log log n), but rumor spreading needs Θ(log n) rounds, also without double-contacts.

With Double-Contacts
- Critical situation: a pair of uninformed neighboring nodes, each having a constant number of neighbors
- With constant probability, the following happens in one round:
  - the two nodes in the pair call each other
  - all their neighbors call someone outside the pair
  - hence the situation remains critical (pair uninformed)
- Problem: initially, there are Θ(n) such critical situations in a PA graph. Since each remains unsolved with constant probability in one round, Θ(log n) rounds are necessary.
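
A back-of-the-envelope version of this counting argument, offered as a heuristic sketch and not as the proof from the paper. The constants a and c are illustrative and do not appear on the slide: assume there are at least a·n critical pairs initially, and that each pair stays critical in a round with probability at least c, with 0 < c < 1, regardless of what happened before.

```latex
% X_t = number of critical pairs that are still uninformed after t rounds
\[
  \mathbb{E}[X_t] \;\ge\; a\,n\,c^{t},
  \qquad\text{so}\qquad
  \mathbb{E}[X_t] \ge 1
  \quad\text{whenever}\quad
  t \;\le\; \frac{\ln(a n)}{\ln(1/c)} = \Theta(\log n).
\]
```

So for a suitable constant times log n rounds, one still expects at least one uninformed pair. Turning this expectation bound into a statement that holds with high probability needs additional arguments.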

Without Double-Contacts
- The uninformed pair is not critical anymore, because the two nodes cannot call each other twice in a row
- Remaining critical situations: cycles of uninformed nodes having a constant number of neighbors in total. Again, in each round, the situation remains critical (cycle uninformed) with constant probability
- No problem! There are only O(exp((log n)^{3/4})) such critical situations in a PA graph.
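
The no-double-contacts rule only changes how a node picks its contact. The sketch below is an illustration, not the experimental code of the talk: the name `push_pull_no_double` and the `last` array are hypothetical, and the fallback for nodes with no other neighbor to call (for example degree-1 nodes), which simply call their only neighbor again, is an assumption the slide does not address.

```python
import random

def push_pull_no_double(adj, start, rng, max_rounds=10_000):
    """Push-pull rounds until all nodes are informed, avoiding double-contacts."""
    n = len(adj)
    informed = {start}
    last = [None] * n                       # neighbor each node called last round
    rounds = 0
    while len(informed) < n and rounds < max_rounds:
        newly = set()
        for u in range(n):
            # Exclude the previous contact; fall back to all neighbors
            # if nothing else is available (assumption).
            options = [v for v in adj[u] if v != last[u]] or adj[u]
            v = rng.choice(options)
            last[u] = v
            if u in informed and v not in informed:
                newly.add(v)                # push
            elif v in informed and u not in informed:
                newly.add(u)                # pull
        informed |= newly
        rounds += 1
    return rounds

# Demo on a 4-cycle; the result depends on the random seed.
cycle = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(push_pull_no_double(cycle, 0, random.Random(0)))
```

With this rule, two neighboring nodes cannot keep calling only each other: after calling its partner once, a node with other neighbors must call someone else in the next round.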

Proof Ideas (2): Why is PA faster?
- Large- and small-degree nodes:
  - hub: node with degree (log n)^3 or greater
  - poor node: node with degree exactly m (as small as possible)
- Observation: poor nodes convey rumors fast! Let a and b be neighbors of a poor node x.
  - If a is informed, the expected time for x to pull the rumor from a is less than m
  - After that, it takes less than another m rounds (in expectation) for x to push the news to b
- Key lemma: between any two hubs, there is a path of length O(log n / log log n) in which every second node is a poor node.
- Key lemma + observation + XXX: if one hub is informed, then after O(log n / log log n) rounds all hubs are.

Main Tool: BR 04 Definition of PA Model
- Equivalent definition of the PA model due to Bollobás, Riordan (2004)
- For m = 1:
  - Choose 2n random numbers in [0,1]: x_1, y_1, ..., x_n, y_n
  - If x_i > y_i, exchange the two values; then Pr(y_i ≤ r) = r^2
  - Sort the (x,y) pairs by increasing y-value; call them again (x_1,y_1), (x_2,y_2), ...
  - For all k, vertex k chooses as its neighbor the i ≤ k that satisfies y_{i-1} ≤ x_k < y_i (with y_0 = 0)
  - Note: x_k is uniform in [0,y_k]
- For m ≥ 2: generate G_{mn} as for m = 1, then merge each m consecutive nodes
- Advantage: many independent random variables, not a sequential process
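
For m = 1, the construction above can be sketched as follows. This is an illustration under assumptions: vertices are numbered 0..n-1, the name `pa_br04_m1` is hypothetical, and the `min(i, k)` guard only covers the probability-zero tie x_k = y_k. A binary search over the sorted y-values finds the interval [y_{i-1}, y_i) that contains x_k.

```python
import bisect
import random

def pa_br04_m1(n, rng):
    """neighbor[k] = vertex chosen by vertex k in the m = 1 construction."""
    pairs = []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        pairs.append((min(a, b), max(a, b)))    # exchange so that x <= y
    pairs.sort(key=lambda p: p[1])              # sort pairs by increasing y
    ys = [y for _, y in pairs]
    neighbor = []
    for k, (x, _) in enumerate(pairs):
        i = bisect.bisect_right(ys, x)          # smallest i with ys[i] > x
        neighbor.append(min(i, k))              # i <= k because x < y_k
    return neighbor

nb = pa_br04_m1(1000, random.Random(0))
print("self-loops:", sum(1 for k, i in enumerate(nb) if i == k))
```

All 2n random numbers are drawn up front and independently, which is the advantage named on the slide: no step depends on the outcome of an earlier attachment.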

Recent Result: Async. Rumor Spreading
- Synchronized rumor spreading: each node calls one neighbor in each round; not realistic
- Asynchronous rumor spreading: each node runs a Poisson process to determine when it calls a neighbor
  - Rate 1: the expected waiting time between calls is one unit of time (same call intensity as in the synchronized version)
- Classic result: asynchronous rumor spreading takes Θ(log n) time on complete graphs, hypercubes, random graphs, ... [both to inform all and to inform most nodes]
- Our result (SWAT '12): asynchronous rumor spreading informs most nodes of the PA graph in O((log n)^{1/2}) time
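
The asynchronous process can be simulated without tracking n separate clocks. The sketch below relies on a standard fact about Poisson processes: n independent rate-1 processes together form one rate-n process whose events each belong to a uniformly random node. The function name `async_push_pull_time` is hypothetical, and the graph is assumed to be connected, since otherwise the loop would not terminate.

```python
import random

def async_push_pull_time(adj, start, rng):
    """Time until all nodes are informed; adj must describe a connected graph."""
    n = len(adj)
    informed = [False] * n
    informed[start] = True
    count = 1
    t = 0.0
    while count < n:
        t += rng.expovariate(n)        # waiting time until the next call anywhere
        u = rng.randrange(n)           # the calling node is uniformly random
        v = rng.choice(adj[u])         # u calls a random neighbor
        if informed[u] != informed[v]: # exactly one of the two knows the rumor
            informed[u] = informed[v] = True
            count += 1
    return t

# Demo on a complete graph; the result depends on the random seed.
complete = [[v for v in range(128) if v != u] for u in range(128)]
print(async_push_pull_time(complete, 0, random.Random(0)))
```

Unlike in the synchronized version, a node that has just learned the rumor can pass it on immediately, without waiting for the next round.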

Summary: Rumor Spreading in PA Graphs
- Theorem: randomized rumor spreading in the push-pull model informs the PA graph G_n (with m ≥ 2) with high probability in
  - Θ(log n) rounds when choosing neighbors uniformly at random
  - Θ(log n / log log n) rounds without double-contacts
- Asynchronous: most nodes are informed after O((log n)^{1/2}) time
- Explanation: interaction between hubs and poor nodes (constant degree)
  - hubs are available to be called
  - poor nodes quickly transport the news from one neighbor to all others
- Difference visible in experiments

Thanks!