Influence Maximization in the Independent Cascade Model

Similar documents
Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Example 3: Viral Marketing and the vaccination policy problem

Viral Marketing and Outbreak Detection. Fang Jin Yao Zhang

Extracting Influential Nodes for Information Diffusion on a Social Network

Part I Part II Part III Part IV Part V. Influence Maximization

Scalable Influence Maximization in Social Networks under the Linear Threshold Model

Maximizing the Spread of Influence through a Social Network

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network. Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu

Journal of Engineering Science and Technology Review 7 (3) (2014) Research Article

A visual analytics approach to compare propagation models in social networks

Minimizing the Spread of Contamination by Blocking Links in a Network

ECS 289 / MAE 298, Lecture 15 Mar 2, Diffusion, Cascades and Influence, Part II

Lecture Note: Computation problems in social. network analysis

Optimization Models and Algorithms in Natural Resource Planning. David Shmoys Cornell University

arxiv: v1 [cs.si] 12 Jan 2019

Impact of Clustering on Epidemics in Random Networks

IRIE: Scalable and Robust Influence Maximization in Social Networks

Rounded Dynamic Programming for Tree-Structured Stochastic Network Design

On the Clustered Shortest-Path Tree Problem

Limiting Concept Spread in Environments with Interacting Concepts

Cascades. Rik Sarkar. Social and Technological Networks. University of Edinburgh, 2018.

Information Dissemination in Socially Aware Networks Under the Linear Threshold Model

Graphical Approach for Influence Maximization in Social Networks Under Generic Threshold-based Non-submodular Model

Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach

Combinatorial Model and Bounds for Target Set Selection

Merging Frequent Summaries

The Robust Set problem: parameterized complexity and approximation

Coloring Random Graphs

Dynamic Graph Algorithms

On Approximating Minimum Vertex Cover for Graphs with Perfect Matching

CS 598CSC: Approximation Algorithms Lecture date: March 2, 2011 Instructor: Chandra Chekuri

The complement of PATH is in NL

Greedy Approximations

Vertex-Colouring Edge-Weightings

1 The Traveling Salesperson Problem (TSP)

Understanding Community Effects on Information Diffusion

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

A Reduction of Conway s Thrackle Conjecture

Important separators and parameterized algorithms

Approximation Algorithms

Combining intensification and diversification to maximize the propagation of social influence

Influence-oriented Community Analysis in Social Networks

Optimally Learning Social Networks with Activations and Suppressions

Blocking Simple and Complex Contagion By Edge Removal

Non-zero disjoint cycles in highly connected group labelled graphs

In this lecture we discuss the complexity of approximation problems, and show how to prove they are NP-hard.

Algorithmic Problems in Epidemiology

Graph Theory and Optimization Approximation Algorithms

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Chordal deletion is fixed-parameter tractable

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

On a Network Generalization of the Minmax Theorem

Content-Centric Flow Mining for Influence Analysis in Social Streams

Efficient Influence Maximization in Social Networks

Efficient influence spread estimation for influence maximization under the linear threshold model

On the Min-Max 2-Cluster Editing Problem

Improved algorithms for constructing fault-tolerant spanners

Flexible Coloring. Xiaozhou Li a, Atri Rudra b, Ram Swaminathan a. Abstract

CMPSCI 311: Introduction to Algorithms Practice Final Exam

1. Lecture notes on bipartite matching

Parameterized graph separation problems

Chi Wang, Wei Chen & Yajun Wang

CS261: Problem Set #1

Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching

Absorbing Random walks Coverage

Absorbing Random walks Coverage

Lecture 7. s.t. e = (u,v) E x u + x v 1 (2) v V x v 0 (3)

CS261: A Second Course in Algorithms Lecture #15: Introduction to Approximation Algorithms

Paths, Flowers and Vertex Cover

Clustering. (Part 2)

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Max Coloring Problem

Algorithms for Euclidean TSP

On Covering a Graph Optimally with Induced Subgraphs

Solutions for the Exam 6 January 2014

List Colouring Squares of Planar Graphs

A 2k-Kernelization Algorithm for Vertex Cover Based on Crown Decomposition

Scheduling Unsplittable Flows Using Parallel Switches

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Expected Approximation Guarantees for the Demand Matching Problem

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I. Instructor: Shaddin Dughmi

SPARSIFICATION OF SOCIAL NETWORKS USING RANDOM WALKS BRYAN WILDER

Learning Characteristic Structured Patterns in Rooted Planar Maps

Positive Influence Dominating Set in Online Social Networks

1. Lecture notes on bipartite matching February 4th,

On the Approximability of Modularity Clustering

Automatic synthesis of switching controllers for linear hybrid systems: Reachability control

Cascading Node Failure with Continuous States in Random Geometric Networks

Counting the Number of Eulerian Orientations

On the Robustness of Distributed Computing Networks

Comprehensive Solution for Anomaly-free BGP

A survey of submodular functions maximization. Yao Zhang 03/19/2015

SOCIAL network analysis research can be broadly classified

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes

Identifying Vulnerabilities and Hardening Attack Graphs for Networked Systems

Agreedy approximation for minimum connected dominating sets

On the Robustness of Distributed Computing Networks

c 2003 Society for Industrial and Applied Mathematics

Transcription:

Influence Maximization in the Independent Cascade Model Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj Gran Sasso Science Institute (GSSI), Viale F. Crispi, 7, 67100, L Aquila, Italy. {gianlorenzo.dangelo,lorenzo.severini,yllka.velaj}@gssi.infn.it Abstract. We present our ongoing work on the problem of increasing the information spread in a network by creating a limited amount of new edges incident to a given initial set of active nodes. As a preliminary result, we give a constant approximation algorithm for the case in which the set of initial active nodes is a singleton. Our aim is to extend this result to the general case. We outline some further research directions which we are investigating. 1 Introduction Studying the processes by which ideas and influence propagate through a network has been one of the main goals in the field of social network analysis. The influence problem is motivated by many applications in different fields: from marketing, with the aim of maximizing the adoption of a new product [3], to epidemiology, in order to limit the diffusion of a virus or disease [11], going through the analysis of social networks to find influential users and to study how information flows through the network [1]. Different models of information diffusion have been introduced in the literature [5], two widely studied models are: the Linear Threshold Model (LTM) and the Independent Cascade Model (ICM). In both models, we can distinguish between active, or infected, nodes, called seeds, which spread the information, and inactive ones. Recursively, currently infected nodes can infect their neighbours with some probability. After a certain number of such cascading cycles, a large number of nodes becomes infected in the network. In LTM the idea is that a node becomes active if a large part of its neighbours is active. More formally, each node u has a threshold t chosen uniformly at random in the interval [0, 1]. The threshold represents the fraction of neighbours of u that must become active in order for u to become active. At the beginning of the process a small percentage of nodes of the graph is set to active in order to let the information diffusion Copyright c by the paper s authors. Copying permitted for private and academic purposes. V. Biló, A. Caruso (Eds.): ICTCS 2016, Proceedings of the 17th Italian Conference on Theoretical Computer Science, 73100 Lecce, Italy, September 7 9 2016, pp. 269 274 published in CEUR Workshop Proceedins Vol-1720 at http://ceur-ws.org/vol-1720

270 Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj process start. In subsequent steps of the process a node becomes active if the fraction of its active neighbours is greater than its threshold. In ICM, instead, a seed u tries to influence one of its inactive neighbours but the success of node u in activating the node v only depends on the propagation probability of the edge from u to v (each edge has its own value). Regardless of its success, the same node will never get another chance to activate the same inactive neighbour. The process terminates when no further node gets activated. An interesting question, in the analysis of the information spread through a network, is how to shape a given diffusion process so as to maximize or minimize the number of activated nodes at the end of the process by taking intervention actions. Many intervention actions have been studied in the literature, the most important one is: if we are allowed to add at most k seeds, which ones should be selected so as to maximize the number of active nodes resulting from the diffusion process [5]. Besides source selection, other intervention actions may be used to facilitate or limit the diffusion processes, such as inserting or deleting edges and adding or deleting nodes in the network. To the best of our knowledge, under LTM, the problems that have been studied are the following: Khalil et al. [6] consider two types of actions, adding edges to or deleting edges from the existing network and they show that this network structure modification problem has a supermodular objective and therefore can be solved by algorithms with provable approximation guarantees. Zhang et al. [15] consider arbitrarily specified groups of nodes, and interventions that involve both edge and node removal from the groups. They develop algorithms with rigorous performance guarantees and good empirical performance. Kimura et al. [7] use a greedy approach to delete edges under the LTM without any analysis of the supermodularity of the objective, nor rigorous approximation guarantees. Kuhlman et al. [9] propose heuristic algorithms for edge removal under a simpler deterministic variant of LTM which is not only hard, but also has no approximation guarantee. Papagelis [12] and Crescenzi et al. [4] study the problem of augmenting the graph in order to increase the connectivity or the centrality of a node, respectively and experimentally show that this increases the expected number of eventual active nodes. Under ICM, the main results are the following: Wu et al. [14] consider intervention actions other then edge addition, edge deletion and source selection, such as increasing the probability that a node infects its neighbours. It can be shown that optimizing the selection of such actions with a limited budget tends to be NP-hard and is neither submodular nor supermodular. Sheldon et al. [13] study the problem of node addition to maximize the spread of information, and provide a counterexample showing that the objective function is not submodular. Bogunovic [2] addresses the node deletion problem providing a greedy algorithm. Kimura et al. [8] propose methods for efficiently finding good approximate solutions on the basis of a greedy strategy for the edge deletion problem under the ICM, but do not provide any approximation guarantees. In this paper, we focus on the Independent Cascade Model and investigate the problem of adding a small number of edges incident to an arbitrary seed in order to increase the spreading of information in terms of number of nodes

Influence Maximization in the Independent Cascade Model 271 that become active. Thus, the problem we analyse differs from above mentioned ones and, as far as we know, similar problems have never been studied for the Independent Cascade Model. The aim of this paper is reporting our ongoing research on which we wish to get feedback so as to possibly include these results in future publications. 2 Preliminary results In this section we will give all the necessary definitions, introduce the problem that will be considered and show our preliminary results. A social network is represented by a weighted directed graph G(V, âăěe, p) where V represents the set of nodes, E represents set of relationships and p : V V [0, 1] is the probability of an edge to propagate information. For each node u, N u denotes the set of neighbours of u, i.e. N u = {v (u, v) E}. The Independent Cascade Model [5] is an information diffusion model where the information flows over the network through cascade. Nodes can have two states, active: it means the node is already influenced by the information in diffusion, inactive: node is unaware of the information or not influenced. The process runs in discrete steps. At the beginning of ICM process, few nodes are given the information, they are known as seed nodes. Upon receiving the information these nodes become active. In each discrete step, an active node tries to influence one of its inactive neighbours. Regardless of its success, the same node will never get another chance to activate the same inactive neighbour. The success of node u in activating the node v depends on the propagation probability of the edge (u, v) defined as p uv, each edge has its own value. The process terminates when no further node gets activated. We define the influence of a set A V in the graph G, denoted by σ(a, G), to be the expected number of active nodes at the end of the process, given that A is the initial set of seeds. Given a set S of edges not in E, we denote by G(S) the graph augmented by adding the edges in S to G, i.e. G(S) = (V, E S). Given a graph G = (V, E), a vertex set A V and an integer k, the problem we are studying consists in finding a set S of edges incident to the nodes in A not in E (that is, S {(a, v) : v V \ N a, a A}) such that S k and σ(a, G(S)) is maximum. In the paper we focus on the case A = {a}. We leave the case A > 1 as a future work. It has been shown [10] that for a monotone submodular function the following greedy algorithm provides a ( 1 1 e ) -approximation: start with the empty set and repeatedly add an element that gives the maximal marginal gain. The greedy algorithm can be extended to any monotone submodular objective function thanks to the following result. Theorem 1 ([10]). For a non-negative, monotone submodular function f, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the value of f. Then S provides a ( 1 1 e ) -approximation.

272 Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj In this paper, we exploit this result by showing that σ(a, G(S)) is monotone and submodular w.r.t. the possible set of edges incident to a. Theorem 2. σ(a, G(S)) is a monotonically increasing submodular function of the set S of edges to be added. Proof (sketch). We will use the definition of live-edge graph X = (V, E X ) which is a directed graph where the set of nodes is equal to V and the set of edges is a subset of E. E X is given by a edge selection process such that each edge is either live or blocked according to its propagation probability. We can assume that for each pair of neighbours in the graph, a coin of bias p uv is flipped and the edges for which the coin indicated an activation are live, the remaining are blocked. It is easy to show that a diffusion model is equivalent to the reachability problem in live-edge graphs: given any seed set A, the distribution of active node sets after the diffusion process ends is the same as the distribution of node sets reachable from A in a live-edge graph. We denote with χ(g) the probability space in which each sample point specifies one possible set of outcomes for all the coin flips on the edges, it is the set of all possible live-edge graphs. Let R(A, X) denote the set of all nodes that can be reached from the nodes in A on a path consisting entirely on live edges: R(a, X). R(A, X) = a A The main idea to prove that the function in monotonically increasing is that, after an edge addition in G, the live-graph X has at least one more edge than the original live-edge graph, hence, the number of reachable nodes can not decrease. To prove submodularity, we note that the number of new reachable nodes from the seed after the edge addition in G(T ) is smaller or equal than the number of new reachable nodes in G(S) since most of the nodes are already reachable by the edges in T \ S. We prove these conditions for all X χ(g). Note that, in the problem we are studying, the greedy algorithm can not evaluate the influence function exactly since σ(a, G(S)) is the expected number of activated nodes and it has been proven that evaluating this function is generally #P -complete for ICM [3]. However, by simulating the diffusion process sufficiently many times and sampling the resulting active sets, it is possible to obtain arbitrarily good approximations to σ(a, G(S)) (see Prop 4.1 in [5] to bound the number of samples needed to obtain a (1 + δ)-approximation). It is an extension of the result of Nemhauser et al. [10] that by using (1 ± δ)-approximate values for the function to be optimized where δ 0, we obtain ( 1 1 e ɛ) -approximation, where ɛ depends on δ and goes to 0 as δ 0. Theorem 3. For the problem of adding a set S of edges, not in E, incident to the node in A = {a} such that S k and σ(a, G(S)) is maximum, there is a polynomial-time algorithm approximating the maximum influence to within a factor of ( 1 1 e ɛ) where ɛ is any positive real number.

3 Future research Influence Maximization in the Independent Cascade Model 273 In this paper, we presented our ongoing work on the problem of increasing the information spread in a network considering the case in which the set of active nodes A is a singleton. We have analysed the properties of the influence function which is monotonically increasing and submodular and we propose a greedy approximation algorithm for efficiently computing a set of edges that a seed can decide to add to the graph in order to increase the expected number of influenced nodes. As future works, we plan to extend our approach to A > 1 and consider the insertion of edges incident to all the seeds in A. Moreover, we plan to analyse a generalization of the problem considered in this paper by allowing the deletion of edges incident to seeds. Finally, our intent is to study the same problem in a generalization of ICM, which is the Decreasing Cascade Model. In this model the probability of a node u to influence v is non-increasing as a function of the set of nodes that have previously tried to influence v. From the experimental point of view, our aim is to measure the efficiency of the greedy algorithm in term of expected number of influenced nodes. References 1. E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone s an influencer: Quantifying influence on twitter. In Proc. of the Fourth ACM Int. Conf. on Web Search and Data Mining, WSDM 11, pages 65 74. ACM, 2011. 2. I. Bogunovic. Robust protection of networks against cascading phenomena, 2012. 3. W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proc. of the 16th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2010. 4. P. Crescenzi, G. D angelo, L. Severini, and Y. Velaj. Greedily improving our own closeness centrality in a network. ACM Trans. Knowl. Discov. Data, 11(1):9:1 9:32, 2016. 5. D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105 147, 2015. 6. E. B. Khalil, B. Dilkina, and L. Song. Scalable diffusion-aware optimization of network topology. In Proc. of the 20th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2014. 7. M. Kimura, K. Saito, and H. Motoda. Solving the Contamination Minimization Problem on Networks for the Linear Threshold Model. 2008. 8. M. Kimura, K. Saito, and H. Motoda. Blocking links to minimize contamination spread in a social network. ACM Trans. Knowl. Discov. Data, 2009. 9. C. J. Kuhlman, G. Tuli, S. Swarup, M. V. Marathe, and S. Ravi. Blocking simple and complex contagion by edge removal. In Proc. of the IEEE Int. Conf. on Data Mining (ICDM), 2013. 10. G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions I. Math. Program., 1978. 11. M. E. J. Newman. Spread of epidemic disease on networks. Phys. Rev. E, 66:016128, Jul 2002. 12. M. Papagelis. Refining social graph connectivity via shortcut edge addition. ACM Trans. Knowl. Discov. Data, 10(2):12, 2015.

274 Gianlorenzo D Angelo, Lorenzo Severini, and Yllka Velaj 13. D. Sheldon, B. N. Dilkina, A. N. Elmachtoub, R. Finseth, A. Sabharwal, J. Conrad, C. P. Gomes, D. B. Shmoys, W. Allen, O. Amundsen, and W. Vaughan. Maximizing the spread of cascades using network design. CoRR, 2012. 14. X. Wu, D. Sheldon, and S. Zilberstein. Efficient algorithms to optimize diffusion processes under the independent cascade model. In NIPS Work. on Networks in the Social and Information Sciences, 2015. 15. Y. Zhang, A. Adiga, A. Vullikanti, and B. A. Prakash. Controlling propagation at group scale on networks. In Proc. of the IEEE Int. Conf. on Data Mining (ICDM), 2015.