Searching for influencers in big-data complex networks

Similar documents
Supplementary Information

Collective Influence Algorithm to find influencers via optimal percolation in massively large social media

arxiv: v1 [physics.soc-ph] 28 Mar 2016

Immunization for complex network based on the effective degree of vertex

Attack Vulnerability of Network with Duplication-Divergence Mechanism

Failure in Complex Social Networks

Identifying a set of influential spreaders in complex networks arxiv: v2 [cs.si] 18 Jul 2016

Response Network Emerging from Simple Perturbation

Supplementary material to Epidemic spreading on complex networks with community structures

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

Characteristics of Preferentially Attached Network Grown from. Small World

An efficient strategy for the identification of influential spreaders that could be used to control

arxiv: v2 [physics.soc-ph] 27 Aug 2012

A Community-Based Approach to Identifying Influential Spreaders

Emergence of Blind Areas in Information Spreading

Critical Phenomena in Complex Networks

Models and Algorithms for Network Immunization

Complex Networks: Ubiquity, Importance and Implications. Alessandro Vespignani

Optimal box-covering algorithm for fractal dimension of complex networks

The coupling effect on VRTP of SIR epidemics in Scale- Free Networks

From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks

Data mining --- mining graphs

An Improved k-shell Decomposition for Complex Networks Based on Potential Edge Weights

Comparing the diversity of information by word-of-mouth vs. web spread

The Establishment Game. Motivation

Dynamic network generative model

On a Network Generalization of the Minmax Theorem

Minimizing the Spread of Contamination by Blocking Links in a Network

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Why Do Computer Viruses Survive In The Internet?

Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows arxiv: v1 [physics.soc-ph] 12 May 2008

networks and the spread of computer viruses

Example 3: Viral Marketing and the vaccination policy problem

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 10 Jan 2007

Review. Information cascades in complex networks. Edited by: Ernesto Estrada

arxiv: v1 [physics.soc-ph] 10 Dec 2011

Phase Transitions in Random Graphs- Outbreak of Epidemics to Network Robustness and fragility

Information Dissemination in Socially Aware Networks Under the Linear Threshold Model

Plan of the lecture I. INTRODUCTION II. DYNAMICAL PROCESSES. I. Networks: definitions, statistical characterization, examples II. Modeling frameworks

TITLE PAGE. Classification: PHYSICAL SCIENCES/Applied Physical Sciences. Title: Mitigation of Malicious Attacks on Networks

A Novel Technique for Finding Influential Nodes

M.E.J. Newman: Models of the Small World

Influence Maximization in the Independent Cascade Model

Attacks and Cascades in Complex Networks

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Simulating the Diffusion of Information: An Agent-based Modeling Approach

Basics of Network Analysis

An Evolving Network Model With Local-World Structure

Universal Behavior of Load Distribution in Scale-free Networks

Topic mash II: assortativity, resilience, link prediction CS224W

arxiv: v1 [cs.si] 12 Jan 2019

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

CSCI5070 Advanced Topics in Social Computing

ECS 289 / MAE 298, Lecture 15 Mar 2, Diffusion, Cascades and Influence, Part II

ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Extracting Influential Nodes for Information Diffusion on a Social Network

CS224W: Analysis of Networks Jure Leskovec, Stanford University

DNE: A Method for Extracting Cascaded Diffusion Networks from Social Networks

Search in weighted complex networks

Properties of Biological Networks

A Parallel Community Detection Algorithm for Big Social Networks

Approaches for Quantifying

IRIE: Scalable and Robust Influence Maximization in Social Networks

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

A new analysis method for simulations using node categorizations

Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities

Role of modular and hierarchical structure in making networks dynamically stable

Cascade dynamics of complex propagation

1 Introduction. Russia

Maximizing the Spread of Influence through a Social Network

Mixing navigation on networks. Tao Zhou

Research Proposal. Guo, He. November 3rd, 2016

Modeling and Analysis of Information Propagation Model of Online/Offline Network Based on Coupled Network

The Complex Network Phenomena. and Their Origin

Smallest small-world network

Unlabeled equivalence for matroids representable over finite fields

Topologies and Centralities of Replied Networks on Bulletin Board Systems

Social and Technological Network Analysis. Lecture 6: Network Robustness and Applica=ons. Dr. Cecilia Mascolo

My Best Current Friend in a Social Network

Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set based Approach

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 7 Jan 2004

Evolution Characteristics of the Network Core in the Facebook

THE COMPARISON OF IMMUNIZATION MECHANISMS AGAINST DISSEMINATING INFORMATION IN SOCIAL NETWORKS

Attack Tolerance and Resiliency of Large Complex Networks

Modeling of Complex Social. MATH 800 Fall 2011

ECS 289 / MAE 298, Lecture 9 April 29, Web search and decentralized search on small-worlds

The Microblogging Evolution Based on User's Popularity

Onset of traffic congestion in complex networks

SEARCHING WITH MULTIPLE RANDOM WALK QUERIES

arxiv: v1 [cs.si] 7 Aug 2017

Summary: What We Have Learned So Far

Pheromone Static Routing Strategy for Complex Networks. Abstract

Methods for Removing Links in A Network to Minimize the Spread of Infections

CS-E5740. Complex Networks. Network analysis: key measures and characteristics

Introduction to network metrics

Robustness of Centrality Measures for Small-World Networks Containing Systematic Error

SNA 8: network resilience. Lada Adamic

Social Network Analysis

Transcription:

1 Searching for influencers in big-data complex networks Byungjoon Min, Flaviano Morone, and Hernán A. Makse Levich Institute and Physics Department, City College of New York, New York, NY 10031 Abstract. Localizing the most influential nodes in complex networks is a critical problem in an era of big data across different topics including information diffusion, epidemic outbreaks, and viral marketing. The common practice of identification of influencers is based on the use the number of connections called degree, which is defined locally. However, the topological location of individual would be more meaningful instead of the degree if hubs are located at the periphery of network. In addition, there is no guarantee of the performance for the widely used heuristic methods including the degree because of the lack of optimization of global influence. Here we present a strategy to search for influential individual in complex networks at single superspreader and collective influencers levels respectively. For single spreader, we find that superspreaders are located within the core identified by k-shell decomposition, insensitive to their degree. For collective influencers, we review theory and algorithm based on the optimization of global influence to find the optimal (minimal) set of influencers in linear time to break up the network. 1.1 Introduction How to identify influential node in complex networks has been a crucial question for a long time due to its various applications in information dissemination [1], epidemic spreading [2], and viral marketing [3]. A set of influential nodes in a network which are much smaller than the size of the network can play a pivotal role [4 6]: (i) when activated in social networks like Twitter, they can spread information to the whole network, (ii) when removed due to breakdown, they efficiently dismantle the connectivity of the entire network, and (iii) when immunized in contact networks, they prevent the emergence of global epidemics. Finding influential nodes called influencers or superspreaders in a network has received much attention from physics, network science, and big-data science communities because of the practical importance [4 13]. The problem of identifying influencers has two distinct topics: finding individual superspreader [4] and a set of multiple influencers [6, 8] that can

2 Byungjoon Min, Flaviano Morone, Hernán A. Makse optimize the global influence. For spreading processes originated from a single spreader, one of the most widely used strategies for identifying influencers in particular in networks with a scale-free degree distribution is the number of neighbors, degree, k [7]. However, if a highly connected person (high degree) is located at the periphery of a network, the hub would not have much power in spreading processes. In contrast, a moderately connected person who is placed within the core of a network can have a huge impact in spreading, which would cause a large diffusion to the whole network. In this article, we illustrate that the superspreaders are consistently located in core of networks insensitive to their degree [4]. For the multiple influencers, a lot of heuristic algorithms [4,5,7,14 18] were introduced in order to estimate the influence of nodes. Since these rankings are not based on the optimization of global influence, however, they may fail to find optimal influencers. Therefore, a reliable algorithm to identify a set of influential nodes in complex networks is demanded with a rigorous ground of the optimization of global influence. The influence of nodes can be quantified by the extent of damage when nodes are removed from original network. For instance, the removal of a few influential nodes can fragment a complex network to disconnected small components. Therefore, the problem of identifying influential nodes can be mapped into optimal percolation, that is, finding the minimal set of nodes to destroy the largest connected component in a network. In other words, the most influential nodes are the minimal set of nodes so called optimal influencers that are responsible for global connectivity of a network. We develop an analytical framework to find a set of the optimal influencers in networks, with the mapping of optimal percolation [6]. Furthermore, we propose a scalable algorithm in linear time for the solution in massively large social media [19]. Here, we cover recent advance on the study of theory of finding influencers for the single spreader and collective influencers [4, 6, 19]. We first discuss the identification of superspreader originated from a single spreader and show that topological location plays a crucial role rather than the connections defined locally. Next, we review the theory of collective influencers with mapping of optimal percolation to search for optimal influencers that are the minimal set of nodes to break up the whole network. 1.2 Identifying superspreaders in complex networks Spreading usually starts from a single origin in epidemics spreading and rumor diffusion [1, 2]. It is widely believed that a node with the highest degree would be the superspreader, among others [7]. However, the highly connected nodes would have a small impact on spreading processes when the high degree nodes are placed at the periphery of a network, surrounded by less connected nodes. Alternatively, nodes located within the core of a network can play an important role in spreading even though the nodes are less connected.

1 Searching for influencers in big-data complex networks 3 Fig. 1.1. a, An example of the k-shell decomposition in a network. The two nodes with the same degree k = 8, blue and yellow nodes, are located respectively in the innermost core (k S = 3) and the periphery (k S = 1) in this network. b-d, The extent of the spreading efficiency in the SIR model originated from a single spreader for three different cases: (k S, k) = (63, 96) (node A), (k S, k) = (26, 96) (node B), and (k S, k) = (63, 65) (node C). On top of the contact network of inpatients (CNI), when origin has high k-shell index, i.e. k S = 63 for nodes A and C, spreading covers a wide area of the entire network both for nodes A and C similarly, although the degree of C is much smaller than A. However, spreading remains largely localized for node B despite of its high degree, meaning that the degree cannot predict the spreading efficiency reliably. The results are obtained over over 10,000 different realizations with infection probability β = 0.035. In order to validate this hypothesis, we study real-world complex networks as examples of complex networks of social interactions. We use four social networks: (i) the network of email contacts in the Computer Science Department of the University College London (Zhou, S., private communication), (ii) contact network of inpatients (CNI) from hospitals in Sweden [20], (iii) the network of actors who have starred in the same movie labeled by imdb.com as adult [21], and (iv) the friendship network in the LiveJournal.com community [22].

4 Byungjoon Min, Flaviano Morone, Hernán A. Makse On top of the real-world networks, we study spreading process using a prototypical epidemic model, Susceptible-Infectious-Recovered (SIR) model [2,23]. In the SIR model, each node can be in one of three states, susceptible, infected, and recovered (or removed). Initially, all nodes are susceptible (S) except for one infected (I) node. The infected nodes infect their susceptible neighbors with the probability β and then becomes recovered (R) where they cannot become infected again. The SIR model can describe not only epidemic spreading but also information and rumor spreading [24]. In this study, we use relatively small values for infection probability β, so that only limited fraction of nodes will be infected. Otherwise, a large fraction of the population will be infected and the effect of the initially infected node is not significant. The core-periphery structure of complex networks are identified by using the k-shell (also known as k-core) decomposition [25 27]. We assign k-shell index, k S, to each node after successive removals of nodes with degree smaller than k S. Initially, the nodes with degree k = 1 are removed. After pruning all the nodes with degree k = 1, we recalculate the degree of each node and continue to remove the nodes with left degree k = 1. After removing all nodes with degree k = 1 iteratively, all the removed nodes become a k-shell with index k S = 1. We iteratively remove and assign the next k-shell, k S = 2. We continue to remove and assign higher k-shells until all nodes are removed from the original network. Then, each node is defined by a unique k S index according to their core-periphery structure determined by network topology. The periphery of networks is defined as small values of k S whereas the core shows large k S (Fig. 1.1a). In order to quantify the spreading efficiency of a node i in the SIR model, we measure the infected fraction in an epidemic originating at node i. Figures 1.1b-c show that the degree may not predict accurately the size of the infected nodes in a spreading process. The infected fraction can be far different even though the spreading starts from the same degree as shown in Figs. 1.1b-c. Instead, the size of infected population is predicted more accurately when we use the topological location of the origin defined by k S index. In Figs 1.1b and d, the nodes with the same k-shell index bring out similar infected populations even though they have very different k. This example implies that the coreness of the node has better predictive power than the degree. For more systematic analysis, we quantify the influence of a node i in the SIR model by using the average size of infected nodes M i. Then, the infected nodes averaged over all the origins with respect to (k S, k) is given by M(k S, k) = i Υ (k S,k) M i N(k S, k), (1.1) where Υ (k S, k) is the union of all N(k S, k) nodes with (k S, k) values. We examine M(k S, k) in the four social networks to reveal the influence of nodes with (k S, k) (see Fig. 1.2). For the same k-shell index, M(k S, k) is insensitive to the degree as shown vertically in Fig. 1.2. However, M(k S, k) exhibits wide

1 Searching for influencers in big-data complex networks 5 a b c d Fig. 1.2. Average fraction of infected size M(k S, k) when spreading originates from nodes with (k S, k). We use for networks a, email email contacts (β = 8%), b, CNI network (β = 4%), c, the actors network from IMBD data (β = 1%), and d, the friendship network from Livejournal.com (β = 1.5%). The k-shell index is more reliable predictor for superspreaders than the degree k. Average fraction is larger for nodes with high k-shell index k S independent to k whereas origins of a given k can cause either small or large infection, depending on k S. range for the same degree k, meaning that nodes with the same degree can have a different influence in the spreading process. This result implies that influential spreaders are consistently placed within the core identified by high k-shell index, regardless of the degree. Therefore, the k-shell index of a node is a better predictor of the spreading efficiency than the local property, like the degree. We demonstrate that superspreaders are consistently located in the core of a network defined by the k-shell index insensitive to their degree. When spreading starts from the core of a network (high k-shell), the spreading origin is surrounded by other core nodes structurally. Thus, there exist many pathways to the rest of the network from the origin, so that the spreading can

6 Byungjoon Min, Flaviano Morone, Hernán A. Makse easily reach a critical mass that maintains global outbreaks. However, when spreading originates from a high k node but located at periphery, it is not guaranteed that the spreading will develop. k-shell decomposition allows us to identify superspreaders in a general spreading process. For a wide range of infection probability β, the innercore nodes identified by k-shell index shows higher spreading efficiency than outer shells [4]. In a different class of spreading model, susceptible-infectedsusceptible model, we also find that the persistence of infected fraction is larger for nodes with high k-shell index than those with low [4]. Therefore, the k- shell index is a robust and reliable predictor to identify influential spreader on complex networks. 1.3 Identifying collective influencers in complex networks In this section, we discuss the influence of multiple nodes as a general problem of node s influence. Suppose that we conduct a viral marketing campaign with a limited resource and so we should choose a small set of nodes for the campaign. Then, we have to select collective influencers who can potentially spread the information to the larger part of the social network. Inversely, if the collective influencers are removed from the original network, it becomes fragment with small components. Consider a network with N nodes connected by M edges, with an arbitrary degree distribution P (k). When we remove a fraction q of nodes in random, at a critical fraction the giant component of the network is destroyed, G(q) = 0 where G the size of the giant component. Thus, the problem of optimal influencers can be mapped into finding the minimum fraction q c to dismantle a network, G(q) = 0. To develop theory of collective influencers, we first introduce auxiliary variables ν i j representing the probability that node i belongs to the giant component when j is absent. On a locally tree-like network, the variable can be expressed as ν i j = n i 1 (1 ν k i ), (1.2) k i\j where i \ j is a set of nearest neighbors of i excluding j and n i represents whether the node i is removed or not, i.e, n i = 0 if node i is removed and n i = 1 if node i is present. Once we calculate ν i j for each connection, we can finally obtain the probability ν i that node i belongs to the giant connected component by following, ν i = n i 1 (1 ν j i ). (1.3) j i

1 Searching for influencers in big-data complex networks 7 The value ν i j = 0 for all pairs is trivially a solution of Eq. (1.2), corresponding to the situation that no node belongs to the giant component. If the solution is stable, the iteration of Eq. (1.2) will converge into the trivial solution with no giant component. If it is not stable, we will obtain finally a different stable fixed solution and there should be a giant component. This fact allows us to apply stability analysis at the value ν i j = 0 in order to predict percolation threshold precisely. By linearizing Eq. (1.2), we can determine the stability of the solution ν i j = 0 by the largest eigenvalue λ(n; q) of the linear operator ˆM defined on the 2M 2M directed edges as M k l,i j ν i j. (1.4) ν k l {νi j=0} For a locally-tree like network, ˆM is given by M k l,i j = n i B k l,i j (1.5) where B k l,i j is the non-backtracking or Hashimoto matrix of the network [28 30]. If λ(n; q) 1, the solution ν i j = 0 is stable. Hence, the optimal influence problem can be interpreted as identifying the minimal set of nodes to be removed that minimize the the largest eigenvalue λ(n; q). The largest eigenvalue of ˆM can be calculated by the Power Method: λ(n) = lim l [ ˆM ] 1/l l w 0, (1.6) w 0 where w 0 is an arbitrary non-zero vector with 2M entries. For a finite l, we define the cost energy function ˆM l w 0 from Equation (1.6) to be minimized. The solution of minimized cost function by perturbation series is ˆM N l w 0 2 = (k i 1) (k j 1), (1.7) i=1 j Ball(i,2l 1) k P 2l 1 (i,j) where Ball(i, l) is the set of nodes inside a ball of radius l defined as the shortest path around node i, Ball(i, l) is the frontier of the ball and P l (i, j) is the shortest path of length l connecting i and j. From the cost energy function Eq. (1.7), we establish systematically the algorithm of node s influence. Collective influence (CI) of node is defined as the extent of the drop in the energy function when we remove a node from the network. The algorithm for the collective influence is following: First, we identify the nodes belonging to the frontier of a given ball of radius l of every node. Next, we assign to node i the strength of collective influence by CI l (i) = (k i 1) (k j 1). (1.8) j Ball(i,l) n k

8 Byungjoon Min, Flaviano Morone, Hernán A. Makse a G(q) 1 0.8 0.6 0.4 Erdős-Rényi k = 3.5 q c 0.4 0.3 q CI HDA PR HD CC EC k-core Optimal 0.1 k 1 2 3 4 0 0 0 0.1 0.3 b G(q) 1 0.8 0.6 0.4 q c γ HD analytic 0 0 0.1 5 0.15 0.1 0.05 2 3 4 5 q CI HDA HD Scale-free γ=3 c HD HDA CI G Fig. 1.3. a, The size of giant connected component G(q) in ER network (N = 2 10 5, k = 3.5, error bars are s.e.m. over 20 realizations). CI outperforms other heuristic algorithms including high degree adaptive (HDA), PageRank (PR), high degree (HD), closeness centrality (CC), eigenvector centrality (EC), and k-core. b, G(q) for scale-free networks with degree exponent γ = 3, maximum degree k max = 10 3 and N = 2 10 5 (error bars are s.e.m. over 20 realizations). c, An example of scale-free SF network with γ = 3 after the removal of the 15% of nodes based on CI, HDA, and HD methods. Then, remove node i with the highest CI l and set n i = 0. As removing node i, the degree of its neighbors decreases by one. The above procedures are repeated to identify next highest CI node to remove until the giant component vanishes, G(q) = 0. Based on CI algorithm, we identify a set of collective influencers to destroy the giant component efficiently. In addition, this algorithm is able to find the minimal set of nodes in almost linear time with optimal implementation [19]. In order to validate the performance of CI algorithm, we compare the percolation threshold of CI and that of many heuristic centrality measures: high-degree (HD) [7], high-degree adaptive (HDA), PageRank (PR) [15], closeness centrality (CC) [14], eigenvector centrality (EC) [14], and k-core [4, 25] (Fig. 1.3). Figure 1.3a compares CI in Erdös-Rényi (ER) networks against other heuristic algorithms. For scale-free (SF) networks, we also compare CI with HDA and HD methods. In all networks, CI produces smaller percolation threshold than the other heuristic methods that we tested.

1 Searching for influencers in big-data complex networks 9 a G(q) 1 0.8 0.6 0.4 Twitter CI HDA PR HD k-core b 0 0 0.02 0.04 0.06 0.08 0.1 q weak node Fig. 1.4. Performance of CI in a large-scale real social network. a, Size of giant component G(q) of Twitter users (N = 469, 013) as a function of removed fraction, based on CI, HDA, PR, HD, and k-core strategies. b, Example of a weak-node found in Twitter. Weak-node refers node with a small number of connections surrounded by hierarchical coronas of hubs. CI algorithm identifies weak-node as an influential node, which is not identified by other heuristic methods. As an example of real-world network for information spreading, we construct a network of Twitter users. Figure 1.4 exhibits the size of giant component of Twitter users with respect to a removed fraction q. In the Twitter network, CI algorithm outperforms other heuristics in identification of optimal influencers. The CI algorithm allows us to find weak-nodes which are the low-degree nodes surrounded by hierarchically structured coronas of hubs as shown in Fig. 1.4b, which are not captured by other heuristics. A large number of weak-nodes identified by CI algorithm play an important role in global influence, so that the CI algorithm leads to nearly optimal solution for finding collective influencers. 1.4 Summary We study the identification of the most influential individuals in complex networks for the individual spreader and collective influencers respectively. For a single spreader, we find that influential individuals in spreading processes are consistently placed within the core area of complex networks defined by k-core percolation, independent to their degree. For collective influencers, we propose an optimal method to search for influential nodes based on minimization of cost energy function of global influence, allowing us to find a new class of influencers called weak-nodes that are weakly connected nodes yet surrounded by hierarchical coronas of hubs. Our theory for influential nodes in complex networks has a variety of applications. One of the most promising examples would be finding important area in the brain. For the brain networks, finding the most important regions

10 Byungjoon Min, Flaviano Morone, Hernán A. Makse allows us to design protocols to control brain functions by regulating these areas. In a broad sense, this research of identifying important areas in the brain can open a novel way to handle neurological disease with intervention to core areas prescribed by network theory, which would be a seminal advance [12]. 1.5 Acknowledgement This work was funded by NSF-PoLS PHY-1305476, NIH-NIGMS 1R21GM107641, NSF IIS-1515022, and Army Research Laboratory Cooperative Agreement Number W911NF-09-2-0053 (the ARL Network Science CTA). References 1. E. M. Rogers, Diffusion of Innovation (Free Press, New York, 4th ed, 1995). 2. R. M. Anderson, R. M. May, B. Anderson, Infectious Diseases of Humans: Dynamics and Control (Oxford Science Publications, 1992). 3. P. Domingos, M. Richardson, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, p61-70 (2002). 4. M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H. A. Makse, Nat. Phys. 6, 888 (2010). 5. S. Pei, H. A. Makse, J. Stat. Mech. P12002 (2013). 6. F. Morone, H. A. Makse, Nature 524, 65 (2015). 7. R. Albert, H. Jeong, A.-L. Barabási, Nature 406, 378 (2000). 8. D. Kempe, J. Kleinberg, E. Tardos, Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, p137-143 (2003). 9. S. Pei, L. Muchnik, J. S. Andrade, Jr., Z. Zheng, H. A. Makse, Sci. Rep. 4, 5547 (2014). 10. B. Min, F. Liljeros, H. A. Makse, PLoS ONE 10(8), e0136831 (2015). 11. S. Pei, L. Muchnik, S. Tang, Z. Zheng, H. A. Makse, PLoS ONE 10(5), e0126894 (2015). 12. F. Morone, K. Roth, B. Min, H. E. Stanley, H. A. Makse, arxiv:1602.06238 (2016). 13. C. Castellano, R. Pastor-Satorras, Sci. Rep. 2, 371 (2012). 14. L. C. Freeman, Social Networks 1, 215 (1979). 15. S. Brin, L. Page, Comput. Networks ISDN 30, 107-117 (1998). 16. J. Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms (1998). 17. R. Cohen, K. Erez, D. ben-avraham, S. Havlin, Phys. Rev. Lett. 86, 3682 (2001). 18. Y. Chen, G. Paul, S. Havlin, F. Liljeros, H. E. Stanley, Phys. Rev. Lett. 101, 058701 (2008). 19. F. Morone, B. Min, L. Bo, R. Mari, H. A. Makse, arxiv:1603.08273. 20. F. Liljeros, J. Giesecke, P. Holme, Mathematical Population Studies 14, 269 (2007). 21. The Internet Movie Database, www.imdb.com 22. Live Journal, www.livejournal.com 23. H. W. Hethcote, SIAM Rev. 42, 599 (2000).

1 Searching for influencers in big-data complex networks 11 24. C. Castellano, S. Fortunato, V. Loretto, Rev. Mod. Phys. 81, 591 (2009). 25. B. Bollobas, Graph Theory and Combinatorics: Proceedings of the Cambridge Combinatorial Conference in honor of P. Erdös, 35 (Academic, New York, 1984). 26. S. B. Seidman, Social Networks 5, 269 (1983). 27. S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, E. Shir, Proc. Natl. Acad. Sci. USA 104, 11150 (2007). 28. K. Hashimoto, Adv. Stud. Pure Math. 15, 211 (1989). 29. O. Angel, J. Friedman, S. Hoory, Trans. Amer. Math. Soc. 367 4287 (2015). 30. B. Karrer, M. E. J. Newman, L. Zdeborová, Phys. Rev. Lett. 113, 208702 (2014).