Multiplex networks: a Generative Model and Algorithmic Complexity


2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
ASONAM '15, August 25-28, 2015, Paris, France. 2015 ACM.

Prithwish Basu, Matthew Dippel, and Ravi Sundaram
Raytheon BBN Technologies, Cambridge, MA, USA, pbasu@bbn.com
Northeastern University, Boston, MA, USA, mdippel@ccs.neu.edu
Northeastern University, Boston, MA, USA, koods@ccs.neu.edu

Abstract

Real-world networks often consist of multiple layers, whether they are infrastructure networks such as airline networks or social networks such as collaboration networks. A common aspect of these networks is that multiple sub-networks evolve in parallel on the same node set; these are referred to as multiplex networks. For example, in the case of airline networks, the cities (nodes) have been well established for several decades if not centuries, but over time new airlines (sub-networks) emerge and each airline creates its own flight linkages between cities. Similarly, multiple modalities of communication evolve in parallel between individuals (nodes), such as email, SMS, and online social networks, e.g., Facebook and Twitter. While in some multiplex networks each layer evolves independently of the other layers over time, in other multiplex networks the evolution of a layer is coupled with that of other layers, a process referred to as co-evolution. In this paper, we propose a novel generative model, BINBALL, for a class of multiplex networks whose structure may co-evolve with (that is, depend on as well as influence) the structure of the individual networks. We validate our model using a multiplex data set for the European Air Transportation Network (EATN). We also investigate questions regarding the algorithmic complexity of finding short paths through multiplex networks as well as coverage of nodes using a minimum number of layers. We show that while certain problems in this space can be solved in polynomial time, others are NP-hard. Among the latter, some problems are approximable, whereas others are not. Finally, we demonstrate that BINBALL is a good generative model since it is able to generate random networks whose degree as well as path length distributions closely match those of the EATN.

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF (the ARL Network Science CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This research started when Matthew Dippel was a summer intern at Raytheon BBN Technologies.

I. INTRODUCTION

Networks are formed when a group of nodes decide to form connections among themselves in order to achieve a desirable function. Networks may be formed for efficiently transporting commodities such as physical goods or electronic messages, or they may be formed to express relationships between entities in space, even in the absence of an explicit flow of commodities. Oftentimes, these focused interactions result in the formation of networks with limited functions, but the resulting smaller networks are embedded in a larger network which is likely to have a more diverse set of functions. Each such smaller network can be considered as a layer of a larger multi-layer network, with multiple layers having nodes in common. Networks whose layers operate on the same set of nodes are referred to as multiplex networks [4].

Multiplex networks occur naturally in the real world. For example, the global air transportation network can be seen as a collection of multiple private or state airlines, each of which operates a set of flights between certain pairs of cities. The flights operated by each airline thus correspond to a layer in the larger multiplex network (MPN) [3]. The growth of such an MPN typically happens edge by edge, with each edge corresponding to a flight operated between two cities by a certain airline. The process of edge addition could depend not only on the current structure of its own network (layer) but also on the structure of the global MPN, since the cities are shared across layers. Therefore, the structure of a single layer does not evolve independently of the other layers; instead, it co-evolves with the other layers in the MPN.

MPNs can also model complex online social networks [9], where each layer corresponds to a common social attribute (e.g., interest in movies, sports, etc.) which binds together a set of individuals, whereas the overall social network is the conglomeration of all the attribute-based social networks formed between multifaceted individuals.

A. Motivation

Not only are the topological features of the global MPN of primary interest to network scientists; how these features arise from the interactions between individual networks (that is, from co-evolution) is also of significant importance. Thus, a network formation model that can characterize the growth of each individual random network layer over time, as well as that of the global MPN, is desirable. Merely characterizing the topology of each layer is insufficient for characterizing the topology of the MPN. This is because nodes are shared across layers; therefore, a random permutation of nodes in a small subset of layers, while preserving the intra-layer statistics, could significantly alter the inter-layer statistics.

B. Our Contributions

In this paper, we first describe several metrics for MPNs that are different from those of individual layers in the MPN. These include distributions of global degrees, layer-union degrees, and shortest path lengths with a budgeted number of layer uses or transfers.

Secondly, we present a novel generative model, BINBALL, which characterizes the growth of a random MPN using a multitude of layers. In a nutshell, each layer is a bin and each connection (or edge), which arrives into the MPN one at a time, is a ball. BINBALL first decides which layer should get this new connection and then which two nodes in that specific layer should be its endpoints, by following certain probabilistic rules based on local-global preferential attachment. These rules capture the balance between the desire of a node to form links with other local nodes and with important global nodes, which are local to other layers.

Next, we show that while certain optimization problems involving some of the aforementioned metrics, such as shortest paths, can be solved in polynomial time, others, such as budgeted shortest paths and budgeted coverage, are NP-hard. Moreover, some of the NP-hard problems are approximable, whereas others are not.

Finally, we demonstrate the efficacy of BINBALL by performing experiments on a real-world MPN which models air transportation in Europe (EATN) [3]. Specifically, we show that the aforementioned MPN metrics computed on random networks generated by BINBALL agree closely with the real EATN data.

The creation of BINBALL, a generative network model specifically in the context of airline networks, has societal value for decision and policy makers at national and supranational levels, e.g., how to situate new airports or alleviate existing capacity constraints, how to allocate routes and slots so as to enhance societal utility while respecting commercial interests, etc. Algorithms such as those for shortest paths or budgeted coverage are of significance for travel websites and travelers who wish to minimize their disutility given ticket prices and connection inconveniences.

C. Related Work

Multi-layer networks have begun to receive significant attention in the network science literature in recent years. Kurant and Thiran investigated layered rail transportation networks from the perspective of load distribution among the multiple layers [8]. Buldyrev et al. studied special classes of multi-layer interdependent networks from the perspective of cascading failures [2]. In particular, they showed how the failure of one node in one network can accelerate the fragmentation of a two-layer network due to the coupling between layers. More recently, researchers have proposed various generalizations of types of multi-layer networks [7], including their mathematical foundations using tensor representations of such networks [4]. Boccaletti et al. have written a comprehensive survey on this topic [1].

The study of generative models based on preferential attachment (PA) kernels for multiplex networks has received some attention recently [12], [6]. Nicosia et al. [12] have analytically studied the characteristics of PA-based growth models in two-layer networks. In their model, a node $v$ is born in both layers at the same time and attempts random connections to an older node $u$ in each layer with probability proportional to a linear combination of the degrees of $u$ in each layer. Such an attachment process results in a positive degree correlation for $v$ in both layers. For the special case where the attachment probability of $v \to u$ in layer $i$ depends only on the degree of $u$ in layer $i$, the attachment process yields two coupled Barabasi-Albert style scale-free networks, one in each layer. Since we observed that the EATN exhibits behavior different from the above, the attachment process of BINBALL is fundamentally different. Each layer has some local nodes (i.e., nodes with a distinct affinity to a layer; see Footnote 1), and these could serve as global nodes for other layers. In each time step an edge is added to a layer by following a local-global PA rule described in Section IV. Kim et al. [6] investigate non-linear PA kernels, which result in interesting degree correlation structures.

Footnote 1: For example, in an airline transportation network, this could arise from the fact that the nationalities of some (but not all) cities tend to pin them to a certain layer.

Air transportation networks have been well studied from a classical network science perspective [5] as well as from a multi-layer perspective [3]. While the latter is one of the first works to take a multi-layer approach to studying such networks, it does not give generative models. Also, the algorithmic complexity of path finding and coverage in multiplex networks has not received much attention thus far.

II. MULTIPLEX NETWORK MODELS AND METRICS

In this section, we describe the basic structure of the MPN and some metrics of interest. These metrics, although derived from standard network metrics, capture the inherent multiplex aspects of the network well.

We denote an MPN on a node set $V$ as a collection of graphs $G_i = (V, E_i)$ on the same node set, for different layer labels $i$. The graphs can have different edge sets, but their vertex sets are identical. Figure 1 shows an MPN on 3 layers and 12 nodes. We also mark some nodes as local to indicate that those nodes have the majority of their edges in the corresponding layer. While the general model of an MPN is agnostic of the locality (or affinity) of nodes to layers, this aspect is important for our generative model BINBALL, which is described in Section IV. Note that in Figure 1, each layer has 4 local nodes and 8 non-local nodes. The colors of the nodes indicate the layer to which they are local, and nodes of the same color that are drawn vertically one above the other in different layers are copies of the same node. Similarly, the colors of the edges indicate the layer to which they belong.

Figure 1. A 3-layer multiplex network on 12 unique nodes.

We now describe the metrics that are of interest to us in this paper.

A. Degree distributions

The degree distribution of a single-layer network is a well-defined and well-researched metric. The challenge is to find a meaningful generalization of this metric to multiplex networks. We describe several such generalizations, each based on a different generalization of degree in multiplex networks:

1) Global degree: The sum of a node's degrees across all layers.
2) Union degree: The degree of a node when we take the union of all edge sets and treat it as a single-layer network.

Consider a node $u$ in a multiplex network. We define the local degree of $u$ within layer $l$ as $\deg_l(u)$, and the global degree of $u$ as $\mathrm{global}(u) = \sum_l \deg_l(u)$. For example, in Figure 1, the blue node (say $u$) with the highest degree in layer 1 has $\deg_1(u) = 5$ and $\mathrm{global}(u) = 7$.

We can generalize the global degree further by examining the degree sequence of a specific node $u$ across all layers. Define the degree sequence of $u$ as the $L$-dimensional vector $V(u)$ such that $V_l(u) = \deg_l(u)$. Any metric which applies to vectors can be applied to this sequence. Our global degree metric now generalizes as $\|V(u)\|_1$. Other vector norms may be used to differentiate between nodes and their edge counts across layers. In what follows, $\|V(u)\|$ without a subscript denotes the 2-norm $\|V(u)\|_2$.

Given a node $u$, a metric of particular interest is the variance of its degree distribution. Consider the layer $l$ as being chosen uniformly at random. Then the degree $\deg_l(u)$ becomes a random variable with an expectation and a variance, both of which can be expressed simply as:

$$E(\deg_l(u)) = \frac{\|V(u)\|_1}{L}, \qquad \mathrm{var}(\deg_l(u)) = \frac{\|V(u)\|^2}{L} - \frac{\|V(u)\|_1^2}{L^2}.$$

These measures are useful indicators of whether a node receives all of its edges from a single layer or from multiple layers. In our model, we will find that super-hubs tend to have high variance, while spokes do not. In contrast, studies such as [12] only consider local degrees in each network of the MPN.

B. Shortest path length distributions

Path finding in multiplexes is an interesting problem with many applications. First, note that unless the multiplex is dense, it is unlikely that a single layer suffices to travel between two nodes. This is the case in the EATN, where the majority of node pairs need at least two different layers in order to be connected. Second, we need a meaningful way of capturing both the path length and the way various layers are used to construct the path.

Motivated by airline networks, we define two natural measures of cost for multiplex networks: transfer cost and usage cost. Consider a traveler wishing to go from London to Helsinki. She may use different airlines for different legs of the trip. However, switching from one airline to another has expenses associated with it, such as waiting for baggage transfer, shuttling between terminals, etc., and it is these expenses that we model as the transfer cost. Independent of the legs of the flight, there are also expenses associated with dealing with different airlines, such as payment processing, managing frequent flier accounts, etc.; these expenses we model as the usage cost.
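The two costs just described can be illustrated with a minimal sketch. It assumes a path is represented as a list of (source, target, layer) triples; the function names, the intermediate cities, and the airline labels are hypothetical and only serve the example.

```python
from typing import Hashable, List, Tuple

# One leg of a multiplex path: (source, target, layer).
Edge = Tuple[Hashable, Hashable, Hashable]

def is_path(path: List[Edge]) -> bool:
    """True if each leg starts where the previous leg ended."""
    return all(a[1] == b[0] for a, b in zip(path, path[1:]))

def transfer_cost(path: List[Edge]) -> int:
    """Number of consecutive leg pairs that use different layers."""
    return sum(1 for a, b in zip(path, path[1:]) if a[2] != b[2])

def usage_cost(path: List[Edge]) -> int:
    """Number of distinct layers used anywhere on the path."""
    return len({layer for _, _, layer in path})

# Hypothetical itinerary from London to Helsinki.
itinerary = [
    ("London", "Frankfurt", "airline_A"),
    ("Frankfurt", "Stockholm", "airline_B"),
    ("Stockholm", "Helsinki", "airline_A"),
]
print(transfer_cost(itinerary), usage_cost(itinerary))  # prints: 2 2
```

The itinerary switches airlines twice but deals with only two airlines in total, which shows that the two costs can differ once a layer is revisited.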
In addition to the transfer and usage costs, which are specific to the multiplex context, we also assume a nonnegative weight on each edge that captures the usual expenses such as the price of the flight or the time taken. More formally, the notion of an edge $(i, j)$ in a single-layer network generalizes, in a multiplex network, to a triple $(i, j, l)$, where $l$ denotes the layer. An $s$-$t$ path of length $k$ in a multiplex network is a sequence of edges $(s_i, t_i, l_i)$, $1 \le i \le k$, satisfying $s = s_1$, $t_i = s_{i+1}$ for $1 \le i \le k-1$, and $t_k = t$. Given such a path, its transfer cost is defined to be $|\{i : l_i \ne l_{i+1},\ 1 \le i \le k-1\}|$, while its usage cost is defined to be $|\{l_i : 1 \le i \le k\}|$, the number of distinct layers on the path.

Using these definitions of paths and path costs, we address two different problems. The first is path finding between pairs of nodes. Given two nodes, computing a path between them reduces to simple network path finding if we do not wish to consider how this path makes use of different layers. However, if we wish to compute paths that make use of few layers, then the task can become difficult. The second problem is determining whether or not the network is connected. If we put no limits on the number of layers we use to achieve full connectivity in the network, then this reduces to the simple single-layer case. However, if we wish to characterize the layers based on whether or not they are necessary for connectivity, or to minimize the number of layers used to achieve connectivity, then the task is non-trivial. We will address these tasks and give an overview of complexity results in Section V.

III. OBSERVATIONS ON A REAL MPN DATASET

We use a real-world multiplex network data set from Cardillo et al. [3], specifically a European Air Transportation Network (ATN) consisting of 37 different airlines operating on over 400 airports in Europe. As reported in [3], several key observations can be made. The multiplex has 450 distinct node labels, 37 layers, and 3588 edges.
Some layers correspond to national airlines and have a clear hub-and-spoke structure. The number of hubs ranges from one to a few, depending on the layer. The rest of the layers have a higher entropy in their degree distributions. These are not dominated by cities from one particular nation. Since important hubs in national layers (e.g., London, Paris, Frankfurt, etc.) tend to be spokes in other layers, they end up gathering even higher degrees and become super-hubs. Therefore, the growth process seems to depend on both the intra-network structure and the inter-network structure. In other words, the emergence of a hub in layer $i$ makes it a good candidate for a spoke in layer $j$. The union of all layers has a power-law degree distribution in the trunk.

We note an interesting observed property of the EATN multiplex. This property describes the nature of the largest connected component (LCC) within each layer. Consider each layer as a simple graph on the vertex set of all airport vertices that appear in the multiplex. Then, within this graph, if a node is not part of the LCC, it is isolated. Another way of viewing this is that if an airline operates out of two different airports, you are guaranteed to be able to get from the first airport to the second using only that airline. We define a layer having this property as being locally connected.

This property is interesting to us for two reasons. First, it motivates our generative model, which seeks to match this property and, in turn, is able to match many other properties of the EATN multiplex. Secondly, multiplexes in which all layers are locally connected admit polynomial-time algorithms for several optimization problems which are more difficult in the general case.

Using the insights gained from this data set, we develop a novel generative model, BINBALL, described in the next section.

IV. GENERATIVE MODEL FOR MPNS: BINBALL

In this section, we describe BINBALL.
Following Figure 1, the main goal of BINBALL is to postulate a set of rules that determine whether an edge of color $c$ (in layer $c$) connects two nodes of color $c$, or a node of color $c$ and another node of a different color. Our proposed model iteratively builds a multiplex network one edge at a time, where each edge is put into a random layer and chooses its end points based on both the local and global degrees of the nodes.

Formally, our model is $M(n, m, k, p, P, s, \alpha)$, where the parameters are listed in Table I. There are several soft requirements for these parameters which should be observed for the model to make sense. First, we should have $m \le k \binom{n}{2}$, as this is the maximum number of edges which can be packed into such a multiplex network. Second, we should have $s > 0$, so that the probability of a node being chosen as an end point of an edge, as defined below, is always a real number between 0 and 1.

Our model starts by generating $k$ empty graphs on the node set $N = \{1, 2, \ldots, n\}$. Each layer is associated with a random layer type and a local node set. Each layer is assigned type ER with probability $p$ and type PA with probability $1 - p$. Here, ER refers to the Erdos-Renyi random network model and PA refers to the Barabasi-Albert style preferential attachment model without node growth [11]. These layer types determine how edges grow within the layer. We assign each layer a local node set by randomly partitioning the node set $N$ into sets of size approximately $n/k$, so that each layer has an equal number of local nodes. We refer to the local node set of a layer $l$ as $\mathrm{loc}(l)$.

We then generate a new edge $m$ times. Each new edge picks a layer $l$ uniformly at random to enter. If that layer has type ER, then the end point nodes are chosen at random. Otherwise, the type is PA, and the nodes are chosen according to two preference metrics. Let $\deg_l(u)$ be the degree of $u$ in layer $l$; we call this the local degree.
If we sum over all layers in the multiplex, we get a count of all edges touching $u$ across all layers. We define the global degree of $u$ to be $\mathrm{global}(u) = \sum_{l \in L} \deg_l(u)$. An edge incoming to a PA layer chooses one end point according to local degree, limiting its selection to nodes that are in the local node set $\mathrm{loc}(l)$. It chooses its other end point according to a linear combination of global degree and the initial fitness defined by $P$. More precisely, the probability of a node $u \in \mathrm{loc}(l)$ being chosen as the first end point is:

$$\frac{\alpha \deg_l(u) + s}{\sum_{u' \in N} \left( \alpha \deg_l(u') + s \right)}$$

and the probability of a node $v$ being chosen as the second end point is:

$$\frac{\alpha\, \mathrm{global}(v) + P(v) + s}{\sum_{v' \in N} \left( \alpha\, \mathrm{global}(v') + P(v') + s \right)}$$

In Section VI, we show that this model matches the EATN multiplex on the metrics that we described, while a simplistic multiplex in which each layer is grown by simple preferential attachment does not. In particular, it is the motif of each layer having a local node set which allows us to replicate the existence of super-hubs with higher layer-to-layer variance.

V. ALGORITHMS FOR MULTIPLEX NETWORKS

Previously, we described two types of metrics which are non-trivial to calculate in multiplexes: shortest path finding and connectivity. For each property, we give an overview of complexity results when generalized to the multiplex environment. We also give algorithms which can compute or approximate these properties under certain assumptions.

A. Path computation algorithms

We continue to use the definitions of multiplex paths and their costs from Section II. Given these cost metrics, we wish to analyze two versions of the shortest path problem on multiplex networks: budgeted layer transfer paths and budgeted layer usage paths. The budgeted layer transfer problem is, given nodes $s$ and $t$ and a budget $B$, to find the lowest-weight $s$-$t$ path with layer transfer cost no more than $B$.
Similarly, in the budgeted layer usage problem we wish to find the lowest-weight $s$-$t$ path with layer usage cost no more than $B$. In addition to these, we also consider the cases where the edges have weight 0. In these cases, we simply wish to find the smallest budget $B$ for which a path exists. We will show that budgeted layer transfer paths admit polynomial-time solutions, while budgeted layer usage paths are almost always NP-complete.

Table I. LIST OF PARAMETERS USED BY BINBALL.

Parameter   Description
n           The number of nodes in the node set, shared across all layers.
m           The total number of edges across all layers. Also the sum of the edges in each layer.
k           The number of layers.
p           The probability that a layer is designated an Erdos-Renyi type network.
P           A mapping from the nodes to positive reals indicating a node's global weight.
s           A base value added to all nodes' weights when randomly choosing a node.
α           A scaling factor mapping a node's degree to a weight.

There is a simple case which is worth addressing: when each layer has a single large connected component. To clarify, we say that a layer is locally connected if it has a connected component $C$ such that for every node $u$, either $\deg(u) = 0$ or $u \in C$. If this is the case, then we can easily determine how many layers are needed to connect two nodes in the multiplex.

Theorem 1. If, in a multiplex of $N$ nodes and $L$ layers, the layers are locally connected, then there exists a polynomial-time algorithm that determines whether or not there is a path from node $s$ to node $t$ that makes use of at most $k$ layers.

Proof. We create an unweighted graph $G$ on $L + 2$ nodes. The nodes correspond to $s$, $t$, and the $L$ layers. For each layer $L_i$, let $V_i$ denote the respective local node set, i.e., the nodes of its connected component. In the new graph, add the edge $(s, L_i)$ if $s \in V_i$, the edge $(L_i, t)$ if $t \in V_i$, and the edge $(L_i, L_j)$ if $V_i \cap V_j \ne \emptyset$. Then in the multiplex, there is an $s$-$t$ path that makes use of at most $k$ layers if and only if the shortest path from $s$ to $t$ in $G$ has length at most $k + 1$, which can be decided in polynomial time. Suppose we find such a path $s, L_{a_1}, \ldots, L_{a_k}, t$. Then the $L_{a_i}$ nodes correspond to the layers whose union connects $s$ to $t$.
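The construction in the proof of Theorem 1 can be sketched as a breadth-first search over layers. This is a minimal illustration, not the authors' implementation: it assumes each layer is given as a list of undirected edges, that every layer is locally connected, and that $V_i$ is the set of nodes with at least one edge in layer $i$. The function name is hypothetical.

```python
from collections import deque

def min_layers_locally_connected(layers, s, t):
    """Minimum number of layers needed to join s and t, or None if impossible.

    layers: list of edge lists, one per layer; every layer is assumed to be
    locally connected, so any two non-isolated nodes of a layer can reach
    each other inside that layer.
    """
    if s == t:
        return 0
    # V_i: nodes that have at least one edge in layer i.
    node_sets = [{x for edge in edges for x in edge} for edges in layers]
    # Layers containing s are at distance 1 in the auxiliary graph G.
    dist = {i: 1 for i, vs in enumerate(node_sets) if s in vs}
    queue = deque(dist)
    while queue:
        i = queue.popleft()
        if t in node_sets[i]:
            return dist[i]              # number of layers on the chain
        for j, vs in enumerate(node_sets):
            # Edge (L_i, L_j) exists when the two layers share a node.
            if j not in dist and node_sets[i] & vs:
                dist[j] = dist[i] + 1
                queue.append(j)
    return None

layers = [[(0, 1), (1, 2)], [(2, 3)], [(3, 4)]]
print(min_layers_locally_connected(layers, 0, 4))  # prints: 3
```

Deciding whether at most $k$ layers suffice then amounts to comparing the returned value with $k$, which matches the "shortest path of length at most $k + 1$" condition in the proof.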
In such a multiplex, if we are trying to minimize our layer transfers, we never need to transfer to the same layer more than once. Thus the problem reduces to the case of minimizing layer usage cost, which is handled by Theorem 1.

In the above cases, we assumed that edges were weightless and that our only metric to minimize was the layer usage or layer transfer cost. Next, we consider the case where the edges have weights. In this scenario, we have two metrics to minimize simultaneously: the layer usage or transfer cost, and the weight of the path. To deal with this, we consider finding the shortest path under a budget $B$; that is, we find the shortest $s$-$t$ path with layer usage or transfer cost at most $B$.

Theorem 2. Given a budget $B$, we can determine the shortest $s$-$t$ path with layer transfer cost at most $B$.

Proof. We use dynamic programming to find the shortest $s$-$t$ paths with layer transfer cost no more than $0, 1, \ldots, B$. We then return the value for $B$. Let $OPT(u, v, b)$ be the weight of the lowest-weight $u$-$v$ path with transfer cost no more than $b$. Then we have the following recursion:

$$OPT(u, v, b) = \min_{x \in V} \left\{ OPT(u, x, b - 1) + OPT(x, v, 1) \right\}$$

We also have the base case that, when our budget is 0, the optimum is the shortest path within a single layer:

$$OPT(u, v, 0) = \min_{l \in L} SP_l(u, v)$$

To solve this recursion for all pairs $u, v$ and all values of $b$, we must first solve all-pairs shortest paths for each layer. This takes time $O(V^3 L)$ if we use Floyd-Warshall, or time $O(VE)$ if we restrict our weights to integers [13], where $E$ is the total number of edges across all layers. After we have done this, solving the recursion takes time $O(BV^2)$. Note that the maximum value of $B$ we need to try is $L$, as it is never advantageous to switch layers more than $L$ times.

Theorem 3. Given a budget $B$, it is NP-complete to determine the shortest $s$-$t$ path with layer usage cost at most $B$, regardless of whether or not the layers are locally connected.

Proof.
We show that 3-SAT reduces to the simple case of merely determining whether or not such a path exists, thus simultaneously showing that the weightless case is also NP-complete. Given an instance of 3-SAT on $N$ variables and $M$ clauses, we create a multiplex with $N + M + 1$ nodes and $2N$ layers. Each layer corresponds to an assignment of one of the variables, and the layers are thus labeled $x_1, \bar{x}_1, \ldots, x_N, \bar{x}_N$. First, we add the node $s$, which will be the start of our shortest path. Next we add nodes $var_1, var_2, \ldots, var_N$. We add edges $(s, var_1, x_1)$ and $(s, var_1, \bar{x}_1)$, and edges $(var_i, var_{i+1}, x_{i+1})$ and $(var_i, var_{i+1}, \bar{x}_{i+1})$ for $1 \le i < N$. At this point, any path from $s$ to $var_N$ must make use of $N$ distinct layers. The choice of layers corresponds to an assignment of the variables. By setting a layer visitation budget of exactly $N$, we guarantee that the path corresponds to an assignment. Next, we add nodes $c_1, c_2, \ldots, c_M$, corresponding to the clauses. For each clause, we add edges from the previous clause node (or from $var_N$ for clause $c_1$) in the layers corresponding to the literals mentioned in that clause. Since we have already exhausted our layer visitation budget, we may only make use of layers we visited from $var_1$ to $var_N$. Hence, we may only pass through a clause node if we have selected one of the layers corresponding to a literal in that clause. Thus if there is a path from $s$ to $c_M$ within a layer visitation budget of $N$, then there is a satisfying assignment for the 3-SAT instance.

B. Coverage algorithms

Consider a multiplex with vertex set $V$ and layers characterized by their edge sets $\{E_i\}$. We say that the multiplex is covered by layers $E_1, E_2, \ldots, E_k$ if for any $u, v \in V$, there is a path between them which only makes use of layers 1 through $k$. (Note that we took the first $k$ layers without loss of generality. If we have a specific ordering of the layers, this could be any $k$-sized subset of layers.) Suppose we had the minimum-sized subset of layers which achieved coverage of the node set. In the context of these layers, the other layers are redundant, as they do not increase the connectivity of the network. While redundancy is important for robustness to edge failures and for decreasing shortest path lengths, the other layers add nothing to the chosen layer set as far as connectivity is concerned.

Determining the minimum number of layers needed to achieve connectivity of a multiplex is a difficult algorithmic task. First, we show that even if the layers are locally connected, it is NP-complete to find the minimum number of layers needed to achieve connectivity. Then, we complement this result with an approximation algorithm which achieves an approximation ratio of $O(\log N)$ times the minimum number of layers needed, where $N$ is the number of nodes in the node set.

Theorem 4. Given a multiplex network on $N$ nodes and $L$ layers, it is NP-complete to determine the minimum number of layers needed to achieve connectivity.

Proof. First note that it is possible to calculate the minimum number in exponential time by considering all $2^L$ subsets of the layers, taking their union, and determining whether the resulting single-layer network is connected. We show that if we had an algorithm which determined minimum layer coverage, we could also solve instances of subset cover. We use the following formulation of subset cover: given a global set $S = \{1, 2, \ldots, N\}$, a family of subsets $s_1, s_2, \ldots, s_m \subseteq S$, and a parameter $k$, determine whether there exist $a_1, a_2, \ldots, a_k$ such that $\bigcup_i s_{a_i} = S$.
Given such an instance, we construct the following multiplex network. Let $V = \{1, 2, \ldots, N, N + 1\}$. For each subset $s_i = \{b_1, b_2, \ldots, b_j\}$, we add to the multiplex a layer $E_i$ with vertex set $V_i = \{b_1, b_2, \ldots, b_j, N + 1\}$ and edges:

$$E_i = \{(b_1, b_2), (b_2, b_3), \ldots, (b_{j-1}, b_j), (b_j, N + 1)\}$$

The addition of the node $N + 1$ means that the union of any set of layers is interconnected. Suppose we had calculated the minimum-sized set of layers $\{E_i\}$ which achieves connectivity. If $|\{E_i\}| \le k$, then there exist layers $L_1, L_2, \ldots, L_k$ such that $\bigcup_i L_i$ is connected. This is only possible if the union of their vertex sets is the entire vertex set, that is, $\bigcup_i V_i = V$. This also implies that we can remove the vertex $N + 1$ from these sets and have a similar relation hold:

$$\bigcup_i \left( V_i \setminus \{N + 1\} \right) = V \setminus \{N + 1\}, \qquad \text{that is,} \qquad \bigcup_i s_i = \{1, 2, \ldots, N\}$$

Thus we have found a solution to the original instance. Likewise, if $|\{E_i\}| > k$, then no such subset cover exists.

Even when the layers are internally connected, determining the minimum number of layers for coverage is difficult. We provide an approximation algorithm for layer coverage which works regardless of whether the layers are internally connected or not, and which gives an $H_n$ approximation for the minimum number of layers needed to connect the graph. This result is mostly a specific case of submodular function optimization [10]. We restate the algorithm as it specifically applies to our case, since it has great utility in multiplex metric algorithms.

Algorithm 1: Approximation algorithm for minimum layer coverage
  Input:  Layers L_1, L_2, ..., L_k on vertex set V
  Output: Set of layers whose union connects V
  S = ∅
  while the union of the layers in S is not connected do
      Pick L_i such that the number of connected components of (union of the layers in S) ∪ L_i is minimized
      Add L_i to S
  end
  Return S as the set of layers

Intuitively, we start with an empty graph. Our goal is to reduce the number of connected components from $n$ to 1, making the union graph connected.
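Algorithm 1 can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the authors' code: nodes are labeled 0 to n - 1, each layer is a list of undirected edges, ties are broken by lowest layer index, and the function returns None when even the union of all layers is disconnected (a case the pseudocode does not address). The function names are hypothetical.

```python
def count_components(n, edges):
    """Number of connected components of a graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

def greedy_layer_cover(n, layers):
    """Greedily pick layers until their union connects all n nodes."""
    chosen, union_edges = [], []
    remaining = set(range(len(layers)))
    components = count_components(n, union_edges)
    while components > 1:
        best = None
        for i in sorted(remaining):         # deterministic tie-breaking
            c = count_components(n, union_edges + list(layers[i]))
            if best is None or c < best[0]:
                best = (c, i)
        if best is None or best[0] >= components:
            return None                     # no remaining layer helps
        components, i = best
        chosen.append(i)
        remaining.discard(i)
        union_edges.extend(layers[i])
    return chosen

layers = [[(0, 1)], [(0, 1), (1, 2), (2, 3)], [(2, 3)]]
print(greedy_layer_cover(4, layers))  # prints: [1]
```

Recomputing the components from scratch for every candidate keeps the sketch short; an incremental union-find would be the natural optimization for large multiplexes.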
We iteratively add layers one at a time, each time picking the layer which decreases the number of connected components the most. We end this process when we have only one connected component, which covers the entire vertex set. Theorem 5. Algorithm 1 gives an H n approximation to the minimum number of layers needed, where H n = n 1 i. Proof. We observe that the function mapping edge sets to the number of connected components in a graph is a submodular function. As a result, the greedy heuristic achieves an H n approximation due to the results from [10]. We also give an overview of these complexity results in Tables II and III. Nearly all of the results regarding transfer cost are covered by Theorem 2, while nearly all of the results regarding usage cost are covered by Theorem 3. The exception is when the layers are locally connected and we wish to minimize the associated layer cost, in which case both the transfer cost and usage cost are covered by Theorem 1.
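The reduction and the greedy rule of Algorithm 1 can be sketched together in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`layers_from_set_cover`, `count_components`, `greedy_layer_cover`), the dictionary-of-edge-lists representation, and the tie-breaking rule (first layer in insertion order wins) are all assumptions made here.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def layers_from_set_cover(n: int, subsets: Dict[str, List[int]]) -> Dict[str, List[Edge]]:
    """Build one path layer per subset, ending at the extra node n + 1."""
    layers = {}
    for name, members in subsets.items():
        b = sorted(members)
        path = list(zip(b, b[1:]))      # (b_1, b_2), ..., (b_{j-1}, b_j)
        path.append((b[-1], n + 1))     # (b_j, N + 1)
        layers[name] = path
    return layers


def count_components(vertices: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """Count connected components with a small union-find structure."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(parent)
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw
            components -= 1
    return components


def greedy_layer_cover(vertices: List[Hashable], layers: Dict[str, List[Edge]]) -> List[str]:
    """Greedy rule of Algorithm 1: repeatedly add the layer that leaves
    the fewest connected components in the union graph."""
    chosen: List[str] = []
    union_edges: List[Edge] = []
    current = count_components(vertices, union_edges)
    while current > 1:
        best_name, best_count = None, current
        for name, edges in layers.items():
            if name in chosen:
                continue
            c = count_components(vertices, union_edges + edges)
            if c < best_count:          # strict: ties keep the earlier layer
                best_name, best_count = name, c
        if best_name is None:
            raise ValueError("no set of layers connects the vertex set")
        chosen.append(best_name)
        union_edges = union_edges + layers[best_name]
        current = best_count
    return chosen


# Hypothetical set cover instance on {1, 2, 3, 4}.
subsets = {"s1": [1, 2], "s2": [3, 4], "s3": [2, 3], "s4": [1, 2, 3]}
layers = layers_from_set_cover(4, subsets)
cover = greedy_layer_cover(list(range(1, 6)), layers)
print(cover)
```

On this toy instance the greedy rule first takes the layer built from `s4`, which merges four of the five vertices, and then the layer built from `s2`, which attaches vertex 4. Recomputing the components from scratch for every candidate keeps the sketch short; an incremental union-find would be the natural choice for larger inputs.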

[Figure 2. A comparison of BINBALL to the EATN and a Multiplex PA: (a) global degree distribution; (b) union degree distribution. Axes: k versus P(deg > k). Series: European ATN, BINBALL Model, Multiplex PA.]

Table II. S-T TRANSFER COST COMPLEXITY
Layer Type \ Edge Type | Weightless   | Weighted
Locally Connected      | P, Theorem 1 | P, Theorem 2
General                | P, Theorem 2 | P, Theorem 2

Table III. S-T USAGE COST COMPLEXITY
Layer Type \ Edge Type | Weightless     | Weighted
Locally Connected      | P, Theorem 1   | NPC, Theorem 3
General                | NPC, Theorem 3 | NPC, Theorem 3

VI. EXPERIMENTS WITH AIR TRANSPORTATION NETWORK DATA

To evaluate how our model and its generated networks compare to the EATN, we generated many instances, ran our algorithms and metrics on each one, and compared the averages to the same metrics and algorithms run on the EATN. We generated 100 instances of BINBALL, and took the average of the statistics for our comparison. The metrics and algorithms we used for comparison were global degree distribution, smashed degree distribution, node variance across layers, and shortest path counts with given layer transfer costs.

In addition to comparing the EATN to average runs of BINBALL, we also compare the node metrics on both to a multiplex generated via preferential attachment without node growth (Multiplex PA). Specifically, we start with an empty multiplex, with the same node count and layer count as the EATN. Then, we add edges to the multiplex until we have also matched the edge count of the EATN. We pick these edges by first picking a random layer, and then using preferential attachment within that layer to pick the edge endpoints. We generated 100 instances of such a multiplex, and took the average of the statistics for our comparison, as we did with our proposed model.
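The Multiplex PA baseline described above can be sketched as follows. The text does not say how endpoints are chosen while all degrees in a layer are still zero, nor how self-loops and repeated edges are treated, so this sketch assumes a weight of degree + 1 for each node and simply rejects self-loops and duplicate edges. The function name `multiplex_pa` and its parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple


def multiplex_pa(num_nodes: int, num_layers: int, num_edges: int,
                 seed: int = 0) -> List[Set[Tuple[int, int]]]:
    """Preferential attachment without node growth on a fixed node set.

    Assumptions (not from the paper): endpoint weights are degree + 1 so that
    isolated nodes can be picked; self-loops and duplicate edges are rejected.
    """
    max_edges = num_layers * num_nodes * (num_nodes - 1) // 2
    if num_edges > max_edges:
        raise ValueError("more edges requested than the multiplex can hold")

    rng = random.Random(seed)
    layers: List[Set[Tuple[int, int]]] = [set() for _ in range(num_layers)]
    degree = [[0] * num_nodes for _ in range(num_layers)]
    nodes = range(num_nodes)

    added = 0
    while added < num_edges:
        layer = rng.randrange(num_layers)              # pick a random layer
        weights = [d + 1 for d in degree[layer]]       # degree + 1 smoothing
        u, v = rng.choices(nodes, weights=weights, k=2)
        if u == v:
            continue                                   # reject self-loops
        edge = (min(u, v), max(u, v))
        if edge in layers[layer]:
            continue                                   # reject duplicates
        layers[layer].add(edge)
        degree[layer][u] += 1
        degree[layer][v] += 1
        added += 1
    return layers


# Small hypothetical instance; the paper matches the EATN's node, layer and edge counts.
sample = multiplex_pa(num_nodes=20, num_layers=3, num_edges=40, seed=1)
print([len(layer) for layer in sample])
```

Averaging a statistic over 100 such instances, as the text describes, would amount to calling `multiplex_pa` with 100 different seeds and averaging the per-instance results.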
For parameter choices, we match the node count, edge count, and layer count of the EATN. We set the probability of a layer being Erdős-Rényi to 0, meaning that all layers gain edges according to our local and global fitness rules. For the initial population, we generate a random preferential attachment graph on the node set, with incoming nodes bringing one edge, and use the degrees as the initial populations of the nodes. For the initial local fitness, we used 0.09: we tried values ranging from 0 to 1 and found that 0.09 was a good fit for the EATN. For the degree scaling factor, we use 1, meaning the degrees scale the same as the initial population measurement and initial fitness.

In Figure 2, we observe that the degree distribution of the Multiplex PA has a different drop-off from both BINBALL and the EATN. In particular, the Multiplex PA has both fewer nodes of low degree and fewer nodes of high degree. This implies that a Multiplex PA is not able to accurately capture the hub-and-spoke nature of the EATN. However, BINBALL comes very close to matching the degree distribution of the flattened EATN. We do note that when considering global degree distribution, BINBALL drops off a bit faster than the EATN, indicating that there may be an even stronger hub nature to the EATN than what we capture with our model.

In Figure 3, we can see that the variance of nodes in the Multiplex PA is quite low relative to that of both BINBALL and the EATN. This is expected, as the degree of hubs in PA graphs does not grow linearly with the edge count. As a result, within each layer of the Multiplex PA, the maximum degree is not very large. In BINBALL and the EATN, almost all layers exhibit a super-hub nature. These nodes, with very high degree in one layer and average to small degree in the others, will have high variance, contributing to the shape of the degree variance CDF.
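The node-level statistics compared in Figures 2 and 3 can be computed along the following lines. This is a sketch under stated assumptions: "global degree" is taken here to be the sum of a node's degrees over all layers, "union degree" its degree in the flattened graph with duplicate edges merged, and the variance is the population variance of a node's per-layer degrees. The paper may define these differently, and all function names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def per_layer_degrees(layers: Sequence[Sequence[Edge]],
                      nodes: Sequence[Hashable]) -> List[Dict[Hashable, int]]:
    """Degree of every node inside each individual layer."""
    table = []
    for edges in layers:
        deg = {v: 0 for v in nodes}
        for u, w in edges:
            deg[u] += 1
            deg[w] += 1
        table.append(deg)
    return table


def global_degrees(layers, nodes) -> Dict[Hashable, int]:
    """Assumed definition: sum of a node's degrees over all layers."""
    table = per_layer_degrees(layers, nodes)
    return {v: sum(deg[v] for deg in table) for v in nodes}


def union_degrees(layers, nodes) -> Dict[Hashable, int]:
    """Assumed definition: degree in the flattened graph, duplicates merged."""
    flattened = {frozenset(edge) for edges in layers for edge in edges}
    return per_layer_degrees([[tuple(e) for e in flattened]], nodes)[0]


def degree_variance(layers, nodes) -> Dict[Hashable, float]:
    """Population variance of each node's degree across layers."""
    table = per_layer_degrees(layers, nodes)
    return {v: pvariance([deg[v] for deg in table]) for v in nodes}


def ccdf(values: Sequence[int]) -> Dict[int, float]:
    """Empirical P(value > k) for k = 0 .. max(values)."""
    n = len(values)
    return {k: sum(1 for x in values if x > k) / n
            for k in range(max(values) + 1)}


# Tiny hypothetical multiplex with two layers on three nodes.
nodes = [0, 1, 2]
layers = [[(0, 1), (0, 2)], [(0, 1)]]
g = global_degrees(layers, nodes)
u = union_degrees(layers, nodes)
var = degree_variance(layers, nodes)
tail = ccdf(list(g.values()))
print(g, u, var, tail)
```

In this toy example node 0 has global degree 3 but union degree 2, because the edge (0, 1) appears in both layers and is counted once after flattening.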
We note that BINBALL has a slightly sharper tail drop in the degree variance plot (Figure 3), while the EATN displays more nodes with higher variance.

[Figure 3. A comparison of BINBALL to the EATN regarding node degree variance. Axes: k versus P(var > k). Series: European ATN, BINBALL Model, Multiplex PA.]

In Figure 4, we observe that our model almost exactly captures the nature of the shortest paths with layer transfer costs. There is hardly any difference between using 2 or 3 transfers. We exclude the plots for 4 transfers or more, as these are identical to the plot for 3 transfers.

[Figure 4. A comparison of BINBALL to the EATN regarding shortest path counts. Each plot shows the path counts for layer transfer costs equal to 0, 1, 2, 3. Budgets of 2 and 3 are nearly identical. Axes: path length versus path count. Series: European ATN, BINBALL Model.]

The observant reader might note that, from the integrals of the plots, we have counted fewer paths from BINBALL than from the EATN. There are two reasons for this. First, when calculating shortest paths, we wish to distinguish between nodes which are disconnected due to a lower layer transfer budget and nodes which are disconnected due to the randomness of the model or noise in the EATN data. Thus, we only considered nodes which were within the largest connected component of the flattened multiplex. Second, because of this restriction combined with averaging over independent runs of BINBALL, we will occasionally have slightly fewer nodes than the EATN. This small difference scales up quadratically when we count paths between nodes, which explains the difference in path counts.

VII. FUTURE WORK

Having demonstrated and provided methods for various multiplex metrics, several questions remain. First, what other metrics are meaningful on multiplex networks? We have demonstrated several which were useful for evaluating our model, but many others should exist. Second, what sort of multiplex metrics can be efficiently calculated? For example, given a metric for single-layer networks, we can generalize to multiplex networks by considering the expectation of the metric when k random layers are chosen and flattened together. This metric gives a view into how the layers combine to achieve the final metric value viewed on the flattened network, and was the main approach used when evaluating metrics on the EATN in [3]. However, it has the disadvantage of being difficult to compute exactly. It can be approximated by random sampling, but it is not clear for which metrics this sampling has low expected error, or even the same expected value.

VIII. CONCLUSION

In this paper, we proposed a generative model for multiplex networks based on a probabilistic preferential attachment rule on local and global degrees of nodes. We were able to generate random multiplex networks with similar properties to a standard real-world multiplex network, namely the European Air Transportation Network. This is evidenced by the close match observed not only on the sets of global and layer-union degree distributions, but also on the (budgeted) shortest path length distributions. We also studied the algorithmic complexity of various budgeted shortest path and coverage problems in multiplex networks. While some problems can be solved in polynomial time, others are NP-hard. Among the latter problems, some admit guaranteed non-trivial approximation factors, whereas others are inapproximable.

REFERENCES

[1] S. Boccaletti, G. Bianconi, R. Criado, C. I. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, and M. Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1-122.
[2] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, and S. Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464, April.
[3] A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti. Emergence of network features from multiplexity. Sci. Rep., 3.
[4] M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas. Mathematical formulation of multilayer networks. Phys. Rev. X, 3:041022, Dec.
[5] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. Proceedings of the National Academy of Sciences, 102(22).
[6] J. Y. Kim and K.-I. Goh. Coevolution and correlated multiplexity in multiplex networks. Phys. Rev. Lett., 111:058702, Jul.
[7] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter. Multilayer networks. Journal of Complex Networks.
[8] M. Kurant and P. Thiran. Layered complex networks. Phys. Rev. Lett., 96:138701, Apr.
[9] M. Magnani and L. Rossi. The ml-model for multi-layer social networks. In Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, pages 5-12, July.
[10] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1).
[11] M. Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA.
[12] V. Nicosia, G. Bianconi, V. Latora, and M. Barthelemy. Growing multiplex networks. Phys. Rev. Lett., 111:058701, Jul.
[13] M. Thorup. Undirected single-source shortest paths with positive integer weights in linear time. J. ACM, 46(3), May.
