ICT 2016: 23rd International Conference on Telecommunications
Hyperbolic Traffic Load Centrality for Large-Scale Complex Communications Networks
Eleni Stai, Konstantinos Sotiropoulos, Vasileios Karyotis and Symeon Papavassiliou
National Technical University of Athens (NTUA), School of Electrical & Computer Engineering, Network Management & Optimal Design Lab (NETMODE)
Thessaloniki, Greece, Monday, May 16, 2016

Outline
Motivation, Aims & Contributions: more efficient computation of path-based centralities (e.g., Traffic Load Centrality (TLC), Betweenness Centrality (BC)) via hyperbolic space network embedding
Network Embedding in Hyperbolic Space
Hyperbolic Traffic Load Centrality: Definition, Algorithm, Complexity Analysis
Numerical Evaluations & Comparisons with Traffic Load Centrality: Synthetic Scale-free Graphs, Real Graphs
Conclusions
Bonus results in various parts of the presentation

Motivation & Aims
"Big network data" paradigm: analyze vast network (traffic) data and very large network topologies.
Converged (complex) future networks: with numerous heterogeneous devices, all mechanisms must operate at very large scales, efficiently and with low complexity.
Computations for various network analyses therefore need efficient and resource-conserving methodologies.
Example (subject of this work): correctly identifying the most important nodes for the routing operation (e.g., via path-based centralities such as TLC) improves routing and routing-dependent applications.

Traffic Load Centrality
Q. What is Traffic Load Centrality (TLC)?
A. TLC is a metric for identifying central nodes in terms of the traffic they handle/control relative to the total network traffic.
1. Each node sends a unit amount of some commodity (e.g., traffic) to every other node.
2. The commodity is transferred from a node to a neighbor that is closer (hop-wise) to the destination.
3. When more than one such relay is available, the commodity is divided equally among them.
Originally, TLC uses shortest paths in terms of graph (i.e., hop) distances, which are hard to compute for social networks with millions of nodes.

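The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `traffic_load_centrality`, the adjacency-dict input, and the convention of counting only relayed commodity (excluding what a node injects itself or receives as the destination) are assumptions made here.

```python
from collections import deque


def traffic_load_centrality(adj):
    """Sketch of TLC for an undirected, connected graph.

    adj maps each node to an iterable of its neighbours.
    Assumption: a node's load counts only commodity it relays for others.
    """
    load = {v: 0.0 for v in adj}
    for t in adj:  # treat every node in turn as the destination
        # BFS from the destination yields each node's hop distance to t.
        dist = {t: 0}
        order = [t]
        queue = deque([t])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    order.append(w)
                    queue.append(w)
        # Rule 1: every source injects one unit destined for t.
        flow = {v: 1.0 for v in dist if v != t}
        # Farthest nodes first, so flow[u] is final when u is processed.
        for u in reversed(order):
            if u == t:
                continue
            load[u] += flow[u] - 1.0  # commodity relayed on behalf of others
            # Rule 2: relays are the neighbours one hop closer to t.
            relays = [w for w in adj[u] if dist[w] == dist[u] - 1]
            # Rule 3: split equally among the available relays.
            share = flow[u] / len(relays)
            for w in relays:
                if w != t:
                    flow[w] += share
    return load
```

On a three-node path a-b-c, only b relays traffic (one unit in each direction), so it ends up with load 2.0 while a and c have load 0.0.
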
State-of-the-art & Contribution
State-of-the-art:
Most efficient exact algorithm (time & space) for TLC: Brandes Algorithm (BA) [2],[5].
Approximations via sampling pairs of nodes, randomly or adaptively, for computing TLC.
Ego-centralities or k-hop centralities.
Network embedding for approximating graph (hop) distances; the most suitable space is hyperbolic.
Hyperbolic Traffic Load Centrality (HTLC): an alternative to TLC, based on network embedding in hyperbolic metric space and greedy routing over hyperbolic coordinates.
HTLC's twofold advantages:
1. Computation of shortest paths in large social and complex networks is cumbersome. Greedy routing over hyperbolic coordinates demands only local knowledge at each node and only geometric space coordinates, so it avoids shortest-path computations. A new routing scheme leads to new centrality metrics, i.e., HTLC (HBC).
2. In scale-free topologies (most social/complex networks of interest), greedy paths in hyperbolic space have approximately the same length as shortest paths. HTLC therefore closely approximates TLC while reducing computational complexity (the paper's contribution).

Network Embedding in Hyperbolic Space
Key for HTLC: the choice of the network embedding algorithm in hyperbolic space.
Network embedding in hyperbolic space assigns hyperbolic coordinates to network nodes.
Greedy embedding in hyperbolic space ensures a 100% success rate of greedy routing. Every graph has a greedy embedding in hyperbolic space, which is not true for the two-dimensional Euclidean space. Greedy embedding is suitable for small network topologies with short diameter.
We employ the Rigel embedding in this work: it is suitable for large-scale networks and allows parallelization of the computation.
The Rigel embedding is not a greedy embedding, i.e., some greedy paths may fail. In scale-free networks, however, greedy routing based on hyperbolic coordinates/distances achieves a success rate close to 100%.

Rigel Network Embedding in Hyperbolic Spaces
Hyperboloid model of the n-dimensional hyperbolic space, with a distance function defined for any two points x, y.
The Rigel embedding:
Multidimensional scaling: L << N nodes are chosen as landmarks (usually high-degree nodes).
Bootstrapping step: the hyperbolic coordinates of the landmarks are computed so that the distances between all pairs of landmarks in the hyperboloid are as close as possible to the hop distances in the original graph.
Final step: the hyperbolic coordinates of the remaining nodes are calibrated with respect to the fixed landmarks, so that each node's hyperbolic distances to all landmarks are very close to the corresponding hop distances in the original graph.
Achieves low distance distortion error and answers node-distance queries fast.
Tested for the computation of graph and social analysis metrics such as radius, diameter, average path length and closeness centrality, resulting in values very close to the ground truth.
Not tested for path-based centrality metrics (this paper).

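The distance function of the hyperboloid model can be sketched as below. This is the standard unit-curvature form, in which a point is identified by its n spatial coordinates and the remaining coordinate is implied; whether the slide's own formula carried an additional curvature scaling factor is not recoverable from the text, so unit curvature is an assumption here, as is the function name `hyperboloid_distance`.

```python
import math


def hyperboloid_distance(x, y):
    """Hyperboloid-model distance between two points given by their
    n spatial coordinates (assumption: curvature -1, no extra scaling)."""
    norm_x = sum(a * a for a in x)
    norm_y = sum(b * b for b in y)
    inner = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt((1.0 + norm_x) * (1.0 + norm_y)) - inner
    # Rounding can push arg marginally below 1, where acosh is undefined.
    return math.acosh(max(arg, 1.0))
```

For example, the origin and the point (sinh 1, 0) are at distance 1 (up to rounding), and any point is at distance 0 from itself.
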
Hyperbolic Traffic Load Centrality - Definition
Definition of HTLC: assuming that (a) each node sends a unit amount of some commodity (traffic) to every other node, and (b) at each node other than the destination the commodity is divided equally among those neighbors that reduce the hyperbolic distance to the destination, called "greedy neighbors", HTLC is defined as the total amount of commodity passing through a vertex via these exchanges.
Algorithmic approach: based on Routing Betweenness Centrality for source-oblivious routing.
1. Computes a directed acyclic graph (DAG).
2. Collects load dependencies and computes centralities.

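A minimal sketch of this definition follows. It is an illustration under stated assumptions rather than the algorithm from the slides: the names `hyp_dist` and `hyperbolic_traffic_load_centrality` are hypothetical, the distance is the unit-curvature hyperboloid form, commodity that reaches a node with no greedy neighbor is simply dropped, and only relayed commodity is counted as load.

```python
import math


def hyp_dist(x, y):
    """Unit-curvature hyperboloid distance (assumed form)."""
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    inner = sum(a * b for a, b in zip(x, y))
    return math.acosh(max(math.sqrt((1.0 + nx) * (1.0 + ny)) - inner, 1.0))


def hyperbolic_traffic_load_centrality(adj, coords):
    """adj: node -> neighbours; coords: node -> hyperbolic coordinates."""
    load = {v: 0.0 for v in adj}
    for t in adj:  # each node in turn is the destination
        d = {v: hyp_dist(coords[v], coords[t]) for v in adj}
        flow = {v: 1.0 for v in adj if v != t}  # one unit per source
        # Decreasing distance to t: a greedy neighbour is strictly closer,
        # so it is always processed after the node that forwards to it.
        for u in sorted(flow, key=d.get, reverse=True):
            greedy = [w for w in adj[u] if d[w] < d[u]]
            if not greedy:
                continue  # assumption: a stuck greedy path drops its load
            load[u] += flow[u] - 1.0  # commodity relayed for other sources
            share = flow[u] / len(greedy)
            for w in greedy:
                if w != t:
                    flow[w] += share
    return load
```

With a path a-b-c embedded at (-1, 0), (0, 0) and (1, 0), every greedy path coincides with the shortest path, so b obtains load 2.0 and the end nodes 0.0, the same values plain TLC gives on this graph.
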
Efficient TLC/BC Computation: Dependencies
Efficient computation of BC (employed for TLC, slightly modified): the cubic number of pair-wise dependencies can be aggregated without computing them explicitly. The pair-dependency is $\delta(s,t \mid v) = \sigma(s,t \mid v) / \sigma(s,t)$, where $\sigma(s,t)$ is the number of shortest paths (SPs) between nodes s, t and $\sigma(s,t \mid v)$ is the number of SPs between s, t that pass through node v.
One-sided dependencies: $\delta(s \mid v) = \sum_{t} \delta(s,t \mid v)$. It was shown (Brandes, 2001) that $\delta(s \mid v) = \sum_{w:\, v \in P_s(w)} \frac{\sigma(s,v)}{\sigma(s,w)} \bigl(1 + \delta(s \mid w)\bigr)$, where $P_s(w)$ is the set of predecessors of w on SPs from s. The dependency of s on some v can thus be compiled from the dependencies on vertices one edge farther away.
BC is computed by iterating over all vertices s and computing dependencies in two phases. The first phase is a BFS, in which distances and SP counts from s are determined. The second phase visits all vertices in reverse order of their discovery, i.e., those farthest from s first, to accumulate dependencies.

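The two phases can be sketched for an unweighted graph as follows. This is a compact illustration of the Brandes accumulation scheme, not code from the paper; the function name and the adjacency-dict input are assumptions, and scores are summed over ordered pairs (s, t), i.e., not halved for undirected graphs.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness for an unweighted graph (adjacency dict)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, recording distances, SP counts, predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []  # vertices in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: farthest vertices first, accumulate one-sided dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a four-node ring every node lies on one of the two shortest paths between each of its two neighbors, which gives each node a score of 1.0 under the ordered-pair convention.
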
Hyperbolic Traffic Load Centrality Algorithm (1)
This step ensures that the nodes are examined in the correct order when accumulating the load dependencies in Part II: if nodes are examined in decreasing order of their hyperbolic distance to the corresponding destination, a node that has already been examined cannot be used again as a greedy neighbor of a node examined later.

Hyperbolic Traffic Load Centrality Algorithm (2)
[Figure: Brandes algorithm for TLC, shown for comparison; the difference lies in Part I.]

HTLC Complexity Analysis
Complexity   Part I          Part II
HTLC         O(N^2 log N)    O(N|E|)
TLC          O(N|E|)         O(N|E|)
NOTE: the number of edges of a connected graph, |E|, scales as N^a (a real, 1 < a < 2). This gives a complexity of O(N · N^a) for Part I of TLC and O(N^2 log N) for Part I of HTLC, where O(N^a) > O(N log N).

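The comparison in the note can be written out as a short worked step, assuming |E| grows as N^a with 1 < a < 2. The symbols T^I_TLC and T^I_HTLC are introduced here only as labels for the Part I costs and do not appear in the slides.

```latex
\begin{align*}
T^{\mathrm{I}}_{\mathrm{TLC}}  &= O(N|E|) = O(N \cdot N^{a}) = O(N^{1+a}),\\
T^{\mathrm{I}}_{\mathrm{HTLC}} &= O(N^{2}\log N),\\
\frac{N^{1+a}}{N^{2}\log N}    &= \frac{N^{a-1}}{\log N} \to \infty
  \quad \text{as } N \to \infty,\ a > 1.
\end{align*}
```

The ratio diverges because any positive power of N eventually outgrows log N. For a close to 1 the crossover may occur only at large N, so the asymptotic advantage need not show on small graphs.
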
Numerical Results on Synthetic Scale-free Graphs (1)
[Figures: accuracy of approximating TLC with HTLC; time complexity.]

Numerical Results on Synthetic Scale-free Graphs (2)
Time complexity: HTLC is computed at least 1.5 times faster than TLC for dense graphs and at least 2.5 times faster for less dense graphs.
In denser graphs, the larger number of greedy paths increases the computational time needed for Part II of HTLC. Current work: restrict the number of greedy paths by considering as possible relays (i.e., greedy neighbors) only those neighbors of a node that reduce the hyperbolic distance to the destination by more than a specified threshold.
Accuracy: HTLC achieves a precision of at least 65%, reaching up to 100%.
The number of nodes in the network does not affect precision.
This is the expected behavior for scale-free topologies, which exhibit a hidden hyperbolic structure.
Precision is lower in small-world and random graphs.

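The slides do not spell out how precision is measured. One plausible reading, which is purely an assumption here, is the overlap between the k top-ranked nodes under TLC and the k top-ranked nodes under HTLC. The hypothetical helper `top_k_precision` below sketches that reading.

```python
def top_k_precision(reference, approx, k):
    """Fraction of the k highest-scoring nodes under `reference` that are
    also among the k highest-scoring nodes under `approx`.

    Assumption: this top-k overlap is only one possible reading of the
    "precision" reported in the slides.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_ref = set(sorted(reference, key=reference.get, reverse=True)[:k])
    top_apx = set(sorted(approx, key=approx.get, reverse=True)[:k])
    return len(top_ref & top_apx) / k
```

With made-up scores `{"a": 5, "b": 4, "c": 1, "d": 0}` as reference and `{"a": 9, "b": 1, "c": 3, "d": 0}` as approximation, the top-2 sets are {a, b} and {a, c}, so the top-2 precision is 0.5.
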
Numerical Results on Real Graphs (1)
[Figures: accuracy of approximating TLC with HTLC; time complexity.]

Numerical Results on Real Graphs (2)
RMSE: measures the fit of each real data set to the power-law distribution, i.e., how close it is to a scale-free graph.
HTLC improves computational time over TLC by a factor ranging from 1.2 (fb_combined) to 4.1 (as-caida20071105).
For k > 7, the precision achieved is over 57% for all data sets.
Note: high precision is achieved for all data sets except CA-Hepth, for which the power-law distribution does not provide a good fit (RMSE column).

Summary & Future Work
In this paper:
Introduced the HTLC metric for determining central nodes in a complex network when greedy routing in hyperbolic space is applied.
Proposed an algorithm for computing HTLC and studied its complexity in comparison with TLC.
Showed an improvement in computational complexity in favor of HTLC, with high precision when approximating TLC for graphs with the scale-free property.
Ongoing work:
A betweenness centrality variant in hyperbolic space, using greedy routing.
Study of different embedding types and their impact on routing and on metric computations (i.e., accuracy).
Development of a framework for path-based metrics computation and adaptive routing based on the above.

Alleviating Routing Congestion under Hyperbolic Embedding (1): Scale-free graphs
Left graph: TLC, similar to the power-law distribution. Right graph: HTLC.
Node 1: hub node. Node 35: low-degree node.
Choosing node 35 balances the traffic load better among nodes and decongests the core of the scale-free network.
When high-degree nodes are congested, load can be transferred dynamically, via simple distributed operations, towards nodes with lower degree.

Alleviating Routing Congestion under Greedy Hyperbolic Embedding (2): Example on random graphs
TLC: almost uniform.
HTLC under greedy embedding: nodes with significantly higher HTLC values appear (the root and its neighbors), similar to TLC/HTLC for scale-free graphs (third graph).
This makes it possible to drive traffic load towards less congested nodes.
Note: this capability of dynamically alleviating routing congestion emerges only under greedy embedding with a minimum-depth spanning tree, not with the Rigel embedding.

Thank you for your attention! Questions? vassilis@netmode.ntua.gr